
JOHN CHRISTIAN ESPINOLA, LPT, MA

Lyceum of the Philippines University, Manila

https://bit.ly/30GGxvd
A direct or positive relationship between two variables implies that an increase in the value of one variable corresponds to an increase in the value of the other variable.

An inverse or negative relationship between two variables means that an increase in the value of one variable corresponds to a decrease in the value of the other variable.

A zero relationship exists between two variables if an increase in one is accompanied by neither an increase nor a decrease in the other.

Value of r        Interpretation
0.80 to 0.99      High Correlation
0.60 to 0.79      Moderately High Correlation
0.40 to 0.59      Moderate Correlation
0.20 to 0.39      Low Correlation
0.01 to 0.19      Negligible Correlation

The table is read from the absolute value of r; the sign of r gives the direction (positive or negative).


The coefficient of determination (r²) can be thought of as a percentage. It tells you how much of the variation in the dependent variable is explained by the regression equation.

The higher the coefficient, the more closely the data points cluster around the regression line when the data points and the line are plotted. If the coefficient is 0.80, then 80% of the variance in the dependent variable is accounted for by the regression line.

Values of 1 or 0 would indicate that the regression line explains all or none of the variation in the data, respectively. A higher coefficient indicates a better goodness of fit for the observations.
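The "explained variance" reading of r² can be checked numerically with R² = 1 − SS_res / SS_tot. This is a minimal Python sketch; the observed values and the fitted values are made up for illustration.

```python
def coefficient_of_determination(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot.

    y     : observed values of the dependent variable
    y_hat : values predicted by the regression equation
    """
    if len(y) != len(y_hat) or len(y) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_y = sum(y) / len(y)
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    if ss_tot == 0:
        raise ValueError("y has no variation, so R^2 is undefined")
    return 1 - ss_res / ss_tot


# Illustrative data: the line y_hat = 2.2 + 0.6x fitted to x = 1..5
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(round(coefficient_of_determination(y, y_hat), 4))  # 0.6
```

Here 60% of the variation in y is explained by the line, even though none of the five points lies exactly on it.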

https://bit.ly/3oFCk4M
Credits: Prof. Johnny Amora, 2021. PARSSU Multivariate Data Analysis Webinar
The correlation coefficient is used to measure the direction and strength of the linear relationship between two variables.

Pearson correlation is used to test whether a continuous variable is correlated with another continuous variable.
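$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

- X: independent variable
- Y: dependent variable
- n: sample size (number of paired observations)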
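The formula needs only five sums, so it can be evaluated directly. This is a minimal Python sketch with made-up data; the function name `pearson_r` is illustrative and not an SPSS feature.

```python
import math


def pearson_r(x, y):
    """Pearson r via the computational formula:
    r = (nΣXY - ΣXΣY) / sqrt([nΣX² - (ΣX)²][nΣY² - (ΣY)²])
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return numerator / denominator


# Illustrative paired scores (not the data from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson_r(x, y), 4))  # 0.7746
```

Squaring this r gives 0.6, the same proportion of explained variance as the line y = 2.2 + 0.6x fitted to these data.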
What is the relationship between the grades in 3 subjects?


Use the arrow button to move in the desired dependent variable.

Use the arrow button to move in the desired variables.

Click “OK”
“There is a LOW POSITIVE CORRELATION between the grades of the students in General Mathematics and Earth & Life Science.”

“There is a NEGLIGIBLE NEGATIVE CORRELATION between the grades of the students in General Mathematics and Oral Communication.”

“There is a LOW NEGATIVE CORRELATION between the grades of the students in Earth & Life Science and Oral Communication.”
How can the relationship between the 3 subjects be presented visually?
Choose “Matrix Scatter”.

Click “Define”.

Use the arrow button to move in the desired dependent variable.

Click “OK”
Spearman correlation is the nonparametric version of the Pearson product-moment correlation. Spearman's correlation coefficient (ρ, also written rs) measures the strength and direction of association between two ranked variables.
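One common way to compute ρ is to rank each variable (tied values share the average of their ranks) and then apply Pearson's formula to the ranks. When there are no ties this equals 1 − 6Σd² / (n(n² − 1)), where d is the difference between paired ranks. This is a minimal Python sketch with illustrative data.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_rx, mean_ry = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_rx) * (b - mean_ry) for a, b in zip(rx, ry))
    spread = math.sqrt(sum((a - mean_rx) ** 2 for a in rx)
                       * sum((b - mean_ry) ** 2 for b in ry))
    if spread == 0:
        raise ValueError("rho is undefined when all ranks are equal")
    return cov / spread


# No ties: d = [-1, 1, -1, 1, 0], so 1 - 6(4) / (5 * 24) = 0.8
print(round(spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]), 4))  # 0.8
```

Because only ranks are used, any monotonic relationship (for example y = x²) gives ρ = 1, even when Pearson's r is below 1.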

What is the relationship between the grades in 3 subjects (assuming that the scores are ranked)?
Use the arrow button to move in the desired dependent variable.

Click “OK”
“There is a LOW POSITIVE CORRELATION between the grades of the students in General Mathematics and Earth & Life Science.”

“There is a NEGLIGIBLE NEGATIVE CORRELATION between the grades of the students in General Mathematics and Oral Communication.”

“There is a LOW NEGATIVE CORRELATION between the grades of the students in Earth & Life Science and Oral Communication.”
Chi-square is used to find out whether the observed frequencies differ significantly from the theoretical or expected frequencies.

It can help us decide whether the distribution of frequencies for a variable in a sample fits the population.
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

- O: observed frequency
- E: expected frequency
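For a test of independence such as Gender vs. Nature of School, each expected frequency is E = (row total × column total) / grand total, and the degrees of freedom are (rows − 1)(columns − 1). This is a minimal Python sketch with hypothetical counts. It stops at the statistic: the p-value comes from the chi-square distribution (SPSS reports it), or χ² can be compared with a critical value such as 3.841 for df = 1 at α = .05.

```python
def chi_square_independence(table):
    """Chi-square statistic and degrees of freedom for an r x c contingency table.

    Each expected frequency is E = row total * column total / grand total,
    and the statistic is the sum of (O - E)^2 / E over all cells.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected == 0:
                raise ValueError("a row or column total is zero")
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = gender, columns = nature of school
table = [[10, 20],
         [20, 10]]
chi2, df = chi_square_independence(table)
print(round(chi2, 3), df)  # 6.667 1
```

For these made-up counts every expected frequency is 15, and 6.667 > 3.841, so H₀ would be rejected at the .05 level.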

Is there a significant relationship between Gender and Nature of School?
Use the arrow button to move in the desired variables.

Click “OK”
Reject Ho if p<0.05

“There is NO significant relationship between Gender and Category of Basic Education among the students.”
Credits: Prof. Johnny Amora, 2021. PARSSU Multivariate Data Analysis Webinar
If we are given a series of values for two correlated variables, then it is possible to predict or estimate the value of one variable from knowledge of the other variable.

Problems concerning prediction, estimation, and forecasting can be solved using regression analysis.

It deals with the estimation of one variable based on the changes and movements of the other variable.
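$$Y = a + bX$$

$$a = \frac{\sum Y \sum X^2 - \sum X \sum XY}{n\sum X^2 - \left(\sum X\right)^2}$$

$$b = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - \left(\sum X\right)^2}$$

- Y: dependent variable
- X: independent variable
- a, b: constants (a is the Y-intercept, b is the slope)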
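The formulas for a and b translate directly into code. This is a minimal Python sketch; the x and y values are illustrative and are not the General Mathematics data.

```python
def least_squares_line(x, y):
    """Return (a, b) for Y = a + bX using the computational formulas:
    a = (ΣY ΣX² - ΣX ΣXY) / (nΣX² - (ΣX)²)
    b = (nΣXY - ΣX ΣY) / (nΣX² - (ΣX)²)
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired lists with at least 2 values")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("X must vary to fit a line")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Illustrative data (not the General Mathematics grades)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares_line(x, y)
print(round(a, 3), round(b, 3))  # 2.2 0.6
print(round(a + b * 6, 3))       # predicted Y when X = 6: 5.8
```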
Approximately, what would be my grade in General Mathematics if I attended a 6-hour remedial session?
For Demonstration Purposes Only!
Add a variable column and name it “HrsRem”.
Use the arrow button to move in the desired dependent variable.

Use the arrow button to move in the desired independent variable.
Click “OK”
r-value: .297 → Low Correlation
Value of Constant → b (the intercept)

Value of Coefficient → m = 1.234 (the slope)

$$y = mx + b \quad\Rightarrow\quad y = 1.234x + b$$
“If I attend 6 hours of remedial sessions in General Mathematics, then my grade will be approximately 86.047.”
Choose “Scatter Plot”.
Drag it to the space above.

Scroll down and drag:
- “General Mathematics” to the Y-axis
- “Hours Remedial” to the X-axis
[Scatter plot of General Mathematics against Hours Remedial with the fit line y = 1.234x + b]

Now the real thing! THE ASSUMPTIONS!
Use the arrow button to move in the desired variables.
General Mathematics: Normally distributed → OK!

Hours in Remedial: Not normally distributed → Not OK!
Click “Statistics”.

Use the arrow button to move in the desired dependent variable.

Use the arrow button to move in the desired independent variable.

Choose:
1. Model Fit
2. R Squared Change
3. Descriptives
4. Part and Partial Correlations
5. Collinearity Diagnostics

Click “Continue”.
Click “Plots”.

Drag ZRESID (standardized residuals) to Y.

Drag ZPRED (standardized predicted values) to X.

Click “Normal Probability Plot”, then click “Continue”.

Click “Save”.

Click “Cook’s”, then click “Continue”.
The R value should be greater than .3.

https://bit.ly/3yloRUi
Should be less than .05.

If the sample size is small, use “Adjusted R Square”.

Percentage of the variance in the dependent variable that is explained by the independent variable. Minimum is 30%.
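Adjusted R² penalizes R² for the number of predictors k relative to the sample size n: R²adj = 1 − (1 − R²)(n − 1) / (n − k − 1). This is a minimal Python sketch; r = .297 is taken from the output above, and n = 15 is an assumption borrowed from the sample size reported for the multiple-regression example.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1) / (n - k - 1).

    n: sample size, k: number of predictor variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need more cases than predictors plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Assumption: r = .297 from the simple regression and n = 15 cases
r_squared = 0.297 ** 2
print(round(r_squared, 4))                                 # 0.0882
print(round(adjusted_r_squared(r_squared, n=15, k=1), 4))  # 0.0181
```

Under these assumptions, both values sit far below the 30% minimum.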

ANOVA: tests whether the slope of the line is zero (we want this to be rejected).

.282 > .05 → Slope = 0 cannot be rejected
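For a simple regression, the ANOVA F statistic for H₀: slope = 0 can be written in terms of r and n as F = r²(n − 2) / (1 − r²), with 1 and n − 2 degrees of freedom. This is a minimal Python sketch, again assuming n = 15; with r = .297 it gives F ≈ 1.26, which is small enough that a Sig. value of about .28 is plausible.

```python
def f_simple_regression(r, n):
    """F statistic for H0: slope = 0 in a simple linear regression.

    Equivalent to MS_regression / MS_residual with df = (1, n - 2).
    """
    if n < 3:
        raise ValueError("need at least 3 cases")
    if abs(r) >= 1:
        raise ValueError("F is unbounded for a perfect fit")
    r_squared = r * r
    return r_squared * (n - 2) / (1 - r_squared)


# Assumption: r = .297 and n = 15 cases
print(round(f_simple_regression(0.297, 15), 3))  # 1.258
```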

Standardized Residuals: between -3 and 3.

Cook’s Distance: measures the change in all of the regression coefficients when a given case is deleted; thus, it is a measure of the influence of that case on the regression equation. The maximum should not be greater than 1.
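For a simple regression, Cook's distance can be computed without refitting the model: D_i = e_i² / (p · MSE) × h_i / (1 − h_i)², where e_i is the residual, h_i = 1/n + (x_i − x̄)² / Σ(x − x̄)² is the leverage, and p = 2 is the number of estimated coefficients. This is a minimal Python sketch on made-up data in which the last case is deliberately extreme.

```python
def cooks_distance_simple(x, y):
    """Cook's distance for every case in a simple regression Y = a + bX.

    D_i = e_i^2 / (p * MSE) * h_i / (1 - h_i)^2, with p = 2 coefficients
    (a and b) and leverage h_i = 1/n + (x_i - mean_x)^2 / Sxx.
    """
    n = len(x)
    p = 2
    if n != len(y) or n <= p:
        raise ValueError("need more cases than coefficients")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    if mse == 0:
        return [0.0] * n  # perfect fit: no case changes the line
    distances = []
    for xi, e in zip(x, residuals):
        h = 1 / n + (xi - mean_x) ** 2 / sxx  # leverage of this case
        distances.append(e * e / (p * mse) * h / (1 - h) ** 2)
    return distances


# Made-up data: the last case sits far above the trend of the others
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 8, 10, 30]
d = cooks_distance_simple(x, y)
print(d.index(max(d)))  # 5
print(max(d) > 1)       # True
```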

Points should be close to or on
the line.

The residual scatterplot should be roughly rectangular. None of the points should fall outside -3 to 3.
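Standardized residuals can be sketched as each residual divided by the standard error of the estimate, √(SS_res / (n − 2)). This is assumed, not confirmed, to correspond to the ZRESID values plotted in SPSS; the data are made up.

```python
import math


def standardized_residuals(x, y):
    """Residuals of Y = a + bX divided by the standard error of the estimate,
    sqrt(SS_res / (n - 2)). Assumed here to match SPSS's ZRESID."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need paired data with at least 3 cases")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary")
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    a = mean_y - b * mean_x
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    std_error = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    if std_error == 0:
        return [0.0] * n  # perfect fit: every residual is zero
    return [e / std_error for e in residuals]


# Made-up data; flag any case whose standardized residual is outside -3 to 3
z = standardized_residuals([1, 2, 3, 4, 5, 6], [2, 4, 6, 8, 10, 30])
print([i for i, value in enumerate(z) if abs(value) > 3])  # []
```

In this small example no case falls outside -3 to 3, even though the last case pulls the line strongly; the residual check and an influence measure such as Cook's distance look at different things.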

Going back to the scatter plot.

Choose “Simple Scatter”.

Click “Define”.

Use the arrow button to move in the desired dependent variable.

Click “OK”.

Double-click the graph.

Click “Add Fit Line”.

Click “Close”.
1. Check for outliers (data points that are far away from the cluster).

2. Check for violations of assumptions:
- If the cluster is cigar-shaped, the assumption of homoscedasticity is tenable.
- If you can draw a straight line through the main cluster, the assumption of linearity is tenable.

3. Determine the strength and direction of the relationship: positive or negative.

https://bit.ly/3ypVuAh
There should be at least 20 records per predictor variable.

2 predictors → 40 records needed

Sample size = 15

[Figure: Outcome Variables and Predictor Variables]

https://bit.ly/3hIFweS
Click “Statistics”.

Use the arrow button to move in the desired dependent variable.

Use the arrow button to move in the desired independent variables.

Choose:
1. Model Fit
2. R Squared Change
3. Descriptives
4. Part and Partial Correlations
5. Collinearity Diagnostics

Click “Casewise Diagnostics”, then click “Continue”.
Click “Plots”.

Drag ZRESID (standardized residuals) to Y.

Drag ZPRED (standardized predicted values) to X.

Click “Normal Probability Plot”, then click “Continue”.

Click “Save”.
Under Residuals, click “Unstandardized”.

Under Predicted Values, click “Unstandardized”.

Click “Cook’s”, then click “Continue”.
Check multicollinearity: the correlation between the predictor variables should be less than .7. → Good!

Check the correlation between the outcome variable and the predictor variables: it should be greater than .3.
- .297 → below .3
- -.030 → below .3
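With exactly two predictors, the multicollinearity check reduces to their Pearson correlation r₁₂, and both predictors share the variance inflation factor VIF = 1 / (1 − r₁₂²). This is a minimal Python sketch; the .7 cut-off follows the slide, and the predictor values are hypothetical.

```python
import math


def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    suv = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    suu = sum((a - mean_u) ** 2 for a in u)
    svv = sum((b - mean_v) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def check_two_predictors(x1, x2, limit=0.7):
    """Return (r12, VIF, passes) for a model with exactly two predictors.

    With two predictors, regressing one on the other gives R^2 = r12^2,
    so both share VIF = 1 / (1 - r12^2). `limit` is the slide's .7 cut-off.
    """
    r12 = pearson_r(x1, x2)
    vif = math.inf if abs(r12) >= 1 else 1 / (1 - r12 ** 2)
    return r12, vif, abs(r12) < limit


# Hypothetical predictor values
r12, vif, passes = check_two_predictors([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(r12, 2), round(vif, 2), passes)  # 0.8 2.78 False
```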

Should be less than .05.

If the sample size is small, use “Adjusted R Square”.

Percentage of the variance in the dependent variable that is explained by the independent variables. Minimum is 30%.

ANOVA: tests whether the slope of the line is zero (we want this to be rejected).

.499 > .05 → Slope = 0 cannot be rejected

Coefficients: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \varepsilon$

$$y = 1.465x_1 + (.168)x_2 + 82.397$$

Standardizes the contribution of each predictor variable so that they can be compared.

Should be less than .05.
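The constant and coefficients of y = β₀ + β₁x₁ + β₂x₂ come from the least-squares normal equations. This is a minimal Python sketch for exactly two predictors using centered sums of squares and cross-products. It also returns standardized (Beta) coefficients, computed as b_j × SD(x_j) / SD(y). The data are made up so that y = 2 + 3x₁ − x₂ holds exactly, so the fit should recover those values.

```python
import math


def fit_two_predictors(x1, x2, y):
    """Least-squares fit of y = b0 + b1*x1 + b2*x2.

    Returns (b0, b1, b2, beta1, beta2), where beta_j are the standardized
    coefficients b_j * SD(x_j) / SD(y).
    """
    n = len(y)
    if not (len(x1) == len(x2) == n) or n < 4:
        raise ValueError("need equal-length lists with at least 4 cases")
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    syy = sum((c - my) ** 2 for c in y)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0 or syy == 0:
        raise ValueError("predictors are perfectly collinear or y does not vary")
    # Solve the normal equations [s11 s12; s12 s22] [b1; b2] = [s1y; s2y]
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    b0 = my - b1 * m1 - b2 * m2  # the constant
    # Standardized coefficients put both predictors on the same scale
    beta1 = b1 * math.sqrt(s11 / syy)
    beta2 = b2 * math.sqrt(s22 / syy)
    return b0, b1, b2, beta1, beta2


# Made-up data built from y = 2 + 3*x1 - x2
x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [3, 7, 7, 11, 11]
b0, b1, b2, beta1, beta2 = fit_two_predictors(x1, x2, y)
print(round(b0, 3), round(b1, 3), round(b2, 3))  # 2.0 3.0 -1.0
```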

Eigenvalues that are near 0
indicate multicollinearity.

Standardized Residuals: between -3 and 3.

Cook’s Distance: the maximum should not be greater than 1.

Points should be close to or on
the line.

The residual scatterplot should be roughly rectangular. None of the points should fall outside -3 to 3.

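If added, the unstandardized predicted value and the unstandardized residual sum to the observed value of the dependent variable:
84.34255 + (-3.34255) = 81.00

The expected mean of the errors of the regression model is zero. Here the sum is not equal to zero.

Check for outliers: a case that has a large residual.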
KEEP IN MIND!

Linearity
- Checked from a scatterplot of the data. If the points follow a straight-line trend (positive or negative) → good

Equal Error Variance
- Checked from the “Residual vs. Fit” plot. If there is no pattern → good

Independent Observations
- Checked from the “Residual vs. Fit” plot. If there is no correlation → good

Normality of Errors
- Checked from the “Normal Probability” plot.
- The closer the points are to the line, the more normal the errors → good

https://bit.ly/3lfcEJF
Questions? Clarifications?
