
Correlation and Regression

Libeeth B. Guevarra
libeeth.guevarra@cit.edu
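Reject Ho if the p-value is less than or equal to the significance level (alpha).
The p-value is the smallest significance level at which Ho would be rejected,
that is, the probability of a result at least as extreme as the one observed if Ho is true.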

Psych152 - Psychological Statistics 1


Consider the following cases:
1 As salaries increase, the amount spent on luxury goods also
increases.
2 As unemployment increases, the amount spent on luxury goods
decreases.
3 There is a relationship between gender and the amount spent on
clothes.



Correlation and Regression

Correlation is a statistical method used to determine whether a
relationship between variables exists.

Regression is a statistical method used to describe the nature of the
relationship between variables, that is, positive or negative, linear or
nonlinear.

A scatter plot is a graph of the ordered pairs (x, y) of numbers
consisting of the independent variable x and the dependent variable y.



Construct a scatter plot for the data on Internet use and Isolation
score of high school students. Internet use is measured by the
number of hours per week that the students spend on the computer.
Isolation is measured by having the students complete a
questionnaire. The questionnaire score ranges from 20 to 50, with 50
indicating the most isolated students.

Internet Use    Isolation Score
32              45
14              25
23              40
8               25
12              29

In Excel:
1. Input the data in Excel.
2. =CORREL
3. description: pp
4. Graph: Insert tab, then Scatter.
5.



The correlation coefficient measures the strength and direction of
a linear relationship between two variables. The range of the
correlation coefficient is from −1 to +1.

Pearson Correlation Coefficient (r)

$$
r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2]\,[n(\sum y^2) - (\sum y)^2]}}
$$

where n is the number of data pairs.
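
The formula can be checked with a short script. The following is a minimal Python sketch, not part of the original slides: the function name `pearson_r` is hypothetical, and the sample values are the Internet Use and Isolation Score pairs used in the examples of this lecture.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from the sums formula (hypothetical helper, for illustration)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least 2 pairs")
    n = len(xs)                                  # number of data pairs
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The two bracketed terms are multiplied under the square root.
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator               # fails if either variable is constant


internet_use = [32, 14, 23, 8, 12]
isolation_score = [45, 25, 40, 25, 29]
print(round(pearson_r(internet_use, isolation_score), 3))  # 0.956
```

The intermediate sums for these data are Σx = 89, Σy = 164, Σxy = 3258, Σx² = 1957, and Σy² = 5716.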



Example
Compute the correlation coefficient for the data:
Internet Use Isolation Score
32 45
14 25
23 40
8 25
12 29

From JASP, Pearson's product-moment correlation coefficient is
r = 0.956, which is a high positive correlation.
In Excel's regression output, this value is labeled Multiple R.



When the value of r is near 0, the linear relationship is weak or
nonexistent. The closer r is to +1 or −1, the more closely the two
variables are related.

r                          Descriptive Level
±1.00                      Perfect correlation
between ±0.75 to ±0.99     High correlation
between ±0.51 to ±0.74     Moderately High correlation
between ±0.31 to ±0.50     Moderately low correlation
between ±0.01 to ±0.30     Low correlation
0.00                       No correlation
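
The table can be applied mechanically. The sketch below is an illustration only: the function name `describe_correlation` is hypothetical, and it assumes r is rounded to two decimal places before lookup, because the table lists its bands to two decimals and leaves small gaps between them.

```python
def describe_correlation(r):
    """Map r to the descriptive level in the table (hypothetical helper)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = round(abs(r), 2)   # assumption: round to 2 decimals, ignore the sign
    if size == 1.0:
        return "Perfect correlation"
    if size >= 0.75:
        return "High correlation"
    if size >= 0.51:
        return "Moderately High correlation"
    if size >= 0.31:
        return "Moderately low correlation"
    if size >= 0.01:
        return "Low correlation"
    return "No correlation"


print(describe_correlation(0.956))   # High correlation
print(describe_correlation(-0.40))   # Moderately low correlation
```

The sign of r is dropped here because the descriptive level reflects strength only; the direction (positive or negative) is reported separately.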



Take a closer look
1 You will rarely find "perfect correlation".
Individual differences and other factors may interfere with
observed relationships.
2 Negative correlation is not bad; it simply describes the direction
of the relationship.
It does not mean there is "no relationship" because it is "less than
0".
3 Correlation by itself never implies cause and effect.
4 Don't put too much emphasis on significance in larger samples;
focus on the correlation coefficient.
Always report significance with the coefficient in smaller samples.



The Coefficient of Determination is a measure of the variation of
the dependent variable that is explained by the regression line and the
independent variable. The symbol for the coefficient of determination
is r².

If r = 0.90, then r² = 0.81, which is equivalent to 81%. This result
means that 81% of the variation in the dependent variable is
accounted for by the variation in the independent variable. The rest
of the variation, 0.19, or 19%, is unexplained.
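
As a further illustration, the same arithmetic can be applied to the Internet Use and Isolation Score data from this lecture, taking the rounded value r = 0.956 as given:

```latex
r = 0.956 \quad\Rightarrow\quad r^2 = (0.956)^2 \approx 0.914 = 91.4\%,
\qquad 1 - r^2 \approx 0.086 = 8.6\%
```

Under this reading, about 91% of the variation in Isolation Score is accounted for by Internet Use, and roughly 9% is unexplained.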



Test for the Significance of the Correlation Coefficient

t Test for the Correlation Coefficient

$$
t = r\sqrt{\frac{n-2}{1-r^2}}
$$

with degrees of freedom equal to n − 2.
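
A minimal Python sketch of this test statistic follows. The function name `t_statistic` is hypothetical, and the usage line assumes the rounded r = 0.956 with n = 5 pairs from the Internet Use data in this lecture.

```python
from math import sqrt


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)), with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("need more than 2 data pairs")
    if abs(r) >= 1:
        raise ValueError("t is undefined when r is exactly +1 or -1")
    return r * sqrt((n - 2) / (1 - r ** 2))


n = 5
t = t_statistic(0.956, n)
print(round(t, 2), n - 2)   # 5.64 3
```

The computed t is then compared with the critical value from a t table at n − 2 degrees of freedom.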



Example
Test the significance of the correlation between Internet Use and
Isolation Score at α = 0.01.

Ho: There is no correlation between Internet use and Isolation score.
Ha: There is a correlation between Internet use and Isolation score.

Internet Use    Isolation Score
32              45
14              25
23              40
8               25
12              29

2. Significance of r: = correlation
3. Computed t statistic: = r*SQRT((n-2)/(1-r^2))
4. Look up the critical value in the t table.
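
The computation can be sketched by hand as follows. This is a worked illustration under stated assumptions: it uses the rounded r = 0.956 for these data, treats the test as two-tailed because Ha is non-directional, and takes the critical value from a standard t table.

```latex
t = r\sqrt{\frac{n-2}{1-r^2}}
  = 0.956\sqrt{\frac{5-2}{1-(0.956)^2}}
  = 0.956\sqrt{\frac{3}{0.086064}}
  \approx 5.64,
\qquad \text{d.f.} = n - 2 = 3
```

For a two-tailed test at α = 0.01 with 3 degrees of freedom, the tabled critical values are ±5.841. Since 5.64 is less than 5.841, Ho is not rejected at α = 0.01 under these assumptions, even though r is high; with only 5 pairs the test has little power. At α = 0.05 the critical value is ±3.182, so the same t would be significant at that level.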



Correlation and Causation

Researchers must understand the nature of the linear
relationship between the independent variable x and the
dependent variable y.
When a hypothesis test indicates that a significant linear
relationship exists between the variables, researchers must
consider the possibilities:



Possibilities:

There is a direct cause-and-effect relationship between the
variables.
There is a reverse cause-and-effect relationship between the
variables.
The relationship between the variables may be caused by a third
variable.
There may be a complexity of interrelationships among many
variables.
The relationship may be coincidental.



Regression

If the value of the correlation coefficient is significant, the next step
is to determine the equation of the regression line, which is the
data's line of best fit.
The purpose of the regression line is to enable the researcher to see
the trend and make predictions on the basis of the data.

Formula for the regression line: y = a + bx

a = y-intercept (the value of the dependent variable y when the
independent variable x is 0).
b = slope (for every 1-unit change in the IV (x), there is a corresponding
change of b units in the DV (y)).



Regression line: y = a + bx

$$
a = \frac{(\sum y)(\sum x^2) - (\sum x)(\sum xy)}{n(\sum x^2) - (\sum x)^2}
$$

$$
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
$$

where a is the y intercept and b is the slope of the line.
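
These two formulas translate directly into code. The following Python sketch is an illustration, not part of the original slides: the function name `regression_line` is hypothetical, and the sample values are the Internet Use (x) and Isolation Score (y) pairs used throughout this lecture.

```python
def regression_line(xs, ys):
    """Return (a, b) for y = a + bx using the sums formulas (hypothetical helper)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be the same length, with at least 2 pairs")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2        # shared by a and b
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
    b = (n * sum_xy - sum_x * sum_y) / denominator        # slope
    return a, b


a, b = regression_line([32, 14, 23, 8, 12], [45, 25, 40, 25, 29])
print(round(a, 3), round(b, 3))   # 16.623 0.909
```

Both coefficients share the same denominator, so it is computed once and reused.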



Example
Find the equation of the regression line for the Isolation score given
the number of hours of Internet use.

x               y
Internet Use    Isolation Score
32              45
14              25
23              40
8               25
12              29

From JASP or Excel:
a = 16.623
b = 0.909

In Excel:
1. Data Analysis > Regression > y (dependent), x (independent)
2. Line Fit Plots
3. y = a + bx
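
With the coefficients above, the fitted line is y = 16.623 + 0.909x. The sketch below shows how a prediction could be made from it. The function name `predict_isolation` and the input of 20 hours are hypothetical, chosen only for illustration; predictions are most defensible inside the observed range of 8 to 32 hours.

```python
A = 16.623   # y-intercept reported in the example
B = 0.909    # slope reported in the example


def predict_isolation(hours):
    """Predicted Isolation Score for a given weekly Internet use, y = a + bx."""
    return A + B * hours


# Hypothetical input: a student who spends 20 hours per week on the computer.
print(round(predict_isolation(20), 3))   # 34.803
```

Read this way, each additional hour of Internet use per week is associated with a predicted increase of about 0.909 points in the Isolation Score.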



Example (regression draft in Excel file)
Given the number of hours of sleep and the happy mood scores of 6 randomly
chosen working mothers:

Mother    Sleep hours    Happy mood
A         5              2
B         7              4
C         8              7
D         6              2
E         6              3
F         10             6

1 Find the equation of the regression line.
2 Explain the coefficients in the equation or model.
3 What is the predicted happy mood score of those who sleep for 9 hours?

