
LESSON 6:

Correlation and
Linear Regression

Correlation Analysis
• The process of investigating a relationship between variables.
• Measures the strength of the association between two variables, say X and Y.
• Finds the relationship between two quantitative variables without inferring a causal relationship.
• Correlation is a statistical technique used to determine the degree to which two variables are related.
Pearson’s Product-Moment Correlation Coefficient
It measures the nature (direction) and strength of the relationship between two quantitative variables.
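
For reference, one common computational form of the coefficient is given here. It is a standard textbook formula rather than one reproduced from these slides, with n denoting the number of (x, y) pairs:

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^{2} - \left(\sum x\right)^{2}\right]
                \left[n\sum y^{2} - \left(\sum y\right)^{2}\right]}}
```

The sign of r gives the direction of the relationship and its magnitude, between 0 and 1, gives the strength.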

Interpretation of the Pearson Product-Moment Correlation Coefficient

Value (r)    Interpretation
0.00         No Correlation
             Perfect Correlation
             Very High Correlation
             High Correlation
             Moderately Low Correlation
             Very Low Correlation

Scatter diagram

• Rectangular coordinate system
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
• No frequency table

Scatter plots
The pattern of the data indicates the type of relationship between your two variables:
• Positive relationship
• Negative relationship
• No relationship

Positive Relationship
Two variables are positively correlated if the values of one variable increase as the values of the other increase.

Negative Relationship
Two variables are negatively correlated if the values of one variable increase while the values of the other decrease.

No Relationship
Two variables are not correlated, or have zero correlation, if one variable neither increases nor decreases while the other increases.
Positive
Correlation
Example problem:
Calculate and analyze the correlation coefficient between the number of study hours and the exam scores of different students.

N (Student)               1    2    3    4    5    6    7    8    9    10
X (No. of Study Hours)    4    6    8    4    2    1    5    7    4    6
Y (Exam Score)            85   80   92   70   65   60   89   82   81   95

Ho: There is no significant relationship between the number of study hours and the exam scores of different students.

Ha: There is a significant relationship between the number of study hours and the exam scores of different students.

           
Student   X (No. of study hours)   Y (Exam score)   x·y    x²    y²
1         4                        85               340    16    7,225
2         6                        80               480    36    6,400
3         8                        92               736    64    8,464
4         4                        70               280    16    4,900
5         2                        65               130    4     4,225
6         1                        60               60     1     3,600
7         5                        89               445    25    7,921
8         7                        82               574    49    6,724
9         4                        81               324    16    6,561
10        6                        95               570    36    9,025
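
The column sums of this table feed directly into the correlation coefficient. The following is a minimal sketch of that arithmetic, assuming the raw-sums form of Pearson's r. The helper name `pearson_r` is hypothetical and is not taken from the slides.

```python
from math import sqrt

# Data from the example: study hours (x) and exam scores (y) for 10 students.
x = [4, 6, 8, 4, 2, 1, 5, 7, 4, 6]
y = [85, 80, 92, 70, 65, 60, 89, 82, 81, 95]


def pearson_r(xs, ys):
    """Pearson's r from raw sums (hypothetical helper for illustration)."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))  # total of the x·y column
    sum_x2 = sum(a * a for a in xs)              # total of the x² column
    sum_y2 = sum(b * b for b in ys)              # total of the y² column
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


r = pearson_r(x, y)
print(round(r, 4))  # 0.8156
```

With ∑x = 47, ∑y = 799, ∑xy = 3,939, ∑x² = 263 and ∑y² = 65,045, the result agrees with the value of 0.8156 quoted in the conclusion.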

Conclusion:
The calculated correlation coefficient is positive, which implies a direct relationship between the number of study hours of the students and their exam scores. The magnitude of the correlation coefficient is 0.8156, or 0.82, which indicates a very high correlation.
Negative
Correlation
Example problem:
Calculate and analyze the correlation coefficient between the exam scores and the number of hours on social media of different students.

Student (N)                          1    2    3    4    5    6    7    8    9    10
Exam Score (X)                       18   16   15   11   12   10   8    4    2    0
Number of Hours on Social Media (Y)  1    3    5    6    9    11   10   12   11   15

Ho: There is no significant relationship between the exam scores and the number of hours on social media of different students.

Ha: There is a significant relationship between the exam scores and the number of hours on social media of different students.
First calculate x̅ and ȳ
∑x = 96   ∑y = 83

N (Student)   X (Exam Score)   Y (No. of Hours on Social Media)
1             18               1
2             16               3
3             15               5
4             11               6
5             12               9
6             10               11
7             8                10
8             4                12
9             2                11
10            0                15

 
Therefore
x̅ = ∑x / N = 96 / 10 = 9.6
ȳ = ∑y / N = 83 / 10 = 8.3

Conclusion:
The calculated correlation coefficient is negative. Therefore it implies that there is an inverse relationship between the exam scores and the number of hours on social media of different students.
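
The intermediate calculation is not reproduced in the text, so the following is a minimal sketch of how the coefficient could be obtained from the means x̅ = 9.6 and ȳ = 8.3. It assumes the deviation-from-the-mean form of Pearson's r, and the variable names are illustrative only.

```python
from math import sqrt

# Data from the example: exam scores (x) and hours on social media (y).
x = [18, 16, 15, 11, 12, 10, 8, 4, 2, 0]
y = [1, 3, 5, 6, 9, 11, 10, 12, 11, 15]

n = len(x)
x_bar = sum(x) / n  # 96 / 10 = 9.6
y_bar = sum(y) / n  # 83 / 10 = 8.3

# Deviations of each observation from its mean.
dx = [xi - x_bar for xi in x]
dy = [yi - y_bar for yi in y]

# Sums of cross-products and of squared deviations.
s_xy = sum(a * b for a, b in zip(dx, dy))
s_xx = sum(a * a for a in dx)
s_yy = sum(b * b for b in dy)

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))  # -0.922
```

On this data the sketch gives r of roughly −0.92, a negative value, which is consistent with the inverse relationship stated in the conclusion.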
No Correlation
Sample problem

x     y
3     4
6     1
9     3
12    5
15    2

∑x = 45   ∑y = 15, so x̅ = 45 / 5 = 9 and ȳ = 15 / 5 = 3

x     y     x − x̅    y − ȳ    (x − x̅)²    (y − ȳ)²
3     4     −6       1        36          1
6     1     −3       −2       9           4
9     3     0        0        0           0
12    5     3        2        9           4
15    2     6        −1       36          1
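
To see why this data set illustrates no correlation, the deviations can be multiplied pairwise and summed. The following is a minimal sketch assuming the deviation form of Pearson's r, with illustrative variable names.

```python
from math import sqrt

# Data from the sample problem.
x = [3, 6, 9, 12, 15]
y = [4, 1, 3, 5, 2]

n = len(x)
x_bar = sum(x) / n  # 45 / 5 = 9
y_bar = sum(y) / n  # 15 / 5 = 3

dx = [xi - x_bar for xi in x]  # -6, -3, 0, 3, 6
dy = [yi - y_bar for yi in y]  #  1, -2, 0, 2, -1

# Cross-products: -6 + 6 + 0 + 6 - 6 = 0
s_xy = sum(a * b for a, b in zip(dx, dy))
s_xx = sum(a * a for a in dx)  # 90
s_yy = sum(b * b for b in dy)  # 10

r = s_xy / sqrt(s_xx * s_yy)
print(r)  # 0.0
```

Because the cross-products cancel exactly, r = 0 and the two variables show no linear relationship.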

Regression Analysis
Regression: a technique concerned with predicting some variables by knowing others
The process of predicting variable Y using variable X
• Uses a variable (x) to predict some outcome variable (y)
• Tells you how values in y change as a function of changes in values of x

Correlation and Regression


Correlation describes the strength of a linear
relationship between two variables
Linear means “straight line”
Regression tells us how to draw the straight
line described by the correlation

Regression
• Calculates the “best-fit” line for a certain set of data
• The regression line makes the sum of the squares of the residuals smaller than for any other line
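
The minimizing property can be checked numerically. The sketch below fits a line by least squares to a small made-up data set (the points are illustrative and are not from the lesson) and compares its sum of squared residuals with that of a few nearby lines. The helper names `fit_line` and `sse` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the line y = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(a * b for a, b in zip(xs, ys))
    sum_x2 = sum(a * a for a in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def sse(a, b, xs, ys):
    """Sum of squared residuals (vertical distances) for y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(xs, ys))


# Illustrative data only (an assumption for this sketch).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
best = sse(a, b, xs, ys)

# Shift the intercept and slope slightly and recompute the residual sum.
others = [
    sse(a + da, b + db, xs, ys)
    for da in (-0.5, 0.0, 0.5)
    for db in (-0.2, 0.0, 0.2)
    if (da, db) != (0.0, 0.0)
]

print(all(best < other for other in others))  # True
```

Every perturbed line has a larger sum of squared residuals than the fitted one, which is the sense in which the regression line is the best fit.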
Regression minimizes residuals
[Figure: scatter plot of SBP (mmHg) against Wt (kg)]
By using the least squares method (a procedure that minimizes the sum of the squared vertical deviations of the plotted points from a straight line), we are able to construct a best-fitting straight line through the scatter diagram points and then formulate a regression equation of the form y = a + bx.
Regression Equation
• The regression equation describes the regression line mathematically
• Intercept
• Slope
[Figure: scatter plot of SBP (mmHg) against Wt (kg)]
Linear Equations

Y = bX + a
b = Slope = (Change in Y) / (Change in X)
a = Y-intercept

y = a + bx

a = [(∑y)(∑x²) − (∑x)(∑xy)] / [n(∑x²) − (∑x)²]

b = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]
Example 1. (linear regression)
The data in the table represent the membership at a university mathematics club during the past 5 years.

Number of Years (x)   Membership (y)
1                     25
2                     30
3                     32
4                     45
5                     50
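Fit a line of the form y = a + bx to predict the membership 5 years from now.

x    y     x²    xy
1    25    1     25
2    30    4     60
3    32    9     96
4    45    16    180
5    50    25    250

∑x = 15, ∑y = 182, ∑x² = 55, ∑xy = 611

a = [(∑y)(∑x²) − (∑x)(∑xy)] / [n(∑x²) − (∑x)²]
= [(182)(55) − (15)(611)] / [5(55) − (15)²] = 845 / 50 = 16.9
b = [n(∑xy) − (∑x)(∑y)] / [n(∑x²) − (∑x)²]
= [5(611) − (15)(182)] / [5(55) − (15)²] = 325 / 50 = 6.5
The equation is y = a + bx
y = 16.9 + 6.5x
Five years from now corresponds to x = 5 + 5 = 10, so
y = 16.9 + 6.5(10)
= 81.9 or 82
Therefore, five years from now, the club would have about 82 members.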
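
The arithmetic of this example can be reproduced with a few lines of code. This is a minimal sketch assuming the standard least-squares formulas for the intercept a and slope b, with illustrative variable names.

```python
# Data from Example 1: year number (x) and club membership (y).
years = [1, 2, 3, 4, 5]
members = [25, 30, 32, 45, 50]

n = len(years)
sum_x = sum(years)                                     # 15
sum_y = sum(members)                                   # 182
sum_x2 = sum(x * x for x in years)                     # 55
sum_xy = sum(x * y for x, y in zip(years, members))    # 611

denominator = n * sum_x2 - sum_x ** 2                  # 50
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator    # intercept, 16.9
b = (n * sum_xy - sum_x * sum_y) / denominator         # slope, 6.5

# Five years after year 5 corresponds to x = 10.
prediction = a + b * 10
print(round(a, 1), round(b, 1), round(prediction))  # 16.9 6.5 82
```

The intercept and slope match the hand calculation, and the predicted membership of 81.9 rounds to 82.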

THANKS!

Any questions?
