
CO6: Correlation and Regression Analyses
By: Rizalina Ponteras
Revised by: Redwaynne Jester T. Ipapo
Objectives
At the end of the lesson, the students should be able to:
• construct a scatter plot.
• describe shape (form), trend (direction), and variation (strength) based on a
scatter plot.
• calculate the Pearson’s sample correlation coefficient.
• solve problems involving correlation analysis.
• calculate the slope and y-intercept of the regression line.
• interpret the calculated slope and y-intercept of the regression line.
• predict the value of the dependent variable given the value of the independent
variable.
• solve problems involving regression analysis using MS Excel.
shs.mapua.edu.ph
Relationship Between Variables
A common aim of research is to relate a variable of interest
to one or more other variables.

Some pairs of variables tend to be related.


Examples:
Academic Achievement and Annual Family Income
Number of Absences and Grades

The question is: how can we determine whether such a relationship exists
between two variables?

Also, is there a way to measure the strength of that relationship?

Correlation
1. Correlation – a statistical method used to determine whether a
relationship between variables exists.

2. Questions that Correlation and Regression Answer:


a. Are two or more variables related?
b. If so, what is the strength of the relationship?
c. What type of relationship exists?
d. What kind of predictions can be made from the relationship?

Correlation
3. Simple vs. Multiple Relationship
a. Simple Relationship – involves only two variables.
i. Independent Variable – also known as explanatory variable or
a predictor variable.
ii. Dependent Variable – also known as response variable.
b. Multiple Relationship – also known as multiple regression, which
involves two or more independent variables used to predict one
dependent variable.

Correlation
4. Positive vs. Negative Relationship
a. Positive Relationship – exists when both variables increase or
decrease at the same time.
Ex. Problem-solving Skills and Grades in Mathematics
b. Negative Relationship – exists when one variable decreases as
the other one increases, and vice versa.
Ex. Number of Absences and Grades

The Scatter Plot Diagram

Retrieved from https://foxhugh.com/charts/describe-a-scatter-plot/, Sept 7, 2021
Scatter Plot
Scatter Plot – a graph of the ordered pairs (X, Y) of numbers consisting of
the independent variable X and the dependent variable Y.

Retrieved from https://www.cuemath.com/data/how-to-calculate-correlation-coefficient/, Sept 7, 2021
Construct a Scatterplot Diagram
for the given data set.
Number of Cars in Car Rental Companies and Their Revenue
Construct a Scatterplot Diagram
for the given data set.
Number of Absences and Final Grades in Statistics
Construct a Scatterplot Diagram
for the given data set.
Number of hours spent on exercise per week and amount of milk (in
ounces) consumed per week.
Correlation
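Correlation Coefficient / Pearson Product-Moment Correlation
Coefficient (r) – measures the strength and direction of a linear
relationship between two variables.

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$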
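As a minimal sketch, the correlation coefficient formula can be computed in Python from two equal-length lists. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the original lesson.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r using the sums (computational) formula."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Illustrative data only: points on an increasing straight line give r = 1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

The sketch does not guard against a zero denominator, which occurs when all X values or all Y values are equal; in that case r is undefined.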

Correlation
r                  Interpretation
-1                 Perfect Negative Correlation
-0.51 to -0.99     Strong Negative Correlation
-0.50              Some Negative Correlation
-0.01 to -0.49     Weak Negative Correlation
0                  No Correlation
0.01 to 0.49       Weak Positive Correlation
0.50               Some Positive Correlation
0.51 to 0.99       Strong Positive Correlation
1                  Perfect Positive Correlation
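The interpretation table can be sketched as a small lookup function. The name `interpret_r` is hypothetical, and the handling of values that fall between the listed entries (for example 0.505) is an assumption: anything above 0.50 in magnitude is treated as strong and anything below as weak.

```python
def interpret_r(r):
    """Map a correlation coefficient to the wording used in the table."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size > 0.5:      # assumption: covers 0.51 to 0.99 and values in between
        strength = "Strong"
    elif size == 0.5:
        strength = "Some"
    else:                 # assumption: covers 0.01 to 0.49 and values in between
        strength = "Weak"
    return f"{strength} {direction} Correlation"


print(interpret_r(-0.75))
print(interpret_r(0.3))
```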

Example
The table below presents the number of absences and the final
grades in Statistics of seven randomly selected students from a
Biology class. Find the correlation coefficient.

Student                  1     2     3     4     5     6     7
No. of absences (X)      6     2    15     9    12     5     8
Final Grade (Y)         82    86    43    74    58    90    78

Student                  1     2     3     4     5     6     7   Total
No. of absences (X)      6     2    15     9    12     5     8      57
Final Grade (Y)         82    86    43    74    58    90    78     511
XY                     492   172   645   666   696   450   624    3745
X²                      36     4   225    81   144    25    64     579
Y²                    6724  7396  1849  5476  3364  8100  6084   38993
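The extra rows and the Total column of the table above can be reproduced with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
absences = [6, 2, 15, 9, 12, 5, 8]        # X, from the table
grades = [82, 86, 43, 74, 58, 90, 78]     # Y, from the table

# One derived row per formula term.
xy = [x * y for x, y in zip(absences, grades)]
x_sq = [x ** 2 for x in absences]
y_sq = [y ** 2 for y in grades]

# The Total column, one entry per row of the table.
totals = {
    "sum_x": sum(absences),
    "sum_y": sum(grades),
    "sum_xy": sum(xy),
    "sum_x2": sum(x_sq),
    "sum_y2": sum(y_sq),
}
print(totals)
```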

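Example

Solve for the correlation coefficient.

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

$$r = \frac{7(3745) - (57)(511)}{\sqrt{\left[7(579) - (57)^2\right]\left[7(38993) - (511)^2\right]}}$$

r = -0.9442 (Strong Negative Correlation)

The number of absences and the final grades have a strong negative correlation.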
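For readers who want to check the arithmetic by hand, the intermediate values between the substitution and the final answer work out as follows, using only the totals from the table (n = 7):

$$n\sum XY - \left(\sum X\right)\left(\sum Y\right) = 26215 - 29127 = -2912$$

$$n\sum X^2 - \left(\sum X\right)^2 = 4053 - 3249 = 804$$

$$n\sum Y^2 - \left(\sum Y\right)^2 = 272951 - 261121 = 11830$$

$$r = \frac{-2912}{\sqrt{(804)(11830)}} = \frac{-2912}{\sqrt{9511320}} \approx -0.9442$$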

Example
The table below presents the height (in inches) and weight (in
pounds) of 10 randomly selected volleyball players. Find
the correlation coefficient.

Volleyball Player         1      2      3      4      5      6      7      8      9     10
Height (X)               73     71     75     72     72     75     67     69     71     69
Weight (Y)              185    175    200    210    190    195    150    170    180    175

Volleyball Player         1      2      3      4      5      6      7      8      9     10    Total
Height (X)               73     71     75     72     72     75     67     69     71     69      714
Weight (Y)              185    175    200    210    190    195    150    170    180    175     1830
XY                    13505  12425  15000  15120  13680  14625  10050  11730  12780  12075   130990
X²                     5329   5041   5625   5184   5184   5625   4489   4761   5041   4761    51040
Y²                    34225  30625  40000  44100  36100  38025  22500  28900  32400  30625   337500

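Example

Solve for the correlation coefficient.

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

$$r = \frac{10(130990) - (714)(1830)}{\sqrt{\left[10(51040) - (714)^2\right]\left[10(337500) - (1830)^2\right]}}$$

r = 0.8261 (Strong Positive Correlation)

Height and weight have a strong positive correlation.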
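The intermediate arithmetic for this example, using the totals from the table (n = 10), can be written out as:

$$n\sum XY - \left(\sum X\right)\left(\sum Y\right) = 1309900 - 1306620 = 3280$$

$$n\sum X^2 - \left(\sum X\right)^2 = 510400 - 509796 = 604$$

$$n\sum Y^2 - \left(\sum Y\right)^2 = 3375000 - 3348900 = 26100$$

$$r = \frac{3280}{\sqrt{(604)(26100)}} = \frac{3280}{\sqrt{15764400}} \approx 0.8261$$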

Regression
1. Regression – a statistical method used to describe the nature of the
relationship between variables.
2. Regression Line – the data’s line of best fit.

Retrieved from https://www.erp-information.com/wp-content/uploads/2021/03/Regression-analysis-1.png, Sept 7, 2021

Why Do We Care about Regression?

• We may want to make a prediction.
• More often, we want to understand the relationship.
How fast does CHD mortality rise with a one-unit increase
in smoking?
By how much will the final grade change after a given
number of absences?

Note: we speak about predicting, but often we do not actually predict.


Correlation vs. Regression

Retrieved from https://keydifferences.com/difference-between-correlation-and-regression.html, Sept 7, 2021

Regression
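3. Regression Line Equation

$$Y' = a + bX$$

where:

$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$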
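The formulas for a and b can be sketched in Python as follows. The function name `fit_line` and the sample points are assumptions made for illustration; the sample points were chosen to lie exactly on the line Y = 1 + 2X so the result is easy to check.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line Y' = a + bX."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2   # shared by a and b
    if denominator == 0:
        raise ValueError("all X values are equal; the line is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Illustrative data only: these points lie exactly on Y = 1 + 2X.
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)
```

Both coefficients share the same denominator, so it is computed once and reused.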

Regression
4. Coefficient of Determination (r²) – a measure of the variation of the
dependent variable that is explained by the regression line and the
independent variable.

$$r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}}$$

where:

$$\text{Explained Variation} = \sum \left(Y' - \bar{Y}\right)^2$$

$$\text{Total Variation} = \sum \left(Y - \bar{Y}\right)^2$$

$$\text{Unexplained Variation} = \sum \left(Y - Y'\right)^2$$

Also, take note that:

Total Variation = Explained Variation + Unexplained Variation

$$\sum \left(Y - \bar{Y}\right)^2 = \sum \left(Y' - \bar{Y}\right)^2 + \sum \left(Y - Y'\right)^2$$
Regression
5. Coefficient of Non-Determination

$$1 - r^2$$

6. Standard Error of the Estimate (s_est) – the standard deviation of the
observed values of Y about the predicted values Y'.

$$s_{est} = \sqrt{\frac{\sum \left(Y - Y'\right)^2}{n - 2}}$$
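A minimal sketch of the standard error of the estimate, assuming the intercept a and slope b of the regression line have already been found. The function name and the four sample points are illustrative; for these points the least-squares line is Y' = 0.5 + 0.8X.

```python
from math import sqrt


def standard_error_of_estimate(xs, ys, a, b):
    """s_est for observed (xs, ys) about the line Y' = a + bX."""
    n = len(xs)
    if n <= 2:
        raise ValueError("at least 3 data points are needed (n - 2 > 0)")
    # Unexplained variation: squared gaps between observed and predicted Y.
    residual_ss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(residual_ss / (n - 2))


# Illustrative data only; a = 0.5 and b = 0.8 are the fitted values for it.
s_est = standard_error_of_estimate([1, 2, 3, 4], [1, 3, 2, 4], 0.5, 0.8)
print(s_est)
```

The divisor is n - 2 rather than n because two quantities, a and b, were estimated from the same data.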

Example
1. The table below presents the number of absences and the final
grades of seven randomly selected students from a Biology
class.

Student                  1     2     3     4     5     6     7
No. of absences (X)      6     2    15     9    12     5     8
Final Grade (Y)         82    86    43    74    58    90    78

a. Find the equation of the regression line and predict the final grade
of a student who accumulates a total of 11 absences.
Example

Student                  1     2     3     4     5     6     7   Total
No. of absences (X)      6     2    15     9    12     5     8      57
Final Grade (Y)         82    86    43    74    58    90    78     511
XY                     492   172   645   666   696   450   624    3745
X²                      36     4   225    81   144    25    64     579
Y²                    6724  7396  1849  5476  3364  8100  6084   38993

Example
Solve for a.

$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2}$$

a = 102.492537

Example
Solve for b.

$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$b = \frac{7(3745) - (57)(511)}{7(579) - (57)^2}$$

b = -3.621891

Example
Regression Line Equation:

$$Y' = a + bX$$

$$Y' = 102.492537 + (-3.621891)X$$

$$Y' = 102.492537 - 3.621891X$$

If X = 11:

$$Y' = 102.492537 - 3.621891(11)$$

$$Y' = 62.65$$

The slope of this regression line means that for each additional absence, the
predicted final grade decreases by about 3.62 points.
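The prediction step amounts to substituting X into the fitted line. A short sketch, using the intercept and slope computed in this example (the function name `predict_grade` is illustrative):

```python
A = 102.492537   # y-intercept from the worked example
B = -3.621891    # slope from the worked example


def predict_grade(absences):
    """Predicted final grade Y' = A + B * X for a given number of absences."""
    return A + B * absences


print(round(predict_grade(11), 2))
```

Rounded to two decimal places, the prediction for 11 absences is 62.65.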

Example
Regression Line Equation:

$$Y' = 102.492537 - 3.621891X$$

The y-intercept of the regression line is the value of Y' when X = 0.

It means that when a student incurs 0 absences, his/her predicted
grade is 102.492537, or about 102.49.

Example
1. (continued) Use the same data on the number of absences (X) and the final
grades (Y) of the seven randomly selected students from a Biology class.

b. Calculate the coefficient of determination.

SOLUTION: Simply square the correlation coefficient r.

$$r^2 = (-0.9442)^2$$

$$r^2 = 0.8915$$

This means that 89.15% of the variation in the dependent variable (final grade) is
explained by the regression line and the independent variable (number of absences).
The remaining 10.85% of the variation is unexplained.

Example
2. The table below presents the number of hours spent studying for
a Math test and the scores on the Math test of 6 randomly selected Grade
11 students.

Student                         1      2      3      4      5      6
Hours spent studying (X)      0.5    0.5    1.0    2.0    2.5    2.5
Math Test Score (Y)            50     56     55     65     68     70

a. Find the equation of the regression line and predict the score of a
student who spent 1.5 hours studying for the test.
b. Find the correlation coefficient and the coefficient of determination.

Example

Student                         1      2      3      4      5      6   Total
Hours spent studying (X)      0.5    0.5    1.0    2.0    2.5    2.5       9
Math Test Score (Y)            50     56     55     65     68     70     364
XY                             25     28     55    130    170    175     583
X²                           0.25   0.25      1      4   6.25   6.25      18
Y²                           2500   3136   3025   4225   4624   4900   22410

Example

Solve for a.

$$a = \frac{\left(\sum Y\right)\left(\sum X^2\right) - \left(\sum X\right)\left(\sum XY\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$a = \frac{(364)(18) - (9)(583)}{6(18) - (9)^2}$$

a = 48.333333

Example

Solve for b.

$$b = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{n\sum X^2 - \left(\sum X\right)^2}$$

$$b = \frac{6(583) - (9)(364)}{6(18) - (9)^2}$$

b = 8.222222

Example 2a
Regression Line Equation:

$$Y' = a + bX$$

$$Y' = 48.333333 + (8.222222)X$$

$$Y' = 48.333333 + 8.222222X$$

If X = 1.5:

$$Y' = 48.333333 + 8.222222(1.5)$$

$$Y' = 60.67$$

The slope of this regression line means that for each additional hour spent
studying, the predicted score on the Math test increases by about 8.22 points.

Regression Line Equation:

$$Y' = 48.333333 + 8.222222X$$

The y-intercept of the regression line is the value of Y' when X = 0.

It means that when a student spends 0 hours studying for the test,
his/her predicted score on the test is 48.333333, or about 48.33.

Example 2b

Solve for the correlation coefficient.

$$r = \frac{n\sum XY - \left(\sum X\right)\left(\sum Y\right)}{\sqrt{\left[n\sum X^2 - \left(\sum X\right)^2\right]\left[n\sum Y^2 - \left(\sum Y\right)^2\right]}}$$

$$r = \frac{6(583) - (9)(364)}{\sqrt{\left[6(18) - (9)^2\right]\left[6(22410) - (364)^2\right]}}$$

r = 0.9641 (Strong Positive Correlation)

The number of hours spent studying for the test and the scores
on the Math test have a strong positive correlation.

Example

SOLUTION: Simply square the correlation coefficient r.

$$r^2 = (0.9641)^2$$

$$r^2 = 0.9295$$

This means that 92.95% of the variation in the dependent variable (Math test score)
is explained by the regression line and the independent variable (hours spent
studying). The remaining 7.05% of the variation is unexplained.

End of Presentation
