Regression Analyses
By: Rizalina Ponteras
Revised by: Redwaynne Jester T. Ipapo
Objectives
At the end of the lesson, the students should be able to:
• construct a scatter plot.
• describe shape (form), trend (direction), and variation (strength) based on a
scatter plot.
• calculate Pearson’s sample correlation coefficient.
• solve problems involving correlation analysis.
• calculate the slope and y-intercept of the regression line.
• interpret the calculated slope and y-intercept of the regression line.
• predict the value of the dependent variable given the value of the independent
variable.
• solve problems involving regression analysis using MS Excel.
shs.mapua.edu.ph
Relationship Between Variables
A common aim of research is to relate a variable of interest to one or more
other variables.
The question is: how can we determine whether such a relationship exists
between two variables?
Correlation
1. Correlation – a statistical method used to determine whether a
relationship between variables exists.
3. Simple vs. Multiple Relationship
a. Simple Relationship – involves only two variables.
i. Independent Variable – also known as explanatory variable or
a predictor variable.
ii. Dependent Variable – also known as response variable.
b. Multiple Relationship – also known as multiple regression, which
involves two or more independent variables used to predict one
dependent variable.
4. Positive vs. Negative Relationship
a. Positive Relationship – exists when both variables increase or
decrease at the same time.
Ex. Problem-solving Skills and Grades in Mathematics
b. Negative Relationship – exists when one variable decreases as
the other one increases, and vice versa.
Ex. Number of Absences and Grades
The Scatter Plot Diagram
Scatter Plot
Scatter Plot – a graph of the ordered pairs (X, Y) of numbers consisting of
the independent variable X and the dependent variable Y.
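As a rough illustration of plotting ordered pairs, the sketch below draws a crude text scatter plot using only the Python standard library. The function name `text_scatter`, the grid size, and the choice of the absences and final grades data from this lesson's worked example are assumptions made here for illustration; they are not part of the original slides, and a spreadsheet chart would serve the same purpose.

```python
def text_scatter(xs, ys, width=16, height=10):
    """Return a crude text scatter plot as a list of rows (top row = largest Y).

    Assumes xs and ys have the same length and that neither is constant.
    """
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value to a column / row index inside the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"
    return ["".join(line) for line in grid]


absences = [6, 2, 15, 9, 12, 5, 8]       # independent variable X
grades = [82, 86, 43, 74, 58, 90, 78]    # dependent variable Y
rows = text_scatter(absences, grades)
print("\n".join(rows))
```

With this data the points run from the upper left to the lower right, which suggests a negative linear relationship.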
Construct a scatter plot diagram for the given data set.
Number of absences and final grades in Statistics
Construct a scatter plot diagram for the given data set.
Number of hours spent on exercise per week and amount of milk (in
ounces) consumed per week
Correlation
Correlation Coefficient / Pearson Product-Moment Correlation
Coefficient (r) – measures the strength and direction of a linear
relationship between two variables.
\[ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]
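The formula for r can be sketched as a small Python function. The name `pearson_r` and the two tiny data sets are hypothetical, chosen only so that the result is easy to check by hand; the function assumes neither variable is constant, since the denominator would then be zero.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r from paired data, using the sums in the formula for r."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data lying exactly on a rising line, then on a falling line.
print(pearson_r([1, 2, 3], [2, 4, 6]))   # 1.0
print(pearson_r([1, 2, 3], [6, 4, 2]))   # -1.0
```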
Correlation
r                  Interpretation
-1                 Perfect Negative Correlation
-0.51 to -0.99     Strong Negative Correlation
-0.50              Some Negative Correlation
-0.01 to -0.49     Weak Negative Correlation
0                  No Correlation
0.01 to 0.49       Weak Positive Correlation
0.50               Some Positive Correlation
0.51 to 0.99       Strong Positive Correlation
1                  Perfect Positive Correlation
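The interpretation table can be expressed as a lookup, sketched below. The function name `interpret_r` is hypothetical. The table lists r to two decimal places, so the handling of values that fall between its rows (for example 0.495 or 0.995) is an assumption made here: anything above 0.50 in absolute value counts as strong, and anything below it as weak.

```python
def interpret_r(r):
    """Map a correlation coefficient to the labels in the interpretation table."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    if r == 0:
        return "No Correlation"
    direction = "Positive" if r > 0 else "Negative"
    size = abs(r)
    if size == 1:
        strength = "Perfect"
    elif size > 0.50:      # assumption: covers the gaps between table rows
        strength = "Strong"
    elif size == 0.50:
        strength = "Some"
    else:
        strength = "Weak"
    return f"{strength} {direction} Correlation"


print(interpret_r(-0.9442))   # Strong Negative Correlation
print(interpret_r(0.50))      # Some Positive Correlation
```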
Example
The table below presents the number of absences and the final
grades in Statistics of seven randomly selected students from a
Biology class. Find the correlation coefficient.
Student                1     2     3     4     5     6     7
No. of absences (X)    6     2    15     9    12     5     8
Final Grade (Y)       82    86    43    74    58    90    78
Example
To apply the formula, add the rows XY, X², and Y² and a Total column:
Student                1     2     3     4     5     6     7   Total
No. of absences (X)    6     2    15     9    12     5     8      57
Final Grade (Y)       82    86    43    74    58    90    78     511
XY                   492   172   645   666   696   450   624    3745
X²                    36     4   225    81   144    25    64     579
Y²                  6724  7396  1849  5476  3364  8100  6084   38993
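The column totals used in the correlation formula can be reproduced with a short Python sketch. The variable names are hypothetical; the data are the absences and final grades from the example.

```python
absences = [6, 2, 15, 9, 12, 5, 8]       # X
grades = [82, 86, 43, 74, 58, 90, 78]    # Y

# Build the three derived rows, one entry per student.
xy = [x * y for x, y in zip(absences, grades)]
x_squared = [x * x for x in absences]
y_squared = [y * y for y in grades]

# Totals that feed the correlation formula.
totals = {
    "n": len(absences),
    "sum_x": sum(absences),
    "sum_y": sum(grades),
    "sum_xy": sum(xy),
    "sum_x2": sum(x_squared),
    "sum_y2": sum(y_squared),
}
for name, value in totals.items():
    print(f"{name} = {value}")
```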
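Example
\[ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]
\[ r = \frac{7(3745) - (57)(511)}{\sqrt{\left[7(579) - (57)^2\right]\left[7(38993) - (511)^2\right]}} = \frac{-2912}{\sqrt{(804)(11830)}} \approx -0.9442 \]
The number of absences and the final grades have a strong negative correlation.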
Example
The table below presents the height (in inches) and weight (in
pounds) of 10 randomly selected volleyball players. Find
the correlation coefficient.
Volleyball Players      1      2      3      4      5      6      7      8      9     10
Height (X)             73     71     75     72     72     75     67     69     71     69
Weight (Y)            185    175    200    210    190    195    150    170    180    175
Example
Completed table with the rows XY, X², and Y² and the totals:
Volleyball Players      1      2      3      4      5      6      7      8      9     10    Total
Height (X)             73     71     75     72     72     75     67     69     71     69      714
Weight (Y)            185    175    200    210    190    195    150    170    180    175     1830
XY                  13505  12425  15000  15120  13680  14625  10050  11730  12780  12075   130990
X²                   5329   5041   5625   5184   5184   5625   4489   4761   5041   4761    51040
Y²                  34225  30625  40000  44100  36100  38025  22500  28900  32400  30625   337500
Example
\[ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]
\[ r = \frac{10(130990) - (714)(1830)}{\sqrt{\left[10(51040) - (714)^2\right]\left[10(337500) - (1830)^2\right]}} \]
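The slides stop at the substitution for the volleyball data. As a hedged check, carrying the arithmetic through with the table totals (n = 10, ΣXY = 130990, ΣX = 714, ΣY = 1830, ΣX² = 51040, ΣY² = 337500) gives the value below. Rounding to four decimal places is a choice made here, not something stated in the slides.

```latex
\begin{aligned}
r &= \frac{1309900 - 1306620}{\sqrt{(510400 - 509796)(3375000 - 3348900)}} \\
  &= \frac{3280}{\sqrt{(604)(26100)}} \\
  &= \frac{3280}{\sqrt{15764400}} \\
  &\approx 0.8261
\end{aligned}
```

By the interpretation table, this value falls in the 0.51 to 0.99 range, so height and weight have a strong positive correlation.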
Regression
1. Regression – a statistical method used to describe the nature of the
relationship between variables.
2. Regression Line – the data’s line of best fit.
Why Do We Care about Regression?
Regression
3. Regression Line Equation
\[ Y' = a + bX \]
where:
\[ a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2} \]
\[ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \]
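The formulas for a and b translate directly into code. The sketch below is a minimal version, assuming the X values are not all equal (otherwise the shared denominator is zero). The name `fit_line` and the three-point data set are hypothetical, picked so that the answer is obvious by inspection.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line Y' = a + bX."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2   # shared by a and b
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


# Hypothetical points lying exactly on Y = 1 + 2X.
a, b = fit_line([0, 1, 2], [1, 3, 5])
print(a, b)   # 1.0 2.0
```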
Regression
4. Coefficient of Determination (r²) – the proportion of the variation of the
dependent variable that is explained by the regression line and the
independent variable.
\[ r^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} \]
where:
\[ \text{Explained Variation} = \sum (Y' - \bar{Y})^2 \]
\[ \text{Total Variation} = \sum (Y - \bar{Y})^2 \]
\[ \text{Unexplained Variation} = \sum (Y - Y')^2 \]
Total Variation = Explained Variation + Unexplained Variation
\[ \sum (Y - \bar{Y})^2 = \sum (Y' - \bar{Y})^2 + \sum (Y - Y')^2 \]
5. Coefficient of Non-Determination
\[ 1 - r^2 \]
Standard error of the estimate:
\[ s_{est} = \sqrt{\frac{\sum (Y - Y')^2}{n - 2}} \]
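The variation formulas can be checked numerically. The sketch below uses the absences and final grades data from this lesson's worked example and computes the explained, unexplained, and total variation, then r², 1 − r², and the standard error of the estimate. All variable names are hypothetical, and the script is only a sketch of one way to organize the arithmetic.

```python
import math

absences = [6, 2, 15, 9, 12, 5, 8]       # X
grades = [82, 86, 43, 74, 58, 90, 78]    # Y
n = len(absences)

# Least-squares intercept a and slope b from the regression formulas.
sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator

y_bar = sum_y / n
predicted = [a + b * x for x in absences]   # Y' for each X

explained = sum((yp - y_bar) ** 2 for yp in predicted)
unexplained = sum((y - yp) ** 2 for y, yp in zip(grades, predicted))
total = sum((y - y_bar) ** 2 for y in grades)

r_squared = explained / total
non_determination = 1 - r_squared
s_est = math.sqrt(unexplained / (n - 2))

print(round(total, 4), round(explained + unexplained, 4))
print(round(r_squared, 4), round(non_determination, 4), round(s_est, 4))
```

The first printed line shows the two sides of the identity Total = Explained + Unexplained agreeing up to floating-point rounding.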
Example
1. The table below presents the number of absences and the final
grades of seven randomly selected students from a Biology
class.
Student                1     2     3     4     5     6     7
No. of absences (X)    6     2    15     9    12     5     8
Final Grade (Y)       82    86    43    74    58    90    78
a. Find the equation of the regression line and predict the final grade if
a student accumulates a total of 11 absences.
Example
Student                1     2     3     4     5     6     7   Total
No. of absences (X)    6     2    15     9    12     5     8      57
Final Grade (Y)       82    86    43    74    58    90    78     511
XY                   492   172   645   666   696   450   624    3745
X²                    36     4   225    81   144    25    64     579
Example
Solve for a.
\[ a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2} \]
\[ a = \frac{(511)(579) - (57)(3745)}{7(579) - (57)^2} = \frac{82404}{804} \]
a = 102.492537
Example
Solve for b.
\[ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \]
\[ b = \frac{7(3745) - (57)(511)}{7(579) - (57)^2} = \frac{-2912}{804} \]
b = −3.621891
Example
Regression Line Equation:
\[ Y' = a + bX = 102.492537 + (-3.621891)X \]
\[ Y' = 102.492537 - 3.621891X \]
If X = 11:
\[ Y' = 102.492537 - 3.621891(11) \approx 62.65 \]
The slope indicates that for every additional absence, the predicted final grade
decreases by about 3.62 points.
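The prediction step can be sketched in Python with the absences and final grades data. The helper name `predict_grade` is hypothetical. The comment about a 100-point grading scale is an assumption, since the slides do not state the scale.

```python
absences = [6, 2, 15, 9, 12, 5, 8]       # X
grades = [82, 86, 43, 74, 58, 90, 78]    # Y
n = len(absences)

sum_x, sum_y = sum(absences), sum(grades)
sum_xy = sum(x * y for x, y in zip(absences, grades))
sum_x2 = sum(x * x for x in absences)
denominator = n * sum_x2 - sum_x ** 2
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator   # y-intercept
b = (n * sum_xy - sum_x * sum_y) / denominator        # slope


def predict_grade(num_absences):
    """Predicted final grade Y' for a given number of absences X."""
    return a + b * num_absences


print(round(a, 6), round(b, 6))       # 102.492537 -3.621891
print(round(predict_grade(11), 2))    # 62.65
# X = 0 lies outside the observed range (2 to 15 absences); the result is
# just the intercept, which would exceed 100 on an assumed 100-point scale.
print(round(predict_grade(0), 2))     # 102.49
```

Predictions are most trustworthy for X values inside the observed range, which is why X = 11 is a reasonable input while X = 0 should be read with caution.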
Example
For the absences and final grades data, the correlation coefficient is
r = −0.9442. The coefficient of determination is:
\[ r^2 = (-0.9442)^2 = 0.8915 \]
This means that 89.15% of the variation in the final grades is explained by the
regression line and the number of absences.
The closer r² is to 1, the more closely the data points cluster around the
regression line (line of best fit).
Example
2. The table below presents the number of hours spent studying for
a Math test and the scores on that test of 6 randomly selected Grade
11 students.
Student                               1      2      3      4      5      6
No. of hours spent studying (X)     0.5    0.5    1.0    2.0    2.5    2.5
Math Test Scores (Y)                 50     56     55     65     68     70
a. Find the equation of the regression line and predict the score if a
student spent 1.5 hours studying for the test.
b. Find the correlation coefficient and the coefficient of determination.
Example
Student                               1      2      3      4      5      6   Total
No. of hours spent studying (X)     0.5    0.5    1.0    2.0    2.5    2.5       9
Math Test Scores (Y)                 50     56     55     65     68     70     364
XY                                   25     28     55    130    170    175     583
X²                                 0.25   0.25   1.00   4.00   6.25   6.25      18
Y²                                 2500   3136   3025   4225   4624   4900   22410
Example
Solve for a.
\[ a = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{n\sum X^2 - (\sum X)^2} \]
\[ a = \frac{(364)(18) - (9)(583)}{6(18) - (9)^2} = \frac{1305}{27} \]
a = 48.333333
Example
Solve for b.
\[ b = \frac{n\sum XY - (\sum X)(\sum Y)}{n\sum X^2 - (\sum X)^2} \]
\[ b = \frac{6(583) - (9)(364)}{6(18) - (9)^2} = \frac{222}{27} \]
b = 8.222222
Example 2a
Regression Line Equation:
\[ Y' = a + bX = 48.333333 + (8.222222)X \]
\[ Y' = 48.333333 + 8.222222X \]
If X = 1.5:
\[ Y' = 48.333333 + 8.222222(1.5) \approx 60.67 \]
The slope indicates that for every additional hour spent studying, the predicted
score on the Math test increases by about 8.22 points.
Example 2b
Solve for the correlation coefficient, using n = 6, ΣX = 9, ΣY = 364,
ΣXY = 583, ΣX² = 18, and ΣY² = 22410.
\[ r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} \]
\[ r = \frac{6(583) - (9)(364)}{\sqrt{\left[6(18) - (9)^2\right]\left[6(22410) - (364)^2\right]}} = \frac{222}{\sqrt{(27)(1964)}} \approx 0.9641 \]
The number of hours spent studying for a Math test and the scores on the Math
test have a strong positive correlation.
Example
\[ r^2 = (0.9641)^2 = 0.9295 \]
This means that 92.95% of the variation in the Math test scores is explained by
the regression line and the number of hours spent studying.
The closer r² is to 1, the more closely the data points cluster around the
regression line (line of best fit).
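The whole of Example 2 can be cross-checked in a few lines of Python. The variable names are hypothetical. One detail worth noticing: squaring the rounded r = 0.9641 gives 0.9295, the value reported in the example, while squaring r before rounding gives 0.9294, so the last digit depends on when the rounding is done.

```python
import math

hours = [0.5, 0.5, 1.0, 2.0, 2.5, 2.5]   # X
scores = [50, 56, 55, 65, 68, 70]        # Y
n = len(hours)

sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

x_term = n * sum_x2 - sum_x ** 2          # n*sum(X^2) - (sum X)^2
y_term = n * sum_y2 - sum_y ** 2          # n*sum(Y^2) - (sum Y)^2
cross_term = n * sum_xy - sum_x * sum_y   # n*sum(XY) - (sum X)(sum Y)

a = (sum_y * sum_x2 - sum_x * sum_xy) / x_term
b = cross_term / x_term
r = cross_term / math.sqrt(x_term * y_term)

print(round(a, 6), round(b, 6))      # 48.333333 8.222222
print(round(a + b * 1.5, 2))         # 60.67
print(round(r, 4))                   # 0.9641
print(round(round(r, 4) ** 2, 4))    # 0.9295 (r rounded first)
print(round(r ** 2, 4))              # 0.9294 (r squared before rounding)
```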
End of Presentation