
RIZAL TECHNOLOGICAL UNIVERSITY

Cities of Mandaluyong and Pasig

MODULE NO. 10
Title : Simple Regression Analysis

TOPICS OUTLINE :
1. Concept of simple regression analysis
2. Computation of slope and y-intercept of the regression line or
regression equation.
3. Interpreting the slope and y-intercept of the regression line.
4. Developing regression line and regression equation.
5. Predicting the value of the dependent variable using the
regression equation when the independent variable is known.

LEARNING OUTCOMES:
At the end of the lesson, the students will be able to:
1. Discuss the concept of simple regression analysis.
2. Compute the slope and y-intercept of the regression line or
regression equation.
3. Interpret the slope and y-intercept of the regression line.
4. Develop the regression line or regression equation.
5. Predict the value of the dependent variable using the regression
equation when the independent variable is known.
6. Investigate the linear relationship between two variables and
obtain a regression equation to describe the relationship.

Overview :
When Francis Galton was working with large observational studies
on humans in the mid-to-late 1800s, he noticed a “regression toward
the mean” effect when comparing the heights of fathers and their first
sons. That is how regression got its name.

In the previous Module, on Correlation, we studied how to describe
the strength of the linear relationship between two quantitative
variables in a given distribution. In this Module, we will describe
the relationship by means of a mathematical model or equation that
has predictive value. But before that, we will introduce the basic
concepts and procedures of simple linear regression analysis.

COURSE TITLE (Statistical Analysis AE9) YOLANDA P. EVANGELISTA

TOPIC PRESENTATION

What is Simple Linear Regression?


Simple Linear Regression - studies and summarizes the
relationship between two quantitative variables, particularly continuous
variables. The word “simple” is used because there is only one
independent variable, or predictor. The dependent variable is also
known as the response or outcome.

Using regression analysis to find the “line of best fit,” consider the
regression line equation:

\[ y = a + bx \]

Where : a = the y-intercept, or the ordinate of the point where
            the line crosses the y-axis
        b = the slope of the line

We can find a using the formula:

\[ a = \bar{y} - b\bar{x} \]

Where : $\bar{x} = \frac{\sum x}{n}$ (mean of distribution x)
        $\bar{y} = \frac{\sum y}{n}$ (mean of distribution y)
        n = number of respondents

We can find b using the formula:

\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \]

Where : $\sum x$ = sum of the observed values in X (the independent
            variable, also known as the predictor)
        $\sum y$ = sum of the observed values in Y (the dependent
            variable, also known as the response variable)
        $\sum xy$ = sum of the products of X and Y
        $\sum x^2$ = sum of the squares of X
        $\left(\sum x\right)^2$ = square of the sum of X
        n = number of respondents
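The two formulas above can be turned into a short routine. The following is a minimal sketch in Python. The function name `fit_line` and the toy data are assumptions made for illustration and are not part of the module.

```python
def fit_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x,
    using the sum formulas for the slope and y-intercept."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("xs and ys must have the same length")

    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope
    a = sum_y / n - b * (sum_x / n)                  # y-intercept: y-bar - b * x-bar
    return a, b


# Hypothetical toy data that lie exactly on the line y = 1 + 2x
a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Because the toy points fall exactly on a line, the routine recovers that line's intercept and slope. With real data the points scatter around the fitted line.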


Example :
Fifteen randomly selected students were asked about the number of
hours they spent studying their lessons (X) before they took the
Advanced Statistics test. Their scores (Y) are listed in the table below:

STUDENT    X (Number of Hours Spent Studying)    Y (Scores)
A          0                                     50
B          1                                     55
C          2                                     60
D          3                                     61
E          4                                     63
F          5                                     65
G          6                                     68
H          7                                     70
I          8                                     72
J          9                                     74
K          10                                    76
L          11                                    78
M          12                                    80
N          13                                    83
O          14                                    85


Question No. 1 - Using the data given above, determine the equation of
the regression line.

X (No. of Hours Spent)    Y (Scores)    XY       X^2     Y^2
0                         50            0        0       2,500
1                         55            55       1       3,025
2                         60            120      4       3,600
3                         61            183      9       3,721
4                         63            252      16      3,969
5                         65            325      25      4,225
6                         68            408      36      4,624
7                         70            490      49      4,900
8                         72            576      64      5,184
9                         74            666      81      5,476
10                        76            760      100     5,776
11                        78            858      121     6,084
12                        80            960      144     6,400
13                        83            1,079    169     6,889
14                        85            1,190    196     7,225

∑X = 105    ∑Y = 1,040    ∑XY = 7,922    ∑X^2 = 1,015    ∑Y^2 = 73,598
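The column totals can be recomputed directly from the raw data as a check on the hand arithmetic. This is a small sketch in Python. The variable names are assumptions chosen for illustration.

```python
# Data from the example: hours studied (X) and test scores (Y)
hours = list(range(15))  # 0, 1, ..., 14
scores = [50, 55, 60, 61, 63, 65, 68, 70, 72, 74, 76, 78, 80, 83, 85]

sum_x = sum(hours)
sum_y = sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 105 1040 7922 1015 73598
```

Only ∑X, ∑Y, ∑XY, and ∑X^2 enter the formulas for a and b. The ∑Y^2 column is not needed for the regression line itself.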

Using the formula for b, find the value of b:

\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} \]

Substitution:

\[ b = \frac{15(7{,}922) - (105)(1{,}040)}{15(1{,}015) - (105)^2} \]

\[ b = \frac{118{,}830 - 109{,}200}{15{,}225 - 11{,}025} \]

\[ b = \frac{9{,}630}{4{,}200} \]

\[ b \approx 2.29 \]

Solving for a:

Formula:

\[ a = \bar{y} - b\bar{x} \]

\[ a = \frac{1{,}040}{15} - (2.29)\left(\frac{105}{15}\right) \]

\[ a = 69.33 - (2.29)(7) \]

\[ a = 69.33 - 16.03 \]

\[ a = 53.30 \]

To get the equation of the regression line, we have:

\[ y = a + bx \]

Substitution:

\[ y = 53.30 + 2.29x \]

The equation of the regression line is $y = 53.30 + 2.29x$.
With the help of this equation, we can predict the score a student
will get after studying his lessons for a given number of hours.


Question No. 2 : Predict the score of the student who studied for
a) 15 hours
b) 17 hours
Solution for letter a:

\[ y = 53.30 + 2.29(15) \]

\[ y = 53.30 + 34.35 \]

\[ y = 87.65 \text{ or } 88 \]

a) If the student studied for 15 hours before taking the
Advanced Statistics test, his predicted score is 88.

Solution for letter b:

\[ y = 53.30 + 2.29(17) \]

\[ y = 53.30 + 38.93 \]

\[ y = 92.23 \text{ or } 92 \]

b) If the student studied for 17 hours before taking the
Advanced Statistics test, his predicted score is 92.
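The two predictions can be reproduced with a few lines of Python. This is a sketch that uses the rounded coefficients from the worked solution. The names `A`, `B`, and `predict_score` are assumptions made for illustration.

```python
A = 53.30  # y-intercept from the worked solution (rounded)
B = 2.29   # slope from the worked solution (rounded)


def predict_score(hours):
    """Predicted test score for a given number of study hours."""
    return A + B * hours


for hours in (15, 17):
    predicted = predict_score(hours)
    print(f"{hours} hours -> {predicted:.2f} (about {round(predicted)})")
# 15 hours -> 87.65 (about 88)
# 17 hours -> 92.23 (about 92)
```

Both 15 and 17 hours lie beyond the largest observed value of X (14 hours), so these predictions assume that the same linear trend continues outside the observed data.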

Question No. 3: Interpret the slope (b).


Answer : The computed value of the slope (b) from the example
above is 2.29. This tells us that the predicted score of a student who
spent a certain number of hours studying is 2.29 points higher than the
predicted score of a student who studied for one hour less. In other
words, each additional hour spent studying the lessons adds about
2.29 points to the predicted score. In general, the mean response is
predicted to increase (when b is positive) or decrease (when b is
negative) by |b| units for every one-unit increase in X.
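The reason the slope equals the change in the predicted value per one-unit increase in X can be shown with a short derivation. This is a sketch. The notation $\hat{y}(x)$ for the predicted value at x is an assumption introduced here, and the numerical check uses the rounded coefficients a = 53.30 and b = 2.29.

```latex
\begin{aligned}
\hat{y}(x+1) - \hat{y}(x) &= \bigl[a + b(x+1)\bigr] - \bigl[a + bx\bigr] = b \\
\hat{y}(4) - \hat{y}(3) &= \bigl[53.30 + 2.29(4)\bigr] - \bigl[53.30 + 2.29(3)\bigr] \\
&= 62.46 - 60.17 = 2.29
\end{aligned}
```

The intercept cancels in the subtraction, so the difference does not depend on which value of x is chosen.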

Question No. 4: Interpret the y-intercept (a).


Answer: The computed value of the y-intercept (a) from the example
above is 53.30. This tells us that when X = 0, the predicted score is
53.30. This implies that a student who did not study his lessons at
all before taking the test is predicted to get a score of about 53.30,
or 53. In general, the y-intercept (a) is the predicted value of Y at
X = 0, and it is meaningful only when X = 0 falls within the scope of
the regression model. Otherwise, the y-intercept (a) has no meaningful
interpretation.

SOLVING REGRESSION USING MS EXCEL :

1. TYPE IN THE DATA: VALUES OF X IN THE FIRST COLUMN AND VALUES OF Y
IN THE SECOND COLUMN.

2. CLICK DATA, THEN DATA ANALYSIS.

3. SELECT REGRESSION, THEN CLICK OK.

4. HIGHLIGHT/ASSIGN THE VALUES OF X AND Y, SELECT THE OUTPUT RANGE,
THEN CLICK OK.

5. THE SUMMARY OUTPUT APPEARS:

a = 53.28 (THE MANUAL SOLUTION GIVES 53.30 BECAUSE b WAS ROUNDED TO
2.29 BEFORE a WAS COMPUTED)

b = 2.29
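The small gap between the spreadsheet intercept (53.28) and the hand-computed intercept (53.30) can be traced to rounding. The following Python sketch computes a twice from the totals in the example, once with unrounded intermediate values and once with the two-decimal values used in the manual solution. The variable names are assumptions made for illustration.

```python
# Totals from the worked example
n, sum_x, sum_y, sum_xy, sum_x2 = 15, 105, 1040, 7922, 1015

# Unrounded slope and intercept
b_exact = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a_exact = sum_y / n - b_exact * (sum_x / n)

# Manual route: round the slope and the mean of Y to two decimals first
b_rounded = round(b_exact, 2)
a_manual = round(sum_y / n, 2) - b_rounded * (sum_x / n)

print(round(b_exact, 4))   # 2.2929
print(round(a_exact, 2))   # 53.28
print(round(a_manual, 2))  # 53.3
```

Carrying more decimal places for b and for the mean of Y through the hand computation brings the manual intercept into agreement with the spreadsheet value.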


