
TOPIC

Correlation & Regression

Part 02: Least-Squares Regression Line

Regression line: y = a + bx

Least-Squares Regression Line
• If the value of the correlation coefficient is significant, the
next step is to determine the equation of the regression line,
which is the data’s line of best fit.

• The regression line is then used as a tool for forecasting.

• The linear regression equation can be written in the form
y = a + bx.

• In this equation, y is the dependent variable, x is the independent
variable, a is the y-intercept and b is the slope (gradient) of the
regression line.
Formula: Least-Squares Regression Line
y = a + bx, where a = y-intercept and b = gradient

$$ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2} $$

$$ a = \frac{\sum y}{n} - b\left(\frac{\sum x}{n}\right) $$
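The two formulas above can be sketched as a short Python function. This is only a minimal illustration: the function name `least_squares_line` and the four check points are assumptions made for this example and do not come from the slides.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for y = a + b*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    sum_x = sum(xs)                               # sum of x
    sum_y = sum(ys)                               # sum of y
    sum_xy = sum(x * y for x, y in zip(xs, ys))   # sum of x*y
    sum_x2 = sum(x * x for x in xs)               # sum of x squared

    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b = (n * sum_xy - sum_x * sum_y) / denominator   # gradient
    a = sum_y / n - b * (sum_x / n)                  # y-intercept
    return a, b


# Hypothetical check: these points lie exactly on y = 1 + 2x.
a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # prints 1.0 2.0
```

Because the check points sit exactly on a straight line, the function recovers that line's intercept and gradient.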
Example 1
A car engineer wants to study the relationship between car speed (km per hour) and
fuel consumption (km per liter) for new models. The cars were driven at a range of
test speeds and the results were recorded in the table below. Find the linear regression
equation relating car speed and fuel consumption.
Speed (km/h), x              30  40  50  60  70  80  90  100  110
Fuel consumption (km/l), y   39  38  36  32  28  27  25   23   22
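As a sketch of the hand calculation, the sums below were computed from the table (they are not stated in the extracted slides) and substituted into the least-squares formulas, with b rounded to four decimal places before a is computed:

$$ n = 9,\quad \sum x = 630,\quad \sum y = 270,\quad \sum xy = 17500,\quad \sum x^2 = 50100 $$

$$ b = \frac{9(17500) - (630)(270)}{9(50100) - (630)^2} = \frac{-12600}{54000} \approx -0.2333 $$

$$ a = \frac{270}{9} - (-0.2333)\left(\frac{630}{9}\right) = 30 + 16.331 = 46.331 $$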

Thus, the regression line is y = 46.331 - 0.2333x.


Example 1 (Using Excel)
• In Excel, go to Data → Data Analysis → Regression → OK.

• Select the dependent variable, y, and the independent variable, x.

• Click on any cell where you wish your result to appear.

Thus, the regression line is y = 46.33 - 0.2333x.


Example 2
The following table presents sample data relating the number of hours of study per week
spent by students and their scores in an examination. Find the linear regression
equation relating study hours and examination scores.
Study hours, x         10  34  23  27  32  18  22  25
Examination score, y   51  84  70  88  92  65  75  77
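One way to work Example 2 is to compute each sum in the least-squares formulas explicitly. The sketch below does this in Python. The slides do not show a worked answer for this example, so the printed coefficients are simply what the formulas give for the table above.

```python
# Example 2 data: weekly study hours (x) and examination score (y).
hours = [10, 34, 23, 27, 32, 18, 22, 25]
score = [51, 84, 70, 88, 92, 65, 75, 77]

n = len(hours)
sum_x = sum(hours)                                  # sum of x
sum_y = sum(score)                                  # sum of y
sum_xy = sum(x * y for x, y in zip(hours, score))   # sum of x*y
sum_x2 = sum(x * x for x in hours)                  # sum of x squared

# Gradient and intercept from the summation formulas.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * (sum_x / n)

print(n, sum_x, sum_y, sum_xy, sum_x2)
print(f"b = {b:.4f}, a = {a:.4f}")
```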
Interpretation of the regression coefficient
y = a + bx
‘a’ is the y-intercept, the value of y when x = 0.
• When x = 0, y = a.
• We can state that when x = 0, the expected y is a.

‘b’ is the slope of the regression line.
• When b > 0, y increases by b units for each additional unit of x.
• When b < 0, y decreases by |b| units for each additional unit of x.
• We can state that, on average, y will increase/decrease by |b| units
for each additional unit of x.
From Example 1 – cars’ speed and fuel consumption
The regression line is y = 46.331 - 0.2333x

a = 46.331
We can state that when the car’s speed is 0 km/h, the expected fuel
consumption is 46.331 km/l.

b = -0.2333
We can state that, on average, the fuel consumption will decrease by
0.2333 km/l for each additional 1 km/h of car speed.
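The two interpretations can be checked numerically. The sketch below uses the Example 1 coefficients a = 46.331 and b = -0.2333. The helper name `predict` and the comparison speeds of 70 and 71 km/h are assumptions chosen for illustration.

```python
def predict(a, b, x):
    """Evaluate the regression line y = a + b*x at a given x."""
    return a + b * x


a, b = 46.331, -0.2333  # Example 1 coefficients

# Intercept: the predicted y when x = 0 is a itself.
at_zero = predict(a, b, 0)

# Slope: raising x by one unit changes the predicted y by b.
change = predict(a, b, 71) - predict(a, b, 70)

print(f"predicted y at x = 0: {at_zero:.3f}")
print(f"change in y per extra 1 km/h: {change:.4f}")
```

The change is negative, which matches the statement that fuel consumption decreases as speed increases.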
From Example 2 – Study hours and examination score
The regression line is y = 36.42 + 1.626x (coefficients rounded)
Prediction using the regression line
y = a + bx
- Extrapolation and interpolation
● The regression line is used to predict or estimate the value of y
when a value of x is given.

● Some predictions are more accurate than others, depending on the
strength of the relationship.

● That is, the higher the degree of correlation between the
variables (a significant value of r, |r| > 0.7), the more accurate the
prediction.
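To see how the strength check applies to Example 1, the sketch below computes Pearson's r for the speed and fuel consumption data. It assumes the usual summation form of the correlation coefficient, which is not restated in this part of the slides. It compares the absolute value of r with 0.7 because the relationship in Example 1 is negative.

```python
import math

# Example 1 data: speed (km/h) and fuel consumption (km/l).
speed = [30, 40, 50, 60, 70, 80, 90, 100, 110]
fuel = [39, 38, 36, 32, 28, 27, 25, 23, 22]

n = len(speed)
sum_x, sum_y = sum(speed), sum(fuel)
sum_xy = sum(x * y for x, y in zip(speed, fuel))
sum_x2 = sum(x * x for x in speed)
sum_y2 = sum(y * y for y in fuel)

# Pearson's r in summation form (assumed standard formula).
numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

strong = abs(r) > 0.7  # threshold quoted in the text
print(f"r = {r:.3f}, strong relationship: {strong}")
```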
Extrapolation and interpolation
● Extrapolation means using the regression line to find the value of
y when the value of x is outside the range of the observations.
Extrapolation makes the forecast less accurate and less reliable.

● Interpolation means using the regression line to find the value of
y when the value of x is within the range of the observations.
Interpolation makes the forecast more accurate and more reliable
than extrapolation.
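The distinction comes down to a range check on x. A minimal sketch follows, in which the function name `classify_prediction` and the test speed of 120 km/h are hypothetical choices made for illustration:

```python
def classify_prediction(x_new, observed_x):
    """Label a prediction by whether x_new lies inside the observed x range."""
    lowest, highest = min(observed_x), max(observed_x)
    if lowest <= x_new <= highest:
        return "interpolation"
    return "extrapolation"


speeds = [30, 40, 50, 60, 70, 80, 90, 100, 110]  # Example 1 speeds (km/h)
print(classify_prediction(65, speeds))   # interpolation
print(classify_prediction(120, speeds))  # extrapolation
```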
From Example 1 – cars’ speed, x, and fuel consumption, y
The regression line is y = 46.331 - 0.2333x

Predict the fuel consumption if the car’s speed is 65 km/h. Is the
estimate reliable? Explain your answer.
Solution

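Substituting x = 65 into the regression line: y = 46.331 - 0.2333(65) = 31.17 (to two decimal places).
The predicted fuel consumption is 31.17 km/l if the car's speed is 65 km/h. Because 65 km/h lies
within the observed range of speeds (30 to 110 km/h), this is an interpolation estimate, so it is reliable.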
Speed (km/h), x 30 40 50 60 70 80 90 100 110
Fuel consumption, y 39 38 36 32 28 27 25 23 22
From Example 2 – Study hours and examination score
The regression line is y = 36.42 + 1.626x (coefficients rounded)

Predict the examination score if a student studies 37 hours per week. Is the estimate
reliable? Explain your answer.

Study hours, x 10 34 23 27 32 18 22 25
Examination score, y 51 84 70 88 92 65 75 77
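The question above can be explored with a short script that fits the line to the Example 2 table, predicts at x = 37 and checks whether 37 falls inside the observed study hours. This is a sketch, not a model answer from the slides. The helper name `fit_line` is an assumption.

```python
def fit_line(xs, ys):
    """Return (a, b) for y = a + b*x from the summation formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = sum_y / n - b * (sum_x / n)
    return a, b


hours = [10, 34, 23, 27, 32, 18, 22, 25]
score = [51, 84, 70, 88, 92, 65, 75, 77]

a, b = fit_line(hours, score)
x_new = 37
predicted = a + b * x_new

# 37 is compared with the smallest and largest observed study hours.
within_range = min(hours) <= x_new <= max(hours)
kind = "interpolation" if within_range else "extrapolation"
print(f"predicted score at {x_new} h: {predicted:.1f} ({kind})")
```

Since the largest observed value is 34 hours, the script labels this prediction an extrapolation, so the estimate should be treated as less reliable.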
Exercise 2 (Correlation and regression line using Excel)
The following table presents the number of pages and the price of each of
10 books in a lecturer's room.

Number of pages       500  190  240  300  350  410  490  100  550  540
Price of book (RM)     50   25   50   75   50   40   45   32   60   55

a. State the dependent and independent variables.
b. Using Excel, draw a scatter diagram for the above data.
c. Find and interpret Pearson’s product-moment correlation coefficient.
d. Calculate and interpret the coefficient of determination.
e. Find the regression equation relating number of pages and price of book.
f. Interpret the values of the regression coefficients.
g. Estimate the price of a book with 90 pages and of a book with 600 pages.
Comment on the reliability of your estimates.
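Excel answers to parts (c) to (g) can be cross-checked with the sketch below. It assumes that number of pages is the independent variable x and price is the dependent variable y, which is itself the subject of part (a). All variable names are illustrative.

```python
import math

# Assumption: pages is x (independent), price in RM is y (dependent).
pages = [500, 190, 240, 300, 350, 410, 490, 100, 550, 540]
price = [50, 25, 50, 75, 50, 40, 45, 32, 60, 55]

n = len(pages)
sum_x, sum_y = sum(pages), sum(price)
sum_xy = sum(x * y for x, y in zip(pages, price))
sum_x2 = sum(x * x for x in pages)
sum_y2 = sum(y * y for y in price)

sxy = n * sum_xy - sum_x * sum_y
sxx = n * sum_x2 - sum_x ** 2
syy = n * sum_y2 - sum_y ** 2

r = sxy / math.sqrt(sxx * syy)    # part (c): Pearson's r
r_squared = r ** 2                # part (d): coefficient of determination
b = sxy / sxx                     # part (e): gradient
a = sum_y / n - b * (sum_x / n)   # part (e): y-intercept

print(f"r = {r:.4f}, r^2 = {r_squared:.4f}")
print(f"y = {a:.4f} + {b:.4f}x")

# Part (g): estimates, each labelled by whether x is inside the observed range.
estimates = {}
for x_new in (90, 600):
    inside = min(pages) <= x_new <= max(pages)
    kind = "interpolation" if inside else "extrapolation"
    estimates[x_new] = (a + b * x_new, kind)
    print(f"{x_new} pages: RM {estimates[x_new][0]:.2f} ({kind})")
```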
Part 02
• Determine the least-squares regression line using Excel.
• Interpret the regression coefficients.
• Predict using the regression line.

THANK YOU!
