
Simple Linear Regression

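Population Regression Model

$$Y = \beta_0 + \beta_1 X + \varepsilon$$

where
• Y is the dependent variable
• X is the independent variable
• β0 is the population y intercept
• β1 is the population slope coefficient
• ε is the random error term, or residual

β0 + β1X is the linear component and ε is the random error component.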
Linear Regression Assumptions
• Error values (ε) are statistically independent
• Error values are normally distributed for any given value of x
• The probability distribution of the errors has constant variance for all values of x
• The underlying relationship between the x variable and the y variable is linear
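To make the model and its assumptions concrete, here is a minimal simulation sketch using only Python's standard library. The parameter values (β0 = 10, β1 = 2, σ = 1), the x range and the sample size are hypothetical choices for illustration. Errors are drawn independently from a normal distribution with mean zero and constant variance, as the assumptions require, and the sample estimates (computed with the least squares formulas covered later) land close to the population values.

```python
import random

def simulate(beta0, beta1, sigma, n, seed=1):
    """Draw n (x, y) pairs from Y = beta0 + beta1*X + eps with iid N(0, sigma^2) errors."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 60) for _ in range(n)]
    # Independent, normally distributed errors with mean 0 and constant variance
    ys = [beta0 + beta1 * x + rng.gauss(0, sigma) for x in xs]
    return xs, ys

def least_squares(xs, ys):
    """Return (b0, b1) computed from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical population parameters, chosen only for illustration
xs, ys = simulate(beta0=10.0, beta1=2.0, sigma=1.0, n=10_000)
b0, b1 = least_squares(xs, ys)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")  # close to 10 and 2
```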

Population Linear Regression (continued)

[Figure: the population regression line in the (x, y) plane, with intercept β0 and slope β1. At a given xi, the observed value of y differs from the predicted value of y on the line by the random error εi for this x value.]
Estimated Regression Model

The sample regression line provides an estimate of the population regression line:

$$\hat{y}_i = b_0 + b_1 x_i$$

where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.

The individual random error terms ei have a mean of zero.
Scatter plot
• Plot of all (Xi, Yi) pairs
• Suggests how well the model will fit

[Figure: scatter plot of Y against X, both axes running from 0 to 60]
Thinking Challenge
• How would you draw a line through the points? How do you determine which line ‘fits best’?

[Figure: the same scatter plot]
[Figure: the same scatter plot with a candidate line; slope changed, intercept unchanged]
[Figure: the same scatter plot with another candidate line; slope changed, intercept changed]
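Least Squares

• 1. ‘Best fit’ means the differences between the actual Y values and the predicted Y values are as small as possible. Because positive and negative differences cancel when added, the differences (errors) are squared:

$$\sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^2 = \sum_{i=1}^{n} \hat{\varepsilon}_i^2$$

• 2. Least squares (LS) minimizes this sum of squared differences (errors), the SSE.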
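A quick numerical check of the least squares idea, as a sketch: it borrows the advertising and sales data from the worked example that follows, computes the least squares line, and confirms that nudging the slope or intercept never lowers the SSE. The perturbation grid is an arbitrary choice for illustration.

```python
def sse(b0, b1, xs, ys):
    """Sum of squared errors for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Advertising expenditure (X) and sales (Y) from the worked example
xs = [20, 15, 25, 40]
ys = [385, 400, 395, 440]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
b1_ls = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
b0_ls = y_bar - b1_ls * x_bar
best = sse(b0_ls, b1_ls, xs, ys)

# Moving the intercept and/or slope away from the LS values never reduces SSE
for d0 in (-5, 0, 5):
    for d1 in (-0.5, 0, 0.5):
        assert sse(b0_ls + d0, b1_ls + d1, xs, ys) >= best - 1e-9

print(round(best, 2))  # 448.21
```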
Least Squares Estimators
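The estimator formulas on this slide did not survive extraction. The standard least squares result is sketched below as a derivation, written in the deviation notation x = X − X̄, y = Y − Ȳ that the worked example uses: setting the partial derivatives of SSE to zero gives the normal equations, and solving them gives b1 and b0.

```latex
\begin{aligned}
\text{SSE}(b_0, b_1) &= \sum_{i=1}^{n} \left(Y_i - b_0 - b_1 X_i\right)^2 \\
\frac{\partial\,\text{SSE}}{\partial b_0} = 0 &\;\Rightarrow\; \sum Y_i = n b_0 + b_1 \sum X_i \\
\frac{\partial\,\text{SSE}}{\partial b_1} = 0 &\;\Rightarrow\; \sum X_i Y_i = b_0 \sum X_i + b_1 \sum X_i^2 \\
b_1 &= \frac{\sum xy}{\sum x^2}, \qquad b_0 = \bar{Y} - b_1 \bar{X}
\end{aligned}
```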
Example
A study was made by a retail merchant to determine the relation between weekly advertising expenditure (X) and sales (Y). Estimate the regression line to predict weekly sales from advertising expenditure and interpret it. Predict sales for a weekly expenditure of 29. Also test the significance of the model, and find R² and interpret it.

        Y      X
        385    20
        400    15
        395    25
        440    40
Total   1620   100
Example - Computations
• Computation of the simple linear regression equation, using deviations from the means, x = X − X̄ = X − 25 and y = Y − Ȳ = Y − 405:

        Y      X      x      y     xy     x²     y²
        385    20    -5    -20    100     25    400
        400    15   -10     -5     50    100     25
        395    25     0    -10      0      0    100
        440    40    15     35    525    225   1225
Total   1620   100    0      0    675    350   1750
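The deviation columns can be reproduced with a few lines of Python, as a sketch for checking the hand computations; the variable names are illustrative.

```python
xs = [20, 15, 25, 40]      # advertising expenditure X
ys = [385, 400, 395, 440]  # sales Y

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n   # 25.0 and 405.0

dx = [x - x_bar for x in xs]   # deviations x = X - X̄
dy = [y - y_bar for y in ys]   # deviations y = Y - Ȳ

sum_xy = sum(a * b for a, b in zip(dx, dy))
sum_x2 = sum(a * a for a in dx)
sum_y2 = sum(b * b for b in dy)

print(sum_xy, sum_x2, sum_y2)  # 675.0 350.0 1750.0
```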
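Estimated Linear Regression between Sales and Advertising Expenditures

$$b_1 = \frac{\sum xy}{\sum x^2} = \frac{675}{350} = 1.93, \qquad b_0 = \bar{Y} - b_1\bar{X} = 405 - 1.93(25) = 356.75$$

$$\hat{Y} = 356.75 + 1.93X$$

$$\widehat{\text{Sales}} = 356.75 + 1.93\,(\text{Adv. Expenditures})$$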
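The example also asks for predicted sales at a weekly expenditure of 29, which the slides do not work out. A sketch of the fit and the prediction, keeping b1 unrounded:

```python
xs = [20, 15, 25, 40]      # advertising expenditure X
ys = [385, 400, 395, 440]  # sales Y
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sum_x2 = sum((x - x_bar) ** 2 for x in xs)

b1 = sum_xy / sum_x2          # 675 / 350
b0 = y_bar - b1 * x_bar       # 405 - b1 * 25

def predict(x):
    """Estimated mean sales for a weekly advertising expenditure x."""
    return b0 + b1 * x

print(round(b1, 2), round(b0, 2), round(predict(29), 2))  # 1.93 356.79 412.71
```

Carrying b1 unrounded gives b0 ≈ 356.79; the slide's 356.75 comes from substituting the rounded b1 = 1.93. With the rounded coefficients the prediction is 356.75 + 1.93(29) = 412.72, so the two routes agree to within rounding.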
Interpretation: $\hat{Y} = 356.75 + 1.93X$, i.e. $\widehat{\text{Sales}} = 356.75 + 1.93\,(\text{Adv. Expenditures})$

• The value b1 = 1.93 indicates that average sales are expected to increase by Rs. 1.93 with each Rs. 1 increase in advertising expenditure.

• The value b0 = 356.75 indicates the average sales with no expenditure on advertising. The interpretation of b0 is not always meaningful; here X = 0 lies outside the observed range of expenditures (15 to 40).
Test of Hypothesis for β1
Step-1:- Construction of Hypothesis: H0: β1 = 0 against H1: β1 ≠ 0
Step-2:- Level of Significance: α = 0.05
Step-3:- Test Statistic:
$$t = \frac{b_1 - \beta_1}{SE(b_1)}, \quad \text{where } SE(b_1) = \sqrt{\frac{S_e^2}{\sum x^2}}$$
Step-4:- Calculations
Step-5:- Decision Rule: Reject H0 if $|t| > t_{\alpha/2,(n-2)}$
Step-6:- Results
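Calculation of Residual Mean Square ($S_e^2$)

        Y      X     Ŷ          Y − Ŷ      (Y − Ŷ)²
        385    20    395.35     -10.35     107.12
        400    15    385.70      14.30     204.49
        395    25    405.00     -10.00     100.00
        440    40    433.95       6.05      36.60
Total   1620   100   1620.00      0.00     448.21

$$S_e^2 = \frac{\sum (Y - \hat{Y})^2}{n - 2} = 224.11$$

or, equivalently, from the deviation sums,

$$S_e^2 = \frac{\sum y^2 - \frac{\left(\sum xy\right)^2}{\sum x^2}}{n - 2} = \frac{1750 - \frac{675^2}{350}}{2} = 224.11$$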
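The residual mean square can be checked with a short sketch. It recomputes b0 and b1 from the data without rounding, so individual fitted values differ from the table in the second decimal place, while SSE and $S_e^2$ agree.

```python
xs = [20, 15, 25, 40]
ys = [385, 400, 395, 440]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx            # unrounded slope
b0 = y_bar - b1 * x_bar   # unrounded intercept

fitted = [b0 + b1 * x for x in xs]               # Ŷ for each observation
residuals = [y - f for y, f in zip(ys, fitted)]  # Y − Ŷ, which sum to zero
sse = sum(e ** 2 for e in residuals)             # Σ(Y − Ŷ)²
s2_e = sse / (n - 2)                             # residual mean square, n − 2 df

print(round(sse, 2), round(s2_e, 2))  # 448.21 224.11
```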
Test of Hypothesis for β1

Test statistic:
$$t = \frac{b_1 - \beta_1}{SE(b_1)} = \frac{1.93 - 0}{\sqrt{224.11/350}} = 2.41$$

Table value: $t_{\alpha/2,(n-2)} = t_{0.025,2} = 4.303$

Conclusion: We do not have sufficient evidence from the sample to reject the null hypothesis, since the calculated value (2.41) is not greater than the table value (4.303).
Interpretation: Therefore, advertising expenditure does not have a significant effect on sales at the 5% level of significance.
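The six steps can be followed in a compact sketch. The critical value 4.303 is read from the t table (α = 0.025, 2 d.f.) rather than computed, so the block needs only the standard library.

```python
import math

xs = [20, 15, 25, 40]
ys = [385, 400, 395, 440]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx
s2_e = (syy - sxy ** 2 / sxx) / (n - 2)   # residual mean square
se_b1 = math.sqrt(s2_e / sxx)

beta1_null = 0.0          # H0: beta1 = 0
t_cal = (b1 - beta1_null) / se_b1
t_tab = 4.303             # t_{0.025, 2}, from the t table

print(round(t_cal, 2), abs(t_cal) > t_tab)   # 2.41 False
```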
PERCENTAGE POINTS OF STUDENT'S t-DISTRIBUTION

        Alpha (upper-tail area)
d.f.    0.250   0.100   0.050   0.025   0.010   0.005
1       1.000   3.078   6.314  12.706  31.821  63.657
2       0.816   1.886   2.920   4.303   6.965   9.925
3       0.765   1.638   2.353   3.182   4.541   5.841
4       0.741   1.533   2.132   2.776   3.747   4.604
5       0.727   1.476   2.015   2.571   3.365   4.032
6       0.718   1.440   1.943   2.447   3.143   3.707
7       0.711   1.415   1.895   2.365   2.998   3.499
8       0.706   1.397   1.860   2.306   2.896   3.355
9       0.703   1.383   1.833   2.262   2.821   3.250
10      0.700   1.372   1.812   2.228   2.764   3.169
11      0.697   1.363   1.796   2.201   2.718   3.106
12      0.695   1.356   1.782   2.179   2.681   3.055
13      0.694   1.350   1.771   2.160   2.650   3.012
14      0.692   1.345   1.761   2.145   2.624   2.977
15      0.691   1.341   1.753   2.131   2.602   2.947
16      0.690   1.337   1.746   2.120   2.583   2.921
17      0.689   1.333   1.740   2.110   2.567   2.898
18      0.688   1.330   1.734   2.101   2.552   2.878
Confidence Interval for β1
$$b_1 \pm t_{\alpha/2,(n-2)}\, SE(b_1)$$

• b1 is the estimate of β1
• SE(b1) is the standard error, already computed
• t is the table value
• The confidence interval has two limits, the lower limit and the upper limit
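Confidence Interval for β1
$$1.93 \pm 4.303\sqrt{\frac{224.11}{350}} = 1.93 \pm 3.44 \;\Rightarrow\; (-1.51,\ 5.37)$$

The interval contains 0, which agrees with the t test: at the 5% level the data do not rule out β1 = 0.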
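The interval arithmetic, as a sketch using the slide's rounded values:

```python
import math

b1, s2_e, sum_x2 = 1.93, 224.11, 350   # values from the slides
t_tab = 4.303                          # t_{0.025, 2}

se_b1 = math.sqrt(s2_e / sum_x2)
margin = t_tab * se_b1
lower, upper = b1 - margin, b1 + margin
print(round(lower, 2), round(upper, 2))   # -1.51 5.37
```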
Test of Hypothesis for β0
Step-1:- Construction of Hypothesis
Step-2:- Level of Significance: α = 0.05
Step-3:- Test Statistic:
$$t = \frac{b_0 - \beta_0}{SE(b_0)}, \quad \text{where } SE(b_0) = \sqrt{S_e^2\left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x^2}\right)}$$
Step-4:- Calculations
Step-5:- Decision Rule, Reject H0 if
Step-6:- Results
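The slide leaves the hypothesized value of β0 open. For illustration only, the sketch below applies the test statistic to the advertising example under an assumed H0: β0 = 0, using the slide's rounded b0 and $S_e^2$.

```python
import math

n, x_bar, sum_x2 = 4, 25, 350      # advertising example
b0, s2_e = 356.75, 224.11          # values from the slides

se_b0 = math.sqrt(s2_e * (1 / n + x_bar ** 2 / sum_x2))
beta0_null = 0.0                   # assumed H0: beta0 = 0 (illustration only)
t_cal = (b0 - beta0_null) / se_b0
t_tab = 4.303                      # t_{0.025, 2}, two-sided test

print(round(se_b0, 2), round(t_cal, 2), abs(t_cal) > t_tab)   # 21.36 16.7 True
```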
Confidence Interval for β0
$$b_0 \pm t_{\alpha/2,(n-2)}\, SE(b_0)$$

• b0 is the estimate of β0
• SE(b0) is the standard error, already computed
• t is the table value
• The confidence interval has two limits, the lower limit and the upper limit
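Applied to the advertising example (the slides do not show these numbers), a 95% interval for β0 can be sketched as follows:

```python
import math

n, x_bar, sum_x2 = 4, 25, 350      # advertising example
b0, s2_e = 356.75, 224.11          # values from the slides
t_tab = 4.303                      # t_{0.025, 2}

se_b0 = math.sqrt(s2_e * (1 / n + x_bar ** 2 / sum_x2))
lower, upper = b0 - t_tab * se_b0, b0 + t_tab * se_b0
print(round(lower, 1), round(upper, 1))   # 264.8 448.7
```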
ANOVA
• The total variation in the response variable Y is partitioned into two components: the explained variation (due to regression) and the unexplained (residual) variation. The explained variation is the variation due to X; the unexplained variation is due to uncontrolled factors other than X.

TSS = Regression SS + Residual SS
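In the deviation notation of the worked example, the three sums of squares can be computed directly from Σxy, Σx² and Σy². A sketch with the advertising data:

```latex
\begin{aligned}
\text{TSS} &= \sum y^2 = 1750 \\
\text{Regression SS} &= \frac{\left(\sum xy\right)^2}{\sum x^2} = b_1 \sum xy = \frac{675^2}{350} = 1301.79 \\
\text{Residual SS} &= \text{TSS} - \text{Regression SS} = 1750 - 1301.79 = 448.21
\end{aligned}
```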
Calculations

We now construct the ANOVA table to test the hypothesis H0: β1 = 0.

Source of variation (S.O.V)   df   Sum of squares (SS)   Mean sum of squares (MSS = SS/df)   Fcal   Ftab
Regression                     1        1301.79                1301.79                         5.81   18.513
Residual                       2         448.21                 224.11
Total                          3        1750.00

As the calculated value of F (5.81) is not greater than the table value (18.513), we do not have sufficient evidence against the null hypothesis and conclude that sales are not significantly affected by advertising expenditure.
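The ANOVA table, as a short sketch; the table value F(0.05; 1, 2) = 18.513 is taken from the slide.

```python
n = 4
tss = 1750.0                    # Σy²
reg_ss = 675.0 ** 2 / 350.0     # (Σxy)² / Σx²
res_ss = tss - reg_ss

df_reg, df_res = 1, n - 2
ms_reg, ms_res = reg_ss / df_reg, res_ss / df_res
f_cal = ms_reg / ms_res
f_tab = 18.513                  # F_{0.05}(1, 2), from the slide

print(round(reg_ss, 2), round(res_ss, 2), round(f_cal, 2), f_cal > f_tab)
# 1301.79 448.21 5.81 False
```

In simple linear regression the ANOVA F equals the square of the t statistic for H0: β1 = 0 (2.41² ≈ 5.81), so both tests reach the same conclusion.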
Goodness of Fit (R²)

• A commonly used measure of the goodness of fit of a linear model is R², called the coefficient of determination.
• The coefficient of determination gives the proportion of the variation in the response variable explained by the independent variable:
$$R^2 = \frac{\text{Regression SS}}{\text{TSS}} = \frac{1301.79}{1750} = 0.7439$$
• Advertising expenditure (X) explains 74.39% of the variation in sales (Y); the rest is due to other, unknown factors.
Example
The following data show the revenue of six firms, in thousands of dollars ((000) $), along with their expenditure on research and development (R&D), also in (000) $.

Revenue in (000) $, Y                31   40   30   34   25   20
Expenditure on R&D in (000) $, X      5   11    4    5    3    2

• Draw a scatter plot and assess the relationship between Y and X.
• Fit the simple linear regression equation and interpret the parameters.
• Test the hypothesis that there is no linear relation between Y and X, i.e., β1 = 0. Also compute a 95% confidence interval for β1.
• Test the hypothesis that β0 > 15. Compute a 95% confidence interval for β0.
• Perform an analysis of variance (ANOVA) and test the significance of the regression model. Calculate the coefficient of determination and interpret it.
• Test the hypothesis that the mean revenue for firms with X = 9 is greater than 30, i.e., $\mu_{Y|X=9} > 30$. Also construct a 95% confidence interval.
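A sketch of the fit and the coefficient tests for this example. Table values are read from the t table with 4 d.f.: 2.776 for two-sided tests and 95% intervals, and 2.132 for the one-sided test. Treating "β0 > 15" as H0: β0 = 15 against H1: β0 > 15 is an assumption about how the question is meant.

```python
import math

ys = [31, 40, 30, 34, 25, 20]   # revenue, (000) $
xs = [5, 11, 4, 5, 3, 2]        # R&D expenditure, (000) $
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)

b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
s2_e = (syy - b1 * sxy) / (n - 2)                      # residual mean square
se_b1 = math.sqrt(s2_e / sxx)
se_b0 = math.sqrt(s2_e * (1 / n + x_bar ** 2 / sxx))

t_tab_two = 2.776   # t_{0.025, 4}: two-sided tests and 95% intervals
t_tab_one = 2.132   # t_{0.05, 4}: one-sided test

t_b1 = b1 / se_b1                                      # H0: beta1 = 0
ci_b1 = (b1 - t_tab_two * se_b1, b1 + t_tab_two * se_b1)
t_b0 = (b0 - 15) / se_b0                               # assumed H0: beta0 = 15 vs H1: beta0 > 15
ci_b0 = (b0 - t_tab_two * se_b0, b0 + t_tab_two * se_b0)

print(b0, b1)                                          # 20.0 2.0
print(round(t_b1, 2), [round(v, 2) for v in ci_b1])    # 4.36 [0.73, 3.27]
print(round(t_b0, 2), [round(v, 2) for v in ci_b0])    # 1.89 [12.66, 27.34]
```

Under these assumptions, 4.36 > 2.776 leads to rejecting β1 = 0, while 1.89 < 2.132 does not give sufficient evidence that β0 > 15.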
Interpretation: $\hat{Y} = 20 + 2X$

• The value b1 = 2 indicates that average revenue is expected to increase by 2 thousand dollars with each one-thousand-dollar increase in R&D expenditure.

• The value b0 = 20 indicates that average revenue is expected to be 20 thousand dollars with no expenditure on R&D. The interpretation of b0 is not always meaningful.

Muhammad Usman
Calculations

We now construct the ANOVA table to test the hypothesis H0: β1 = 0.

Source of variation (S.O.V)   df   Sum of squares (SS)   Mean sum of squares (MSS = SS/df)   Fcal    Ftab
Regression                     1        200.00                 200.00                          19.05   7.709
Error                          4         42.00                  10.50
Total                          5        242.00

As the calculated value of F is greater than the table value (19.05 > 7.709), we reject the null hypothesis H0: β1 = 0 and conclude that the relationship between Y and X is significant.
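The sums of squares, F and R² for the R&D data can be reproduced as follows, as a sketch; the table value F(0.05; 1, 4) = 7.709 is taken from the slide.

```python
ys = [31, 40, 30, 34, 25, 20]   # revenue, (000) $
xs = [5, 11, 4, 5, 3, 2]        # R&D expenditure, (000) $
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sum_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sum_x2 = sum((x - x_bar) ** 2 for x in xs)
sum_y2 = sum((y - y_bar) ** 2 for y in ys)   # TSS

reg_ss = sum_xy ** 2 / sum_x2
res_ss = sum_y2 - reg_ss
f_cal = (reg_ss / 1) / (res_ss / (n - 2))
r2 = reg_ss / sum_y2
f_tab = 7.709    # F_{0.05}(1, 4), from the slide

print(reg_ss, res_ss, round(f_cal, 2), f_cal > f_tab, round(100 * r2, 2))
# 200.0 42.0 19.05 True 82.64
```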
Goodness of Fit (R²)

• A commonly used measure of the goodness of fit of a linear model is R², called the coefficient of determination.
• The coefficient of determination gives the proportion of the variation in the dependent variable explained by the independent variable: $R^2 = 200/242 = 0.8264$.
• R&D expenditure (X) explains 82.64% of the variation in revenue (Y); the rest is due to other, unknown factors.
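Test of Hypothesis for the Mean Response $\mu_{Y|X}$

$$t = \frac{\hat{Y}_x - \mu_{Y|X}}{SE(\hat{Y}_x)}, \quad \text{where } SE(\hat{Y}_x) = \sqrt{S_{Y.X}^2\left(\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x^2}\right)}$$

Here $S_{Y.X}^2$ is the residual mean square (the same quantity as $S_e^2$) and $X_0$ is the value of X at which the mean response is tested.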
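Applied to the last question of the R&D example (H0: $\mu_{Y|X=9} = 30$ against H1: $\mu_{Y|X=9} > 30$), a sketch of the test and the 95% interval. The table values 2.132 (one-sided, α = 0.05) and 2.776 (two-sided 95%) are read from the t table with 4 d.f.

```python
import math

ys = [31, 40, 30, 34, 25, 20]   # revenue, (000) $
xs = [5, 11, 4, 5, 3, 2]        # R&D expenditure, (000) $
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
s2_yx = (syy - b1 * sxy) / (n - 2)      # S²_{Y.X}, residual mean square

x0, mu0 = 9, 30                          # H0: mu_{Y|X=9} = 30 vs H1: > 30
y_hat = b0 + b1 * x0
se_y_hat = math.sqrt(s2_yx * (1 / n + (x0 - x_bar) ** 2 / sxx))
t_cal = (y_hat - mu0) / se_y_hat

t_one = 2.132    # t_{0.05, 4}, one-sided test
t_two = 2.776    # t_{0.025, 4}, 95% confidence interval
ci = (y_hat - t_two * se_y_hat, y_hat + t_two * se_y_hat)

print(y_hat, round(t_cal, 2), t_cal > t_one)   # 38.0 3.54 True
print([round(v, 2) for v in ci])               # [31.72, 44.28]
```

Since 3.54 > 2.132, this sketch rejects H0 and supports a mean revenue above 30 thousand dollars at X = 9.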
Thanks
