
Session 30

Simple Linear Regression


Assumptions About the Error Term \(\varepsilon\)
1. The error \(\varepsilon\) is a random variable with mean of zero.

2. The variance of \(\varepsilon\), denoted by \(\sigma^2\), is the same for all values of the
independent variable.

3. The values of \(\varepsilon\) are independent.

4. The error \(\varepsilon\) is a normally distributed random variable.

11/14/2021 Confidential - Prof. Vasanth Kamath, TAPMI 2


Testing for Significance
• To test for a significant regression relationship, we must conduct a
hypothesis test to determine whether the value of \(\beta_1\) is zero.

• Two tests are commonly used:

t Test and F Test

• Both the t test and F test require an estimate of \(\sigma^2\), the variance of \(\varepsilon\) in
the regression model.



Testing for Significance
• An Estimate of \(\sigma^2\)
The mean square error (MSE) provides the estimate of \(\sigma^2\), and the
notation \(s^2\) is also used.

\[ s^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2} \]

where:

\[ \mathrm{SSE} = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - b_0 - b_1 x_i)^2 \]



Testing for Significance
• An Estimate of \(\sigma\)
• To estimate \(\sigma\), we take the square root of \(s^2\).

• The resulting \(s\) is called the standard error of the estimate.

\[ s = \sqrt{\mathrm{MSE}} = \sqrt{\frac{\mathrm{SSE}}{n - 2}} \]
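
A minimal sketch of the SSE, MSE, and s computation follows. The ten (x, y) pairs are an assumption: these slides do not list the data, so the values below are taken to be the Armand's Pizza Parlor sample (student population in thousands, quarterly sales in thousands of dollars) that the later worked example appears to use. The variable names are illustrative only.

```python
import math

# ASSUMED sample (not listed in the slides): student population in 1000s (x)
# and quarterly sales in $1000s (y) for 10 restaurants.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares estimates of the slope b1 and intercept b0.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# SSE is the sum of squared residuals; MSE divides it by n - 2.
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
s = math.sqrt(mse)  # standard error of the estimate

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"SSE = {sse:.2f}, MSE = {mse:.2f}, s = {s:.3f}")
```

With this assumed sample the sketch gives MSE = 191.25 and s = 13.829, the values used in the worked example on the later slides.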



Testing for Significance: t Test
• Hypotheses

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

• Test Statistic

\[ t = \frac{b_1}{s_{b_1}} \quad \text{where} \quad s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} \]



Testing for Significance: t Test
• Rejection Rule

Reject \(H_0\) if p-value < \(\alpha\), or if \(t < -t_{\alpha/2}\) or \(t > t_{\alpha/2}\)

where:
\(t_{\alpha/2}\) is based on a t distribution with n - 2 degrees of freedom



Testing for Significance: t Test
1. Determine the hypotheses. \(H_0: \beta_1 = 0\), \(H_a: \beta_1 \neq 0\)

2. Specify the level of significance. \(\alpha = .01\)

3. Select the test statistic. \(t = b_1 / s_{b_1}\)

4. State the rejection rule. Reject \(H_0\) if p-value < .01 or \(|t| > 3.355\)
(\(t_{.005} = 3.355\) with 8 degrees of freedom)



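Testing for Significance: t Test
5. Compute the value of the test statistic.

The estimated regression equation is \(\hat{y} = 60 + 5x\), so \(b_1 = 5\).

\[ s = \sqrt{\mathrm{MSE}} = \sqrt{191.25} = 13.829 \]

\[ s_{b_1} = \frac{s}{\sqrt{\sum (x_i - \bar{x})^2}} = \frac{13.829}{\sqrt{568}} = 0.5803 \]

\[ t = \frac{b_1}{s_{b_1}} = \frac{5}{0.5803} = 8.62 \]

6. Determine whether to reject \(H_0\). As t = 8.62 > 3.355, we can reject \(H_0\).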
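
The six steps of the t test can be sketched in a few lines of Python. The numbers (slope 5, MSE 191.25, sum of squared x deviations 568, critical value 3.355) are the ones quoted on the slides; the variable names are illustrative, and the critical value is entered as a table constant rather than computed.

```python
import math

# Values quoted in the worked example on the slides.
b1 = 5.0         # estimated slope
mse = 191.25     # mean square error
sxx = 568.0      # sum of (x_i - x_bar)^2
t_crit = 3.355   # t_{.005} with 8 d.f. (table value, alpha = .01 two-tailed)

s = math.sqrt(mse)            # standard error of the estimate
s_b1 = s / math.sqrt(sxx)     # estimated standard deviation of b1
t_stat = b1 / s_b1            # test statistic

# Two-tailed rejection rule: reject H0 when |t| exceeds the critical value.
reject_h0 = abs(t_stat) > t_crit

print(f"s = {s:.3f}, s_b1 = {s_b1:.4f}, t = {t_stat:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```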



Confidence Interval for \(\beta_1\)
• The form of a confidence interval for \(\beta_1\) is as follows:

\[ b_1 \pm t_{\alpha/2}\, s_{b_1} \]
where
\(b_1\) is the point estimator,
\(t_{\alpha/2}\, s_{b_1}\) is the margin of error, and
\(t_{\alpha/2}\) is the t value providing an area of \(\alpha/2\) in the upper tail of a
t distribution with n - 2 degrees of freedom
• We can use a 99% confidence interval for \(\beta_1\) to test the hypotheses just
used in the t test (where \(\alpha = .01\)).

• \(H_0\) is rejected if the hypothesized value of \(\beta_1\) is not included in the
confidence interval for \(\beta_1\).


Confidence Interval for \(\beta_1\)
• Rejection Rule

Reject \(H_0\) if 0 is not included in the confidence interval for \(\beta_1\).

• 99% Confidence Interval for \(\beta_1\)

\[ b_1 \pm t_{\alpha/2}\, s_{b_1} = 5 \pm 3.355(0.5803) = 5 \pm 1.95, \text{ or } 3.05 \text{ to } 6.95 \]

• Conclusion

0 is not included in the confidence interval. Reject \(H_0\).
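
The interval and the resulting decision can be reproduced with a short sketch. It assumes the summary values quoted on the slides (slope 5, MSE 191.25, sum of squared x deviations 568) and takes the t value 3.355 as a table constant; the names are illustrative.

```python
import math

# Summary values quoted on the slides.
b1 = 5.0         # point estimate of the slope
mse = 191.25     # mean square error
sxx = 568.0      # sum of (x_i - x_bar)^2
t_crit = 3.355   # t_{.005} with 8 d.f. (table value for a 99% interval)

s_b1 = math.sqrt(mse / sxx)   # estimated standard deviation of b1
margin = t_crit * s_b1        # margin of error
lower, upper = b1 - margin, b1 + margin

# H0: beta1 = 0 is rejected when 0 lies outside the interval.
reject_h0 = not (lower <= 0.0 <= upper)

print(f"99% CI for beta1: {lower:.2f} to {upper:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```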



Testing for Significance: F Test
• Hypotheses

\(H_0: \beta_1 = 0\)
\(H_a: \beta_1 \neq 0\)

• Test Statistic

F = MSR/MSE



Testing for Significance: F Test
• Rejection Rule

Reject \(H_0\) if p-value < \(\alpha\) or \(F > F_\alpha\)

where:
\(F_\alpha\) is based on an F distribution with
1 degree of freedom in the numerator and n - 2 degrees of
freedom in the denominator



Testing for Significance: F Test
1. Determine the hypotheses. \(H_0: \beta_1 = 0\), \(H_a: \beta_1 \neq 0\)

2. Specify the level of significance. \(\alpha = .01\)

3. Select the test statistic. F = MSR/MSE, where
MSR = SSR / (number of independent variables)

4. State the rejection rule. Reject \(H_0\) if p-value < .01 or F > 11.26 (with 1 d.f. in
the numerator and 8 d.f. in the denominator)



Testing for Significance: F Test
5. Compute the value of the test statistic.
For Armand’s Pizza Parlor, MSR = SSR / 1 = 14,200 (as there is only one
independent variable, i.e., student population).

F = MSR/MSE = 14,200/191.25 = 74.25

6. Determine whether to reject \(H_0\).

The F distribution table shows that with 1 numerator d.f. and 10 - 2 = 8
denominator d.f., F = 11.26 provides an area of .01 in the upper tail. The test
statistic F = 74.25 is far larger, so the area in the upper tail beyond it (the
p-value) is much less than .01. Hence, we reject \(H_0\).
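
As a rough check of the F test arithmetic, the sketch below recomputes the statistic from the summary values on the slides (SSR = 14,200, MSE = 191.25) and compares it with the table value 11.26, entered as a constant. It also illustrates a known property of simple linear regression, that F equals the square of the t statistic for the slope; the slope and sum of squared x deviations used for that check are the ones quoted in the t test slides. Names are illustrative.

```python
import math

# Summary values quoted on the slides.
ssr = 14200.0    # sum of squares due to regression
mse = 191.25     # mean square error
k = 1            # number of independent variables
f_crit = 11.26   # F_{.01} with 1 and 8 d.f. (table value)

msr = ssr / k            # mean square due to regression
f_stat = msr / mse       # test statistic
reject_h0 = f_stat > f_crit

# With one independent variable, F is the square of the t statistic.
b1, sxx = 5.0, 568.0
t_stat = b1 / math.sqrt(mse / sxx)

print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because the two statistics are tied together this way, the t test and the F test lead to the same conclusion when there is a single independent variable.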



Using the Estimated Regression
Equation for Estimation and Prediction
• A confidence interval is an interval estimate of the mean value of y for a
given value of x.

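• A prediction interval is used whenever we want to predict an individual
value of y for a new observation corresponding to a given value of x.

• The margin of error is larger for a prediction interval than for a confidence
interval at the same x, because an individual value of y varies about its mean.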



Using the Estimated Regression
Equation for Estimation and Prediction
• Confidence Interval Estimate of \(E(y^*)\)

\[ \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \]

• Prediction Interval Estimate of \(y^*\)

\[ \hat{y}^* \pm t_{\alpha/2}\, s_{\text{pred}} \]

where:
the confidence coefficient is \(1 - \alpha\) and \(t_{\alpha/2}\) is based on a t distribution with
n - 2 degrees of freedom



Confidence Interval for E(y*)
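• Estimate of the Standard Deviation of \(\hat{y}^*\)

\[ s_{\hat{y}^*} = s\sqrt{\frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]

\[ s_{\hat{y}^*} = 13.829\sqrt{\frac{1}{10} + \frac{(10 - 14)^2}{568}} \]

\[ s_{\hat{y}^*} = 13.829\sqrt{0.1282} = 4.95 \]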



Confidence Interval for E(y*)
The 95% confidence interval estimate of the mean quarterly sales for all Armand’s
restaurants located near campuses with 10,000 students is:

\[ \hat{y}^* \pm t_{\alpha/2}\, s_{\hat{y}^*} \]

110 ± 2.306(4.95) (in thousands of dollars)

110,000 ± 11,415

$98,585 to $121,415
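
The confidence interval for the mean can be sketched as follows, using only figures quoted on the slides: the point estimate 110, MSE 191.25, n = 10, mean of x equal to 14, sum of squared x deviations 568, and the table value 2.306. Names are illustrative, and all amounts are in thousands of dollars.

```python
import math

# Figures quoted on the slides (sales in $1000s, students in 1000s).
y_hat_star = 110.0   # point estimate of mean sales at x* = 10
mse = 191.25         # mean square error
n = 10               # sample size
x_bar = 14.0         # sample mean of x
sxx = 568.0          # sum of (x_i - x_bar)^2
x_star = 10.0        # given value of x
t_crit = 2.306       # t_{.025} with 8 d.f. (table value for 95%)

s = math.sqrt(mse)
# Estimated standard deviation of y_hat at x*.
s_yhat = s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)
margin = t_crit * s_yhat
lower, upper = y_hat_star - margin, y_hat_star + margin

print(f"s_yhat = {s_yhat:.2f}")
print(f"95% CI for E(y*): {lower:.2f} to {upper:.2f} (in $1000s)")
```

The sketch carries full precision through the calculation, so its margin differs from the slide's 11,415 by a few dollars; the slide rounds the standard deviation to 4.95 before multiplying.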



Prediction Interval for y*
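• Estimate of the Standard Deviation of an Individual Value of \(y^*\)

\[ s_{\text{pred}} = s\sqrt{1 + \frac{1}{n} + \frac{(x^* - \bar{x})^2}{\sum (x_i - \bar{x})^2}} \]

\[ s_{\text{pred}} = 13.829\sqrt{1 + \frac{1}{10} + \frac{(10 - 14)^2}{568}} \]

\[ s_{\text{pred}} = 13.829\sqrt{1.1282} = 14.69 \]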



Prediction Interval for y*
The 95% prediction interval for quarterly sales for the new Armand’s
restaurant located near Talbot College, a campus with 10,000 students, is:

\[ \hat{y}^* \pm t_{\alpha/2}\, s_{\text{pred}} \]

110 ± 2.306(14.69) (in thousands of dollars)

110,000 ± 33,875

$76,125 to $143,875
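
A comparable sketch for the prediction interval is shown below, again using only the figures quoted on the slides and treating 2.306 as a table constant. It also computes the standard deviation used for the mean-value interval, to show why the prediction interval is wider: the squared prediction standard deviation equals the MSE plus the squared standard deviation of the estimated mean. Names are illustrative, and amounts are in thousands of dollars.

```python
import math

# Figures quoted on the slides (sales in $1000s, students in 1000s).
y_hat_star = 110.0   # predicted sales at x* = 10
mse = 191.25         # mean square error
n = 10               # sample size
x_bar = 14.0         # sample mean of x
sxx = 568.0          # sum of (x_i - x_bar)^2
x_star = 10.0        # given value of x
t_crit = 2.306       # t_{.025} with 8 d.f. (table value for 95%)

s = math.sqrt(mse)
leverage = 1 / n + (x_star - x_bar) ** 2 / sxx

s_yhat = s * math.sqrt(leverage)        # for the mean value E(y*)
s_pred = s * math.sqrt(1 + leverage)    # for an individual value y*

margin = t_crit * s_pred
lower, upper = y_hat_star - margin, y_hat_star + margin

print(f"s_yhat = {s_yhat:.2f}, s_pred = {s_pred:.2f}")
print(f"95% PI for y*: {lower:.2f} to {upper:.2f} (in $1000s)")
```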



Residual Analysis
• If the assumptions about the error term \(\varepsilon\) appear questionable, the
hypothesis tests about the significance of the regression relationship
and the interval estimation results may not be valid.

• The residuals provide the best information about \(\varepsilon\).

• Residual for observation i
\[ y_i - \hat{y}_i \]
• Much of the residual analysis is based on an examination of graphical plots.

• If the assumption that the variance of \(\varepsilon\) is the same for all values of x is
valid, and the assumed regression model is an adequate representation of
the relationship between the variables, then the residual plot should give
an overall impression of a horizontal band of points.
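
A minimal sketch of computing the residuals that would go into such a plot is given below. The ten (x, y) pairs are an assumption: the slides do not list the data, so the values are taken to be the Armand's Pizza Parlor sample (student population in thousands, quarterly sales in thousands of dollars) that the worked example appears to use. No plotting library is assumed; the sketch only prints the (x, residual) pairs.

```python
# ASSUMED sample (not listed in the slides): student population in 1000s (x)
# and quarterly sales in $1000s (y) for 10 restaurants.
x = [2, 6, 8, 8, 12, 16, 20, 20, 22, 26]
y = [58, 105, 88, 118, 117, 137, 157, 169, 149, 202]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares fit of y on x.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residual for observation i: observed y minus fitted y.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("   x  residual")
for xi, ei in zip(x, residuals):
    print(f"{xi:>4}  {ei:>8.1f}")

# Least squares residuals sum to zero (up to rounding error).
print(f"sum of residuals = {sum(residuals):.6f}")
```

Plotting these pairs with x on the horizontal axis and the residual on the vertical axis gives the residual plot against x described above.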

Residual Plot Against x
[Figure: residual \(y - \hat{y}\) plotted against x, labeled "Good Pattern"]


Residual Plot Against x
[Figure: residual \(y - \hat{y}\) plotted against x, labeled "Non-constant Variance"]


Residual Plot Against x
[Figure: residual \(y - \hat{y}\) plotted against x, labeled "Model Form Not Adequate"]


17-11-2021 (Wednesday) 2.15 PM - 4.15 PM
Thank you
