
Inferences on Regression Parameters

It is important to compute confidence intervals and conduct statistical hypothesis tests for regression parameters. This helps assess whether the fitted line predicts the response from the predictor variable. It also takes into account that the model is fitted to a sample rather than to the whole population.
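Assume the responses are normally distributed, that is, $y \sim N(\mu, \sigma^2)$ with $\mu = \alpha + \beta x$. The least squares estimators are linear functions of $y$, so they are also normally distributed:

$$\hat{\beta} \sim N\left(\beta, \frac{\sigma^2}{S_{xx}}\right), \qquad \hat{\alpha} \sim N\left(\alpha, \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right)$$

where $S_{xx} = \sum (x_i - \bar{x})^2$.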
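A simulation is a quick way to see these sampling distributions. The minimal Python sketch below makes several assumptions that are not values from the text:

- The model is a straight line with normal errors.
- The x values are held fixed. They are borrowed from the driving-experience example later in this section.
- The "true" parameters are α = 70, β = −1.5 and σ = 10, chosen for illustration.

The sketch refits the least squares line on many simulated samples. It then compares the empirical mean and variance of $\hat{\beta}$ and $\hat{\alpha}$ with the formulas above.

```python
import random
import statistics

def fit_line(x, y):
    """Least squares estimates (alpha_hat, beta_hat) for y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

def simulate(x, alpha, beta, sigma, reps=20000, seed=1):
    """Draw `reps` samples y = alpha + beta*x + N(0, sigma^2) noise and refit each one."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    alphas, betas = [], []
    for _ in range(reps):
        y = [alpha + beta * xi + rng.gauss(0.0, sigma) for xi in x]
        a, b = fit_line(x, y)
        alphas.append(a)
        betas.append(b)
    return alphas, betas

# Illustrative assumptions: fixed x values and hypothetical "true" parameters.
x = [5, 2, 12, 9, 15, 6, 25, 16]
alpha, beta, sigma = 70.0, -1.5, 10.0

n = len(x)
x_bar = sum(x) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)

alphas, betas = simulate(x, alpha, beta, sigma)

# Theoretical variances from the sampling distributions above.
var_beta_theory = sigma ** 2 / s_xx
var_alpha_theory = sigma ** 2 * (1 / n + x_bar ** 2 / s_xx)

print("mean of beta_hat :", statistics.fmean(betas), "(true", beta, ")")
print("var of beta_hat  :", statistics.variance(betas), "(theory", var_beta_theory, ")")
print("mean of alpha_hat:", statistics.fmean(alphas), "(true", alpha, ")")
print("var of alpha_hat :", statistics.variance(alphas), "(theory", var_alpha_theory, ")")
```

The empirical means should sit close to α and β, which reflects unbiasedness. The empirical variances should sit close to the theoretical ones.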

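Standardize each estimator and replace the unknown $\sigma^2$ by its estimate $\hat{\sigma}^2 = ESS/(n-2)$, where ESS is the error (residual) sum of squares. This gives statistics that follow a t distribution with $n - 2$ degrees of freedom:

$$\frac{\hat{\alpha} - \alpha}{\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} \sim t_{n-2}, \qquad \frac{\hat{\beta} - \beta}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \sim t_{n-2}$$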

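Hypothesis tests and confidence intervals

Intercept α
Hypotheses: $H_0: \alpha = 0$ vs $H_1: \alpha \neq 0$
Test statistic under $H_0$: $T = \dfrac{\hat{\alpha}}{\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}} \sim t_{n-2}$
Decision: if $|T| > t_{n-2,\alpha/2}$, reject $H_0$ and conclude that the intercept is significant.
Confidence interval: $\hat{\alpha} \pm t_{n-2,\alpha/2}\sqrt{\hat{\sigma}^2\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)}$

Slope β
Hypotheses: $H_0: \beta = 0$ vs $H_1: \beta \neq 0$
Test statistic under $H_0$: $T = \dfrac{\hat{\beta}}{\sqrt{\hat{\sigma}^2 / S_{xx}}} \sim t_{n-2}$
Decision: if $|T| > t_{n-2,\alpha/2}$, reject $H_0$ and conclude that there is a significant linear relationship between x and y. If $H_0$ is not rejected, there is no evidence that y depends linearly on x. In that case a straight-line model in x is of little use for predicting the response.
Confidence interval: $\hat{\beta} \pm t_{n-2,\alpha/2}\sqrt{\hat{\sigma}^2 / S_{xx}}$

In $t_{n-2,\alpha/2}$ the symbol α denotes the significance level, not the intercept. For a one-sided alternative such as $H_1: \beta < 0$, reject $H_0$ when $T < -t_{n-2,\alpha}$.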
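The test statistics and intervals above can be computed directly. The minimal Python sketch below applies them to the driving-experience data from the example later in this section. It estimates $\hat{\sigma}^2$ as ESS/(n − 2).

The critical value $t_{n-2,\alpha/2}$ is passed in as an argument, as if read from a t table. For 6 degrees of freedom at the 5% level it is 2.447. If SciPy is available, `scipy.stats.t.ppf(1 - a/2, n - 2)` returns the same quantile. The function name `slope_intercept_inference` is hypothetical.

```python
import math

def slope_intercept_inference(x, y, t_crit):
    """Two-sided t tests of H0: alpha = 0 and H0: beta = 0, with confidence intervals.

    t_crit is the upper critical value t_{n-2, a/2} for significance level a,
    supplied by the caller (for example, read from a t table).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar

    # sigma_hat^2 = ESS / (n - 2), where ESS is the error (residual) sum of squares.
    ess = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = ess / (n - 2)

    # Estimated standard errors from the standardized statistics above.
    se_alpha = math.sqrt(sigma2_hat * (1 / n + x_bar ** 2 / s_xx))
    se_beta = math.sqrt(sigma2_hat / s_xx)

    results = {}
    for name, est, se in (("alpha", alpha_hat, se_alpha), ("beta", beta_hat, se_beta)):
        t_stat = est / se  # value of T under H0: parameter = 0
        results[name] = {
            "estimate": est,
            "se": se,
            "T": t_stat,
            "reject_H0": abs(t_stat) > t_crit,
            "ci": (est - t_crit * se, est + t_crit * se),
        }
    return results

x = [5, 2, 12, 9, 15, 6, 25, 16]      # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium ($)

res = slope_intercept_inference(x, y, t_crit=2.447)  # t_{6, 0.025} from a t table
for name, r in res.items():
    print(name, r)
```

For 90% intervals, set `t_crit` to $t_{6,0.05} = 1.943$ instead.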
Analysis of Variance (ANOVA)
In regression analysis, Analysis of Variance (ANOVA) is a technique used to determine how much influence the predictor variables have on the response variable. Put simply, ANOVA checks whether the regression line fits the data well.
The central idea of ANOVA is to split the total sum of squares into two components. The total sum of squares is the total squared deviation of the observations from their overall mean. This decomposition leads to several different variance estimates (mean squares).

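Consider the decomposition

$$TSS = ESS + RSS$$

$$\sum (y_i - \bar{y})^2 = \sum (y_i - \hat{y}_i)^2 + \sum (\hat{y}_i - \bar{y})^2$$

where TSS is the total sum of squares, ESS is the error (residual) sum of squares and RSS is the regression sum of squares.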
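The decomposition can be checked numerically. This Python sketch uses the driving-experience data from the example below and computes the three sums of squares directly from their definitions. The function name `sums_of_squares` is hypothetical.

The identity holds because the fitted line is the least squares line with an intercept. That makes the cross-product term $2\sum (y_i - \hat{y}_i)(\hat{y}_i - \bar{y})$ vanish.

```python
def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) in this section's notation:
    TSS = total, ESS = error (residual), RSS = regression sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    y_hat = [alpha_hat + beta_hat * xi for xi in x]  # fitted values

    tss = sum((yi - y_bar) ** 2 for yi in y)
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)
    return tss, ess, rss

x = [5, 2, 12, 9, 15, 6, 25, 16]      # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium ($)
tss, ess, rss = sums_of_squares(x, y)
print("TSS =", tss, " ESS + RSS =", ess + rss)
```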

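ANOVA Table
Source of variation   Sum of squares   Degrees of freedom   Mean square          F-ratio
Regression            RSS              1                    MRS = RSS/1          F = MRS/MSE
Residual              ESS              n − 2                MSE = ESS/(n − 2)
Total                 TSS              n − 1

Under $H_0: \beta = 0$, $F \sim F_{1,n-2}$. If $F > F_{\alpha;1,n-2}$, reject $H_0$ and conclude that the regression line fits the data well.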
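The ANOVA table can be assembled in a few lines, as in the minimal sketch below. The critical value $F_{\alpha;1,n-2}$ is supplied by the caller, as if read from an F table; it is about 5.99 for $F_{0.05;1,6}$. The function name `anova_table` is hypothetical, and the data are those of the driving-experience example below.

```python
def anova_table(x, y, f_crit):
    """Build the simple-regression ANOVA table.

    Returns (rows, F, reject), where rows are (source, SS, df, MS) and
    reject is True when F exceeds the caller-supplied critical value f_crit.
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar

    tss = sum((yi - y_bar) ** 2 for yi in y)                                    # total SS
    ess = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))  # error SS
    rss = tss - ess                                                             # regression SS

    mrs = rss / 1          # mean square for regression
    mse = ess / (n - 2)    # mean square error
    f_ratio = mrs / mse
    rows = [
        ("Regression", rss, 1, mrs),
        ("Residual", ess, n - 2, mse),
        ("Total", tss, n - 1, None),
    ]
    return rows, f_ratio, f_ratio > f_crit

x = [5, 2, 12, 9, 15, 6, 25, 16]      # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium ($)

rows, f_ratio, reject = anova_table(x, y, f_crit=5.99)  # F_{0.05; 1, 6} from an F table
print(f"{'Source':<12}{'SS':>10}{'df':>5}{'MS':>10}")
for source, ss, df, ms in rows:
    ms_text = f"{ms:10.3f}" if ms is not None else ""
    print(f"{source:<12}{ss:10.3f}{df:5d}{ms_text}")
print(f"F = {f_ratio:.3f}, reject H0 (line fits the data): {reject}")
```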

Standard deviation of errors

The standard deviation 𝜎𝜀 measures the spread of errors around the population regression
line. The standard deviation of errors tells us how widely the errors and hence the values of 𝑦 are
spread for a given 𝑥.
However, $\sigma_\varepsilon$ is usually unknown. In such cases it is estimated by $s_\varepsilon$, the standard deviation of errors for the sample data:

$$s_\varepsilon = \sqrt{\frac{ESS}{n-2}}$$
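As a small worked illustration, the sketch below computes $s_\varepsilon$ for the driving-experience data from the example below. It fits the least squares line and sums the squared residuals. The helper name `std_dev_of_errors` is hypothetical.

```python
import math

def std_dev_of_errors(x, y):
    """s_e = sqrt(ESS / (n - 2)), with ESS the error (residual) sum of squares."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    # Residuals e_i = y_i - y_hat_i; dividing by n - 2 accounts for the
    # two estimated parameters alpha_hat and beta_hat.
    ess = sum((yi - (alpha_hat + beta_hat * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(ess / (n - 2))

x = [5, 2, 12, 9, 15, 6, 25, 16]      # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium ($)
s_e = std_dev_of_errors(x, y)
print(f"s_e = {s_e:.4f}")
```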

Coefficient of determination

In regression analysis we need to measure the amount of variability in y that can be attributed to, or explained by, the relationship between x and y. This measure is known as the coefficient of determination, or $R^2$:

$$R^2 = \frac{RSS}{TSS} = \frac{TSS - ESS}{TSS} = 1 - \frac{ESS}{TSS}$$
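The three expressions for $R^2$ are algebraically equal because TSS = ESS + RSS. The sketch below computes all three for the driving-experience data from the example below. It also computes the square of the correlation coefficient r, which in simple linear regression equals $R^2$. The helper name `r_squared_three_ways` is hypothetical.

```python
import math

def r_squared_three_ways(x, y):
    """Return R^2 as RSS/TSS, (TSS - ESS)/TSS and 1 - ESS/TSS,
    plus the squared correlation coefficient r^2 for comparison."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    y_hat = [alpha_hat + beta_hat * xi for xi in x]

    tss = s_yy                                              # total SS
    ess = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error SS
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression SS

    r = s_xy / math.sqrt(s_xx * s_yy)                       # correlation coefficient
    return rss / tss, (tss - ess) / tss, 1 - ess / tss, r ** 2

x = [5, 2, 12, 9, 15, 6, 25, 16]      # driving experience (years)
y = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium ($)
forms = r_squared_three_ways(x, y)
print(forms)
```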
Example: A random sample of eight drivers was selected from a small town. All of them are insured with the same company and have similar minimum required auto insurance policies. The following table lists their driving experience (in years) and monthly auto insurance premiums (in dollars).
Driving experience (years)   Monthly auto insurance premium ($)
5                            64
2                            87
12                           50
9                            71
15                           44
6                            56
25                           42
16                           60
a) Plot a scatter diagram and find the value of the correlation coefficient.
b) Find the least squares regression line by choosing appropriate response and predictor variables, and comment on it.
c) Calculate the coefficient of determination and interpret the result.
d) Predict the monthly auto insurance premium for a driver with 10 years of driving experience.
e) Compute the standard deviation of errors.
f) Construct a 90% confidence interval for α and β.
g) Test, at the 5% significance level, whether β is negative.
h) Using ANOVA at the 5% level of significance, test whether the regression line fits the data well.
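The calculations for parts (a) to (h), apart from the scatter plot, can be sketched in Python as follows. Two critical values are hard-coded as assumptions, taken from standard tables: $t_{6,0.05} = 1.943$ and $F_{0.05;1,6} \approx 5.99$. With SciPy installed, `scipy.stats.t.ppf(0.95, 6)` and `scipy.stats.f.ppf(0.95, 1, 6)` give the same quantiles. Part (g) uses the one-sided rejection rule $T < -t_{n-2,\alpha}$.

```python
import math

# Data from the example: x = driving experience (years), y = monthly premium ($).
x = [5, 2, 12, 9, 15, 6, 25, 16]
y = [64, 87, 50, 71, 44, 56, 42, 60]
n = len(x)

# Critical values read from standard tables (assumed inputs, df = n - 2 = 6).
T_6_005 = 1.943   # t_{6, 0.05}: 90% two-sided CI and 5% one-sided test
F_1_6_005 = 5.99  # F_{0.05; 1, 6}: 5% ANOVA F test

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# (a) correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

# (b) least squares line y_hat = alpha_hat + beta_hat * x (x = experience predicts y = premium)
beta_hat = s_xy / s_xx
alpha_hat = y_bar - beta_hat * x_bar

# Sums of squares in this section's notation.
tss = s_yy                # total SS
rss = s_xy ** 2 / s_xx    # regression SS
ess = tss - rss           # error (residual) SS

# (c) coefficient of determination
r2 = rss / tss

# (d) predicted premium for 10 years of experience
y_pred_10 = alpha_hat + beta_hat * 10

# (e) standard deviation of errors
s_e = math.sqrt(ess / (n - 2))

# (f) 90% confidence intervals for alpha and beta
se_beta = s_e / math.sqrt(s_xx)
se_alpha = s_e * math.sqrt(1 / n + x_bar ** 2 / s_xx)
ci_beta = (beta_hat - T_6_005 * se_beta, beta_hat + T_6_005 * se_beta)
ci_alpha = (alpha_hat - T_6_005 * se_alpha, alpha_hat + T_6_005 * se_alpha)

# (g) one-sided test of H0: beta = 0 vs H1: beta < 0 at the 5% level
t_beta = beta_hat / se_beta
beta_negative = t_beta < -T_6_005

# (h) ANOVA F test at the 5% level
f_ratio = (rss / 1) / (ess / (n - 2))
line_fits = f_ratio > F_1_6_005

print(f"(a) r = {r:.4f}")
print(f"(b) y_hat = {alpha_hat:.4f} + ({beta_hat:.4f}) x")
print(f"(c) R^2 = {r2:.4f}")
print(f"(d) predicted premium at x = 10: {y_pred_10:.2f}")
print(f"(e) s_e = {s_e:.4f}")
print(f"(f) 90% CI alpha: ({ci_alpha[0]:.3f}, {ci_alpha[1]:.3f}); "
      f"beta: ({ci_beta[0]:.3f}, {ci_beta[1]:.3f})")
print(f"(g) T = {t_beta:.3f}; reject H0 in favour of beta < 0: {beta_negative}")
print(f"(h) F = {f_ratio:.3f}; reject H0 (line fits the data): {line_fits}")
```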
Exercise:
1) The following are the scores that 12 students obtained on the mid-term and the final
examinations in a course in statistics.
Mid-term examination (x)   Final examination (y)
71                         83
49                         62
80                         76
73                         77
93                         89
85                         74
58                         48
82                         78
64                         76
32                         51
87                         73
80                         89
a) Find the equation of the least squares line that will enable us to predict a student’s
final examination score in this course on the basis of his or her score on the mid-
term examination.
b) Predict the final examination score of a student who received 84 marks on the mid-
term examination.
c) Test the significance of the regression parameter at 5% level of significance.
d) Using ANOVA, test whether the regression line fits the data well.

2) The following table gives information on the GPAs and starting salaries (rounded to the nearest thousand dollars) of seven recent college graduates.

GPA                                2.90   3.81   3.20   2.42   3.94   2.05   2.25
Starting salary (thousands of $)   48     53     50     37     65     32     37

a) Find the least squares line by choosing appropriate response and predictor variables, and comment on it.
b) Construct a 95% confidence interval for the regression parameter.
c) Test, at the 1% significance level, whether β is different from zero.
d) Using ANOVA at the 1% level of significance, test whether the regression line fits the data well.
