
I.

Simple linear regression

1. Introduction

We are analyzing the linear relationship between Exercise h/week and Stress level. The

dependent variable y will represent the stress level values given in the table, and the

independent variable x is represented by the exercise h/week value. We were given 498

observations. In a simple linear regression model, the response variable y is related to a

single explanatory variable x by: y = β0 + β1x + ε

The intercept, β0, captures the value of E(y) when x is equal to zero.

The slope, β1, measures the increase in E(y) associated with a unit increase in x.

The intercept and slope will be estimated using the least squares method, which consists

of determining the values of β0 and β1 that minimize SS(β0, β1), the sum of squared

deviations of the observations from the line.

The ordinary correlation coefficient, r, measures the strength of the linear relationship between two

variables.

After applying the linear regression function in Excel, we obtained the following results:

 b0 = 2.04, which implies that when Exercise h/week equals zero, the predicted Stress level is 2.04 units.

 b1 = 7.13, which implies that the predicted Stress level increases by 7.13 units for every one-unit increase in Exercise h/week.

 r = 0.8656, which implies a strong positive linear relationship between Exercise h/week and

Stress level.

 The fitted regression line has the following equation: ŷ = 2.04 + 7.13x, and is

represented in Figure 1.

Figure 1
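As a cross-check of the Excel output, the least-squares estimates can be computed directly from their closed-form expressions. This is a minimal sketch in Python; since the 498 observations from the table are not reproduced here, it uses simulated stand-in data with roughly the reported structure, so the numbers it prints only approximate the report's values.

```python
import numpy as np

# Stand-in data (hypothetical): the report's 498 observations are not
# reproduced here, so we simulate pairs with a similar structure.
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)                       # Exercise h/week
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)   # Stress level + noise

# Closed-form least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()

# Sample correlation coefficient r
r = sxy / np.sqrt(sxx * np.sum((y - y.mean()) ** 2))
print(b0, b1, r)
```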

2. Assumptions

The assumptions underlying the least squares method are:

 The yi are random variables, while the xi are non-random.

 It is important to highlight that the distribution of the response variable is conditional on the explanatory

variable.

1) E(yi) = β0 + β1xi

2) {x1, …, xn} are non-stochastic variables.

3) Var(yi) = σ²

4) {yi} are independent random variables.

5) {yi} are normally distributed.

 In contrast to this representation in terms of the observables, an alternative set of assumptions

focuses on the deviations, or errors, in the regression, defined as:

1) εi = yi − β0 − β1xi

2) {x1, …, xn} are non-stochastic variables.

3) E(εi) = 0

4) Var(εi) = σ²

5) εi are independent random variables.

3. Residuals

After finding the fitted values of y, we can calculate the residual for each observation.

The residual is the difference between the observed value and the fitted value. When

the SLR model is fitted by the method of least squares, the sum of residuals is equal to

zero:

 Σ(i=1 to 498) ε̂i = 0

 Figure 2 represents the Exercise h/week residual plot of all our observations.
[Figure 2: Exercise h/week residual plot; residuals (about −30 to +30) plotted against Exercise h/week (1 to 8)]

Figure 2
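The zero-sum property of least-squares residuals can be verified numerically. A sketch using simulated stand-in data (hypothetical, since the report's 498 observations are not reproduced here):

```python
import numpy as np

# Simulated stand-in data (hypothetical), since the 498 observations
# from the report are not reproduced here.
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

# Fit by least squares, then form the residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# The residuals sum to zero up to floating-point rounding.
print(resid.sum())
```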

4. ANOVA table

In the basic linear regression model, the deviation of the response from the regression

line is not an observable quantity because the parameters β0 and β1 are not observed.

However, by using the estimators b0 and b1, we can approximate this deviation by the

residual, ε̂i = yi − b0 − b1xi.

An unbiased estimator of σ², the mean square error (MSE), is defined as:

s² = Σ(i=1 to 498) ε̂i² / (n − 2)

 The positive square root, s = √s², is called the residual standard deviation or

standard error of the estimate.
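The MSE and residual standard deviation follow directly from the residuals. A minimal sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced):

```python
import numpy as np

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Unbiased estimator of sigma^2: divide by n - 2, the residual df.
s2 = np.sum(resid ** 2) / (n - 2)
s = np.sqrt(s2)   # residual standard deviation
print(s2, s)
```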

Thus, for this SLR model, the analysis of variance, or ANOVA, table, which tabulates the

partitioning of the sums of squares, is represented in Table 1:


Table 1

ANOVA        df    SS            MS            F           Significance F

Regression   1     76460.81307   76460.81307   1483.10998  3.72606E-151

Residual     496   25570.97164   51.55437831

Total        497   102031.7847

 DF represents the degrees of freedom.

 SS represents the sum of squares.

 MS represents the mean square.

 The F-value is used to evaluate whether the regression sum of squares is large

enough to declare the usefulness of the SLR model in explaining the variation in

the response y.

 The hypotheses tested in this model are:

o H0: β1 = 0, the explanatory variable has no linear effect on the response.

o H1: β1 ≠ 0, the SLR model is useful in explaining the response.

 Significance F is the p-value of the F-test: the probability, if the null hypothesis

were true, of observing an F-value at least as large as the one obtained.
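The ANOVA decomposition SST = SSR + SSE and the F-statistic can be reproduced numerically. A sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced):

```python
import numpy as np

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

# Partition of the total sum of squares: SST = SSR + SSE.
ssr = np.sum((yhat - y.mean()) ** 2)   # regression SS, df = 1
sse = np.sum((y - yhat) ** 2)          # residual SS, df = n - 2
sst = np.sum((y - y.mean()) ** 2)      # total SS, df = n - 1

# F = MSR / MSE; a large value supports the usefulness of the model.
F = (ssr / 1) / (sse / (n - 2))
print(F)
```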

The estimated standard deviations of b0 and b1, called standard errors, are denoted by

SE(b0) and SE(b1), respectively.

 SE(b0) = √( s² (1/n + x̄²/Sxx) )

 SE(b1) = √( s² / Sxx )
 The standard errors provide a measure of the reliability and precision of the LSEs.

 SE(b0) and SE(b1) increase as s² increases and decrease as Sxx increases. Thus,

the standard errors of the LSEs decrease when:

o The sample size is larger.

o The observed values of the explanatory variable are more spread out.

o The observations lie closer to the fitted line.
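These standard-error formulas can be evaluated directly. A sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced):

```python
import numpy as np

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

# Standard errors of the least-squares estimators.
se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / sxx))
se_b1 = np.sqrt(s2 / sxx)
print(se_b0, se_b1)
```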

5. Hypothesis testing and Forecast

To examine the importance of x in the hypothesis testing framework, we test whether β1 = 0,

in which case the basic linear regression model no longer includes an explanatory variable. In

general, hypothesis tests on the regression coefficients are formulated as

H0: βj = d, where d is a user-specified hypothesized value, for j = 0, 1.

Thus, the test statistic is:

 t(β̂j) = (bj − d) / SE(β̂j)

Under H0, t(β̂j) ~ t(n−2), a t-distributed random variable with

df = n − 2 degrees of freedom. The 100(1−α)% confidence interval for βj is:

 bj ± t(n−2, α/2) · SE(β̂j)
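The t-statistic, two-sided p-value, and confidence interval for the slope can be computed as follows. A sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced); scipy supplies the t-distribution tail probabilities and quantiles:

```python
import numpy as np
from scipy import stats

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
se_b1 = np.sqrt(s2 / sxx)

# Test H0: beta1 = 0 against H1: beta1 != 0.
t_b1 = (b1 - 0) / se_b1
p_b1 = 2 * stats.t.sf(abs(t_b1), df=n - 2)

# 95% confidence interval: b1 +/- t(n-2, alpha/2) * SE(b1).
tcrit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
print(t_b1, p_b1, ci)
```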

Table 2 represents the standard error, t-statistic, p-value, and the 95% confidence

interval of β0 and β1.

Table 2

                  Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%

Intercept         2.047331338    0.888172008      2.305106803   0.021572523   0.302288028   3.792374648

Exercise h/week   7.137659988    0.185340007      38.51116697   3.726E-151    6.773511673   7.501808304

 The p-value represents the probability, when the null hypothesis is true, of obtaining

a test statistic equal to or more extreme than the one actually observed.

o If the p-value < α, we reject the null hypothesis H0.

o If the p-value > α, we fail to reject the null hypothesis H0.

o The calculated p-value for the intercept is 0.02157, which is smaller than α = 0.05, so

we reject the null hypothesis H0: β0 = 0.

o The calculated p-value for the Exercise h/week variable is 3.726E-151 (effectively 0), which is

smaller than α, so we reject the null hypothesis H0: β1 = 0.
