
I.

Simple linear regression

1. Introduction

We are analyzing the linear relationship between Exercise h/week and Stress level. The

dependent variable y will represent the stress level values given in the table, and the

independent variable x is represented by the exercise h/week value. We were given 498

observations. In a simple linear regression model, the response variable y is related to a

single explanatory variable x by: y = β0 + β1x + ε

The intercept, β0, captures the value of E(y) when x is equal to zero.

The slope, β1, measures the increase in E(y) associated with a unit increase in x.

The intercept and slope will be estimated using the least squares method, which consists

of determining the values of β0 and β1 that minimize SS(β0, β1), the sum of squared

deviations of the observations from the line.

The ordinary correlation coefficient, r, measures the strength of the linear relationship between two

variables.

After applying the linear regression function in Excel, we obtained the following results:

 b0 = 2.04, which implies that when Exercise h/week equals zero, the predicted Stress level is 2.04 units.

 b1 = 7.13, which implies that the predicted Stress level increases by 7.13 units for every one-unit increase in Exercise h/week.

 r = 0.8656, which implies a strong positive linear relationship between Exercise h/week and

Stress level.

 The fitted regression line has the following equation: ŷ = 2.04 + 7.13x, and is

represented in Figure 1.

Figure 1
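As a cross-check of the Excel output, the least-squares estimates can be computed directly from their closed-form expressions. This is a minimal sketch in Python; since the 498 observations from the table are not reproduced here, it uses simulated stand-in data with roughly the reported structure, so the numbers it prints only approximate the report's values.

```python
import numpy as np

# Stand-in data (hypothetical): the report's 498 observations are not
# reproduced here, so we simulate pairs with a similar structure.
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)                       # Exercise h/week
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)   # Stress level + noise

# Closed-form least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
sxx = np.sum((x - x.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()

# Sample correlation coefficient r
r = sxy / np.sqrt(sxx * np.sum((y - y.mean()) ** 2))
print(b0, b1, r)
```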

2. Assumptions

The assumptions underlying the least squares method are:

 The yi are random variables, while the xi are non-random.

 It is important to highlight that the distribution of the response variable is conditional on the explanatory

variable.

1) E(yi) = β0 + β1xi

2) {x1, …, xn} are non-stochastic variables.

3) Var(yi) = σ²

4) {yi} are independent random variables.

5) {yi} are normally distributed.

 In contrast to this representation in terms of the observables, an alternative set of assumptions

focuses on the deviations, or errors, in the regression, defined as:

1) εi = yi − β0 − β1xi

2) {x1, …, xn} are non-stochastic variables.

3) E(εi) = 0

4) Var(εi) = σ²

5) εi are independent random variables.

3. Residuals

After finding the fitted values of y, we can calculate the residual for each observation.

The residual is the difference between the observed value and the fitted value. When

the SLR model is fitted by the method of least squares, the sum of residuals is equal to

zero:

 Σ(i=1 to 498) ε̂i = 0

 Figure 2 represents the Exercise h/week residual plot of all our observations.
[Figure 2: Exercise h/week residual plot; residuals (about −30 to +30) plotted against Exercise h/week (1 to 8)]

Figure 2
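The zero-sum property of least-squares residuals can be verified numerically. A sketch using simulated stand-in data (hypothetical, since the report's 498 observations are not reproduced here):

```python
import numpy as np

# Simulated stand-in data (hypothetical), since the 498 observations
# from the report are not reproduced here.
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

# Fit by least squares, then form the residuals.
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# The residuals sum to zero up to floating-point rounding.
print(resid.sum())
```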

4. ANOVA table

In the basic linear regression model, the deviation of the response from the regression

line is not an observable quantity because the parameters β0 and β1 are not observed.

However, by using the estimators b0 and b1, we can approximate this deviation by the

residual, ε̂i = yi − b0 − b1xi.

An unbiased estimator of σ², the mean square error (MSE), is defined as:

s² = Σ(i=1 to 498) ε̂i² / (n − 2)

 The positive square root, s = √s², is called the residual standard deviation or

standard error of the estimate.
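The MSE and residual standard deviation follow directly from the residuals. A minimal sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced):

```python
import numpy as np

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# Unbiased estimator of sigma^2: divide by n - 2, the residual df.
s2 = np.sum(resid ** 2) / (n - 2)
s = np.sqrt(s2)   # residual standard deviation
print(s2, s)
```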

Thus, for this SLR model, the analysis of variance, or ANOVA, table, which tabulates the

partitioning of the sums of squares, is represented in Table 1:


Table 1

ANOVA        df    SS            MS            F           Significance F

Regression   1     76460.81307   76460.81307   1483.10998  3.72606E-151

Residual     496   25570.97164   51.55437831

Total        497   102031.7847

 DF represents the degrees of freedom.

 SS represents the sum of squares.

 MS represents the mean square.

 The F-value is used to evaluate whether the regression sum of squares is large

enough to declare the usefulness of the SLR model in explaining the variation in

the response y.

 The hypotheses tested in this model are:

o H0: β1 = 0, the explanatory variable has no linear effect on the response.

o H1: β1 ≠ 0, the SLR model is useful in explaining the response.

 Significance F is the p-value of the F-test: the probability, if the null hypothesis

were true, of observing an F-value at least as large as the one obtained.
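The ANOVA decomposition SST = SSR + SSE and the F-statistic can be reproduced numerically. A sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced):

```python
import numpy as np

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

# Partition of the total sum of squares: SST = SSR + SSE.
ssr = np.sum((yhat - y.mean()) ** 2)   # regression SS, df = 1
sse = np.sum((y - yhat) ** 2)          # residual SS, df = n - 2
sst = np.sum((y - y.mean()) ** 2)      # total SS, df = n - 1

# F = MSR / MSE; a large value supports the usefulness of the model.
F = (ssr / 1) / (sse / (n - 2))
print(F)
```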

The estimated standard deviations of b0 and b1, called standard errors, are denoted by

SE(b0) and SE(b1), respectively.

 SE(b0) = √( s² (1/n + x̄²/Sxx) )

 SE(b1) = √( s² / Sxx )
 The standard errors provide a measure of the reliability and precision of the LSEs.

 SE(b0) and SE(b1) increase as s² increases and decrease as Sxx increases. Thus,

the standard errors of the LSEs decrease when:

o The sample size is larger.

o The observed values of the explanatory variable are more spread out.

o The observations lie closer to the fitted line.
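These standard-error formulas can be evaluated directly. A sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced):

```python
import numpy as np

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)

# Standard errors of the least-squares estimators.
se_b0 = np.sqrt(s2 * (1 / n + x.mean() ** 2 / sxx))
se_b1 = np.sqrt(s2 / sxx)
print(se_b0, se_b1)
```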

5. Hypothesis testing and Forecast

To examine the importance of x in the hypothesis testing framework, we test whether β1 = 0,

in which case the basic linear regression model no longer includes an explanatory variable. In

general, hypothesis tests on the regression coefficients are formulated as

H0: βj = d, where d is a user-specified hypothesized value, for j = 0, 1.

Thus, the test statistic is:

 t(β̂j) = (bj − d) / SE(β̂j)

Under H0, t(β̂j) ~ t(n−2), a t-distributed random variable with

df = n − 2 degrees of freedom. The 100(1−α)% confidence interval for βj is:

 bj ± t(n−2, α/2) · SE(β̂j)
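The t-statistic, two-sided p-value, and confidence interval for the slope can be computed as follows. A sketch with simulated stand-in data (hypothetical, since the report's observations are not reproduced); scipy supplies the t-distribution tail probabilities and quantiles:

```python
import numpy as np
from scipy import stats

# Simulated stand-in data (hypothetical).
rng = np.random.default_rng(42)
n = 498
x = rng.uniform(1, 8, size=n)
y = 2.04 + 7.13 * x + rng.normal(0, 7.18, size=n)

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
s2 = np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2)
se_b1 = np.sqrt(s2 / sxx)

# Test H0: beta1 = 0 against H1: beta1 != 0.
t_b1 = (b1 - 0) / se_b1
p_b1 = 2 * stats.t.sf(abs(t_b1), df=n - 2)

# 95% confidence interval: b1 +/- t(n-2, alpha/2) * SE(b1).
tcrit = stats.t.ppf(1 - 0.05 / 2, df=n - 2)
ci = (b1 - tcrit * se_b1, b1 + tcrit * se_b1)
print(t_b1, p_b1, ci)
```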

Table 2 represents the standard error, t-statistic, p-value, and the 95% confidence

interval of β0 and β1.

Table 2

                  Coefficients   Standard Error   t Stat        P-value       Lower 95%     Upper 95%

Intercept         2.047331338    0.888172008      2.305106803   0.021572523   0.302288028   3.792374648

Exercise h/week   7.137659988    0.185340007      38.51116697   3.726E-151    6.773511673   7.501808304

 The p-value represents the probability, when the null hypothesis is true, of obtaining

a test statistic equal to or more extreme than the one actually observed.

o If the p-value < α, we reject the null hypothesis H0.

o If the p-value > α, we fail to reject the null hypothesis H0.

o The calculated p-value for the intercept is 0.02157, which is smaller than α = 0.05, so

we reject the null hypothesis H0: β0 = 0.

o The calculated p-value for the Exercise h/week variable is 3.726E-151 (effectively 0), which is

smaller than α, so we reject the null hypothesis H0: β1 = 0.
