1. Introduction
We are analyzing the linear relationship between Exercise h/week and Stress level. The
dependent variable y represents the stress level values given in the table, and the
independent variable x represents the exercise h/week values. We were given 498 observations.
The slope, $\beta_1$, measures the increase in $E(y)$ associated with a unit increase in x.
The intercept and slope will be calculated using the least squares method, which consists
of determining the values of $\beta_0$ and $\beta_1$ that minimize $SS(\beta_0, \beta_1)$. SS stands for sum of
squares.
variables.
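The least squares calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `least_squares_fit` and the five data points are hypothetical (they are not the 498 observations of the report), and the closed-form expressions b1 = s_xy / s_xx and b0 = ȳ - b1·x̄ are the standard minimizers of SS for simple linear regression.

```python
def least_squares_fit(x, y):
    """Return (b0, b1) minimizing SS(b0, b1) = sum((y_i - b0 - b1 * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of squared deviations of x, and sum of cross-products of x and y.
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Hypothetical toy data, NOT the data analyzed in this report.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares_fit(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
```

Spreadsheet regression tools such as the one in Excel should return the same estimates for a simple linear regression, since they solve the same minimization problem.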
After applying the linear regression function in Excel, we obtained the following results:
$b_0 = 2.04$, which implies that with Exercise h/week set equal to zero, Stress level will be 2.04.
$b_1 = 7.13$, which implies that y increases by 7.13 on average for every one-unit increase in x.
$r = 0.8656$, which implies a strong positive linear relationship between Exercise h/week and
Stress level.
The fitted regression line has the following equation: $\hat{y} = 2.04 + 7.13x$ and is
shown in Figure 1.
Figure 1
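As a worked illustration of how the fitted line is used, take a hypothetical value of x = 3 h/week (chosen only for illustration, not taken from the data). Substituting it into the fitted equation gives the predicted stress level:

```latex
\hat{y} = 2.04 + 7.13 \times 3 = 2.04 + 21.39 = 23.43
```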
2. Assumptions
variable.
1) $E(y_i) = \beta_0 + \beta_1 x_i$
3) $Var(y_i) = \sigma^2$
3) $E(\varepsilon_i) = 0$
4) $Var(\varepsilon_i) = \sigma^2$
3. Residuals
After finding the fitted values of y, we can calculate the residual for each observation.
The residual is the difference between the observed value and the fitted value. When
the SLR model is fitted by the method of least squares, the sum of the residuals is equal to
zero.
$\sum_{i=1}^{498} \hat{\varepsilon}_i = 0$
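The zero-sum property of the residuals can be checked numerically. The sketch below is an illustration only: the function name `residuals` and the toy data are hypothetical (not the report's observations), and the comparison uses a small tolerance because floating-point sums are rarely exactly zero.

```python
import math


def residuals(x, y):
    """Fit y = b0 + b1 * x by least squares and return observed minus fitted values."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]


# Hypothetical toy data, NOT the data analyzed in this report.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
res = residuals(x, y)
# The residuals sum to zero up to floating-point rounding.
print(math.isclose(sum(res), 0.0, abs_tol=1e-9))
```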
Figure 2 shows the residuals plotted against Exercise h/week for all our observations.
[Plot: Exercise h/week Residual Plot. Vertical axis: Residuals (-30 to 30); horizontal axis: Exercise h/week (1 to 8).]
Figure 2
4. ANOVA table
In the basic linear regression model, the deviation of the response from the regression
line is not an observable quantity because the parameters $\beta_0$ and $\beta_1$ are not observed.
However, by using the estimators $b_0$ and $b_1$, we can approximate this deviation by the
residual.
$s^2 = \frac{\sum_{i=1}^{n} \hat{\varepsilon}_i^2}{n-2}$
Thus, for this SLR model, the analysis of variance, or ANOVA table, which tabulates the
Anova df SS MS F Significance F
The F-value is used to evaluate whether the regression sum of squares is large
enough to declare the usefulness of the SLR model in explaining the variation in
the response y.
The significance of F represents the probability that our null hypothesis in our
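The quantities in the df, SS, MS and F columns of an ANOVA table for simple linear regression can be computed as in the sketch below. It is a minimal illustration under stated assumptions: the function name `anova_table`, the dictionary layout, and the toy data are hypothetical and do not reproduce the report's own table. The "Significance F" column is left out because it requires the F distribution, which the Python standard library does not provide.

```python
def anova_table(x, y):
    """Return df, SS, MS and F for a simple linear regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]

    ss_regression = sum((fi - y_bar) ** 2 for fi in fitted)
    ss_error = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ms_regression = ss_regression / 1     # one explanatory variable
    ms_error = ss_error / (n - 2)         # this is s^2, the mean square error
    return {
        "regression": {"df": 1, "SS": ss_regression, "MS": ms_regression},
        "error": {"df": n - 2, "SS": ss_error, "MS": ms_error},
        "total": {"df": n - 1, "SS": ss_regression + ss_error},
        "F": ms_regression / ms_error,
    }


# Hypothetical toy data, NOT the data analyzed in this report.
table = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for row in ("regression", "error", "total"):
    print(row, table[row])
print("F =", round(table["F"], 4))
```

A large F means the regression sum of squares is large relative to the error mean square, which is the comparison the F-value summarizes.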
The estimated standard deviations of $b_0$ and $b_1$, called standard errors, are denoted by
$SE(b_0) = \sqrt{s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)}$
$SE(b_1) = \sqrt{\frac{s^2}{s_{xx}}}$
The standard errors provide a measure of the reliability and precision of the LSEs.
$SE(b_0)$ and $SE(b_1)$ increase as $s^2$ increases and decrease as $s_{xx}$ increases. Thus,
o The observed values of the explanatory variable are more spread out.
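The effect of the spread of x on the standard errors can be illustrated with a short sketch. It assumes the standard simple linear regression formulas SE(b0) = sqrt(s² (1/n + x̄²/s_xx)) and SE(b1) = sqrt(s²/s_xx), with s_xx the sum of squared deviations of x from its mean. The function names and the numerical inputs are hypothetical and are not taken from the report's data.

```python
import math


def se_intercept(s2, n, x_bar, s_xx):
    """Standard error of b0 for simple linear regression."""
    return math.sqrt(s2 * (1.0 / n + x_bar ** 2 / s_xx))


def se_slope(s2, s_xx):
    """Standard error of b1 for simple linear regression."""
    return math.sqrt(s2 / s_xx)


# Hypothetical inputs: same s^2, n and mean of x, but two different spreads of x.
s2, n, x_bar = 0.8, 5, 3.0
for s_xx in (10.0, 40.0):
    print(
        f"s_xx = {s_xx:5.1f}  "
        f"SE(b0) = {se_intercept(s2, n, x_bar, s_xx):.4f}  "
        f"SE(b1) = {se_slope(s2, s_xx):.4f}"
    )
```

With s² held fixed, the larger value of s_xx gives smaller standard errors for both estimates, which is the sense in which more spread-out x values make the estimates more precise.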
thus the basic linear regression model no longer includes an explanatory variable. In
general, the formulated hypothesis tests on the regression coefficient in the form
$t(\hat{\beta}_j) = \frac{\hat{\beta}_j - d}{SE(\hat{\beta}_j)}$, where $\hat{\beta}_j$ is the LSE of $\beta_j$.
Under $H_0: \beta_j = d$, $t(\hat{\beta}_j) \sim t_{n-2}$, where $t_{n-2}$ is a t-distributed random variable with
n - 2 degrees of freedom.
$\hat{\beta}_j \pm t_{n-2,\,\alpha/2} \cdot SE(\hat{\beta}_j)$
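The t-statistic and the confidence interval can be computed as in the following sketch. It is an illustration under stated assumptions: the function names are hypothetical, the estimate and standard error come from a made-up five-point example rather than from the report, and the critical value is passed in by the caller (here 3.182, the tabulated two-sided 95% value for 3 degrees of freedom) because the Python standard library has no t-distribution quantile function.

```python
import math


def t_statistic(estimate, d, se):
    """t-ratio for testing H0: beta_j = d."""
    return (estimate - d) / se


def confidence_interval(estimate, se, t_crit):
    """Interval estimate +/- t_crit * SE, with t_crit taken from a t table."""
    half_width = t_crit * se
    return estimate - half_width, estimate + half_width


# Hypothetical values: slope 0.6 with SE sqrt(0.08), n = 5, so n - 2 = 3 df.
b1 = 0.6
se_b1 = math.sqrt(0.08)
t_crit_95 = 3.182  # tabulated t value for 3 df, alpha/2 = 0.025

t_value = t_statistic(b1, 0.0, se_b1)
low, high = confidence_interval(b1, se_b1, t_crit_95)
print(f"t = {t_value:.4f}")
print(f"95% CI = ({low:.4f}, {high:.4f})")
# H0: beta_1 = 0 is rejected at the 5% level only if |t| exceeds t_crit_95.
print("reject H0:", abs(t_value) > t_crit_95)
```

In this toy case the interval contains zero and |t| is below the critical value, so the two ways of reading the test agree.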
Table 2 presents the standard error, t-statistic, p-value, and the 95% confidence
interval of $\beta_0$ and $\beta_1$.
Table 2
The p-value represents the probability, when the null hypothesis is true, that the
test statistic would be equal to or more extreme than the value actually
observed.
o The calculated p-value for the intercept is 0.02157 which is smaller than α, so
o The calculated p-value for the Exercise h/week variable is 3.726E-151 which is