A staff restaurant conducted a survey, collecting data from a random sample of 32 clients.
They were asked, among other things: how many times they ate at the restaurant during the
last month (variable: FREQUENCY); how much they spent on a meal (variable:
SPENDING); and how old they were (variable: AGE).
The restaurant manager would like to construct a model that explains the spending
amount in terms of the frequency and the age for all clients.
You are asked to carry out the tests required to validate the linear
regression model for the population of clients from which the sample was drawn.
Variable #1 (SPENDING)
Mean 4.17188
Corrected Standard Deviation 1.53249
Variable #2 (FREQUENCY)
Mean 10.59375
Corrected Standard Deviation 6.76738
Variable #3 (AGE)
Mean 35.75
Corrected Standard Deviation 11.6453
Count n 32
R-square 0.58999
ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 29.8504
Total ?
72.8047
SOLUTIONS:
ANOVA TABLE
Source       SS        df                 MS           F            p-value
Regression   42.9543   2                  21.47715     20.8652933   < 0.01
Residual     29.8504   29 = 32 − (2+1)    1.02932414
Total        72.8047   31 = 32 − 1
SS Regression = SS Total − SS Residual = 72.8047 − 29.8504 = 42.9543
H0: β1 = β2 = 0
H1: at least one of the βj is not 0, j = 1, 2
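The arithmetic behind the ANOVA table can be checked with a few lines of Python. This is only a sketch: the figures are those given in the exercise, and the variable names are illustrative choices, not part of the original solution.

```python
# Figures from the exercise; variable names are illustrative.
n = 32                 # sample size
k = 2                  # explanatory variables: FREQUENCY and AGE
ss_total = 72.8047     # total sum of squares
ss_residual = 29.8504  # residual sum of squares

# Regression sum of squares is obtained by difference.
ss_regression = ss_total - ss_residual

# Degrees of freedom: k for the regression, n - (k + 1) for the residual.
df_regression = k
df_residual = n - (k + 1)

# Mean squares and the global F statistic.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# Coefficient of determination, to compare with the reported R-square.
r_squared = ss_regression / ss_total

print(f"SS regression = {ss_regression:.4f}")
print(f"MS regression = {ms_regression:.5f}, MS residual = {ms_residual:.5f}")
print(f"F = {f_statistic:.4f} with ({df_regression}, {df_residual}) df")
print(f"R-square = {r_squared:.5f}")
```

The computed R-square should agree with the value 0.58999 reported in the summary statistics, which is a quick consistency check on the table.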
Frequency:
95% CI for β1:
$$\hat{\beta}_1 \pm t_{29,\,\alpha/2} \times s_{\hat{\beta}_1} = \left[-0.17208 \pm 2.045 \times 0.02782\right]$$
We run a Student test on the coefficient β1 of the explanatory variable Frequency as follows:
H0: β1 = 0 in the presence of AGE
H1: β1 ≠ 0 in the presence of AGE
We calculate the test statistic
$$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{-0.17208}{0.02782} = -6.18547807$$
The critical value of a Student distribution with df = 29 and a type I error risk
α = 0.05 is t29;0.025 = 2.045 (two-tailed test).
|t| >> critical value, so we can reject H0.
Age:
95% CI for β2:
$$\hat{\beta}_2 \pm t_{29,\,\alpha/2} \times s_{\hat{\beta}_2} = \left[-0.00401 \pm 2.045 \times 0.01617\right]$$
We run a Student test on the coefficient β2 of the explanatory variable Age as follows:
H0: β2 = 0 in the presence of FREQUENCY
H1: β2 ≠ 0 in the presence of FREQUENCY
We calculate the test statistic
$$t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{-0.00401}{0.01617} = -0.24799011$$
The critical value of a Student distribution with df = 29 and a type I error risk
α = 0.05 is t29;0.025 = 2.045 (two-tailed test).
|t| < critical value, so we cannot reject H0.
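The two coefficient tests follow the same recipe, so they can be sketched with one small helper. The function name `coefficient_test` and the dictionary layout are hypothetical; the estimates, standard errors and the critical value 2.045 are those quoted in the solution.

```python
def coefficient_test(estimate, std_error, t_critical):
    """Return (t statistic, confidence interval, reject H0?) for H0: beta = 0."""
    t_stat = (estimate - 0) / std_error
    half_width = t_critical * std_error
    interval = (estimate - half_width, estimate + half_width)
    return t_stat, interval, abs(t_stat) > t_critical


T_CRITICAL = 2.045  # t29;0.025, two-tailed, as read from the table in the text

# (estimate, standard error) for each explanatory variable, from the solution.
coefficients = {
    "FREQUENCY": (-0.17208, 0.02782),
    "AGE": (-0.00401, 0.01617),
}

results = {
    name: coefficient_test(est, se, T_CRITICAL)
    for name, (est, se) in coefficients.items()
}

for name, (t_stat, (low, high), reject) in results.items():
    decision = "reject H0" if reject else "do not reject H0"
    print(f"{name}: t = {t_stat:.4f}, 95% CI = [{low:.5f}, {high:.5f}], {decision}")
```

The interval for FREQUENCY lies entirely below zero while the interval for AGE contains zero, which matches the two test decisions.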
Exercise 2: Choosing a model and using it
The HR director of an industrial group would like to construct a model explaining the
monthly salary of all employees.
Using data collected from a random sample of 36 employees, he tests two explanatory
variables that he deems relevant: the number of years of graduate studies (X1) and the
number of years of service (X2).
You can find below results from three regression models he tested using Excel.
2) Can you help the HR director choose the most suitable model?
a. Using the information provided, which model would you suggest using? Justify
your choice.
b. Estimate the parameters of the chosen model.
3) Pierre Durand, an employee of this group, is 38 years old, with 10 years of service
and 4 years of graduate studies. His monthly salary is 2050 Euros and he thinks he is
underpaid.
Calculate a 95% confidence interval for the mean salary of an employee with Pierre
Durand’s profile.
If you were the HR director of that firm, what would you tell Pierre Durand about his
salary?
Regression of Y w.r.t X1
ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 694 925
Total 22 170 000
Term          Coefficients   Standard error
Constant      706
Variable X1   326.9          10.08
Regression of Y w.r.t. X2
ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 22 156 025
Total ??
Term          Coefficients   Standard error
Constant      1 811.2
Variable X2   9.32           63.62
Regression of Y w.r.t. X1 and X2
Term          Coefficients   Standard error
Constant      742
Variable X1   327.27         10.15
Variable X2   -8.98          11.35
SOLUTIONS:
a. r²=SSRegression/SSTotal=(SSTotal-SSResidual)/SSTotal
Regression of Y w.r.t X1
ANALYSIS OF VARIANCE
Sum of Squares
Regression 21475075
Residual 694 925
Total 22 170 000
r² = (22 170 000 − 694 925)/22 170 000 = 21 475 075/22 170 000 = 0.96865471
Regression of Y w.r.t. X2
ANALYSIS OF VARIANCE
Sum of Squares
Regression 13975
Residual 22 156 025
Total 22 170 000
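r² = (22 170 000 − 22 156 025)/22 170 000 = 13 975/22 170 000 = 0.00063
Regression of Y w.r.t. X1 and X2 (residual sum of squares 681 981.4)
r² = (22 170 000 − 681 981.4)/22 170 000 = 21 488 018.6/22 170 000 = 0.96923855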
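The comparison of the three models by r² can be reproduced as follows. This is a sketch under one assumption: the residual sum of squares 681 981.4, which appears only in the last r² computation, is taken to belong to the model with both X1 and X2. The dictionary keys are illustrative labels.

```python
# Total sum of squares is the same for all three models (same Y, same sample).
ss_total = 22_170_000

# Residual sums of squares as given in the text.
# Assumption: 681 981.4 is the residual SS of the two-variable model.
ss_residual = {
    "X1": 694_925,
    "X2": 22_156_025,
    "X1 and X2": 681_981.4,
}

# r² = (SS total - SS residual) / SS total for each model.
r_squared = {
    model: (ss_total - ssr) / ss_total
    for model, ssr in ss_residual.items()
}

for model, value in r_squared.items():
    print(f"Y on {model}: r² = {value:.8f}")
```

Adding X2 to the model that already contains X1 raises r² only in the fourth decimal place, while X2 on its own explains almost none of the variation in salary.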
b.
Regression of Y w.r.t X1
Term          Coefficients   Standard error
Constant      706
Variable X1   326.9          10.08
We run a Student test on the coefficient β1 of the explanatory variable X1 as follows:
H0: β1 = 0
H1: β1 ≠ 0
We calculate the test statistic
$$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{326.9}{10.08} = 32.4305556$$
The critical value of a Student distribution with df = n − 2 = 36 − 2 = 34 and a type I
error risk α = 0.05, t34;0.025, is not included in the tables. We know t30;0.025 = 2.042 and
t40;0.025 = 2.021 (two-tailed test), so the critical value lies between these two values.
|t| >> critical value, so we can reject H0.
Regression of Y w.r.t X2
Term          Coefficients   Standard error
Constant      1 811.2
Variable X2   9.32           63.62
We run a Student test on the coefficient β2 of the explanatory variable X2 as follows:
H0: β2 = 0
H1: β2 ≠ 0
We calculate the test statistic
$$t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{9.32}{63.62} = 0.14649481$$
The critical value of a Student distribution with df = n − 2 = 36 − 2 = 34 and a type I
error risk α = 0.05, t34;0.025, is not included in the tables. We know t30;0.025 = 2.042 and
t40;0.025 = 2.021 (two-tailed test), so the critical value lies between these two values.
|t| << critical value, so we do not reject H0.
Regression of Y w.r.t. X1 and X2
X1:
We run a Student test on the coefficient β1 of the explanatory variable X1 as follows:
H0: β1 = 0 in the presence of X2
H1: β1 ≠ 0 in the presence of X2
We calculate the test statistic
$$t = \frac{\hat{\beta}_1 - 0}{s_{\hat{\beta}_1}} = \frac{327.27}{10.15} = 32.2433498$$
The critical value of a Student distribution with df = n − (k+1) = n − 3 = 36 − 3 = 33 and a
type I error risk α = 0.05, t33;0.025, is not included in the tables. We know t30;0.025 =
2.042 and t40;0.025 = 2.021 (two-tailed test), so the critical value lies between these two values.
|t| >> critical value, so we can reject H0.
X2:
We run a Student test on the coefficient β2 of the explanatory variable X2 as follows:
H0: β2 = 0 in the presence of X1
H1: β2 ≠ 0 in the presence of X1
We calculate the test statistic
$$t = \frac{\hat{\beta}_2 - 0}{s_{\hat{\beta}_2}} = \frac{-8.98}{11.35} = -0.79118943$$
The critical value of a Student distribution with df = n − (k+1) = n − 3 = 36 − 3 = 33 and a
type I error risk α = 0.05, t33;0.025, is not included in the tables. We know t30;0.025 =
2.042 and t40;0.025 = 2.021 (two-tailed test).
|t| < critical value, so we cannot reject H0.
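Since the exact critical values for 33 and 34 degrees of freedom are not tabulated, one cautious option is to compare each statistic with the larger of the two neighbouring table values. The sketch below does this for all four tests; using 2.042 as a conservative stand-in is an assumption made here, and the model labels are illustrative.

```python
# Tabulated two-tailed critical values quoted in the text.
T30 = 2.042  # t30;0.025
T40 = 2.021  # t40;0.025

# The true critical value for 33 or 34 df lies between the two; taking the
# larger one makes rejection slightly harder (a conservative choice).
conservative_critical = max(T30, T40)

# (model, variable, estimate, standard error) from the three regression outputs.
tests = [
    ("Y on X1", "X1", 326.9, 10.08),
    ("Y on X2", "X2", 9.32, 63.62),
    ("Y on X1 and X2", "X1", 327.27, 10.15),
    ("Y on X1 and X2", "X2", -8.98, 11.35),
]

results = []
for model, variable, estimate, std_error in tests:
    t_stat = estimate / std_error
    reject = abs(t_stat) > conservative_critical
    results.append((model, variable, t_stat, reject))
    decision = "reject H0" if reject else "do not reject H0"
    print(f"{model}, {variable}: t = {t_stat:.4f}, {decision}")
```

Every statistic here is either far above 2.042 or far below 2.021, so the decisions do not depend on which bound is used.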
2.b parameters
Term          Coefficients   Standard error
Constant      706
Variable X1   326.9          10.08
$$\hat{\beta}_0 = 706 \; ; \quad \hat{\beta}_1 = 326.9$$
ANALYSIS OF VARIANCE
Sum of Squares
Regression ?
Residual 694 925
Total 22 170 000
3) Pierre Durand, an employee of this group, is 38 years old, with X2=10 years of
service and X1=4 years of graduate studies. His monthly salary is Y=2050 Euros and he
thinks he is underpaid.
Calculate a 95% confidence interval for the mean salary of an employee with Pierre
Durand’s profile.
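95% CI for E(y) when x = 4:
$$\hat{y} \pm t_{n-2,\,\alpha/2} \times s_{\varepsilon} \sqrt{\frac{1}{n} + \frac{(x_p - \bar{x})^2}{\sum (x - \bar{x})^2}} = 2013.6 \pm 2.042 \times 142.964928 \times \sqrt{\frac{1}{36} + \frac{(4 - 3.5)^2}{35 \times 2.40^2}}$$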
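The interval for the mean salary can be evaluated numerically as sketched below. Two points are assumptions of this sketch rather than statements from the source: the last denominator is read as 35 × 2.40² (that is, (n − 1) times the variance of X1, with a standard deviation of 2.40), and the tabulated value t30;0.025 = 2.042 is used in place of the untabulated value for 34 degrees of freedom. Variable names are illustrative.

```python
import math

n = 36                 # sample size
b0, b1 = 706, 326.9    # estimated intercept and slope of Y on X1
ss_residual = 694_925  # residual sum of squares of the chosen model
x_p = 4                # Pierre Durand's years of graduate studies
x_bar = 3.5            # sample mean of X1, as used in the formula
s_x = 2.40             # assumed reading: sample standard deviation of X1
t_crit = 2.042         # t30;0.025, stand-in for 34 degrees of freedom

# Point estimate of the mean salary for x = 4.
y_hat = b0 + b1 * x_p

# Residual standard error: sqrt(SS residual / (n - 2)).
s_eps = math.sqrt(ss_residual / (n - 2))

# Sum of squared deviations of X1 around its mean.
sxx = (n - 1) * s_x ** 2

# Half-width of the 95% confidence interval for the mean response.
half_width = t_crit * s_eps * math.sqrt(1 / n + (x_p - x_bar) ** 2 / sxx)
lower, upper = y_hat - half_width, y_hat + half_width

print(f"y_hat = {y_hat:.1f}, s_eps = {s_eps:.6f}")
print(f"95% CI for the mean salary: [{lower:.2f}, {upper:.2f}]")
```

Under these assumptions the half-width is close to 50 euros, so the interval runs from roughly 1964 to 2063 euros and contains the observed salary of 2050 euros.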