
SECOND PARTIAL EXAM OF STATISTICS 30001 (or 6045/5047)

January 10th, 2018 – A


Course code Degree program Class____________
Last name First name ID(Matr.)_____________
Only work appearing inside the spaces provided below will be graded.
An outline of the procedure used to solve each problem and of the calculations performed is required.
At the end of the exam, all sheets (including all scrap paper, WHICH WILL HOWEVER NOT BE GRADED) must
be turned in.

PROBLEM 1 (4 points)
To properly plan the energy policy of a region, a sample survey is carried out to learn about residents'
consumption habits. The following table summarizes the recorded time (in minutes) of daily use of the air
conditioner during summer:
time (minutes)

Mean 88.6
Standard Error 9.6632
Median 90
Standard Deviation 48.3158
Sample Variance 2334.4167
Kurtosis -0.6880
Skewness 0.2752
Range 170
Minimum 15
Maximum 185
Sum 2215
Count 25

a) Calculate a 90% confidence interval for the variance σ² of the time of daily use of the air conditioner for the
entire regional population, stating any assumptions necessary for the validity of the result.
b) In a neighboring region, we want to determine a 90% confidence interval for the average time µ of daily use of
the air conditioner, with a margin of error no larger than 4.5 minutes. Supposing that the population variance is
known and equal to 1322 minutes², what is the minimum number of residents we need to
interview?
a)
Under the assumption that the time (in minutes) of daily use of the air conditioner during summer is normally
distributed with mean $\mu$ and unknown variance $\sigma^2$, the confidence interval at the 90% level for the variance
$\sigma^2$ of the time (in minutes) of daily use of the air conditioner for the population is:

\[
CI_{90\%}(\sigma^2) = \left( \frac{(n-1)s^2}{\chi^2_{n-1,\alpha/2}} ;\ \frac{(n-1)s^2}{\chi^2_{n-1,1-\alpha/2}} \right) = \left( \frac{(25-1)\,2334.417}{36.415} ;\ \frac{(25-1)\,2334.417}{13.848} \right) = (1538.542;\ 4045.783)
\]
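The arithmetic of this interval can be checked with a minimal Python sketch. The two chi-square critical values are simply copied from the solution (they are table values, not computed here), and the variable names are illustrative.

```python
# Sample summary taken from the table in Problem 1.
n = 25
s2 = 2334.4167  # sample variance

# Chi-square critical values with n - 1 = 24 degrees of freedom,
# copied from the solution (upper-tail 5% and 95% points).
chi2_upper_05 = 36.415
chi2_upper_95 = 13.848

# The larger critical value gives the lower bound, and vice versa.
lower = (n - 1) * s2 / chi2_upper_05
upper = (n - 1) * s2 / chi2_upper_95

print(f"90% CI for the variance: ({lower:.3f}; {upper:.3f})")
```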

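b)
The margin of error must satisfy:

\[
ME = z_{0.05} \sqrt{\frac{\sigma^2}{n}} = 1.645 \sqrt{\frac{1322}{n}} \le 4.5
\]

Solving for $n$ gives $n \ge \frac{1.645^2 \cdot 1322}{4.5^2} = 176.66$, which rounds up to 177 residents.
Therefore, the minimum number of residents we need to interview is 177.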

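As a rough check of the sample-size calculation in Problem 1 b), the following sketch solves the margin-of-error inequality for n. It assumes the standard normal quantile from Python's `statistics.NormalDist` rather than the rounded table value 1.645, so the intermediate figure differs slightly from 176.66 while the rounded-up answer is the same.

```python
import math
from statistics import NormalDist

sigma2 = 1322.0      # known population variance (minutes squared)
max_margin = 4.5     # largest acceptable margin of error (minutes)
confidence = 0.90

# Two-sided interval: use the quantile that leaves alpha/2 in the upper tail.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# ME = z * sqrt(sigma2 / n) <= max_margin  =>  n >= z^2 * sigma2 / max_margin^2
n_exact = z**2 * sigma2 / max_margin**2
n_min = math.ceil(n_exact)

print(f"z = {z:.4f}, n >= {n_exact:.2f}, minimum n = {n_min}")
```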
PROBLEM 2 (6 points)
The operator of a computer network must carry out a maintenance operation that will result in a shutdown of the network.
Usually this type of intervention is carried out at night on a working day, since it is considered to be an
inconvenience for no more than 30% of users. The operator suspects that the habits of users have recently changed and
is therefore considering carrying out the intervention on a public holiday, at a significantly higher
cost. In order to decide whether to modify the procedure, the operator runs a sample
survey of 600 users, 200 of whom report that they would suffer an inconvenience if the intervention were carried out
during the night of a weekday. Based on this, have the habits of users changed?
a) Write down and briefly explain the hypotheses we need to test to establish whether the habits of users have changed.
b) Calculate the p-value of the test.
c) At α = 0.05, have habits changed, and should the operator therefore carry out maintenance on a holiday?

Remember: if you attempt to solve this subproblem, any points you may have obtained in
the online assignment will not be considered.
d) A consultant of the operator suggests the following rule of thumb: "Ask 80 users; if more than 35% of them
declare that they would suffer an inconvenience if the intervention were carried out on the night of a weekday,
then you will have to run it on a holiday". Would you recommend that the operator follow this rule? Justify your
answer with appropriate calculations.

a)
$p$ is the unknown proportion of users who consider the maintenance operation during the night of a weekday an
inconvenience.
$H_0: p \le 0.3$ (status quo)
$H_1: p > 0.3$ (what we want to verify: the habits of users have changed and the proportion is higher)

b)
\[
\hat{p} = \frac{200}{600} = 0.3333
\]

The observed value of the test statistic is:

\[
z_{obs} = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 (1 - p_0)}{n}}} = \frac{0.3333 - 0.3}{\sqrt{\frac{0.3 (0.7)}{600}}} = 1.78
\]

\[
p\text{-value} = P(Z > 1.78) = 0.038
\]
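The test statistic and p-value can be reproduced with a short sketch. It assumes the normal approximation used in the solution and evaluates the standard normal CDF with `statistics.NormalDist`, so the p-value is computed at the unrounded z rather than at 1.78.

```python
import math
from statistics import NormalDist

n = 600     # users surveyed
x = 200     # users who would be inconvenienced
p0 = 0.30   # proportion under the null hypothesis

p_hat = x / n

# The standard error uses p0 because the statistic is computed under H0.
se0 = math.sqrt(p0 * (1 - p0) / n)
z_obs = (p_hat - p0) / se0

# One-sided (upper-tail) p-value, since H1 is p > 0.3.
p_value = 1 - NormalDist().cdf(z_obs)

print(f"p_hat = {p_hat:.4f}, z_obs = {z_obs:.2f}, p-value = {p_value:.3f}")
```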

c)
α = 0.05
Since the p-value is smaller than α, there is enough empirical evidence to reject the null hypothesis: the habits of
users have changed, and the operator should carry out maintenance on a holiday.

d)
Under this rule, we reject the null hypothesis if $\hat{p} > 0.35$, with $n = 80$.
In that case:

\[
\alpha = P(\hat{P} > 0.35 \mid p = 0.3) = P\left( Z > \frac{0.05}{\sqrt{\frac{0.3 \cdot 0.7}{80}}} \right) = P(Z > 0.9759) \approx 0.163
\]

We do not recommend that the operator follow this rule, because the probability of a Type I error ($\alpha$),
that is, of rejecting the null hypothesis when it is actually true, is too high.
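The significance level implied by the consultant's rule can be sketched as follows, again under the normal approximation. Evaluating the normal CDF directly gives a value of about 0.165, slightly above the 0.163 in the solution, which is consistent with a table lookup at z rounded to 0.98; either way the level is far above 0.05.

```python
import math
from statistics import NormalDist

n = 80            # users asked under the rule of thumb
p0 = 0.30         # proportion under the null hypothesis
threshold = 0.35  # reject H0 when the sample proportion exceeds this

# Standardize the rejection threshold under H0.
se0 = math.sqrt(p0 * (1 - p0) / n)
z_crit = (threshold - p0) / se0

# Probability of rejecting H0 when p = 0.3, i.e. the Type I error rate.
alpha = 1 - NormalDist().cdf(z_crit)

print(f"z threshold = {z_crit:.4f}, implied alpha = {alpha:.3f}")
```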

PROBLEM 3 (5 points)
By how much is the average electricity consumption of a typical family reduced by replacing traditional light
bulbs with LED ones? To answer this question, the monthly electricity consumption (kWh) of a sample of 5 families is
collected before and after they replaced their traditional light bulbs with LED ones:

Family   Consumption with traditional light bulbs   Consumption with LED light bulbs
1        260                                        200
2        270                                        250
3        230                                        240
4        280                                        230
5        250                                        210
Total    1290                                       1130

a) After specifying which assumptions are necessary, determine a 95% confidence interval for the average
reduction in electricity consumption for a family that replaces its light bulbs.
b) Based on the sample information, is there evidence that the average monthly consumption μ of households using
LED bulbs is less than 240 kWh? (α = 0.01).
a)
Consumption with traditional light bulbs (X)
Consumption with LED light bulbs (Y)
D = X − Y, assuming D normally distributed with mean $\mu_D$ and unknown variance; X and Y are dependent (or paired)
samples.

\[
CI_{95\%}(\mu_D) = \left( \bar{d} \mp t_{n-1,\alpha/2} \frac{s_D}{\sqrt{n}} \right) = \left( 32 \mp 2.776 \frac{27.7489}{\sqrt{5}} \right) = (-2.4493;\ 66.4493)
\]
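The paired-sample interval can be recomputed from the raw data with a minimal sketch. The critical value 2.776 is copied from the solution (a table value for 4 degrees of freedom), and the list names are illustrative.

```python
import math
from statistics import mean, stdev

# Monthly consumption (kWh) of the 5 families, from the table in Problem 3.
traditional = [260, 270, 230, 280, 250]
led = [200, 250, 240, 230, 210]

# Paired differences D = X - Y for each family.
d = [x - y for x, y in zip(traditional, led)]
n = len(d)

d_bar = mean(d)
s_d = stdev(d)   # sample standard deviation (divides by n - 1)

t_crit = 2.776   # t with 4 df and 2.5% in the upper tail, copied from the solution
margin = t_crit * s_d / math.sqrt(n)
lower, upper = d_bar - margin, d_bar + margin

print(f"d_bar = {d_bar}, s_D = {s_d:.4f}, 95% CI = ({lower:.4f}; {upper:.4f})")
```

The interval contains zero, so at this confidence level the data do not rule out a zero average reduction.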

b)
We assume that the monthly consumption of households using LED bulbs (Y) is normally distributed.

$H_0: \mu \ge 240$
$H_1: \mu < 240$

We reject $H_0$ if

\[
t_{obs} = \frac{\bar{y} - 240}{s_y / \sqrt{5}} < -t_{n-1,\alpha} = -t_{4,0.01} = -3.747
\]

Based on the sample values, we have $\bar{y} = 226$ and $s_y = 20.7364$, so

\[
t_{obs} = \frac{\bar{y} - 240}{s_y / \sqrt{5}} = -1.5097
\]

Since $-1.5097 > -3.747$, we do not reject $H_0$: there is not enough empirical evidence (at the α = 0.01 level) that
the average monthly consumption μ of households using LED bulbs is less than 240 kWh.
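The one-sample t statistic for the LED consumption data can be checked with the sketch below. The critical value 3.747 is copied from the solution, and the decision rule is the lower-tail rule stated there.

```python
import math
from statistics import mean, stdev

# Monthly consumption (kWh) with LED bulbs, from the table in Problem 3.
led = [200, 250, 240, 230, 210]
mu0 = 240        # value of the mean under H0
n = len(led)

y_bar = mean(led)
s_y = stdev(led)  # sample standard deviation (divides by n - 1)

t_obs = (y_bar - mu0) / (s_y / math.sqrt(n))

t_crit = 3.747    # t with 4 df and 1% in the upper tail, copied from the solution
reject_h0 = t_obs < -t_crit

print(f"y_bar = {y_bar}, s_y = {s_y:.4f}, t_obs = {t_obs:.4f}, reject H0: {reject_h0}")
```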

PROBLEM 4 (4 points)
Based on a questionnaire given to a sample of customers, an electricity supplier classified them into three energy
saving attention categories: low / medium / high. To understand whether this classification can be considered independent of
the age of the consumer, the following contingency table has been constructed:

                        Saving attention
Age                     low     medium   high
underage                28      41       35
adult up to 50 years    61      39       88
adult over 50 years     24      36       48
a) Specify the hypotheses to be tested in this problem
b) What are the conclusions of the survey if we set α = 0.05?

a)
$H_0$: no association between Saving attention and Age (Saving attention and Age are independent)
$H_1$: there is an association between Saving attention and Age (Saving attention and Age are NOT independent)

b)
The test statistic is

\[
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \quad \text{where } E_{ij} = \frac{R_i C_j}{n}.
\]

Considering H0 true, the statistic has a chi-squared distribution with (r – 1)(c – 1) degrees of freedom, where r and c
are the number of rows and the number of columns in the analyzed cross tab:

\[
\chi^2_{(r-1)(c-1)} = \chi^2_{(3-1)(3-1)} = \chi^2_4.
\]

Observed frequencies (Oij)

                        Saving attention
Age                     low       medium    high      TOTAL
underage                28        41        35        104
adult up to 50 years    61        39        88        188
adult over 50 years     24        36        48        108
TOTAL                   113       116       171       400

Expected frequencies (Eij)

                        Saving attention
Age                     low       medium    high      TOTAL
underage                29.38     30.16     44.46     104
adult up to 50 years    53.11     54.52     80.37     188
adult over 50 years     30.51     31.32     46.17     108
TOTAL                   113       116       171       400

Contributions to the chi-squared statistic, (Oij − Eij)² / Eij

                        Saving attention
Age                     low       medium    high
underage                0.0648    3.8961    2.0129
adult up to 50 years    1.1721    4.4180    0.7244
adult over 50 years     1.3891    0.6993    0.0725
TOTAL Chi squared       14.4492
\[
\sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 14.4492, \qquad P(\chi^2_4 > 14.4492) < 0.05
\]

The observed value exceeds the critical value $\chi^2_{4,0.05} = 9.488$.
There is enough empirical evidence (at the α = 0.05 level) to reject H0: the three energy saving attention
categories are not independent of the age of the consumer.
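The expected frequencies and the chi-squared statistic can be recomputed from the observed table with a small sketch. The 5% critical value 9.488 for 4 degrees of freedom is a standard table value entered by hand here, and the variable names are illustrative.

```python
# Observed counts: rows are age groups, columns are low / medium / high.
observed = [
    [28, 41, 35],   # underage
    [61, 39, 88],   # adult up to 50 years
    [24, 36, 48],   # adult over 50 years
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: E_ij = R_i * C_j / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic summed over all cells.
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
chi2_crit = 9.488   # table value: chi-squared, 4 df, 5% upper tail
reject_h0 = chi2 > chi2_crit

print(f"chi2 = {chi2:.4f}, df = {df}, reject H0: {reject_h0}")
```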

PROBLEM 5 (6 points)
In a sample of 11 comparable houses in which the windows have been replaced over the last 10 years, the following
variables are recorded:
Y = heating expenditure in a winter month
X = months passed since the replacement of windows and doors
The fitted linear model is $\hat{y} = 258.2918 + 1.3162x$. Moreover, we have:

\[
\bar{x} = 56, \quad \bar{y} = 332, \quad \sum_{i=1}^{11} (x_i - \bar{x})^2 = 13750, \quad \sum_{i=1}^{11} (y_i - \bar{y})^2 = 27630
\]

a) Provide an estimate for the variance of the error component of the model.
(note: if you did not answer point a), assume that the result is equal to 452.1826 to solve the subsequent points)
b) Is there any empirical evidence of a positive linear relationship that links Y to X? (α = 0.01).
c) Determine the 99% confidence interval for the heating expenditure of a house whose windows have been
replaced 18 months ago.
a)
An estimate for the variance of the error component of the model is $\hat{\sigma}^2 = \frac{SSE}{n-2}$, where SSE = SST − SSR.

\[
SST = \sum_{i=1}^{11} (y_i - \bar{y})^2 = 27630
\]

\[
SSR = b_1^2 \sum_{i=1}^{11} (x_i - \bar{x})^2 = 1.3162^2 \cdot 13750 = 23820.2586
\]

\[
\hat{\sigma}^2 = s_e^2 = \frac{SSE}{n-2} = \frac{27630 - 23820.2586}{11 - 2} = \frac{3809.7415}{9} = 423.3046
\]
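The decomposition SSE = SST − SSR can be checked numerically with the sketch below, which uses only the summary quantities given in the problem; the variable names are illustrative.

```python
n = 11           # number of houses
b1 = 1.3162      # estimated slope of the fitted model
sxx = 13750.0    # sum of squared deviations of x from its mean
sst = 27630.0    # total sum of squares of y

# Regression sum of squares in simple linear regression: SSR = b1^2 * Sxx.
ssr = b1**2 * sxx

# Error sum of squares and the estimate of the error variance.
sse = sst - ssr
s2_e = sse / (n - 2)

print(f"SSR = {ssr:.4f}, SSE = {sse:.4f}, s_e^2 = {s2_e:.4f}")
```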

b)
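$H_0: \beta_1 = 0$
$H_1: \beta_1 > 0$

Reject H0 if $\frac{b_1}{s_{B_1}} \ge t_{n-2,\alpha}$.

We have

\[
s_{B_1}^2 = \frac{s_e^2}{\sum_{i=1}^{11} (x_i - \bar{x})^2} = \frac{423.3046}{13750} = 0.0308 \quad \text{and} \quad t_{n-2,\alpha} = t_{9,0.01} = 2.821.
\]

Since

\[
\frac{b_1}{s_{B_1}} = \frac{1.3162}{\sqrt{0.0308}} = 7.5015 \ge 2.821,
\]

we have enough empirical evidence to reject H0.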
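The slope test can be reproduced with a few lines of Python. The error variance estimate and the critical value 2.821 are copied from the solution, and the standard error is kept unrounded, so the statistic agrees with 7.5015 only up to rounding.

```python
import math

b1 = 1.3162       # estimated slope
s2_e = 423.3046   # estimated error variance from point a)
sxx = 13750.0     # sum of squared deviations of x from its mean

# Standard error of the slope estimator.
s_b1 = math.sqrt(s2_e / sxx)

t_obs = b1 / s_b1
t_crit = 2.821    # t with 9 df and 1% in the upper tail, copied from the solution
reject_h0 = t_obs >= t_crit

print(f"s_b1 = {s_b1:.4f}, t_obs = {t_obs:.4f}, reject H0: {reject_h0}")
```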

c)
Point estimate prediction: $\hat{y}_{n+1} = 258.2918 + 1.3162 \cdot 18 = 281.9834$
$t_{n-2,\alpha/2} = t_{9,0.005} = 3.25$ and $s_e = \sqrt{423.3046} = 20.5744$
Confidence interval for the prediction (for the forecast of a single outcome value):

\[
C.I._{0.99} = \left( \hat{y}_{n+1} \mp t_{n-2,\alpha/2} \cdot s_e \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{11} (x_i - \bar{x})^2}} \right) = \left( 281.9834 \mp 3.25 \cdot 20.5744 \sqrt{1 + \frac{1}{11} + \frac{(18 - 56)^2}{13750}} \right) = (208.859;\ 355.1078)
\]
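The prediction interval for a single new house can be checked with the following sketch. It uses the fitted coefficients and summary quantities from the problem, with the critical value 3.25 copied from the solution; the variable names are illustrative.

```python
import math

b0, b1 = 258.2918, 1.3162   # fitted intercept and slope
n = 11                      # number of houses
x_bar = 56.0                # sample mean of x
sxx = 13750.0               # sum of squared deviations of x from its mean
s_e = math.sqrt(423.3046)   # estimated standard deviation of the error
x_new = 18.0                # months since replacement for the new house
t_crit = 3.25               # t with 9 df and 0.5% in the upper tail, from the solution

y_hat = b0 + b1 * x_new

# The leading 1 under the square root accounts for the variability of a
# single new observation, on top of the uncertainty in the fitted line.
half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
lower, upper = y_hat - half_width, y_hat + half_width

print(f"y_hat = {y_hat:.4f}, 99% interval = ({lower:.3f}; {upper:.3f})")
```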
PROBLEM 6 (3 points)
Provide the definition of the power of a test. Also indicate two possible ways to increase the power of a test, justifying
the answer.

Please refer to the textbook and the lecture notes.

PROBLEM 7 (3 points)
Provide the definition of an interval estimator. Report also the formula of the confidence interval of the average of a
Normal population (known variance) and highlight the differences with the acceptance interval.

Please refer to the textbook and the lecture notes.
