
STUDENT’S SIGNATURE

SECOND PARTIAL EXAM OF STATISTICS 30001 (or 6045/5047)


January 30th, 2018 – C
Course code Degree program Class____________
Last name First name ID(Matr.)_____________
Only work appearing inside the spaces provided below will be graded.
An outline of the procedure used to solve each problem and of the calculations performed is required.
At the end of the exam, all sheets (including all scrap paper, WHICH WILL HOWEVER NOT BE GRADED) must
be turned in.

I acknowledge that by solving Problem 2c I renounce any points I may have obtained in the online assignment

Signature
(USE THIS SPACE AS ADDITIONAL SCRAP PAPER OR FOR YOUR ANSWERS)
PROBLEM 1 (6 points)
The Manager of a Shopping Center is interested in estimating the average monthly expenditure µ carried out by the
customers of the center. On a sample of 6 customers, the following expenses (Euro) have been obtained:
34.9 43.3 50.3 26.4 64.7 38.4

Assume that the variable "monthly expenditure" is Normally distributed.


a) Determine a 90% confidence interval for the mean µ.
b) Compute the p-value of a hypothesis test aimed to determine if there is empirical evidence that the average
expenditure µ is greater than 30 Euro.
c) Now suppose that the Manager of the Shopping Center also wants information on the proportion p of customers
who, in a visit to the Shopping Center, spend more than 100 Euros. If the Manager wants to construct a 90%
confidence interval for p with length not exceeding 0.15, how many customers should he/she interview?
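a)
\[ \bar{x} = \frac{34.9 + 43.3 + 50.3 + 26.4 + 64.7 + 38.4}{6} = 43 \]
\[ s = \sqrt{\frac{6}{5} \cdot \left( \frac{34.9^2 + 43.3^2 + 50.3^2 + 26.4^2 + 64.7^2 + 38.4^2}{6} - 43^2 \right)} = \sqrt{\frac{6}{5} \cdot (1996.7667 - 1849)} = \sqrt{\frac{6}{5} \cdot 147.7667} = \sqrt{177.32} = 13.3162 \]
\[ 1 - \alpha = 0.90 \Rightarrow t_{n-1,\,\alpha/2} = t_{5,\,0.05} = 2.015 \]
\[ \left( \bar{x} - t_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}};\ \bar{x} + t_{n-1,\,\alpha/2} \cdot \frac{s}{\sqrt{n}} \right) \Rightarrow \left( 43 - 2.015 \cdot \frac{13.3162}{\sqrt{6}};\ 43 + 2.015 \cdot \frac{13.3162}{\sqrt{6}} \right) \Rightarrow (32.0458;\ 53.9542) \]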
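The computation in part a) can be checked with a short sketch. It uses only the Python standard library and assumes the critical value 2.015 read from the t table, since the standard library has no t quantile function; the variable names are illustrative.

```python
import math
import statistics

# Monthly expenditures (Euro) from the problem statement.
expenses = [34.9, 43.3, 50.3, 26.4, 64.7, 38.4]
n = len(expenses)

x_bar = statistics.mean(expenses)   # sample mean
s = statistics.stdev(expenses)      # sample standard deviation (n - 1 denominator)

# Assumption: critical value t_{5, 0.05} copied from the t table.
t_crit = 2.015

half_width = t_crit * s / math.sqrt(n)
ci_90 = (x_bar - half_width, x_bar + half_width)
print(round(x_bar, 4), round(s, 4), tuple(round(v, 4) for v in ci_90))
```

The shortcut formula used in the solution and `statistics.stdev` give the same sample standard deviation, because both divide the sum of squared deviations by n - 1.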

b)
The two hypotheses are:
$H_0: \mu \le 30$ (status quo)
$H_1: \mu > 30$ (to be verified)
The test statistic is
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{43 - 30}{13.3162 / \sqrt{6}} = 2.3913. \]
Hence, the p-value is
\[ \text{p-value} = P(t_5 > 2.3913 \mid H_0). \]
Since the t distribution table is not sufficiently detailed, we cannot calculate the exact p-value. However, since
$t_{5,\,0.05} = 2.015 < 2.3913 < t_{5,\,0.025} = 2.571$, we can conclude that
0.025 < p-value < 0.05.
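The table lookup in part b) can be sketched as follows. The critical values for 5 degrees of freedom are copied from a standard t table (an assumption of this sketch, not a computed quantity), and the dictionary name is illustrative.

```python
import math

# Summary statistics from part a) of the solution.
x_bar, s, n = 43.0, 13.3162, 6
mu_0 = 30.0  # value of the mean on the boundary of the null hypothesis

t_obs = (x_bar - mu_0) / (s / math.sqrt(n))

# Upper-tail critical values of the t distribution with 5 degrees of freedom,
# copied from a standard t table: {tail probability: critical value}.
t_table_5df = {0.10: 1.476, 0.05: 2.015, 0.025: 2.571, 0.01: 3.365}

# Upper bound: smallest tail probability whose critical value t_obs exceeds.
upper = min(a for a, c in t_table_5df.items() if t_obs > c)
# Lower bound: largest tail probability whose critical value t_obs does not exceed.
lower = max(a for a, c in t_table_5df.items() if t_obs <= c)
print(round(t_obs, 4), lower, upper)
```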

c)
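In this case, the 90% confidence interval (approximately) is given by
\[ \left( \hat{p} - z_{\alpha/2} \cdot \sqrt{\frac{\hat{p} \cdot (1 - \hat{p})}{n}};\ \hat{p} + z_{\alpha/2} \cdot \sqrt{\frac{\hat{p} \cdot (1 - \hat{p})}{n}} \right), \]

and the length is

\[ L = 2 \cdot z_{\alpha/2} \cdot \sqrt{\frac{\hat{p} \cdot (1 - \hat{p})}{n}}, \]
with $z_{\alpha/2} = z_{0.05} = 1.645$.
In the worst scenario, $\hat{p}$ is equal to 0.5. Thus,
\[ 0.15 = 2 \cdot 1.645 \cdot \sqrt{\frac{0.5 \cdot 0.5}{n}} \Rightarrow n = \frac{2^2 \cdot 1.645^2 \cdot 0.5^2}{0.15^2} = 120.2678. \]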

Rounding up, the sample size needed to obtain a 90% confidence interval for p with a length that does not exceed
0.15 is 121. Therefore, the Manager should interview at least n = 121 customers.
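As a rough check of part c), the sample size can be computed directly. This sketch takes the normal quantile from `statistics.NormalDist` instead of the rounded table value 1.645, so the unrounded result differs slightly from 120.2678 while the required n is unchanged.

```python
import math
from statistics import NormalDist

confidence = 0.90
max_length = 0.15
p_worst = 0.5  # value of p that maximizes p * (1 - p)

# z_{alpha/2}; the solution uses the table value 1.645.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Solve max_length = 2 * z * sqrt(p * (1 - p) / n) for n, then round up.
n_exact = (2 * z) ** 2 * p_worst * (1 - p_worst) / max_length ** 2
n_required = math.ceil(n_exact)
print(round(n_exact, 4), n_required)
```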

PROBLEM 2 (5 points)
The laundry of a shopping center offers convenient prices, but there are doubts about the quality of the service provided.
A customer is willing to use the laundry only if he has evidence that the percentage of items returned with an unsuitable
treatment is less than 5%. On a sample of 150 washed items, 3 items were observed that were not properly treated.
a) Write down the hypotheses we need to test and briefly justify your choice.
b) Choose between the previous hypotheses, with an α = 0.01 significance level. What decision must the customer
take?
Remember: if you attempt to solve this subproblem, any points you may have obtained in the online assignment
will not be considered.
c) Do you think that the following decision rule: “Use the laundry in the shopping center if, on a sample of 200
items washed, less than 4% is returned with an unsuitable treatment” is reasonable for a customer? Explain
using suitable computations.

a) The hypotheses to be tested are:

$H_0: p \ge 0.05$ (status quo)
$H_1: p < 0.05$ (we want to verify whether the percentage of items returned with an unsuitable treatment is less
than 5%)

This choice is justified by the fact that the statement we want to verify, namely that the percentage of items
returned with an unsuitable treatment is less than 5%, is placed in the alternative hypothesis $H_1$.
In this context, the type I error consists in concluding that the laundry returns less than 5% of items with an
unsuitable treatment (rejecting $H_0$) when the opposite is actually true. This specific error is therefore "more
serious" from an operational point of view than the one we would control if the two hypotheses were exchanged.

b)
The observed value of the test statistic is
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 \cdot (1 - p_0)}{n}}} \]

and we will reject the null hypothesis if the value of the test statistic is lower than the critical value
$-z_{\alpha} = -z_{0.01} = -2.33$.
Since $\hat{p} = \frac{3}{150} = 0.02$, we obtain
\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0 \cdot (1 - p_0)}{n}}} = \frac{0.02 - 0.05}{\sqrt{\frac{0.05 \cdot (1 - 0.05)}{150}}} = \frac{-0.03}{\sqrt{0.0003167}} = \frac{-0.03}{0.01780} = -1.6859 > -z_{0.01} = -2.33, \]

and we do not reject the null hypothesis ⇒ a customer is not willing to use the laundry.
Equivalently, the p-value of the test is given by
\[ \text{p-value} = P(Z < -1.6859 \mid p \ge 0.05) = 1 - P(Z \le 1.69 \mid p \ge 0.05) = 1 - 0.9545 = 0.0455. \]
Since the p-value turns out to be greater than the set level of significance ($\alpha = 1\%$), we do not reject the null
hypothesis.
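A minimal sketch of the left-tailed z test in part b), using `statistics.NormalDist` from the Python standard library. Since it does not round z to 1.69 before the lookup, the p-value it prints is close to, but not exactly, the table-based 0.0455.

```python
import math
from statistics import NormalDist

n, defective = 150, 3
p0 = 0.05      # boundary value of the null hypothesis H0: p >= 0.05
alpha = 0.01

p_hat = defective / n
z_obs = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

std_normal = NormalDist()
z_crit = std_normal.inv_cdf(alpha)   # lower-tail critical value, about -2.33
p_value = std_normal.cdf(z_obs)      # left-tailed test

reject_h0 = z_obs < z_crit
print(round(z_obs, 4), round(z_crit, 2), round(p_value, 4), reject_h0)
```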

c)
According to this decision rule, the probability of the type I error is

\[ \alpha = P(\hat{p} < 0.04 \mid H_0) = P\left( Z < \frac{0.04 - 0.05}{\sqrt{\frac{0.05 \cdot (1 - 0.05)}{200}}} \right) = P(Z < -0.6488) = 1 - P(Z < 0.65) = 1 - 0.7422 = 0.2578. \]
This probability is too large compared to the values usually adopted (1%, 5% or 10%). Therefore we do not recommend
that the customer follow this rule, because the probability of a type I error (rejecting the null hypothesis when it
is actually true) is too high.
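The type I error probability of the proposed rule can be reproduced with a short sketch, evaluated at the boundary value p = 0.05 as in the solution. The normal CDF comes from the standard library, so the result differs from 0.2578 only through table rounding.

```python
import math
from statistics import NormalDist

p0 = 0.05         # boundary of H0: p >= 0.05
n = 200
threshold = 0.04  # rule: use the laundry if the sample proportion is below 4%

# Standardize the threshold under H0 and take the lower-tail probability.
z_threshold = (threshold - p0) / math.sqrt(p0 * (1 - p0) / n)
type_1_error = NormalDist().cdf(z_threshold)
print(round(z_threshold, 4), round(type_1_error, 4))
```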
PROBLEM 3 (3 points)
The Manager of a Shopping Center wants to compare the average customer expenditure during a holiday and a working
day. For this purpose he/she extracts two independent samples of customers observed during a holiday and a working
day, obtaining:
                       holiday     working day
Mean                   71.54       66.4
Standard Error         14.0130     7.4276
Median                 74.9        66.5
Standard Deviation     31.3340     19.6515
Sample Variance        981.818     386.18
Kurtosis               -1.1775     -0.6042
Skewness               -0.5527     -0.5767
Range                  74.8        52.8
Minimum                27.6        34.8
Maximum                102.4       87.6
Sum                    357.7       464.8
Count                  5           7

If the Manager wanted to perform a test for the difference between two means, what assumptions should he make? On
the basis of the sample data provided, is it possible to verify the plausibility of these assumptions, setting α = 0.02?
(Please note: it is not required to carry out the test for the difference between two means as well.)
We denote by X the expenditure of a customer in the Shopping Center on a holiday and by Y the same amount
on a working day.
To carry out a test on the difference between two means it is necessary to assume that the two populations are normally
distributed with unknown but equal variances. Assuming that normality holds, it is therefore necessary to
verify whether the sample data allow us to conclude that the variances are also equal, through the test

\[ H_0: \sigma_x^2 = \sigma_y^2 \quad \text{vs.} \quad H_1: \sigma_x^2 \ne \sigma_y^2. \]

We reject the null hypothesis if

\[ \frac{s_x^2}{s_y^2} > F_{n_x - 1,\ n_y - 1,\ \alpha/2}, \]

where $s_x^2$ is the larger of the two sample variances.

On the basis of the sample data we obtain

\[ \frac{s_x^2}{s_y^2} = \frac{981.818}{386.18} = 2.5424. \]

Since $2.5424 < F_{n_x - 1,\ n_y - 1,\ \alpha/2} = F_{4,\ 6,\ 0.01} = 9.15$, we do not reject the null hypothesis.
There is not enough evidence to conclude that the variance of customer expenditure during a holiday differs from
that on a working day. Therefore, we can perform the test for the difference between two means.
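The variance-ratio check above can be sketched in a few lines. The critical value 9.15 is copied from the F table (an assumption of this sketch, since the Python standard library has no F quantile function), and the variable names are illustrative.

```python
# Sample variances and sample sizes from the summary table.
var_holiday, n_holiday = 981.818, 5
var_working, n_working = 386.18, 7

# Put the larger sample variance in the numerator, as in the solution.
(var_num, n_num), (var_den, n_den) = sorted(
    [(var_holiday, n_holiday), (var_working, n_working)], reverse=True
)
f_obs = var_num / var_den
df = (n_num - 1, n_den - 1)

# Assumption: F_{4, 6, 0.01} copied from the F table (alpha / 2 = 0.01).
f_crit = 9.15

reject_equal_variances = f_obs > f_crit
print(round(f_obs, 4), df, reject_equal_variances)
```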

PROBLEM 4 (5 points)
A fast-food will be located in a newly built shopping center. During the design phase, four possible locations were
identified. Now we want to understand if the possible customers can be indifferent or not to the position of the fast-
food. In a sample of consumers, the following preferences were detected for the four location sites:
Location site A B C D
Number of preferences 22 21 33 19
a) Specify the statistical hypotheses to be tested.
b) Provide information about the p-value of the test.
c) Based on the previous results, can we consider the new customers to be indifferent about the fast-food location
site? (α=0.01)
a)
We run the test:
$H_0: p_A = p_B = p_C = p_D = 0.25$ (the probability distribution of the population is uniform)
vs.
$H_1$: the probability distribution of the population is not uniform
b)

Location site                                         A       B       C       D       Total
Observations drawn from the population (O_i)          22      21      33      19      n = 95
Assumed probability distribution of the population    1/4     1/4     1/4     1/4     1
Expected number of observations (E_i = n · p_i)       23.75   23.75   23.75   23.75   95

The test statistic is

\[ \chi^2 = \sum_{i=1}^{K} \frac{(O_i - E_i)^2}{E_i} = \frac{(22 - 23.75)^2}{23.75} + \frac{(21 - 23.75)^2}{23.75} + \frac{(33 - 23.75)^2}{23.75} + \frac{(19 - 23.75)^2}{23.75} = 5 \]

where $E_i = n \cdot p_i$.

Therefore the p-value is
\[ \text{p-value} = P(\chi^2_{K-1} > \chi^2) = P(\chi^2_3 > 5). \]

Because the chi-square distribution table is not sufficiently detailed, we can only conclude that the p-value lies
between 0.1 and 0.9, since $\chi^2_{3,\,0.9} = 0.584 < 5 < \chi^2_{3,\,0.1} = 6.251$.

c)
Since the p-value is certainly greater than the assigned significance level (α = 0.01), it is not possible to reject
the null hypothesis ⇒ the data do not provide sufficient evidence to conclude that customers are not
indifferent about the fast-food location site.
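The goodness-of-fit computation in parts b) and c) can be sketched as follows. The helper `chi2_sf_3dof` is a hypothetical name; it relies on the closed-form upper-tail probability of a chi-square with exactly 3 degrees of freedom, which lets the sketch go beyond the table bounds without any external library.

```python
import math

observed = {"A": 22, "B": 21, "C": 33, "D": 19}
n = sum(observed.values())
expected = n / len(observed)  # uniform distribution under H0

chi2_obs = sum((o - expected) ** 2 / expected for o in observed.values())
dof = len(observed) - 1


def chi2_sf_3dof(x):
    """Upper-tail probability of a chi-square with exactly 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


p_value = chi2_sf_3dof(chi2_obs)
reject_h0 = p_value < 0.01
print(n, round(chi2_obs, 4), dof, round(p_value, 4), reject_h0)
```

The computed p-value falls inside the interval (0.1; 0.9) obtained from the table, so the conclusion at α = 0.01 is the same.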

PROBLEM 5 (6 points)
A customer wants to estimate the relationship between the quantity of vegetables on display in the counters of a
supermarket (variable Y, in Kg) and the time elapsed from the opening (variable X, in minutes). On a sample of 13
visits at the supermarket, the customer estimated the model $\hat{y} = 214.0724 - 0.1924\,x$ and obtained the following
information:
\[ \bar{x} = 385 \qquad \bar{y} = 140 \qquad \sum_{i=1}^{13} (x_i - \bar{x})^2 = 1068708 \qquad \sum_{i=1}^{13} (y_i - \bar{y})^2 = 83338 \]

a) Determine the value of $SSR = \sum_{i=1}^{13} (\hat{y}_i - \bar{y})^2$.

(Hint: If you did not answer point a), assume that the value is equal to SSR = 42337.2315 in the following points.)
b) Determine the 95% confidence interval for the quantity of vegetables on display in the counters if the customer
visits the supermarket 400 minutes after opening.
c) Test the hypothesis H0: β1 = 0 against H1: β1 < 0, where β1 is the slope coefficient of the model (α = 0.05).
a)
Note that we can rewrite the Sum of Squares Regression (SSR) in the following way

\[ SSR = b_1^2 \cdot \sum_{i=1}^{n} (x_i - \bar{x})^2, \]

therefore $SSR = (-0.1924)^2 \cdot 1068708 = 39561.1763$.
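A quick numerical check of point a), with illustrative variable names. It also computes the residual sum of squares SSE = SST - SSR, which is the quantity needed for the error standard deviation in point b).

```python
b1 = -0.1924          # estimated slope
sxx = 1068708         # sum of squared deviations of x
sst = 83338           # total sum of squares of y

ssr = b1 ** 2 * sxx   # regression sum of squares
sse = sst - ssr       # residual sum of squares
print(round(ssr, 4), round(sse, 4))
```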


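b)
We have to calculate the prediction interval as follows

\[ \hat{y}_{n+1} \pm t_{n-2,\,\alpha/2} \cdot s_e \cdot \sqrt{1 + \frac{1}{n} + \frac{(x_{n+1} - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}. \]

We need the following values:

- $\hat{y}_{n+1} = b_0 + b_1 \cdot x_{n+1} \Rightarrow \hat{y}_{n+1} = 214.0724 - 0.1924 \cdot 400 = 137.1124$
- $t_{n-2,\,\alpha/2} = t_{11,\,0.025} = 2.201$
- $s_e = \sqrt{s_e^2} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{83338 - 39561.1763}{11}} = \sqrt{3979.7112} = 63.0850$
Therefore,

\[ 137.1124 \pm 2.201 \cdot 63.0850 \cdot \sqrt{1 + \frac{1}{13} + \frac{(400 - 385)^2}{1068708}} \Rightarrow 137.1124 \pm 2.201 \cdot 63.0850 \cdot \sqrt{1.0771} \]

\[ \Rightarrow 137.1124 \pm 2.201 \cdot 63.0850 \cdot 1.0378 \Rightarrow (-6.9862;\ 281.2110). \]

If you did not answer point a), the estimate of the model error standard deviation is
\[ s_e = \sqrt{s_e^2} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{83338 - 42337.2315}{11}} = \sqrt{3727.3426} = 61.0520 \]

and the interval is

\[ 137.1124 \pm 2.201 \cdot 61.0520 \cdot \sqrt{1 + \frac{1}{13} + \frac{(400 - 385)^2}{1068708}} \Rightarrow 137.1124 \pm 2.201 \cdot 61.0520 \cdot \sqrt{1.0771} \]

\[ \Rightarrow 137.1124 \pm 2.201 \cdot 61.0520 \cdot 1.0378 \Rightarrow (-2.3425;\ 276.5673). \]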
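The prediction interval of point b) can be reproduced with the sketch below, which uses the SSR obtained in point a). The critical value 2.201 is copied from the t table (an assumption of this sketch). Because intermediate values are not rounded, the endpoints differ from the hand computation in the second decimal place.

```python
import math

n = 13
b0, b1 = 214.0724, -0.1924
x_bar, sxx = 385, 1068708
sst, ssr = 83338, 39561.1763   # SSR from point a)
x_new = 400                    # minutes after opening

y_hat = b0 + b1 * x_new
s_e = math.sqrt((sst - ssr) / (n - 2))   # estimated error standard deviation

# Assumption: t_{11, 0.025} copied from the t table.
t_crit = 2.201

half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sxx)
interval = (y_hat - half_width, y_hat + half_width)
print(round(y_hat, 4), round(s_e, 4), tuple(round(v, 4) for v in interval))
```

Replacing `ssr` with the hint value 42337.2315 gives the alternative interval in the same way.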

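c)
We reject the null hypothesis when

\[ \frac{b_1 - \beta_1}{s_{b_1}} < -t_{n-2,\,\alpha}. \]

Considering the data of the problem we obtain:

- $s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \frac{63.0850}{\sqrt{1068708}} = 0.0610$
- $t_{n-2,\,\alpha} = t_{11,\,0.05} = 1.796$

Therefore,

\[ \frac{b_1 - \beta_1}{s_{b_1}} = \frac{-0.1924 - 0}{0.0610} = -3.1541 < -1.796 \]

then we reject the null hypothesis.

In case you did not answer point a), then

\[ s_{b_1} = \frac{s_e}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}} = \frac{61.0520}{\sqrt{1068708}} = 0.0591 \]

\[ \frac{b_1 - \beta_1}{s_{b_1}} = \frac{-0.1924 - 0}{0.0591} = -3.2555 < -1.796 \]

and we reject the null hypothesis.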
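A minimal sketch of the slope test in point c), based on the error standard deviation derived from the SSR of point a). The critical value 1.796 is copied from the t table (an assumption of this sketch). Since the standard error of the slope is not rounded to 0.0610 here, the statistic differs slightly from -3.1541.

```python
import math

n = 13
b1, sxx = -0.1924, 1068708
s_e = 63.0850          # from point b), based on the SSR computed in point a)

s_b1 = s_e / math.sqrt(sxx)      # standard error of the slope
t_obs = (b1 - 0) / s_b1          # test statistic under H0: beta_1 = 0

# Assumption: t_{11, 0.05} copied from the t table.
t_crit = 1.796

reject_h0 = t_obs < -t_crit      # left-tailed test of H1: beta_1 < 0
print(round(s_b1, 4), round(t_obs, 4), reject_h0)
```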

PROBLEM 6 (3 points)
Define Type I and Type II errors that occur in a hypothesis test. Why are they important in the search for an "optimal"
test? How is it possible to reduce the probability of both errors at the same time?

Please refer to the textbook and the lecture notes.

PROBLEM 7 (3 points)
In a simple linear regression model, what types of point and interval predictions can be provided for the dependent
variable? Discuss in detail the problem, writing the required formulas to justify your answer.

Please refer to the textbook and the lecture notes.
