# 1

ECON1203/ECON2292
Statistics
Week 12
Week 12 topics
 Regression case study
 Chi-squared distribution
 Hypothesis test for a population variance
 Chi-squared test of goodness of fit
 Chi-squared test of a contingency table
 Key references
 Berman, Brooks & Davidson (2000)
 Keller 4.6, 17.2, 12.2, 15.1-15.2
Sydney Olympic Games and
the stock market
 Did the Sydney Olympic Games announcement have a
positive stock market impact?
 BBD use the market model from finance with an added dummy variable for the announcement effect
$$R_{it} = \beta_0 + \beta_1 R_{mt} + \beta_2 D_t + \varepsilon_t$$
 $R_{it}$ = daily return on an industry i accumulation index
 $R_{mt}$ = market return on All Ordinaries accumulation index
 $D_t$ = 1 if time is the day of the Games announcement on 23 Sept 1993; = 0 otherwise
 BBD use daily data 4 Jan 1988 to 29 Nov 1996 & for several industries
Sydney Olympic Games & the
stock market…
 Augmented market model was estimated for 23 industry
portfolios including banks & building materials
 Which of these 2 industries would you expect to be impacted
more?
Banks:
$$\hat{R}_t = \underset{(0.0570)}{0.0002} + \underset{(0.0000)}{1.0690}\,R_{mt} - \underset{(0.7987)}{0.0016}\,D_t$$
Building materials:
$$\hat{R}_t = \underset{(0.7383)}{0.0000} + \underset{(0.0000)}{0.9346}\,R_{mt} + \underset{(0.0003)}{0.0202}\,D_t$$
Note: numbers in ( ) are p-values
Sydney Olympic Games & the
stock market…
 Did the announcement have
a significant impact for both
industries?
 Note reported p-values refer to
null of zero coefficient
 BBD found only 4
industries for which there
was a significant impact –
building materials,
developers & contractors,
engineering & other
services
Sydney Olympic Games & the
stock market…
 |
1
parameters represent sensitivity of each stock to the
overall market
 Banks have b
1
=1.069
 This industry has rate of return that is more sensitive to changes in the
overall market than is the average stock
Converse true for building materials where b =0 9346  Converse true for building materials where b
1
=0.9346
 Thus H
0
: |
1
=1 is an interesting hypothesis
 Could you use reported information to test this hypothesis?
 No! Not with just reported p-values of 0.0000 for H
0
: |
1
=0
 Good presentation involves reporting se’s
6
3
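To see why standard errors matter here, consider how the test of a unit market sensitivity would be formed if they were available. The ratio below is a sketch only: $se(b_1)$ denotes the standard error that was not reported, and the degrees of freedom assume the model with an intercept and two regressors estimated on n daily observations.

```latex
t = \frac{b_1 - 1}{se(b_1)} \sim t_{n-3} \quad \text{under } H_0: \beta_1 = 1
```

A p-value reported as 0.0000 for the null of a zero coefficient only says that $b_1 / se(b_1)$ is very large, so $se(b_1)$ cannot be backed out of it.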
Some different problems
 Previously have concentrated on inference
associated with location problems
 Means, proportions & conditional means (regression)
 Many other interesting statistical problems
 P1: CPRepairs has fluctuating demand for vehicle
repairs necessitating paying staff overtime
 Changing variability could present peak load problems for
staff availability &/or morale
 Inference problem: Has there been a change in the
variance of total overtime hours?
Some different problems…
 P2: CPRepairs concerned about consumer
satisfaction with service
 Industry benchmarks for satisfaction levels are available
 Inference problem: Do CPRepairs survey results differ from industry benchmarks?
 P3: CPRepairs services a range of customer types
 If split customer satisfaction results by type can we observe
differences?
 Inference problem: Is customer satisfaction independent of
customer type?
 In testing hypotheses about variances obvious test statistic is based on $s^2$
 In order to compare $s^2$ with $\sigma^2$ need appropriate sampling distribution
 Need to consider a new distribution
If have random sampling from a normal population
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} \sim \chi^2_{n-1}$$
Standardized test statistic is distributed Chi-squared with n-1 degrees of freedom
Chi-squared distribution is skewed to the right & depends on df, but process of testing hypotheses is conceptually same as before
Consider
$$H_0: \sigma^2 = \sigma_0^2 \qquad H_1: \sigma^2 < \sigma_0^2$$
where $\sigma_0^2$ = hypothesized value
For specified $\alpha$ rejection rule is: Reject $H_0$ if
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} < \chi^2_{1-\alpha,\,n-1}$$
P1: CPRepairs overtime
 Staff numbers set assuming total of 50 hours
overtime per week & variance of 25
 Is there evidence of a different variance?
 Assume overtime hours per week is approximately normal
 Choose $\alpha$ = .10
 Sample of 12 weeks produces $s^2$ = 28.1
 Need Chi-squared distribution with (12 – 1) = 11 degrees of
freedom
 What are relevant percentage points?
Chi-squared critical values
P1: CPRepairs overtime...
$$H_0: \sigma^2 = 25 \qquad H_1: \sigma^2 \neq 25$$
Two tailed test $\Rightarrow \alpha/2 = 0.05$
$$\chi^2_{1-.05,\,12-1} = \chi^2_{.95,11} = 4.57 \quad \& \quad \chi^2_{.05,11} = 19.68$$
Reject $H_0$ if
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} \text{ is } < 4.57 \text{ or } > 19.68$$
As
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(12-1)28.1}{25} = 12.36 \;\Rightarrow\; \text{do not reject } H_0$$
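The P1 test can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted on the slides (n = 12, s² = 28.1, hypothesized variance 25, critical values 4.57 and 19.68); the variable names are illustrative, and the critical values are typed in from the table rather than computed.

```python
# Two-tailed chi-squared test for a population variance (P1 figures).
n = 12            # number of weeks sampled
s2 = 28.1         # sample variance of overtime hours
sigma2_0 = 25.0   # hypothesized variance under H0

# Critical values for 11 degrees of freedom with 0.05 in each tail,
# copied from the slide rather than computed.
chi2_lower = 4.57
chi2_upper = 19.68

# Test statistic: (n - 1) * s^2 / sigma_0^2
chi2_stat = (n - 1) * s2 / sigma2_0

# Reject H0 if the statistic falls in either tail.
reject_h0 = chi2_stat < chi2_lower or chi2_stat > chi2_upper

print(f"chi-squared statistic = {chi2_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The statistic lies between the two critical values, which matches the decision not to reject.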
P1: CPRepairs overtime…
Alternatively could construct a CI for $\sigma^2$
$$\Pr\left(\chi^2_{1-\alpha/2,\,n-1} < \chi^2 < \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha, \text{ or } \Pr\left(\chi^2_{1-\alpha/2,\,n-1} < \frac{(n-1)s^2}{\sigma^2} < \chi^2_{\alpha/2,\,n-1}\right) = 1-\alpha$$
then rearranging
$$\Pr\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right] = 1-\alpha$$
& the $(1-\alpha)100\%$ CI is
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}},\; \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}\right]$$
For P1 90% CI is
$$\left[\frac{(12-1)28.1}{19.68},\; \frac{(12-1)28.1}{4.57}\right] \text{ or } (15.71, 67.64)$$
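The interval for P1 can be reproduced directly from the rearranged probability statement. The sketch below assumes the same slide figures (n = 12, s² = 28.1) and the tabulated critical values 4.57 and 19.68; the variable names are illustrative only.

```python
# 90% confidence interval for a population variance (P1 figures).
n = 12
s2 = 28.1

# Tabulated critical values for 11 degrees of freedom (from the slide).
chi2_lower = 4.57    # chi-squared_{.95, 11}
chi2_upper = 19.68   # chi-squared_{.05, 11}

# The larger critical value gives the lower limit, and vice versa.
ci_low = (n - 1) * s2 / chi2_upper
ci_high = (n - 1) * s2 / chi2_lower

print(f"90% CI for the variance: ({ci_low:.2f}, {ci_high:.2f})")

# The interval is not centred on s2: the two distances differ.
print(f"s2 - lower limit = {s2 - ci_low:.2f}")
print(f"upper limit - s2 = {ci_high - s2:.2f}")
```

The two printed distances differ, which is the asymmetry that comes from the right-skewed Chi-squared distribution.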
P1: CPRepairs overtime…
 90% CI is (15.71, 67.64)
 Notice the CI is not symmetric about $s^2$
 Recall for population mean CI was sample mean ± margin of error
 But for population variance the CI is ($s^2 - \text{error}_L < \sigma^2 < s^2 + \text{error}_U$) & $\text{error}_L \neq \text{error}_U$
 CI includes $\sigma^2 = 25$ & conclude would not reject $H_0: \sigma^2 = 25$ at 10% level
 While the point estimate of $\sigma^2$ is > 25 no statistical evidence of an increase in population variance
 No evidence favouring a change in staff numbers
Chi-squared tests
 Data often occurs in nominal (categorical) form
 Private health insurance status & hospital type
 Customer satisfaction surveys
 There are several possible outcomes or categories
 Categories are mutually exclusive & exhaustive
 Think of each respondent/observation as being a trial
 Recall binomial experiments; now have the multinomial extension
 Will often have expected or hypothesized distribution
of outcomes
Chi-squared tests…
 Want to compare observed & expected
distributions
 Obviously could calculate differences in expected &
observed category frequencies
 Inference problem is to determine whether those differences are statistically large
 Chi-squared goodness of fit test used to
test if observed & expected distributions are
the same
Chi-squared tests…
 $H_0$ will specify probabilities $p_i$ that an observation falls into i=1,…,c categories or cells
 $H_0$ implies expected frequencies for sample of size n ($e_i = p_i n$)
 Assuming
 Random sampling (independent trials)
 Probabilities $p_i$ are constant over trials
 Note, the test can be unreliable if any values of $e_i = p_i n$ get too small (e.g. 3 or 4)
 Solution: merge categories where feasible
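The small-cell check is easy to automate before running the test. The sketch below is illustrative only: the probabilities and sample size are hypothetical, the function names are invented for this example, and the threshold of 5 is an assumed rule of thumb rather than a figure from the slides.

```python
def expected_frequencies(probs, n):
    """Return e_i = p_i * n for each category."""
    return [p * n for p in probs]


def small_cells(expected, threshold=5.0):
    """Return indices of categories whose expected frequency is below threshold."""
    return [i for i, e in enumerate(expected) if e < threshold]


# Hypothetical null distribution over 4 categories and hypothetical sample size.
probs = [0.50, 0.30, 0.17, 0.03]
n = 100

expected = expected_frequencies(probs, n)
flagged = small_cells(expected)
print("expected frequencies:", expected)
print("categories to consider merging:", flagged)

# One possible fix: merge the last two categories and re-check.
merged_probs = [0.50, 0.30, 0.17 + 0.03]
merged_flagged = small_cells(expected_frequencies(merged_probs, n))
print("flagged after merging:", merged_flagged)
```

Merging reduces the number of categories c, so the degrees of freedom of the test fall with it.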
Chi-squared tests…
 The distribution theory underlying the test is not exact
 It is large sample theory (a reason for above limitation)
 Test statistic is given by
$$\chi^2 = \sum_{i=1}^{c} \frac{(o_i - e_i)^2}{e_i} \sim \chi^2_{c-1}$$
i.e. the statistic should behave like an observation from Chi-squared (c-1) if null hypothesis is correct, where
$o_i$ = observed frequency in cell i
$e_i$ = expected frequency in cell i
P2: CPRepairs benchmarking
consumer satisfaction
 In a national survey of all auto repair centres
 “How would you rate the level of service provided by your
repair centre?”
 Distribution of responses:
Excellent (8%), Very good (47%), Fair (34%), Poor (11%)
 CPRepairs conducted their own survey of 207
customers to compare with national results
 Observed response frequencies:
Excellent (21), Very good (109), Fair (62), Poor (15)
P2: CPRepairs benchmarking
consumer satisfaction…
 Hypotheses
 $H_0$: CPRepairs' distribution of customer satisfaction is the same as the national distribution for all auto repairers
 $p_1$ = .08, $p_2$ = .47, $p_3$ = .34, $p_4$ = .11
 $H_1$: CPRepairs' distribution is not the same as the national distribution
 Test procedure
 As c = 4 test has 3 degrees of freedom
 Choose $\alpha$ = 0.05
 Decision rule is:
 Reject $H_0$ if $\chi^2 > \chi^2_{.05,3} = 7.815$
P2: CPRepairs benchmarking
consumer satisfaction…
| Response | Observed frequency $o_i$ | Expected frequency $e_i$ | $(o_i - e_i)^2 / e_i$ |
|---|---|---|---|
| Excellent | 21 | .08 x 207 = 16.56 | 1.19 |
| Very good | 109 | .47 x 207 = 97.29 | 1.41 |
| Fair | 62 | .34 x 207 = 70.38 | 1.00 |
| Poor | 15 | .11 x 207 = 22.77 | 2.65 |
| Total | 207 | 207 | $\chi^2$ = 6.25 |

Notice that observed frequencies tend to indicate higher levels of satisfaction compared to the national distribution

But as $\chi^2$ = 6.25 < 7.815 $\Rightarrow$ do not reject $H_0$ & conclude CPRepairs' distribution of customer satisfaction responses is not statistically different from the distribution of national responses
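The P2 calculation can be reproduced in a few lines. This is a minimal sketch using the observed counts and national proportions given above, with the critical value 7.815 typed in from the decision rule; the variable names are illustrative.

```python
# Chi-squared goodness of fit test for the P2 survey data.
observed = [21, 109, 62, 15]          # Excellent, Very good, Fair, Poor
probs = [0.08, 0.47, 0.34, 0.11]      # national distribution under H0

n = sum(observed)                      # sample size
expected = [p * n for p in probs]      # e_i = p_i * n

# Sum of (o_i - e_i)^2 / e_i over the c = 4 categories.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi2_stat = sum(contributions)

critical = 7.815                       # chi-squared_{.05, 3}, from the slide
reject_h0 = chi2_stat > critical

for e, c in zip(expected, contributions):
    print(f"expected = {e:.2f}, contribution = {c:.2f}")
print(f"chi-squared statistic = {chi2_stat:.2f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The largest single contribution comes from the Poor category, where CPRepairs has fewer responses than the national distribution implies.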

Contingency tables
 Recall SIA: private health insurance (PHI)
 Survey data were summarized in a 2-way cross-tabulation
or contingency table
 The “2 ways” were PHI status & admission to hospital
 PHI status had 2 levels (have PHI/don’t have)
 Admission had 3 levels (not admitted/admitted/…)
 Previously used such tables as descriptive tools
 Also checked whether events were independent
 Now want to formally test whether random variables are
independent or not
 Is there a relationship between the 2 categorical random
variables?
Contingency tables…
 Testing strategy is similar to that used for the
goodness of fit test
 Compare observed cell frequencies with those expected
under null hypothesis of independence
 How do you calculate the expected frequencies?
 Previously these followed readily from the hypothesized
probability distribution
 Now $H_0$ simply asserts independence
 Recall what is required for independent events
 P(A ∩ B)=P(A)P(B)
Contingency tables…
 For a contingency table assume independence
 Then use marginal (row & column) totals to generate
expected frequencies for each cell
 Expected frequency for cell in row i & column j is:
$$e_{ij} = n \times \frac{n_{i.}}{n} \times \frac{n_{.j}}{n} = \frac{n_{i.}\, n_{.j}}{n}$$
where $n_{i.}$ = total obs. in row $i = 1, \ldots, r$
$n_{.j}$ = total obs. in column $j = 1, \ldots, c$
Contingency tables…
Test statistic is now
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} \sim \chi^2_{\nu}$$
where $o_{ij}$ = observed frequency of cell in row i column j
$e_{ij}$ = expected frequency of cell in row i column j
$\nu = (r-1) \times (c-1)$
P3: CPRepairs satisfaction by
consumer type
 CPRepairs conducted their own survey to:
 Benchmark their results versus national results (P2)
 Investigate how well they were servicing different types of
customers (P3)
 CPRepairs responses were classified into 3 types of customer
 The 2-way contingency table is satisfaction response with 4
levels versus type with 3 levels
 Is customer satisfaction independent of customer type?
P3: CPRepairs satisfaction by
consumer type…
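| Response | Type: col. 1 | Type: col. 2 | Type: col. 3 | Total |
|---|---|---|---|---|
| Excellent | 4 | 7 | 10 | 21 |
| Very good | 35 | 34 | 40 | 109 |
| Fair | 21 | 24 | 17 | 62 |
| Poor | 6 | 5 | 4 | 15 |
| Total | 66 | 70 | 71 | 207 |

 This is the contingency table with cross-tabulation of satisfaction response by customer type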
 These are the observed survey responses for CPRepairs
 Now need to compare these with what would be
expected under independence
P3: CPRepairs satisfaction by
consumer type…
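Observed frequencies, with expected frequencies under independence in parentheses:

| Response | Type: col. 1 | Type: col. 2 | Type: col. 3 | Total |
|---|---|---|---|---|
| Excellent | 4 (6.696) | 7 (7.101) | 10 (7.203) | 21 |
| Very good | 35 (34.754) | 34 (36.860) | 40 (37.386) | 109 |
| Fair | 21 (19.768) | 24 (20.966) | 17 (21.266) | 62 |
| Poor | 6 (4.783) | 5 (5.072) | 4 (5.145) | 15 |
| Total | 66 | 70 | 71 | 207 |

$$\chi^2 = \frac{(4 - 6.696)^2}{6.696} + \frac{(7 - 7.101)^2}{7.101} + \ldots + \frac{(4 - 5.145)^2}{5.145} = 4.5164$$

As $\chi^2 = 4.5164 < \chi^2_{.01,6} = 16.8119$ do not reject $H_0$ that type and satisfaction are independent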
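The expected frequencies and the test statistic for P3 follow mechanically from the row and column totals. The sketch below uses the observed counts from the contingency table and the critical value 16.8119 quoted above; the three customer-type columns are kept in table order, and the variable names are illustrative.

```python
# Chi-squared test of independence for the P3 contingency table.
# Rows: Excellent, Very good, Fair, Poor; columns: the 3 customer types.
observed = [
    [4, 7, 10],
    [35, 34, 40],
    [21, 24, 17],
    [6, 5, 4],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Under independence, e_ij = (row total i) * (column total j) / n.
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Sum of (o_ij - e_ij)^2 / e_ij over all cells.
chi2_stat = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (r - 1) * (c - 1)
critical = 16.8119                                   # chi-squared_{.01, 6}
reject_h0 = chi2_stat > critical

for e_row in expected:
    print([round(e, 3) for e in e_row])
print(f"chi-squared statistic = {chi2_stat:.4f} with {df} degrees of freedom")
print("reject H0" if reject_h0 else "do not reject H0")
```

Every expected frequency is computed from the marginal totals alone, which is why the degrees of freedom are (r - 1) × (c - 1) rather than rc - 1.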

Further quantitative course
options
 Second year
 Introductory Econometrics
 Statistics for Econometrics
 Third year
 Econometric Methods
 Econometric Theory
 Financial Econometrics
 Honours
 Applied Econometrics