
ECON 1203/ECON 2292 BUSINESS & ECONOMIC STATISTICS

Final examination - Semester 1 2012

Do not penalize differences due to rounding error
Penalize initial errors but not later incorrect answers that are conditionally correct

Question 1 [12 marks in total]

(i) [2 marks] The distribution of distances would need to have been
symmetric (mean equal to median), unimodal and bell shaped.

(ii) [2 marks] $X \sim N(50,000,\ 12,000^2)$

$P(34000 < X < 56000) = P\left(\frac{-16000}{12000} < Z < \frac{6000}{12000}\right)$

$= P(-1.33 < Z < 0.5) = 0.4082 + 0.1915 = 0.5997$
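
The table-based answer above can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the exam. The result differs slightly from 0.5997 because the tables round z to -1.33.

```python
from statistics import NormalDist

# Distances modelled as X ~ N(50,000, 12,000^2), as in part (ii).
distance = NormalDist(mu=50_000, sigma=12_000)

# P(34,000 < X < 56,000) computed directly from the normal CDF,
# without first rounding the z-scores to two decimals.
prob_between = distance.cdf(56_000) - distance.cdf(34_000)
print(f"P(34000 < X < 56000) is approximately {prob_between:.4f}")
```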

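(iii) [2 marks]:

$P(X < x) = P\left(Z < \frac{x - 50000}{12000}\right) = 0.05 \Rightarrow \frac{x - 50000}{12000} = -1.645$
$\Rightarrow x = 50,000 - 1.645 \times 12000 = 30,260$

$P(X > x) = P\left(Z > \frac{x - 50000}{12000}\right) = 0.10 \Rightarrow \frac{x - 50000}{12000} = 1.285$
$\Rightarrow x = 50,000 + 1.285 \times 12000 = 65,420$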

[1 mark each. Use of approximate percentiles from normal tables also acceptable.]
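
The two percentiles can also be obtained from the inverse normal CDF rather than from tables. This is a sketch only, assuming the same N(50,000, 12,000^2) model; the names `x_lower_5` and `x_upper_10` are hypothetical. Small differences from 30,260 and 65,420 reflect the rounded table values 1.645 and 1.285.

```python
from statistics import NormalDist

distance = NormalDist(mu=50_000, sigma=12_000)

# Value with 5% of the distribution below it.
x_lower_5 = distance.inv_cdf(0.05)

# Value with 10% of the distribution above it (the 90th percentile).
x_upper_10 = distance.inv_cdf(0.90)

print(f"5th percentile:  {x_lower_5:,.0f}")
print(f"90th percentile: {x_upper_10:,.0f}")
```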

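(iv) [4 marks] $n = 64$; $\alpha = 0.025$; $\bar{x} = 46,000$

We wish to test $H_0: \mu = 50,000$; $H_1: \mu < 50,000$

Hence under the null hypothesis

$\bar{X} \sim N(50,000,\ 12000^2/64)$ and the rejection region is:

$Z = \frac{\bar{X} - 50000}{12000/8} < -z_{0.025} = -1.96$

or

$\bar{X} < \bar{x}_{crit} = \mu_0 - z_{0.025}\,\sigma/\sqrt{n} = 50000 - 1.96 \times 1500 = 47,060$

Since $z = \frac{46000 - 50000}{1500} = -2.67 < -z_{0.025} = -1.96$, or equivalently 46,000 < 47,060,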
we reject the null hypothesis and conclude there is evidence to suggest the new
procedures have been effective.

[1 mark each for correct sampling distribution; hypotheses; decision rule; and
conclusion.]
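
The decision rule in part (iv) can be reproduced in a few lines. This is a hedged sketch of a lower-tailed z-test with known population standard deviation; all variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 50_000, 12_000, 64   # null mean, population sd, sample size
x_bar, alpha = 46_000, 0.025         # observed sample mean, significance level

std_error = sigma / sqrt(n)              # 12000 / 8 = 1500
z_stat = (x_bar - mu0) / std_error       # about -2.67
z_crit = NormalDist().inv_cdf(alpha)     # about -1.96 (lower tail)
x_crit = mu0 + z_crit * std_error        # critical sample mean, about 47,060

# Lower-tailed test: reject H0 when the statistic falls below the critical value.
reject_null = z_stat < z_crit
p_value = NormalDist().cdf(z_stat)

print(f"z = {z_stat:.2f}, critical mean = {x_crit:,.0f}, reject H0: {reject_null}")
```

Both forms of the rule (comparing z with the critical z, or the sample mean with the critical mean) give the same decision, since one is a linear rescaling of the other.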

(v) [2 marks] Since we have a large sample of 64 it is not necessary to rely on the
underlying population distribution of distances travelled being normal. Instead we
can invoke the Central Limit Theorem which states that with a sufficiently large
sample size the sample mean is approximately normal with mean equal to the
population mean and variance equal to the population variance divided by the
sample size. This holds irrespective of the underlying population distribution that
our samples are drawn from.

[Need to mention CLT holds irrespective of the underlying population distribution to
get full marks.]

Question 2 [9 marks in total]

(i) [1 mark] P(Partner) = 195/371 = 0.526

(ii) [1 mark] P(Female|Partner) = 13/195 = 0.067

(iii) [2 marks] If P(Partner|Female) < P(Partner|Male) as it is here; i.e.

P(Partner|Female) = 13/74 = 0.176 and P(Partner|Male) = 182/297= 0.613

then the variables are not expected to be independent, as independence requires

P(Partner|Female) = P(Partner|Male) = P(Partner).

However, the dependence could be attributed to some confounding factors that are
related to gender but that have not been accounted for in this bivariate
relationship.

[1 mark each for why dependence is indicated & the threat to this conclusion]

(iv) [5 marks]
H0: Gender and partner status are independent;
H1: Gender and partner status are not independent

The test statistic will be distributed $\chi^2$ with (2-1)(2-1) = 1 degree of freedom and
the decision rule will be to reject if $\chi^2 > \chi^2_{0.01,1} = 6.6349$.

             Male        Female      Totals
Associate    115         61          176
             (140.89)    (35.11)
Partner      182         13          195
             (156.11)    (38.89)
Totals       297         74          371
* Values in brackets are expected outcomes under independence

The test statistic is

$\chi^2 = \frac{(115 - 140.89)^2}{140.89} + \frac{(182 - 156.11)^2}{156.11} + \frac{(61 - 35.11)^2}{35.11} + \frac{(13 - 38.89)^2}{38.89}$

$= 4.758 + 4.294 + 19.091 + 17.236 = 45.379$

As $\chi^2 = 45.379 > 6.6349$ we reject the null and hence conclude that there is
evidence that gender and partner status are not independent.

[1 mark each for correct hypotheses; decision rule; 2 marks for test statistic & 1 mark
for conclusion]
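
The expected counts and the test statistic can be recomputed from the observed table. The sketch below uses only the standard library; the dictionary layout and names are assumptions made for illustration. The critical value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal, so this shortcut applies to the 2x2 case only. The statistic differs slightly from 45.379 because the expected counts are not rounded to two decimals.

```python
from statistics import NormalDist

# Observed counts from the contingency table (status, gender).
observed = {
    ("Associate", "Male"): 115, ("Associate", "Female"): 61,
    ("Partner", "Male"): 182, ("Partner", "Female"): 13,
}

# Row and column totals.
row_totals, col_totals = {}, {}
for (status, gender), count in observed.items():
    row_totals[status] = row_totals.get(status, 0) + count
    col_totals[gender] = col_totals.get(gender, 0) + count
n = sum(observed.values())

# Expected count under independence: row total * column total / n.
chi_sq = 0.0
for (status, gender), count in observed.items():
    expected = row_totals[status] * col_totals[gender] / n
    chi_sq += (count - expected) ** 2 / expected

# 1 df only: the chi-square 1% critical value equals the squared z_{0.005}.
chi_crit = NormalDist().inv_cdf(1 - 0.01 / 2) ** 2

print(f"chi-square = {chi_sq:.3f}, critical value = {chi_crit:.4f}")
```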

Question 3 [13 marks in total]
(i) [4 marks] H0: p = 0.5; H1: p > 0.5

Decision rule: Reject H0 if $\hat{p} > 0.55$

As n = 100 is large assume the normal approximation to the binomial:

$\hat{p} \sim N\left(p,\ \frac{p(1-p)}{n}\right)$

The implied significance level is:

$\alpha = P(\hat{p} > 0.55 \mid p = 0.5) = P\left(Z > \frac{0.55 - 0.5}{\sqrt{0.5 \times 0.5/100}}\right) = P(Z > 1) = 0.1587$

[1 mark each for hypotheses, mentioning normal approximation to the binomial,
the sampling distribution & $\alpha$.]

(ii) [2 marks] $\hat{p} = \frac{60}{100} = 0.6 > 0.55$

Hence according to management's decision rule there is sufficient evidence to
reject the null hypothesis and proceed with the introduction of the new upgrade.

[1 mark each for the point estimate & the test outcome.]

(iii) [2 marks] A Type II error would occur here if the null hypothesis of 50% of
customers being willing to pay was not rejected when in fact the percentage of
customers willing to pay was greater than 50%.

$P(\text{Type II error}) = P(\hat{p} < 0.55 \mid p = 0.54) = P\left(Z < \frac{0.55 - 0.54}{\sqrt{0.54 \times 0.46/100}}\right) = P(Z < 0.2) = 0.5793$

[1 mark each for explanation (must be in terms of current problem & not just a
generic definition) & Type II error.]
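
Both error probabilities implied by the rule "reject if the sample proportion exceeds 0.55" follow from the same normal approximation, evaluated at different values of p. A minimal sketch, assuming n = 100 and the alternative p = 0.54 used above; the helper `prob_p_hat_below` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

n, cutoff = 100, 0.55
std_normal = NormalDist()

def prob_p_hat_below(cutoff, p, n):
    """P(p_hat < cutoff) when p_hat is approximately N(p, p(1-p)/n)."""
    z = (cutoff - p) / sqrt(p * (1 - p) / n)
    return std_normal.cdf(z)

# Type I error: reject (p_hat > 0.55) although p = 0.5.
alpha = 1 - prob_p_hat_below(cutoff, p=0.5, n=n)

# Type II error: fail to reject (p_hat < 0.55) although p = 0.54.
beta = prob_p_hat_below(cutoff, p=0.54, n=n)

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```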

(iv) [3 marks]
A 99% confidence interval for population proportion p is:

$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = 0.6 \pm 2.575 \times 0.049 = (0.474,\ 0.726)$

The CI includes 0.5 and so if this was the basis for the test there would be
insufficient evidence to reject the null hypothesis.

Naturally the use of a CI implies a two-tailed test whereas we used a one-tailed test
previously but the main difference is the much smaller significance level being
used with the CI in comparison with before. A smaller significance level implies a
wider CI.

[1 mark each for CI, interpretation & reason for difference (either reason is
acceptable).]
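
The interval in part (iv) can be reproduced as follows. This is a sketch of the usual large-sample interval for a proportion, with illustrative variable names.

```python
from math import sqrt
from statistics import NormalDist

p_hat, n, conf_level = 0.6, 100, 0.99

# Two-sided critical value z_{alpha/2}, about 2.576 for a 99% interval.
z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
std_error = sqrt(p_hat * (1 - p_hat) / n)   # about 0.049

lower = p_hat - z * std_error
upper = p_hat + z * std_error

# The null value 0.5 lies inside the interval, so it would not be rejected.
contains_null = lower <= 0.5 <= upper

print(f"99% CI: ({lower:.3f}, {upper:.3f}), contains 0.5: {contains_null}")
```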

(v) [2 marks]
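CI width is:

$2 z_{\alpha/2}\sqrt{\frac{p(1 - p)}{n}} = 0.08 \Rightarrow 1.96\sqrt{\frac{0.25}{n}} = 0.04 \Rightarrow n = \left(\frac{1.96\sqrt{0.25}}{0.04}\right)^2$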

Hence n=600.25 or n=601 and the firm would have needed to interview far more
customers than the 100 they actually interviewed.

If the confidence level is changed to 0.8 the above method remains the same except
that the critical value changes from 1.96 to 1.285 and n=258, still much more than
the 100 used.

Note that before sampling takes place p would be unknown to management and so
p=0.5 has been assumed.

[1 mark for each n.]
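
The sample-size formula can be evaluated for both confidence levels. A hedged sketch follows; `required_n` is a hypothetical helper, and the critical values 1.96 and 1.285 are those used in the solution. Note that strictly rounding up the second result gives 259, whereas the solution reports 258 after rounding to the nearest integer; under the marking instructions this difference is not penalized.

```python
from math import ceil, sqrt

def required_n(half_width, z, p=0.5):
    """Sample size giving a CI half-width of `half_width` for a proportion.

    p = 0.5 is the conservative choice when p is unknown before sampling.
    """
    return (z * sqrt(p * (1 - p)) / half_width) ** 2

n_95 = required_n(half_width=0.04, z=1.96)    # about 600.25
n_80 = required_n(half_width=0.04, z=1.285)   # about 258.0

print(f"95% level: n = {n_95:.2f}, rounded up to {ceil(n_95)}")
print(f"80% level: n = {n_80:.2f}")
```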

Question 4 [15 marks in total]

(i) [2 marks]
$\beta_1$ is the population parameter that represents by how much income changes as
age increases by one year; equivalently, it is the slope of the population regression line
representing the relationship between income and age. Because $\beta_0$ is the
population mean income for men aged zero it is not a parameter of interest:

$E(\text{income} \mid \text{age} = 0) = \beta_0$

[1 mark each for interpretation. Must be in terms of population parameters and
not estimates to get full marks.]

(ii) [3 marks] As the estimate of $\beta_1$ is 892.1, income for this group of men is
predicted to increase on average by \$892.1 for each extra year. This effect is
significantly different from zero because the test statistic, 14.44 is greater in
absolute value than say the critical value of 2.576 if we chose a significance level of
say 0.01, (or p-value is 0.0000 and hence < typical choices of significance level such
as 0.01).

Normal critical values have been used because the large sample size allows us to
confidently invoke the central limit theorem and assume normality for the test
statistic.

[1 mark each for interpretation, assumption and test]

(iii) [2 marks] The P-value reported by EXCEL is that for testing H0: $\beta_i = 0$ versus H1:
$\beta_i \neq 0$. For the intercept we have:

P-value = $2 \times P(b_0 > 9.43 \times 2637.4) = 0.0000$

Thus at any significance level greater than 0.0000, and hence at conventional choices
such as 0.01 or 0.05, we would reject H0 and conclude that the coefficient is
significantly different from zero.

[1 mark each for explanation and interpretation]
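
The reported t-statistic and P-value for the intercept can be checked from the estimate and its standard error. This sketch assumes the intercept estimate 24874.4 that appears in part (viii), the standard error 2637.4 from the expression above, and a normal approximation for the test statistic. It uses `math.erfc` because `1 - cdf(9.43)` would round to exactly zero in floating point.

```python
from math import erfc, sqrt

b0, se_b0 = 24874.4, 2637.4   # intercept estimate and its standard error

t_stat = b0 / se_b0           # about 9.43

# Two-sided P-value under a standard normal: 2 * P(Z > |t|) = erfc(|t| / sqrt(2)).
p_value = erfc(abs(t_stat) / sqrt(2))

print(f"t = {t_stat:.2f}, two-sided P-value = {p_value:.2e}")
```

The P-value is far below 0.0001, which is why EXCEL displays it as 0.0000.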

(iv) [3 marks] The standard error is the standard error of the estimate (or
regression) which is the estimate of the standard deviation of the disturbance in
the regression model. R Square is the regression $R^2$: the proportion of total
variation in the dependent variable explained by the regression. The value of 0.035
means that only 3.5% of the variation in income is explained by the explanatory
variable age and hence the fit is not very good.

[1 mark each for definitions and interpretation]

(v) [1 mark] The simple correlation between income and age is positive because the
slope coefficient is positive. As $R^2 = 0.035$ we know the correlation, while
positive, is not very large and in fact the simple correlation is r = 0.187. (As $R^2 = r^2$ in
simple linear regression.)

[Sufficient to say the correlation is small and positive for the full mark]
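
The correlation can be recovered from the regression output in one step. A minimal sketch, assuming simple linear regression so that the correlation is the square root of R Square carrying the sign of the slope estimate; the names are illustrative.

```python
from math import copysign, sqrt

r_squared = 0.035   # R Square from the regression output
slope = 892.1       # estimated slope, used only for its sign

# In simple linear regression |r| = sqrt(R^2) and r has the sign of the slope.
r = copysign(sqrt(r_squared), slope)

print(f"r = {r:.3f}")
```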

(vi) [1 mark] An unbiased estimator is one whose expectation equals the parameter it
is estimating. Here that means for the OLS estimator b1:

$E(b_1) = \beta_1$

(vii) [2 marks] Education and gender are two independent variables likely to explain
some of the variation in income. In order to better isolate the impact of age on
income, free from these two possible confounding variables, they have controlled
for their effect by making the sample homogeneous in terms of education and
gender.

(viii) $\hat{y} = 24874.4 + 892.1 \times 70$ = \$87,321.4

We know from the formula for the forecast interval that the interval is wider the
larger the estimated standard deviation of the disturbance (the standard error of
the estimate) and the further from the sample mean we predict. Here we
have seen that the model does not fit well, so the standard error of the
estimate is large. We are also predicting for a 70 year old, which is outside the
sample and hence very far from the sample mean age.

[1 mark each for the calculation and for one of the two reasons why the prediction is
likely to be inaccurate.]
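
The point prediction is a direct evaluation of the fitted line. A short sketch, with the coefficient estimates taken from the solution above and `predict_income` as a hypothetical helper name:

```python
def predict_income(age, intercept=24874.4, slope=892.1):
    """Point prediction from the fitted simple regression of income on age."""
    return intercept + slope * age

predicted_70 = predict_income(70)
print(f"Predicted income at age 70: ${predicted_70:,.1f}")
```

This gives only the point prediction. It says nothing about the width of the forecast interval, which is the source of the inaccuracy discussed above.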

Question 5 [11 marks in total]

(i) [2 marks] A comparison of the sample means of HRINCOME does not control for
other confounding factors that might have an impact on hourly income; i.e. it might be a
biased estimate of the difference due solely to gender.

While the difference in means is large in an economic sense the difference might
not be statistically significant. Without further information we can't determine this.

[2 marks for either explanation]

(ii) [3 marks] H0: $\beta_4 = 0$; H1: $\beta_4 < 0$

Given the large sample size we can invoke the central limit theorem and assume
normality for the test statistic. Using say $\alpha = 0.05$ the rejection region is t-stat < -1.645
and as t-stat = -0.06 we cannot reject the null hypothesis that $\beta_4 = 0$; i.e. there
is insufficient evidence to indicate the presence of discrimination on the basis of
gender after having controlled for other factors affecting incomes.

Alternatively note that the reported P-value is 0.955 and hence for a one-sided test the
P-value is 0.478 > 0.05 and again we do not reject the null hypothesis that $\beta_4 = 0$.

[1 mark each for hypotheses (should be one-sided), decision rule (including CLT
justification) and conclusion.]
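
Converting the two-sided P-value reported by the software into a one-sided P-value depends on whether the sign of the t-statistic agrees with the alternative. The sketch below makes that rule explicit; `one_sided_p` is a hypothetical helper, and a symmetric sampling distribution is assumed.

```python
def one_sided_p(two_sided_p, t_stat, alternative):
    """One-sided P-value from a two-sided P-value (symmetric distribution).

    alternative: "less" for H1: beta < 0, "greater" for H1: beta > 0.
    """
    half = two_sided_p / 2
    if alternative == "less":
        return half if t_stat < 0 else 1 - half
    if alternative == "greater":
        return half if t_stat > 0 else 1 - half
    raise ValueError("alternative must be 'less' or 'greater'")

# FEMALE coefficient: t-stat = -0.06, reported two-sided P-value = 0.955.
p_female = one_sided_p(0.955, t_stat=-0.06, alternative="less")
reject_null = p_female < 0.05

print(f"one-sided P-value = {p_female:.4f}, reject H0: {reject_null}")
```

Simply halving the two-sided value is valid here only because the estimate is negative, which is the direction specified by H1.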

(iii) [3 marks] The coefficient estimate for $\beta_1$ is 1.948 with an associated t-stat = 5.38
and p-value = 0.000.

The estimate indicates that for every extra year of experience the average hourly
income increases by \$1.948 holding all other factors constant. That this is a
positive impact is as expected; more experienced lawyers are expected to be paid
more.

The effect is statistically significant. Because p-value=0.000, the null hypothesis of
no impact would be rejected at all significance levels greater than 0.0005.

In terms of economic significance, an increase of \$1.948 per hour per extra year
seems like a reasonably large amount compared to sample averages: it represents
an increase of 3.3% compared with \$59 for males, or 5.7% compared with
\$34 for females. Alternatively, 13-14 years of experience approximately equates to the
difference associated with being a partner (1.948 × 13 = 25.324 and 1.948 × 14 = 27.272,
compared with 26.564).

[1 mark each for interpretation, statistical significance and something sensible for
economic significance with the emphasis on understanding that it is distinct from
statistical significance.]
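
The economic-significance comparisons are simple ratios. The sketch below assumes the sample mean hourly incomes of \$59 and \$34 quoted above, and reads 26.564 as the estimated difference associated with partner status; that reading is an assumption, since the regression output is not reproduced here.

```python
exp_coef = 1.948                                  # extra $/hour per year of experience
mean_hourly = {"male": 59.0, "female": 34.0}      # sample mean hourly incomes
partner_diff = 26.564                             # assumed partner difference ($/hour)

# Experience effect as a percentage of each group's mean hourly income.
pct_of_mean = {group: 100 * exp_coef / mean for group, mean in mean_hourly.items()}

# Years of experience needed to match the partner difference.
years_to_match = partner_diff / exp_coef

for group, pct in pct_of_mean.items():
    print(f"{group}: {pct:.1f}% of mean hourly income per extra year")
print(f"Years of experience equal to partner difference: {years_to_match:.1f}")
```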

(iv) [2 marks] H0: $\beta_3 = 0$; H1: $\beta_3 > 0$
Using $\alpha = 0.01$ the rejection region is t-stat > 2.325 and the t-stat = 3.69. Thus we
reject the null hypothesis that $\beta_3 = 0$. Alternatively the p-value for the 2-tailed test is
< 0.0005 and hence the p-value for the one-tailed test (which is half the 2-tailed
p-value) is < 0.01 and again we reject the null hypothesis.

[1 mark each for test statistic and conclusion.]

(v) [1 mark] Predicted hourly income with EXP=10, SIZE=20, PARTNER=0, FEMALE=0,
is given by.