Final Exam S1 2018 Solutions

Venue ____________________
Student Number |__|__|__|__|__|__|__|__|
Research School of Finance, Actuarial Studies & Statistics
EXAMINATION
Semester 1 – Final Examination, 2018
STAT1008 – QUANTITATIVE RESEARCH METHODS
Writing Time: 180 mins

Reading Time: 15 mins
Exam Conditions:
Central Examination
Students must return the examination paper at the end of the examination
This examination paper is not available to the ANU Library archives
Materials Permitted In The Exam Venue:

(No electronic aids are permitted e.g. laptops, phones)
Non-programmable calculator
One (1) A4 sheet of notes, written on both sides.
Dictionaries (must not contain material added by the student)
Materials to Be Supplied To Students:

Scribble paper
Instructions to Students:
• Please write your student number in the space provided at the top of this page.
• Attempt ALL questions. There are 5 questions, not equally weighted.
• All answers are to be written on the exam paper.
• Please hand in the exam paper before you leave the room.
• To ensure full marks, show all the steps in working out your solution. Marks may be deducted for
  failure to show appropriate calculations or formulae.
• Answers to all numerical questions should be given to 2 decimal places unless otherwise specified.
• If you need additional space, please use the rear of the page and state clearly on the front that you
  have done so.
        Q1    Q2    Q3    Q4     Q5      Total
Pages   2-3   4-6   7-8   9-11   12-13
Marks   18    24    12    26     20      100
Score
Semester 1 - Final, 2018 STAT1008 – Quantitative Research Methods
Question 1 [18 Marks]
a. Suppose that a 95% confidence interval for $\mu$ is (54.8, 60.8). Which of the following is most
likely the p-value for the test of $H_0: \mu = 56$ versus $H_A: \mu \neq 56$? [1 marks]
A. 0.031 B. 0.001 C. 0.016 D. 0.231
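The link between the interval and the test in part a can be checked mechanically. The following is a minimal sketch, assuming the usual duality between a 95% confidence interval and a two-sided test at the 5% level; the variable names are illustrative only.

```python
# Duality between a 95% CI and a two-sided test at the 5% level:
# if the null value lies inside the interval, the p-value exceeds 0.05.
ci_low, ci_high = 54.8, 60.8
null_value = 56
options = {"A": 0.031, "B": 0.001, "C": 0.016, "D": 0.231}

inside = ci_low <= null_value <= ci_high

# Keep only the options whose size agrees with where the null value falls.
consistent = [label for label, p in options.items() if (p > 0.05) == inside]
print(inside, consistent)
```

Since 56 lies inside the interval, only a p-value above 0.05 is consistent with it.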
b. Decreasing the significance level of a hypothesis test (say, from 5% to 1%) will cause the p-
value of an observed test statistic to: [1 marks]
A. Increase B. Decrease C. Stay the same
Use the following to answer parts c to e.

Consider a test of $H_0: \mu_1 = \mu_2$ versus $H_A: \mu_1 > \mu_2$ using the sample results $\bar{x}_1 = 82.3$, $s_1 = 7.54$ with
$n_1 = 28$ and $\bar{x}_2 = 78.6$, $s_2 = 8.16$ with $n_2 = 24$.
c. What is the test statistic for this test? [1 marks]
A. -1.688 B. 1.688 C. 4.74 D. 0.7701
See additional file for solution
d. What are the degrees of freedom for this test? [1 marks]

A. 4 B. 23 C. 25 D. 27
See additional file for solution
e. Would you reject the null hypothesis for the above test at a significance level of 5%?
[2 marks]
The critical value of a t-distribution with 23 degrees of freedom at the 5% level of significance is
1.714. Since the test statistic 1.688 < 1.714, there is not enough evidence to reject
the null hypothesis at the 5% level.
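The numbers quoted in parts c to e can be reproduced from the summary statistics. This is a sketch under two assumptions: the unpooled standard error is used, and the degrees of freedom are taken conservatively as the smaller sample size minus one (which matches the 23 quoted above). The critical value 1.714 is copied from the solution, not computed.

```python
import math

# Summary statistics given for parts c to e.
x1, s1, n1 = 82.3, 7.54, 28
x2, s2, n2 = 78.6, 8.16, 24

se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # unpooled standard error
t_stat = (x1 - x2) / se
df = min(n1, n2) - 1                      # conservative degrees of freedom
t_crit = 1.714                            # upper 5% point of t(23), from tables

reject = t_stat > t_crit                  # one-sided test, H_A: mu1 > mu2
print(round(t_stat, 3), df, reject)
```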
f. Among international applicants to an Australian university, the average TOEFL score was 269,
the SD was about 11, and the highest score was 285. Do you think the TOEFL scores follow a
normal distribution? [2 marks]
The TOEFL scores do not seem to follow a normal distribution. For a normal distribution
about 95% of the data lies within 2 SD of the mean, so some scores more than 2 SD above the mean
would be expected. It is therefore very unlikely that the maximum score of 285 would be only
about 1.5 SD above the mean, since (285 - 269)/11 = 1.45.
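To see how implausible the observed maximum is under a normal model, the share of applicants expected above 285 can be computed. This is an illustrative sketch that assumes the scores are exactly normal with the quoted mean and SD.

```python
from statistics import NormalDist

mean, sd, top_score = 269, 11, 285

z_max = (top_score - mean) / sd
# Share of applicants a normal model would place above the observed maximum.
share_above = 1 - NormalDist(mean, sd).cdf(top_score)
print(round(z_max, 2), round(share_above, 3))
```

A normal model would put roughly 7% of applicants above the highest score actually observed, which is why normality looks doubtful.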
It is generally believed that the heights of adult males in Australia are approximately normally
distributed with mean 70 inches and standard deviation 3 inches and that the heights of adult females in
Australia are also approximately normally distributed with mean 64 inches and standard deviation 2.5
inches. ANU is considering custom ordering beds for their dorm rooms. Answer the following
questions about the lengths of beds in dorm rooms at ANU.
g. The beds that the university currently purchases are 75 inches long. What proportion of males
will be able to fit on the bed while lying perfectly straight? [2 marks]
Male Height $X_M \sim N(70, 3^2)$. Thus, the proportion of males who will fit in the current beds is:
$P(X_M < 75) = P(Z < (75 - 70)/3) = P(Z < 1.67) = 0.9525$. Proportion is 0.9525 or 95.25%
h. Should ANU be concerned that females will not fit in the 75-inch beds? Numerically justify
your answer. [2 marks]
Female Height $X_F \sim N(64, 2.5^2)$. Thus, the proportion of females who will not fit in the current beds is:
$P(X_F > 75) = P(Z > (75 - 64)/2.5) = P(Z > 4.4) \approx 0$. ANU should not be concerned, since essentially all
females will fit the current beds.
i. ANU plans on ordering custom sized beds such that 99% of male students are expected to fit
in them when lying perfectly straight. What length beds should they order? Round your answer
to the nearest inch. [2 marks]
If 99% of males are expected to fit in the new beds, then the length of the new bed should be the 99th
percentile of male height: $70 + 2.326 \times 3 = 76.98$, i.e. 77 in.
j. ANU decides it is too expensive to replace all the beds. Suppose ANU has 2,150 beds all of
which are 75 inches long. How many beds should they replace? You may assume that only
those males taller than 75 inches will receive the longer beds and that females make up half of
the population that will need a dorm room bed. [4 marks]
The proportion of males who will need a bed longer than 75 inches is 1 - 0.9525 = 0.0475 or 4.75% (from
part g). Since only half the population is male, we expect there are about 2150/2 = 1075 males.
Hence, the number of beds that needs to be replaced will be $1075 \times 0.0475 = 51.06$, rounded up to a whole bed.
Total number of beds needing replacement is 52.
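Parts g to j can be checked with the normal distribution in the Python standard library. This is a sketch, not the marking scheme: it uses exact normal probabilities rather than a z-table rounded to two decimals, so the male proportion comes out as about 0.9522 instead of 0.9525, and the final bed count is unchanged.

```python
import math
from statistics import NormalDist

male = NormalDist(mu=70, sigma=3)
female = NormalDist(mu=64, sigma=2.5)
bed_length = 75

p_male_fits = male.cdf(bed_length)              # part g
p_female_too_tall = 1 - female.cdf(bed_length)  # part h
length_99 = male.inv_cdf(0.99)                  # part i, 99th percentile

# Part j: half of the 2,150 beds are assumed to go to males.
n_males = 2150 / 2
beds_to_replace = math.ceil(n_males * (1 - p_male_fits))

print(round(p_male_fits, 4), round(length_99), beds_to_replace)
```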
Question 2 [24 Marks]
A survey was conducted amongst 438 students in an introductory statistics class. The questions asked
were whether the student has ever smoked, whether they have ever consumed alcohol and their gender.
The responses to the survey are given in the tables below.
             Female  Male                 Female  Male
Non-Smoker   166     168     No Alcohol   58      39
Smoker       32      72      Alcohol      140     201
a. Construct a 95% confidence interval for the difference in proportion of females and males who
have responded that they have smoked in the past. [4 marks]
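$\hat{p}_F = 32/198 = 0.1616$ and $\hat{p}_M = 72/240 = 0.3000$. Sample sizes are $n_F = 198$ and $n_M = 240$.
Standard error is $SE = \sqrt{\hat{p}_F(1 - \hat{p}_F)/n_F + \hat{p}_M(1 - \hat{p}_M)/n_M} = \sqrt{0.000684 + 0.000875} = 0.0395$
95% Confidence Interval is: $(\hat{p}_F - \hat{p}_M) \pm 1.96 \times SE = -0.1384 \pm 0.0774 = (-0.22, -0.06)$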
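The interval in part a can be reproduced directly from the table counts. The sketch below assumes the standard large-sample interval for a difference in proportions (females minus males) with the unpooled standard error.

```python
import math
from statistics import NormalDist

# Counts from the smoking table.
smokers_f, n_f = 32, 166 + 32
smokers_m, n_m = 72, 168 + 72

p_f, p_m = smokers_f / n_f, smokers_m / n_m
se = math.sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

diff = p_f - p_m
ci = (diff - z_star * se, diff + z_star * se)
print(round(ci[0], 2), round(ci[1], 2))
```

Both endpoints are negative, so the interval excludes zero and points to a lower smoking proportion among females.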
b. Test at 10% level of significance, if there is evidence that the proportion of females who have
never consumed alcohol differ significantly from the proportion of males who have never
consumed alcohol. [4 marks]
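$\hat{p}_F = 58/198 = 0.2929$ and $\hat{p}_M = 39/240 = 0.1625$. Sample sizes are $n_F = 198$ and $n_M = 240$.
$H_0: p_F = p_M$ vs $H_A: p_F \neq p_M$. Test Statistic: $z = (\hat{p}_F - \hat{p}_M)/SE = 0.1304/0.0402 = 3.25$, using the unpooled standard error.
The area beyond 3.25 under a Standard Normal distribution is 0.00058, so the two-sided p-value is about 0.0012.
This is well below 0.10, so there is strong evidence to reject the Null hypothesis in favour of the alternative.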
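The test statistic in part b depends slightly on which standard error is used. The sketch below computes both the unpooled and the pooled version so they can be compared; which one the original working used is an assumption, and the helper name `two_sided_p` is illustrative.

```python
import math
from statistics import NormalDist

# Counts of students who have never consumed alcohol.
x_f, n_f = 58, 198
x_m, n_m = 39, 240

p_f, p_m = x_f / n_f, x_m / n_m
diff = p_f - p_m

se_unpooled = math.sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)
p_pool = (x_f + x_m) / (n_f + n_m)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_f + 1 / n_m))

z_unpooled = diff / se_unpooled
z_pooled = diff / se_pooled

def two_sided_p(z):
    """Two-sided p-value from the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

print(round(z_unpooled, 2), round(z_pooled, 2))
print(two_sided_p(z_unpooled) < 0.10, two_sided_p(z_pooled) < 0.10)
```

Either way the statistic is a little above 3.2 and the null hypothesis is rejected at the 10% level.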
c. If a random student is picked from the sample what is the probability that this student is a male
smoker? [2 marks]
There are 438 students in all and 72 of these are male smokers. So, the probability that a student
picked at random is a male smoker: $72/438 = 0.16$
d. If a random student is picked from the sample what is the probability that the student is a
smoker and has consumed alcohol? [2 marks]
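Picking a random smoker: $P(S) = 104/438 = 0.2374$ and picking a random consumer of alcohol:
$P(A) = 341/438 = 0.7785$. The two tables do not give the joint counts, so the two events are assumed independent and hence the required
probability is: $P(S \cap A) = 0.2374 \times 0.7785 = 0.18$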
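The two probabilities in parts c and d follow from the table counts. In this sketch the joint probability in part d relies on the independence assumption stated in the solution, since the tables do not cross-classify smoking and alcohol.

```python
total = 438

# Part c: male smokers come straight from the smoking table.
male_smokers = 72
p_male_smoker = male_smokers / total

# Part d: marginal counts from each table.
smokers = 32 + 72
drinkers = 140 + 201
p_smoker = smokers / total
p_drinker = drinkers / total

# Assumption: smoking and alcohol consumption are independent.
p_both_if_independent = p_smoker * p_drinker
print(round(p_male_smoker, 2), round(p_both_if_independent, 2))
```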
In the same survey information was collected on each student’s height and weight. This information
was used to evaluate the BMI (body mass index) for each student. The BMI can be used to assess,
however imperfectly, the health of an individual. The table below gives us summary information for the
BMI variable for the survey respondents.
             n      x̄       s
Non-Smoker   334    22.03   3.52
Smoker       104    22.73   3.32
e. What effect will increasing the sample size have on the center and spread of the BMI variable?
[4 marks]
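Increasing the sample size will not have much effect on the center of the BMI variable.
It will also not systematically change the standard deviation of BMI itself, which estimates the
population spread. What decreases is the standard error of the sample mean, $s/\sqrt{n}$, as there
will be less variability in the mean of a larger sample.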
f. It is suspected that the people who smoke tend to have a lower weight, hence they would also
have a lower BMI compared to non-smokers with the same height. Setup and carry out an
appropriate hypothesis test to verify this claim. [4 marks]
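Null: $H_0: \mu_S = \mu_{NS}$ vs Alternative: $H_A: \mu_S < \mu_{NS}$, where $\mu_S$ and $\mu_{NS}$ are the mean BMI of smokers and non-smokers.
Standard Error: $SE = \sqrt{3.32^2/104 + 3.52^2/334} = 0.378$
Test Statistic: $t = (22.73 - 22.03)/0.378 = 1.85$
Comparing the test statistic to a t-distribution with $\min(104, 334) - 1 = 103$ degrees of freedom.
From the t-distribution tables at 100 d.f. the test statistic is > 1.664 but < 1.984. Because the
alternative is lower-tailed, the p-value is the area below 1.85, which lies between 0.95 and 0.975.
Hence, we cannot reject the null hypothesis at 5% level of significance. Thus, there is not
enough evidence to suggest that smokers have a significantly lower BMI than non-smokers.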
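The test in part f can be retraced from the BMI summary table. This sketch assumes the unpooled standard error and the conservative degrees of freedom (smaller sample size minus one). The standard library has no t-distribution, so the lower-tail p-value is approximated with the standard normal curve, which is close at about 100 degrees of freedom.

```python
import math
from statistics import NormalDist

# Summary statistics from the BMI table.
n_ns, mean_ns, sd_ns = 334, 22.03, 3.52  # non-smokers
n_s, mean_s, sd_s = 104, 22.73, 3.32     # smokers

se = math.sqrt(sd_s**2 / n_s + sd_ns**2 / n_ns)
t_stat = (mean_s - mean_ns) / se         # smokers minus non-smokers
df = min(n_s, n_ns) - 1

# Lower-tailed test (H_A: mu_S < mu_NS): p-value is the area BELOW t_stat.
# Assumption: normal approximation to the t-distribution.
p_approx = NormalDist().cdf(t_stat)
print(round(se, 3), round(t_stat, 2), df, round(p_approx, 3))
```

The statistic is positive, the opposite direction to the claim, which is why the p-value is close to 1 rather than close to 0.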
g. What assumptions are you making in carrying out the above test? [2 marks]
The main assumptions are that the two samples are independent random samples, and that the
underlying population BMIs are approximately normally distributed or that we have large enough
sample sizes (here 334 and 104).
h. One of the researchers suggests that gender may also be a contributing factor to the difference
in BMI. If you were to look at data on males and females separately would you expect to see a
result different from the one derived in part f? Please explain why or why not. [2 marks]
In our sample, the proportion of males who smoke is significantly larger than the proportion of
females who smoke (the CI from part a suggests this). Also, males tend to be taller and therefore would
have a higher weight. Thus, the BMI for males would typically be higher than that for females.
Combining all of the information above, our sample of smokers has a higher proportion of males and
our sample of non-smokers has a higher proportion of females. This could be the reason why the
average BMI for non-smokers is lower than the average BMI for smokers. So, if we were to look at
males and females separately, there is a good chance of observing a different result.
Question 3 [12 Marks]
A multiple regression model ‘Model 1’ is fit to assess the ‘Time’ taken to commute to work using
various forms of public transport. Two predictors, ‘Distance’ of travel in kms and ‘Age’ of the
individual in years, were used in the regression model and the output from the model fitting is given
below.
Summary of Model 1
Predictor    Coef       SE Coef    T         P
Constant     5.08731    1.21706    4.18      0.000
Distance     1.09934    0.03306    33.258    0.000
Age          0.03190    0.02575    1.239     0.216
S = 7.937    R-Sq = 69.0%    R-Sq(adj) = 68.9%
Analysis of Variance
Source            DF     SS       MS       F         P
Regression        2      69770    34885    553.78    0.000
Residual Error    497    31308    63
A second model ‘Model 2’ is fit where Time is only regressed on Distance. The output from the model
fit is given below.
Summary of Model 2
Predictor    Coef       SE Coef    T         P
Constant     6.40819    0.58764    10.90     0.000
Distance     1.09931    0.03307    33.24     0.000
S = 7.941    R-Sq = 68.9%    R-Sq(adj) = 68.9%
Analysis of Variance
Source            DF     SS       MS       F         P
Regression        1      69673    69673    1104.8    0.000
Residual Error    498    31405    63
Mean(Distance) 14.156    SD(Distance) 10.75
a. Is ‘Age’ a significant predictor in this multiple regression model? What information have you
used to come to your conclusion? [2 marks]
Based on the summary of Model 1, the t-test for Age has a p-value of 0.216, which is greater than 0.05.
This suggests that the variable Age is not a significant predictor once Distance is in the model.
b. Which model should you choose between the two models fit? Justify your choice. [2 marks]
Based on the information provided, one would choose Model 2. Both models explain a
similar proportion of variance (R-Sq of 69.0% vs 68.9%, with the same adjusted R-Sq of 68.9%).
The standard error of residuals is quite similar too (7.937 vs 7.941).
Age is a non-significant predictor in the Model 1 summary and so we should choose Model 2, which
omits it. It is also the more parsimonious model.
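The comparison in part b can be backed up from the two ANOVA tables. The sketch below recomputes the adjusted R-Sq of each model and a partial F statistic for adding Age to Model 2; the partial F test is an extra check that is not part of the original solution, and it should roughly equal the square of the t statistic for Age.

```python
# ANOVA entries copied from the two outputs.
ss_reg1, ss_res1, df_res1 = 69770, 31308, 497  # Model 1: Distance + Age
ss_reg2, ss_res2, df_res2 = 69673, 31405, 498  # Model 2: Distance only

ss_total = ss_reg1 + ss_res1  # total sum of squares, same for both models
n = df_res1 + 2 + 1           # residual df + predictors + intercept

def adj_r2(ss_res, df_res):
    """Adjusted R-squared from residual SS and residual degrees of freedom."""
    return 1 - (ss_res / df_res) / (ss_total / (n - 1))

adj1 = adj_r2(ss_res1, df_res1)
adj2 = adj_r2(ss_res2, df_res2)

# Partial F for adding Age: drop in residual SS over Model 1's residual MS.
partial_f = (ss_res2 - ss_res1) / (ss_res1 / df_res1)
print(n, round(adj1, 3), round(adj2, 3), round(partial_f, 2))
```

Both adjusted values round to 68.9%, so adding Age buys essentially nothing.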
c. Construct an appropriate 95% interval for the average time taken to commute to work for
individuals whose distance of travel is 20kms. Comment whether this is a prediction or a
confidence interval. [4 marks]
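We are trying to evaluate a 95% confidence interval for the mean commute time at Distance = 20, using Model 2.
Predicted value is: $\hat{y} = 6.40819 + 1.09931(20) = 28.39$
95% Confidence Interval is: $\hat{y} \pm t^* \, S \sqrt{1/n + (x^* - \bar{x})^2 / ((n - 1) s_x^2)}$, with $n = 500$, $S = 7.941$, $\bar{x} = 14.156$, $s_x = 10.75$ and $t^* \approx 1.96$
Confidence Interval: (27.60, 29.19)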
d. Construct an appropriate 95% interval for the time taken to commute to work for an individual
whose distance of travel is 10kms. Comment whether this is a prediction or a confidence
interval. [4 marks]
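We are trying to evaluate a 95% prediction interval for the commute time of one individual at Distance = 10, using Model 2.
Predicted value is: $\hat{y} = 6.40819 + 1.09931(10) = 17.40$
95% Prediction Interval is: $\hat{y} \pm t^* \, S \sqrt{1 + 1/n + (x^* - \bar{x})^2 / ((n - 1) s_x^2)}$
Prediction Interval: (1.80, 33.00)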
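Both intervals in parts c and d come from the same simple-regression formula, differing only by the extra "1 +" under the square root for a single individual. The sketch below uses the Model 2 output and, as an assumption, replaces the t critical value on 498 degrees of freedom with the standard normal value (about 1.96). The function name `interval` is illustrative.

```python
import math
from statistics import NormalDist

# Model 2 output.
b0, b1 = 6.40819, 1.09931
s = 7.941                     # residual standard error
n = 498 + 2                   # residual df + 2 estimated coefficients
x_bar, s_x = 14.156, 10.75    # mean and SD of Distance

crit = NormalDist().inv_cdf(0.975)  # assumption: normal stand-in for t(498)

def interval(x0, predict):
    """95% interval at Distance x0: mean response, or one individual if predict."""
    y_hat = b0 + b1 * x0
    core = 1 / n + (x0 - x_bar) ** 2 / ((n - 1) * s_x**2)
    se = s * math.sqrt(core + (1 if predict else 0))
    return y_hat - crit * se, y_hat + crit * se

ci_20 = interval(20, predict=False)  # part c
pi_10 = interval(10, predict=True)   # part d
print([round(v, 2) for v in ci_20], [round(v, 2) for v in pi_10])
```

The confidence interval matches the reported (27.60, 29.19). The prediction interval lands within a few hundredths of the reported (1.80, 33.00); the small gap depends on the critical value and rounding used.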
Question 4 [26 Marks]
Output for a model to predict the GPAs of students at a small university based on their Math scores,
Verbal scores, and the number of hours spent watching television in a typical week is provided.
Summary of Regression Model
Predictor    Coef         SE Coef      T        P
Constant     1.8015       0.1842       9.78     0.000
Math         0.0010442    0.0002500    4.18     ???
Verbal       0.0014182    0.0002398    5.91     ???
TV           -0.014708    0.003269     -4.50    ???
S = ???    R-Sq = ????    R-Sq(adj) = 19.0%
Analysis of Variance
Source            DF     SS         MS        F        P
Regression        ???    ???        4.8295    35.90    0.000
Residual Error    ???    59.7304    ???
Total             447    ???
a. Interpret the coefficient of TV in context. [2 marks]
If the Math and Verbal scores remain unchanged, when the amount of TV watched in a
typical week increases by 1 hour, the predicted GPA decreases by 0.0147.
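The interpretation in part a can be illustrated with the fitted equation. In this sketch the Math and Verbal scores (600 and 550) and the TV hours are made-up values chosen only to show that the change in predicted GPA does not depend on them.

```python
def predicted_gpa(math_score, verbal_score, tv_hours):
    """Fitted regression equation from the output above."""
    return (1.8015
            + 0.0010442 * math_score
            + 0.0014182 * verbal_score
            - 0.014708 * tv_hours)

# Hypothetical student: same scores, one extra hour of TV per week.
base = predicted_gpa(600, 550, 10)
one_more_hour = predicted_gpa(600, 550, 11)
change = one_more_hour - base
print(round(change, 4))
```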
b. Use the output to determine how many students were included in the sample. [2 marks]
The degrees of freedom for the Total in the ANOVA table is 447. Since this value must be
$n - 1$, we have $n - 1 = 447$. This means that n = 448
Some of the information in the ANOVA table is missing. Evaluate the missing values to be able to
answer the following questions.
c. How many degrees of freedom should appear in the "Regression" row of the table? [1 marks]
Since there are 3 predictors the degrees of freedom for Regression is 3.
d. How many degrees of freedom should be listed in the "Residual Error" row? [1 marks]
The Residual degrees of freedom is $n - k - 1 = 448 - 3 - 1 = 444$
e. At the 1% significance level, is the model effective according to the ANOVA test. Include all
details of the test (i.e., Null & Alternative Hypothesis, Test Statistic and P-Value) [4 marks]
Null: $H_0: \beta_1 = \beta_2 = \beta_3 = 0$ vs Alternative: at least one of $\beta_i \neq 0$
The test statistic as per the ANOVA table is $F = 35.90$ and has a p-value of 0.000.
Since the p-value is below 0.01, there is strong evidence to reject the null hypothesis. This suggests the model is
effective in explaining GPA using the three predictors Math, Verbal and TV.
f. Which predictors are significant at the 5% level? What are their p-values? [6 marks]
All three predictors are significant based on the t-tests reported in the summary of
regression output. P-values are not given but the critical value from a t-distribution with 444
degrees of freedom is about 1.96.
Since all of the test statistics are greater than 1.96 in absolute value (4.18, 5.91 and 4.50), each of the
p-values is < 0.05. In fact, all the p-values are less than 0.0005.
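The claim that every p-value is below 0.0005 can be checked approximately. This sketch assumes that, with 444 degrees of freedom, the t-distribution is close enough to the standard normal for the comparison; it does not reproduce exact t-based p-values.

```python
from statistics import NormalDist

# t statistics from the regression summary.
t_values = {"Math": 4.18, "Verbal": 5.91, "TV": -4.50}

# Two-sided p-values under the normal approximation to t(444).
p_values = {
    name: 2 * (1 - NormalDist().cdf(abs(t)))
    for name, t in t_values.items()
}
for name, p in p_values.items():
    print(name, f"{p:.2e}")
```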
g. What is the standard error of the residuals? [2 marks]
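The standard error of residuals is 0.36678.
From the ANOVA output SSResiduals $= 59.7304$. Since the residual degrees of freedom are 444, therefore
MSResiduals $= 59.7304/444 = 0.1345$.
Standard error of residuals is: $S = \sqrt{59.7304/444} = 0.36678$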
h. The R2 for this model is missing in the provided output. Use the available information to
compute (round to three decimal places) and interpret R2 for this model. [4 marks]
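SSResidual $= 59.7304$
SSRegression $= 3 \times 4.8295 = 14.4885$
Thus SSTotal $= 14.4885 + 59.7304 = 74.2189$ and $R^2 = 14.4885/74.2189 = 0.195$
R2 = 19.52%
The regression model explains about 19.5% of the overall variance in the response variable.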
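All the missing ANOVA entries in parts b to h follow from the four values that are printed. The sketch below fills them in and cross-checks the result against the printed F statistic and adjusted R-Sq; the variable names are illustrative.

```python
import math

# Values printed in the output.
df_total = 447
k = 3              # predictors: Math, Verbal, TV
ms_reg = 4.8295
ss_res = 59.7304

n = df_total + 1                 # part b
df_reg = k                       # part c
df_res = n - k - 1               # part d
ss_reg = df_reg * ms_reg
ss_total = ss_reg + ss_res
ms_res = ss_res / df_res
s = math.sqrt(ms_res)            # part g, standard error of residuals
f_stat = ms_reg / ms_res         # should match the printed 35.90
r2 = ss_reg / ss_total           # part h
adj_r2 = 1 - ms_res / (ss_total / df_total)  # should match the printed 19.0%

print(n, df_res, round(s, 5), round(f_stat, 2), round(r2, 3), round(adj_r2, 3))
```

Recovering the printed F and adjusted R-Sq from the reconstructed entries is a useful sanity check on the filled-in table.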
i. A dotplot of the residuals and a scatterplot of the residuals versus the predicted values are
provided. Discuss whether the conditions for a multiple linear regression are reasonable by
referring to the appropriate plots. [4 marks]
The main assumption behind the multiple regression is that the residuals are normally
distributed with mean 0 and constant variance. The dotplot does not suggest any non-normality of the
residuals.
The scatterplot suggests that the residuals have mean 0 and a constant SD, meeting the
underlying assumption of homoscedasticity.
Question 5 [20 marks]
About 36% of all students in ANU are international students. Suppose we take a random sample of 200
ANU students. Let X represent the number of students in this sample that are international.
Round all answers to three decimal places and represent proportions as percentages.
a. Explain why X is a binomial random variable. [2 marks]
1. Number of trials is fixed ($n = 200$).
2. Probability of success is fixed ($p = 0.36$).
3. Since it is a random sample, we can assume independence between the trials.
b. What is the probability that exactly 50 people in the sample are international? [2 marks]
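For part b the binomial formula is $P(X = 50) = \binom{200}{50} (0.36)^{50} (0.64)^{150}$. A minimal sketch for evaluating it with the standard library is given here; the function name `binom_pmf` is illustrative.

```python
from math import comb

n, p = 200, 0.36

def binom_pmf(k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_50 = binom_pmf(50)
print(f"{p_50:.3%}")
```

The value is very small because 50 is more than three standard deviations below the expected count of 72.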
c. What is the probability that there are less than 3 international students in the sample?
[4 marks]
$P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2) \approx 0.000$. Given the probability of success is 0.36,
there is a negligible chance of getting fewer than 3 international students in a random sample of 200.
d. What is the mean and standard deviation of the random variable X? [4 marks]
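Mean: $np = 200 \times 0.36 = 72$
SD: $\sqrt{np(1 - p)} = \sqrt{200 \times 0.36 \times 0.64} = \sqrt{46.08} = 6.788$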
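Parts c and d can be verified numerically. This sketch sums the three binomial terms for fewer than 3 successes and applies the standard mean and standard deviation formulas for a binomial random variable.

```python
import math

n, p = 200, 0.36

# Part c: P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)
p_lt_3 = sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(3))

# Part d: mean and standard deviation of a Binomial(n, p) variable.
mean = n * p
sd = math.sqrt(n * p * (1 - p))
print(p_lt_3 < 1e-10, round(mean, 3), round(sd, 3))
```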
A manufacturing firm uses three machines, A, B and C, to produce computer chips. Machine A
produces 50% of all chips, machine B produces 30% of all chips and machine C produces the
remaining chips. 1% of all chips produced by machine A are defective, whereas 2% of machine B chips
are defective and 1.5% of machine C chips are defective.
e. Draw a tree diagram to evaluate the probability that a random chip picked is not defective.
[4 marks]
Firm
  A (p=0.5)
    D  (0.5x0.01)  = 0.005
    ND (0.5x0.99)  = 0.495
  B (p=0.3)
    D  (0.3x0.02)  = 0.006
    ND (0.3x0.98)  = 0.294
  C (p=0.2)
    D  (0.2x0.015) = 0.003
    ND (0.2x0.985) = 0.197
D = Defective Chip, ND = Non-Defective Chip
P(ND) = 0.495 + 0.294 + 0.197 = 0.986
f. If a chip picked at random turns out to be defective, what is the probability that it was
manufactured from machine B? [4 marks]
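Using Bayes rule: $P(B \mid D) = P(B)P(D \mid B)/P(D) = 0.006/(0.005 + 0.006 + 0.003) = 0.006/0.014 = 0.429$, or 42.9%.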
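The tree diagram and the Bayes calculation in parts e and f can be written as a short computation. This is an illustrative sketch; the dictionary names are not from the exam paper.

```python
# Share of production and defect rate for each machine.
share = {"A": 0.5, "B": 0.3, "C": 0.2}
defect_rate = {"A": 0.01, "B": 0.02, "C": 0.015}

# Law of total probability over the three machines (part e).
p_defective = sum(share[m] * defect_rate[m] for m in share)
p_not_defective = 1 - p_defective

# Bayes rule: P(B | defective) (part f).
p_b_given_defective = share["B"] * defect_rate["B"] / p_defective
print(round(p_not_defective, 3), round(p_b_given_defective, 3))
```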
END OF EXAMINATION