EXAMINATION
Semester 1 – Final Examination, 2018
Exam Conditions:
Central Examination
Students must return the examination paper at the end of the examination
This examination paper is not available to the ANU Library archives
Instructions to Students:
- Please write your student number in the space provided at the top of this page.
- Attempt ALL questions. There are 5 questions, not equally weighted.
- All answers are to be written on the exam paper.
- Please hand in the exam paper before you leave the room.
- To ensure full marks, show all the steps in working out your solution. Marks may be deducted for
  failure to show appropriate calculations or formulae.
- Answers to all numerical questions should be given to 2 decimal places unless otherwise specified.
- If you need additional space, please use the rear of the page and state clearly on the front that you
  have done so.
         Q1    Q2    Q3    Q4     Q5     Total
Pages    2-3   4-6   7-8   9-11   12-13
Marks    18    24    12    26     20     100
Score
Semester 1 - Final, 2018 STAT1008 – Quantitative Research Methods

Question 1
a. Suppose that a 95% confidence interval for $\mu$ is (54.8, 60.8). Which of the following is most
likely the p-value for the test of $H_0: \mu = 5$ versus $H_a: \mu \neq 5$? [1 mark]
b. Decreasing the significance level of a hypothesis test (say, from 5% to 1%) will cause the
p-value of an observed test statistic to: [1 mark]
A. -1.688   B. 1.688   C. 4.74   D. 0.7701
e. Would you reject the null hypothesis for the above test at a significance level of 5%?
[2 marks]
f. Among international applicants to an Australian university, the average TOEFL score was 269,
the SD was about 11, and the highest score was 285. Do you think the TOEFL scores follow a
normal distribution? [2 marks]
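The TOEFL scores do not seem to follow a normal distribution. The maximum score of 285 is only
(285 - 269)/11 ≈ 1.45 SD above the mean. For a normal distribution, roughly 7% of the data lies
more than 1.45 SD above the mean, so among many applicants we would expect a number of scores
above 285. A maximum this close to the mean suggests the scores are left-skewed.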
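The z-score and upper-tail proportion behind this argument can be checked with a short sketch, assuming the scores are treated as exactly normal with the stated mean and SD. It uses only Python's standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

mean, sd, max_score = 269, 11, 285

# How many SDs above the mean is the highest observed score?
z = (max_score - mean) / sd

# Proportion of a normal distribution lying above that z-score.
tail = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, P(Z > z) = {tail:.3f}")
```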
It is generally believed that the heights of adult males in Australia are approximately normally
distributed with mean 70 inches and standard deviation 3 inches and that the heights of adult females in
Australia are also approximately normally distributed with mean 64 inches and standard deviation 2.5
inches. ANU is considering custom-ordering beds for its dorm rooms. Answer the following
questions about the lengths of beds in dorm rooms at ANU.
g. The beds that the university currently purchases are 75 inches long. What proportion of males
will be able to fit on the bed while lying perfectly straight? [2 marks]
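Male height: $X \sim N(70, 3)$, so $P(X < 75) = P\left(Z < \frac{75 - 70}{3}\right) = P(Z < 1.67) = 0.9525$. Thus, the proportion of males who will fit in the current beds is about 0.9525, or 95.25%.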
h. Should ANU be concerned that females will not fit in the 75-inch beds? Numerically justify
your answer. [2 marks]
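Female height: $X \sim N(64, 2.5)$, so $P(X > 75) = P\left(Z > \frac{75 - 64}{2.5}\right) = P(Z > 4.4) \approx 0.000005$. Thus, the proportion of females who will not fit in the current beds is essentially 0, so ANU need not be concerned.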
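Both proportions in parts g and h can be reproduced with a small sketch using the standard-library `statistics.NormalDist` (Python 3.8+), assuming heights are exactly normal with the stated parameters. Because it does not round z to 1.67 first, the male proportion comes out near 0.9522 rather than the table value 0.9525.

```python
from statistics import NormalDist

male = NormalDist(mu=70, sigma=3)      # adult male heights (inches)
female = NormalDist(mu=64, sigma=2.5)  # adult female heights (inches)
bed = 75                               # current bed length (inches)

p_male_fit = male.cdf(bed)              # part g: P(height < 75)
p_female_not_fit = 1 - female.cdf(bed)  # part h: P(height > 75)

print(f"males who fit: {p_male_fit:.4f}")
print(f"females who do not fit: {p_female_not_fit:.7f}")
```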
i. ANU plans to order custom-sized beds such that 99% of male students are expected to fit
in them when lying perfectly straight. What length beds should they order? Round your answer
to the nearest inch. [2 marks]
If 99% of males are expected to fit in the new beds, the bed length should be the 99th percentile of male height: $70 + 2.33 \times 3 = 76.99 \approx 77$ in.
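The percentile can also be read directly from the inverse normal CDF. This is a sketch assuming male heights are exactly $N(70, 3)$; `inv_cdf` uses the unrounded $z_{0.99} \approx 2.326$ instead of the table value 2.33.

```python
from statistics import NormalDist

male = NormalDist(mu=70, sigma=3)   # adult male heights (inches)

# Length that 99% of male heights fall below (the 99th percentile).
length = male.inv_cdf(0.99)
print(f"99th percentile = {length:.2f} in, order {round(length)} in beds")
```

Rounding to the nearest inch, as the question asks, gives the same 77 in either way.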
j. ANU decides it is too expensive to replace all the beds. Suppose ANU has 2,150 beds all of
which are 75 inches long. How many beds should they replace? You may assume that only
those males taller than 75 inches will receive the longer beds and that females make up half of
the population that will need a dorm room bed. [4 marks]
The proportion of males who will need a bed longer than 75 inches is 0.0475, or 4.75% (from
part g). Since only half the population is male, we expect about $2150/2 = 1075$ males. Hence ANU
should replace about $1075 \times 0.0475 = 51.06 \approx 51$ beds.
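The bed count is a one-line product, sketched below under the question's assumption that exactly half of the 2,150 bed users are male. It computes the count both from the table proportion 0.0475 and from the unrounded normal tail, to show how little the rounding of z matters.

```python
from statistics import NormalDist

total_beds = 2150
males = total_beds / 2                 # half of the students are male

# Proportion taken from the z-table in part g (z rounded to 1.67).
table_beds = males * 0.0475

# Same calculation without rounding z first.
p_tall = 1 - NormalDist(mu=70, sigma=3).cdf(75)
exact_beds = males * p_tall

print(f"table: {table_beds:.2f} beds, unrounded z: {exact_beds:.2f} beds")
```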
Question 2
A survey was conducted amongst 438 students in an introductory statistics class. The questions asked
were whether the student has ever smoked, whether they have ever consumed alcohol and their gender.
The responses to the survey are given in the tables below.
a. Construct a 95% confidence interval for the difference in the proportions of females and males who
have responded that they have smoked in the past. [4 marks]
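Standard error is $SE = \sqrt{\frac{\hat p_F(1 - \hat p_F)}{n_F} + \frac{\hat p_M(1 - \hat p_M)}{n_M}}$, and the interval is $(\hat p_F - \hat p_M) \pm 1.96 \times SE$.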
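The survey table with the smoking counts is not reproduced here, so the sketch below takes the counts as arguments rather than inventing values. `diff_prop_ci` is a hypothetical helper name; it implements the unpooled-SE interval written above, with the 1.96 multiplier obtained from the standard normal.

```python
import math
from statistics import NormalDist

def diff_prop_ci(x1, n1, x2, n2, level=0.95):
    """CI for p1 - p2 using the unpooled standard error.

    x1, x2 are the numbers of 'yes' responses (e.g. smokers) and
    n1, n2 are the group sizes (e.g. females and males).
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    diff = p1 - p2
    return diff - z * se, diff + z * se
```

Call it as `diff_prop_ci(female_smokers, n_females, male_smokers, n_males)` with the counts from the survey table.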
b. Test at the 10% level of significance whether there is evidence that the proportion of females who
have never consumed alcohol differs significantly from the proportion of males who have never
consumed alcohol. [4 marks]
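$H_0: p_F = p_M$ vs $H_a: p_F \neq p_M$. Test Statistic: $z = \frac{\hat p_F - \hat p_M}{\sqrt{\hat p(1 - \hat p)\left(\frac{1}{n_F} + \frac{1}{n_M}\right)}}$, where $\hat p$ is the pooled proportion.
There is strong evidence to reject the null hypothesis in favour of the alternative.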
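As with part a, the alcohol counts are not reproduced here, so this sketch leaves them as arguments. `two_prop_z_test` is a hypothetical helper; it computes the pooled two-proportion z statistic and its two-sided p-value, which would then be compared with the 10% significance level.

```python
import math
from statistics import NormalDist

def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # both tails
    return z, p_value
```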
c. If a random student is picked from the sample, what is the probability that this student is a male
smoker? [2 marks]
There are 438 students in all and 72 of these are smokers. So, the probability that a student
d. If a random student is picked from the sample, what is the probability that the student is a
smoker and has consumed alcohol? [2 marks]
probability is:
In the same survey, information was collected on each student’s height and weight. This information
was used to evaluate the BMI (body mass index) for each student. The BMI can be used to assess,
however imperfectly, the health of an individual. The table below gives us summary information for the
BMI variable for the survey respondents.
$n$   $\bar{x}$   $s$
e. What effect will increasing the sample size have on the center and spread of the BMI variable?
[4 marks]
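Increasing the sample size will not have much effect on the center of the BMI variable: the sample
mean will stay close to the population mean. It will not systematically change the spread either,
because the sample standard deviation estimates the population standard deviation, which does not
depend on n. What does decrease is the standard error of the mean, $s/\sqrt{n}$, so the sample mean
becomes a more precise estimate of the center.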
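A small simulation can illustrate the difference between the sample SD and the standard error of the mean. It assumes a hypothetical BMI population, Normal with mean 24 and SD 4, chosen only for illustration and not taken from the survey. As n grows, the sample SD stays near 4 while $s/\sqrt{n}$ shrinks.

```python
import random
import statistics

random.seed(2018)
results = []  # (n, sample mean, sample SD, standard error of the mean)

# Hypothetical population: BMI ~ Normal(mean 24, SD 4). These are
# illustrative values, not the survey's summary statistics.
for n in (50, 500, 5000):
    sample = [random.gauss(24, 4) for _ in range(n)]
    sd = statistics.stdev(sample)
    results.append((n, statistics.mean(sample), sd, sd / n ** 0.5))

for n, mean, sd, se in results:
    print(f"n={n:5d}  mean={mean:6.2f}  SD={sd:5.2f}  SE={se:6.3f}")
```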
f. It is suspected that people who smoke tend to weigh less, and hence would also have a lower BMI
than non-smokers of the same height. Set up and carry out an appropriate hypothesis test to verify
this claim. [4 marks]
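Null: $H_0: \mu_S = \mu_N$ vs Alternative: $H_a: \mu_S < \mu_N$, where S denotes smokers and N non-smokers.
Test Statistic: $t = \frac{\bar{x}_S - \bar{x}_N}{\sqrt{s_S^2/n_S + s_N^2/n_N}}$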
From the t-distribution tables at 100 d.f., the test statistic lies between 1.660 and 1.984. Because
the statistic is positive while the alternative is lower-tailed, the p-value $P(T < t)$ lies between
0.95 and 0.975.
Hence, we cannot reject the null hypothesis at the 5% level of significance. Thus, there is not
enough evidence to suggest that smokers have a significantly lower BMI than non-smokers.
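The BMI summary values are not reproduced here, so this sketch takes them as arguments. `two_sample_t` and `lower_tail_decision` are hypothetical helper names; the critical value 1.660 is the one-sided 5% value at 100 d.f. used above. The decision rule makes explicit why a positive t can never support the lower-tailed alternative.

```python
import math

def two_sample_t(mean_s, sd_s, n_s, mean_n, sd_n, n_n):
    """t statistic for mean BMI of smokers (s) minus non-smokers (n),
    using the unpooled standard error."""
    se = math.sqrt(sd_s ** 2 / n_s + sd_n ** 2 / n_n)
    return (mean_s - mean_n) / se

def lower_tail_decision(t, t_crit=1.660):
    """For H_a: mu_s < mu_n at the 5% level (100 d.f. table value),
    reject H0 only when t falls below -t_crit."""
    return t < -t_crit
```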
g. What assumptions are you making in carrying out the above test? [2 marks]
The main assumptions are that the smokers and non-smokers form independent random samples, and
that the underlying population BMI distributions are approximately normal or the sample sizes are
large enough for the t-test to be valid.
h. One of the researchers suggests that gender may also be a contributing factor to the difference
in BMI. If you were to look at data on males and females separately, would you expect to see a
result different from the one derived in part f? Please explain why or why not. [2 marks]
In our sample, the proportion of males who smoke is significantly larger than the proportion of
females who smoke (the CI from part a suggests this). Males also tend to be taller and heavier, and
their BMI is typically higher than that of females. Combining this information, our sample of
smokers has a higher proportion of males and our sample of non-smokers has a higher proportion of
females. This could be why the average BMI for non-smokers is lower than the average BMI for
smokers: gender acts as a confounding variable. So, if we were to look at males and females
separately, there is a good chance we would observe a different result.
Question 3
A multiple regression model ‘Model 1’ is fit to assess the ‘Time’ taken to commute to work using
various forms of public transport. Two predictors, ‘Distance’ of travel in km and ‘Age’ of the
individual in years, were used in the regression model and the output from the model fitting is given
below.
Summary of Model 1
Predictor        Coef       SE Coef     T        P
Constant         5.08731    1.21706      4.18    0.000
Distance         1.09934    0.03306     33.258   0.000
Age              0.03190    0.02575      1.239   0.216

Analysis of Variance
Source           DF     SS       MS       F        P
Regression         2    69770    34885    553.78   0.000
Residual Error   497    31308       63
A second model ‘Model 2’ is fit where Time is only regressed on Distance. The output from the model
fit is given below.
Summary of Model 2
Predictor        Coef       SE Coef     T        P
Constant         6.40819    0.58764     10.90    0.000
Distance         1.09931    0.03307     33.24    0.000

Analysis of Variance
Source           DF     SS       MS       F        P
Regression         1    69673    69673    1104.8   0.000
Residual Error   498    31405       63
a. Is ‘Age’ a significant predictor in this multiple regression model? What information have you
used to come to your conclusion? [2 marks]
Based on the summary of Model 1, Age has a t-statistic of 1.239 and a p-value of 0.216, which is
well above 0.05. This suggests that Age is not a significant predictor once Distance is in the model.
b. Which model should you choose between the two models fit? Justify your choice. [2 marks]
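Based on the information provided, one would choose Model 2. Both models explain a similar
proportion of the variance ($R^2 = 69770/101078 \approx 69.0\%$ for Model 1 versus
$69673/101078 \approx 68.9\%$ for Model 2), and the residual standard errors are almost identical
($\sqrt{63} \approx 7.94$). Since Age is not significant, the simpler Model 2 is preferred.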
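The comparison can be reproduced from the two ANOVA tables alone. This sketch computes $R^2 = SS_{Reg}/(SS_{Reg} + SS_{Res})$ and the residual standard error $\sqrt{SS_{Res}/DF_{Res}}$ for each model, using the unrounded residual mean squares rather than the displayed 63.

```python
import math

# (SS Regression, SS Residual, Residual DF) from the two ANOVA tables.
anova = {
    "Model 1": (69770, 31308, 497),
    "Model 2": (69673, 31405, 498),
}

r2, s = {}, {}
for name, (ss_reg, ss_res, df_res) in anova.items():
    r2[name] = ss_reg / (ss_reg + ss_res)   # proportion of variance explained
    s[name] = math.sqrt(ss_res / df_res)    # residual standard error
    print(f"{name}: R^2 = {r2[name]:.3f}, s = {s[name]:.2f}")
```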
c. Construct an appropriate 95% interval for the average time taken to commute to work for
individuals whose distance of travel is 20 km. Comment on whether this is a prediction or a
confidence interval. [4 marks]
d. Construct an appropriate 95% interval for the time taken to commute to work for an individual
whose distance of travel is 10 km. Comment on whether this is a prediction or a confidence
interval. [4 marks]
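A sketch of the two interval types for Model 2. The output above does not report the standard error of the fitted mean (`se_fit`), so it is left as a parameter, and `T_CRIT = 1.965` is an assumed approximation to $t_{0.975}$ with 498 d.f. The average time at 20 km (part c) calls for a confidence interval, while one individual's time at 10 km (part d) calls for the wider prediction interval, which also includes the residual variance.

```python
import math

B0, B1 = 6.40819, 1.09931   # Model 2: Time = B0 + B1 * Distance
MSE = 31405 / 498           # residual mean square from Model 2
T_CRIT = 1.965              # assumed approx. t_{0.975} with 498 d.f.

def fitted(distance_km):
    """Point estimate of Time at the given Distance."""
    return B0 + B1 * distance_km

def mean_ci(distance_km, se_fit):
    """95% confidence interval for the AVERAGE Time at this Distance."""
    y = fitted(distance_km)
    return y - T_CRIT * se_fit, y + T_CRIT * se_fit

def prediction_interval(distance_km, se_fit):
    """95% prediction interval for ONE individual's Time at this Distance."""
    y = fitted(distance_km)
    half = T_CRIT * math.sqrt(se_fit ** 2 + MSE)
    return y - half, y + half
```

The point estimates are about 28.39 at 20 km and 17.40 at 10 km; the intervals need `se_fit` from the software output.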
Question 4
Output for a model to predict the GPAs of students at a small university based on their Math scores,
Verbal scores, and the number of hours spent watching television in a typical week is provided.
Analysis of Variance
Source           DF     SS        MS       F       P
Regression       ???    ???       4.8295   35.90   0.000
Residual Error   ???    59.7304   ???
Total            447    ???
If the Math and Verbal scores remain unchanged, when the amount of TV watched in a
typical week increases by 1 hour, the predicted GPA decreases by 0.0147.
b. Use the output to determine how many students were included in the sample. [2 marks]
The degrees of freedom for the Total in the ANOVA table is 447. Since this value must be
$n - 1$, we have $n - 1 = 447$. This means that n = 448.
Some of the information in the ANOVA table is missing. Evaluate the missing values to be able to
answer the following questions.
c. How many degrees of freedom should appear in the "Regression" row of the table? [1 mark]
d. How many degrees of freedom should be listed in the "Residual Error" row? [1 mark]
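The missing ANOVA entries follow from the standard identities $df_{Reg} = k$, $df_{Res} = n - k - 1$, $SS = df \times MS$ and $F = MS_{Reg}/MS_{Res}$, with n = 448 from part b and k = 3 predictors. This sketch fills them in and checks that the recomputed F matches the reported 35.90.

```python
n, k = 448, 3                      # sample size (part b), number of predictors
df_reg = k                         # part c
df_res = n - k - 1                 # part d
df_tot = n - 1

ms_reg, ss_res = 4.8295, 59.7304   # values shown in the table
ss_reg = ms_reg * df_reg           # SS = df * MS
ms_res = ss_res / df_res           # MS = SS / df
ss_tot = ss_reg + ss_res
f_stat = ms_reg / ms_res           # should reproduce F = 35.90

print(df_reg, df_res, round(ss_reg, 4), round(ms_res, 4),
      round(ss_tot, 4), round(f_stat, 2))
```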
e. At the 1% significance level, is the model effective according to the ANOVA test? Include all
details of the test (i.e., Null & Alternative Hypotheses, Test Statistic and P-Value). [4 marks]
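$H_0: \beta_{Math} = \beta_{Verbal} = \beta_{TV} = 0$ vs $H_a$: at least one of these coefficients is non-zero.
The test statistic as per the ANOVA table is $F = 35.90$ (with 3 and 444 d.f.) and has a p-value of 0.000 < 0.01.
Hence there is strong evidence to reject the null hypothesis. This suggests the model is
effective in explaining GPA using the three predictors Math, Verbal and TV.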
f. Which predictors are significant at the 5% level? What are their p-values? [6 marks]
All three predictors are significant based on the t-tests in the regression summary output. The
p-values are not given, but the critical value from a t-distribution with 444 degrees of freedom is
about 1.96. Since every test statistic exceeds 1.96 in absolute value, each p-value is < 0.05. In
fact, all the p-values are less than 0.0005.
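h. The $R^2$ for this model is missing in the provided output. Use the available information to
compute (round to three decimal places) and interpret $R^2$ for this model. [4 marks]
$SS_{Residual} = 59.7304$
$SS_{Regression} = df_{Regression} \times MS_{Regression} = 3 \times 4.8295 = 14.4885$
Thus
$R^2 = \frac{SS_{Regression}}{SS_{Regression} + SS_{Residual}} = \frac{14.4885}{74.2189} = 0.195$, i.e. 19.52%
The regression model explains about 19.5% of the variability in GPA.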
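The same arithmetic as a short sketch, recovering $SS_{Regression}$ from the reported mean square and the 3 regression degrees of freedom:

```python
ss_reg = 3 * 4.8295      # SS Regression = df_reg * MS_reg
ss_res = 59.7304         # SS Residual from the table
r2 = ss_reg / (ss_reg + ss_res)

print(f"R^2 = {r2:.3f} ({r2:.2%})")
```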
i. A dotplot of the residuals and a scatterplot of the residuals versus the predicted values are
provided. Discuss whether the conditions for a multiple linear regression are reasonable by
referring to the appropriate plots. [4 marks]
The main conditions for multiple regression are that the residuals are approximately normally
distributed, centred at 0, with constant variance. The dotplot of the residuals does not suggest any
non-normality.
The scatterplot of the residuals versus the predicted values shows the residuals centred around 0
with a roughly constant spread, so the constant-variance (homoscedasticity) condition appears
reasonable.
Question 5
About 36% of all students at ANU are international students. Suppose we take a random sample of 200
ANU students. Let X represent the number of students in this sample that are international.
Round all answers to three decimal places and represent proportions as percentages.
b. What is the probability that exactly 50 people in the sample are international? [2 marks]
c. What is the probability that there are fewer than 3 international students in the sample?
[4 marks]
Given the probability of success is 0.36, $P(X < 3) = P(X = 0) + P(X = 1) + P(X = 2)$ is about
$1.1 \times 10^{-35}$, i.e. 0.000% to three decimal places. There is a negligible chance of getting
fewer than 3 international students in a random sample of 200.
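Parts b and c are binomial probabilities with n = 200 and p = 0.36. This sketch evaluates them directly from the binomial formula using `math.comb` (Python 3.8+) instead of tables or software.

```python
from math import comb

N, P = 200, 0.36   # sample size and probability a student is international

def pmf(k):
    """Binomial probability P(X = k)."""
    return comb(N, k) * P ** k * (1 - P) ** (N - k)

p_exactly_50 = pmf(50)                          # part b
p_fewer_than_3 = sum(pmf(k) for k in range(3))  # part c: P(X <= 2)

print(f"P(X = 50) = {p_exactly_50:.5f}")
print(f"P(X < 3)  = {p_fewer_than_3:.3e}")
```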
d. What is the mean and standard deviation of the random variable X? [4 marks]
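Mean: $np = 200 \times 0.36 = 72$
SD: $\sqrt{np(1 - p)} = \sqrt{200 \times 0.36 \times 0.64} = \sqrt{46.08} = 6.788$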
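The formulas $np$ and $\sqrt{np(1-p)}$ can be cross-checked against the moments of the full binomial distribution, as in this sketch:

```python
import math
from math import comb

N, P = 200, 0.36
mean = N * P                        # np
sd = math.sqrt(N * P * (1 - P))     # sqrt(np(1 - p))

# Cross-check: mean and variance computed from the whole pmf.
pmf = [comb(N, k) * P ** k * (1 - P) ** (N - k) for k in range(N + 1)]
mean_pmf = sum(k * p for k, p in enumerate(pmf))
var_pmf = sum((k - mean_pmf) ** 2 * p for k, p in enumerate(pmf))

print(f"mean = {mean:.3f}, SD = {sd:.3f}")
```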
A manufacturing firm uses three machines, A, B and C, to produce computer chips. Machine A
produces 50% of all chips, machine B produces 30% of all chips and machine C produces the
remaining chips. 1% of all chips produced by machine A are defective, whereas 2% of machine B chips
are defective and 1.5% of machine C chips are defective.
e. Draw a tree diagram to evaluate the probability that a random chip picked is not defective.
[4 marks]
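Firm
  A (p=0.5)
    D  (0.5x0.01)  = 0.005
    ND (0.5x0.99)  = 0.495
  B (p=0.3)
    D  (0.3x0.02)  = 0.006
    ND (0.3x0.98)  = 0.294
  C (p=0.2)
    D  (0.2x0.015) = 0.003
    ND (0.2x0.985) = 0.197
P(not defective) = 0.495 + 0.294 + 0.197 = 0.986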
f. If a chip picked at random turns out to be defective, what is the probability that it was
manufactured by machine B? [4 marks]
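The tree calculation in part e and the reversed conditional in part f can be sketched together: multiply along each branch, add the defective branches (law of total probability), then apply Bayes' rule. Machine C's 20% share is the remainder after A and B.

```python
machines = {          # machine: (share of output, defect rate)
    "A": (0.50, 0.010),
    "B": (0.30, 0.020),
    "C": (0.20, 0.015),
}

# Joint probability of (machine, defective) along each branch of the tree.
p_def = {m: share * rate for m, (share, rate) in machines.items()}

p_defective = sum(p_def.values())           # law of total probability
p_not_defective = 1 - p_defective           # part e
p_b_given_def = p_def["B"] / p_defective    # part f, Bayes' rule

print(f"P(ND) = {p_not_defective:.3f}, P(B | D) = {p_b_given_def:.4f}")
```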
END OF EXAMINATION