January 2014-2015 stats questions

1. (32 marks)

A neuroscientist decided to look at the association between activation in the insula (a brain region
known to be involved in craving) and the number of cookies eaten after a brain scanning session.
The experiment worked by showing participants pictures of junk food whilst brain activation was
recorded in an fMRI scanner. After this participants were given the opportunity to eat as many
cookies as they liked whilst filling in questionnaires for an hour. The level of brain activation in the
insula and the number of cookies consumed is presented below for each participant. Is there any
evidence for the hypothesised effect?

Brain activation   0.1   0.16  1     0.2   2.1   0.1   0.4   1.2   2.5   0.9   1.8   3
Cookies eaten      1     3     2     0     5     1     2     4     3     3     5     2

On the whole test selection for this question was excellent. Most of you chose a correlation test
(the clue being where it says “look at the association between activation in the insula and number
of cookies eaten”) but a few people went for comparing the central tendency. This would not
make a lot of sense for the current question because it would essentially be asking “Is the
number of cookies eaten greater than the level of brain activation?”, which has a few logical
problems because the two variables are measured in different units. People who selected a t-test would
therefore have lost a whole load of marks for this question (everything except the histogram
mentioned below, in fact). When choosing the correlation test, most, if not all, of you then went
on to plot a histogram for each of the two variables separately and this should have revealed that
brain activation was not normally distributed (it was positively skewed, with the scores bunched
towards the lower end) and therefore the Spearman’s test was optimal. For those of you who selected the
Pearson’s you would have lost a few marks (a couple in the test selection section, and any EDA
errors but you could still score full marks for the calculation and interpretation) but it was still
possible to score very well on this question with that test (high twenties) as it was a near miss. The
histograms were generally good although some lost marks for having an unequal range across the
bins or in the odd case missing out bits of the range in between bins. A lot of you did an outlier
calculation here which is not needed for a correlation test. This did not result in your losing marks
but it would have meant you spent time which could have been more profitably spent elsewhere.
Most people calculated the standard deviations for the two variables pretty well (and if there
were mistakes they were typically arithmetic rather than conceptual and in such cases most of the
marks for the SDs were obtained). However, as always, the calculation of the covariance proved
more problematic, with a number of people trying to take a short cut in the
(x - mean x) × (y - mean y) calculation by doing it only once, using the averages across all
participants. This operates in exactly the same way as for the SDs: you need to do the calculation
for each of the 12 individuals separately and only then sum the products at the end. This conceptual mistake would have
cost you 10 marks. Finally, in the interpretation you needed to make clear what the direction of
the effect was. This could be done either by stating it was a significant positive correlation or by
saying it was consistent with the hypothesised effect (which is stated to be positive). A failure to
do this would have cost 1 mark.
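
As a rough illustration of the covariance point, the Python sketch below works out the (x - mean x) × (y - mean y) product for each of the 12 participants and only then sums them. It is applied to the ranked scores, so the resulting correlation is Spearman's. The function names are made up for this illustration, the second brain activation value is read as 0.16, and dividing by N - 1 is an assumption (the divisor cancels in the final correlation in any case).

```python
from statistics import mean

# Data from the question (the second brain activation value is read as 0.16).
brain = [0.1, 0.16, 1, 0.2, 2.1, 0.1, 0.4, 1.2, 2.5, 0.9, 1.8, 3]
cookies = [1, 3, 2, 0, 5, 1, 2, 4, 3, 3, 5, 2]


def average_ranks(values):
    """Rank from 1 (smallest); tied scores share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


def covariance(x, y):
    """Sum one product per participant, then divide by N - 1 (assumed divisor)."""
    mean_x, mean_y = mean(x), mean(y)
    products = [(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)]
    return sum(products) / (len(x) - 1)


def sd(x):
    """Standard deviation, written as the square root of cov(x, x)."""
    return covariance(x, x) ** 0.5


def correlation(x, y):
    return covariance(x, y) / (sd(x) * sd(y))


# Spearman's correlation is the same calculation carried out on the ranks.
rho = correlation(average_ranks(brain), average_ranks(cookies))
print(round(rho, 3))
```

On these data the coefficient comes out at roughly .59 and is positive, which is the direction you needed to state in the interpretation.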

2. (17 marks) An estate agent boss wished to examine rigorously the claims in the media that house
prices have increased considerably over the last 12 months. They therefore got the average price of
houses in 10 regions across the United Kingdom in 2013 and 2014 to investigate this. The results for
these regions are shown in the table below. Is there any evidence to support the assumption that
house prices have increased during this time?

Region   1     2     3     4     5     6     7     8     9     10
2013     181   194   152   167   210   138   164   147   173   159
2014     187   201   154   168   251   134   164   153   180   166

Again test selection was pretty good in that most of you selected a related samples central
tendency test. The most common mistake in terms of selection was to treat this as a between subjects
design, which would have gone down as a serious conceptual error and would have cost you all the
marks for this question (apart from any for EDAs). This is because it was just fundamentally the wrong
test to do. You can tell it was within subjects because there were only 10 regions and each region
has a score in both 2013 and 2014 (see also in the text “They therefore got the average price of
houses in 10 regions across the United Kingdom in 2013 and 2014 to investigate this.”). A few of
you also selected a variance test here which would again have achieved no marks because that
was testing a fundamentally different thing to what you were asked. That would be testing
whether there was a difference between the two years in the consistency or spread of house prices.
In contrast the question asks whether “house prices have increased considerably over the last 12
months”, which is comparing the central tendency. Having chosen a central tendency test, the next
step would be to distinguish between the related samples t-test and the Wilcoxon matched pairs.
The first step here would be to do an outlier check on the differences between the two conditions.
A great many of you did two outlier checks - one on each group separately - which would have
cost a few marks. Doing the boxplot on the differences would have revealed that region 5 was an
outlier and this should therefore be removed from the rest of the calculation. This would have
then reduced the sample size to fewer than 10, so there was no need to do a histogram: there
is insufficient data to check for normality and you should proceed directly to the
Wilcoxon matched pairs test. Some of you did do a histogram (whether because you did not
remove an outlier or with an N of less than 10) and this did not cost you any marks; rather you
would not gain any as it was not necessary for the optimal answer and it was time that could have
been spent elsewhere. For those of you who did the related samples t-test you would have lost a
couple of marks for a sub-optimal test (plus any marks lost via the boxplot) but after that it was
error carried forward (as this was a near miss) so many people did the t-test and still got a very
good mark for this question. However, you had to go down the Wilcoxon branch to get full marks.
For the Wilcoxon calculation, generally these were pretty good but the most common mistakes
were a failure to remove the difference score of 0, not ignoring the sign in the ranking stage, or using the
actual difference score (in this case 4) as the T rather than its rank (which was 3). Some people
chose a one-tailed test, showing a lot of confidence in the media. A two-tailed test was appropriate
as no direct empirical evidence was produced.
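
The Wilcoxon steps described above can be sketched in Python as follows. This is only an illustration: the variable and function names are invented here, and region 5 is simply dropped because the feedback identifies it as the outlier, rather than the boxplot being recomputed.

```python
# House prices from the table, one entry per region (regions 1 to 10).
prices_2013 = [181, 194, 152, 167, 210, 138, 164, 147, 173, 159]
prices_2014 = [187, 201, 154, 168, 251, 134, 164, 153, 180, 166]


def average_ranks(values):
    """Rank from 1 (smallest); tied scores share the mean of their positions."""
    ordered = sorted(values)
    return [ordered.index(v) + (ordered.count(v) + 1) / 2 for v in values]


# Within subjects design: work on the 2014 minus 2013 difference for each region.
differences = [after - before for before, after in zip(prices_2013, prices_2014)]

# Drop region 5 (index 4), the outlier named in the feedback (not recomputed here).
kept = [d for i, d in enumerate(differences) if i != 4]

# Remove any difference of 0 before ranking.
nonzero = [d for d in kept if d != 0]

# Rank the differences ignoring their sign.
ranks = average_ranks([abs(d) for d in nonzero])

# Sum the ranks (not the raw differences) separately for each sign.
positive_sum = sum(r for r, d in zip(ranks, nonzero) if d > 0)
negative_sum = sum(r for r, d in zip(ranks, nonzero) if d < 0)

# T is the smaller of the two rank sums.
T = min(positive_sum, negative_sum)
print(len(nonzero), T)
```

Here the only negative difference is -4, which has rank 3 once the signs are ignored, so T is 3 with 8 differences remaining.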

3. (17 marks) A developmental psychologist wished to examine the episodic memory abilities of 3-
year old children. To do this, in an initial session children were shown a treasure chest but were
unable to open it as it was locked and required a key. 24 hours later they were asked to select which
of 6 objects they wished to have. One of these objects was a key that could open the treasure chest
whilst the other five had no extrinsic value. If the children are able to recollect the episode of the
day before then they should select the key to a greater extent than the other 5 objects. 5 children
were sampled and 3 of them did select the key. What is the probability of this happening at least as
often as this by chance?

This question produced the best test selection of all 5 questions as it was the binomial test.
Generally this was a well answered question but a reasonably high proportion of you went wrong
along the way. The most common mistake (as always with the binomial test) was to get p and q
wrong. It was nice that fewer of you spontaneously selected p = .5 and q = .5 this year, showing
you at least thought about this, but the most common mistakes were to select p = .6 and q = .4 or
to select p = .2 and q = .8. For the first one, there were six possible options: one of these was
correct and the other 5 were wrong. By saying that p was .6 you would in effect be saying that you
would have a 60% chance of guessing which one of the 6 options was correct by chance which
seems wildly overoptimistic. The important thing to note is that p and q are the values which you
would expect by chance not what was observed (which in passing I’ll note was discussed on the
forum the day before the exam…). You need what you would expect by chance because like a
number of the tests the calculation is in effect then comparing the observed values (the 3 out of 5
or .6 children who correctly selected the right option) to the expected values (a p of .167; see slide
35 of lecture 3) to see if the observed departs from the expected. The correct answer therefore
was to make the link that you have a 1/6 (or .167) chance of guessing correctly by chance and a
5/6 (or .833) chance of guessing wrongly by chance. A p of .2 was clearly closer but still not quite
right – it would have been appropriate if there were five objects, one which is correct and four
which are incorrect. I’m afraid if you did not choose the correct p and q scores you would have lost
a lot of marks for this question as this was a serious conceptual error. Another common mistake
was to get N wrong and select 6. There are indeed 6 objects but N is the total number of
observations (or children tested). In this case there were five children so N needed to be 5. Once
people made this link the calculations were in most cases done pretty well. There are always
rounding issues with the binomial test so plenty of leeway was allowed for this. To get full marks,
though, you needed to do the calculation not only for 3 out of 5 but also for 4 and 5 out of 5, as the
question is asking for “at least as” often rather than the precise probability of getting 3 out of 5.
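
To make the “at least as often” point concrete, here is a minimal Python sketch of the binomial calculation with N = 5 children and a chance probability of p = 1/6. The function name is invented for this illustration; it simply adds up the probabilities for 3, 4 and 5 correct choices.

```python
from math import comb


def binomial_at_least(k, n, p):
    """Probability of k or more successes out of n when each has chance p."""
    q = 1 - p  # probability of guessing wrongly by chance
    return sum(comb(n, r) * p**r * q**(n - r) for r in range(k, n + 1))


# One key among six objects, so p = 1/6 and q = 5/6; five children, so N = 5.
prob = binomial_at_least(3, 5, 1 / 6)
print(round(prob, 4))
```

The three terms sum to 276/7776, or roughly .035. Stopping at the term for exactly 3 out of 5 would miss the contributions from 4 and 5 out of 5.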

4. (20 marks) A cognitive psychologist predicted that, in a group of undergraduate students,
differences would emerge in levels of attention between early in the morning and in the evening
with performance being better in the evening. To investigate this hypothesis, students performed an
attention task (higher scores indicate better performance) at both 7am and 7pm. Previous work
indicates that students do indeed perform better in the evening than the morning. The results are
shown in the table below. Is there any evidence for the psychologist's hypothesis?

Again test selection was generally pretty good and most of you thought it was the related samples
t-test/Wilcoxon matched pairs test. Those who went wrong typically thought it was between
subjects and selected an unrelated samples t-test/Wilcoxon rank sums test. The clue to the fact
that this was a within subjects design was where it says “students performed an attention task at
both 7am and 7pm” indicating that they contributed scores to both conditions. Other mistakes
were to do either a variance test or occasionally a correlation test. The clue that you needed to
compare the central tendency comes most obviously from the statement “Previous work indicates
that students do indeed perform better in the evening than the morning.” Having decided on the
related t-test/Wilcoxon matched pairs, the first step was to do an outlier check. Again quite a lot
of you went wrong and did one on each group separately even though it was a within subjects
design so you needed to do it on the differences (as in Q5 of the revision lecture). Similarly, many
of you removed the fractional component at the wrong stage which would have lost you a mark or
two. You should ONLY remove the fractional component for the median position, nowhere else.
This time there were no outliers. You should then do a single histogram on the differences to
check for normality – as for the outliers many of you did one histogram on each group separately
which would have cost you three marks. Those of you who did do a histogram on the differences
nearly all got full marks as the bins were appropriate. The histogram showed that the differences were
normally distributed, so the related samples t-test was the optimal test. For those of you who did the Wilcoxon matched
pairs test, as this was a near miss if you did everything else correctly you would have still got
most, if not all, of the marks for the calculation (you would lose a couple for test selection, but get
potentially full marks for the calculation and interpretation). For those of you who did do the
related samples t-test the calculations were in the main excellent with no obvious common
conceptual mistake. Most of you also correctly identified it was a one-tailed test. As in Q1, though,
you needed to make clear what the direction of the effect was rather than just saying it was
significant (because it was feasible that the effect could have been in the opposite direction to
that of prior evidence). Failure to do this would have lost you a mark or two.

Morning     42  43  39  28  39  24  51  56  38  19  42  58
Afternoon   46  34  40  34  47  44  45  54  50  35  47  61
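
For the related samples t-test, everything is done on a single column of differences. The Python sketch below illustrates this with the data from the table. The variable names are invented for this illustration, and the row the table labels “Afternoon” is taken to be the 7pm session described in the question.

```python
from math import sqrt
from statistics import mean, stdev

# Scores from the table; the "Afternoon" row is assumed to be the 7pm session.
morning = [42, 43, 39, 28, 39, 24, 51, 56, 38, 19, 42, 58]
evening = [46, 34, 40, 34, 47, 44, 45, 54, 50, 35, 47, 61]

# One difference per student (evening minus morning).
differences = [e - m for m, e in zip(morning, evening)]

# Mean difference and its standard error (sample SD uses N - 1).
mean_diff = mean(differences)
standard_error = stdev(differences) / sqrt(len(differences))

# Related samples t statistic and its degrees of freedom.
t = mean_diff / standard_error
df = len(differences) - 1
print(round(mean_diff, 2), round(t, 2), df)
```

On these data t comes out at roughly 1.97 with 11 degrees of freedom, and the mean difference is positive (higher scores in the evening), which is the direction you needed to state alongside the outcome of the one-tailed test.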

5. (17 marks)

A local crisp manufacturer was alarmed at the complaints from a series of consumers about the
large inconsistency in the number of crisps that were contained in each pack. The manufacturer
therefore purchased a state of the art new machine which packages the crisps to try and correct this
deficiency. The number of crisps in 12 packs was investigated with the old packaging machine and
12 packs were examined with the new machine. Is there any evidence that the new machine has
improved consistency? Assume that there are no outliers.

Old machine   25  26  19  18  22  24  27  23  20  19  24  26
New machine   23  22  25  21  23  24  26  20  22  23  24  22

The real trick to this question was in the test selection. The correct answer was the variance test
but a number of you did select either the Wilcoxon rank sums or the unrelated samples t-test (or
even the equivalent within subjects tests). You could in principle compare the central tendency for
this data set but this was not what was asked so if you did go down this route I’m afraid you’d
have lost pretty much all of the marks for this question (the exception being that if you did the
unrelated samples t-test and also did the variance check, you’d have gained a few marks for the variance
calculation). The question is talking about consistency which is the clue that you need to do the
variance test. There is absolutely no mention in the question that the aim was to increase the
number of crisps in a packet with the new machine compared to the old machine (and that is
something I suspect the manufacturers would not be keen on in any case…). A fair number of you
went on to do an outlier check despite the last sentence specifically telling you not to perform
one. You wouldn’t have lost any marks for this but again you would have gained no marks so that
time could have been spent elsewhere. Similarly, there was no need to do any histograms here so
no marks were awarded even if you did some. Calculation for the variance test was generally very
good, with errors in the main being arithmetic (where you would still have got most of the marks).
This test statistic was actually very close to the critical value in this question (it was just non-
significant). If people got a significant result through a different rounding technique (in practice
not too many) then their conclusion would have still obtained full marks.
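
As a rough sketch of the variance test, the Python below computes the sample variance for each machine and takes the ratio of the larger to the smaller. The variable names are invented for this illustration, and putting the larger variance on top is assumed to be the convention used with your tables.

```python
from statistics import variance

# Number of crisps per pack from the table, 12 packs per machine.
old_machine = [25, 26, 19, 18, 22, 24, 27, 23, 20, 19, 24, 26]
new_machine = [23, 22, 25, 21, 23, 24, 26, 20, 22, 23, 24, 22]

# Sample variances (sum of squared deviations divided by N - 1).
var_old = variance(old_machine)
var_new = variance(new_machine)

# Variance ratio with the larger variance on top (assumed convention).
F = max(var_old, var_new) / min(var_old, var_new)
df_each = len(old_machine) - 1
print(round(var_old, 2), round(var_new, 2), round(F, 2), df_each)
```

The ratio comes out at roughly 3.44 with 11 degrees of freedom for each machine. Because it sits so close to the critical value, small rounding differences in the intermediate steps could tip the conclusion either way.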
