
Tutorial 3 – Solutions to Supplementary Questions

Measures of central tendency and spread, sampling variation & standard error
1. In a report investigating the association between diet and the risk of postmenopausal breast
cancer (Giles G et al. Int J Cancer, 2006) from a cohort of 12,273 women, the distributions of
body mass index and fibre intake were presented as follows:-

Body mass index (BMI; kg/m²): mean = 27.1 kg/m²; standard deviation = 4.8 kg/m²
Fibre intake (g/day): median = 28.9 g/day; inter-quintile range = 21.1, 38.5 g/day

a. Why do you think the authors did not present the mean and standard deviation for fibre
intake?

The authors would probably have plotted a histogram of each variable. When a histogram is
roughly bell-shaped (symmetric), the appropriate measure of central tendency is the mean and
the appropriate measure of spread is the standard deviation. When the data are skewed, the
median and a percentile-based range (such as the inter-quartile or inter-quintile range) are
more appropriate, because, unlike the mean and standard deviation, they are not pulled towards
the long tail. Thus fibre intake probably had a skewed distribution.
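To see why a skewed variable is better summarised by its median, here is a minimal Python sketch. The values in `fibre_g_per_day` are entirely hypothetical, invented for illustration, and are not data from Giles et al. A few high intakes in the upper tail pull the mean well above the median, while the median stays with the bulk of the data.

```python
import statistics

# HYPOTHETICAL fibre intakes (g/day), invented for illustration only.
# The two large values (55 and 70) form a long upper (right) tail.
fibre_g_per_day = [18, 22, 25, 27, 29, 31, 34, 40, 55, 70]

mean = statistics.mean(fibre_g_per_day)      # sensitive to the tail
median = statistics.median(fibre_g_per_day)  # resistant to the tail
sd = statistics.stdev(fibre_g_per_day)       # sample SD, also inflated by the tail

print(f"mean = {mean:.1f} g/day, SD = {sd:.1f} g/day")
print(f"median = {median:.1f} g/day")

# In a positively skewed sample the mean typically exceeds the median.
print("mean > median:", mean > median)
```

With these hypothetical values the mean (35.1 g/day) sits well above the median (30.0 g/day), so the mean would overstate a "typical" intake.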

b. What do you think the inter-quintile range represents?

In the lecture you learnt that the inter-quartile range is the range of values between the
lower quartile (25th percentile) and the upper quartile (75th percentile). The inter-quintile
range is the range of values between the lower quintile (20th percentile) and the upper
quintile (80th percentile), so it is wider than the inter-quartile range. For fibre intake, the
median of 28.9 g/day is closer to the 20th percentile (21.1 g/day) than to the 80th percentile
(38.5 g/day). The upper part of the distribution is therefore more spread out than the lower
part, suggesting that fibre intake is positively (right) skewed.
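The percentile reasoning above can be sketched in Python. The helper names (`inter_quintile_range`, `inter_quartile_range`, `skew_from_percentiles`) are hypothetical, and the `demo` list is simple illustrative data rather than study data. `statistics.quantiles` uses its default "exclusive" interpolation method; other software may compute slightly different percentiles from the same data.

```python
import statistics

def inter_quintile_range(values):
    """Return (P20, P80). quantiles(n=5) gives the cut points [P20, P40, P60, P80]."""
    cuts = statistics.quantiles(values, n=5)
    return cuts[0], cuts[-1]

def inter_quartile_range(values):
    """Return (P25, P75). quantiles(n=4) gives the cut points [P25, P50, P75]."""
    cuts = statistics.quantiles(values, n=4)
    return cuts[0], cuts[-1]

def skew_from_percentiles(median, lower, upper):
    """Compare the gaps below and above the median.

    A longer upper gap suggests positive (right) skew; a longer lower gap, negative skew.
    """
    gap_below = median - lower
    gap_above = upper - median
    if gap_above > gap_below:
        direction = "positive"
    elif gap_above < gap_below:
        direction = "negative"
    else:
        direction = "none"
    return gap_below, gap_above, direction

# Published fibre summary: median 28.9 g/day, inter-quintile range 21.1 to 38.5 g/day.
below, above, direction = skew_from_percentiles(28.9, 21.1, 38.5)
print(f"gap below median = {below:.1f}, gap above = {above:.1f} -> {direction} skew")

# Illustrative data 1, 2, ..., 100: the quintile range encloses the quartile range.
demo = list(range(1, 101))
print("inter-quintile:", inter_quintile_range(demo))
print("inter-quartile:", inter_quartile_range(demo))
```

For the published values the gap below the median is 7.8 g/day and the gap above is 9.6 g/day, consistent with positive skew.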

2. The length of stay in hospital after a certain operation, for the entire population of 1,200
patients treated by a clinic, was distributed as follows:-
a. Calculate the standard error for the above sample of 100 patients, assuming the
population standard deviation (σ) is unknown.

The standard error for the single sample of 100 patients equals:-

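$$\mathrm{SE} = \frac{\sigma}{\sqrt{n}} \approx \frac{s}{\sqrt{n}} = \frac{5.0}{\sqrt{100}} = 0.5 \text{ days}$$

Because σ is unknown, it is estimated by the sample standard deviation, s = 5.0 days.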
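The same arithmetic as a short Python sketch; `standard_error` is a hypothetical helper name. The loop also shows that quadrupling the sample size halves the standard error, since n appears under a square root.

```python
import math

def standard_error(sd, n):
    """Estimated standard error of a sample mean, s / sqrt(n).

    When the population SD (sigma) is unknown, the sample SD s is used in its place.
    """
    return sd / math.sqrt(n)

se = standard_error(sd=5.0, n=100)
print(f"SE = {se:.2f} days")  # 5.0 / 10

# Effect of sample size with the same s = 5.0 days.
for n in (25, 100, 400):
    print(n, standard_error(5.0, n))
```

This prints `SE = 0.50 days`, then standard errors of 1.0, 0.5 and 0.25 days for n = 25, 100 and 400.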


b. We then take twenty separate random samples, each of 100 patients. For each sample
we calculate the sample mean. The twenty sample means are listed below:-

11.50  11.87  11.05  11.22  10.64
11.62  10.50  10.50  11.05  10.17
10.92  11.35  10.99  10.85  10.71
10.67  10.84  11.53  11.42  11.33

Calculate the sample mean and standard deviation of the above 20 sample means.

The sample mean and standard deviation of the 20 sample means are:-

$$\bar{x} = \frac{\sum_{i=1}^{20} x_i}{20} = \frac{220.73}{20} = 11.04 \text{ days}$$

$$s = \sqrt{\frac{\sum_{i=1}^{20} (x_i - \bar{x})^2}{20 - 1}} = \sqrt{\frac{3.67}{19}} = 0.44 \text{ days}$$
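The hand calculation can be checked in Python using the twenty sample means listed above. The sketch computes each intermediate quantity explicitly (sum, mean, sum of squared deviations, SD with an n − 1 denominator) and then cross-checks against the standard library's `statistics.mean` and `statistics.stdev`.

```python
import math
import statistics

# The twenty sample means (days) listed in part (b).
sample_means = [
    11.50, 11.87, 11.05, 11.22, 10.64,
    11.62, 10.50, 10.50, 11.05, 10.17,
    10.92, 11.35, 10.99, 10.85, 10.71,
    10.67, 10.84, 11.53, 11.42, 11.33,
]

n = len(sample_means)
total = sum(sample_means)                         # sum of x_i
x_bar = total / n                                 # mean of the sample means
ss = sum((x - x_bar) ** 2 for x in sample_means)  # sum of squared deviations
s = math.sqrt(ss / (n - 1))                       # sample SD, n - 1 denominator

print(f"sum = {total:.2f}, mean = {x_bar:.2f}, SS = {ss:.2f}, SD = {s:.2f}")

# Cross-check against the standard library (statistics.stdev also divides by n - 1).
assert math.isclose(x_bar, statistics.mean(sample_means))
assert math.isclose(s, statistics.stdev(sample_means))
```

It prints `sum = 220.73, mean = 11.04, SS = 3.67, SD = 0.44`, matching the values above.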

c. Which of the statistics calculated for the single random sample of 100 patients (sample
standard deviation or standard error) is similar to the standard deviation of the above
sample means (calculated from twenty separate random samples, each of 100 patients), and
why?

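The standard error calculated for the single random sample of 100 patients (0.5 days) is
similar to the standard deviation of the 20 sample means (0.44 days), because the standard
error estimates the variability of the sample means that would arise from repeatedly sampling
100 patients from the population. The sample standard deviation (5.0 days), by contrast,
describes the variability of individual patients' lengths of stay, and is √100 = 10 times as
large as the standard error.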
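The idea in part (c) can be illustrated by simulation. The clinic's actual length-of-stay distribution is not reproduced here, so this Python sketch assumes a HYPOTHETICAL right-skewed population of 1,200 stays drawn from a gamma distribution (theoretical mean 11 days, SD about 4.9 days); the helper `sample_means` is likewise a hypothetical name. Because samples are drawn without replacement from a finite population, the expected SD of the means is σ/√n multiplied by the finite population correction √((N − n)/(N − 1)), which is about 0.96 here, so the sketch prints both values.

```python
import math
import random
import statistics

def sample_means(population, n, reps, seed=0):
    """Draw `reps` simple random samples of size n (without replacement)
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

# HYPOTHETICAL population of 1,200 right-skewed lengths of stay (days):
# gamma(alpha=5, beta=2.2) has mean 11 and SD sqrt(5) * 2.2, about 4.9.
pop_rng = random.Random(42)
population = [pop_rng.gammavariate(5.0, 2.2) for _ in range(1200)]

N, n = len(population), 100
sigma = statistics.pstdev(population)  # population SD
se = sigma / math.sqrt(n)              # theoretical SE, sigma / sqrt(n)
fpc = math.sqrt((N - n) / (N - 1))     # finite population correction

means = sample_means(population, n, reps=2000)
sd_of_means = statistics.stdev(means)

print(f"population SD    = {sigma:.2f} days")
print(f"sigma / sqrt(n)  = {se:.3f} days")
print(f"  times FPC      = {se * fpc:.3f} days")
print(f"SD of the means  = {sd_of_means:.3f} days")
```

Setting `reps=20` mimics part (b); rerunning with different `seed` values shows how much the SD of only twenty sample means can vary around the standard error.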
