
ECON 1203 Tutorial Sample Solutions

Semester 2 2016
WEEK 1 TUTORIAL PROBLEMS (To be discussed in the week starting 1 August 2016)
1. (a) What is meant by a variable in a statistical sense? Distinguish between qualitative and quantitative
statistical variables, and between continuous and discrete variables. Give examples.
A variable in a statistical sense is just some characteristic of an object. It may take different values.
Data on a quantitative variable can be expressed numerically in a meaningful way (e.g., height of an
individual, number of children in a family). Data on qualitative variables cannot be expressed
numerically in a meaningful way (e.g., sex or hair colour of an individual) although such data can
be coded into numerical expressions. When qualitative data have an innate ordering (e.g.,
survey answers to the question "How happy are you, all things considered?" rated on a scale from
"Very Happy" to "Very Unhappy"), the numerical coding can carry some meaning.
A discrete quantitative variable can assume only certain discrete numerical values on the number
line (can be a finite or infinite number of these values). A continuous quantitative variable can
assume any value in a specific range or interval; e.g. length of a pipe. In some cases, what in
theory is a continuous variable must be in practice measured as a discrete variable because of
limitations to measurement precision.
(b) Distinguish between (i) a statistical population and a sample; (ii) a parameter and a statistic. Give
examples.
A statistical population is the set of measurements or observations of a characteristic of interest for
all elementary units in a frame; e.g., the shoe sizes of all men in Australia. A statistical sample is a
subset of a population, e.g., the shoe sizes of all the men enrolled in ECON 1203 is a sample of the
population represented by the shoe sizes of all men in Australia.
A parameter is a numerical description of a population. For example, the average shoe size of all
Australian men is a parameter (of the population of the shoe sizes of all Australian men). A statistic
is a numerical description of a sample. For example, the average shoe size of all men in this
classroom is a statistic (calculated from the sample of the shoe sizes of all men in this classroom).

2. In order to know the market better, the second-hand car dealership, Anzac Garage, wants to analyze
the age of second-hand cars being sold. A sample of 20 advertisements for passenger cars is selected
from the second-hand car advertising/listing website www.drive.com.au. The ages in years of the
vehicles at time of advertisement are listed below:
5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11
(a) Calculate the frequency, cumulative frequency and relative frequency distributions for the age
data using the following bin classes:
More than 0 to less than or equal to 8 years
More than 8 to less than or equal to 16 years
More than 16 to less than or equal to 24 years.

Bin              Frequency   Relative Frequency   Cumulative Frequency
0 < Age ≤ 8      14          0.7                  14
8 < Age ≤ 16     5           0.25                 19
16 < Age ≤ 24    1           0.05                 20
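
The table above can be reproduced with a short script. This is a minimal sketch, not part of the original solutions: the function name `frequency_table` and its parameters are illustrative, and it assumes every observation lies in the range 0 < x <= upper.

```python
import math

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]


def frequency_table(data, width, upper):
    """Tabulate bins of the form lower < x <= lower + width, covering (0, upper].

    Assumes every value in `data` satisfies 0 < x <= upper.
    """
    counts = [0] * (upper // width)
    for x in data:
        # ceil(x / width) - 1 maps x in (k*width, (k+1)*width] to bin index k
        counts[math.ceil(x / width) - 1] += 1

    rows = []
    cumulative = 0
    for k, count in enumerate(counts):
        cumulative += count
        rows.append((k * width, (k + 1) * width, count, count / len(data), cumulative))
    return rows


table = frequency_table(ages, width=8, upper=24)
for lo, hi, freq, rel, cum in table:
    print(f"{lo:>2} < Age <= {hi:<2}  freq={freq:<2}  rel={rel:<4}  cum={cum}")
```

Using `math.ceil` keeps the upper edge of each bin inclusive, which matches the "more than ... to less than or equal to ..." definition of the bin classes.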

(b) Sketch a frequency histogram using the calculations in part (a). What can you say about the
distribution of the age of these second-hand cars? Is there anything that concerns you about the
frequency table and histogram? Specifically, is the choice of bin classes appropriate? What needs
to be done differently?
[Figure: Relative frequency histogram for Age. Horizontal axis: Bin (upper limits 8, 16, 24); vertical axis: Frequency (0 to 0.8).]


From this graph, the age distribution appears to be skewed to the right. 70% of observations
have age between 0 and 8. However, this histogram only provides limited information about the
age distribution because there are too few bins and they are very wide.
(c) Halve the width of the bins (0 to 4, 4 to 8, etc) and recalculate the frequency, cumulative
frequency and relative frequency distributions. Using the new distributions and histogram, what
can you now say about the distribution of the age of second-hand cars?
Bin              Frequency   Relative Frequency   Cumulative Frequency
0 < Age ≤ 4      5           0.25                 5
4 < Age ≤ 8      9           0.45                 14
8 < Age ≤ 12     4           0.2                  18
12 < Age ≤ 16    1           0.05                 19
16 < Age ≤ 20    0           0                    19
20 < Age ≤ 24    1           0.05                 20

[Figure 3.1: Revised histogram for age of cars. Horizontal axis: Age (bin midpoints 2 to 22); vertical axis: Frequency (0 to 10).]


There still appears to be a skew to the right, but now we can also see that there is an outlier in
the 21 to 24 age category. Ages 5 to 8 are the most frequently observed. A sizable proportion
of the second-hand cars are relatively new (25% being less than or equal to 4 years old).
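
The observations above can be checked with a rough text histogram. This is an illustrative sketch only: the bar layout and the names `modal_bin` and `share_young` are assumptions made for the example, not part of the original solutions.

```python
import math

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]

width = 4
counts = [0] * 6  # six bins of width 4 covering 0 < Age <= 24
for age in ages:
    # ceil(age / width) - 1 puts age in (k*width, (k+1)*width] into bin k
    counts[math.ceil(age / width) - 1] += 1

# One row per bin, with one '#' per car in that bin
lines = []
for k, count in enumerate(counts):
    label = f"{k * width:>2} < Age <= {(k + 1) * width:<2}"
    lines.append(f"{label} | {'#' * count} ({count})")
print("\n".join(lines))

modal_bin = counts.index(max(counts))   # index of the most frequent bin
share_young = counts[0] / len(ages)     # share of cars aged 4 years or less
print(f"Most frequent bin: {modal_bin * width} < Age <= {(modal_bin + 1) * width}")
print(f"Share aged <= 4 years: {share_young:.0%}")
```

The long tail of short bars on the right, with an empty bin before the last one, is what the right skew and the outlier look like in this form.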

3. Health expenditure
A recent report by Access Economics provides a comparison of Australian expenditures on health with
that of comparable OECD countries. Data from that report relating to the year 2005 have been used to
reproduce their Figure 2.2 (below denoted as Figure 2.1).
(a) What are the key features of these data?

A strong positive association: higher per capita GDP goes with higher health expenditure per
capita.
There are (at least) 2 outliers: the observation with the largest health expenditure (USA)
and the observation with the highest GDP (Luxembourg). Without these 2 the relationship is
approximately linear. With them, there is a suggestion of a non-linear relationship.
An indication of more variability in health expenditures when GDP is larger.

(b) While this is a bivariate scatter plot, there are three variables involved: health expenditure, GDP
and population. Why account for population by expressing health expenditure and GDP in per
capita terms?

[Figure 2.1: OECD Health Expenditure and GDP. Scatter plot; horizontal axis: GDP per capita (US$000), 0 to 70; vertical axis: Health expenditure per capita (US$000), 0 to 7.]


This line of questioning is intended to prompt the recognition that there may be factors other
than GDP associated with health expenditures per capita, and population size is one obvious
factor since (for example) there may be returns to scale in health care delivery, and/or
differences in how concentrated the health care industry is in larger countries versus smaller
ones. Expressing everything in per capita terms is one way to control for population variation
and hence isolate the GDP-health expenditure relationship, so it is good if that is the
relationship we want to know about. However, controlling for population size in this fashion
makes it harder to see the relationship between population and health care expenditure, so if
that relationship were our target of analysis, this would not be a good way to present the data.
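
The effect of expressing totals in per capita terms can be illustrated with a small sketch. The two countries and all of their figures below are hypothetical, invented purely for the example; they are not taken from the Access Economics report.

```python
# Hypothetical totals for two made-up countries (NOT data from the report).
countries = {
    "Country A": {"health_bn": 300.0, "gdp_bn": 3000.0, "population_m": 100.0},
    "Country B": {"health_bn": 30.0, "gdp_bn": 300.0, "population_m": 10.0},
}


def per_capita_thousands(total_bn, population_m):
    """Convert a total in US$ billions into US$ thousands per person.

    Billions divided by millions of people gives thousands per person.
    """
    return total_bn / population_m


per_capita = {
    name: (
        per_capita_thousands(c["gdp_bn"], c["population_m"]),
        per_capita_thousands(c["health_bn"], c["population_m"]),
    )
    for name, c in countries.items()
}

for name, (gdp_pc, health_pc) in per_capita.items():
    print(f"{name}: GDP per capita {gdp_pc} (US$000), health per capita {health_pc} (US$000)")
```

In totals, Country A spends ten times as much on health as Country B. In per capita terms the two are identical, so the population difference no longer dominates the comparison.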

4. Australian housing prices


Recent research by Dr Nigel Stapledon at the UNSW School of Economics provides an extensive
analysis of Australian housing prices since 1880. In Figure 2.2 his data are used to provide a
comparison of Sydney and Melbourne housing prices over time.

(a) What are the key features of these data?

The time series evolution is quite similar for Sydney and Melbourne housing prices: they track
each other quite well, and hence we would say there is a strong positive association between
these two series.
Sydney prices are typically above Melbourne prices.
There seem to be 2 regimes. In the first regime, up until the 1950s, there is little growth in
housing prices and they are quite stable from year to year (low variability). In the second
regime, since the 1950s, there have been quite dramatic increases in housing prices in both
cities and there is much greater year-to-year variability (more volatility). (In his analysis,
Stapledon notes that this two-regime pattern is quite common and has been observed in the US
as well.)

(b) Why have prices been expressed in constant dollars?


One reason housing prices increase over time is inflation, and if all prices and incomes increase
by the same proportion then there are no "real" changes, meaning no changes that people
would feel in their wallets. So just as in the previous question, we control for this other factor
(inflation) so as to better see the relationship between real housing prices and time.
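
The conversion to constant dollars can be sketched as follows. This is a generic price-index deflation, not necessarily the method Stapledon used. The function name `to_constant_dollars` and all prices and index values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def to_constant_dollars(nominal_price, index_in_year, index_in_base_year):
    """Re-express a nominal price in base-year dollars using a price index."""
    return nominal_price * index_in_base_year / index_in_year


# Hypothetical example: the general price level doubles between the two years.
base_index = 100.0    # price index in the base year (made-up value)
early_index = 50.0    # price index in an earlier year (made-up value)

early_real = to_constant_dollars(100.0, early_index, base_index)  # nominal 100 (thousands)
base_real = to_constant_dollars(200.0, base_index, base_index)    # nominal 200 (thousands)

print(early_real, base_real)
```

The nominal price doubles, but so does the price index, so both prices come out the same in base-year dollars: there has been no real change.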

[Figure 2.2: Comparison of Sydney and Melbourne median house prices in constant 2007-08 dollars. Line chart; horizontal axis: Year (1860 to 2020); vertical axis: Thousands of dollars (0 to 600); series: Sydney, Melbourne.]


5. Using the car data from Question 2:


(a) Calculate the mean, median and mode for this sample of data and use these statistics to further
describe the distribution of car ages.
Mean = (5 + 5 + 6 + ... + 24 + 11)/20 = 146/20 = 7.3

Ordering the data from lowest to highest:
2, 2, 3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 6, 7, 9, 10, 11, 11, 14, 24
Median = (6 + 6)/2 = 6
Mode = 6
The sample mean is to the right of the mode and median, suggesting that the sample distribution is
skewed to the right. The cause seems to be the large outlier: one car had an age of 24,
which is very different from the ages of the other cars. Given the skewness and the outlier,
the median is possibly a better measure of central tendency. Hence a typical second-hand car is
6 years old.
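
The three measures can also be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]

mean_age = statistics.mean(ages)      # sum of the ages divided by 20
median_age = statistics.median(ages)  # average of the 10th and 11th ordered values
mode_age = statistics.mode(ages)      # most frequently occurring age

print(f"mean={mean_age}, median={median_age}, mode={mode_age}")

# A mean above the median is consistent with a right-skewed sample.
mean_exceeds_median = mean_age > median_age
print(f"mean exceeds median: {mean_exceeds_median}")
```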
Alternatively, the EXCEL output is:

Age
Mean                  7.3
Standard Error        1.126476
Median                6
Mode                  6
Standard Deviation    5.037752
Sample Variance       25.37895
Kurtosis              5.712234
Skewness              2.0983
Range                 22
Minimum               2
Maximum               24
Sum                   146
Count                 20
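
The remaining entries of this output can be reproduced by hand, as sketched below. The source does not state which formulas EXCEL uses, so this sketch assumes the usual small-sample adjusted definitions of skewness and excess kurtosis. Those definitions give values that agree with the figures above for this data set.

```python
import math
import statistics

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]

n = len(ages)
mean = statistics.mean(ages)
variance = statistics.variance(ages)   # sample variance (divisor n - 1)
sd = statistics.stdev(ages)            # sample standard deviation
std_error = sd / math.sqrt(n)          # standard error of the mean
data_range = max(ages) - min(ages)

# Standardised deviations from the mean
z = [(x - mean) / sd for x in ages]

# Assumed formulas: small-sample adjusted skewness and excess kurtosis
skewness = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in z)
kurtosis = (
    n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(v ** 4 for v in z)
    - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
)

print(f"Standard Error     {std_error:.6f}")
print(f"Standard Deviation {sd:.6f}")
print(f"Sample Variance    {variance:.5f}")
print(f"Kurtosis           {kurtosis:.6f}")
print(f"Skewness           {skewness:.4f}")
print(f"Range              {data_range}")
```

The positive skewness agrees with the right skew seen in the histograms. The large kurtosis reflects the single 24-year-old car.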

(b) If the largest observation were removed from this data set, how would the three measures of
central tendency you have calculated change?
Mean = (5 + 5 + 6 + ... + 6 + 11)/19 = 122/19 ≈ 6.4 (now closer to the median)
Median = 6 (unchanged, but now not an average of the two middle values but the actual middle
value, since we now have an odd number of observations)
Mode = 6 (unchanged)
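
The effect of dropping the largest observation can be verified with a short sketch, again using the standard `statistics` module. The name `trimmed` is illustrative.

```python
import statistics

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]

# Drop the single largest observation (the 24-year-old car)
trimmed = sorted(ages)[:-1]

trimmed_mean = statistics.mean(trimmed)      # 122 / 19
trimmed_median = statistics.median(trimmed)  # 10th of the 19 ordered values
trimmed_mode = statistics.mode(trimmed)

print(f"mean={trimmed_mean:.1f}, median={trimmed_median}, mode={trimmed_mode}")
```

Only the mean moves. The median and mode are not affected by a single extreme value, which is why the median was preferred as the measure of a typical car's age.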
