Business Statistics II
Mount Pleasant
Harare, ZIMBABWE
Layout : S. Mapfumo
I.S.B.N: 978-1-77938-732-5
the errors), they still help you learn the correct thing as much as the correct ideas. You also need to be
open-minded, frank, inquisitive and should leave no stone unturned as you analyze ideas and seek
clarification on any issues. It has been found that those who take part in tutorials actively, do better in
assignments and examinations because their ideas are streamlined. Taking part properly means that you
prepare for the tutorial beforehand by putting together relevant questions and their possible answers and
those areas that cause you confusion.
Only in cases where the information being discussed is not found in the learning package can the tutor
provide extra learning materials, but this should not be the dominant feature of the six hour tutorial. As
stated, it should be rare because the information needed for the course is found in the learning package
together with the sources to which you are referred. Fully-fledged lectures can, therefore, be misleading
as the tutor may dwell on matters irrelevant to the ZOU course.
Distance education, by its nature, keeps the tutor and student separate. By introducing the six hour
tutorial, ZOU hopes to help you come in touch with the physical being, who marks your assignments,
assesses them, guides you on preparing for writing examinations and assignments and who runs your
general academic affairs. This helps you to settle down in your course having been advised on how
to go about your learning. Personal human contact is, therefore, upheld by the ZOU.
Note that in all the three sessions, you identify the areas
in which your tutor should give help. You also take a very
important part in finding answers to the problems posed.
You are the most important part of the solutions to your
learning challenges.
Module Overview
Unit 1
The Normal Distribution
1.1 Introduction
1.2 Unit Objectives
1.3 The Normal Curve
2.5.2 Estimation of the population proportion
Activity 2.6
2.6 Determining Sample Size in Estimation
2.6.1 Sample size for estimating population mean
Activity 2.7
2.6.2 Sample size for estimating a population proportion
Activity 2.8
2.7 Summary
References
Unit 3
Hypothesis Testing
3.1 Introduction
3.2 Unit Objectives
3.3 Statistical Hypotheses
3.3.1 Types of hypotheses
3.3.2 Deciding on the null hypothesis
Activity 3.1
3.4 Type I and Type II Errors
3.5 Steps Followed in Hypothesis Testing
3.6 Tests Concerning the Population Mean
Activity 3.2
3.7 Test Concerning a Population Proportion
Activity 3.3
3.8 Confidence Interval Approach to Hypothesis Testing
Activity 3.4
3.8 Summary
References
Unit 4
Simple Linear Regression Analysis
4.1 Introduction
4.2 Unit Objectives
4.3 What is Regression Analysis?
4.4 Types of Variables
Activity 4.1
4.5 Scatter Plots
Activity 4.2
Activity 4.3
4.6 The Simple Linear Regression Model
4.6.1 Model assumptions
4.6.2 Random error term
4.6.3 Estimating the regression equation
Activity 4.4
4.6.4 Interpretation of a and b
4.6.5 Some uses of the regression model
4.7 Estimating Values of the Dependent Variable
Activity 4.5
4.8 Summary
References
Unit 5
Correlation Analysis
5.1 Introduction
5.2 Unit Objectives
5.3 Relating Correlation Analysis to Regression Analysis
5.4 Scatter Diagrams
Activity 5.1
5.5 Correlation Coefficient
5.5.1 Pearson’s product moment correlation coefficient
Activity 5.2
5.5.2 Spearman’s rank correlation coefficient
5.6 Coefficient of Simple Determination
5.7 Testing whether X and Y are Correlated
Activity 5.4
5.8 Summary
References
Unit 6
Introduction to Time Series Analysis
6.1 Introduction
6.2 Unit Objectives
6.3 Components of a Time Series
6.3.1 Trend component
6.3.2 Seasonal component
6.3.3 Cyclical component
Unit 7
Index Numbers
7.1 Introduction
7.2 Unit Objectives
7.3 Types of Index Numbers
7.3.1 Price indices
7.3.2 Quantity indices
7.3.3 Value indices
7.4 Simple Index Numbers
7.4.1 Simple price index
Activity 7.1
7.4.2 Simple quantity index
Activity 7.2
7.4.3 Index number series trends
Activity 7.3
7.4.4 Changing the base period
Activity 7.4
7.5 Weighted Index Numbers
7.5.1 Weighted average of relatives indices
Activity 7.5
7.5.2 Weighted aggregate indices
Activity 7.6
7.6 Use of Index Numbers as Deflators
Activity 7.7
7.7 Challenges in Constructing Index Numbers
7.8 Summary
References
Unit 8
Statistics List of Formulae
8.1 Normal Distribution
8.2 Statistical Estimation
8.2.1 Point estimators
8.2.2 Confidence interval estimation
8.3 Hypothesis Testing
8.3.1 Tests concerning the population mean
8.3.2 Test concerning a population proportion
8.4 Simple Linear Regression Analysis
8.5 Correlation Analysis
8.5.1 Testing for the existence of a linear relationship between X and Y
8.6 Introduction to Time Series Analysis
8.6.1 Trend analysis
8.6.2 Seasonal analysis
8.7 Index Numbers
8.7.1 Simple Index Numbers
8.7.2 Changing the Base period
8.7.3 Weighted Index Numbers
APPENDICES
Statistical Tables
Module Overview
The module BBFH 107 Business Statistics II builds on the module BBFH 103 Business
Statistics I. Students are normally required to pass the latter module before they embark on
this module. While Business Statistics I focused largely on Descriptive Statistics, this
module is mainly centred on the other branch of Statistics, called Statistical Inference.
The module consists of eight units. In Unit 1 we look at the characteristics of the normal
curve before tackling the normal probability distribution. Unit 2 is about point estimation and
confidence interval estimation, while in Unit 3 we introduce you to hypothesis testing.
Simple regression analysis and correlation analysis, which are statistical techniques for
establishing relationships between variables, are covered in Unit 4 and Unit 5 respectively. In
Unit 6 we introduce you to time series analysis, while in Unit 7 we focus on index numbers.
In Unit 8 we provide you with a summary of important statistical formulae.
You are encouraged to study the worked examples before attempting activity questions in
each unit. References for further reading are provided at the end of each unit. We wish you
well in your studies.
Unit 1
The Normal Distribution
1.1 Introduction
The normal distribution is used to model continuous random variables. A continuous random
variable can assume any value in a given interval. Examples of variables that can be modelled
by the normal distribution are:
• The times taken by a worker to complete an assigned task repeatedly
• The weights of all newborn babies at a hospital
• The salaries of all government workers
However, the normal probability distribution can also be used to investigate the behaviour of
discrete variables that can have many values, for example, marks obtained by all ‘O’ Level
students in a Mathematics examination. Many other random variables occurring in practice
follow the normal distribution.
In this unit, you will learn about the properties of the normal distribution and evaluate
probabilities for variables that are believed to follow the normal distribution.
1.3.1 Properties of the normal curve
The properties of the normal curve/distribution are:
1. The curve is symmetric about the mean
2. It is unimodal – has a single peak
3. At the line of symmetry, the mean, median and mode coincide, that is, mean = median
= mode
4. The curve approaches the horizontal axis asymptotically as we proceed in either
direction away from the centre. This means that the curve never touches the
horizontal axis at either end but extends to infinity in both directions.
5. The total area under the curve and above the horizontal axis is equal to 1.
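Properties 1 and 5 can be checked numerically. The sketch below is only an illustration: it relies on NormalDist from Python's standard statistics module, and the mean of 10 and standard deviation of 5 are arbitrary values assumed for the demonstration. The total area is approximated with the trapezium rule over the range mean ± 8 standard deviations, which is an assumed cut-off wide enough to capture practically all of the area.

```python
from statistics import NormalDist

# Illustrative parameters, assumed for this sketch only.
MEAN = 10
STD_DEV = 5
dist = NormalDist(mu=MEAN, sigma=STD_DEV)

# Property 1: the curve has the same height at equal distances
# on either side of the mean.
left_height = dist.pdf(MEAN - 3)
right_height = dist.pdf(MEAN + 3)


def area_under_curve(d, lower, upper, steps=10_000):
    """Approximate the area under d.pdf between lower and upper (trapezium rule)."""
    width = (upper - lower) / steps
    total = 0.5 * (d.pdf(lower) + d.pdf(upper))
    for i in range(1, steps):
        total += d.pdf(lower + i * width)
    return total * width


# Property 5: the total area under the curve is (approximately) 1.
total_area = area_under_curve(dist, MEAN - 8 * STD_DEV, MEAN + 8 * STD_DEV)

print(f"height at mean - 3: {left_height:.6f}")
print(f"height at mean + 3: {right_height:.6f}")
print(f"approximate total area: {total_area:.6f}")
```

The two heights agree because of symmetry, and the approximate area is very close to 1.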
Let X be a random variable that is normally distributed with mean $\mu$ and variance $\sigma^2$. We
write $X \sim N(\mu, \sigma^2)$. For example, if the mean of X is 10 and the variance is 25, we write
$X \sim N(10, 5^2)$, where 5 is the standard deviation. A random variable with mean zero and variance
one is called a standard normal variable and is denoted by Z, that is, $Z \sim N(0, 1)$. The
distribution of Z is called the standard normal distribution. An arbitrary normally distributed
variable X is transformed to the standard normal distribution by the transformation
$$Z = \frac{X - \mu}{\sigma} \tag{1.1}$$
[Figure 1.2: normal curve with x1, μ and x2 marked on the horizontal axis]
To find $P(x_1 < X < x_2)$, we must find the standard values corresponding to $x_1$ and $x_2$ by the
transformations
$$z_1 = \frac{x_1 - \mu}{\sigma} \quad \text{and} \quad z_2 = \frac{x_2 - \mu}{\sigma}.$$
It now follows that $P(x_1 < X < x_2) = P(z_1 < Z < z_2)$, and Figure 1.2 is transformed to look like
Figure 1.3 below.
Figure 1.3 Area under the Standard Normal Curve between z1 and z2
Table 1 in the appendices gives values for the area under the standard normal curve lying to
the left of any specified z value for values of z from -3.4 to 3.4. The area corresponds to the
probability that a given value is less than or equal to z, that is, P(Z ≤ z ) .
Example 1.1
A random variable X is normally distributed with a mean of 10 and variance 25. Find
standard values (z-values) corresponding to:
a) x = 12
b) x = 8
Solution 1.1
$X \sim N(10, 5^2)$
We make use of the transformation $Z = \frac{X - \mu}{\sigma}$ with $\mu = 10$ and $\sigma = 5$.
a) x = 12: $z = \frac{12 - 10}{5} = 0.4$
b) x = 8: $z = \frac{8 - 10}{5} = -0.4$
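The standardisation in equation 1.1 is simple enough to wrap in a small helper. The following is a minimal sketch, not part of the module itself: the function name z_value is hypothetical, and the inputs are the mean, variance and x-values of Example 1.1.

```python
import math


def z_value(x, mean, std_dev):
    """Standardise x using Z = (X - mean) / std_dev (equation 1.1)."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / std_dev


# Example 1.1: mean 10 and variance 25, so the standard deviation is 5.
mean = 10
std_dev = math.sqrt(25)

z_a = z_value(12, mean, std_dev)  # part a)
z_b = z_value(8, mean, std_dev)   # part b)

print(f"x = 12 gives z = {z_a:.2f}")
print(f"x = 8 gives z = {z_b:.2f}")
```

Note that the function takes the standard deviation, not the variance, so the square root must be taken first when a variance is given.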
Activity 1.1
A random variable X is normally distributed with a mean of 15 and variance 36. Find
standard values corresponding to:
a) x = 16
b) x = 13
Example 1.3
Let Z ~ N (0, 1). Find
a) P (Z ≤ 1.34)
b) P (Z ≤ −2.75)
c) P (Z ≥ 1.62)
d) P (0.47 ≤ Z ≤ 1.86)
Solution 1.3
a) The probability P (Z ≤ 1.34) is given by the area shown in Figure 1.4
[Figure 1.4: area under the standard normal curve to the left of z = 1.34]
To find P (Z ≤ 1.34) , we locate a value of z equal to 1.3 in the left column of Table 1. We
then move across the row to the column under 0.04 where we read 0.9099. Therefore, P (Z
≤ 1.34) = 0.9099.
b) P (Z ≤ −2.75)
[Figure 1.5: area under the standard normal curve to the left of z = −2.75]
We locate a value of z = -2.7 under the left column. We then move across the row to the
column under 0.05, giving P (Z ≤ −2.75) = 0.0030.
c) P (Z ≥ 1.62)
The area required is the area under the standard normal curve to the right of z = 1.62 as
shown in figure 1.6
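[Figure 1.6: area under the standard normal curve to the right of z = 1.62]
In the left column of Table 1, go to a value of z equal to 1.6, then move across that row to the
column under 0.02 where you read 0.9474. This is the area to the left of 1.62, but we want the
area to the right of 1.62 as shown in figure 1.6. You should remember that the total area
under the curve is equal to 1. Therefore, if we subtract the area to the left of z = 1.62 from 1,
the remaining area to the right of 1.62 gives us P (Z ≥ 1.62), that is,
P (Z ≥ 1.62) = 1 − P (Z < 1.62) = 1 − 0.9474 = 0.0526.
d) P (0.47 ≤ Z ≤ 1.86)
[Figure: area under the standard normal curve between z = 0.47 and z = 1.86]
The shaded area is obtained by subtracting the area to the left of z = 0.47 from the area to the
left of z = 1.86, that is,
P (0.47 ≤ Z ≤ 1.86) = P (Z ≤ 1.86) − P (Z ≤ 0.47) = 0.9686 − 0.6808 = 0.2878.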
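The table lookups for the four probabilities of Example 1.3 can be cross-checked in software. The sketch below assumes the well-known relationship between the standard normal cumulative probability and the error function, P(Z ≤ z) = ½[1 + erf(z/√2)]; this formula is not given in the module. The helper name phi is hypothetical. Because Table 1 is rounded to four decimal places, a computed answer can differ from a table-based answer in the last decimal place.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


p_a = phi(1.34)              # a) area to the left of 1.34
p_b = phi(-2.75)             # b) area to the left of -2.75
p_c = 1.0 - phi(1.62)        # c) area to the right of 1.62
p_d = phi(1.86) - phi(0.47)  # d) area between 0.47 and 1.86

for label, p in (("a", p_a), ("b", p_b), ("c", p_c), ("d", p_d)):
    print(f"{label}) {p:.4f}")
```

The same two ideas used with the table appear here: a right-hand area is 1 minus the left-hand area, and an area between two values is the difference of two left-hand areas.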
Remark 1.1
The probability that a continuous variable takes any one precise value is zero. This implies that the
probability that Z is less than or equal to, say, 1.25 is the same as the probability that Z is less
than 1.25. In general, P(Z ≤ z) = P(Z < z).
Activity 1.2
Let Z ~ N (0, 1). Find
a) P (Z ≤ 3.10)
b) P (Z ≥ −0.27 )
c) P (-1.45 ≤ Z ≤ 2.63)
Example 1.4
Given a random variable X which is normally distributed with mean 15 and variance 100,
find:
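a) P(X < 20)
b) P(X > 12)
c) P(12 < X < 20)
Solution 1.4
$X \sim N(15, 10^2)$
We begin by finding the z-values corresponding to the given x-values using the transformation
given by equation 1.1.
a)
$$\begin{aligned}
P(X < 20) &= P\left(\frac{X - \mu}{\sigma} < \frac{20 - 15}{10}\right) \\
&= P(Z < 0.5) \\
&= 0.6915
\end{aligned}$$
[Figure: area under the standard normal curve to the left of z = 0.5]
b)
$$\begin{aligned}
P(X > 12) &= P\left(\frac{X - \mu}{\sigma} > \frac{12 - 15}{10}\right) \\
&= P(Z > -0.3) \\
&= 1 - P(Z < -0.3) \\
&= 1 - 0.3821 \\
&= 0.6179
\end{aligned}$$
[Figure: area under the standard normal curve to the right of z = −0.3]
c)
$$\begin{aligned}
P(12 < X < 20) &= P\left(\frac{12 - 15}{10} < \frac{X - \mu}{\sigma} < \frac{20 - 15}{10}\right) \\
&= P(-0.3 < Z < 0.5) \\
&= P(Z < 0.5) - P(Z < -0.3) \\
&= 0.6915 - 0.3821 \\
&= 0.3094
\end{aligned}$$
[Figure: area under the standard normal curve between z = −0.3 and z = 0.5]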
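Solution 1.4 can also be cross-checked without standardising by hand. The sketch below is an illustration only: it uses NormalDist from Python's standard statistics module, which accepts the mean and the standard deviation (not the variance) and works with X directly.

```python
from statistics import NormalDist

# X ~ N(15, 10^2): mean 15, variance 100, so the standard deviation is 10.
x_dist = NormalDist(mu=15, sigma=10)

p_less_than_20 = x_dist.cdf(20)                 # a) P(X < 20)
p_more_than_12 = 1 - x_dist.cdf(12)             # b) P(X > 12)
p_between = x_dist.cdf(20) - x_dist.cdf(12)     # c) P(12 < X < 20)

print(f"a) {p_less_than_20:.4f}")
print(f"b) {p_more_than_12:.4f}")
print(f"c) {p_between:.4f}")
```

Here cdf(x) plays the role of the table: it returns the area to the left of x, so the right-hand and in-between areas are obtained exactly as in the worked solution.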
Example 1.5
The delays experienced by truck drivers clearing their cargo at a border post were found to be
normally distributed with a mean of 48 hours and a standard deviation of 6 hours. Find
the probability that a driver has to wait for:
a) at least 36 hours to clear his cargo
b) between 40 hours and 50 hours to clear his cargo
Solution 1.5
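Let X be the total waiting time to get clearance. Then $X \sim N(48, 6^2)$.
a)
$$\begin{aligned}
P(X \geq 36) &= P\left(\frac{X - \mu}{\sigma} \geq \frac{36 - 48}{6}\right) \\
&= P(Z \geq -2) \\
&= 1 - P(Z < -2) \\
&= 1 - 0.0228 \\
&= 0.9772
\end{aligned}$$
[Figure: area under the standard normal curve to the right of z = −2]
b)
$$\begin{aligned}
P(40 < X < 50) &= P\left(\frac{40 - 48}{6} < \frac{X - \mu}{\sigma} < \frac{50 - 48}{6}\right) \\
&= P(-1.33 < Z < 0.33) \\
&= P(Z < 0.33) - P(Z < -1.33) \\
&= 0.6293 - 0.0918 \\
&= 0.5375
\end{aligned}$$
[Figure: area under the standard normal curve between z = −1.33 and z = 0.33]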
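When the z-values are not exact to two decimal places, as in part b) of Example 1.5, a table-based answer and a software answer can differ slightly. The sketch below, which assumes Python's standard statistics.NormalDist, computes part b) twice: once with the unrounded z-values and once with z rounded to two decimal places, which mimics a table lookup.

```python
from statistics import NormalDist

waiting_time = NormalDist(mu=48, sigma=6)  # X ~ N(48, 6^2)
standard_normal = NormalDist()             # Z ~ N(0, 1)

# a) P(X >= 36)
p_at_least_36 = 1 - waiting_time.cdf(36)

# b) P(40 < X < 50) using the unrounded z-values
p_exact = waiting_time.cdf(50) - waiting_time.cdf(40)

# b) again, with z rounded to two decimal places as when reading a table
z_low = round((40 - 48) / 6, 2)
z_high = round((50 - 48) / 6, 2)
p_table_style = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(f"a) {p_at_least_36:.4f}")
print(f"b) unrounded z: {p_exact:.4f}")
print(f"b) z rounded to 2 d.p.: {p_table_style:.4f}")
```

The two answers for part b) differ in the third decimal place. Either is acceptable when working from tables; the difference comes only from rounding the z-values.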
Example 1.6
The demand for second-hand Japanese cars in Zimbabwe is normally distributed with a mean
of 1 600 cars sold per month and a standard deviation of 50 cars. What is the probability that:
a) at most 1 500 cars will be sold in one month
b) between 1 500 and 1 600 cars will be sold in one month
Solution 1.6
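Let X be the number of cars sold per month. Then $X \sim N(1600, 50^2)$.
a)
$$\begin{aligned}
P(X \leq 1500) &= P\left(\frac{X - \mu}{\sigma} \leq \frac{1500 - 1600}{50}\right) \\
&= P(Z \leq -2) \\
&= 0.0228
\end{aligned}$$
[Figure: area under the standard normal curve to the left of z = −2]
[Figure: standard normal curve with −2, 0 and 1 marked on the horizontal axis]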
Activity 1.3
1. The times that cars took to refuel at a busy service station are normally distributed
with mean 3 minutes and a standard deviation of 0.2 minutes. What is the
probability that a car will take
a) more than 4 minutes to refuel
b) not more than 2 minutes to refuel
c) between 2 and 4 minutes to refuel
2. A fast food restaurant finds that the number of meals it serves in a week is
normally distributed with a mean of 4 000 and a standard deviation of 200. What
is the probability that, in a given week, the number of meals served will be
a) at most 4 500
b) between 4 000 and 4 500
3. On average a tuck-shop sells 300 loaves of bread per day with a standard deviation
of 50 loaves. Find the probability that the tuck-shop will sell at least 400 loaves
per day.
1.5 Summary
In this unit you learnt about the normal probability distribution which is used to model
continuous random variables. The distribution is completely specified by two parameters
which are the mean and variance of the distribution. The standard normal distribution has a
mean of 0 and a variance of 1. The area under the standard normal curve gives probabilities.
An arbitrary normal distribution is transformed to the standard normal distribution to
facilitate the evaluation of probabilities using prepared tables.
References
Aczel, A.D. and Sounderpandian, J. (2005). Complete Business Statistics. India: Tata
McGraw-Hill.
Buglear, J. (2005). Quantitative Methods for Business. London: Elsevier Butterworth
Heinemann.
Kazmier, L.J. (2003). Schaum’s Easy Outline: Business Statistics. Blacklick: McGraw-Hill
Trade.
Kemp, S.M. and Kemp, S. (2004). Business Statistics Demystified. Blacklick: McGraw-Hill
Professional Publishing.
Muchengetwa, S. (2005). Business Statistics. Harare: Zimbabwe Open University.
Wegner, T. (1999). Applied Business Statistics. Cape Town: Juta and Co.
Unit 2
Statistical Estimation
2.1 Introduction
Statistical investigations are usually carried out on samples drawn randomly from populations
of interest. As a result, statistical analysis will be based on sample data rather than population
data. The major reasons for this are that it is usually too expensive and time consuming to
collect population data. Sometimes it is also impossible to obtain population data. The results
of the sample study are then used to estimate results for the population thereby allowing
important decisions about the population to be made.
In this unit we introduce you to an important branch of statistical inference called statistical
estimation. You will learn about point estimation and confidence interval estimation for the
mean and proportion of a single population.
Similarly, the sample mean $\bar{x}$ is taken as an estimator of the population mean $\mu$, while the
sample variance $s^2$ is taken as an estimator of the population variance $\sigma^2$. It is important to
ensure that samples used in statistical analysis are representative of their parent populations.
The only way to ensure that this is the case is to select random samples using probability
based sampling methods.
Estimation takes two forms, namely point estimation and confidence interval estimation.
2.4 Point Estimation
In point estimation, a single value of a statistic is used as an estimate of the population
parameter. The disadvantages of a point estimate are that:
• It is not exactly equal to the population mean μ most of the time. The actual estimate
may or may not be close to it.
• It is uncertain whether it will be a good estimate and we have no idea of the
probability that it is a good estimate. A point estimate does not reveal any information
about the accuracy of the estimation procedure.
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{2.1}$$
where $x_1, x_2, \ldots, x_n$ are n randomly selected sample values drawn from the population.
Example 2.1
The daily sales ($) of a vegetable vendor over 30 randomly selected days are:
14 21 28 17 15 34 10 18 25 30 21 15 11 28 17 20 20 29 31 24 11 19 26
34 10 16 25 30 22 17
Find the point estimate of the population mean daily sales.
Solution 2.1
$n = 30$, $\sum x = 638$
$$\bar{x} = \frac{\sum x}{n} = \frac{638}{30} = 21.26666667 \approx \$21.27$$
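The arithmetic of Solution 2.1 can be reproduced in a few lines. This is a minimal sketch using only built-in Python functions; the variable names are our own, and the data are the 30 daily sales figures of Example 2.1.

```python
# Daily sales ($) from Example 2.1.
daily_sales = [
    14, 21, 28, 17, 15, 34, 10, 18, 25, 30, 21, 15, 11, 28, 17,
    20, 20, 29, 31, 24, 11, 19, 26, 34, 10, 16, 25, 30, 22, 17,
]

n = len(daily_sales)     # sample size
total = sum(daily_sales) # sum of the sample values
x_bar = total / n        # point estimate of the population mean (equation 2.1)

print(f"n = {n}, sum = {total}, sample mean = {x_bar:.2f}")
```

Checking n and the sum first is a useful habit, since a mistyped value is the most common source of a wrong sample mean.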
$$s^2 = \frac{1}{n - 1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \tag{2.2}$$
Example 2.2
Using the data of Example 2.1, find the point estimate for the variance.
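Solution 2.2
$n = 30$, $\sum x = 638$, $\sum x^2 = 15046$
$$\begin{aligned}
s^2 &= \frac{1}{n - 1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \\
&= \frac{1}{30 - 1}\left(15046 - \frac{(638)^2}{30}\right) \\
&= \frac{1}{29}\left(15046 - 13568.13333\right) \\
&= \frac{1}{29}\left(1477.866667\right) \\
&= 50.96091954 \\
&\approx 50.9609
\end{aligned}$$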
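The computational formula for the sample variance (equation 2.2) can be checked against a library routine. The sketch below is an illustration under the assumption that Python's standard statistics.variance is available; that function uses the same n − 1 divisor. The variable names are our own.

```python
from statistics import variance

# Daily sales ($) from Example 2.1.
daily_sales = [
    14, 21, 28, 17, 15, 34, 10, 18, 25, 30, 21, 15, 11, 28, 17,
    20, 20, 29, 31, 24, 11, 19, 26, 34, 10, 16, 25, 30, 22, 17,
]

n = len(daily_sales)
sum_x = sum(daily_sales)
sum_x_squared = sum(x * x for x in daily_sales)

# Equation 2.2: sum of squares minus (sum squared over n), divided by n - 1.
s_squared = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

# Cross-check with the standard library's sample variance.
library_value = variance(daily_sales)

print(f"sum of x = {sum_x}, sum of x squared = {sum_x_squared}")
print(f"sample variance by equation 2.2: {s_squared:.4f}")
print(f"sample variance by statistics.variance: {library_value:.4f}")
```

Both routes give the same value, which confirms that equation 2.2 is simply a rearrangement of the definition that is more convenient for hand calculation.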
Activity 2.1
1. The weights (kg) of 20 bags of potatoes randomly selected from a truckload of
250 bags of potatoes are:
10.2 10.1 9.8 9.9 10.0 9.8 10.3 10.1 10.4 9.7 8.9 9.0 10.6 10.9 11.0
11.3 10.2 12.0 9.8 10.7
Find point estimates of the
a) population mean weight, and
b) population variance of the weight of all potatoes in the truck.
2. The bank balances of 30 randomly selected savings accounts are:
200 128 132 400 380 24 267 306 86 94 125 106 249 364 59
34 126 184 230 342 311 265 46 38 89 122 241 237 98 106
Find point estimates for the
a) population mean balance, and
b) population variance of balances of all savings accounts.
Example 2.3
Refer to the data of Example 2.1. Find the point estimate of the population proportion of daily
sales which are above $20.
Solution 2.3
Counting the daily sales in Example 2.1 that are above $20 gives k = 15, with n = 30.
\[ \hat{p} = \frac{k}{n} = \frac{15}{30} = 0.5 \]
Example 2.4
In a study to determine the proportion of teachers in Zimbabwe who are degree holders, 800
teachers out of a random sample of 2 000 teachers said they have degree qualifications.
a) Find a point estimate of the proportion of all Zimbabwean teachers who have degrees.
b) If there are 15 000 teachers altogether in the country, how many have degrees?
Solution 2.4
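n = 2000, k = 800
a) \[ \hat{p} = \frac{k}{n} = \frac{800}{2000} = 0.4 \]
Thus an estimated 40% of all the teachers in Zimbabwe have degrees.
b) The estimated number of teachers with degrees is 0.4 × 15 000 = 6 000.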
Activity 2.2
1. Refer to Activity 2.1, suppose the standard weight of a bag of potatoes is 10 kg. Find
an estimate of the proportion of all potato bags that are under weight.
2. A church organisation has a total membership of 600. A survey conducted at the
church showed that 80 church members out of a random sample of 200 members had
bibles. Find a point estimate of the proportion of church members who do not have
bibles.
Confidence interval estimation is preferred to point estimation because the probability that
the interval includes the population parameter is known. This probability is a measure of our
confidence in the estimated result.
Suppose that a 95% confidence interval for the population mean is, say, (10, 13). Then
the probability that the mean is included in the interval (10, 13) is 0.95. Hence, we are 95%
confident that the mean lies in the range (10, 13). The probability that the population mean is
not contained in the interval (10, 13) is therefore 5%. This 5% is the level of error associated
with our confidence interval estimate; it is called the level of significance and is denoted by α.
2.5.1 Interval estimate of the population mean
The formulae that we use to find confidence interval estimates for the population mean μ
depend on whether the population variance is known or unknown and also on whether the
sample size is large or small. A sample size of 30 or more is considered a large sample;
otherwise it is a small sample.
Case I
If the population standard deviation σ is known, a 100(1 − α)% confidence interval for μ is
given by:
\[ \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \qquad [2.4] \]
where \( \bar{x} \) is the mean of a sample of size n from a population with variance \( \sigma^2 \), \( Z_{\alpha/2} \)
is the value of the standard normal distribution such that the area under the curve to
the right of it is \( \frac{\alpha}{2} \), and \( \frac{\sigma}{\sqrt{n}} \) is the standard error of the mean.
Example 2.5
An electrical firm supplies light bulbs that have a length of life that is approximately
normally distributed with a standard deviation of 20 hours. If a random sample of 40 bulbs
has an average life of 800 hours, find
a) a 95% confidence interval for the population mean life of all bulbs supplied by this
firm
b) a 99% confidence interval for the population mean life of all bulbs supplied by this
firm
Solution 2.5
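σ = 20, n = 40, \( \bar{x} = 800 \)
a) α = 0.05 ⇒ \( Z_{0.05/2} = Z_{0.025} = 1.96 \). A 95% confidence interval for μ is
\[
\begin{aligned}
\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} &= 800 \pm 1.96 \times \frac{20}{\sqrt{40}} \\
&= 800 \pm 6.198064214 \\
&= (793.8019,\ 806.1981)
\end{aligned}
\]
Thus we are 95% confident that the mean life of all bulbs is between 793.8019 hours
and 806.1981 hours.
b) α = 0.01 ⇒ \( Z_{0.01/2} = Z_{0.005} = 2.5758 \). A 99% confidence interval for μ is
\[
\begin{aligned}
\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} &= 800 \pm 2.5758 \times \frac{20}{\sqrt{40}} \\
&= 800 \pm 8.145394797 \\
&= (791.8546,\ 808.1454)
\end{aligned}
\]
Thus we are 99% confident that the mean life of all bulbs is between 791.8546 hours
and 808.1454 hours.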
If you compare the two intervals, you will see that the one based on the higher confidence level
of 99% is wider and conveys less information about the possible value of μ than does the one
based on 95%, which is narrower. In general, when sampling is from the same
population, using a fixed sample size, the higher the confidence level, the wider the interval.
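The effect of the confidence level on the width of the interval can be explored with a minimal Python sketch of formula [2.4]. The function name `z_interval` is an assumption made for this illustration. The critical value is computed with `statistics.NormalDist` rather than read from tables, so the results may differ from the worked solution in the last decimal places.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Confidence interval for the mean when sigma is known (formula [2.4])."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)              # Z_{alpha/2} times the standard error
    return xbar - margin, xbar + margin

# Light bulb data of Example 2.5: sigma = 20, n = 40, sample mean = 800
lo95, hi95 = z_interval(800, 20, 40, 0.95)
lo99, hi99 = z_interval(800, 20, 40, 0.99)

print(round(hi95 - lo95, 2))  # width of the 95% interval
print(round(hi99 - lo99, 2))  # width of the 99% interval (wider)
```

The 95% interval comes out at roughly (793.80, 806.20) and the 99% interval at roughly (791.85, 808.15), in line with Solution 2.5.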
Activity 2.3
The burning times of a particular brand of candles imported from Mozambique are
known to be normally distributed with a standard deviation of 5 minutes. The mean
burning time of a random sample of 20 candles was 3 hours. Find a 90% confidence
interval for the mean burning time of all such candles.
Case II
When the population standard deviation σ is unknown and the sample size is large, n ≥ 30,
then a 100(1 − α)% confidence interval for the population mean μ is given by
\[ \bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}} \qquad [2.5] \]
where s is the sample standard deviation.
Example 2.6
The Head of a rural primary school is worried by the big number of students who arrive late
for school. In order to be able to adjust the school starting time, he sought to find the average
distance walked by the students to school from home. The mean and standard deviation of the
distances travelled by a random sample of 60 students were 6km and 800m respectively.
Construct a 90% confidence interval for the mean distance travelled by all the students to
school.
Solution 2.6
n = 60, \( \bar{x} = 6 \), s = 800m = 0.8km, \( Z_{0.10/2} = Z_{0.05} = 1.6449 \)
A 90% confidence interval for μ is
\[
\begin{aligned}
\bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}} &= 6 \pm 1.6449 \times \frac{0.8}{\sqrt{60}} \\
&= 6 \pm 0.169884541 \\
&= (5.8301,\ 6.1699)
\end{aligned}
\]
We are 90% confident that the mean distance travelled by the students to school is between
5.8301km and 6.1699km.
Activity 2.4
A survey of 400 company executives revealed that the average annual earnings of a
CEO is $200 000 with a standard deviation of $600. Find a 99% confidence interval
for the true average annual earnings for all company executives.
Case III
This is the case where the population standard deviation σ is unknown and the sample size is
small (n < 30). A 100(1 − α)% confidence interval for μ is given by
\[ \bar{x} \pm t_{\alpha/2}(n-1) \times \frac{s}{\sqrt{n}} \qquad [2.6] \]
where n − 1 is the number of degrees of freedom.
Remark 2.1
Since n < 30 , the sample standard deviation of a small sample is not a reliable enough
estimate of the population standard deviation to enable the use of the z- distribution, as a
result we use the t-distribution.
Example 2.7
Refer to Activity 2.1. Find a 95% confidence interval for the mean weight of all bags of
potatoes in the truck.
Solution 2.7
n = 20, \( \bar{x} = 10.235 \), s = 0.7264, α = 0.05 ⇒ \( t_{\alpha/2}(n-1) = t_{0.025}(19) = 2.09 \)
A 95% confidence interval for μ is
\[
\begin{aligned}
\bar{x} \pm t_{\alpha/2}(n-1) \times \frac{s}{\sqrt{n}} &= 10.235 \pm 2.09 \times \frac{0.7264}{\sqrt{20}} \\
&= 10.235 \pm 0.339474473 \\
&= (9.8955,\ 10.5745)
\end{aligned}
\]
We are 95% confident that the true mean weight of all bags of potatoes in the truck is
between 9.8955kg and 10.5745kg.
Example 2.8
A stock market analyst wanted to estimate the average return on a certain stock. A random
sample of 20 days yielded an average return of 12% and a standard deviation of 4%.
Construct a 95% confidence interval estimate for the average return on this stock.
Solution 2.8
σ is unknown and n = 20 is small, therefore we use the t-distribution. A 95%
confidence interval for μ is
\[
\begin{aligned}
\bar{x} \pm t_{\alpha/2}(n-1) \times \frac{s}{\sqrt{n}} &= 12 \pm t_{0.025}(19) \times \frac{4}{\sqrt{20}} \\
&= 12 \pm 2.09 \times 0.894427191 \\
&= 12 \pm 1.869352829 \\
&= (10.1306,\ 13.8694)
\end{aligned}
\]
We are 95% confident that the average return on this stock is between 10.13% and 13.87%.
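The small-sample interval of formula [2.6] can be sketched in Python as follows. The Python standard library has no t-distribution, so this sketch assumes the critical value is read from t tables and passed in by the user; the function name `t_interval` and its parameters are assumptions made for this illustration.

```python
from math import sqrt

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when sigma is unknown and n is small.

    t_crit is t_{alpha/2}(n - 1), looked up in t tables by the user.
    """
    margin = t_crit * s / sqrt(n)
    return xbar - margin, xbar + margin

# Example 2.8: mean return 12%, s = 4%, n = 20, t_{0.025}(19) = 2.09 from tables
lower, upper = t_interval(12, 4, 20, 2.09)
print(round(lower, 4), round(upper, 4))  # 10.1306 13.8694
```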
Activity 2.5
A random sample of 10 cigarettes of a certain type has an average nicotine content of 15
milligrams and a standard deviation of 2.5 milligrams. Construct a 99% confidence interval
for the true average nicotine content of all the cigarettes.
Example 2.9
In a survey of 300 company executives carried out by the Zimbabwe Congress of Trade
Unions (ZCTU), 81 executives said they are willing to publicly disclose their annual salaries.
Find a 99% confidence interval for the proportion of all executives who are willing to
disclose their annual salaries.
Solution 2.9
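n = 300, k = 81, α = 0.01 ⇒ \( Z_{\alpha/2} = Z_{0.005} = 2.5758 \)
\[ \hat{p} = \frac{k}{n} = \frac{81}{300} = 0.27 \]
A 99% confidence interval for p is
\[
\begin{aligned}
\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} &= 0.27 \pm 2.5758 \times \sqrt{\frac{0.27 \times 0.73}{300}} \\
&= 0.27 \pm 2.5758 \times 0.025632011 \\
&= 0.27 \pm 0.066022934 \\
&= (0.2040,\ 0.3360)
\end{aligned}
\]
We are 99% confident that between 20.4% and 33.6% of all company executives are willing to
publicly disclose their annual salaries.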
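The calculation in Solution 2.9 can be reproduced with a short Python sketch. The function name `proportion_interval` is an assumption made for this illustration, and the critical value is computed with `statistics.NormalDist` instead of being read from tables.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(k, n, confidence):
    """Large-sample confidence interval for a population proportion."""
    p_hat = k / n                                        # point estimate of p
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Example 2.9: 81 of 300 executives, 99% confidence
lower, upper = proportion_interval(81, 300, 0.99)
print(round(lower, 3), round(upper, 3))  # 0.204 0.336
```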
Activity 2.6
A random sample of 400 customers who visited a retail shop was interviewed and 280
were found to have a preference for a certain brand of toothpaste. Find a 90%
confidence interval for the proportion of the population of customers who prefer the
particular brand of toothpaste.
• The resources available in terms of time and cost of the study. A huge sample is
costly to study and the study requires more time.
• The degree of accuracy required. The larger the sample that is used, the narrower the
interval. A narrower interval is associated with less uncertainty and more accurate
estimation results.
In order to determine the sample size for your study, you need to specify the precision of your
estimate and the level of confidence desired. The precision is given by the error that you are
prepared to tolerate in your estimated results. You also need an estimate of the population
standard deviation. This can be obtained from a pilot survey carried out before the actual
study.
We may wish to determine how large a sample is necessary to ensure that the error in
estimating μ will not exceed e, the 'bound on the error'. In the confidence interval
\( \bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \), the bound on the error is \( e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \). Making n the subject
of the formula, the sample size necessary so that the error will not exceed e can be
shown to be:
\[ n = \left[\frac{Z_{\alpha/2} \times \sigma}{e}\right]^2 \qquad [2.8] \]
Example 2.10
In Example 2.5, how large a sample is required if we wish to be 95% confident that our
sample mean will be within 10 hours of the true mean?
Solution 2.10
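α = 0.05 ⇒ \( Z_{\alpha/2} = 1.96 \), e = 10, σ = 20
\[
\begin{aligned}
n &= \left[\frac{Z_{\alpha/2} \times \sigma}{e}\right]^2 \\
  &= \left[\frac{1.96 \times 20}{10}\right]^2 \\
  &= [3.92]^2 \\
  &= 15.3664 \\
  &\approx 16
\end{aligned}
\]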
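Formula [2.8] translates directly into code. The sketch below is illustrative only; the function name `sample_size_mean` is an assumption, and `math.ceil` is used so that the answer is always rounded up to the next whole observation.

```python
from math import ceil
from statistics import NormalDist

def sample_size_mean(sigma, e, confidence):
    """Minimum n so that the error in estimating the mean does not exceed e."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    return ceil((z * sigma / e) ** 2)                    # always round up

# Example 2.10: sigma = 20 hours, e = 10 hours, 95% confidence
n_required = sample_size_mean(20, 10, 0.95)
print(n_required)  # 16
```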
Activity 2.7
1. Find the minimum sample size required for estimating the average return on money
market investments to within 0.5% per year with 99% confidence. The standard
deviation of returns is believed to be 2% per year.
2. A market researcher would like to estimate the average amount spent on airtime per
month by each female student at a college. The researcher would like to be able to
determine the average amount spent by all female students at the college to be
within $1 with 95% confidence. From past studies, the population standard
deviation is known to be $2. What is the minimum required sample size?
Example 2.11
In example 2.9, how large a sample is needed if we wish to be 99% confident that our sample
proportion will be within 0.02 of the true proportion of all the CEOs who are willing to
disclose their annual salaries?
Solution 2.11
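We know that \( \hat{p} = 0.27 \) and \( Z_{\alpha/2} = Z_{0.005} = 2.5758 \).
The minimum sample size required is
\[
\begin{aligned}
n &= \frac{\hat{p}(1-\hat{p})\, Z_{\alpha/2}^2}{e^2} \\
  &= \frac{0.27 \times 0.73 \times 2.5758^2}{0.02^2} \\
  &= 3269.2709 \\
  &\approx 3270
\end{aligned}
\]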
In practice, you cannot collect sample data before deciding on the sample size to use.
Therefore, we require a way of estimating the appropriate sample size for a study which does
not depend on the sample proportion \( \hat{p} \).
The largest value that \( \hat{p}(1-\hat{p}) \) can have is 0.25, which occurs when \( \hat{p} = 0.5 \). You can show
this by working out the value of \( \hat{p}(1-\hat{p}) \) using increasing values of \( \hat{p} \) starting with
\( \hat{p} = 0.1 \). If we assume the largest value of \( \hat{p}(1-\hat{p}) \), then formula [2.9] is reduced to
\[ n = \left[\frac{Z_{\alpha/2}}{2e}\right]^2 \qquad [2.10] \]
Example 2.12
A researcher intends to conduct a study to estimate the proportion of supermarkets that offer
trolleys suitable for customers with difficulty in walking. Determine the sample size needed if
the researcher wishes to be 95% confident that the estimated proportion is within 8% of the
true proportion.
Solution 2.12
α = 0.05 ⇒ \( Z_{\alpha/2} = 1.96 \), e = 0.08
\[
\begin{aligned}
n &= \left[\frac{Z_{\alpha/2}}{2e}\right]^2 \\
  &= \left[\frac{1.96}{2 \times 0.08}\right]^2 \\
  &= 150.0625
\end{aligned}
\]
This has to be rounded up in order to meet the confidence requirement. Thus a sample size of
151 supermarkets should be used.
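Both sample size formulas for a proportion can be covered by one Python sketch. The function name `sample_size_proportion` and its default argument are assumptions made for this illustration: when no pilot estimate of the proportion is available, the default of 0.5 gives the largest possible value of p̂(1 − p̂), which is 0.25.

```python
from math import ceil
from statistics import NormalDist

def sample_size_proportion(e, confidence, p_hat=0.5):
    """Minimum n for estimating a proportion to within e.

    p_hat = 0.5 is the conservative choice used when no pilot estimate exists.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    return ceil(p_hat * (1 - p_hat) * z ** 2 / e ** 2)   # always round up

# Example 2.12: no prior estimate, e = 0.08, 95% confidence
n_conservative = sample_size_proportion(0.08, 0.95)
# Example 2.11: pilot estimate 0.27, e = 0.02, 99% confidence
n_with_pilot = sample_size_proportion(0.02, 0.99, p_hat=0.27)

print(n_conservative)  # 151
print(n_with_pilot)    # 3270
```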
Activity 2.8
1. In a survey of a random sample of 300 shoppers, 180 said they would prefer to make
payments using debit cards. How large a sample is needed if we are to be 95%
confident that the estimate is within 5% of the actual proportion of shoppers who
prefer to transact using debit cards?
2.7 Summary
In this unit we introduced you to an important branch of statistical inference called
estimation. Estimation is about the use of sample measurements to predict population values.
The estimation is done in two ways namely point estimation and confidence interval
estimation. The major drawback of a point estimate is that we have no idea of the probability
that it is a good estimate. This makes interval estimates preferable because they are
associated with a known level of confidence which is a measure of how confident we are that
the interval does include within it the population parameter.
We looked at how interval estimates for a population mean and population proportion are
constructed. We also looked at how to determine the appropriate sample size for estimation
surveys.
References
Aczel, A.D. and Sounderpandian, J. (2005). Complete Business Statistics. India: Tata
McGraw-Hill.
Buglear, J. (2005). Quantitative Methods for Business. London: Elsevier Butterworth
Heinemann.
Kazmier, L.J. (2003). Schaum’s Easy Outline: Business Statistics. Blacklick: McGraw-Hill
Trade.
Kemp, S.M. and Kemp, S. (2004). Business Statistics Demystified. Blacklick: McGraw-Hill
Professional Publishing.
Muchengetwa, S. (2005). Business Statistics. Harare: Zimbabwe Open University.
Wegner, T. (1999). Applied Business Statistics. Cape Town: Juta and Co.
Unit 3
Hypothesis Testing
3.1 Introduction
Hypothesis testing is that branch of statistical inference that is used to verify claims made
concerning population parameters.
In this unit, you will be introduced to tests of hypotheses concerning a single population. The
terminology used in hypothesis testing will be explained. You will learn how to conduct
hypothesis tests concerning the mean and proportion of a single population.
Hypothesis testing involves gathering evidence from a random sample drawn from the
population of interest in order to decide whether the null hypothesis is likely to be true or
false. The hypothesis is rejected if evidence from the sample is not consistent with the stated
hypothesis, otherwise it is accepted. However, the acceptance of the stated hypothesis does
not necessarily imply that it is true, rather it is a result of insufficient evidence to reject it.
The alternative hypothesis, denoted by \( H_1 \), is the negation of the null hypothesis. For example,
a null hypothesis might assert that the population mean is equal to a specified value \( \mu_0 \). We
write this as \( H_0: \mu = \mu_0 \). The alternative hypothesis opposes this assertion and it is written as
\( H_1: \mu \neq \mu_0 \). In this case, the alternative hypothesis suggests that the mean takes values that
are either below \( \mu_0 \) or above it. Therefore, to investigate \( H_0 \) we conduct a non-directional test
which is known as a two-tailed test.
A null hypothesis might assert that the population mean is at least equal to a certain specified
value \( \mu_0 \). We write \( H_0: \mu \geq \mu_0 \). In this case, the alternative hypothesis would consist of
values below \( \mu_0 \), that is, \( H_1: \mu < \mu_0 \). Similarly, if a null hypothesis asserts that the population
mean is less than or equal to a specified value \( \mu_0 \), that is, \( H_0: \mu \leq \mu_0 \), the alternative will be
\( H_1: \mu > \mu_0 \). In both these cases, since the alternative hypothesis consists of values on only one
side of the specified value \( \mu_0 \), we conduct a one-sided test or a one-tailed test.
Example 3.1
A ZOU Regional Director claims that the average age of a ZOU student is 21. A new
Programme Coordinator at the region doubts this claim. Set up the null and alternative
hypothesis if the Coordinator wishes to show that it is not 21.
Solution 3.1
\( H_0: \mu = 21 \)
\( H_1: \mu \neq 21 \)
Example 3.2
A leading bakery claims that the average cost of producing a standard loaf of bread is 80
cents. If you suspect that the claim exaggerates the cost, how would you set up the null and
alternative hypothesis?
Solution 3.2
\( H_0: \mu \geq 80 \) cents
\( H_1: \mu < 80 \) cents
Activity 3.1
1. A ZIMRA official at a busy border post claims that it takes, on average, at most 2
days for a truck driver to clear his consignment. You suspect that the average is
greater than 2 days and you want to test the claim. State the null and alternative
hypothesis for this test.
2. An ice cream vending machine is set to dispense 100 grams per cup. You suspect
that the machine is under-filling the cups. Set up the null and alternative hypothesis
to investigate this case.
A type I error is committed when a true null hypothesis is rejected. The probability of
committing a type I error is called the level of significance and it is denoted by α . It is
common to use the 1%, 5% and 10% levels of significance in calculations.
A type II error is committed if we accept the null hypothesis when it is false. The probability
that the test will be able to detect a false null hypothesis is called the power of a test. In other
words, the power of a test is the probability of rejecting H0 when indeed H0 is false.
When testing for a population proportion and the sample size is large we use the z-
distribution.
Step 3: Determine the Rejection and Acceptance Region
Depending on the distribution identified and the level of significance desired, you find a
value from statistical tables which we call a critical value. The critical value separates the
acceptance region from the rejection region. The rejection region is made up of a range of
values such that if a test statistic calculated from sample data falls in it the null hypothesis
would be rejected. The rejection region also depends on the nature of the alternative
hypothesis as shown in the following figures.
[Figures: distribution curves centred at 0, with the area of rejection (α) shaded beyond the critical value]
The calculation of the test statistic depends on whether the population standard deviation σ is
known or unknown and also on the sample size as summarised in Table 3.1 below.
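Table 3.1 Test Statistic for Testing μ
When σ is known, Case I (n is large or small):
\[ Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1) \qquad [3-1] \]
When σ is unknown, Case II (n is large):
\[ Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim N(0,1) \qquad [3-2] \]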
When testing for a single population proportion p, we use the z-distribution and the test
statistic is given by:
\[ Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \qquad [3-4] \]
where \( q_0 = 1 - p_0 \).
Solution 3.3
1. \( H_0: \mu = 12\% \)
   \( H_1: \mu \neq 12\% \)
2. The population standard deviation σ is unknown, but the sample size n = 36 is large, so
we use the z-distribution.
3. The nature of the alternative hypothesis suggests we need to carry out a two-tailed test.
Using α = 0.05, the critical values are \( \pm Z_{\alpha/2} = \pm Z_{0.025} = \pm 1.96 \).
[Figure: two-tailed rejection regions below −1.96 and above 1.96]
The rejection criterion is therefore to reject \( H_0 \) if \( |Z_{cal}| > 1.96 \).
4. \[ Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{10 - 12}{3 / \sqrt{36}} = -4 \]
5. Since \( |Z_{cal}| = 4 \) is greater than the critical value 1.96, we reject \( H_0 \).
6. We conclude that the average annual return is not 12% and therefore the analyst’s
claim is false.
Example 3.4
The average weekly earnings of all bus rank marshals is reported to be $180. You believe it is
too low. You collect a random sample of 100 rank marshals and find that the weekly average
is $250 with a standard deviation of $20. Conduct the test at 10% level of significance.
Solution 3.4
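1. \( H_0: \mu \leq 180 \)
   \( H_1: \mu > 180 \)
2. The population standard deviation σ is unknown, but the sample size n = 100 is large,
so we use the z-distribution.
3. α = 0.10 and it is a one-tailed test. The critical value is \( Z_{\alpha} = Z_{0.10} = 1.2816 \).
[Figure: rejection region to the right of the critical value 1.2816]
We would reject \( H_0 \) if \( Z_{cal} > 1.2816 \).
4. \[ Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{250 - 180}{20 / \sqrt{100}} = 35 \]
5. Since \( Z_{cal} = 35 \) is greater than the critical value 1.2816, we reject \( H_0 \).
6. We conclude that the average weekly earnings of all rank marshals is greater than
$180.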
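The steps of this one-tailed test can be sketched in Python. The names `z_statistic`, `z_cal`, `z_crit` and `reject_h0` are assumptions made for this illustration; the critical value is computed with `statistics.NormalDist` instead of being read from tables.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(xbar, mu0, s, n):
    """Test statistic [3-2] for a large sample with sigma unknown."""
    return (xbar - mu0) / (s / sqrt(n))

# Example 3.4: H0: mu <= 180 against H1: mu > 180 at the 10% level
z_cal = z_statistic(250, 180, 20, 100)
z_crit = NormalDist().inv_cdf(1 - 0.10)   # upper-tail critical value, about 1.2816

reject_h0 = z_cal > z_crit                # upper-tail rejection rule
print(z_cal, round(z_crit, 4), reject_h0)  # 35.0 1.2816 True
```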
Example 3.5
In an advertisement it is claimed that a certain brand of air freshener will last on average at
least 40 days. A random sample of 12 households took the following number of days to use
up the air freshener:
28 41 36 50 17 39 21 64 26 30 42 12
Test the claim made for the product using a 5% level of significance.
Solution 3.5
1. \( H_0: \mu \geq 40 \)
   \( H_1: \mu < 40 \)
2. The population standard deviation σ is unknown and the sample size n = 12 is small, so
we use the t-distribution.
3. α = 0.05 and it is a one-tailed test. The critical value is \( -t_{\alpha}(n-1) = -t_{0.05}(11) = -1.80 \).
[Figure: rejection region to the left of the critical value −1.80]
We would reject \( H_0 \) if \( T_{cal} < -1.80 \).
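As an illustrative sketch, and not the module's own working, the test statistic for the air freshener data can be computed with Python's standard library. The critical value −1.80 is taken from Step 3 (t tables); the variable names are assumptions made for this example.

```python
from math import sqrt
import statistics

# Days taken by 12 households to use up the air freshener (Example 3.5)
days = [28, 41, 36, 50, 17, 39, 21, 64, 26, 30, 42, 12]

n = len(days)
xbar = statistics.mean(days)     # sample mean
s = statistics.stdev(days)       # sample standard deviation, divisor n - 1

mu0 = 40                         # hypothesised mean under H0: mu >= 40
t_cal = (xbar - mu0) / (s / sqrt(n))

t_crit = -1.80                   # -t_{0.05}(11), read from t tables
reject_h0 = t_cal < t_crit       # lower-tail rejection rule

print(round(xbar, 4), round(s, 4), round(t_cal, 2), reject_h0)
```

Under these assumptions the statistic is about −1.46, which does not fall below −1.80, so this sketch does not reject the null hypothesis at the 5% level.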
Activity 3.2
1. Average total daily sales of a fruits vendor are known to be at most $26. The
vendor recently changed his site of operation and moved to a new site at a busy
street corner. He now wants to know whether his daily sales have improved since
then. A random sample of 16 trading days gave an average of $30 with a standard
deviation of $5. Does the data provide evidence that the vendor’s average total
daily sales have improved? Use α = 0.05 .
2. A graduate student comes out of college with an average fees debt of $1 500. A
sample of 200 graduates showed that the average debt was $900 with a standard
deviation of $120. Carry out the test at the 5% level of significance.
3. The average time that children who reside in the same neighbourhood spend travelling
to school is claimed to be 35 minutes. A random sample of 10 children taken from
the neighbourhood had their travel times recorded as follows:
37 38 40 35 36 35 39 37 40 42
Solution 3.6
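1. \( H_0: p \geq 0.25 \)
   \( H_1: p < 0.25 \)
[Figure: rejection region to the left of the critical value −1.6449]
4. \[
\begin{aligned}
Z_{cal} &= \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \\
        &= \frac{0.3 - 0.25}{\sqrt{\dfrac{0.25 \times 0.75}{60}}} \\
        &= \frac{0.05}{0.05590} \\
        &= 0.8945
\end{aligned}
\]
5. Since \( Z_{cal} = 0.8945 \) does not fall below the critical value −1.6449, we fail to reject \( H_0 \).
6. We conclude that the data does not provide sufficient evidence to reject \( H_0 \).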
Example 3.7
Last year, 70% of total student applications received by the Zimbabwe Open University were
from female applicants. Out of a random sample of 150 applications received this year, 90
were from females. Test the hypothesis that the proportion of applications from females has
not changed using a 10% level of significance.
Solution 3.7
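1. \( H_0: p = 0.70 \)
   \( H_1: p \neq 0.70 \)
[Figure: two-tailed rejection regions below −1.6449 and above 1.6449]
4. \[ \hat{p} = \frac{90}{150} = 0.6 \]
\[
\begin{aligned}
Z_{cal} &= \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} \\
        &= \frac{0.60 - 0.70}{\sqrt{\dfrac{0.70 \times 0.30}{150}}} \\
        &= \frac{-0.1}{0.037416573} \\
        &= -2.6726
\end{aligned}
\]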
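The test statistic [3-4] and the two-tailed decision rule can be sketched in Python as below. This is an illustration only: the function name `proportion_z` and the other variable names are assumptions, and the critical value is computed with `statistics.NormalDist` rather than read from tables.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(k, n, p0):
    """Test statistic [3-4] for a single population proportion."""
    p_hat = k / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Example 3.7: H0: p = 0.70 against H1: p != 0.70 at the 10% level
z_cal = proportion_z(90, 150, 0.70)
z_crit = NormalDist().inv_cdf(1 - 0.10 / 2)   # Z_{alpha/2}, about 1.6449

reject_h0 = abs(z_cal) > z_crit               # two-tailed rejection rule
print(round(z_cal, 4), round(z_crit, 4), reject_h0)  # -2.6726 1.6449 True
```

Since |−2.6726| exceeds 1.6449, the sketch rejects the null hypothesis, which suggests that the proportion of applications from females has changed.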
Activity 3.3
1. The Traffic Safety Council of Zimbabwe claims that at least 65% of all road accidents are
due to human error. In a random sample of 500 road accidents, it was found that 342
accidents were due to human error. Use 5% level of significance to test the claim.
2. A credit controller of a clothing retail chain estimates that 20% of their customers default
on their monthly bill payment. A random sample of 400 accounts indicated that 130
accounts were at least one month in arrears. Does the data provide evidence to support the
credit controller’s claim? Use α = 0.10
Example 3.8
An electrical firm supplies light bulbs that have a length of life that is approximately
normally distributed with a standard deviation of 20 hours. If a random sample of 40 bulbs
has an average life of 800 hours,
a) Find a 95% confidence interval for the population mean life of all bulbs supplied by
this firm.
b) Hence test at 5% level of significance the claim that the population mean life of all
bulbs supplied by this firm is 800 hours.
Solution 3.8
a) From Solution 2.5, the 95% confidence interval for the mean life of bulbs was found to be
(793.8019; 806.1981).
b) The hypotheses tested are: \( H_0: \mu = 800 \)
\( H_1: \mu \neq 800 \)
Since the 95% confidence interval (793.8019; 806.1981) includes 800, we fail to reject \( H_0 \)
at the 5% level of significance. The data are consistent with the claim.
Activity 3.4
Last year, 70% of total student applications received by the Zimbabwe Open
University were from female applicants. Out of a random sample of 150 applications
received this year, 90 were from females. Use the confidence interval approach to test
the hypothesis that the proportion of applications from females has not changed using
a 10% level of significance.
3.8 Summary
In this unit, you learnt how to conduct hypothesis tests concerning the mean and
proportion of a single population. We defined a statistical hypothesis as an assumption or a
statement, which may or may not be true, made concerning a population parameter.
Hypothesis testing therefore is about verifying whether the claim is true or false. We saw that
there are two types of hypotheses namely the null and alternative hypothesis. The null
hypothesis is a statement of the assertion made concerning a population parameter.
The decision to reject or accept H0 is based on evidence gathered from a random sample
drawn from the population of interest. A wrong decision may be arrived at due to sampling
errors. A type I error is committed when H0 is rejected when in actual fact it is true. If H0 is
accepted when in fact it is false, the error committed is called a type II error.
The procedure of hypothesis testing should be followed religiously. The steps to be followed
were stated as:
a) State the null and alternative hypothesis
b) Identify the distribution
c) Determine the rejection and acceptance region
d) Calculate the test statistic
e) Decide whether or not to reject H0
f) Make a conclusion
The hypothesis is rejected if evidence from the sample is not consistent with the stated
hypothesis, otherwise it is accepted. However, the acceptance of the stated hypothesis does
not necessarily imply that it is true, rather it is a result of insufficient evidence to reject it.
Unit 4
Simple Linear Regression Analysis
4.1 Introduction
Regression analysis is a widely used statistical technique that has many business applications.
It involves studying the relationship between variables and formulating models that connect
the variables.
In this unit, you will learn how to use observed data to estimate the functional form of the
relationship between the variables. You will also learn how to use the fitted model for
prediction purposes.
Simple linear regression involves modelling a relationship between only two variables. The
word linear implies fitting a straight-line model. When more than two variables are involved
the technique is called multiple regression analysis. The use of simple linear regression in
applied research is limited because the workings of most socio-economic systems are too
complex to be represented by such a simple formulation. However, in this course we will
only look at the relationship between two variables. This will help us explain the fundamental
ideas underlying regression analysis as simply as possible.
Example 4.1
When estimating the relationship between expenditure on advertisement and sales, we can fix
monthly levels of expenditure on advertisement but have no direct control over the monthly
sales realised. Therefore, advertisement expenditure is the independent variable and sales
become the dependent variable.
Activity 4.1
State the independent and dependent variable in each of the following studies:
a) Modelling the relationship between company profits and wages.
b) Modelling the relationship between level of risk and returns on investment.
c) Estimating the relationship between starting salary and level of education attained.
d) The relationship between consumption expenditure and disposable income.
Example 4.2
A company would like to estimate the relationship between its monthly sales and the amount
that the company spends on advertisement per month. A random sample of monthly
observations made over the past year is:
Draw a scatter plot to represent the data. Comment on the kind of relationship between
monthly expenditure on advertisement and monthly sales.
Solution 4.2
[Figure: scatter plot of Monthly Sales ($00) on the vertical axis against Monthly Expenditure ($00) on the horizontal axis]
The points seem to be following a line with positive gradient. If you insert a line of best fit
through the points, you will see that the points do not deviate much from the line. We can
therefore conclude that there is a strong positive linear relationship between monthly
expenditure on advertisement and monthly sales.
Activity 4.2
The following table gives the ages (in years) and prices (in thousands of dollars) for a
random sample of 10 used cars of a specific model on display at a Car Sale.
Age (years) 7 3 9 4 7 5 8 6 2 5
Price ($000) 3 7 1 5 2 6 2 4 7 5
Draw a scatter plot to show the relationship between price and age of car.
The following sketch diagrams show scatter plots that you may also encounter and how you
should interpret them.
[Figures: four sketch scatter plots of Y against X]
Activity 4.3
Sketch scatter diagrams to represent the following relationships:
a) A weak negative linear relationship
b) A strong negative linear relationship
The best straight line that passes through the points on a scatter plot can be fitted ‘by eye’ as
demonstrated in Figure 4.3. Where possible, the line is made to pass through the majority of
points on the scatter plot leaving almost an equal number of points above and below it.
[Figure: scatter plot of Y against X with a straight line drawn through the points]
Figure 4.3 Fitting the Best Straight Line by Eye
However, there is no guarantee that fitting a line by eye will produce the best-fit line.
Different people will produce different lines despite using the same data. The method is,
therefore, unreliable and inconsistent. A more reliable method is the Least Squares Method,
the results of which are dealt with in subsection 4.6.1.
The least squares technique gives the equation of the best straight line as
\[ \hat{Y} = a + bX \qquad [4-2] \]
where
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \qquad [4-3] \]
and
\[ a = \frac{\sum y - b\sum x}{n} \qquad [4-4] \]
Equation [4-2] is the estimated regression equation connecting variables X and Y, where a
and b are estimates of the population intercept \( \beta_0 \) and population slope \( \beta_1 \) of the line
respectively.
Example 4.3
Estimate the regression equation for the data of Example 4.2
Solution 4.3
Using the two variable statistical mode on your calculator, you will obtain the following
results:
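n = 6, ∑x = 54, ∑x² = 560, ∑y = 175, ∑xy = 1827
\[
\begin{aligned}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \\
  &= \frac{6(1827) - 54(175)}{6(560) - (54)^2} \\
  &= \frac{1512}{444} \\
  &= 3.405405405
\end{aligned}
\]
\[
\begin{aligned}
a &= \frac{\sum y - b\sum x}{n} \\
  &= \frac{175 - 3.405405405(54)}{6} \\
  &= -1.481981978
\end{aligned}
\]
The regression equation is \( \hat{Y} = -1.481981978 + 3.405405405X \)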
Example 4.4
Use the data of Activity 4.2 to estimate the regression equation connecting price and age of
car.
Solution 4.4
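n = 10, ∑x = 56, ∑x² = 358, ∑y = 42, ∑xy = 194
\[
\begin{aligned}
b &= \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \\
  &= \frac{10(194) - 56(42)}{10(358) - (56)^2} \\
  &= \frac{-412}{444} \\
  &= -0.927927927
\end{aligned}
\]
\[
\begin{aligned}
a &= \frac{\sum y - b\sum x}{n} \\
  &= \frac{42 + 0.927927927(56)}{10} \\
  &= 9.396396391
\end{aligned}
\]
The estimated regression equation is \( \hat{Y} = 9.396396391 - 0.927927927X \)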
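The least squares formulas [4-3] and [4-4] can be applied to the raw data of Activity 4.2 with a short Python sketch. The function name `least_squares` and the variable names are assumptions made for this illustration.

```python
# Age (years) and price ($000) of the 10 used cars in Activity 4.2
ages = [7, 3, 9, 4, 7, 5, 8, 6, 2, 5]
prices = [3, 7, 1, 5, 2, 6, 2, 4, 7, 5]

def least_squares(x, y):
    """Return (a, b) for the fitted line Y-hat = a + bX."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope [4-3]
    a = (sum_y - b * sum_x) / n                                   # intercept [4-4]
    return a, b

a, b = least_squares(ages, prices)
print(round(a, 4), round(b, 4))  # 9.3964 -0.9279
```

The negative slope indicates that the estimated price falls by about $928 for each additional year of age.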
Activity 4.4
The data in the table below relate a manufacturer’s market share (%) with product
quality measured on a scale 0 to 100.
Product quality 27 39 73 66 33 43 47 55 60 68 70 75 82
Market share (%) 2 3 10 9 4 6 5 8 7 9 10 13 12
The parameter b represents the rate of change of Y with respect to X . Thus the value of b
shows the corresponding change in the value of Y for every unit change in the value of X .
Example 4.5
Interpret the value of a and b for the regression equation obtained in Example 4.3
Solution 4.5
The regression equation obtained was \( \hat{Y} = -1.481981978 + 3.405405405X \), where X is the
monthly expenditure on advertisement ($00) and Y is the corresponding monthly sales ($00).
The value a = −1.481981978 cannot be interpreted in terms of monthly sales since the X values
used to construct the equation did not include zero.
The value b = 3.405405405 represents the corresponding increase in monthly sales ($00) for
every unit ($100) increase in monthly expenditure on advertisement.
Example 4.6
Use the model obtained in Example 4.3 to estimate the monthly sales if the monthly
expenditure on advertisement is $1 000.
Solution 4.6
X = 10 ⇒ \( \hat{Y} = -1.481981978 + 3.405405405(10) \)
= 32.57207207 ($00)
≈ $3 257.21
The monthly sales are estimated to be $3 257.21 if $1 000 is spent on advertisement per
month.
41
Activity 4.5
1. Use the regression model you obtained in Activity 4.4 to estimate the percentage of market
share if the product quality is rated as 65.
2. A money market analyst would like to estimate the relationship between annual incomes of
families and their annual savings. The following data was obtained.
a) Obtain the least squares regression equation connecting income and savings.
b) State three assumptions made when estimating the equation in (a) above.
c) Interpret the slope of the estimated regression equation.
d) Estimate the amount of annual savings for a family with an annual income of $14 000.
3. The quantity demanded (Y) and price (X) of an illicit brew sold at a number of village
shebeens is estimated by the model Yˆ = a + bX i . A random sample of 7 trading days gave
the following observations:
a) Portray the data on a scatter diagram and comment on the relationship shown.
b) Find the estimated regression equation and plot it on the scatter diagram.
c) Use the plotted line to predict the quantity demanded when the price of a bottle is $1.20
4.8 Summary
We defined regression analysis as a statistical technique of understanding the relationship
between variables. Regression analysis allows us to establish the functional form of the
relationship between variables. We looked at the construction of scatter plots which enable
us, even at a glance, to have a ‘feel’ of the kind of relationship that exists between the
variables under investigation. The general two variable linear regression model is given by:
\[ Y = \beta_0 + \beta_1 X + e \]
We looked at the role of the error term e in the model which is to act as a proxy for all other
variables that may have an influence on the dependent variable but are omitted in the posited
model. We stated the assumptions of the simple linear regression model. We learnt about how
to fit a regression model to sample data and how to use the model for prediction purposes.
Unit 5
Correlation Analysis
5.1 Introduction
Correlation analysis and regression analysis are related concepts in that they complement
each other. In Unit 4, we looked at linear regression analysis which is a statistical technique
of establishing the functional form of a relationship between two variables. In this unit, you
will learn about measures of the correlation between two variables. Correlation analysis is a
technique of measuring the degree of linear association between two variables. Using this
technique, business people will be able to tactfully manipulate one variable for the betterment
of the other variable or simply exploit the relationship between the variables in order to
maximise profits.
Given two variables X and Y, the correlation between X and Y is the same as the correlation between Y and X, so in correlation analysis there is no need to distinguish between an independent variable and a dependent variable, as there is in regression analysis. Unlike regression analysis, where the independent variable X is assumed to be fixed and non-random, correlation analysis assumes that both X and Y are random variables.
When points follow closely a straight line sloping up to the right as shown in Figure 5.1 (a),
we have a high positive correlation between the two variables. If points follow loosely a
straight line sloping down to the right, we have low negative correlation between variables X
and Y as shown in Figure 5.1 (b).
[Figure: four scatter plots of Y against X. (a) High positive correlation, (b) Low negative correlation, (c) Zero correlation, (d) Zero correlation]
Figure 5.1 Scatter Plots Showing Various Degrees of Correlation between X and Y
In Figure 5.1 (c) and (d) the relationship between X and Y is nonlinear giving zero
correlation between the variables.
Activity 5.1
Sketch scatter diagrams to show the correlation between two variables X and Y if the
degree of association is described as:
a) High negative correlation
b) Low positive correlation
• r = −1 indicates a perfect negative correlation between the variables
• r close to +1 indicates a strong positive correlation
• r close to -1 indicates a strong negative correlation
• r close to zero implies a weak correlation between the variables
The following adjectives may help you to describe the degree of linear association between
two variables:
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} \qquad [5.1] \]
Example 5.1
The following data are a random sample of indexed prices of gold and platinum over a six
year period:
Gold (X) 12 10 14 11 12 9
Platinum (Y) 18 17 23 19 20 15
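Solution 5.1
\[ n = 6, \quad \sum x = 68, \quad \sum x^2 = 786, \quad \sum y = 112, \quad \sum y^2 = 2128, \quad \sum xy = 1292 \]
\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} = \frac{6(1292) - (68)(112)}{\sqrt{[6(786) - (68)^2][6(2128) - (112)^2]}} = \frac{136}{\sqrt{(92)(224)}} = 0.9474 \]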
Example 5.2
The following are the number of hours which a random sample of ten students studied for an
examination and the subsequent grades received by the students.
Hours Studied(X) 8 5 11 13 10 5 18 15 2 8
Grade (Y) 56 44 79 72 70 54 94 85 33 65
Solution 5.2
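\[ n = 10, \quad \sum x = 95, \quad \sum x^2 = 1121, \quad \sum y = 652, \quad \sum y^2 = 45\,688, \quad \sum xy = 6\,996 \]
\[ r = \frac{10(6996) - 95(652)}{\sqrt{[10(1121) - (95)^2][10(45688) - (652)^2]}} = \frac{8020}{\sqrt{2185 \times 31776}} = 0.962496223 \approx 0.9625 \]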
There is a high positive correlation between hours studied and grade obtained.
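As a cross-check on the hand calculation, the sketch below computes Pearson's correlation coefficient directly from the raw data of Example 5.2. It is an illustrative Python sketch only; the function name `pearson_r` is an assumption made here and does not come from the module.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r using the computational formula with raw totals."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Data of Example 5.2
hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]

r = pearson_r(hours, grades)
print(round(r, 4))  # 0.9625
```

Working from the raw observations rather than from pre-computed totals also guards against errors made when keying the sums into a calculator.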
Activity 5.2
A random sample of 10 upper six students obtained the following marks in mathematics and
physics in an end of year examination.
Mathematics (X) 76 62 70 59 52 53 53 56 57 56
Physics (Y) 80 68 73 63 65 68 65 63 65 66
Calculate Pearson’s correlation coefficient for the data. Comment on the extent to which performance in mathematics is associated with performance in physics.
5.5.2 Spearman’s rank correlation coefficient
The Spearman’s correlation coefficient denoted by rs is calculated for ranked data. This is
data measured on the ordinal scale. The correlation coefficient rs is interpreted in the same
way as the Pearson’s correlation coefficient r . The computational formula for rs is given by
\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \qquad [5.2] \]
where d is the difference in ranks obtained by subtracting the ranks of y values from the
ranks of x values for each pair of observations.
Example 5.3
A panel of two judges ranked the performance of 5 drama groups (A, B, C, D and E) as
follows:
Drama Group A B C D E
Judge 1 4 5 1 3 2
Judge 2 5 4 2 3 1
Is there agreement in the manner in which the judges perceive the performance of the drama
groups?
Solution 5.3
Judge 1   Judge 2   di    di²
4         5         -1    1
5         4         1     1
1         2         -1    1
3         3         0     0
2         1         1     1
                    ∑di² = 4
\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(4)}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.8 \]
The correlation coefficient is fairly high and positive, showing that the judges largely agree in the way they perceive the performance of the drama groups.
Example 5.4
An examiner and a moderator marked 7 examination scripts during a standardisation process
and awarded the following percentage scores:
Script Number 1 2 3 4 5 6 7
Examiner 67 54 38 70 42 70 80
Moderator 58 60 38 67 44 69 76
Calculate Spearman’s rank correlation coefficient and comment on the result obtained.
Solution 5.4
You begin by ranking the examiner scores assigning rank 1 to the highest score and rank 2 to
the second highest and so on. Where there are ties, the tied scores are each assigned the
average of the ranks they would have had assuming there were no ties. The moderator marks
are ranked separately in a similar manner.
The correlation coefficient is high and positive indicating that the examiner and the
moderator have the same perception of the scripts.
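The ranking rule described in Solution 5.4 (rank 1 for the highest score, tied scores sharing the average of the ranks they would otherwise occupy) can be sketched in Python as follows. This is an illustrative sketch rather than the module's own working: the function names are assumptions, and the coefficient is computed with the same shortcut formula for rs given earlier in this section.

```python
def average_ranks(scores):
    """Rank scores with 1 = highest; tied scores share the average rank."""
    ordered = sorted(scores, reverse=True)
    ranks = []
    for s in scores:
        first = ordered.index(s) + 1   # first rank position held by this score
        count = ordered.count(s)       # how many observations share the score
        ranks.append(first + (count - 1) / 2)  # mean of the tied positions
    return ranks


def spearman_rs(xs, ys):
    """Spearman's rank correlation via rs = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Data of Example 5.4
examiner = [67, 54, 38, 70, 42, 70, 80]
moderator = [58, 60, 38, 67, 44, 69, 76]

print(average_ranks(examiner))   # the two scores of 70 share rank 2.5
rs = spearman_rs(examiner, moderator)
print(round(rs, 4))
```

For these data the sum of squared rank differences is 2.5, which gives rs of about 0.955, consistent with the comment that the correlation is high and positive.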
Activity 5.3
1. The following are the number of hours which a random sample of ten students
studied for an examination and the grades the students received.
2. Two judges ranked the quality of annual reports published by 6 listed companies as
follows:
Calculate the Spearman’s rank correlation coefficient and comment on the result
obtained.
The coefficient of determination has a dual purpose. Firstly, it measures the strength of the
linear relationship between the independent variable X and dependent variable Y. It is a
descriptive measure of the strength of the regression relationship between X and Y. It thus
gives us a measure of how well the estimated regression equation fits the data. The higher r² is, the better the fit and the higher our confidence in the regression.
Secondly, r² gives the proportion of variability in the dependent variable that is explained by changes in the independent variable. You will remember that Pearson’s correlation coefficient is based on the standard deviation, which is a measure of variability. Because the coefficient of determination is itself a measure of explained variability, it is obtained by squaring Pearson’s, and not Spearman’s, correlation coefficient.
Example 5.5
Find the coefficient of determination for the data in Example 5.2
Solution 5.5
\[ r^2 = (0.962496223)^2 = 0.92639898 \]
Thus, about 92.64% of the variation in grades is explained by variation in the hours studied.
5.7 Testing whether X and Y are Correlated
The sample correlation coefficient (r) is used as a point estimator of the population correlation coefficient ρ (rho). Therefore, r can be used in testing hypotheses about the true correlation coefficient ρ. To facilitate the hypothesis testing, we assume that both X and Y are normally distributed.
3. Critical Value: This is a two-tailed test. Given the level of significance α, the critical values are ±t(α/2) with n − 2 degrees of freedom.
Example 5.6
In Example 5.2, test at 5% level of significance whether hours studied and grade received are
correlated.
Solution 5.6
1. H0: ρ = 0
   H1: ρ ≠ 0
5. Since Tcal = 10.03 > 2.31, we reject H0 and conclude that, at the 5% level of significance, hours studied and grade received are correlated.
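The value Tcal = 10.03 can be reproduced with the standard t statistic for testing ρ = 0, namely t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The Python sketch below assumes this is the statistic the module uses, since it gives the same value; the function name is hypothetical, and the critical value 2.306 is taken from t tables for 8 degrees of freedom at the 5% two-tailed level rather than computed.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)


r = 0.962496223          # Pearson's r from Example 5.2
n = 10                   # number of students
t_cal = correlation_t_statistic(r, n)

t_critical = 2.306       # from t tables: alpha/2 = 0.025, 8 degrees of freedom
reject_h0 = abs(t_cal) > t_critical

print(round(t_cal, 2), reject_h0)
```

Because the test is two-tailed, the absolute value of the statistic is compared with the critical value.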
Activity 5.4
1. In Activity 5.2
a) Calculate the coefficient of determination
b) Test at 5% level of significance, whether marks obtained by students in
mathematics are correlated to the marks they obtained in physics.
2. A labour analyst interested in the relationship between turnover and labour supply
collected the following data on annual turnover and number of employees in 12
major retail organisations:
Turnover ($0000s)  20.1  14.0  10.7  10.6  8.6  8.1  5.5  4.9  4.6  4.5  4.3  4.1
Employees          126   141   107   101   92   70   52   34   57   32   47   26
5.8 Summary
In this unit, we saw how regression analysis and correlation analysis complement each other.
Correlation analysis seeks to establish the degree of linear relationship between two
variables. The correlation coefficient is a measure of the linear association between variables.
Its value ranges from -1 to +1. We looked at two methods of calculating the correlation coefficient, due to Pearson and Spearman. Spearman’s correlation coefficient is used with ranked data while Pearson’s correlation coefficient is calculated for interval or ratio data. The
coefficient of determination r 2 is obtained by squaring Pearson’s correlation coefficient. r 2 is
a ratio of the amount of variance that can be explained by the relationship between the
variables to the total variance in the data.
Unit 6
Introduction to Time Series Analysis
6.1 Introduction
Time series analysis is a statistical technique for detecting patterns in a time series. A time series is a set of measurements of a variable which are taken at regular time intervals over a period of time. Many business variables have observations made on them at regular time intervals. Examples of time series data are daily sales, monthly payroll, annual exports, annual profits and so on.
Time series data are important in that they help business managers to review past performance and they provide a basis for predicting future values of the time series.
[Figure: plot of Yt against time in years, showing the trend]
Examples of time series variables (Yt) that display seasonal variation are:
• Sales of seasonal items such as blankets/jerseys, school uniforms, umbrellas, fruits
• Credit card spending which is generally high towards and during the festive season
• Electricity consumption which varies depending on time of the day
\[ Y_t = T_t + S_t + C_t + I_t \qquad [6.1] \]
The model is appropriate for series that have regular and constant fluctuations around a trend. To decompose an additive time series, you subtract the estimated components from the observed series; for example, Yt − Tt leaves St + Ct + It.
6.4.2 Multiplicative Model
The model assumes that the observed time series values are a product of the four components,
when all exist. The model is given by:
\[ Y_t = T_t \times S_t \times C_t \times I_t \qquad [6.2] \]
This model is more commonly used than the additive model because it is found to describe time series more appropriately in a wide range of applications. It is more appropriate for series that have regular but not constant fluctuations around a trend. To decompose a time series which is assumed to be multiplicative, we divide the observed series by the estimated components; for example, Yt / Tt leaves St × Ct × It.
\[ \hat{Y}_t = a + bX_t \qquad [6.3] \]
where:
\[ b = \frac{n\sum X_t Y_t - \sum X_t \sum Y_t}{n\sum X_t^2 - (\sum X_t)^2} \qquad [6.4] \]
\[ a = \frac{\sum Y_t - b\sum X_t}{n} \qquad [6.5] \]
Example 6.1
The annual maize production (in metric tonnes) at Bere farm for the past ten years is shown below.
Year                        2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
Production (metric tonnes)  74    85    87    92    110   115   130   136   142   150
Solution 6.1
a) The years are coded sequentially by assigning 2002 = 1, 2003 = 2, 2004 = 3 and so on.
The following results are obtained:
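\[ \sum X_t = 55, \quad \sum X_t^2 = 385, \quad \sum X_t Y_t = 6889, \quad \sum Y_t = 1121, \quad \sum Y_t^2 = 132119 \]
\[ b = \frac{n\sum X_t Y_t - \sum X_t \sum Y_t}{n\sum X_t^2 - (\sum X_t)^2} = \frac{10(6889) - 55(1121)}{10(385) - (55)^2} = \frac{7235}{825} = 8.76969697 \]
\[ a = \frac{\sum Y_t - b\sum X_t}{n} = \frac{1121 - 8.76969697(55)}{10} = 63.86666667 \]
The estimated trend line is therefore Ŷt = 63.86666667 + 8.76969697Xt.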
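The least squares trend for the maize data can be reproduced with a few lines of Python. This is a sketch for checking the arithmetic, with variable names chosen here for illustration. The final line projects the trend one year ahead (coded year 11, that is 2012), which is an added illustration and not a result quoted in the module.

```python
# Maize production at Bere farm, 2002 to 2011 (Example 6.1)
production = [74, 85, 87, 92, 110, 115, 130, 136, 142, 150]

# Code the years sequentially: 2002 = 1, 2003 = 2, ...
codes = list(range(1, len(production) + 1))

n = len(codes)
sum_x = sum(codes)
sum_y = sum(production)
sum_xy = sum(x * y for x, y in zip(codes, production))
sum_x2 = sum(x * x for x in codes)

# Slope and intercept of the trend line Y = a + bX
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

print(round(a, 4), round(b, 4))

# Illustrative projection for the next coded period (X = 11)
trend_2012 = a + b * 11
print(round(trend_2012, 2))
```

Coding the years as 1, 2, 3, ... keeps the totals small; any equally spaced coding gives the same fitted values, although a and b themselves depend on the coding used.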
Activity 6.1
The following data give the quarterly sales figures for a retail outlet for the period
2002 to 2004.
6.5.2 Moving average method
The other method of isolating the trend is the moving average method. A moving average (MA) of a time series is an average of a fixed number of observations that moves as we progress down the series. The moving averages smooth out peaks and valleys in the original series to leave a relatively smooth trend. The moving averages are therefore estimates of the trend at different stages of the series.
Each moving average is centred at the middle of the observations from which it has been calculated. The term of the moving average series is meant to coincide with the periodicity of the original series. For example, a four-point moving average will be appropriate for quarterly data.
Example 6.2
The daily sales of an airtime vendor over 12 days are recorded below:
37 24 62 80 77 95 94 133 148 155 128 161
Calculate a 3- point moving average of the sales.
Solution 6.2
A 3-point moving average requires you to find the average of three consecutive observations at a time.
The first MA = (37 + 24 +62)/3 = 41.00
The second MA = (24 + 62 + 80)/3 = 55.33
The third MA = (62 + 80 + 77)/3 = 73.00 and so on.
Note that each moving average is centred on the middle observation of the group used to calculate it, so we lose two values of the series, one at the start and the other at the end. Centring is problematic when the number of terms is even because the moving averages then fall between two time periods and are ‘out of phase’ with the time series observations. To centre such an MA series, a further 2-point MA is found by averaging every consecutive pair, as illustrated in Example 6.3.
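The full 3-point moving average series for the airtime sales of Example 6.2 can be generated with a short Python sketch. The function name `moving_average` is an assumption made for illustration; it simply averages each run of k consecutive observations.

```python
def moving_average(series, k):
    """Return the k-point moving averages of a series (uncentred)."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


# Daily sales of the airtime vendor (Example 6.2)
sales = [37, 24, 62, 80, 77, 95, 94, 133, 148, 155, 128, 161]

ma3 = moving_average(sales, 3)
print([round(m, 2) for m in ma3])
```

Twelve observations give ten 3-point moving averages, the first aligned with day 2 and the last with day 11.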
Example 6.3
The data below shows the sales ($000s) of a seasonal good at a retail outlet over three years.
Year Q1 Q2 Q3 Q4
1 14 32 33 6
2 16 35 36 7
3 15 38 41 8
Solution 6.3
a) Table 6.1 A 4-Point Centred MA of Sales
Year  Quarter  Sales (Yt)  Uncentred 4-point MA  Centred 4-point MA (T)
1     1        14
      2        32
                           21.25
      3        33                                21.500
                           21.75
      4        6                                 22.125
                           22.50
2     1        16                                22.875
                           23.25
      2        35                                23.375
                           23.50
      3        36                                23.375
                           23.25
      4        7                                 23.625
                           24.00
3     1        15                                24.625
                           25.25
      2        38                                25.375
                           25.50
      3        41
      4        8
b) [Figure: line plot of sales (vertical axis, 0 to 40) against quarter for Years 1 to 3, showing the original series and the moving average series]
Figure 6.5 Original Series and Moving Average Series Showing Trends in Sales
The moving averages remove the fluctuations in the time series and make the curve smooth
as shown in Figure 6.5. The smoothed curve shows a moderate, upward trend in sales during
the three year period.
Activity 6.2
A supplier of school stationery recorded its quarterly sales figures ($00s) for the years 2009 to 2012. The data are shown in the table below.
Year Q1 Q2 Q3 Q4
2009 48 52 16 35
2010 50 46 22 40
2011 68 34 26 35
2012 73 56 16 45
a) Draw a time series chart of the data and comment on the trend and seasonal
components
b) Calculate centred 4- point moving averages for the data.
c) Plot the four-point MA series on the same graph as the original series.
The stages that are followed in the ratio-to-moving average procedure for quarterly data are:
1. Calculate a centred 4-point moving average series
2. Find seasonal ratios by dividing each actual time series observation, Yt, by its corresponding moving average value:
\[ \text{Seasonal ratio} = \frac{Y_t}{MA} = \frac{T_t \times C_t \times S_t \times I_t}{T_t \times C_t} = S_t \times I_t \qquad [6.6] \]
3. Find the average seasonal ratio for each quarter. The average could be the mean or the median, but in most cases the median is used since it is not affected by outliers.
4. Add up the average seasonal ratios. They should add up to 4. If they do not, adjust each average by adding to it one-fourth of the difference between 4 and their sum. The results are the adjusted seasonal ratios/indexes.
Example 6.4
Calculate adjusted seasonal indexes for the data of Example 6.3.
Solution 6.4
The necessary calculations are presented in the form of a table as illustrated in Table 6.2.
Table 6.2 Calculation of Seasonal Indexes
Year  Quarter  Sales (Yt)  Uncentred 4-point MA  Centred 4-point MA (T)  Seasonal Ratio Yt/T
1     1        14
      2        32
                           21.25
      3        33                                21.500                  1.535
                           21.75
      4        6                                 22.125                  0.271
                           22.50
2     1        16                                22.875                  0.699
                           23.25
      2        35                                23.375                  1.497
                           23.50
      3        36                                23.375                  1.540
                           23.25
      4        7                                 23.625                  0.296
                           24.00
3     1        15                                24.625                  0.609
                           25.25
      2        38                                25.375                  1.498
                           25.50
      3        41
      4        8
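After obtaining the seasonal ratios, you then find the mean seasonal index for each quarter.
Year  Q1      Q2      Q3      Q4
1                     1.535   0.271
2     0.699   1.497   1.540   0.296
3     0.609   1.498
Mean  0.6540  1.4975  1.5375  0.2835
The four means add up to 3.9725 rather than 4, so (4 − 3.9725)/4 = 0.006875 is added to each mean. The adjusted seasonal indexes are therefore 0.660875 for Q1, 1.504375 for Q2, 1.544375 for Q3 and 0.290375 for Q4.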
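The ratio-to-moving-average steps for the data of Example 6.3 can be traced in Python as sketched below. The names used are assumptions for illustration. To follow the worked example, the sketch rounds each seasonal ratio to three decimal places and averages with the mean; the median could be substituted where outliers are a concern.

```python
def moving_average(series, k):
    """Return the k-point moving averages of a series (uncentred)."""
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


# Quarterly sales ($000s) for Years 1 to 3 (Example 6.3)
sales = [14, 32, 33, 6, 16, 35, 36, 7, 15, 38, 41, 8]

# Step 1: centred 4-point moving average (a 2-point MA of the 4-point MA)
uncentred = moving_average(sales, 4)
centred = moving_average(uncentred, 2)   # aligned with sales[2] .. sales[9]

# Step 2: seasonal ratios Yt / T, grouped by quarter
ratios = {1: [], 2: [], 3: [], 4: []}
for offset, trend in enumerate(centred):
    position = offset + 2                # index of the matching observation
    quarter = position % 4 + 1
    ratios[quarter].append(round(sales[position] / trend, 3))

# Step 3: average seasonal ratio for each quarter (mean, as in the example)
means = {q: sum(values) / len(values) for q, values in ratios.items()}

# Step 4: adjust so that the four indexes add up to 4
correction = (4 - sum(means.values())) / 4
adjusted = {q: m + correction for q, m in means.items()}

for q in sorted(adjusted):
    print(q, round(means[q], 4), round(adjusted[q], 6))
```

The first centred value lines up with the third observation because two quarters are lost at each end of the series when a 4-point average is centred.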
Activity 6.3
Calculate adjusted seasonal indexes for the data of Activity 6.2.
6.6.1 Deseasonalising of data
Deseasonalising refers to removing the effects of seasonal influence on the data. This is achieved by dividing the actual Y value for each period by its corresponding adjusted seasonal index.
\[ \text{Deseasonalised } Y = \frac{\text{Actual } Y}{\text{Adjusted seasonal index } S} \qquad [6.7] \]
Example 6.5
Using the data of Example 6.3, obtain the deseasonalised series.
Solution 6.5
Table 6.3 Calculation of Deseasonalised Sales Values
Year  Quarter  Sales (Yt)  Adjusted Seasonal Index (S)  Deseasonalised Sales (Yt/S)
1     1        14
      2        32
      3        33          1.544375                     21.368
      4        6           0.290375                     20.663
2     1        16          0.660875                     24.210
      2        35          1.504375                     23.265
      3        36          1.544375                     23.310
      4        7           0.290375                     24.107
3     1        15          0.660875                     22.697
      2        38          1.504375                     25.260
      3        41
      4        8
Activity 6.4
Using the data of Activity 6.2, obtain the deseasonalised series.
For the data in Example 6.3, the predicted sales are found by multiplying the trend estimate
by the corresponding adjusted seasonal index as shown in Table 6.4.
Table 6.4 Calculation of Predicted Sales
Year  Quarter  Sales (Yt)  Trend Estimate (T)  Adjusted Seasonal Index (S)  Predicted Sales (T × S)
1     1        14
      2        32
      3        33          21.500              1.544375                     33.204
      4        6           22.125              0.290375                     6.425
2     1        16          22.875              0.660875                     15.118
      2        35          23.375              1.504375                     35.165
      3        36          23.375              1.544375                     36.100
      4        7           23.625              0.290375                     6.860
3     1        15          24.625              0.660875                     16.274
      2        38          25.375              1.504375                     38.174
      3        41
      4        8
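The predicted sales column of Table 6.4 is simply the trend estimate multiplied by the adjusted seasonal index of the matching quarter. A minimal Python sketch of that multiplication follows; the variable names are illustrative, and the trend values and indexes are copied from the table.

```python
# Centred 4-point moving averages (trend estimates) from Table 6.4,
# running from Year 1 Quarter 3 to Year 3 Quarter 2
trend = [21.500, 22.125, 22.875, 23.375, 23.375, 23.625, 24.625, 25.375]
quarters = [3, 4, 1, 2, 3, 4, 1, 2]

# Adjusted seasonal indexes by quarter
adjusted_index = {1: 0.660875, 2: 1.504375, 3: 1.544375, 4: 0.290375}

# Predicted sales = trend estimate x adjusted seasonal index
predicted = [round(t * adjusted_index[q], 3) for t, q in zip(trend, quarters)]
print(predicted)
```

Comparing these values with the actual sales (33, 6, 16, 35, 36, 7, 15, 38) gives a quick impression of how closely the multiplicative model tracks the series.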
Activity 6.5
1. Find the predicted sales for the data of Activity 6.2.
2. A local church organisation recorded the following quarterly amounts (in 000s) of
tithes paid by its members for the period 2010 to 2012.
a) Draw a time series plot of the data and comment on the trend shown.
b) Obtain a centred 4-point MA of the series and use it to calculate adjusted seasonal
indexes for the data.
c) Find the deseasonalised series of the data.
d) Forecast the quarterly amounts of tithes for the year 2013.
6.7 Summary
In this unit we discussed four components of a time series namely the trend, seasonal,
cyclical and irregular component. The whole business of time series analysis is to decompose
a time series into these components either using an additive model or a multiplicative model.
We looked at two methods of isolating the trend which are the Least Squares Method and the
Moving Averages Method. We saw how a moving average smoothes data to reveal trends in
the data.
We also looked at how to isolate the seasonal component using the Ratio to Moving Average
method. Finally you learnt about how to obtain predicted series values using seasonal indices.
Unit 7
Index Numbers
7.1 Introduction
An index number is a number that measures the relative change in a set of measurements over time. Index numbers show changes over time by expressing the new value, Vn, as a percentage of some existing value, V0, called the base value.
\[ \text{Index number} = \frac{V_n}{V_0} \times 100 \qquad [7.1] \]
The base value is the value of the variable at some reference point in the past called the base
period. The index number of the base period is assumed to be 100.
In this unit, you will construct both unweighted and weighted price and quantity indices.
7.3.2 Quantity indices
Quantity indices measure how much of a commodity is produced or consumed over time.
Some examples of quantity indices are:
• Industrial index which gives a measure of change in industrial output now compared
to a past reference point
• Mining index which gives a measure of change in minerals production now compared
to a specified base period
\[ SPI = \frac{P_n}{P_0} \times 100 \qquad [7.2] \]
Example 7.1
During a Christmas clearance sale, a bottle of gin that sold for $12 before the sale was now
selling for $8. Calculate the Simple Price Index.
Solution 7.1
\[ SPI = \frac{P_n}{P_0} \times 100 = \frac{8}{12} \times 100 = 66.67\% \]
Activity 7.1
1. Suppose the price of a 2 litre bottle of cooking oil was $3.00 in 2010 and in 2012
the price was $3.50. Calculate the simple price index using 2010 as the base year.
2. The average retail prices of a bar of soap for the years 2010 to 2012 are as follows:
Determine a Simple Price Index for 2011 and 2012 using 2009 as the base.
\[ SQI = \frac{Q_n}{Q_0} \times 100 \qquad [7.3] \]
Example 7.2
The annual production of maize in Zimbabwe for the years 1997 and 2000 was 5 000 metric
tonnes and 400 metric tonnes respectively. Using 1997 as the base year, determine the change
in maize production.
Solution 7.2
\[ SQI = \frac{Q_n}{Q_0} \times 100 = \frac{400}{5000} \times 100 = 8\% \]
The annual production of maize fell by 92% between 1997 and the year 2000.
Activity 7.2
The following data give the prices and quantities of two commodities from 2010 to
2011:
7.4.3 Index number series trends
The index numbers for a given period give a reflection of trends in the output or price of commodities; that is, a time series of index numbers will show whether there has been an increase or decrease in the output or price of a commodity.
Consider the index of maize production for the period 1995 to 2006 below
Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Index 84 96 100 104 112 86 82 79 65 36 38 30
The index numbers show a steady increase in maize production from 1995 to 1999 followed by a decline from the year 2000 to 2006. The year 1997 is the base year because it has index number 100. Production for the other years is then compared in percentage terms with the production in 1997. For example, compared to the maize production in 1997, the production in 1999 was 12% higher, and by 2006 the production had declined by 70%.
Example 7.3
The following figures represent the average annual cost (in dollars per square metre) of low
density residential stands in Mutare for the years 2004 to 2012.
Year 2004 2005 2006 2007 2008 2009 2010 2011 2012
Price 16 20 24 24 29 30 35 36 40
Construct simple index numbers for the prices using 2005 as the base period (2005 = 100).
Comment on the trend shown.
Solution 7.3
The year 2005 is the reference point, and the index number for 2005 is taken to be 100.
\[ \text{The index for 2004} = \frac{P_{2004}}{P_{2005}} \times 100 = \frac{16}{20} \times 100 = 80\% \]
\[ \text{The index for 2008} = \frac{P_{2008}}{P_{2005}} \times 100 = \frac{29}{20} \times 100 = 145\% \]
The index numbers of the remaining years are calculated in similar fashion. The results are summarised in Table 7.1.
Table 7.1 Price Index for Residential Stands
Year 2004 2005 2006 2007 2008 2009 2010 2011 2012
Price 16 20 24 24 29 30 35 36 40
Index 80 100 120 120 145 150 175 180 200
There was a steady increase in the price of residential stands from 2004 to 2012 with the
price in 2012 being double what it was in 2005.
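The whole index series in Table 7.1 can be generated in one step by dividing every price by the base-period price. The Python sketch below illustrates this; the function name `index_series` is an assumption made here, not a term from the module.

```python
def index_series(values, base_position):
    """Express each value as a percentage of the value at base_position."""
    base = values[base_position]
    return [v / base * 100 for v in values]


# Average annual cost of residential stands in Mutare (Example 7.3)
years = list(range(2004, 2013))
prices = [16, 20, 24, 24, 29, 30, 35, 36, 40]

# Use 2005 as the base period (2005 = 100)
index = index_series(prices, years.index(2005))

for year, value in zip(years, index):
    print(year, round(value))
```

Changing the second argument to the position of another year produces the series on a different base without any other change to the code.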
Activity 7.3
The average quarterly sales of a retail chain are shown below.
Quarter 1 2 3 4
Sales ($000s) 43 54 50 84
Using the third quarter as the base period, express the average sales as index numbers.
The other reason for changing the base period is to enable comparison between two index
number series with different base periods. Two index number series can only be compared if
they have the same base period, therefore, if the base is not the same it is necessary to rebase
one of them.
To change the base period of an index, change the index number of the new base period to
100, then divide all numbers in the index by the index value of the proposed new base period
and multiply by 100.
\[ \text{New index value} = \frac{\text{Old index value}}{\text{Index value of new base}} \times 100 \qquad [7.4] \]
Example 7.4
Consider the index of maize production for the period 1995 to 2006 referred to earlier on.
Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Index 84 96 100 104 112 86 82 79 65 36 38 30
Change the base period of the maize production index from 1997 to 2003.
Solution 7.4
The year 2003 is now assigned an index number of 100. The old index numbers of the
remaining years are each divided by 65 and the result multiplied by 100 to obtain their
respective new index numbers. For example, the new index number for the year 1995 and
2006 are calculated as follows:
\[ \text{New index number for 1995} = \frac{84}{65} \times 100 = 129.23\% \]
\[ \text{New index number for 2006} = \frac{30}{65} \times 100 = 46.15\% \]
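Rebasing the complete maize production index follows the same rule for every year: divide the old index value by the old index value of the new base period and multiply by 100. A short Python sketch, with an illustrative function name, is given below.

```python
def rebase(index, new_base_position):
    """Rebase an index series so that the chosen period equals 100."""
    base_value = index[new_base_position]
    return [value / base_value * 100 for value in index]


# Maize production index, 1995 to 2006 (base 1997 = 100)
years = list(range(1995, 2007))
old_index = [84, 96, 100, 104, 112, 86, 82, 79, 65, 36, 38, 30]

# Change the base period from 1997 to 2003
new_index = rebase(old_index, years.index(2003))

for year, value in zip(years, new_index):
    print(year, round(value, 2))
```

Rebasing changes the level of every index number but not the ratios between them, so the relative movements in production are preserved.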
Activity 7.4
1. In Example 7.4, change the base period of the maize production index from 1997 to
2000.
2. The following data are July 2009 to July 2010 commodity price index for a group
of consumer goods:
140 138 124 98 100 152 148 143 150 146 155 158 162
a) What is the base year used here?
b) Describe the trend in the price of the commodities over this period.
c) Change the base period to March 2010.
by calculating a weighted average of price relatives while the industrial index is a weighted
average of quantity relatives.
The procedure for calculating a Weighted Average of Price Relative Index (WAPRI) is as
follows:
1. Calculate the price relative of each item. For item i the price relative Xi is given by the formula
\[ X_i = \frac{P_n}{P_0} \times 100 \qquad [7.5] \]
2. Using the weights Wi given, obtain a weighted average of the price relatives. The computational formula is given by
\[ WAPRI = \frac{\sum W_i X_i}{\sum W_i} \qquad [7.6] \]
Example 7.5
The data shows the unit prices of three different commodities X, Y and Z for two consecutive
years and the quantities consumed
Calculate a Weighted Average of Price Relative Index (WAPRI) for the commodities using
2010 as the base period.
Solution 7.5
\[ WAPRI = \frac{\sum W_i X_i}{\sum W_i} = \frac{3480}{32} = 108.75\% \]
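The two-step WAPRI procedure (price relatives first, then their weighted average) can be sketched in Python as follows. The prices and weights in this sketch are hypothetical values invented purely to illustrate the mechanics; they are not the data of Example 7.5, and the function name is likewise an assumption.

```python
def wapri(base_prices, current_prices, weights):
    """Weighted Average of Price Relatives Index."""
    # Step 1: price relative of each item, Xi = Pn / P0 * 100
    relatives = [pn / p0 * 100 for p0, pn in zip(base_prices, current_prices)]
    # Step 2: weighted average of the relatives, sum(Wi * Xi) / sum(Wi)
    weighted_sum = sum(w * x for w, x in zip(weights, relatives))
    return weighted_sum / sum(weights)


# Hypothetical data for three items (illustration only)
base_prices = [2.0, 5.0, 10.0]
current_prices = [2.5, 5.5, 10.0]
weights = [10, 5, 5]

index = wapri(base_prices, current_prices, weights)
print(round(index, 2))
```

With these illustrative figures the price relatives are 125, 110 and 100, so the index is 2300/20 = 115, meaning the weighted prices are 15% above their base-period level.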
Activity 7.5
Suppose you are given the following products consumed by an average family in
2003 and 2004.
Calculate a Weighted Average of Price Relative Index (WAPRI) for the commodities
using 2003 as the base period.
In base-period weighting, when comparing prices it is assumed that quantities are held
constant at base period levels whilst when comparing quantities, it is assumed that prices are
held constant at the base period level. Base weighting is less expensive and less time
consuming because there is no continuous calculation of weights. However, the relevance of
the weights may diminish with the passage of time so that rebasing may be necessary.
In current-period weighting, when comparing prices it is assumed that quantities are held
constant at current period levels, whilst when comparing quantities, it is assumed that prices
are held constant at current period level. Current weighting involves continuous calculation
of weights which is expensive and time consuming. This also makes valid comparisons
difficult or impossible due to continuously changing weights. Despite these drawbacks,
current weighting is preferred because it ensures that an item is rated in accordance with its
current importance, so that there is no risk of producing a grossly misleading index through
the use of outdated weights.
The base-period weighted indices are called Laspeyres indices whilst the current-period weighted indices are the Paasche indices. The computational formulae are presented below:
\[ \text{Paasche Price Index, } PPI = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \times 100 \qquad [7.9] \]
A related index number is Fisher’s index, which is the geometric mean of the Laspeyres and Paasche index numbers.
Example 7.6
The following data give the prices and quantities of the types of food stuff bought by a
private boarding school in 2011 and 2012
Calculate:
• Laspeyres and Paasche Price Indices for 2012, with 2011 as the base year and interpret your results.
• Fisher Price Index and interpret the result.
Solution 7.6
a) \[ LPI = \frac{\sum P_n Q_0}{\sum P_0 Q_0} \times 100 = \frac{39\,820}{25\,570} \times 100 = 155.73\% \]
\[ PPI = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \times 100 = \frac{77\,760}{50\,100} \times 100 = 155.21\% \]
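b) The Fisher Price Index is the geometric mean of the two indices:
\[ \sqrt{155.73 \times 155.21} = 155.47\% \]
Using both old and current quantities, prices have increased by 55.47 %.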
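The three price indices can be sketched in Python as shown below. The function names are assumptions made for illustration, and the small two-item data set is hypothetical because it serves only to show how the sums of products are formed. The last lines apply the same formulas to the totals quoted in Solution 7.6.

```python
from math import sqrt


def laspeyres_price(p0, pn, q0):
    """Base-period weighted price index: sum(Pn*Q0) / sum(P0*Q0) * 100."""
    return sum(p * q for p, q in zip(pn, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100


def paasche_price(p0, pn, qn):
    """Current-period weighted price index: sum(Pn*Qn) / sum(P0*Qn) * 100."""
    return sum(p * q for p, q in zip(pn, qn)) / sum(p * q for p, q in zip(p0, qn)) * 100


def fisher(laspeyres, paasche):
    """Fisher's index: geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres * paasche)


# Hypothetical two-item data set (illustration only)
p0, pn = [2, 5], [3, 6]      # base and current prices
q0, qn = [10, 4], [12, 5]    # base and current quantities
lpi = laspeyres_price(p0, pn, q0)
ppi = paasche_price(p0, pn, qn)
print(round(lpi, 2), round(ppi, 2), round(fisher(lpi, ppi), 2))

# Totals quoted in Solution 7.6
book_lpi = 39820 / 25570 * 100
book_ppi = 77760 / 50100 * 100
book_fisher = fisher(book_lpi, book_ppi)
print(round(book_lpi, 2), round(book_ppi, 2), round(book_fisher, 2))
```

Because the Fisher index is a geometric mean, it always lies between the Laspeyres and Paasche values.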
Activity 7.6
Using the data provided in Example 7.6, calculate:
a) Laspeyres and Paasche Quantity Indices for 2012, with 2011 as the base year and interpret your results.
b) Fisher Quantity Index and interpret the result.
The Consumer Price Index is an overall measure of relative changes in prices of many goods and thus reflects changes in the value of money. The CPI is used as a deflator in converting nominal amounts of money to what are called real amounts of money. Real amounts of money are amounts that are comparable through time because the effect of changes in the value of money due to inflation has been removed.
The conversion procedure involves identifying a constant point in time, the base period. By dividing X nominal dollars in year i by the CPI value for year i and multiplying by 100, we convert the X nominal (year i) dollars to constant (base year) dollars.
The all items CPI for the years 2008 to 2011 as provided by ZIMSTAT are shown in Table
7.2.
We will now look at an example to illustrate the use of the CPI as a deflator.
Example 7.7
Suppose that during the years 2009 to 2011, the entry salary of a trained teacher was as
follows:
Use the CPI figures provided in Table 7.2 to transform the salaries to 2008 dollars.
Solution 7.7
If we divide the 2009 salary of $150 by the CPI of that year and multiply the result by 100,
we get the equivalent salary in 2008 dollars, that is, the salary in real terms.
\( \dfrac{150}{92.1} \times 100 = 162.87 \)
In real terms, the entry salary for a trained teacher in 2009 was $162.87.
For 2010: \( \dfrac{250}{94.9} \times 100 = 263.44 \)
For 2011: \( \dfrac{275}{98.2} \times 100 = 280.04 \)
Thus, the entry salaries for 2010 and 2011 were $263.44 and $280.04 respectively in constant
2008 dollars. In real terms, the salary increased by $117.17 from 2009 to 2011. This shows
that the salary was able to keep up with inflation.
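The deflation steps in Solution 7.7 can be sketched in a few lines of code. The function name `deflate` and the dictionary layout are assumptions made for illustration; the salaries and CPI values are the ones quoted in the solution.

```python
def deflate(nominal, cpi):
    """Convert a nominal amount to base-year dollars, with the CPI base equal to 100."""
    return nominal / cpi * 100

# Figures used in Solution 7.7 (base year 2008).
salaries = {2009: 150, 2010: 250, 2011: 275}
cpi = {2009: 92.1, 2010: 94.9, 2011: 98.2}

real = {year: round(deflate(salaries[year], cpi[year]), 2) for year in salaries}
real_increase = round(real[2011] - real[2009], 2)

print(real)            # {2009: 162.87, 2010: 263.44, 2011: 280.04}
print(real_increase)   # 117.17
```

The same function could be applied to any other nominal series, provided the CPI values refer to the same base year.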
Activity 7.7
The following data show the average price of a 2-litre bottle of cooking oil over
the past three years.
Year Price ($)
2009 2.75
2010 3.10
2011 3.50
Use the CPI figures in Table 7.2 to adjust the price to constant 2008 dollars.
7.7 Challenges in Constructing Index Numbers
The following are the problems associated with the construction of index numbers:
1. Unavailability of data – data is expensive to collect and it is not always practicable to
determine the quantities involved (sold).
2. Choice of base year – the base year has to be a reasonably normal year characterised
by stability in business activity and such years are difficult to come by.
3. Selection of items – there may be disagreements on the items to include. The selection
should be such that movements in prices of those items chosen will be representative
of the movements of prices of all items considered relevant.
4. Choice of weights – it is difficult to select typical quantities and prices which measure
relative importance in the construction of composite indices. The weights may
become outdated with time giving rise to misleading indices.
5. Comparability of index series – comparison is only possible if two index series have
the same base period.
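The comparability problem is usually handled by shifting one series to the base period of the other. The sketch below illustrates this under stated assumptions: the function name `rebase` and both index series are hypothetical and are not taken from this module.

```python
def rebase(index_series, new_base_period):
    """Re-express an index series so that new_base_period takes the value 100."""
    base_value = index_series[new_base_period]
    return {period: value / base_value * 100 for period, value in index_series.items()}

# Hypothetical index series with different base periods.
series_a = {2009: 100.0, 2010: 110.0, 2011: 121.0}   # base period 2009
series_b = {2009: 80.0, 2010: 100.0, 2011: 105.0}    # base period 2010

# Shift series A to base 2010 so that the two series can be compared.
series_a_2010 = rebase(series_a, 2010)
for year in sorted(series_a_2010):
    print(year, round(series_a_2010[year], 1), series_b[year])
```

Rebasing changes only the scale of the series; the percentage change between any two periods is unaffected.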
7.8 Summary
In this unit, we looked at the construction of simple index numbers and weighted index
numbers. Index numbers are used to measure the relative change in a set of measurements
over time. A base period is chosen to serve as a reference point. The base period is given
index number 100.
Simple indices show changes pertaining to a single item while aggregate indices are for a
group of items. Because items do not contribute the same to the envisaged change, the items
are given weights to reflect their relative importance. The weights may be current weights or
base weights. Whilst base weights are less expensive to use, they may be outdated thereby
giving rise to misleading indices. As time goes on, it may be necessary to change the base
period to keep up with current trends. We saw how the base can be changed from one period
to another.
We looked at the construction of the CPI and how it is used to adjust for inflation. Finally, we
discussed the problems that are associated with the construction of index numbers. These
include the unavailability of data, the choice of base year, choice of weights and selection of
items to make up the basket.
References
Aczel, A.D. and Sounderpandian, J. (2005). Complete Business Statistics. India: Tata
McGraw-Hill.
Buglear, J. (2005). Quantitative Methods for Business. London: Elsevier Butterworth
Heinemann.
Kazmier, L.J. (2003). Schaum’s Easy Outline: Business Statistics. Blacklick: McGraw-Hill
Trade.
Kemp, S.M. and Kemp, S. (2004). Business Statistics Demystified. Blacklick: McGraw-Hill
Professional Publishing.
Muchengetwa, S. (2005). Business Statistics. Harare: Zimbabwe Open University.
Wegner, T. (1999). Applied Business Statistics. Cape Town: Juta and Co.
Unit 8
Statistics List of Formulae
In this unit we list the statistical formulae that are used in this course.
Sample mean, \( \bar{x} = \dfrac{1}{n} \sum x_i \) [2.1]
Sample variance, \( s^2 = \dfrac{1}{n-1} \left( \sum x_i^2 - \dfrac{(\sum x_i)^2}{n} \right) \) [2.2]
Sample proportion, \( \hat{p} = \dfrac{k}{n} \) [2.3]
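Formulae [2.1] to [2.3] translate directly into code. The following is a minimal sketch with hypothetical data; the function names are assumptions, and Python's standard `statistics` module is used only as a cross-check.

```python
from statistics import mean, variance

def sample_mean(xs):
    """Formula [2.1]: sum of the observations divided by n."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Formula [2.2]: computational form of the sample variance."""
    n = len(xs)
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

def sample_proportion(k, n):
    """Formula [2.3]: k successes out of n observations."""
    return k / n

data = [4, 8, 6, 5, 7]                      # hypothetical observations
print(sample_mean(data), mean(data))        # both give 6
print(sample_variance(data), variance(data))  # both give 2.5
print(sample_proportion(12, 40))            # 0.3
```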
A 100(1 − α)% confidence interval for the population proportion p is given by:
\( \hat{p} \pm Z_{\alpha/2} \times \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}} \) [2.7]
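As an illustration of formula [2.7], the sketch below computes the interval for a hypothetical sample of 60 successes in 200 trials. The function name is an assumption, and the critical value is obtained from `statistics.NormalDist` (Python 3.8 or later) rather than from the normal table in the appendices.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(k, n, confidence=0.95):
    """Confidence interval for a population proportion, formula [2.7]."""
    p_hat = k / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)      # Z_{alpha/2}, about 1.96 for 95 %
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = proportion_ci(60, 200)               # hypothetical sample
print(f"95% CI for p: ({low:.4f}, {high:.4f})")
```

The interval is centred on the sample proportion 0.3 and narrows as n grows.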
The minimum sample size necessary to ensure that the error in estimating μ will not
exceed a specified amount e is given by:
\( n = \left[ \dfrac{Z_{\alpha/2} \times \sigma}{e} \right]^2 \) [2.8]
The minimum sample size required to estimate the population proportion to be within a
specified amount e with 100(1 − α)% confidence is given by:
\( n = \dfrac{\hat{p}(1-\hat{p}) Z_{\alpha/2}^2}{e^2} \) [2.9]
\( n = \left[ \dfrac{Z_{\alpha/2}}{2e} \right]^2 \) [2.10]
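The sample size formulae [2.8] to [2.10] can be sketched as below. The function names and input values are hypothetical. Two assumptions are made: the result is rounded up to the next whole number, since a sample size must be an integer, and [2.10] is treated as the special case of [2.9] with \( \hat{p} = 0.5 \), which is what the algebra gives.

```python
from math import ceil
from statistics import NormalDist

def z_value(confidence):
    """Two-sided critical value Z_{alpha/2} for the given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_mean(sigma, e, confidence=0.95):
    """Formula [2.8]: sample size for estimating a mean to within e."""
    return ceil((z_value(confidence) * sigma / e) ** 2)

def sample_size_proportion(e, confidence=0.95, p_hat=None):
    """Formula [2.9] when p_hat is given, otherwise formula [2.10]."""
    z = z_value(confidence)
    if p_hat is None:
        return ceil((z / (2 * e)) ** 2)
    return ceil(p_hat * (1 - p_hat) * z ** 2 / e ** 2)

print(sample_size_mean(sigma=15, e=3))            # 97
print(sample_size_proportion(e=0.05, p_hat=0.3))  # 323
print(sample_size_proportion(e=0.05))             # 385
```

Formula [2.10] gives the largest of the proportion-based sample sizes because \( \hat{p}(1-\hat{p}) \) is at its maximum when \( \hat{p} = 0.5 \).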
\( Y = \beta_0 + \beta_1 X + e \) [4.1]
The least squares estimates of \( \beta_0 \) and \( \beta_1 \) are a and b respectively,
where \( b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \) [4.2]
and \( a = \dfrac{\sum y - b\sum x}{n} \) [4.3]
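A minimal sketch of formulae [4.2] and [4.3] follows, using a small hypothetical data set; the function name `least_squares` is an assumption made for illustration.

```python
def least_squares(xs, ys):
    """Return (a, b), the intercept and slope from formulae [4.3] and [4.2]."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope
    a = (sum_y - b * sum_x) / n                                    # intercept
    return a, b

xs = [1, 2, 3, 4, 5]          # hypothetical X values
ys = [2, 4, 5, 4, 5]          # hypothetical Y values
a, b = least_squares(xs, ys)
print(f"Fitted line: Y = {a:.1f} + {b:.1f}X")   # Fitted line: Y = 2.2 + 0.6X
```

The same function applies to the trend line formulae [6.2] and [6.3], with the time code \( X_t \) in place of x.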
\( r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left( n\sum x^2 - (\sum x)^2 \right)\left( n\sum y^2 - (\sum y)^2 \right)}} \) [5.1]
Spearman’s Rank Correlation Coefficient \( r_s \) is given by
\( r_s = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)} \) [5.2]
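The two correlation coefficients in [5.1] and [5.2] can be compared on the same data. This is a sketch with hypothetical values; the function names are assumptions, and the ranking helper assumes there are no tied observations.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Product moment correlation coefficient, formula [5.1]."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

def spearman_rs(xs, ys):
    """Spearman's rank correlation, formula [5.2]; assumes no tied values."""
    def ranks(values):
        ordered = sorted(values)
        return [ordered.index(v) + 1 for v in values]   # rank 1 = smallest
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks(xs), ranks(ys)))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

xs = [10, 20, 30, 40, 50]     # hypothetical data
ys = [12, 25, 22, 41, 48]
print(round(pearson_r(xs, ys), 4))
print(round(spearman_rs(xs, ys), 4))   # 0.9
```

Both coefficients lie between −1 and +1; Spearman's coefficient uses only the ordering of the observations.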
The intercept a and slope b of the estimated trend line are given by
\( b = \dfrac{n\sum X_t Y_t - \sum X_t \sum Y_t}{n\sum X_t^2 - (\sum X_t)^2} \) [6.2]
\( a = \dfrac{\sum Y_t - b\sum X_t}{n} \) [6.3]
Seasonal ratio \( = \dfrac{Y_t}{MA} = \dfrac{T_t \times C_t \times S_t \times I_t}{T_t \times C_t} = S_t \times I_t \) [6.4]
Deseasonalised \( Y = \dfrac{\text{Actual } Y}{\text{Adjusted seasonal index } S} \) [6.5]
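Formulae [6.4] and [6.5] can be illustrated with a single hypothetical observation. The sketch assumes that the seasonal index is expressed as a ratio around 1; an index quoted as a percentage would first be divided by 100. The function names and figures are assumptions.

```python
def seasonal_ratio(actual, moving_average):
    """Formula [6.4]: actual value divided by the centred moving average."""
    return actual / moving_average

def deseasonalise(actual, seasonal_index):
    """Formula [6.5]: remove the seasonal effect from an actual value."""
    return actual / seasonal_index

actual_sales = 150.0        # hypothetical quarterly figure
moving_average = 120.0      # hypothetical centred moving average (T x C)

ratio = seasonal_ratio(actual_sales, moving_average)
print(ratio)                                 # 1.25
print(deseasonalise(actual_sales, 1.25))     # 120.0
```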
Simple Quantity Index, \( SQI = \dfrac{Q_n}{Q_0} \times 100 \) [7.2]
\( WAPRI = \dfrac{\sum W_i X_i}{\sum W_i} \) [7.4]
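The two index formulae [7.2] and [7.4] are sketched below with hypothetical figures. The formula list does not define \( X_i \); the sketch assumes it denotes the relative for item i expressed as a percentage, with \( W_i \) as its weight. The function names are also assumptions.

```python
def simple_quantity_index(q_n, q_0):
    """Formula [7.2]: current quantity relative to base quantity, times 100."""
    return q_n / q_0 * 100

def weighted_average_of_relatives(weights, relatives):
    """Formula [7.4]: weighted average of the relatives X_i with weights W_i."""
    return sum(w * x for w, x in zip(weights, relatives)) / sum(weights)

print(simple_quantity_index(150, 120))                 # 125.0

weights = [50, 30, 20]            # hypothetical weights
relatives = [120, 110, 150]       # hypothetical relatives, base = 100
print(weighted_average_of_relatives(weights, relatives))   # 123.0
```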
APPENDICES
Statistical Tables
List of Tables
1. Normal distribution
2. Student t distribution