
Bachelor of Commerce

Business Statistics II

Module BBFH 107


Author: Amos Tendai Munzara
Master of Business Administration (ZOU)
Bachelor of Science Mathematics and Statistics (ZOU)
Diploma in Education (Gweru Teachers' College)

Content Reviewer: Kudzanayi Ruvharo


Master of Science in Statistics (UZ)
Bachelor of Science Special Honours in Statistics (UZ)
Bachelor of Science Mathematics and Statistics (ZOU)
Diploma in Education (Gweru Teachers' College)

Editor: Barnabas Muyengwa


Master of Education in Teacher Education (UZ)
Bachelor of Education Mathematics (UZ)
Certificate in Education (Gweru Teachers' College)
Published by: Zimbabwe Open University

P.O. Box MP1119

Mount Pleasant

Harare, ZIMBABWE

The Zimbabwe Open University is a distance teaching and open learning
institution.

Year: June 2013

Reprinted: November 2013

Cover design: T. Ndhlovu

Layout: S. Mapfumo

I.S.B.N: 978-1-77938-732-5

Printed by: ZOU Press, Harare

Typeset in Times New Roman, 12 point on auto leading

© Zimbabwe Open University. All rights reserved. No part of this
publication may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without the prior permission of the Zimbabwe Open
University.

To the student
The demand for skills and knowledge and the requirement to adjust and
change with changing technology place on us a need to learn continually
throughout life. As all people need an education of one form or another,
it has been found that conventional education institutions cannot cope
with the demand for education of this magnitude. It has, however, been
discovered that distance education and open learning, now also exploiting
e-learning technology, itself an offshoot of e-commerce, has become the
most effective way of transmitting the appropriate skills and knowledge
required for national and international development.

Since attainment of independence in 1980, the Zimbabwe Government has
spearheaded the development of distance education and open learning at
tertiary level, resulting in the establishment of the Zimbabwe Open
University (ZOU) on 1 March, 1999.

ZOU is the first, leading, and currently the only university in Zimbabwe
entirely dedicated to teaching by distance education and open learning. We
are determined to maintain our leading position by both satisfying our
clients and maintaining high academic standards. To achieve the leading
position, we have adopted the course team approach to producing the varied
learning materials that will holistically shape you, the learner, to be an
all-round performer in the field of your own choice. Our course teams
comprise academics, technologists and administrators of varied
backgrounds, training, skills, experiences and personal interests. The
combination of all these qualities inevitably facilitates the production
of learning materials that teach successfully any student, anywhere and
far removed from the tutor in space and time. We emphasize that our
learning materials should enable you to solve both work-related problems
and other life challenges.

To avoid stereotyping and professional narrowness, our teams of learning
materials producers come from different universities in and outside
Zimbabwe, and from Commerce and Industry. This openness enables ZOU to
produce materials that have a long shelf life and are sufficiently
comprehensive to cater for the needs of all of you, our learners in
different walks of life. You, the learner, have a large number of optional
courses to choose from so that the knowledge and skills developed suit the
career path that you choose. Thus, we strive to tailor-make the learning
materials so that they can suit your personal and professional needs. In
developing the ZOU learning materials, we are guided by the desire to
provide you, the learner, with all the knowledge and skill that will make
you a better performer all round, be this at certificate, diploma,
undergraduate or postgraduate level. We aim for products that will settle
comfortably in the global village and compete successfully with anyone.
Our target is, therefore, to satisfy your quest for knowledge and skills
through distance education and open learning.

Any course or programme launched by ZOU is conceived from the
cross-pollination of ideas from consumers of the product, chief among whom
are you, the students, and your employers. We consult you and listen to
your critical analysis of the concepts and how they are presented. We also
consult other academics from universities the world over and other
international bodies whose reputation in distance education and open
learning is of a very high calibre. We carry out pilot studies of the
course outlines, the content and the programme component. We are only too
glad to subject our learning materials to academic and professional
criticism with the hope of improving them all the time. We are determined
to continue improving by changing the learning materials to suit the
idiosyncratic needs of our learners, their employers, research, economic
circumstances, technological development, changing times and geographic
location, in order to maintain our leading position. We aim at giving you
an education that will work for you at any time, anywhere and in varying
circumstances, and we aim for your performance to be second to none.

As a progressive university that is forward looking and determined to be a
successful part of the twenty-first century, ZOU has started to introduce
e-learning materials that will enable you, our students, to access any
source of information, anywhere in the world through the internet and to
communicate, converse, discuss and collaborate synchronously and
asynchronously, with peers and tutors whom you may never meet in life. It
is our intention to bring the computer, email, internet chat-rooms,
whiteboards and other modern methods of delivering learning to all the
doorsteps of our learners, wherever they may be. For all these
developments and for the latest information on what is taking place at
ZOU, visit the ZOU website at www.zou.ac.zw

Having worked as best we can to prepare your learning path, hopefully like
John the Baptist prepared for the coming of Jesus Christ, it is my hope as
your Vice Chancellor that all of you will experience unimpeded success in
your educational endeavours. We, on our part, shall continually strive to
improve the learning materials through evaluation, transformation of
delivery methodologies, adjustments and sometimes complete overhauls of
both the materials and organizational structures and culture that are
central to providing you with the high quality education that you deserve.
Note that your needs, the learner’s needs, occupy a central position
within ZOU’s core activities.

Best wishes and success in your studies.

_____________________
Prof. Primrose Kurasha
Vice Chancellor

The Six Hour Tutorial Session At
The Zimbabwe Open University
As you embark on your studies with the Zimbabwe Open University (ZOU) by
open and distance learning, we need to advise you so that you can make the
best use of the learning materials, your time and the tutors who are based
at your regional office.

The most important point that you need to note is that in distance
education and open learning, there are no lectures like those found in
conventional universities. Instead, you have learning packages that may
comprise written modules, tapes, CDs, DVDs and other referral materials
for extra reading. All these, including radio, television, telephone, fax
and email, can be used to deliver learning to you. As such, at the ZOU, we
do not expect the tutor to lecture you when you meet him/her. We believe
that that task is accomplished by the learning package that you receive at
registration. What then is the purpose of the six hour tutorial for each
course on offer?

At the ZOU, as at any other distance and open learning university, you the
student are at the centre of learning. After you receive the learning
package, you study the tutorial letter and other guiding documents before
using the learning materials. During the study, it is obvious that you
will come across concepts/ideas that may not be that easy to understand or
that are not so clearly explained. You may also come across issues that
you do not agree with, that actually conflict with the practice that you
are familiar with. In your discussion groups, your friends can bring ideas
that are totally different from yours and arguments may begin. You may
also find that an idea is not clearly explained and you remain with more
questions than answers. You need someone to help you in such matters.

This is where the six hour tutorial comes in. For it to work, you need to
know that:
· There is insufficient time for the tutor to lecture you
· Any ideas that you discuss in the tutorial originate from your
  experience as you work on the materials. All the issues raised above are
  a good source of topics (as they pertain to your learning) for
  discussion during the tutorial
· The answers come from you while the tutor’s task is to confirm, spur
  further discussion, clarify, explain, give additional information, guide
  the discussion and help you put together full answers for each question
  that you bring
· You must prepare for the tutorial by bringing all the questions and
  answers that you have found out on the topics to the discussion
· For the tutor to help you effectively, give him/her the topics
  beforehand so that in cases where information has to be gathered, there
  is sufficient time to do so. If the questions can get to the tutor at
  least two weeks before the tutorial, that will create enough time for
  thorough preparation.

In the tutorial, you are expected and required to take part all the time
through contributing in every way possible. You can give your views even
if they are wrong (many students may hold the same wrong views and the
discussion will help correct the errors); they still help you learn the
correct thing as much as the correct ideas do. You also need to be
open-minded, frank and inquisitive, and should leave no stone unturned as
you analyze ideas and seek clarification on any issues. It has been found
that those who take part in tutorials actively do better in assignments
and examinations because their ideas are streamlined. Taking part properly
means that you prepare for the tutorial beforehand by putting together
relevant questions and their possible answers and those areas that cause
you confusion.

Only in cases where the information being discussed is not found in the
learning package can the tutor provide extra learning materials, but this
should not be the dominant feature of the six hour tutorial. As stated, it
should be rare because the information needed for the course is found in
the learning package together with the sources to which you are referred.
Fully-fledged lectures can, therefore, be misleading as the tutor may
dwell on matters irrelevant to the ZOU course.

Distance education, by its nature, keeps the tutor and student separate.
By introducing the six hour tutorial, ZOU hopes to help you come in touch
with the physical being who marks your assignments, assesses them, guides
you on preparing for writing examinations and assignments and who runs
your general academic affairs. This helps you to settle down in your
course having been advised on how to go about your learning. Personal
human contact is, therefore, upheld by the ZOU.

The six hour tutorials should be so structured that the tasks for each
session are very clear. Work for each session, as much as possible,
follows the structure given below.

Session I (Two Hours)


Session I should be held at the beginning of the semester. The main aim
of this session is to guide you, the student, on how you are going to
approach the course. During the session, you will be given the overview
of the course, how to tackle the assignments, how to organize the logistics
of the course and formation of study groups that you will belong to. It is
also during this session that you will be advised on how to use your
learning materials effectively.

Session II (Two Hours)


This session comes in the middle of the semester to respond to the
challenges, queries, experiences, uncertainties, and ideas that you are
facing as you go through the course. In this session, difficult areas in the
module are explained through the combined effort of the students and
the tutor. It should also give direction and feedback where you have not
done well in the first assignment as well as reinforce those areas where
performance in the first assignment is good.

Session III (Two Hours)


The final session, Session III, comes towards the end of the semester.
In this session, you polish up any areas that you still need clarification on.
Your tutor gives you feedback on the assignments so that you can use
the experience for preparation for the end of semester examination.

Note that in all three sessions, you identify the areas
in which your tutor should give help. You also take a very
important part in finding answers to the problems posed.
You are the most important part of the solutions to your
learning challenges.

Conclusion

In conclusion, we should be very clear that six hours is too little for
lectures and it is not necessary, in view of the provision of fully
self-contained learning materials in the package, to turn the little time
into lectures. We, therefore, urge you not only to attend the six hour
tutorials for this course, but also to prepare yourself to contribute in
the best way possible so that you can maximally benefit from them. We also
urge you to avoid forcing the tutor to lecture you.

BEST WISHES IN YOUR STUDIES.

ZOU

Table of Contents

Module Overview .................................................................................................................................. 1 
Unit 1 ...................................................................................................................................................... 2 
The Normal Distribution ...................................................................................................................... 2 
1.1 Introduction ................................................................................................................................. 2 
1.2 Unit Objectives ............................................................................................................................ 2 
1.3 The Normal Curve  ................................................................................................................... 2 

1.3.1 Properties of the normal curve ........................................................................................... 3 


1.3.3 Area under the normal curve .............................................................................................. 3 
Activity 1.1 ......................................................................................................................................... 4 
1.4 Evaluating Probabilities using the Standard Normal Tables ................................................. 4 
Activity 1.2 ......................................................................................................................................... 6 
1.4.1 Practical problems ............................................................................................................... 7 
Activity 1.3 ......................................................................................................................................... 8 
1.5 Summary ...................................................................................................................................... 8 
References .......................................................................................................................................... 9 
Unit 2 .................................................................................................................................................... 10 
Statistical Estimation .......................................................................................................................... 10 
2.1 Introduction ............................................................................................................................... 10 
2.2 Unit Objectives .......................................................................................................................... 10 
2.3 What is Statistical Estimation? ................................................................................................ 10 
2.4 Point Estimation ........................................................................................................................ 11 
2.4.1 Point estimator of the population mean ........................................................................... 11 
2.4.2 Point estimator of the population variance ...................................................................... 11 
Activity 2.1 ....................................................................................................................................... 12 
2.4.3 Point estimator of the population proportion .................................................................. 12 
Activity 2.2 ....................................................................................................................................... 13 
2.5 Confidence Interval Estimation ............................................................................................... 13 
2.5.1 Interval estimate of the population mean ........................................................................ 14 
Activity 2.3 ....................................................................................................................................... 15 
Activity 2.4 ....................................................................................................................................... 15 
Activity 2.5 ....................................................................................................................................... 17 

2.5.2 Estimation of the population proportion ......................................................................... 17 
Activity 2.6 ....................................................................................................................................... 17 
2.6 Determining Sample Size in Estimation .................................................................................. 17 
2.6.1 Sample size for estimating population mean ................................................................... 18 
Activity 2.7 ....................................................................................................................................... 19 
2.6.2 Sample size for estimating a population proportion ....................................................... 19 
Activity 2.8 ....................................................................................................................................... 20 
2.7 Summary .................................................................................................................................... 20 
References ........................................................................................................................................ 21 
Unit 3 .................................................................................................................................................... 22 
Hypothesis Testing .............................................................................................................................. 22 
3.1 Introduction ............................................................................................................................... 22 
3.2 Unit Objectives .......................................................................................................................... 22 
3.3 Statistical Hypotheses ............................................................................................................... 22 
3.3.1 Types of hypotheses ........................................................................................................... 22 
3.3.2 Deciding on the null hypothesis ........................................................................................ 23 
Activity 3.1 ....................................................................................................................................... 24 
3.4 Type I and Type II Errors ........................................................................................................ 24 
3.5 Steps Followed in Hypothesis Testing ..................................................................................... 24 
3.6 Tests Concerning the Population Mean .................................................................................. 26 
Activity 3.2 ....................................................................................................................................... 29 
3.7 Test Concerning a Population Proportion .............................................................................. 29 
Activity 3.3 ....................................................................................................................................... 31 
3.8 Confidence Interval Approach to Hypothesis Testing ........................................................... 31 
Activity 3.4 ....................................................................................................................................... 32 
3.8 Summary .................................................................................................................................... 32 
References ........................................................................................................................................ 33 
Unit 4 .................................................................................................................................................... 34 
Simple Linear Regression Analysis ................................................................................................... 34 
4.1 Introduction ............................................................................................................................... 34 
4.2 Unit Objectives .......................................................................................................................... 34 
4.3 What is Regression Analysis? .................................................................................................. 34 
4.4 Types of Variables ..................................................................................................................... 34 
Activity 4.1 ....................................................................................................................................... 35 

4.5 Scatter Plots ............................................................................................................................... 35 
Activity 4.2 ....................................................................................................................................... 36 
Activity 4.3 ....................................................................................................................................... 37 
4.6 The Simple Linear Regression Model ..................................................................................... 38 
4.6.1 Model assumptions ............................................................................................................. 38 
4.6.2 Random error term ............................................................................................................ 38 
4.6.3 Estimating the regression equation .................................................................................. 38 
Activity 4.4 ....................................................................................................................................... 40 
4.6.4 Interpretation of a and b  ................................................................................................ 40 
4.6.5 Some uses of the regression model .................................................................................... 41 
4.7 Estimating Values of the Dependent Variable ........................................................................ 41 
Activity 4.5 ....................................................................................................................................... 42 
4.8 Summary .................................................................................................................................... 42 
References ........................................................................................................................................ 43 
Unit 5 .................................................................................................................................................... 44 
Correlation Analysis ........................................................................................................................... 44 
5.1 Introduction ............................................................................................................................... 44 
5.2 Unit Objectives .......................................................................................................................... 44 
5.3 Relating Correlation Analysis to Regression Analysis .......................................................... 44 
5.4 Scatter Diagrams ....................................................................................................................... 44 
Activity 5.1 ....................................................................................................................................... 45 
5.5 Correlation Coefficient ............................................................................................................. 45 
5.5.1 Pearson’s product moment correlation coefficient ......................................................... 46 
Activity 5.2 ....................................................................................................................................... 47 
5.5.2 Spearman’s rank correlation coefficient .......................................................................... 48 
5.6 Coefficient of Simple Determination ....................................................................................... 50 
5.7 Testing whether X and Y are Correlated ................................................................................ 51 
Activity 5.4 ....................................................................................................................................... 52 
5.8 Summary .................................................................................................................................... 52 
References ........................................................................................................................................ 53 
Unit 6 .................................................................................................................................................... 54 
Introduction to Time Series Analysis ................................................................................................ 54 
6.1 Introduction ............................................................................................................................... 54 
6.2 Unit Objectives .......................................................................................................................... 54 


 
6.3 Components of a Time Series ................................................................................................... 54 
6.3.1 Trend component ............................................................................................................... 54 
6.3.2 Seasonal component ........................................................................................................... 55 
6.3.3 Cyclical component ............................................................................................................ 55 

6.3.4 Irregular component .......................................................................................................... 56 


6.4 Time Series Models ................................................................................................................... 56 
6.4.1 Additive Model ................................................................................................................... 56 
6.4.2 Multiplicative Model .......................................................................................................... 57 
6.5 Isolating the Trend Component ............................................................................................... 57
6.5.1 Least squares method ............................................................................................................ 57
Activity 6.1 ....................................................................................................................................... 58
6.5.2 Moving average method .................................................................................................... 59
Activity 6.2 ....................................................................................................................................... 61
6.6 Isolating the Seasonal Component ........................................................................................... 61
Activity 6.3 ....................................................................................................................................... 62
6.6.1 Deseasonalising of data ...................................................................................................... 63
Activity 6.4 ....................................................................................................................................... 63
6.6.2 Predicted values of the series ............................................................................................ 63
Activity 6.5 ....................................................................................................................................... 64
6.7 Summary .................................................................................................................................... 64
References ........................................................................................................................................ 65 

Unit 7 .................................................................................................................................................... 66 

Index Numbers .................................................................................................................................... 66 
7.1 Introduction ............................................................................................................................... 66 
7.2 Unit Objectives .......................................................................................................................... 66 
7.3 Types of Index Numbers........................................................................................................... 66 
7.3.1 Price indices ........................................................................................................................ 66 
7.3.2 Quantity indices ................................................................................................................. 67 
7.3.3 Value indices ....................................................................................................................... 67 
7.4 Simple Index Numbers ............................................................................................................. 67 

7.4.1 Simple price index .............................................................................................................. 67 
Activity 7.1 ....................................................................................................................................... 68 
7.4.2 Simple quantity index ........................................................................................................ 68 
Activity 7.2 ....................................................................................................................................... 68 
7.4.3 Index number series trends ............................................................................................... 69 
Activity 7.3 ....................................................................................................................................... 70 
7.4.4 Changing the base period .................................................................................................. 70 
Activity 7.4 ....................................................................................................................................... 71 
7.5 Weighted Index Numbers ......................................................................................................... 71 
7.5.1 Weighted average of relatives indices .............................................................................. 71 
Activity 7.5 ....................................................................................................................................... 73 
7.5.2 Weighted aggregate indices ............................................................................................... 73 
Activity 7.6 ....................................................................................................................................... 75 
7.6 Use of Index Numbers as Deflators ......................................................................................... 75 
Activity 7.7 ....................................................................................................................................... 76 
7.7 Challenges in Constructing Index Numbers ........................................................................... 77 
7.8 Summary .................................................................................................................................... 77 
References ........................................................................................................................................ 78 
Unit 8 .................................................................................................................................................... 79 
Statistics List of Formulae .................................................................................................................. 79 
8.1 Normal Distribution .................................................................................................................. 79 
8.2 Statistical Estimation ................................................................................................................ 79 
8.2.1 Point estimators .................................................................................................................. 79 
8.2.2 Confidence interval estimation ......................................................................................... 79 
8.3 Hypothesis Testing .................................................................................................................... 80 
8.3.1 Tests concerning the population mean ............................................................................. 80 
8.3.2 Test concerning a population proportion ........................................................................ 80 
8.4 Simple Linear Regression Analysis ......................................................................................... 80 
8.5 Correlation Analysis ................................................................................................................. 81 
8.5.1 Testing for the existence of a linear relationship between X and Y .............................. 81 
8.6 Introduction to Time Series Analysis ...................................................................................... 81 
8.6.1 Trend analysis ................................................................................................................ 81
8.6.2 Seasonal analysis ............................................................................................................ 81
8.7 Index Numbers .......................................................................................................................... 82
8.7.1 Simple Index Numbers ................................................................................................... 82
8.7.2 Changing the Base period .............................................................................................. 82
8.7.3 Weighted Index Numbers .............................................................................................. 82
APPENDICES ..................................................................................................................................... 83 
Statistical Tables ............................................................................................................................. 83 

Module Overview
The module BBFH 107 Business Statistics II builds on the module BBFH 103 Business
Statistics I. Students are normally required to pass that earlier module before they embark on
this module. While Business Statistics I was largely focused on Descriptive Statistics, this
module is mainly centred on the other branch of Statistics called Statistical Inference.

The module consists of eight units. In Unit 1 we look at the characteristics of the normal
curve before tackling the normal probability distribution. Unit 2 is about point estimation and
confidence interval estimation, while in Unit 3 we introduce you to hypothesis testing.
Simple regression analysis and correlation analysis, which are statistical techniques for
establishing relationships between variables, are covered in Unit 4 and Unit 5 respectively. In
Unit 6 we introduce you to time series analysis, while in Unit 7 we focus on index numbers.
In Unit 8 we provide you with a summary of important statistical formulae.

You are encouraged to study the worked examples before attempting activity questions in
each unit. References for further reading are provided at the end of each unit. We wish you
well in your studies.


 
Unit 1
The Normal Distribution
1.1 Introduction
The normal distribution is used to model continuous random variables. A continuous random
variable can assume any value in a given interval. Examples of variables that can be modelled
by the normal distribution are:
• The times taken by a worker to complete an assigned task repeatedly
• The weights of all new born babies at a hospital
• The salaries of all government workers

However, the normal probability distribution can also be used to investigate the behaviour of
discrete variables that can have many values, for example, marks obtained by all ‘O’ Level
students in a Mathematics examination. Many other random variables occurring in practice
follow the normal distribution.

In this unit, you will learn about the properties of the normal distribution and evaluate
probabilities for variables that are believed to follow the normal distribution.

1.2 Unit Objectives


By the end of the unit, you should be able to:

• sketch the normal curve


• state the properties of the normal distribution
• use the properties of the normal distribution
• define the standard normal distribution
• transform a normal distribution into a standard normal distribution
• compute probabilities using standard normal distribution tables

1.3 The Normal Curve


The normal curve is bell-shaped as shown in Figure 1.1. The curve is completely specified by
the mean μ and variance σ² of the distribution under investigation.

Figure 1.1: The Normal Curve


 
1.3.1 Properties of the normal curve
The properties of the normal curve/distribution are:
1. The curve is symmetric about the mean
2. It is unimodal – has a single peak
3. At the line of symmetry, the mean, median and mode coincide, that is, mean = median
= mode
4. The curve approaches the horizontal axis asymptotically as we proceed in either
direction away from the centre. This means that the curve never touches the
horizontal axis at either end but extends to infinity in both directions.
5. The total area under the curve and above the horizontal axis is equal to 1.

1.3.2 Standard normal curve


We have an infinite number of normal curves because different values of mean μ and
variance σ² will give rise to different normal curves with varying centres and peakedness.
However, one is selected as our standard. A mean of zero and variance of one will give rise to
the standard normal curve.

Let X be a random variable that is normally distributed with mean μ and variance σ². We
write X ~ N(μ, σ²). For example, if the mean of X is 10 and the variance is 25, we write
X ~ N(10, 5²), where 5 is the standard deviation. A random variable with mean zero and variance
one is called a standard normal variable and is denoted by Z, that is, Z ~ N(0, 1). The
distribution of Z is called the standard normal distribution. An arbitrary normally distributed
variable X is transformed to the standard normal distribution by the transformation

$$Z = \frac{X - \mu}{\sigma} \qquad [1.1]$$
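As a minimal sketch, transformation [1.1] can be written as a short Python function. The name `standardise` is our own choice for illustration and is not part of the module; the numbers are the mean of 10 and variance of 25 mentioned above.

```python
def standardise(x, mu, sigma):
    """Return the z-value of x for a normal variable with mean mu and
    standard deviation sigma (transformation [1.1])."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# X ~ N(10, 5^2): the variance is 25, so the standard deviation is 5
z = standardise(12, mu=10, sigma=5)
print(z)
```

Note that the function takes the standard deviation, not the variance, so a variance of 25 is entered as sigma = 5.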

1.3.3 Area under the normal curve


The area under a normal curve between any two specified points gives the probability that the
random variable assumes values between the two points. In Figure 1.2, the shaded area gives
the probability that a random variable X assumes values between x = x1 and x = x2, that is,
P(x1 < X < x2).

Figure 1.2: Area under the Curve between x1 and x2 (horizontal axis marked at x1, μ and x2)

To find P(x1 < X < x2), we must find the standard values corresponding to x1 and x2 by the
transformation $z_1 = \frac{x_1 - \mu}{\sigma}$ and $z_2 = \frac{x_2 - \mu}{\sigma}$.
It now follows that P(x1 < X < x2) = P(z1 < Z < z2), and Figure 1.2 is transformed to look like
Figure 1.3 below.


 
Figure 1.3: Area under the Standard Normal Curve between z1 and z2 (horizontal axis marked at z1, 0 and z2)

Table 1 in the appendices gives values for the area under the standard normal curve lying to
the left of any specified z value for values of z from -3.4 to 3.4. The area corresponds to the
probability that a given value is less than or equal to z, that is, P(Z ≤ z ) .
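The left-tail areas tabulated in Table 1 can also be computed directly. The sketch below is one way to do so in Python using the error function from the standard `math` module; the function name `phi` is our own label for P(Z ≤ z) and does not come from the module.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z, i.e. P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Print a few table-style entries, rounded to four decimal places
for z in (-3.4, -1.0, 0.0, 1.0, 3.4):
    print(f"z = {z:5.2f}   P(Z <= z) = {phi(z):.4f}")
```

Because the curve is symmetric about zero, the areas for z and -z always add up to 1, which is a quick check on any table reading.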

Example 1.1
A random variable X is normally distributed with a mean of 10 and variance 25. Find
standard values (z-values) corresponding to:
a) x = 12
b) x = 8

Solution 1.1
X ~ N(10, 5²)
We make use of the transformation $Z = \frac{X - \mu}{\sigma}$ with μ = 10 and σ = 5.
a) x = 12: $z = \frac{12 - 10}{5} = 0.4$
b) x = 8: $z = \frac{8 - 10}{5} = -0.4$

Activity 1.1
A random variable X is normally distributed with a mean of 15 and variance 36. Find
standard values corresponding to:
a) x = 16
b) x = 13

1.4 Evaluating Probabilities using the Standard Normal Tables


Table 1 in the appendices is used to find the probability that Z is less than or equal to a
certain value z, or greater than or equal to z. A few examples illustrate how this is
done.


 
Example 1.3
Let Z ~ N (0, 1). Find
a) P(Z ≤ 1.34)
b) P(Z ≤ −2.75)
c) P(Z ≥ 1.62)
d) P(0.47 ≤ Z ≤ 1.86)

Solution 1.3
a) The probability P (Z ≤ 1.34) is given by the area shown in Figure 1.4

Figure 1.4: The Probability P(Z ≤ 1.34) (horizontal axis marked at 0 and 1.34)

To find P (Z ≤ 1.34) , we locate a value of z equal to 1.3 in the left column of Table 1. We
then move across the row to the column under 0.04 where we read 0.9099. Therefore, P (Z
≤ 1.34) = 0.9099.

b) The area required to find P (Z ≤ −2.75) is shown in Figure 1.5

Figure 1.5: The Area Required to Find P(Z ≤ −2.75) (horizontal axis marked at −2.75 and 0)

We locate a value of z = −2.7 in the left column. We then move across the row to the
column under 0.05, giving P(Z ≤ −2.75) = 0.0030.

c) P (Z ≥ 1.62)
The area required is the area under the standard normal curve to the right of z = 1.62 as
shown in Figure 1.6.

Figure 1.6: The Area Required (horizontal axis marked at 0 and 1.62)


 
In the left column of Table 1, go to a value of z equal to 1.6, then move across that row to the
column under 0.02 where you read 0.9474. This is the area to the left of 1.62, but we want the
area to the right of 1.62 as shown in Figure 1.6. You should remember that the total area
under the curve is equal to 1. Therefore, if we subtract the area to the left of z = 1.62 from 1,
the remaining area to the right of 1.62 gives us P(Z ≥ 1.62).

P(Z ≥ 1.62) = 1 − P(Z < 1.62)
= 1 − 0.9474
= 0.0526

d) Figure 1.7 shows the area between z = 0.47 and z = 1.86

Figure 1.7: The Area between z = 0.47 and z = 1.86 (horizontal axis marked at 0, 0.47 and 1.86)

The shaded area is obtained by subtracting the area to the left of z =0.47 from the area to the
left of z = 1.86, that is,

P(0.47 ≤ Z ≤ 1.86) = P(Z ≤ 1.86) − P(Z ≤ 0.47)
= 0.9686 − 0.6808
= 0.2878
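The three patterns used in this example (a left-tail area, a right-tail area obtained by subtracting from 1, and an area between two points obtained by subtracting two left-tail areas) can be sketched with `statistics.NormalDist` from the Python standard library. The variable names are our own, and the computed values may differ from the table-based answers in the fourth decimal place because Table 1 entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)   # standard normal distribution

left_tail = Z.cdf(1.34)                 # a) P(Z <= 1.34)
lower_tail = Z.cdf(-2.75)               # b) P(Z <= -2.75)
right_tail = 1 - Z.cdf(1.62)            # c) P(Z >= 1.62) = 1 - P(Z < 1.62)
between = Z.cdf(1.86) - Z.cdf(0.47)     # d) P(0.47 <= Z <= 1.86)

for label, p in [("a", left_tail), ("b", lower_tail),
                 ("c", right_tail), ("d", between)]:
    print(f"{label}) {p:.4f}")
```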

Remark 1.1
The probability that a continuous variable takes a precise value is zero. This implies that the
probability that Z is less than or equal to, say, 1.25 is the same as the probability that Z is
less than 1.25. In general, P(Z ≤ z) = P(Z < z).

Activity 1.2
Let Z ~ N (0, 1). Find
a) P (Z ≤ 3.10)
b) P (Z ≥ −0.27 )
c) P (-1.45 ≤ Z ≤ 2.63)

Example 1.4
Given a random variable X which is normally distributed with mean 15 and variance 100,
find:
a) P(X < 20)
b) P(X > 12)
c) P(12 < X < 20)


 
Solution 1.4
X ~ N(15, 10²)
We begin by finding z-values corresponding to the x-values given using the transformation
given by equation 1.1.

a) $$\begin{aligned}
P(X < 20) &= P\left(\frac{X - \mu}{\sigma} < \frac{20 - 15}{10}\right) \\
&= P(Z < 0.5) \\
&= 0.6915
\end{aligned}$$

b) $$\begin{aligned}
P(X > 12) &= P\left(\frac{X - \mu}{\sigma} > \frac{12 - 15}{10}\right) \\
&= P(Z > -0.3) \\
&= 1 - P(Z < -0.3) \\
&= 1 - 0.3821 \\
&= 0.6179
\end{aligned}$$

c) $$\begin{aligned}
P(12 < X < 20) &= P\left(\frac{12 - 15}{10} < \frac{X - \mu}{\sigma} < \frac{20 - 15}{10}\right) \\
&= P(-0.3 < Z < 0.5) \\
&= P(Z < 0.5) - P(Z < -0.3) \\
&= 0.6915 - 0.3821 \\
&= 0.3094
\end{aligned}$$
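As a cross-check on the hand calculation, the same three probabilities can be obtained without standardising by hand, by giving the mean and standard deviation of X directly to `statistics.NormalDist`. This is only a sketch for checking answers; the variable names are our own.

```python
from statistics import NormalDist

X = NormalDist(mu=15, sigma=10)   # sigma is the standard deviation, sqrt(100)

p_less_20 = X.cdf(20)                  # a) P(X < 20)
p_more_12 = 1 - X.cdf(12)              # b) P(X > 12)
p_between = X.cdf(20) - X.cdf(12)      # c) P(12 < X < 20)

print(round(p_less_20, 4), round(p_more_12, 4), round(p_between, 4))
```

A common slip is to enter the variance (100) where the standard deviation (10) is expected, so it is worth checking which of the two a given tool asks for.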

1.4.1 Practical problems


In this section we are going to solve practical problems using the normal probability
distribution.

Example 1.5
The delays experienced by truck drivers at a border post to clear their cargo were
found to be normally distributed with a mean of 48 hours and a standard deviation of 6 hours. Find
the probability that a driver has to wait for:
a) at least 36 hours to clear his cargo
b) between 40 hours and 50 hours to clear his cargo

Solution 1.5
Let X be the total waiting time to get clearance. Then X ~ N(48, 6²).

a) $$\begin{aligned}
P(X \ge 36) &= P\left(\frac{X - \mu}{\sigma} \ge \frac{36 - 48}{6}\right) \\
&= P(Z \ge -2) \\
&= 1 - P(Z < -2) \\
&= 1 - 0.0228 \\
&= 0.9772
\end{aligned}$$

b) $$\begin{aligned}
P(40 < X < 50) &= P\left(\frac{40 - 48}{6} < \frac{X - \mu}{\sigma} < \frac{50 - 48}{6}\right) \\
&= P(-1.33 < Z < 0.33) \\
&= P(Z < 0.33) - P(Z < -1.33) \\
&= 0.6293 - 0.0918 \\
&= 0.5375
\end{aligned}$$
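In part b) the z-values −1.333... and 0.333... are rounded to two decimal places so that Table 1 can be used. The sketch below, with variable names of our own choosing, compares the probability obtained from the rounded z-values with the one obtained from the unrounded z-values, to show that table-based answers are approximations.

```python
from statistics import NormalDist

Z = NormalDist()          # standard normal, mean 0 and standard deviation 1
mu, sigma = 48, 6         # waiting time in hours, from Example 1.5

z1 = (40 - mu) / sigma    # lower limit in standard units
z2 = (50 - mu) / sigma    # upper limit in standard units

# z rounded to two decimals, as when reading Table 1
table_style = Z.cdf(round(z2, 2)) - Z.cdf(round(z1, 2))
# z kept at full precision
unrounded = Z.cdf(z2) - Z.cdf(z1)

print(f"rounded z:   {table_style:.4f}")
print(f"unrounded z: {unrounded:.4f}")
```

The two results differ slightly in the third decimal place, which is normal when working from tables.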

 
Example 1.6
The demand for second hand Japanese cars in Zimbabwe is normally distributed with a mean
of 1 600 cars sold per month and a standard deviation of 50 cars. What is the probability that:
a) at most 1 500 cars will be sold in one month
b) between 1 500 and 1 650 cars will be sold in one month

Solution 1.6
Let X be the number of cars sold per month, then X ~ N(1 600, 50²).

a) $$\begin{aligned}
P(X \le 1500) &= P\left(\frac{X - \mu}{\sigma} \le \frac{1500 - 1600}{50}\right) \\
&= P(Z \le -2) \\
&= 0.0228
\end{aligned}$$

b) $$\begin{aligned}
P(1500 < X < 1650) &= P\left(\frac{1500 - 1600}{50} < Z < \frac{1650 - 1600}{50}\right) \\
&= P(-2 < Z < 1) \\
&= P(Z < 1) - P(Z < -2) \\
&= 0.8413 - 0.0228 \\
&= 0.8185
\end{aligned}$$

Activity 1.3
1. The times that cars took to refuel at a busy service station are normally distributed
with mean 3 minutes and a standard deviation of 0.2 minutes. What is the
probability that a car will take
a) more than 4 minutes to refuel
b) not more than 2 minutes to refuel
c) between 2 and 4 minutes to refuel
2. A fast food restaurant finds that the number of meals it serves in a week is
normally distributed with a mean of 4 000 and a standard deviation of 200. What
is the probability that in a given week the number of meals served will be:
a) at most 4 500
b) between 4 000 and 4 500
3. On average a tuck-shop sells 300 loaves of bread per day with a standard deviation
of 50 loaves. Find the probability that the tuck-shop will sell at least 400 loaves
per day.

1.5 Summary
In this unit you learnt about the normal probability distribution which is used to model
continuous random variables. The distribution is completely specified by two parameters
which are the mean and variance of the distribution. The standard normal distribution has a
mean of 0 and a variance of 1. The area under the standard normal curve gives probabilities.
An arbitrary normal distribution is transformed to the standard normal distribution to
facilitate the evaluation of probabilities using prepared tables.


 
References
Aczel, A.D. and Sounderpandian, J. (2005). Complete Business Statistics. India: Tata
McGraw-Hill.
Buglear, J. (2005). Quantitative Methods for Business. London: Elsevier Butterworth
Heinemann.
Kazmier, L.J. (2003). Schaum’s Easy Outline: Business Statistics. Blacklick: McGraw-Hill
Trade.
Kemp, S.M. and Kemp, S. (2004). Business Statistics Demystified. Blacklick: McGraw-Hill
Professional Publishing.
Muchengetwa, S. (2005). Business Statistics. Harare: Zimbabwe Open University.
Wegner, T. (1999). Applied Business Statistics. Cape Town: Juta and Co.


 
Unit 2
Statistical Estimation
2.1 Introduction
Statistical investigations are usually carried out on samples drawn randomly from populations
of interest. As a result, statistical analysis will be based on sample data rather than population
data. The major reasons for this are that it is usually too expensive and time consuming to
collect population data. Sometimes it is also impossible to obtain population data. The results
of the sample study are then used to estimate results for the population thereby allowing
important decisions about the population to be made.

In this unit we introduce you to an important branch of statistical inference called statistical
estimation. You will learn about point estimation and confidence interval estimation for the
mean and proportion of a single population.

2.2 Unit Objectives


By the end of the unit, you should be able to:
• define statistical estimation
• provide justification for estimating population parameters
• distinguish between point estimation and confidence interval estimation
• find point estimates of population parameters
• construct confidence interval estimates of population parameters

2.3 What is Statistical Estimation?


Statistical estimation involves the use of sample statistics to predict the corresponding
population parameters. In this process, a sample measure is used as an estimate of the
corresponding population measure. A sample proportion p̂ is used as an estimator of the
population proportion p . For example, an aspiring Member of Parliament (MP) may want to
estimate the true proportion of voters that favour him in a constituency. The MP would obtain
the opinions of a random sample of eligible voters in the constituency. The fraction of voters
in the sample who favour the MP could be used as an estimate of the true proportion of voters
who are likely to vote for the MP.

Similarly, the sample mean x̄ is taken as an estimator of the population mean μ while the
sample variance s² is taken as an estimator of the population variance σ². It is important to
ensure that samples used in statistical analysis are representative of their parent populations.
The only way to ensure that this is the case is to select random samples using probability
based sampling methods.

Estimation takes two forms, namely point estimation and confidence interval estimation.

2.4 Point Estimation
In point estimation, a single value of a statistic is used as an estimate of the population
parameter. The disadvantages of a point estimate are that:
• It is not exactly equal to the population parameter most of the time. The actual estimate
may or may not be close to it.
• It is uncertain whether it will be a good estimate and we have no idea of the
probability that it is a good estimate. A point estimate does not reveal any information
about the accuracy of the estimation procedure.

2.4.1 Point estimator of the population mean


The point estimator of the population mean is the sample mean x̄ given by:

$$\bar{x} = \frac{1}{n} \sum x_i \qquad [2.1]$$
where x1, x2, ..., xn are n randomly selected sample values drawn from the population.

Example 2.1
The daily sales ($) of a vegetable vendor over 30 randomly selected days are:
14 21 28 17 15 34 10 18 25 30 21 15 11 28 17 20 20 29 31 24 11 19 26
34 10 16 25 30 22 17

Find the point estimate of the population mean.

Solution 2.1
n = 30, ∑x = 638
$$\begin{aligned}
\bar{x} &= \frac{\sum x}{n} \\
&= \frac{638}{30} \\
&= 21.26666667 \\
&\approx \$21.27
\end{aligned}$$

2.4.2 Point estimator of the population variance


The best estimator of the population variance is the sample variance which is given by:

$$s^2 = \frac{1}{n - 1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \qquad [2.2]$$

Example 2.2
Using the data of Example 2.1, find the point estimate for the variance.

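Solution 2.2
n = 30, ∑x = 638, ∑x² = 15 046
$$\begin{aligned}
s^2 &= \frac{1}{n - 1}\left(\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right) \\
&= \frac{1}{30 - 1}\left(15046 - \frac{(638)^2}{30}\right) \\
&= \frac{1}{29}(15046 - 13568.13333) \\
&= \frac{1}{29}(1477.866667) \\
&= 50.96091954 \\
&\approx 50.9609
\end{aligned}$$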
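The point estimates of Examples 2.1 and 2.2 can be checked with a few lines of Python. This is a sketch that applies formulae [2.1] and [2.2] to the vendor's sales data; the variable names are our own, and the final line cross-checks the result against the standard library's sample variance, which also divides by n − 1.

```python
import statistics

# Daily sales ($) of the vegetable vendor from Example 2.1
sales = [14, 21, 28, 17, 15, 34, 10, 18, 25, 30, 21, 15, 11, 28, 17,
         20, 20, 29, 31, 24, 11, 19, 26, 34, 10, 16, 25, 30, 22, 17]

n = len(sales)
sum_x = sum(sales)
sum_x2 = sum(x * x for x in sales)

mean = sum_x / n                                  # formula [2.1]
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)    # formula [2.2]

print(n, sum_x, sum_x2)
print(round(mean, 2), round(variance, 4))

# Cross-check against the library's sample variance (n - 1 divisor)
assert abs(variance - statistics.variance(sales)) < 1e-9
```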

Activity 2.1
1. The weights (kg) of 20 bags of potatoes randomly selected from a truckload of
250 bags of potatoes are:
10.2 10.1 9.8 9.9 10.0 9.8 10.3 10.1 10.4 9.7 8.9 9.0 10.6 10.9 11.0
11.3 10.2 12.0 9.8 10.7
Find point estimates of the
a) population mean weight, and
b) population variance of the weight of all potatoes in the truck.
2. The bank balances of 30 randomly selected savings accounts are:
200 128 132 400 380 24 267 306 86 94 125 106 249 364 59
34 126 184 230 342 311 265 46 38 89 122 241 237 98 106
Find point estimates for the
a) population mean balance, and
b) population variance of balances of all savings accounts.

2.4.3 Point estimator of the population proportion


An estimator of the population proportion p is given by:
$$\hat{p} = \frac{k}{n} \qquad [2.3]$$
where k is the number of elements with the desired characteristic and n is the sample size.

Example 2.3
Refer to the data of Example 2.1. Find the point estimate of the population proportion of daily
sales which are above $20.

Solution 2.3
Fifteen of the 30 daily sales figures are above $20, so k = 15 and n = 30.
$$\hat{p} = \frac{k}{n} = \frac{15}{30} = 0.5$$
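Counting by eye is error-prone, so it can help to count the qualifying values directly from the data of Example 2.1. The sketch below counts the daily sales that are strictly above $20 and applies formula [2.3]; the variable names are our own.

```python
# Daily sales ($) of the vegetable vendor from Example 2.1
sales = [14, 21, 28, 17, 15, 34, 10, 18, 25, 30, 21, 15, 11, 28, 17,
         20, 20, 29, 31, 24, 11, 19, 26, 34, 10, 16, 25, 30, 22, 17]

k = sum(1 for x in sales if x > 20)   # days with sales strictly above $20
n = len(sales)                        # sample size
p_hat = k / n                         # formula [2.3]

print(k, n, p_hat)
```

The two days with sales of exactly $20 are not counted, because the question asks for sales above $20.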

Example 2.4
In a study to determine the proportion of teachers in Zimbabwe who are degree holders, 800
teachers out of a random sample of 2 000 teachers said they have degree qualifications.
a) Find a point estimate of the proportion of all Zimbabwean teachers who have degrees.
b) If there are 15 000 teachers altogether in the country, how many have degrees?

Solution 2.4
n = 2 000, k = 800
a) $\hat{p} = \frac{k}{n} = \frac{800}{2000} = 0.4$
Thus an estimated 40% of all the teachers in Zimbabwe have degrees.

b) Estimated number of teachers with degrees = 40% of 15 000 = 6 000.

Activity 2.2
1. Refer to Activity 2.1. Suppose the standard weight of a bag of potatoes is 10 kg. Find
an estimate of the proportion of all potato bags that are underweight.
2. A church organisation has a total membership of 600. A survey conducted at the
church showed that 80 church members out of a random sample of 200 members had
bibles. Find a point estimate of the proportion of church members who do not have
bibles.

2.5 Confidence Interval Estimation


A confidence interval is a range of numbers believed to include an unknown population
parameter. Attached to the interval is a measure of our confidence that the interval indeed
contains the population parameter.
The general formula for a confidence interval is given by:

estimate ± (table value) × (standard error of estimate)

Confidence interval estimation is preferred to point estimation because the probability that
the interval includes the population measure is known. This is an advantage of interval
estimation over point estimation in that the probability is a measure of our confidence in the
estimated result.

Let us suppose that a 95% confidence interval for the population mean is, say, (10, 13), then
the probability that the mean is included in the interval (10, 13) is 0.95. Hence, we are 95%
confident that the mean lies in the range (10, 13). The probability that the population mean is
not contained in the interval (10, 13) is now 5%. The 5% is the level of error associated with
our confidence interval estimate; it is called the level of significance and it is denoted by α .

2.5.1 Interval estimate of the population mean
The formula that we use to find a confidence interval estimate for the population mean μ
depends on whether the population variance is known or not known and also on whether the
sample size is large or small. A sample size of 30 or more is considered a large sample;
otherwise it is a small sample.

There are three cases to consider:

Case I
If the population standard deviation σ is known, a 100(1 − α)% confidence interval for μ is
given by:
$$\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \qquad [2.4]$$
where x̄ is the mean of a sample of size n from a population with variance σ², $Z_{\alpha/2}$
is the value of the standard normal distribution such that the area under the curve to
the right of it is α/2, and $\frac{\sigma}{\sqrt{n}}$ is the standard error of the mean.

Example 2.5
An electrical firm supplies light bulbs that have a length of life that is approximately
normally distributed with a standard deviation of 20 hours. If a random sample of 40 bulbs
has an average life of 800 hours, find
a) a 95% confidence interval for the population mean life of all bulbs supplied by this
firm
b) a 99% confidence interval for the population mean life of all bulbs supplied by this
firm

Solution 2.5
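σ = 20, n = 40, x̄ = 800
a) α = 0.05 ⇒ $Z_{0.05/2} = Z_{0.025} = 1.96$. A 95% confidence interval for μ is
$$\begin{aligned}
\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} &= 800 \pm 1.96 \times \frac{20}{\sqrt{40}} \\
&= 800 \pm 6.198064214 \\
&= (793.8019,\ 806.1981)
\end{aligned}$$

Thus we are 95% confident that the mean life of all bulbs is between 793.8019 hours
and 806.1981 hours.
b) α = 0.01 ⇒ $Z_{0.01/2} = Z_{0.005} = 2.5758$. A 99% confidence interval for μ is
$$\begin{aligned}
\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} &= 800 \pm 2.5758 \times \frac{20}{\sqrt{40}} \\
&= 800 \pm 8.145394797 \\
&= (791.8546,\ 808.1454)
\end{aligned}$$

Thus we are 99% confident that the mean life of all bulbs is between 791.8546 hours
and 808.1454 hours.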

If we compare the two intervals, you will see that the one based on a higher confidence level
of 99% is wider and conveys less information about the possible value of μ than does the one
based on 95% which is narrower. In general, we say that when sampling is from the same
population, using a fixed sample size, the higher the confidence level, the wider the interval.
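The effect of the confidence level on the width of the interval can be seen by computing both intervals of Example 2.5 side by side. The sketch below implements formula [2.4]; the function name `z_interval` is our own, and the z-value is computed by the standard library rather than read from tables, so the limits may differ from the table-based ones in the fourth decimal place.

```python
import math
from statistics import NormalDist


def z_interval(mean, sigma, n, confidence):
    """Confidence interval for the population mean when sigma is known
    (formula [2.4])."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)    # Z_{alpha/2}: upper-tail area alpha/2
    margin = z * sigma / math.sqrt(n)          # table value x standard error
    return mean - margin, mean + margin


for level in (0.95, 0.99):
    low, high = z_interval(mean=800, sigma=20, n=40, confidence=level)
    print(f"{level:.0%}: ({low:.4f}, {high:.4f})  width = {high - low:.4f}")
```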

Activity 2.3
The burning times of a particular brand of candles imported from Mozambique are
known to be normally distributed with a standard deviation of 5 minutes. The mean
burning time of a random sample of 20 candles was 3 hours. Find a 90% confidence
interval for the mean burning time of all such candles.

Case II
When the population standard deviation σ is unknown and the sample size is large (n ≥ 30),
a 100(1 − α)% confidence interval for the population mean μ is given by
$$\bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}} \qquad [2.5]$$
where s is the sample standard deviation.

Example 2.6
The Head of a rural primary school is worried by the large number of students who arrive late
for school. In order to be able to adjust the school starting time, he sought to find the average
distance walked by the students to school from home. The mean and standard deviation of the
distances travelled by a random sample of 60 students were 6 km and 800 m respectively.
Construct a 90% confidence interval for the mean distance travelled by all the students to
school.

Solution 2.6
n = 60, x̄ = 6, s = 800 m = 0.8 km, $Z_{0.10/2} = Z_{0.05} = 1.6449$
A 90% confidence interval for μ is
$$\begin{aligned}
\bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}} &= 6 \pm 1.6449 \times \frac{0.8}{\sqrt{60}} \\
&= 6 \pm 0.169884541 \\
&= (5.8301,\ 6.1699)
\end{aligned}$$

We are 90% confident that the mean distance travelled by the students to school is between
5.8301 km and 6.1699 km.

Activity 2.4
A survey of 400 company executives revealed that the average annual earnings of a
CEO is $200 000 with a standard deviation of $600. Find a 99% confidence interval
for the true average annual earnings for all company executives.

Case III
This is the case where the population standard deviation σ is unknown and the sample size is
small (n < 30). A 100(1 − α)% confidence interval for μ is given by
$$\bar{x} \pm t_{\alpha/2}(n - 1) \times \frac{s}{\sqrt{n}} \qquad [2.6]$$
where s is the sample standard deviation and n − 1 is the number of degrees of freedom.

Remark 2.1
Since n < 30, the sample standard deviation of a small sample is not a reliable enough
estimate of the population standard deviation to enable the use of the z-distribution. As a
result, we use the t-distribution.

Example 2.7
Refer to Activity 2.1. Find a 95% confidence interval for the mean weight of all bags of
potatoes in the truck.

Solution 2.7
n = 20, x̄ = 10.235, s = 0.7264, α = 0.05 ⇒ $t_{\alpha/2}(n - 1) = t_{0.025}(19) = 2.09$
A 95% confidence interval for μ is
$$\begin{aligned}
\bar{x} \pm t_{\alpha/2}(n - 1) \times \frac{s}{\sqrt{n}} &= 10.235 \pm 2.09 \times \frac{0.7264}{\sqrt{20}} \\
&= 10.235 \pm 0.339474473 \\
&= (9.8955,\ 10.5745)
\end{aligned}$$

We are 95% confident that the true mean weight of all bags of potatoes in the truck is
between 9.8955 kg and 10.5745 kg.

Example 2.8
A stock market analyst wanted to estimate the average return on a certain stock. A random
sample of 20 days yielded an average return of 12% and a standard deviation of 4%.
Construct a 95% confidence interval estimate for the average return on this stock.

Solution 2.8
σ is unknown and n = 20 is small, therefore we use the t-distribution. A 95%
confidence interval for μ is
$$\begin{aligned}
\bar{x} \pm t_{\alpha/2}(n - 1) \times \frac{s}{\sqrt{n}} &= 12 \pm t_{0.025}(19) \times \frac{4}{\sqrt{20}} \\
&= 12 \pm 2.09 \times 0.894427191 \\
&= 12 \pm 1.869352829 \\
&= (10.1306,\ 13.8694)
\end{aligned}$$

We are 95% confident that the average return on this stock is between 10.13% and 13.87%.
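The small-sample interval of Case III can be sketched in Python as follows. The Python standard library does not provide t-distribution quantiles, so this sketch assumes that the critical value is read from the t-table and passed in as `t_crit`; the function name `t_interval` is our own.

```python
import math


def t_interval(mean, s, n, t_crit):
    """Small-sample confidence interval for the mean (formula [2.6]).

    t_crit is the table value t_{alpha/2}(n - 1), supplied by the user.
    """
    margin = t_crit * s / math.sqrt(n)   # table value x standard error
    return mean - margin, mean + margin


# Example 2.8: n = 20 days, mean return 12%, s = 4%, t_0.025(19) = 2.09
low, high = t_interval(mean=12, s=4, n=20, t_crit=2.09)
print(f"({low:.4f}, {high:.4f})")
```

Keeping the table lookup as a separate input mirrors the hand calculation: only the degrees of freedom and the confidence level change the table value, while the rest of the formula stays the same.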

Activity 2.5
A random sample of 10 cigarettes of a certain type has an average nicotine content of 15
milligrams and a standard deviation of 2.5 milligrams. Construct a 99% confidence interval
for the true average nicotine content of all the cigarettes.

2.5.2 Estimation of the population proportion


A population proportion shows the percentage of a population that possesses the
characteristic of interest. A 100(1 − α)% confidence interval for the population proportion p
is given by:
$$\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} \qquad [2.7]$$

Example 2.9
In a survey of 300 company executives carried out by the Zimbabwe Congress of Trade
Unions (ZCTU), 81 executives said they are willing to publicly disclose their annual salaries.
Find a 99% confidence interval for the proportion of all executives who are willing to
disclose their annual salaries.

Solution 2.9
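n = 300, k = 81, α = 0.01 ⇒ $Z_{\alpha/2} = Z_{0.005} = 2.5758$
$$\hat{p} = \frac{k}{n} = \frac{81}{300} = 0.27$$
A 99% confidence interval for p is
$$\begin{aligned}
\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} &= 0.27 \pm 2.5758 \times \sqrt{\frac{0.27 \times 0.73}{300}} \\
&= 0.27 \pm 2.5758 \times 0.025632011 \\
&= 0.27 \pm 0.066022934 \\
&= (0.2040,\ 0.3360)
\end{aligned}$$

We are 99% confident that between 20.4% and 33.6% of all company executives are willing to
publicly disclose their annual salaries.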
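Formula [2.7] can be sketched in Python in the same way as the interval for the mean. The function name `proportion_interval` is our own; the z-value is computed by the standard library rather than read from tables, so the limits may differ very slightly from the hand calculation.

```python
import math
from statistics import NormalDist


def proportion_interval(k, n, confidence):
    """Large-sample confidence interval for a population proportion
    (formula [2.7])."""
    p_hat = k / n                                        # point estimate [2.3]
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # Z_{alpha/2}
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)      # table value x standard error
    return p_hat - margin, p_hat + margin


# Example 2.9: 81 out of 300 executives, 99% confidence
low, high = proportion_interval(k=81, n=300, confidence=0.99)
print(f"({low:.4f}, {high:.4f})")
```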

Activity 2.6
A random sample of 400 customers who visited a retail shop was interviewed and 280
were found to have a preference for a certain brand of toothpaste. Find a 90%
confidence interval for the proportion of the population of customers who prefer the
particular brand of toothpaste.

2.6 Determining Sample Size in Estimation


To decide on the sample size appropriate for a survey study you have to make a compromise
between two factors which are:

• The resources available in terms of time and cost of the study. A huge sample is
costly to study and the study requires more time.
• The degree of accuracy required. The larger the sample that is used, the narrower the
interval. A narrower interval is associated with less uncertainty and more accurate
estimation results.

In order to determine the sample size for your study, you need to specify the precision of your
estimate and the level of confidence desired. The precision is given by the error that you are
prepared to tolerate in your estimated results. You also need an estimate of the population
standard deviation. This can be obtained from a pilot survey carried out before the actual
study.

2.6.1 Sample size for estimating population mean


A confidence interval for the population mean $\mu$ provides an estimate of the accuracy of our
point estimate $\bar{x}$. If $\mu$ is actually the centre value of the interval, then $\bar{x}$ estimates $\mu$ without
error. However, $\bar{x}$ will not be exactly equal to $\mu$ most of the time, and the point estimate is
usually in error.

We may wish to determine how large a sample is necessary to ensure that the error in
estimating $\mu$ will not exceed $e$, the ‘bound on the error’. In the confidence interval
$\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$, the bound on the error is
$e = Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$. Making $n$ the subject of the formula, the sample size
necessary so that the error will not exceed $e$ is:
$$n = \left[ \frac{Z_{\alpha/2} \times \sigma}{e} \right]^{2}$$    [2.8]
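
The step of making $n$ the subject of the formula can be written out as follows. This is only a sketch of the algebra, using the same symbols as above:

```latex
\begin{aligned}
e &= Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \\
\sqrt{n} &= \frac{Z_{\alpha/2} \times \sigma}{e} \\
n &= \left[ \frac{Z_{\alpha/2} \times \sigma}{e} \right]^{2}
\end{aligned}
```

Because $n$ must be a whole number, the calculated value is rounded up to the next integer.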

Example 2.10
In Example 2.5, how large a sample is required if we wish to be 95% confident that our
sample mean will be within 10 hours of the true mean?

Solution 2.10
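$\alpha = 0.05 \Rightarrow Z_{\alpha/2} = 1.96$, $e = 10$, $\sigma = 20$
$$\begin{aligned}
n &= \left[ \frac{Z_{\alpha/2} \times \sigma}{e} \right]^{2} \\
&= \left[ \frac{1.96 \times 20}{10} \right]^{2} \\
&= [3.92]^{2} \\
&= 15.3664 \\
&\approx 16
\end{aligned}$$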
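
As a check on Solution 2.10, the following Python sketch applies formula [2.8] and rounds the result up. The function name `sample_size_for_mean` is a hypothetical helper introduced here for illustration, and the z-value is computed with `statistics.NormalDist` instead of being read from tables.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_mean(sigma, e, confidence):
    """Smallest whole n for which Z_{alpha/2} * sigma / sqrt(n) does not exceed e."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    return ceil((z * sigma / e) ** 2)         # always round up, never down


# Example 2.10: sigma = 20 hours, e = 10 hours, 95% confidence
n_required = sample_size_for_mean(sigma=20, e=10, confidence=0.95)
print(n_required)
```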

Activity 2.7
1. Find the minimum sample size required for estimating the average return on money
market investments to within 0.5% per year with 99% confidence. The standard
deviation of returns is believed to be 2% per year.

2. A market researcher would like to estimate the average amount spent on airtime per
month by each female student at a college. The researcher would like to be able to
determine the average amount spent by all female students at the college to be
within $1 with 95% confidence. From past studies, the population standard
deviation is known to be $2. What is the minimum required sample size?

2.6.2 Sample size for estimating a population proportion


The minimum sample size required to estimate the population proportion to within a
specified amount $e$ with $100(1-\alpha)\%$ confidence is given by:
$$n = \frac{\hat{p}(1-\hat{p})\, Z_{\alpha/2}^{2}}{e^{2}}$$    [2.9]

Example 2.11
In Example 2.9, how large a sample is needed if we wish to be 99% confident that our sample
proportion will be within 0.02 of the true proportion of all the executives who are willing to
disclose their annual salaries?

Solution 2.11
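We know that $\hat{p} = 0.27$ and $Z_{\alpha/2} = Z_{0.005} = 2.5758$.
The minimum sample size required is
$$\begin{aligned}
n &= \frac{\hat{p}(1-\hat{p})\, Z_{\alpha/2}^{2}}{e^{2}} \\
&= \frac{0.27 \times 0.73 \times 2.5758^{2}}{0.02^{2}} \\
&= 3269.2709 \\
&\approx 3270
\end{aligned}$$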

In practice, you cannot collect sample data before deciding on the sample size to use.
Therefore, we require a way of estimating the appropriate sample size for a study which does
not depend on the sample proportion, $\hat{p}$.

The largest value that $\hat{p}(1-\hat{p})$ can have is 0.25, which occurs when $\hat{p} = 0.5$. You can show
this by working out the value of $\hat{p}(1-\hat{p})$ using increasing values of $\hat{p}$ starting with
$\hat{p} = 0.1$. If we assume the largest value of $\hat{p}(1-\hat{p})$, then formula [2.9] is reduced to
$$n = \left[ \frac{Z_{\alpha/2}}{2e} \right]^{2}$$    [2.10]
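
The reduction from formula [2.9] to formula [2.10] can be sketched as follows. Completing the square shows that $\hat{p}(1-\hat{p})$ is largest when $\hat{p} = 0.5$, and substituting that largest value gives the conservative sample size:

```latex
\begin{aligned}
\hat{p}(1-\hat{p}) &= \tfrac{1}{4} - \left(\hat{p} - \tfrac{1}{2}\right)^{2} \le \tfrac{1}{4} \\
n &= \frac{\hat{p}(1-\hat{p})\, Z_{\alpha/2}^{2}}{e^{2}}
   \le \frac{Z_{\alpha/2}^{2}}{4e^{2}}
   = \left[ \frac{Z_{\alpha/2}}{2e} \right]^{2}
\end{aligned}
```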

Example 2.12
A researcher intends to conduct a study to estimate the proportion of supermarkets that offer
trolleys suitable for customers with difficulty in walking. Determine the sample size needed if
the researcher wishes to be 95% confident that the estimated proportion is within 8% of the
true proportion.

Solution 2.12
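$\alpha = 0.05 \Rightarrow Z_{\alpha/2} = 1.96$, $e = 0.08$
$$\begin{aligned}
n &= \left[ \frac{Z_{\alpha/2}}{2e} \right]^{2} \\
&= \left[ \frac{1.96}{2 \times 0.08} \right]^{2} \\
&= 150.0625
\end{aligned}$$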

This has to be rounded up in order to meet the confidence requirement. Thus a sample size of
151 supermarkets should be used.
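
The two sample-size formulas for a proportion can be compared in a short Python sketch. The function name `sample_size_for_proportion` and its optional `p_hat` argument are assumptions made for this illustration. When a pilot estimate is supplied the sketch follows formula [2.9], otherwise it falls back on the conservative formula [2.10].

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(e, confidence, p_hat=None):
    """Minimum whole n for estimating p to within e at the given confidence level."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)               # Z_{alpha/2}
    if p_hat is None:
        return ceil((z / (2 * e)) ** 2)                   # formula [2.10]
    return ceil(p_hat * (1 - p_hat) * z ** 2 / e ** 2)    # formula [2.9]


n_pilot = sample_size_for_proportion(e=0.02, confidence=0.99, p_hat=0.27)  # Example 2.11
n_conservative = sample_size_for_proportion(e=0.08, confidence=0.95)       # Example 2.12
print(n_pilot, n_conservative)
```

Rounding up with `ceil` mirrors the rule stated above that the sample size must be rounded up to meet the confidence requirement.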

Activity 2.8
1. In a survey of a random sample of 300 shoppers, 180 said they would prefer to make
payments using debit cards. How large a sample is needed if we are to be 95%
confident that the estimate is within 5% of the actual proportion of shoppers who
prefer to transact using debit cards?

2. A DStv research team intends to install monitoring devices in a random sample of
households in order to produce 99% interval estimates of the proportion of
households watching specific programmes. In how many households will the team
have to install the devices if they want to estimate to within 1% of the true
proportion?

2.7 Summary
In this unit we introduced you to an important branch of statistical inference called
estimation. Estimation is about the use of sample measurements to predict population values.
The estimation is done in two ways namely point estimation and confidence interval
estimation. The major drawback of a point estimate is that we have no idea of the probability
that it is a good estimate. This makes interval estimates preferable because they are
associated with a known level of confidence which is a measure of how confident we are that
the interval does include within it the population parameter.

We looked at how interval estimates for a population mean and population proportion are
constructed. We also looked at how to determine the appropriate sample size for estimation
surveys.


Unit 3
Hypothesis Testing
3.1 Introduction
Hypothesis testing is that branch of statistical inference that is used to verify claims made
concerning population parameters.

In this unit, you will be introduced to tests of hypotheses concerning a single population. The
terminology used in hypothesis testing will be explained. You will learn how to conduct
hypothesis tests concerning the mean and proportion of a single population.

3.2 Unit Objectives


By the end of the unit, you should be able to:
• define a statistical hypothesis
• formulate the null hypothesis and the alternative hypothesis for a given situation
• distinguish between Type I and Type II errors
• distinguish between one-tailed and two-tailed tests
• outline the steps followed in the procedure of hypothesis testing
• conduct tests concerning the population mean μ
• conduct tests concerning the population proportion p

3.3 Statistical Hypotheses


A statistical hypothesis is a claim or a guess or an assumption or a statement which may or
may not be true, made concerning a population parameter.

Hypothesis testing involves gathering evidence from a random sample drawn from the
population of interest in order to decide whether the null hypothesis is likely to be true or
false. The hypothesis is rejected if evidence from the sample is not consistent with the stated
hypothesis, otherwise it is accepted. However, the acceptance of the stated hypothesis does
not necessarily imply that it is true, rather it is a result of insufficient evidence to reject it.

3.3.1 Types of hypotheses


There are two types of hypotheses which are called the:
• null hypothesis, and
• alternative hypothesis

A null hypothesis is an assertion about the value of a population parameter. It is a formal
statement of the claim being made concerning a population measure. The null hypothesis is
denoted by H0.

The alternative hypothesis, denoted by H1, is the negation of the null hypothesis. For example,
a null hypothesis might assert that the population mean is equal to a specified value μ0 . We
write this as H0 : μ = μ0 . The alternative hypothesis opposes this assertion and it is written as
H1 : μ ≠ μ0 . In this case, the alternative hypothesis suggests that the mean takes values that
are either below μ0 or above it. Therefore, to investigate H0 we conduct a non-directional test
which is known as a two-tailed test.

A null hypothesis might assert that the population mean is at least equal to a certain specified
value μ0 . We write H 0 : μ ≥ μ0 . In this case, the alternative hypothesis would consist of
values below μ0 , that is, H1 : μ < μ0 . Similarly, if a null hypothesis asserts that the population
mean is less than or equal to a specified value μ0 , that is, H 0 : μ ≤ μ0 , the alternative will be
H1 : μ > μ0 . In both these cases, since the alternative hypotheses consist of values either
below or above the specified value μ0 , we conduct a one-sided test or a one-tailed test.

3.3.2 Deciding on the null hypothesis


Determining what the null hypothesis should be in a given situation may prove to be difficult.
However, if the null hypothesis is wrongly formulated, then the test will be meaningless.
The following notes will be handy in deciding what the null hypothesis should
be:
• The null hypothesis always has an element of equality; either an equal to (=) sign or a
greater than or equal to ( ≥ ) sign or a less than or equal to ( ≤ ) sign is used in the
expression of H0.
• The null hypothesis is usually an expression of a claim made by someone. However,
if the claim does not include an equal to (=) sign it becomes the alternative
hypothesis.
• The null hypothesis is the hypothesis that we formulate with the hope of rejecting.
• If the null hypothesis is true, then no corrective action would be necessary, whereas if
the alternative hypothesis is true, some corrective action would be necessary.

Example 3.1
A ZOU Regional Director claims that the average age of a ZOU student is 21. A new
Programme Coordinator at the region doubts this claim. Set up the null and alternative
hypothesis if the Coordinator wishes to show that it is not 21.

Solution 3.1
H 0 : μ = 21
H1 : μ ≠ 21

Example 3.2
A leading bakery claims that the average cost of producing a standard loaf of bread is 80
cents. If you suspect that the claim exaggerates the cost, how would you set up the null and
alternative hypothesis?

Solution 3.2
H 0 : μ ≥ 80c
H1 : μ < 80c

Activity 3.1
1. A ZIMRA official at a busy border post claims that it takes, on average, at most 2
days for a truck driver to clear his consignment. You suspect that the average is
greater than 2 days and you want to test the claim. State the null and alternative
hypothesis for this test.

2. An ice cream vending machine is set to dispense 100 grams per cup. You suspect
that the machine is under-filling the cups. Set up the null and alternative hypothesis
to investigate this case.

3.4 Type I and Type II Errors


In deciding to reject or accept a null hypothesis, there is always a chance of erroneously
rejecting or accepting it. Such errors arise because the decision is based on a sample rather
than on the whole population, and they may be made worse by faulty sampling procedures.

A type I error is committed when a true null hypothesis is rejected. The probability of
committing a type I error is called the level of significance and it is denoted by α . It is
common to use a 1%, 5% or 10% level of significance in calculations.

A type II error is committed if we accept the null hypothesis when it is false. The probability
of committing a type II error is denoted by β . The probability that the test will be able to
detect a false null hypothesis is called the power of a test. In other words, the power of a
test is the probability of rejecting H0 when indeed H0 is false, which is equal to 1 − β .

3.5 Steps Followed in Hypothesis Testing


The following steps should be followed when conducting a hypothesis test:

Step 1: State the Null and Alternative Hypothesis


The null and alternative hypotheses are specified at this initial stage before gathering any
evidence. It would be unethical and rather manipulative to formulate the H0 and H1 at one’s
convenience after gathering evidence; a practice that we refer to as data snooping.

Step 2: Identify the Distribution


For problems in this unit, you have to choose between two distributions namely the z-
distribution and the t-distribution.

When testing for the population mean μ we use:


a) The z-distribution when the:
• population standard deviation σ is known
• population standard deviation is unknown and n is large ( n ≥ 30)
b) The t-distribution when the population standard deviation σ is unknown and the
sample size n is small ( n < 30) .

When testing for a population proportion and the sample size is large we use the z-
distribution.

Step 3: Determine the Rejection and Acceptance Region
Depending on the distribution identified and the level of significance desired, you find a
value from statistical tables which we call a critical value. The critical value separates the
acceptance region from the rejection region. The rejection region is made up of a range of
values such that if a test statistic calculated from sample data falls in it the null hypothesis
would be rejected. The rejection region also depends on the nature of the alternative
hypothesis as shown in the following figures.

Figure 3.1 Rejection Region for H1 : μ ≠ μ0 (two areas of rejection, each of size α/2, one in each tail beyond the critical values)

Figure 3.2 Rejection Region for H1 : μ > μ0 (area of rejection of size α in the upper tail, above the critical value)

Figure 3.3 Rejection Region for H1 : μ < μ0 (area of rejection of size α in the lower tail, below the critical value)
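
For the z-distribution, the critical values that are normally read from statistical tables can also be computed. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption made for illustration. It returns the positive critical value, so for H1 : μ < μ0 the critical value is the negative of the one-tailed result.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0 and standard deviation 1


def z_critical(alpha, tails):
    """Positive critical value of the z-distribution for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test places alpha / 2 in each tail, a one-tailed test places alpha in one tail
    return standard_normal.inv_cdf(1 - alpha / tails)


for alpha in (0.01, 0.05, 0.10):
    print(alpha, round(z_critical(alpha, 1), 4), round(z_critical(alpha, 2), 4))
```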

Step 4: Calculate the Test Statistic


A test statistic is a value calculated from sample data that is used to decide whether or not to
reject H0. Once the test statistic falls within the rejection region, H0 is rejected.

The calculation of the test statistic depends on whether the population standard deviation σ is
known or unknown and also on the sample size as summarised in Table 3.1 below.

Table 3.1 Test Statistic for Testing μ

When σ is known
Case I: n is large or small
$$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0,1)$$    [3-1]

When σ is unknown
Case II: n is large
$$Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim N(0,1)$$    [3-2]

Case III: n is small
$$T_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t(n-1)$$    [3-3]

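When testing for a single population proportion p , we use the z-distribution and the test
statistic is given by:

$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$    [3-4]

where $p_0$ is the value of the proportion stated in H0 and $q_0 = 1 - p_0$.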

Step 5: Decide Whether or not to Reject H0


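The decision is made on the basis of a comparison between the value of the test statistic and
the critical value. In a two-tailed test, H0 is rejected if the test statistic is greater than the
critical value in absolute terms. In a one-tailed test, H0 is rejected if the test statistic lies
beyond the critical value in the direction specified by H1. In either case the test statistic
falls in the rejection region.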

Step 6: Make a Conclusion


If H0 is rejected, we conclude that H1 is probably true. If we fail to reject H0, we conclude
that the evidence gathered is insufficient to warrant the rejection of H0.

3.6 Tests Concerning the Population Mean


Example 3.3
A stock market analyst claims that the average annual return on stocks in the construction
industry is 12%. You want to test whether this claim is true. You collect a random sample of
36 stocks in the construction industry and find that the average annual return is 10% with a
standard deviation of 3%. Use a 5% level of significance to test the analyst’s claim.

Solution 3.3
1. H 0 : μ = 12%
H1 : μ ≠ 12%

2. The population standard deviation σ is unknown, but the sample size n =36 is large, so
we use the z-distribution.

3. The nature of the alternative hypothesis suggests we need to carry out a two-tailed test.
Using α = 0.05, the critical values are $\pm Z_{\alpha/2} = \pm Z_{0.025} = \pm 1.96$

The rejection criterion is therefore to reject H0 if $|Z_{cal}| > 1.96$

4. $Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{10 - 12}{3 / \sqrt{36}} = -4$

5. Since $|Z_{cal}| = 4 > 1.96$, we reject H0

6. We conclude that the average annual return is not 12% and therefore the analyst’s
claim is false.
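
Steps 3 to 5 for a large-sample test on the mean can be sketched in Python. The function `z_test_mean` and the labels "two-sided", "greater" and "less" are assumptions introduced for this illustration; they stand for H1 : μ ≠ μ0 , H1 : μ > μ0 and H1 : μ < μ0 respectively.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(x_bar, mu_0, sd, n, alpha, alternative):
    """Return (z_cal, critical value, reject H0?) for a z-test on the mean."""
    z_cal = (x_bar - mu_0) / (sd / sqrt(n))         # test statistic
    normal = NormalDist()
    if alternative == "two-sided":
        critical = normal.inv_cdf(1 - alpha / 2)
        reject = abs(z_cal) > critical              # either tail
    elif alternative == "greater":
        critical = normal.inv_cdf(1 - alpha)
        reject = z_cal > critical                   # upper tail only
    elif alternative == "less":
        critical = -normal.inv_cdf(1 - alpha)
        reject = z_cal < critical                   # lower tail only
    else:
        raise ValueError("unknown alternative")
    return z_cal, critical, reject


# Example 3.3: x-bar = 10, mu_0 = 12, s = 3, n = 36, alpha = 0.05, two-tailed
z_cal, critical, reject = z_test_mean(10, 12, 3, 36, 0.05, "two-sided")
print(round(z_cal, 4), round(critical, 4), reject)
```

Passing the direction of H1 as an argument keeps the decision rule for one-tailed and two-tailed tests in one place.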

Example 3.4
The average weekly earnings of all bus rank marshals is reported to be $180. You believe it is
too low. You collect a random sample of 100 rank marshals and find that the weekly average
is $250 with a standard deviation of $20. Conduct the test at 10% level of significance.

Solution 3.4
1. H 0 : μ ≤ 180
H1 : μ > 180

2. The population standard deviation σ is unknown, but the sample size n =100 is large,
so we use the z-distribution.

3. α = 0.10 and it is a one-tailed test. The critical value is $Z_{\alpha} = Z_{0.10} = 1.2816$

We would reject H0 if $Z_{cal} > 1.2816$

4. $Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{250 - 180}{20 / \sqrt{100}} = 35$

5. Since $Z_{cal} = 35$ is greater than the critical value 1.2816, we reject H0.

6. We conclude that the average weekly earnings of all rank marshals is greater than
$180.

Example 3.5
In an advertisement it is claimed that a certain brand of air freshener will last on average at
least 40 days. A random sample of 12 households took the following number of days to use
up the air freshener:
28 41 36 50 17 39 21 64 26 30 42 12

Test the claim made for the product using a 5% level of significance.

Solution 3.5
1. H 0 : μ ≥ 40
H1 : μ < 40

2. The population standard deviation σ is unknown, but the sample size n =12 is small, so
we use the t-distribution.

3. α = 0.05 and it is a one-tailed test. The critical value is $-t_{\alpha}(n-1) = -t_{0.05}(11) = -1.80$

We would reject H0 if $T_{cal} < -1.80$

4. You should verify that $\bar{x} = 33.8333$ and $s = 14.6339$

$T_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{33.8333 - 40}{14.6339 / \sqrt{12}} = -1.4598$

5. Since $T_{cal} = -1.4598 > -1.80$, the test statistic does not fall in the rejection region and we fail to reject H0


6. We conclude that the data does not provide sufficient evidence at 5% level of
significance to reject H0
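
The sample mean, sample standard deviation and test statistic for Example 3.5 can be recomputed directly from the twelve observations. This is a minimal sketch using only the Python standard library. The variable names are illustrative, and the critical value is typed in from the t-tables because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

days = [28, 41, 36, 50, 17, 39, 21, 64, 26, 30, 42, 12]  # data of Example 3.5
mu_0 = 40
t_critical = -1.80  # -t_0.05(11), read from statistical tables

n = len(days)
x_bar = mean(days)
s = stdev(days)                          # sample standard deviation (divisor n - 1)
t_cal = (x_bar - mu_0) / (s / sqrt(n))   # test statistic of formula [3-3]
reject_h0 = t_cal < t_critical           # lower-tailed test, H1: mu < 40

print(round(x_bar, 4), round(s, 4), round(t_cal, 4), reject_h0)
```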

Activity 3.2
1. Average total daily sales of a fruit vendor are known to be at most $26. The
vendor recently changed his site of operation and moved to a new site at a busy
street corner. He now wants to know whether his daily sales have improved since
then. A random sample of 16 trading days gave an average of $30 with a standard
deviation of $5. Does the data provide evidence that the vendor’s average total
daily sales have improved? Use α = 0.05 .

2. It is claimed that a graduate student comes out of college with an average fees debt
of $1 500. A sample of 200 graduates showed that the average debt was $900 with a
standard deviation of $120. Test the claim at the 5% level of significance.

3. The average time that children who reside in the same neighbourhood spend travelling
to school is claimed to be 35 minutes. A random sample of 10 children taken from
the neighbourhood had their travel times recorded as follows:

37 38 40 35 36 35 39 37 40 42

Test the claim using a 1% level of significance.

3.7 Test Concerning a Population Proportion


Example 3.6
A long distance bus conductor claims that at least 25% of passengers buy some bananas to eat
before reaching their destinations. If 18 out of a random sample of 60 passengers bought
bananas during their trip with the bus, is the conductor right? Use 5% level of significance.

Solution 3.6
1. H 0 : p ≥ 0.25
H1 : p < 0.25

2. n = 60 is a large sample; we use the z-distribution

3. It is a one-tailed test. The critical value is $-Z_{0.05} = -1.6449$

We would reject H0 if $Z_{cal} < -1.6449$

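4. $\hat{p} = \frac{18}{60} = 0.3$

$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.3 - 0.25}{\sqrt{0.25 \times 0.75 / 60}} = \frac{0.05}{0.0559017} = 0.8944$

5. Since $Z_{cal} = 0.8944 > -1.6449$, the test statistic does not fall in the rejection region and we fail to reject H0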

6. We conclude that the data does not provide sufficient evidence to reject H0

Example 3.7
Last year, 70% of total student applications received by the Zimbabwe Open University were
from female applicants. Out of a random sample of 150 applications received this year, 90
were from females. Test the hypothesis that the proportion of applications from females has
not changed using a 10% level of significance.

Solution 3.7
1. H0 : p = 0.70
H1 : p ≠ 0.70

2. n = 150 is a large sample, so we use the z-distribution

3. Using α = 0.10 , the critical values are $\pm Z_{0.05} = \pm 1.6449$

We would reject H0 if $|Z_{cal}| > 1.6449$

4. $\hat{p} = \frac{90}{150} = 0.6$

$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.60 - 0.70}{\sqrt{0.70 \times 0.30 / 150}} = \frac{-0.1}{0.037416573} = -2.6726$

5. Since $|Z_{cal}| = 2.6726 > 1.6449$, we reject H0

6. We conclude that the proportion of female applicants has changed.
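
The test statistic for a proportion can be checked with a few lines of Python. The helper name `z_statistic_proportion` is hypothetical and is introduced here only for illustration. The critical value is computed with `statistics.NormalDist` instead of being read from tables.

```python
from math import sqrt
from statistics import NormalDist


def z_statistic_proportion(k, n, p_0):
    """Test statistic of formula [3-4], with q_0 = 1 - p_0."""
    p_hat = k / n
    return (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)


# Example 3.7: 90 of 150 applications, H0: p = 0.70, alpha = 0.10, two-tailed
z_cal = z_statistic_proportion(90, 150, 0.70)
critical = NormalDist().inv_cdf(1 - 0.10 / 2)   # Z_{0.05}
reject_h0 = abs(z_cal) > critical
print(round(z_cal, 4), round(critical, 4), reject_h0)
```

Calling `z_statistic_proportion(18, 60, 0.25)` gives the statistic for Example 3.6 in the same way.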

Activity 3.3
1. The Traffic Safety Council of Zimbabwe claims that at least 65% of all road accidents are
due to human error. In a random sample of 500 road accidents, it was found that 342
accidents were due to human error. Use 5% level of significance to test the claim.

2. A credit controller of a clothing retail chain estimates that 20% of their customers default
on their monthly bill payment. A random sample of 400 accounts indicated that 130
accounts were at least one month in arrears. Does the data provide evidence to support the
credit controller’s claim? Use α = 0.10

3.8 Confidence Interval Approach to Hypothesis Testing


Suppose (a, b) is a 100 (1 − α )% confidence interval for μ . The interval (a, b) provides the
plausible values of μ . To test H0 : μ = μ0 against H1 : μ ≠ μ0 at the α level of significance,
we do not reject H0 if the hypothesised value μ0 falls within (a, b) , but we reject H0 if μ0 lies
outside the interval (a, b) . Thus the lower and upper confidence limits form a pair of critical
values beyond which H0 will be rejected as shown in Figure 3.4.

Figure 3.4 Confidence Interval as Critical Values (acceptance region between a and b; rejection regions below a and above b)

Example 3.8
An electrical firm supplies light bulbs that have a length of life that is approximately
normally distributed with a standard deviation of 20 hours. If a random sample of 40 bulbs
has an average life of 800 hours,
a) Find a 95% confidence interval for the population mean life of all bulbs supplied by
this firm.
b) Hence test at 5% level of significance the claim that the population mean life of all
bulbs supplied by this firm is 800 hours.

Solution 3.8
a) With $\bar{x} = 800$, $\sigma = 20$ and $n = 40$, the 95% confidence interval for the mean life of bulbs is
$800 \pm 1.96 \times \frac{20}{\sqrt{40}} = 800 \pm 6.1981$, that is, (793.8019; 806.1981).
b) The hypotheses tested are: H0 : μ = 800
H1 : μ ≠ 800
Since the hypothesised value 800 lies within the confidence interval (793.8019;
806.1981), we do not reject H0 at the 5% level of significance.
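
The confidence interval approach can be sketched in Python as follows. The function name `ci_test_mean` is an assumption made for illustration. The sketch covers the two-tailed case with a known population standard deviation and is applied to the figures stated in Example 3.8.

```python
from math import sqrt
from statistics import NormalDist


def ci_test_mean(x_bar, sigma, n, mu_0, alpha):
    """Two-tailed test of H0: mu = mu_0 using a 100(1 - alpha)% confidence interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    lower, upper = x_bar - margin, x_bar + margin
    reject = not (lower <= mu_0 <= upper)     # reject only if mu_0 lies outside (lower, upper)
    return lower, upper, reject


# Example 3.8: x-bar = 800, sigma = 20, n = 40, hypothesised mean 800, alpha = 0.05
lower, upper, reject = ci_test_mean(800, 20, 40, 800, 0.05)
print(round(lower, 4), round(upper, 4), reject)
```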

Activity 3.4
Last year, 70% of total student applications received by the Zimbabwe Open
University were from female applicants. Out of a random sample of 150 applications
received this year, 90 were from females. Use the confidence interval approach to test
the hypothesis that the proportion of applications from females has not changed using
a 10% level of significance.

3.9 Summary
In this unit, you learnt how to conduct hypothesis tests concerning the mean and
proportion of a single population. We defined a statistical hypothesis as an assumption or a
statement which may or may not be true, made concerning a population parameter.
Hypothesis testing therefore is about verifying whether the claim is true or false. We saw that
there are two types of hypotheses namely the null and alternative hypothesis. The null
hypothesis is a statement of the assertion made concerning a population parameter.

The decision to reject or accept H0 is based on evidence gathered from a random sample
drawn from the population of interest. A wrong decision may be arrived at due to sampling
errors. A type I error is committed when H0 is rejected when in actual fact it is true. If H0 is
accepted when in fact it is false, the error committed is called a type II error.

The procedure of hypothesis testing should be followed religiously. The steps to be followed
were stated as:
a) State the null and alternative hypothesis
b) Identify the distribution
c) Determine the rejection and acceptance region
d) Calculate the test statistic
e) Decide whether or not to reject H0
f) Make a conclusion
The hypothesis is rejected if evidence from the sample is not consistent with the stated
hypothesis, otherwise it is accepted. However, the acceptance of the stated hypothesis does
not necessarily imply that it is true, rather it is a result of insufficient evidence to reject it.


Unit 4
Simple Linear Regression Analysis
4.1 Introduction
Regression analysis is a widely used statistical technique that has many business applications.
It involves studying the relationship between variables and formulating models that connect
the variables.

In this unit, you will learn how to use observed data to estimate the functional form of the
relationship between the variables. You will also learn how to use the fitted model for
prediction purposes.

4.2 Unit Objectives


By the end of the unit, you should be able to:
• define regression analysis
• distinguish between a dependent variable and an independent variable
• construct a scatter plot to depict the relationship between two variables
• state the assumptions of the general linear regression model
• fit a simple linear regression model to a set of data
• explain some uses of regression analysis
• use the fitted regression model to predict values of Y for given values of X

4.3 What is Regression Analysis?


In real life situations we are usually interested in relationships between variables. Regression
analysis is a statistical technique of modelling the relationship between variables.

Simple linear regression involves modelling a relationship between only two variables. The
word linear implies fitting a straight-line model. When more than two variables are involved
the technique is called multiple regression analysis. The use of simple linear regression in
applied research is limited because the workings of most socio-economic systems are too
complex to be represented by such a simple formulation. However, in this course we will
only look at the relationship between two variables. This will help us explain the fundamental
ideas underlying regression analysis as simply as possible.

4.4 Types of Variables


There are two types of variables that we deal with in simple linear regression analysis. The
variable that we cannot control and has a dependence relationship on the other variable is
called the response variable or dependent variable, denoted by Y. The variable that we can
easily manipulate and fix beforehand is called the explanatory or independent variable
denoted by X.

Example 4.1
When estimating the relationship between expenditure on advertisement and sales, we can fix
monthly levels of expenditure on advertisement but have no direct control over the monthly
sales realised. Therefore, advertisement expenditure is the independent variable and sales
become the dependent variable.

Activity 4.1
State the independent and dependent variable in each of the following studies:
a) Modelling the relationship between company profits and wages.
b) Modelling the relationship between level of risk and returns on investment.
c) Estimating the relationship between starting salary and level of education attained.
d) The relationship between consumption expenditure and disposable income.

4.5 Scatter Plots


Random pairs of observations for the independent and dependent variables are plotted on a
graph to show the kind of relationship that exists between the variables. From the scatter plot
you can tell if the relationship can be modelled by a straight line equation before proceeding
to do linear regression analysis.

Example 4.2
A company would like to estimate the relationship between its monthly sales and the amount
that the company spends on advertisement per month. A random sample of monthly
observations made over the past year is:

Monthly Expenditure ($00)    Monthly Sales ($00)
7                            18
9                            35
5                            12
15                           50
12                           36
6                            24

Draw a scatter plot to represent the data. Comment on the kind of relationship between
monthly expenditure on advertisement and monthly sales.

Solution 4.2

Figure 4.1 Scatter Plot for Example 4.2 (Monthly Sales ($00) on the vertical axis plotted against Monthly Expenditure ($00) on the horizontal axis)

The points seem to be following a line with positive gradient. If you insert a line of best fit
through the points, you will see that the points do not deviate much from the line. We can
therefore conclude that there is a strong positive linear relationship between monthly
expenditure on advertisement and monthly sales.

Activity 4.2
The following table gives the ages (in years) and prices (in thousands of dollars) for a
random sample of 10 used cars of a specific model on display at a Car Sale.

Age (years) 7 3 9 4 7 5 8 6 2 5
Price ($000) 3 7 1 5 2 6 2 4 7 5

Draw a scatter plot to show the relationship between price and age of car.

The following sketch diagrams show scatter plots that you may also encounter and how you
should interpret them.

(a) A perfect positive linear relationship
(b) A perfect negative linear relationship
(c) A weak positive linear relationship
(d) No relationship between X and Y

Figure 4.2 Scatter Plots Showing Different Relationships between X and Y (each panel plots Y against X)

In Figure 4.2, you should take note of the following:


• The relationship between X and Y in (a) and (b) is exact because all the points lie on a
straight line
• The points in (c) show greater dispersion from the best straight line, therefore the
diagram shows a weak positive linear relationship between X and Y
• The points in (d) show no discernible pattern, implying there is no relationship
between X and Y

Activity 4.3
Sketch scatter diagrams to represent the following relationships:
a) A weak negative linear relationship
b) A strong negative linear relationship

The best straight line that passes through the points on a scatter plot can be fitted ‘by eye’ as
demonstrated in Figure 4.3. Where possible, the line is made to pass through the majority of
points on the scatter plot leaving almost an equal number of points above and below it.

Figure 4.3 Fitting the Best Straight Line by Eye (scatter plot of Y against X with a straight line drawn through the points)

However, there is no guarantee that fitting a line by eye will produce the best-fit line.
Different people will produce different lines despite using the same data. The method is,
therefore, unreliable and inconsistent. A more reliable method is the Least Squares Method,
the results of which are dealt with in subsection 4.6.3.

4.6 The Simple Linear Regression Model


The simple linear regression model is given by
Y = β 0 + β1 X + e [4-1]
where Y is the dependent variable, X is the independent variable, e is the error term, β 0 and
β1 are called population regression coefficients representing the intercept and slope
respectively. The model represents the unknown population relationship between the two
variables.

4.6.1 Model assumptions


The assumptions that underlie the simple linear regression model are that:
• The relationship between X and Y is linear
• The values of X are fixed and not random
• The errors e are normally distributed with mean zero and a constant variance, σ 2
• The errors are independent of each other, that is, they are uncorrelated with one
another

4.6.2 Random error term


The simple linear regression model consists of two components namely a non-random
component, which is the straight line β 0 + β1 X and a random component represented by the
error term e . The error term is included in the model for the following reasons:
• It takes care of errors that arise in the measurement of the dependent variable Y
• It captures the effect of other important variables that are omitted in the model
• It represents the randomness of the system generating the observed data

4.6.3 Estimating the regression equation


The best straight line obtained using the least squares technique is the line which takes the
path that results in there being the least possible sum of squared differences between the
points and the line. In this module, I will not bother you by getting into the details of the
Least Squares Method, but I will teach you how to use the results of the method.

The least squares technique gives the equation of the best straight line as
\[ \hat{Y} = a + bX \] [4-2]
where
\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} \] [4-3]
and
\[ a = \frac{\sum y - b\sum x}{n} \] [4-4]
Equation [4-2] is the estimated regression equation connecting variables X and Y, where a
and b are estimates of the population intercept \( \beta_0 \) and population slope \( \beta_1 \)
of the line respectively.

Example 4.3
Estimate the regression equation for the data of Example 4.2

Solution 4.3
Using the two variable statistical mode on your calculator, you will obtain the following
results:
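\( n = 6 \), \( \sum x = 54 \), \( \sum x^2 = 560 \), \( \sum y = 175 \), \( \sum xy = 1827 \)

\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{6(1827) - 54(175)}{6(560) - (54)^2} = \frac{1512}{444} = 3.405405405 \]

\[ a = \frac{\sum y - b\sum x}{n} = \frac{175 - 3.405405405(54)}{6} = -1.481981978 \]

The regression equation is \( \hat{Y} = -1.481981978 + 3.405405405X \)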
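
The arithmetic in Solution 4.3 can be checked with a short Python sketch. The function name least_squares_from_sums is an illustrative choice and not part of the module; it simply applies formulae [4-3] and [4-4] to the summary totals that your calculator gives.

```python
def least_squares_from_sums(n, sum_x, sum_x2, sum_y, sum_xy):
    """Return (a, b) for the fitted line Y-hat = a + bX using [4-3] and [4-4]."""
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all X values are equal, so the slope is undefined")
    b = (n * sum_xy - sum_x * sum_y) / denominator   # slope, formula [4-3]
    a = (sum_y - b * sum_x) / n                      # intercept, formula [4-4]
    return a, b


# Summary totals quoted in Solution 4.3
a, b = least_squares_from_sums(n=6, sum_x=54, sum_x2=560, sum_y=175, sum_xy=1827)
print(f"a = {a:.4f}, b = {b:.4f}")
```

The same function can be reused for Example 4.4 by passing the totals given in Solution 4.4.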

Example 4.4
Use the data of Activity 4.2 to estimate the regression equation connecting price and age of
car.

Solution 4.4
\( n = 10 \), \( \sum x = 56 \), \( \sum x^2 = 358 \), \( \sum y = 42 \), \( \sum xy = 194 \)

\[ b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \frac{10(194) - 56(42)}{10(358) - (56)^2} = \frac{-412}{444} = -0.927927927 \]

\[ a = \frac{\sum y - b\sum x}{n} = \frac{42 + 0.927927927(56)}{10} = 9.396396391 \]

The estimated regression equation is \( \hat{Y} = 9.396396391 - 0.927927927X \)

Activity 4.4
The data in the table below relate a manufacturer’s market share (%) with product
quality measured on a scale 0 to 100.

Product quality 27 39 73 66 33 43 47 55 60 68 70 75 82
Market share (%) 2 3 10 9 4 6 5 8 7 9 10 13 12

a) State the independent and dependent variable.


b) Draw a scatter plot of the data. Is it reasonable to fit a linear regression model?
Explain.
c) Estimate the simple linear regression equation between market share and product
quality.

4.6.4 Interpretation of a and b


The parameter a is the intercept on the axis of the dependent variable. It is the value that the
variable Y is predicted to assume if the variable X has a zero value. You should guard against
interpreting the intercept in terms of the dependent variable if the range of X-values used to
construct the model does not include zero.

The parameter b represents the rate of change of Y with respect to X . Thus the value of b
shows the corresponding change in the value of Y for every unit change in the value of X .

Example 4.5
Interpret the value of a and b for the regression equation obtained in Example 4.3

Solution 4.5
The regression equation obtained was \( \hat{Y} = -1.481981978 + 3.405405405X \), where X is
the monthly expenditure on advertisement and Y is the corresponding monthly sales.

The value a = −1.481981978 cannot be interpreted in terms of monthly sales since the X values
used to construct the equation did not include zero.

The value b = 3.405405405 represents the estimated increase in monthly sales for every
unit increase in monthly expenditure on advertisement.

4.6.5 Some uses of the regression model


Some of the uses of a regression model are:
a) Prediction: The model is used to estimate future values of the dependent variable.
b) System explanation: Explanation of the relationships among variables and how
variables affect the dependent variable.
c) Variable screening: The model can be used to determine which variables have the
greatest effect on the dependent variable so that unnecessary variables can be omitted
from the model
d) Planning and control: if we have an appropriate model, we can explain the physical
system and thus plan ahead and control the system.

4.7 Estimating Values of the Dependent Variable


The regression model can be used to estimate values of the dependent variable given values
of the independent variable. A given value of X is substituted into the regression equation to
obtain the corresponding value of Y. The model can give reliable estimates of values of Y for
X-values within the range of values used to construct the model. Outside the range of X-
values used in the construction of the model, the model becomes unpredictable and may give
misleading results. For this reason, you should guard against extrapolation in regression
analysis.

Example 4.6
Use the model obtained in Example 4.3 to estimate the monthly sales if the monthly
expenditure on advertisement is $1 000.

Solution 4.6
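An expenditure of $1 000 corresponds to X = 10, since the variables are recorded in units of $100:
\[ X = 10 \Rightarrow \hat{Y} = -1.481981978 + 3.405405405(10) = 32.57207207 \]
which is approximately $3 257.21.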
The monthly sales are estimated to be $3 257.21 if $1 000 is spent on advertisement per
month.
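
The substitution in Solution 4.6 can be sketched in Python as follows. The function predict and its optional x_range argument are illustrative assumptions rather than part of the module; x_range is a simple safeguard against extrapolation, and you would have to supply the smallest and largest X-values used to build the model. The sketch also assumes, as the arithmetic of Solution 4.6 implies, that X and Y are recorded in hundreds of dollars.

```python
def predict(a, b, x, x_range=None):
    """Estimate Y-hat = a + b*x; refuse to extrapolate when x_range is given."""
    if x_range is not None:
        lowest, highest = x_range
        if not lowest <= x <= highest:
            raise ValueError(f"x = {x} lies outside the fitted range {x_range}")
    return a + b * x


# Coefficients from Example 4.3 (assumed units: hundreds of dollars)
a, b = -1.481981978, 3.405405405
sales_hundreds = predict(a, b, 10)      # $1 000 of advertising is X = 10
print(f"Estimated monthly sales: ${sales_hundreds * 100:,.2f}")
```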

Activity 4.5
1. Use the regression model you obtained in Activity 4.4 to estimate the percentage of market
share if the product quality is rated as 65.

2. A money market analyst would like to estimate the relationship between annual incomes of
families and their annual savings. The following data was obtained.

Annual Income ($000s) 15 12 18 10 16 13 20 19 15


Annual Savings ($000s) 3.5 2.1 3.8 0.9 3.9 2.6 5.0 4.2 4.5

a) Obtain the least squares regression equation connecting income and savings.
b) State three assumptions made when estimating the equation in (a) above.
c) Interpret the slope of the estimated regression equation.
d) Estimate the amount of annual savings for a family with an annual income of $14 000.

3. The quantity demanded (Y) and price (X) of an illicit brew sold at a number of village
shebeens is estimated by the model \( \hat{Y} = a + bX \). A random sample of 7 trading days
gave the following observations:

Price ($) 0.50 0.75 1.00 1.50 1.80 2.00 2.50


Quantity (250ml bottles) 25 16 21 16 14 15 9

a) Portray the data on a scatter diagram and comment on the relationship shown.
b) Find the estimated regression equation and plot it on the scatter diagram.
c) Use the plotted line to predict the quantity demanded when the price of a bottle is $1.20

4.8 Summary
We defined regression analysis as a statistical technique of understanding the relationship
between variables. Regression analysis allows us to establish the functional form of the
relationship between variables. We looked at the construction of scatter plots which enable
us, even at a glance, to have a ‘feel’ of the kind of relationship that exists between the
variables under investigation. The general two variable linear regression model is given by:

\[ Y = \beta_0 + \beta_1 X + e \]

We looked at the role of the error term e in the model which is to act as a proxy for all other
variables that may have an influence on the dependent variable but are omitted in the posited
model. We stated the assumptions of the simple linear regression model. We learnt about how
to fit a regression model to sample data and how to use the model for prediction purposes.

References
Aczel, A.D. and Sounderpandian, J. (2005). Complete Business Statistics. India: Tata
McGraw-Hill.
Buglear, J. (2005). Quantitative Methods for Business. London: Elsevier Butterworth
Heinemann.
Kazmier, L.J. (2003). Schaum’s Easy Outline: Business Statistics. Blacklick: McGraw-Hill
Trade.
Kemp, S.M. and Kemp, S. (2004). Business Statistics Demystified. Blacklick: McGraw-Hill
Professional Publishing.
Muchengetwa, S. (2005). Business Statistics. Harare: Zimbabwe Open University.
Wegner, T. (1999). Applied Business Statistics. Cape Town: Juta and Co.

Unit 5
Correlation Analysis
5.1 Introduction
Correlation analysis and regression analysis are related concepts in that they complement
each other. In Unit 4, we looked at linear regression analysis which is a statistical technique
of establishing the functional form of a relationship between two variables. In this unit, you
will learn about measures of the correlation between two variables. Correlation analysis is a
technique of measuring the degree of linear association between two variables. Using this
technique, business people will be able to tactfully manipulate one variable for the betterment
of the other variable or simply exploit the relationship between the variables in order to
maximise profits.

5.2 Unit Objectives


By the end of this unit, you should be able to:
• define correlation analysis
• distinguish between correlation analysis and regression analysis
• draw and interpret scatter diagrams in terms of correlation
• calculate the Pearson’s product moment correlation coefficient
• calculate the Spearman’s rank correlation coefficient
• test for the existence of a linear relation between variables
• find the coefficient of determination
• become acquainted with business uses of correlation analysis

5.3 Relating Correlation Analysis to Regression Analysis


Regression analysis is concerned with establishing the functional form of the relationship
between variables. Correlation analysis is a statistical procedure that is used to measure the
extent to which variables are related or associated. Two variables are highly correlated if they
move well together. Correlation analysis answers the following questions: Is there a linear
relationship between two variables? If yes, how strong is that relationship?

Given two variables X and Y, the correlation between X and Y is the same as the correlation
between Y and X, so in correlation analysis it would not matter to make a distinction between
an independent variable and a dependent variable as is the case in regression analysis. Unlike
in regression analysis where the independent variable X was assumed to be fixed and non
random, in correlation analysis we assume that both X and Y are random variables.

5.4 Scatter Diagrams


The strength of the linear relationship between X and Y can be inferred from a scatter plot of
the two variables using observed data. When there is little dispersion in the points from the
best straight line, the correlation is generally high.

When points follow closely a straight line sloping up to the right as shown in Figure 5.1 (a),
we have a high positive correlation between the two variables. If points follow loosely a
straight line sloping down to the right, we have low negative correlation between variables X
and Y as shown in Figure 5.1 (b).

[Figure: four scatter plots of Y against X: (a) High positive correlation, (b) Low negative
correlation, (c) Zero correlation, (d) Zero correlation]

Figure 5.1 Scatter Plots Showing Various Degrees of Correlation between X and Y

In Figure 5.1 (c) and (d) the relationship between X and Y is nonlinear giving zero
correlation between the variables.

Activity 5.1
Sketch scatter diagrams to show the correlation between two variables X and Y if the
degree of association is described as:
a) High negative correlation
b) Low positive correlation

5.5 Correlation Coefficient


The correlation coefficient denoted by r is a numerical measure of the strength of the linear
relationship between two variables X and Y. A correlation coefficient takes values between
−1 and +1 inclusive, that is \( -1 \le r \le +1 \).

Possible values of r are interpreted as follows:


• r = 0 indicates that there is no linear correlation between the variables. We should
take note that r = 0 essentially implies a lack of linearity and not a lack of association.
A strong quadratic relationship like the one shown in Figure 5.1(c) has r = 0 even
though the variables are associated in a nonlinear way.
• r = +1 indicates a perfect positive correlation between the variables.

• r = −1 indicates a perfect negative correlation between the variables
• r close to +1 indicates a strong positive correlation
• r close to -1 indicates a strong negative correlation
• r close to zero implies a weak correlation between the variables

The following adjectives may help you to describe the degree of linear association between
two variables:

Values of r Suitable adjectives


+0.9 to +1.0 Strong, positive
+0.6 to +0.89 Fair/moderate, positive
+0.3 to +0.59 Weak, positive
0.0 to +0.29 Negligible/scant positive
0.0 to -0.29 Negligible/scant negative
-0.3 to -0.59 Weak, negative
-0.6 to -0.89 Fair/moderate, negative
-0.9 to -1.0 Strong, negative
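
If you want to apply the table above consistently, the bands can be written as a small Python function. The name describe_r is an illustrative assumption. The sketch treats each band as starting at its lower limit (so that a value such as 0.895, which falls between two rows of the table, is labelled Fair/moderate), and it returns no direction when r is exactly zero.

```python
def describe_r(r):
    """Map a correlation coefficient to the adjectives in the table above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size >= 0.9:
        strength = "Strong"
    elif size >= 0.6:
        strength = "Fair/moderate"
    elif size >= 0.3:
        strength = "Weak"
    else:
        strength = "Negligible/scant"
    if r == 0:
        return strength                      # no direction when r is exactly 0
    direction = "positive" if r > 0 else "negative"
    return f"{strength}, {direction}"


print(describe_r(0.95))    # Strong, positive
print(describe_r(-0.45))   # Weak, negative
```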

There are two commonly used correlation coefficients which are:


a) Pearson’s product moment correlation coefficient (r)
b) Spearman’s rank correlation coefficient (\( r_s \))

5.5.1 Pearson’s product moment correlation coefficient


Pearson’s correlation coefficient is calculated for ratio data or interval data. It is based on the
mean and standard deviation and therefore can be affected by extreme values. Pearson’s
coefficient is obtained using the following computational formula:

\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} \] [5.1]

Example 5.1
The following data are a random sample of indexed prices of gold and platinum over a six
year period:

Gold (X) 12 10 14 11 12 9
Platinum (Y) 18 17 23 19 20 15

Calculate and interpret the correlation coefficient for the data.

Solution 5.1
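\( n = 6 \), \( \sum x = 68 \), \( \sum x^2 = 786 \), \( \sum y = 112 \), \( \sum y^2 = 2128 \), \( \sum xy = 1292 \)

\[ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}} = \frac{6(1292) - (68)(112)}{\sqrt{[6(786) - (68)^2][6(2128) - (112)^2]}} = \frac{136}{\sqrt{(92)(224)}} = 0.9474 \]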

A correlation coefficient of 0.9474 is close to +1 and it indicates a strong positive correlation
between the prices of gold and platinum. When platinum prices are going up, gold prices are
also expected to go up, other things being equal.
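
As a check on Solution 5.1, formula [5.1] can be applied directly to the raw data in Python. The function name pearson_r is an illustrative choice; the sketch computes the same totals that the two variable statistical mode of your calculator provides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's product moment correlation coefficient, formula [5.1]."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must be paired and contain at least 2 values")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


gold = [12, 10, 14, 11, 12, 9]           # Example 5.1
platinum = [18, 17, 23, 19, 20, 15]
r = pearson_r(gold, platinum)
print(round(r, 4))   # 0.9474
```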

Example 5.2
The following are the number of hours which a random sample of ten students studied for an
examination and the subsequent grades received by the students.

Hours Studied(X) 8 5 11 13 10 5 18 15 2 8
Grade (Y) 56 44 79 72 70 54 94 85 33 65

Calculate Pearson’s correlation coefficient, r and comment on the result.

Solution 5.2
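\( n = 10 \), \( \sum x = 95 \), \( \sum x^2 = 1121 \), \( \sum y = 652 \), \( \sum y^2 = 45\,688 \), \( \sum xy = 6\,996 \)

\[ r = \frac{10(6996) - 95(652)}{\sqrt{[10(1121) - (95)^2][10(45688) - (652)^2]}} = \frac{8020}{\sqrt{2185 \times 31776}} = 0.962496223 \approx 0.9625 \]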

There is a high positive correlation between hours studied and grade obtained.

Activity 5.2
A random sample of 10 upper six students obtained the following marks in mathematics and
physics in an end of year examination.

Mathematics (X) 76 62 70 59 52 53 53 56 57 56
Physics (Y) 80 68 73 63 65 68 65 63 65 66

Calculate the Pearson’s correlation coefficient for the data. Comment on the extent to which
performance in mathematics is associated with performance in physics.

5.5.2 Spearman’s rank correlation coefficient
The Spearman’s correlation coefficient denoted by \( r_s \) is calculated for ranked data. This is
data measured on the ordinal scale. The correlation coefficient \( r_s \) is interpreted in the same
way as the Pearson’s correlation coefficient r. The computational formula for \( r_s \) is given by
\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \] [5-2]
where \( d_i \) is the difference in ranks obtained by subtracting the rank of the y value from the
rank of the x value for each pair of observations.

Example 5.3
A panel of two judges ranked the performance of 5 drama groups (A, B, C, D and E) as
follows:

Drama Group A B C D E
Judge 1 4 5 1 3 2
Judge 2 5 4 2 3 1

Is there agreement in the manner in which the judges perceive the performance of the drama
groups?

Solution 5.3
Judge 1   Judge 2   d_i   d_i^2
4         5         -1    1
5         4          1    1
1         2         -1    1
3         3          0    0
2         1          1    1
                    \( \sum d_i^2 = 4 \)

\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(4)}{5(5^2 - 1)} = 1 - \frac{24}{120} = 0.8 \]

The correlation coefficient is fairly high and positive, showing that the judges largely agree in
the way they perceive the performance of the drama groups.

Example 5.4
An examiner and a moderator marked 7 examination scripts during a standardisation process
and awarded the following percentage scores:

Script Number 1 2 3 4 5 6 7
Examiner 67 54 38 70 42 70 80
Moderator 58 60 38 67 44 69 76

Calculate Spearman’s rank correlation coefficient and comment on the result obtained.

Solution 5.4
You begin by ranking the examiner scores assigning rank 1 to the highest score and rank 2 to
the second highest and so on. Where there are ties, the tied scores are each assigned the
average of the ranks they would have had assuming there were no ties. The moderator marks
are ranked separately in a similar manner.

Script number   Examiner (X)   Moderator (Y)   Rank X   Rank Y   d_i    d_i^2
1               67             58              4        5        -1     1
2               54             60              5        4         1     1
3               38             38              7        7         0     0
4               70             67              2.5      3        -0.5   0.25
5               42             44              6        6         0     0
6               70             69              2.5      2         0.5   0.25
7               80             76              1        1         0     0
                                                                 \( \sum d_i^2 = 2.5 \)

\[ r_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6(2.5)}{7(7^2 - 1)} = 1 - \frac{15}{336} = 0.955357142 \approx 0.9554 \]

The correlation coefficient is high and positive indicating that the examiner and the
moderator have the same perception of the scripts.
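
The ranking rule described in Solution 5.4 (rank 1 for the highest score, tied scores sharing the average of the ranks they would have occupied) and formula [5-2] can be sketched in Python as shown below. The function names average_ranks and spearman_rs are illustrative assumptions.

```python
def average_ranks(values):
    """Rank values with 1 for the highest; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for value in values:
        first = ordered.index(value) + 1      # best rank occupied by the tied group
        count = ordered.count(value)          # number of values in the tied group
        ranks.append(first + (count - 1) / 2) # average of ranks first..first+count-1
    return ranks


def spearman_rs(xs, ys):
    """Spearman's rank correlation coefficient, formula [5-2]."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


examiner = [67, 54, 38, 70, 42, 70, 80]   # Example 5.4
moderator = [58, 60, 38, 67, 44, 69, 76]
print(average_ranks(examiner))            # [4.0, 5.0, 7.0, 2.5, 6.0, 2.5, 1.0]
rs = spearman_rs(examiner, moderator)
print(round(rs, 4))                       # 0.9554
```

When the data are already ranks, as in Example 5.3, the function can still be used: re-ranking reverses both sets of ranks in the same way, so the squared differences, and therefore the coefficient, are unchanged.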

Activity 5.3
1. The following are the number of hours which a random sample of ten students
studied for an examination and the grades the students received.

Hours Studied (X) 8 5 11 13 10 5 18 15 2 8


Grade (Y) 56 44 79 72 70 54 94 85 33 65

Calculate Spearman’s rank correlation coefficient, rs and comment on the result.

2. Two judges ranked the quality of annual reports published by 6 listed companies as
follows:

Report Number Judge A Judge B


1 4 5
2 3 2
3 1 3
4 2 1
5 6 6
6 5 4

Calculate the Spearman’s rank correlation coefficient and comment on the result
obtained.

5.6 Coefficient of Simple Determination


The coefficient of determination which is denoted by \( r^2 \) is obtained by squaring Pearson’s
correlation coefficient. The result is usually expressed as a percentage.

The coefficient of determination has a dual purpose. Firstly, it measures the strength of the
linear relationship between the independent variable X and dependent variable Y. It is a
descriptive measure of the strength of the regression relationship between X and Y. It thus
gives us a measure of how well the estimated regression equation fits the data. The higher
\( r^2 \) is, the better the fit and the higher our confidence in the regression.

Secondly, \( r^2 \) gives the proportion of variability in the dependent variable that is explained by
changes in the independent variable. You will remember that Pearson’s correlation
coefficient is based on the standard deviation, which is a measure of variability. It is befitting
that, since the coefficient of determination is a measure of variability, it should be obtained
by squaring Pearson’s and not Spearman’s correlation coefficient.

Example 5.5
Find the coefficient of determination for the data in Example 5.2

Solution 5.5
\[ r^2 = (0.962496223)^2 = 0.92639898 \]

Thus, about 92.64% of the variation in grades is explained by variation in the hours studied.
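
The statement that \( r^2 \) is the proportion of variation in Y explained by X can be illustrated numerically. The Python sketch below fits the least squares line to the data of Example 5.2 and compares the unexplained variation with the total variation. The deviation form used for a and b is algebraically equivalent to [4-3] and [4-4], and the variable names (total_variation, unexplained_variation) are illustrative choices, not notation from this module.

```python
hours = [8, 5, 11, 13, 10, 5, 18, 15, 2, 8]             # Example 5.2
grades = [56, 44, 79, 72, 70, 54, 94, 85, 33, 65]

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(grades) / n

# Least squares line in deviation form (equivalent to [4-3] and [4-4])
s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, grades))
s_xx = sum((x - mean_x) ** 2 for x in hours)
b = s_xy / s_xx
a = mean_y - b * mean_x

# Total variation in Y, and the part left unexplained by the fitted line
total_variation = sum((y - mean_y) ** 2 for y in grades)
unexplained_variation = sum((y - (a + b * x)) ** 2 for x, y in zip(hours, grades))

r_squared = 1 - unexplained_variation / total_variation
print(round(r_squared, 4))   # 0.9264
```

The result agrees with the value obtained in Solution 5.5 by squaring r.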

5.7 Testing whether X and Y are Correlated
The sample correlation coefficient (r) is used as a point estimator of the population
correlation coefficient \( \rho \) (rho). Therefore, r can be used in testing hypotheses about the
true correlation coefficient \( \rho \). To facilitate the hypothesis testing, we assume both X and Y
are normally distributed.

The testing procedure is outlined below:


1. Hypotheses: \( H_0: \rho = 0 \)
               \( H_1: \rho \neq 0 \)

2. Distribution: We use the t-distribution.

3. Critical Value: This is a two-tailed test. Given the level of significance \( \alpha \), the
critical values are given by \( \pm t_{\alpha/2}(n - 2) \).

4. Test Statistic: The test statistic is given by:
\[ T_{cal} = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} \] [5.3]
where r is the correlation coefficient and n is the sample size.

5. Decision Criteria: We would reject \( H_0 \) if \( |T_{cal}| > t_{\alpha/2}(n - 2) \).

6. Conclusion: When \( H_0 \) is rejected, we conclude that X and Y are correlated.

Example 5.6
In Example 5.2, test at 5% level of significance whether hours studied and grade received are
correlated.

Solution 5.6
1. \( H_0: \rho = 0 \)
   \( H_1: \rho \neq 0 \)

2. We use the t-distribution

3. The critical value is given by \( t_{0.05/2}(10 - 2) = t_{0.025}(8) = 2.31 \)

4. \[ T_{cal} = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} = \frac{0.962496223}{\sqrt{(1 - 0.92639898)/8}} = \frac{0.962496223}{0.095917295} = 10.03464727 \approx 10.03 \]

5. Since \( T_{cal} = 10.03 > 2.31 \), we reject \( H_0 \)

6. We conclude that hours studied and grades received are correlated.
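
The test in Solution 5.6 can be sketched in Python as follows. The function name correlation_t_statistic is an illustrative assumption. To keep the sketch self-contained, the critical value 2.31 is entered by hand from the t tables, exactly as in the solution, rather than being computed.

```python
from math import sqrt


def correlation_t_statistic(r, n):
    """Test statistic [5.3] for H0: rho = 0, with n - 2 degrees of freedom."""
    if n <= 2:
        raise ValueError("at least 3 pairs of observations are needed")
    if abs(r) >= 1:
        raise ValueError("the statistic is undefined when r is exactly -1 or +1")
    return r / sqrt((1 - r ** 2) / (n - 2))


r, n = 0.962496223, 10          # Example 5.2
critical_value = 2.31           # t(0.025) with 8 degrees of freedom, from tables
t_cal = correlation_t_statistic(r, n)
reject_h0 = abs(t_cal) > critical_value   # two-tailed decision rule
print(round(t_cal, 2), reject_h0)         # 10.03 True
```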

Activity 5.4
1. In Activity 5.2
a) Calculate the coefficient of determination
b) Test at 5% level of significance, whether marks obtained by students in
mathematics are correlated to the marks they obtained in physics.

2. A labour analyst interested in the relationship between turnover and labour supply
collected the following data on annual turnover and number of employees in 12
major retail organisations:

Turnover 20.1 14.0 10.7 10.6 8.6 8.1 5.5 4.9 4.6 4.5 4.3 4.1
($0000s)
Employees 126 141 107 101 92 70 52 34 57 32 47 26

a) Find the coefficient of determination and use it to advise a prospective investor
in the retail sector.
b) Test at 5% level of significance, whether turnover and number of employees
are correlated.

5.8 Summary
In this unit, we saw how regression analysis and correlation analysis complement each other.
Correlation analysis seeks to establish the degree of linear relationship between two
variables. The correlation coefficient is a measure of the linear association between variables.
Its value ranges from -1 to +1. We looked at two methods of calculating the correlation
coefficient by Pearson and Spearman. Spearman’s correlation coefficient is used with ranked
data while Pearson’s correlation coefficient is calculated for interval or ratio data. The
coefficient of determination \( r^2 \) is obtained by squaring Pearson’s correlation coefficient.
\( r^2 \) is a ratio of the amount of variance that can be explained by the relationship between
the variables to the total variance in the data.

References
Aczel, A.D. and Sounderpandian, J. (2005). Complete Business Statistics. India: Tata
McGraw-Hill.
Buglear, J. (2005). Quantitative Methods for Business. London: Elsevier Butterworth
Heinemann.
Kazmier, L.J. (2003). Schaum’s Easy Outline: Business Statistics. Blacklick: McGraw-Hill
Trade.
Kemp, S.M. and Kemp, S. (2004). Business Statistics Demystified. Blacklick: McGraw-Hill
Professional Publishing.
Muchengetwa, S. (2005). Business Statistics. Harare: Zimbabwe Open University.
Wegner, T. (1999). Applied Business Statistics. Cape Town: Juta and Co.

Unit 6
Introduction to Time Series Analysis
6.1 Introduction
Time series analysis is a statistical technique of detecting patterns in a time series. A time
series is a set of measurements of a variable which are taken at regular time intervals over a
period of time. Many business variables have observations made on them at regular time
intervals. Examples of time series data are daily sales, monthly payroll, annual exports,
annual profits and so on.

Time series data are important in that they help business managers to review past
performance and they provide a basis for predicting future values of the time series.

6.2 Unit Objectives


By the end of this unit, you should be able to:
• define a time series
• identify and describe the components of a time series
• draw time series charts
• carry out a trend analysis in time series data using the least squares method and the
moving average method
• deseasonalise data using the Ratio-to-Moving Average method
• predict future values of a time series

6.3 Components of a Time Series


A time series is best analysed by decomposing it into different components. The components
of a time series are:
• Trend component (T)
• Seasonal component (S)
• Cyclical component (C)
• Irregular Component (I)

6.3.1 Trend component


The trend component is an underlying longer-term movement in the series showing a steady
tendency of increase or decrease through time as illustrated in Figure 6.1.

[Figure: Yt plotted against time in years, with the trend line drawn through the series]

Figure 6.1 Long Term Trend

The trend shows the overall movement in the series.

6.3.2 Seasonal component


The seasonal component is a short-term recurrent component, which may be daily, weekly,
monthly as well as seasonal. The type of ‘seasonal’ component thus depends on how
regularly the data are collected. However, seasonal variation is usually a feature of data
collected quarterly. The variation follows a complete cycle throughout a whole year, with the
same general pattern repeating itself year after year as illustrated by Figure 6.2.

[Figure: Yt plotted over Year 1, Year 2 and Year 3]

Figure 6.2 Seasonal Component

Examples of time series variables (Yt) that display seasonal variation are:
• Sales of seasonal items such as blankets/jerseys, school uniforms, umbrellas, fruits
• Credit card spending which is generally high towards and during the festive season
• Electricity consumption which varies depending on time of the day

6.3.3 Cyclical component


The cyclical component is a long-term recurrent component that repeats over several years.
Cyclical movements differ in intensity and also vary in lengths usually lasting from 2 to 10
years. In business, cyclical behaviour is often referred to as the business cycle characterised
by troughs and peaks of business activity. Figure 6.3 is an illustration of the cyclical
component.

[Figure: Yt plotted against time, with the horizontal axis marked 0, 5, 10 and 15]

Figure 6.3 Cyclical Variation

6.3.4 Irregular component


The irregular component accounts for variation which is of a random nature and is not part of
either the trend or the recurrent components. It does not contain any obviously predictable
pattern. The variation is due to sporadic forces such as natural disasters (floods, drought,
cyclones) or man-made disasters such as civil wars, strikes, boycotts.

Figure 6.4 is an illustration of the irregular component.

[Figure: Yt plotted over Year 1, Year 2 and Year 3]

Figure 6.4 Irregular Trend

6.4 Time Series Models


The relationship between the components of a time series can be described by two models of
a time series which are the Additive Model and the Multiplicative Model.

6.4.1 Additive Model


The model assumes that the components are added together with each observation \( Y_t \) being
the sum of a set of components:

\[ Y_t = T_t + S_t + C_t + I_t \] [6.1]

The model is appropriate for series that have regular and constant fluctuations around a trend.
To decompose an additive time series, you subtract the estimated components from the
observed series.

6.4.2 Multiplicative Model
The model assumes that the observed time series values are a product of the four components,
when all exist. The model is given by:

\[ Y_t = T_t \times S_t \times C_t \times I_t \] [6.2]

This model is more commonly used than the additive model because it is found to describe
more appropriately time series in a wide range of applications. It is more appropriate for
series that have regular but not constant fluctuations around a trend. To decompose a time
series which is assumed to be multiplicative, we divide the observed series by the estimated
components.

6.5 Isolating the Trend Component


It is important to isolate the trend so that we have an idea of the general direction taken by the
series. In this section, we will discuss two methods of trend analysis which are the least
squares method, and the moving average method.

6.5.1 Least squares method


A time series chart of the observations against time may show that a straight line best
describes the increase or decrease in the series. In such cases you will use the least squares
technique borrowed from simple linear regression to estimate the trend equation.

The trend equation is estimated by

\[ \hat{Y}_t = a + bX_t \] [6.3]

where:
\( \hat{Y}_t \) is the estimated value of the dependent variable
\( X_t \) is the independent variable which is time numbered sequentially from 1
a is the intercept, and
b is the slope of the trend line

The computational formulae for a and b are given by

\[ b = \frac{n\sum X_t Y_t - \sum X_t \sum Y_t}{n\sum X_t^2 - (\sum X_t)^2} \] [6.4]

\[ a = \frac{\sum Y_t - b\sum X_t}{n} \] [6.5]

Example 6.1
The annual maize production (in metric tonnes) at Bere farm for the past ten years is shown below:

Year                         2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
Production (metric tonnes)   74    85    87    92    110   115   130   136   142   150

a) Fit a trend line to the data.
b) Forecast production for the year 2012.

Solution 6.1
a) The years are coded sequentially by assigning 2002 = 1, 2003 = 2, 2004 = 3 and so on.
The following results are obtained:
\( \sum X_t = 55 \), \( \sum X_t^2 = 385 \), \( \sum X_t Y_t = 6889 \), \( \sum Y_t = 1121 \), \( \sum Y_t^2 = 132119 \)

\[ b = \frac{n\sum X_t Y_t - \sum X_t \sum Y_t}{n\sum X_t^2 - (\sum X_t)^2} = \frac{10(6889) - 55(1121)}{10(385) - (55)^2} = \frac{7235}{825} = 8.76969697 \]

\[ a = \frac{1121 - 8.76969697(55)}{10} = 63.86666667 \]

The trend line is given by
\[ \hat{Y}_t = 63.86666667 + 8.76969697X_t \]

b) For 2012, \( X_t = 11 \) so that the maize production in 2012 is forecast to be
\[ \hat{Y}_t = 63.86666667 + 8.76969697(11) = 160.3333333 \approx 160 \text{ metric tonnes} \]
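
The coding of the years and the fitting of the trend line in Solution 6.1 can be reproduced with a few lines of Python. This is a minimal sketch of formulae [6.4] and [6.5]; the variable names are illustrative choices.

```python
production = [74, 85, 87, 92, 110, 115, 130, 136, 142, 150]   # 2002 to 2011
periods = list(range(1, len(production) + 1))                  # 2002 = 1, 2003 = 2, ...

n = len(periods)
sum_x = sum(periods)
sum_y = sum(production)
sum_x2 = sum(x * x for x in periods)
sum_xy = sum(x * y for x, y in zip(periods, production))

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)   # slope, [6.4]
a = (sum_y - b * sum_x) / n                                    # intercept, [6.5]

forecast_2012 = a + b * (n + 1)                                # 2012 is period 11
print(f"trend: Y = {a:.4f} + {b:.4f} X; 2012 forecast = {forecast_2012:.1f}")
```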

Activity 6.1
The following data give the quarterly sales figures for a retail outlet for the period
2002 to 2004.

Year Quarter 1 Quarter 2 Quarter 3 Quarter 4


2002 10 12 13 11
2003 12 15 16 13
2004 14 16

a) Estimate the trend line using the Least Squares method


b) Forecast the sales for the last quarter of year 2004.

6.5.2 Moving average method
The other method of isolating the trend is the moving average method. A moving average
(MA) of a time series is an average of a fixed number of observations that moves as we
progress down the series. The moving average smooths out the peaks and valleys in the
original series to leave a relatively smooth trend. The moving averages are therefore
estimates of the trend at different stages of the series.

Each moving average is centred at the middle of the observations from which it has been
calculated. The term of the moving average is meant to coincide with the periodicity of
the original series. For example, a four-point moving average will be appropriate for
quarterly data.

Example 6.2
The daily sales of an airtime vendor over 12 days are recorded below:
37 24 62 80 77 95 94 133 148 155 128 161
Calculate a 3-point moving average of the sales.

Solution 6.2
A 3-point moving average requires you to find the average of three consecutive observations
at a time.
The first MA = (37 + 24 +62)/3 = 41.00
The second MA = (24 + 62 + 80)/3 = 55.33
The third MA = (62 + 80 + 77)/3 = 73.00 and so on.

Day    Sales    3-point MA
1      37
2      24       41.00
3      62       55.33
4      80       73.00
5      77       84.00
6      95       88.67
7      94       107.33
8      133      125.00
9      148      145.33
10     155      143.67
11     128      148.00
12     161
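
The sliding calculation in Solution 6.2 can be sketched in Python as follows. The function name moving_average is an illustrative assumption; it returns one average for every run of k consecutive observations.

```python
def moving_average(series, k):
    """Return the k-point moving averages of consecutive observations."""
    if not 1 <= k <= len(series):
        raise ValueError("k must be between 1 and the length of the series")
    return [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]


sales = [37, 24, 62, 80, 77, 95, 94, 133, 148, 155, 128, 161]   # Example 6.2
ma3 = moving_average(sales, 3)
print([round(m, 2) for m in ma3])
# [41.0, 55.33, 73.0, 84.0, 88.67, 107.33, 125.0, 145.33, 143.67, 148.0]
```

Twelve observations give ten 3-point averages, the first belonging to day 2 and the last to day 11.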

Note that each moving average is centred at the middle of the observations used to calculate
it, so that we lose two values, one at the start and the other at the end of the series. Centring
is problematic when the number of terms is even because the moving averages are then ‘out
of phase’ with the time series observations: each one falls midway between two periods. To
centre the MA series, a further 2-point MA is found by averaging every consecutive pair as
illustrated in Example 6.3.

Example 6.3
The data below shows the sales ($000s) of a seasonal good at a retail outlet over three years.

Year Q1 Q2 Q3 Q4
1 14 32 33 6
2 16 35 36 7
3 15 38 41 8

a) Calculate four-point moving averages for the series


b) Plot the four-point MA series on the same graph as the original series.

Solution 6.3
a) Table 6.1 A 4-Point Centred MA of Sales

Year  Quarter  Sales (Yt)  Uncentred 4-point MA  Centred 4-point MA (T)
1     1        14
      2        32
                           21.25
      3        33                                21.500
                           21.75
      4        6                                 22.125
                           22.50
2     1        16                                22.875
                           23.25
      2        35                                23.375
                           23.50
      3        36                                23.375
                           23.25
      4        7                                 23.625
                           24.00
3     1        15                                24.625
                           25.25
      2        38                                25.375
                           25.50
      3        41
      4        8

b) [Figure: sales (vertical axis, 0 to 40) plotted against quarters 1 to 4 of Year 1, Year 2
and Year 3, showing the original series and the moving average series]

Figure 6.5 Original Series and Moving Average Series Showing Trends in Sales

The moving averages remove the fluctuations in the time series and make the curve smooth
as shown in Figure 6.5. The smoothed curve shows a moderate, upward trend in sales during
the three year period.
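
The two-stage calculation behind Table 6.1 (an uncentred 4-point MA followed by a 2-point MA of consecutive pairs) can be sketched in Python. The function name centred_moving_average is an illustrative assumption.

```python
def centred_moving_average(series, k=4):
    """Centred MA for an even k: average consecutive pairs of the k-point MA."""
    uncentred = [sum(series[i:i + k]) / k for i in range(len(series) - k + 1)]
    return [(first + second) / 2 for first, second in zip(uncentred, uncentred[1:])]


sales = [14, 32, 33, 6, 16, 35, 36, 7, 15, 38, 41, 8]   # Example 6.3, Year 1 Q1 first
trend = centred_moving_average(sales)
print(trend)
# [21.5, 22.125, 22.875, 23.375, 23.375, 23.625, 24.625, 25.375]
```

The first centred value belongs to Quarter 3 of Year 1 and the last to Quarter 2 of Year 3, so two quarters are lost at each end of the series.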

Activity 6.2
A supplier of school stationery recorded its quarterly sales figures ($00s) for the years
2009 to 2012. The data are shown in the table below.

Year Q1 Q2 Q3 Q4
2009 48 52 16 35
2010 50 46 22 40
2011 68 34 26 35
2012 73 56 16 45

a) Draw a time series chart of the data and comment on the trend and seasonal
components
b) Calculate centred 4-point moving averages for the data.
c) Plot the four-point MA series on the same graph as the original series.

6.6 Isolating the Seasonal Component


If we have a seasonal time series, finding a moving average series for it will have the effect
of smoothing out the seasonality. Assuming a multiplicative model, we divide each
observation by the corresponding value of the moving-average series to isolate the seasonal
and irregular components. This procedure is known as the ratio-to-moving average method.

The stages that are followed in the ratio-to-moving average procedure for quarterly data are:
1. Calculate a centred 4-point moving average series

2. Find seasonal ratios by dividing each actual time series observation, \( Y_t \), by its
corresponding moving average value:

\[ \text{Seasonal ratio} = \frac{Y_t}{MA} = \frac{T_t \times C_t \times S_t \times I_t}{T_t \times C_t} = S_t \times I_t \] [6.6]

3. Find the average seasonal ratio for each quarter. The average could be the mean or
median but in most cases the median is used since it is not affected by outliers.

4. Add up the average seasonal ratios. They should add up to 4. If they don’t add up to 4
adjust each average by adding to it one-fourth of the difference between their sum and
4. The results are adjusted seasonal ratios/indexes.

Example 6.4
Calculate adjusted seasonal indexes for the data of Example 6.3.

Solution 6.4
The necessary calculations are presented in the form of a table as illustrated in Table 6.2.

Table 6.2 Calculation of Seasonal Indexes
Year  Quarter  Sales (Yt)  Uncentred 4-point MA  Centred 4-point MA (T)  Seasonal Ratio (Yt/T)
1     1        14
      2        32
                           21.25
      3        33                                21.500                  1.535
                           21.75
      4        6                                 22.125                  0.271
                           22.50
2     1        16                                22.875                  0.699
                           23.25
      2        35                                23.375                  1.497
                           23.50
      3        36                                23.375                  1.540
                           23.25
      4        7                                 23.625                  0.296
                           24.00
3     1        15                                24.625                  0.609
                           25.25
      2        38                                25.375                  1.498
                           25.50
      3        41
      4        8

After obtaining the seasonal ratios, you then find the mean seasonal index for each quarter:

Year   Q1      Q2      Q3      Q4
1                      1.535   0.271
2      0.699   1.497   1.540   0.296
3      0.609   1.498
Mean   0.6540  1.4975  1.5375  0.2835

Sum of means = 0.6540 + 1.4975 + 1.5375 + 0.2835 = 3.9725


To ensure that the sum of means is 4, we add to each mean one-fourth of the difference
between 3.9725 and 4, that is, 0.0275 ÷ 4 = 0.006875:

Quarter  Adjustment          Adjusted Seasonal Index
1        0.6540 + 0.006875   0.660875
2        1.4975 + 0.006875   1.504375
3        1.5375 + 0.006875   1.544375
4        0.2835 + 0.006875   0.290375
Total                        4.000000
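
The ratio-to-moving-average steps can be sketched in Python as shown below. This is an illustrative sketch only: the variable names are assumptions made for the example, the trend values are typed in from Table 6.2, and the ratios are left unrounded, so the results agree with the table to about three decimal places rather than exactly.

```python
# Quarterly sales from Example 6.3 and the centred 4-point MA from Table 6.2.
sales = [14, 32, 33, 6, 16, 35, 36, 7, 15, 38, 41, 8]
trend = [None, None, 21.5, 22.125, 22.875, 23.375,
         23.375, 23.625, 24.625, 25.375, None, None]

# Step 2: seasonal ratio Yt / T, grouped by quarter (1 to 4).
ratios_by_quarter = {1: [], 2: [], 3: [], 4: []}
for position, (y, t) in enumerate(zip(sales, trend)):
    if t is not None:
        quarter = position % 4 + 1
        ratios_by_quarter[quarter].append(y / t)

# Step 3: average ratio for each quarter (the mean is used here).
means = {q: sum(r) / len(r) for q, r in ratios_by_quarter.items()}

# Step 4: add one-fourth of (4 - sum of means) to each mean.
adjustment = (4 - sum(means.values())) / 4
adjusted = {q: m + adjustment for q, m in means.items()}

for q in sorted(adjusted):
    print(q, round(means[q], 4), round(adjusted[q], 4))
```

Replacing the mean in step 3 with statistics.median would give the median-based version mentioned in the procedure; with only two ratios per quarter the two averages coincide.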

Activity 6.3
Calculate adjusted seasonal indexes for the data of Activity 6.2.

6.6.1 Deseasonalising of data
Deseasonalising refers to removing the effects of seasonal influence from the data. This is
achieved by dividing the actual Y value for each period by its corresponding adjusted
seasonal index.

$$\text{Deseasonalised } Y = \frac{\text{Actual } Y}{\text{Adjusted seasonal index } S}$$ [6.7]

Example 6.5
Using the data of Example 6.3, obtain the deseasonalised series.

Solution 6.5
Table 6.3 Calculation of Deseasonalised Sales Values
Year  Quarter  Sales (Yt)  Adjusted Seasonal Index (S)  Deseasonalised Sales (Yt/S)
1     1        14
      2        32
      3        33          1.544375                     21.368
      4        6           0.290375                     20.663
2     1        16          0.660875                     24.210
      2        35          1.504375                     23.265
      3        36          1.544375                     23.310
      4        7           0.290375                     24.107
3     1        15          0.660875                     22.697
      2        38          1.504375                     25.260
      3        41
      4        8
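
Formula [6.7] can be applied in a few lines of Python. The sketch below is only an illustration; the names adjusted_index and deseasonalised are assumptions made for the example. Because an adjusted seasonal index exists for every quarter, the sketch divides all twelve observations, including the first two and last two quarters.

```python
# Quarterly sales from Example 6.3 and the adjusted seasonal indexes
# obtained in Solution 6.4, keyed by quarter number.
sales = [14, 32, 33, 6, 16, 35, 36, 7, 15, 38, 41, 8]
adjusted_index = {1: 0.660875, 2: 1.504375, 3: 1.544375, 4: 0.290375}

# Deseasonalised Y = Actual Y / Adjusted seasonal index S
deseasonalised = [
    y / adjusted_index[position % 4 + 1]
    for position, y in enumerate(sales)
]

for position, value in enumerate(deseasonalised):
    year, quarter = position // 4 + 1, position % 4 + 1
    print(year, quarter, round(value, 3))
```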

Activity 6.4
Using the data of Activity 6.2, obtain the deseasonalised series.

6.6.2 Predicted values of the series


Once we have the adjusted seasonal indexes, we multiply them by the trend estimates to get
the predicted series values.

For the data in Example 6.3, the predicted sales are found by multiplying the trend estimate
by the corresponding adjusted seasonal index as shown in Table 6.4.

Table 6.4 Calculation of Predicted Sales
Year  Quarter  Sales (Yt)  Trend Estimate (T)  Adjusted Seasonal Index (S)  Predicted Sales (T x S)
1     1        14
      2        32
      3        33          21.500              1.544375                     33.204
      4        6           22.125              0.290375                     6.425
2     1        16          22.875              0.660875                     15.118
      2        35          23.375              1.504375                     35.165
      3        36          23.375              1.544375                     36.100
      4        7           23.625              0.290375                     6.860
3     1        15          24.625              0.660875                     16.274
      2        38          25.375              1.504375                     38.174
      3        41
      4        8
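
The multiplication of trend estimates by adjusted seasonal indexes can be checked with a short Python sketch. The variable names are assumptions made for the example, and the trend values are typed in from the centred moving average series.

```python
# Centred 4-point MA (trend estimate) for Year 1 Q3 to Year 3 Q2,
# stored as (year, quarter, trend) tuples.
trend_estimates = [
    (1, 3, 21.5), (1, 4, 22.125), (2, 1, 22.875), (2, 2, 23.375),
    (2, 3, 23.375), (2, 4, 23.625), (3, 1, 24.625), (3, 2, 25.375),
]
adjusted_index = {1: 0.660875, 2: 1.504375, 3: 1.544375, 4: 0.290375}

# Predicted value = trend estimate T x adjusted seasonal index S
predicted = {
    (year, quarter): t * adjusted_index[quarter]
    for year, quarter, t in trend_estimates
}

for (year, quarter), value in predicted.items():
    print(year, quarter, round(value, 3))
```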

Activity 6.5
1. Find the predicted sales for the data of Activity 6.2.

2. A local church organisation recorded the following quarterly amounts (in 000s) of
tithes paid by its members for the period 2010 to 2012.

Year Quarter 1 Quarter 2 Quarter 3 Quarter 4


2010 100 120 132 110
2011 125 156 168 130
2012 141 164 180 200

a) Draw a time series plot of the data and comment on the trend shown.
b) Obtain a centred 4-point MA of the series and use it to calculate adjusted seasonal
indexes for the data.
c) Find the deseasonalised series of the data.
d) Forecast the quarterly amounts of tithes for the year 2013.

6.7 Summary
In this unit we discussed the four components of a time series, namely the trend, seasonal,
cyclical and irregular components. The aim of time series analysis is to decompose a time
series into these components using either an additive model or a multiplicative model.
We looked at two methods of isolating the trend: the Least Squares Method and the
Moving Averages Method. We saw how a moving average smooths data to reveal trends in
the data.

We also looked at how to isolate the seasonal component using the Ratio-to-Moving-Average
method. Finally, you learnt how to obtain predicted series values using seasonal indices.


Unit 7
Index Numbers
7.1 Introduction
An index number is a number that measures the relative change in a set of measurements over
time. Index numbers show changes over time by expressing the new value, Vn, as a
percentage of some existing value, V0, called the base value.

$$\text{Index number} = \frac{V_n}{V_0} \times 100$$ [7.1]
The base value is the value of the variable at some reference point in the past called the base
period. The index number of the base period is assumed to be 100.

In this unit, you will construct both unweighted (simple) and weighted price and quantity
indices.

7.2 Unit Objectives


By the end of the unit, you should be able to:
• define index numbers
• calculate simple index numbers
• change the base from one period to another
• calculate weighted index numbers
• compare base weighting and current weighting
• state the purpose of index numbers
• use the CPI to adjust for inflation
• describe the challenges that are encountered in the construction of index numbers

7.3 Types of Index Numbers


There are three major categories of index numbers which are:
1. Price indices,
2. Quantity indices, and
3. Value indices.

7.3.1 Price indices


Price indices measure changes in price over time. Some examples of price indices are:
• Consumer Price Index (CPI) which measures the overall price change, from month to
month, of a representative selection of goods and services that are relevant to a typical
household. The CPI is used to calculate the rate of inflation.
• Producer Price Index (PPI) which measures the average change over time in selling
prices received by domestic producers for their output.

7.3.2 Quantity indices
Quantity indices measure changes in the quantity of a commodity produced or consumed over time.
Some examples of quantity indices are:
• Industrial index which gives a measure of change in industrial output now compared
to a past reference point
• Mining index which gives a measure of change in minerals production now compared
to a specified base period

7.3.3 Value indices


Value indices measure changes in total monetary worth of say exports (export index) or
imports (import index) of an economy between two time periods.

7.4 Simple Index Numbers


The word ‘simple’ implies the measurements are for a single variable. A simple index
number is the ratio of two values of a variable, expressed as a percentage. The most
commonly referred to simple indices are: the Simple Price Index and the Simple Quantity
Index.

7.4.1 Simple price index


The Simple Price Index (SPI), sometimes known as a price relative, measures changes in the
price of a commodity. It shows the effect of a price change on a single product. The current
price is expressed as a percentage of the price at base period, that is:

$$\text{SPI} = \frac{P_n}{P_0} \times 100$$ [7.2]

Example 7.1
During a Christmas clearance sale, a bottle of gin that sold for $12 before the sale was now
selling for $8. Calculate the Simple Price Index.

Solution 7.1
$$\text{SPI} = \frac{P_n}{P_0} \times 100 = \frac{8}{12} \times 100 = 66.67\%$$

The price of the gin was reduced by 33.33%.

Activity 7.1
1. Suppose the price of a 2 litre bottle of cooking oil was $3.00 in 2010 and in 2012
the price was $3.50. Calculate the simple price index using 2010 as the base year.

2. The average retail prices of a bar of soap for the years 2010 to 2012 are as follows:

Year Price ($)


2010 1.00
2011 1.50
2012 2.10

Determine a Simple Price Index for 2011 and 2012 using 2010 as the base.

7.4.2 Simple quantity index


The Simple Quantity Index, sometimes called the quantity relative, is used to show changes
in the quantity sold or produced for a single product. The Simple Quantity Index (SQI) is
calculated as follows:

$$\text{SQI} = \frac{Q_n}{Q_0} \times 100$$ [7.3]

Example 7.2
The annual production of maize in Zimbabwe for the years 1997 and 2000 was 5 000 metric
tonnes and 400 metric tonnes respectively. Using 1997 as the base year, determine the change
in maize production.

Solution 7.2
$$\text{SQI} = \frac{Q_n}{Q_0} \times 100 = \frac{400}{5000} \times 100 = 8\%$$

The annual production of maize fell by 92% between 1997 and the year 2000.
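
Since the simple price index and the simple quantity index share the same form, one small function covers both. The Python sketch below is an illustration only; the function name simple_index is an assumption made for the example.

```python
def simple_index(current_value, base_value):
    """Current value expressed as a percentage of the base value."""
    return current_value / base_value * 100


# Example 7.1: gin priced at $12 before the sale (base) and $8 during it.
gin_spi = simple_index(8, 12)

# Example 7.2: maize output of 5 000 tonnes in 1997 (base) and 400 in 2000.
maize_sqi = simple_index(400, 5000)

print(round(gin_spi, 2), round(maize_sqi, 2))
# The percentage change is the index minus 100 (negative means a fall).
print(round(gin_spi - 100, 2), round(maize_sqi - 100, 2))
```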

Activity 7.2
The following data give the prices and quantities of two commodities from 2010 to
2011:

Product 2010 2011


Price Quantity Price Quantity
A 35 5 000 40 2 500
B 16 600 20 450

Using 2010 as the base period, calculate the


a) Simple Price Index for commodity A.
b) Simple Quantity Index for Commodity B.

7.4.3 Index number series trends
A series of index numbers for a given period reflects trends in the output or price of a
commodity, that is, a time series of index numbers shows whether there has been an
increase or decrease in the output or price of the commodity.

Consider the index of maize production for the period 1995 to 2006 below.

Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Index 84 96 100 104 112 86 82 79 65 36 38 30

The index numbers show a steady increase in maize production from 1995 to 1999, followed
by a sharp drop in 2000 and a general decline thereafter to 2006. The year 1997 is the base
year because it has index number 100. Production in the other years is compared in
percentage terms with the production in 1997. For example, compared to the maize
production in 1997, the production in 1999 was 12% higher, while by 2006 the production
had declined by 70%.

Example 7.3
The following figures represent the average annual cost (in dollars per square metre) of low
density residential stands in Mutare for the years 2004 to 2012.

Year 2004 2005 2006 2007 2008 2009 2010 2011 2012
Price 16 20 24 24 29 30 35 36 40

Construct simple index numbers for the prices using 2005 as the base period (2005 = 100).
Comment on the trend shown.

Solution 7.3
The year 2005 is the reference point, and the index number for 2005 is taken to be 100.

$$\text{The index for 2004} = \frac{P_{2004}}{P_{2005}} \times 100 = \frac{16}{20} \times 100 = 80\%$$

$$\text{The index for 2008} = \frac{P_{2008}}{P_{2005}} \times 100 = \frac{29}{20} \times 100 = 145\%$$

The index numbers of the remaining years are calculated in similar fashion. The results are
summarised in Table 7.1

Table 7.1 Price Index for Residential Stands
Year 2004 2005 2006 2007 2008 2009 2010 2011 2012
Price 16 20 24 24 29 30 35 36 40
Index 80 100 120 120 145 150 175 180 200

The price of residential stands rose over the period 2004 to 2012 (remaining unchanged
only between 2006 and 2007), with the price in 2012 being double what it was in 2005.
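
The whole index series of Table 7.1 can be generated in one step. The Python sketch below is an illustration only; the variable names are assumptions made for the example.

```python
# Average annual price of residential stands from Example 7.3
# (dollars per square metre).
prices = {2004: 16, 2005: 20, 2006: 24, 2007: 24, 2008: 29,
          2009: 30, 2010: 35, 2011: 36, 2012: 40}
base_year = 2005

# Each year's price is expressed as a percentage of the base-year price.
price_index = {
    year: price / prices[base_year] * 100
    for year, price in prices.items()
}

for year in sorted(price_index):
    print(year, round(price_index[year]))
```

Changing base_year to another year in the data produces the series on a different base without any other change to the code.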

Activity 7.3
The average quarterly sales of a retail chain are shown below.

Quarter 1 2 3 4
Sales ($000s) 43 54 50 84

Using the third quarter as the base period, express the average sales as index numbers.

7.4.4 Changing the base period


With the passage of time, the relevance of any reference point in the past decreases in terms
of comparison with values in the present. Therefore, it may be necessary to change the base
period by moving it closer to the present.

The other reason for changing the base period is to enable comparison between two index
number series with different base periods. Two index number series can only be compared if
they have the same base period, therefore, if the base is not the same it is necessary to rebase
one of them.

To change the base period of an index, divide each number in the index series by the old
index value of the proposed new base period and multiply by 100. The index number of the
new base period then becomes 100.

$$\text{New index value} = \frac{\text{Old index value}}{\text{Index value of new base}} \times 100$$ [7.4]

Example 7.4
Consider the index of maize production for the period 1995 to 2006 referred to earlier on.

Year 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006
Index 84 96 100 104 112 86 82 79 65 36 38 30

Change the base period of the maize production index from 1997 to 2003.

Solution 7.4
The year 2003 is now assigned an index number of 100. The old index numbers of the
remaining years are each divided by 65 and the result multiplied by 100 to obtain their
respective new index numbers. For example, the new index numbers for the years 1995 and
2006 are calculated as follows:

$$\text{New index number for 1995} = \frac{84}{65} \times 100 = 129.23\%$$

$$\text{New index number for 2006} = \frac{30}{65} \times 100 = 46.15\%$$

Year Old Index New Index


1995 84 129.23
1996 96 147.69
1997 100 153.85
1998 104 160.00
1999 112 172.31
2000 86 132.31
2001 82 126.15
2002 79 121.54
2003 65 100.00
2004 36 55.38
2005 38 58.46
2006 30 46.15
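
Rebasing applies formula [7.4] to every value in the series, which is easy to sketch in Python. The function name rebase is an assumption made for this illustration.

```python
def rebase(index_series, new_base_period):
    """Re-express an index series so that new_base_period equals 100."""
    divisor = index_series[new_base_period]
    return {
        period: value / divisor * 100
        for period, value in index_series.items()
    }


# Maize production index from Example 7.4 (1997 = 100).
old_index = {1995: 84, 1996: 96, 1997: 100, 1998: 104, 1999: 112,
             2000: 86, 2001: 82, 2002: 79, 2003: 65, 2004: 36,
             2005: 38, 2006: 30}

new_index = rebase(old_index, 2003)
for year in sorted(new_index):
    print(year, old_index[year], round(new_index[year], 2))
```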

Activity 7.4
1. In Example 7.4, change the base period of the maize production index from 1997 to
2000.

2. The following data are the monthly commodity price index numbers for a group of
consumer goods from July 2009 to July 2010:

140 138 124 98 100 152 148 143 150 146 155 158 162
a) What is the base period used here?
b) Describe the trend in the price of the commodities over this period.
c) Change the base period to March 2010.

7.5 Weighted Index Numbers


Weighted index numbers give the combined effect of items whose importance is rated
differently. In this section, we will discuss two types of index which give due weight to an
item’s importance. These are:
• Weighted average of relatives indices
• Weighted aggregate indices

7.5.1 Weighted average of relatives indices


A weighted average of relatives is an index number that combines the changes in a number
of items of differing importance into a single figure showing their average change. For
example, the Consumer Price Index (CPI) is constructed by calculating a weighted average
of price relatives, while the industrial index is a weighted average of quantity relatives.

The procedure for calculating a Weighted Average of Price Relative Index (WAPRI) is as
follows:
1. Calculate the price relative of each item. For item i, the price relative X_i is given by the
formula

$$X_i = \frac{P_n}{P_0} \times 100$$ [7.5]

2. Using the given weights W_i, obtain a weighted average of the price relatives. The
computational formula is given by

$$\text{WAPRI} = \frac{\sum W_i X_i}{\sum W_i}$$ [7.6]

Example 7.5
The data shows the unit prices of three different commodities X, Y and Z for two consecutive
years and the quantities consumed.

Commodity  2010 Price per unit  2011 Price per unit  Quantity
X          20                   22                   20
Y          40                   48                   2
Z          100                  104                  10

Calculate a Weighted Average of Price Relative Index (WAPRI) for the commodities using
2010 as the base period.

Solution 7.5

Commodity  Price Relative, Xi  Weight, Wi  Weighted Relative, WiXi
X          110                 20          2 200
Y          120                 2           240
Z          104                 10          1 040
Total                          32          3 480

$$\text{WAPRI} = \frac{\sum W_i X_i}{\sum W_i} = \frac{3480}{32} = 108.75\%$$
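
The two steps of the WAPRI procedure can be sketched in Python using the data of Example 7.5. This is an illustration only; the variable names are assumptions made for the example, and the quantities are used as the weights, as in the solution.

```python
# Example 7.5 data: commodity -> (2010 price, 2011 price, weight).
items = {
    "X": (20, 22, 20),
    "Y": (40, 48, 2),
    "Z": (100, 104, 10),
}

# Step 1: price relative of each item, X_i = P_n / P_0 * 100.
relatives = {
    name: current / base * 100
    for name, (base, current, _weight) in items.items()
}

# Step 2: weighted average of the price relatives.
weighted_sum = sum(items[name][2] * relatives[name] for name in items)
total_weight = sum(weight for _base, _current, weight in items.values())
wapri = weighted_sum / total_weight

print(round(wapri, 2))
```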

Activity 7.5
Suppose you are given the following products consumed by an average family in
2003 and 2004.

Commodity  2003 Price per unit  2004 Price per unit  Quantity
Bread      24                   35                   550
Milk       25                   49                   287
Sugar      40                   56                   24

Calculate a Weighted Average of Price Relative Index (WAPRI) for the commodities
using 2003 as the base period.

7.5.2 Weighted aggregate indices


Weighted aggregate index numbers take into account the differences in relative influence
exerted by different products in a composite index. We have to decide how much weight to
attach to each of the products. Generally, price indices are constructed by weighting the
prices of items by the corresponding quantities bought, sold, produced or consumed in the
base year or current year. Similarly, quantity indices are constructed by weighting the
quantities of the items by the corresponding prices in the base year or the current period.

In base-period weighting, when comparing prices it is assumed that quantities are held
constant at base period levels whilst when comparing quantities, it is assumed that prices are
held constant at the base period level. Base weighting is less expensive and less time
consuming because there is no continuous calculation of weights. However, the relevance of
the weights may diminish with the passage of time so that rebasing may be necessary.

In current-period weighting, when comparing prices it is assumed that quantities are held
constant at current period levels, whilst when comparing quantities, it is assumed that prices
are held constant at current period level. Current weighting involves continuous calculation
of weights which is expensive and time consuming. This also makes valid comparisons
difficult or impossible due to continuously changing weights. Despite these drawbacks,
current weighting is preferred because it ensures that an item is rated in accordance with its
current importance, so that there is no risk of producing a grossly misleading index through
the use of outdated weights.

The base-period weighted indices are called Laspeyres indices whilst the current-period
weighted indices are called Paasche indices. The computational formulae are presented below:

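$$\text{Laspeyres Price Index, LPI} = \frac{\sum P_n Q_0}{\sum P_0 Q_0} \times 100$$ [7.7]

$$\text{Laspeyres Quantity Index, LQI} = \frac{\sum Q_n P_0}{\sum Q_0 P_0} \times 100$$ [7.8]

$$\text{Paasche Price Index, PPI} = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \times 100$$ [7.9]

$$\text{Paasche Quantity Index, PQI} = \frac{\sum Q_n P_n}{\sum Q_0 P_n} \times 100$$ [7.10]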

A related index number is Fisher’s index, which is the geometric mean of the Laspeyres
and Paasche index numbers:

$$\text{Fisher Price Index, FPI} = \sqrt{\text{LPI} \times \text{PPI}}$$ [7.11]

$$\text{Fisher Quantity Index, FQI} = \sqrt{\text{LQI} \times \text{PQI}}$$ [7.12]

Example 7.6
The following data give the prices and quantities of the types of food stuff bought by a
private boarding school in 2011 and 2012

Food Type  2011 Price  2011 Quantity  2012 Price  2012 Quantity
A          25          400            37          780
B          27          310            42          700
C          30          240            50          390

Calculate:
• Laspeyres and Paasche Price Indices for 2012, with 2011 as the base year and interpret
your results.
• Fisher Price Index and interpret the result.

Solution 7.6

Food Type PnQ0 P0Q0 PnQn P0Qn


A 14 800 10 000 28 860 19 500
B 13 020 8 370 29 400 18 900
C 12 000 7 200 19 500 11 700
Sum 39 820 25 570 77 760 50 100

a) $$\text{LPI} = \frac{\sum P_n Q_0}{\sum P_0 Q_0} \times 100 = \frac{39\,820}{25\,570} \times 100 = 155.73\%$$

Using base-period (2011) quantities as weights, prices increased by 55.73% between 2011
and 2012.

$$\text{PPI} = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \times 100 = \frac{77\,760}{50\,100} \times 100 = 155.21\%$$

Using current-period (2012) quantities as weights, prices increased by 55.21% between 2011
and 2012.

b) $$\text{Fisher Price Index, FPI} = \sqrt{\text{LPI} \times \text{PPI}} = \sqrt{155.73 \times 155.21} = 155.47\%$$

Using both base-period and current-period quantities, prices increased by 55.47%.
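
The three price indices of Example 7.6 can be reproduced with the Python sketch below. It is an illustration only; the variable names are assumptions made for the example, and the Fisher index is computed as the geometric mean (square root of the product) of the other two.

```python
from math import sqrt

# Example 7.6 data: food type -> (2011 price, 2011 quantity,
#                                 2012 price, 2012 quantity).
data = {
    "A": (25, 400, 37, 780),
    "B": (27, 310, 42, 700),
    "C": (30, 240, 50, 390),
}

# Aggregates needed by the formulae (subscript 0 = base, n = current).
sum_pn_q0 = sum(pn * q0 for p0, q0, pn, qn in data.values())
sum_p0_q0 = sum(p0 * q0 for p0, q0, pn, qn in data.values())
sum_pn_qn = sum(pn * qn for p0, q0, pn, qn in data.values())
sum_p0_qn = sum(p0 * qn for p0, q0, pn, qn in data.values())

laspeyres_price = sum_pn_q0 / sum_p0_q0 * 100   # base-period weights
paasche_price = sum_pn_qn / sum_p0_qn * 100     # current-period weights
fisher_price = sqrt(laspeyres_price * paasche_price)

print(round(laspeyres_price, 2), round(paasche_price, 2),
      round(fisher_price, 2))
```

Swapping the roles of prices and quantities in the four aggregates gives the corresponding quantity indices.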

Activity 7.6
Using the data provided in Example 7.6, calculate:
a) Laspeyres and Paasche Quantity Indices for 2012, with 2011 as the base year and
interpret your results.
b) Fisher Quantity Index and interpret the result.

7.6 Use of Index Numbers as Deflators


The value of money changes as time goes on due to inflation. A dollar today is not worth the
same as a dollar 10 or so years ago. The use of index numbers as deflators allows us to
compare amounts of money across time periods.

The Consumer Price Index is an overall measure of relative changes in the prices of many
goods and thus reflects changes in the value of money. The CPI is used as a deflator in
converting nominal amounts of money to what are called real amounts of money. Real
amounts of money are amounts that have been adjusted for changes in the value of money
due to inflation, so that they are comparable through time.

The conversion procedure involves identifying a constant point in time, the base period. By
dividing Y dollars in year i by the CPI value for year i and multiplying by 100, we
convert the Y nominal (year i) dollars to constant (base year) dollars.

The all-items CPI figures for the years 2008 to 2011, as provided by ZIMSTAT, are shown in
Table 7.2.

Table 7.2 Consumer Price Index (Dec 2008 = 100)


Year CPI
2008 100
2009 92.1
2010 94.9
2011 98.2
Source: www.zimstat.co.zw/index (08-01-2013)

We will now look at an example to illustrate the use of the CPI as a deflator.

Example 7.7
Suppose that during the years 2009 to 2011, the entry salary of a trained teacher was as
follows:

Year Salary ($)


2009 150
2010 250
2011 275

Use the CPI figures provided in Table 7.2 to transform the salaries to 2008 dollars.

Solution 7.7

Year Salary ($) CPI


2009 150 92.1
2010 250 94.9
2011 275 98.2

If we divide the 2009 salary of $150 by the CPI of that year and multiply the result by 100,
we get the equivalent salary in 2008 dollars, that is, the salary in real terms.

$$\frac{150}{92.1} \times 100 = 162.87$$

In real terms, the entry salary for a trained teacher in 2009 was $162.87.

We repeat the same procedure for 2010 and 2011.

$$\text{For 2010: } \frac{250}{94.9} \times 100 = 263.44$$

$$\text{For 2011: } \frac{275}{98.2} \times 100 = 280.04$$

Thus, the entry salaries for 2010 and 2011 were $263.44 and $280.04 respectively in constant
2008 dollars. In real terms, the salary increased by $117.17 between 2009 and 2011
($280.04 − $162.87). This shows that the salary was able to keep up with inflation.
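
The deflating procedure can be sketched in Python as follows. This is an illustration only; the variable names are assumptions made for the example, the CPI values are those of Table 7.2, and the salaries are the figures used in the solution table above.

```python
# CPI (Dec 2008 = 100) from Table 7.2 and nominal entry salaries ($).
cpi = {2009: 92.1, 2010: 94.9, 2011: 98.2}
nominal_salary = {2009: 150, 2010: 250, 2011: 275}

# Real amount = nominal amount / CPI of the same year * 100
real_salary = {
    year: nominal_salary[year] / cpi[year] * 100
    for year in nominal_salary
}

for year in sorted(real_salary):
    print(year, nominal_salary[year], round(real_salary[year], 2))

# Change in the real salary between 2009 and 2011.
real_change = round(real_salary[2011], 2) - round(real_salary[2009], 2)
print(round(real_change, 2))
```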

Activity 7.7
The data that follows shows the average price of a 2 litre bottle of cooking oil over
the past three years.
Year Price ($)
2009 2.75
2010 3.10
2011 3.50

Use the CPI figures in Table 7.2 to adjust the price to constant 2008 dollars.

7.7 Challenges in Constructing Index Numbers
The following are the problems associated with the construction of index numbers:
1. Unavailability of data – data is expensive to collect and it is not always practicable to
determine the quantities involved (sold).
2. Choice of base year – the base year has to be a reasonably normal year characterised
by stability in business activity and such years are difficult to come by.
3. Selection of items – there may be disagreements on the items to include. The selection
should be such that movements in prices of those items chosen will be representative
of the movements of prices of all items considered relevant.
4. Choice of weights – it is difficult to select typical quantities and prices which measure
relative importance in the construction of composite indices. The weights may
become outdated with time giving rise to misleading indices.
5. Comparability of index series – comparison is only possible if two index series have
the same base period.

7.8 Summary
In this unit, we looked at the construction of simple index numbers and weighted index
numbers. Index numbers are used to measure the relative change in a set of measurements
over time. A base period is chosen to serve as a reference point. The base period is given
index number 100.

Simple indices show changes pertaining to a single item while aggregate indices are for a
group of items. Because items do not contribute the same to the envisaged change, the items
are given weights to reflect their relative importance. The weights may be current weights or
base weights. Whilst base weights are less expensive to use, they may be outdated thereby
giving rise to misleading indices. As time goes on, it may be necessary to change the base
period to keep up with current trends. We saw how the base can be changed from one period
to another.

We looked at the construction of the CPI and how it is used to adjust for inflation. Finally, we
discussed the problems that are associated with the construction of index numbers. These
include the unavailability of data, the choice of base year, choice of weights and selection of
items to make up the basket.


Unit 8
Statistics List of Formulae
In this unit we give a statistics list of formulae which will be used in this course.

8.1 Normal Distribution


8.1.1 Value X
An arbitrary normal value X is transformed to a standard normal variable Z by the
transformation

$$Z = \frac{X - \mu}{\sigma}$$ [1.1]

8.2 Statistical Estimation


Statistical estimation includes point estimators and confidence interval estimation.

8.2.1 Point estimators

$$\text{Sample mean, } \bar{x} = \frac{1}{n} \sum x_i$$ [2.1]

$$\text{Sample variance, } s^2 = \frac{1}{n-1}\left(\sum x_i^2 - \frac{(\sum x_i)^2}{n}\right)$$ [2.2]

$$\text{Sample proportion, } \hat{p} = \frac{k}{n}$$ [2.3]

8.2.2 Confidence interval estimation


If the population standard deviation σ is known, a 100(1 − α)% confidence interval for μ is
given by:

$$\bar{x} \pm Z_{\alpha/2} \times \frac{\sigma}{\sqrt{n}}$$ [2.4]

If the population standard deviation σ is unknown and n ≥ 30, then a 100(1 − α)%
confidence interval for the population mean μ is given by:

$$\bar{x} \pm Z_{\alpha/2} \times \frac{s}{\sqrt{n}}$$ [2.5]

If the population standard deviation σ is unknown and n < 30, then a 100(1 − α)%
confidence interval for μ is given by:

$$\bar{x} \pm t_{\alpha/2}(n-1) \times \frac{s}{\sqrt{n}}$$ [2.6]

A 100(1 − α)% confidence interval for the population proportion p is given by:

$$\hat{p} \pm Z_{\alpha/2} \times \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$ [2.7]

The minimum sample size necessary to ensure that the error in estimating μ will not
exceed a specified amount e is given by:

$$n = \left[\frac{Z_{\alpha/2} \times \sigma}{e}\right]^2$$ [2.8]

The minimum sample size required to estimate the population proportion to be within a
specified amount e with 100(1 − α)% confidence is given by:

$$n = \frac{\hat{p}(1 - \hat{p}) Z_{\alpha/2}^2}{e^2}$$ [2.9]

or, when no prior estimate of the proportion is available (taking p̂ = 0.5), by:

$$n = \left[\frac{Z_{\alpha/2}}{2e}\right]^2$$ [2.10]
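
As an illustration of formulae [2.4] and [2.8], the Python sketch below computes a confidence interval for the mean with σ known and the minimum sample size for a given maximum error. All the input figures are hypothetical values chosen for the example; the critical value is taken from the standard library's statistics.NormalDist rather than from the printed tables.

```python
from math import ceil, sqrt
from statistics import NormalDist

# Hypothetical inputs, chosen only for illustration.
sample_mean = 50.0
sigma = 10.0          # known population standard deviation
n = 25                # sample size
confidence = 0.95

# Critical value Z_(alpha/2) of the standard normal distribution.
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

# Formula [2.4]: sample mean plus or minus Z * sigma / sqrt(n).
margin = z * sigma / sqrt(n)
interval = (sample_mean - margin, sample_mean + margin)

# Formula [2.8]: smallest n keeping the error within max_error,
# rounded up to the next whole number.
max_error = 2.0
required_n = ceil((z * sigma / max_error) ** 2)

print(round(z, 2), [round(limit, 2) for limit in interval], required_n)
```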

8.3 Hypothesis Testing


8.3.1 Tests concerning the population mean
Test statistic for testing for the mean of a single population:

When σ is known
Case I: n is large or small

$$Z_{cal} = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \sim N(0, 1)$$ [3.1]

When σ is unknown
Case II: n is large

$$Z_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim N(0, 1)$$ [3.2]

Case III: n is small

$$T_{cal} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} \sim t(n-1)$$ [3.3]

8.3.2 Test concerning a population proportion


The test statistic for testing for a proportion of a single population is given by:

$$Z_{cal} = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$ [3.4]

8.4 Simple Linear Regression Analysis


The simple linear regression model is given by

$$Y = \beta_0 + \beta_1 X + e$$ [4.1]

The least squares estimates of β0 and β1 are a and b respectively, where

$$b = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - (\sum x)^2}$$ [4.2]

and

$$a = \frac{\sum y - b \sum x}{n}$$ [4.3]
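
The least squares formulae for the slope b and the intercept a translate directly into Python. The sketch below is an illustration only; the data are hypothetical values chosen so that the points lie exactly on the line Y = 1 + 2X.

```python
# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1, 2, 3, 4]
y = [3, 5, 7, 9]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope: b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: a = (sum(y) - b*sum(x)) / n
a = (sum_y - b * sum_x) / n

print(a, b)
```

The same calculation gives the fitted trend line of a time series when x holds the coded time periods and y the observed series values.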

8.5 Correlation Analysis


Pearson’s product moment correlation coefficient is given by

$$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{\left(n \sum x^2 - (\sum x)^2\right)\left(n \sum y^2 - (\sum y)^2\right)}}$$ [5.1]

Spearman’s Rank Correlation Coefficient r_s is given by

$$r_s = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$$ [5.2]

8.5.1 Testing for the existence of a linear relationship between X and Y


The test statistic is given by:

$$T_{cal} = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$ [5.3]

We would reject H0 if

$$|T_{cal}| > t_{\alpha/2}(n - 2)$$ [5.4]

8.6 Introduction to Time Series Analysis


8.6.1 Trend analysis
The fitted trend line:

$$\hat{Y} = a + bX_t$$ [6.1]

The intercept a and slope b of the estimated trend line are given by

$$b = \frac{n \sum X_t Y_t - \sum X_t \sum Y_t}{n \sum X_t^2 - (\sum X_t)^2}$$ [6.2]

$$a = \frac{\sum Y_t - b \sum X_t}{n}$$ [6.3]

8.6.2 Seasonal analysis

$$\text{Seasonal ratio} = \frac{Y_t}{MA} = \frac{T_t \times C_t \times S_t \times I_t}{T_t \times C_t} = S_t \times I_t$$ [6.4]

$$\text{Deseasonalised } Y = \frac{\text{Actual } Y}{\text{Adjusted seasonal index } S}$$ [6.5]

8.7 Index Numbers


8.7.1 Simple Index Numbers
$$\text{Simple Price Index, SPI} = \frac{P_n}{P_0} \times 100$$ [7.1]

$$\text{Simple Quantity Index, SQI} = \frac{Q_n}{Q_0} \times 100$$ [7.2]

8.7.2 Changing the Base period

$$\text{New index value} = \frac{\text{Old index value}}{\text{Index value of new base}} \times 100$$ [7.3]

8.7.3 Weighted Index Numbers


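Weighted Average of Price Relative Index (WAPRI) is given by:

$$\text{WAPRI} = \frac{\sum W_i X_i}{\sum W_i}$$ [7.4]

$$\text{Laspeyres Price Index, LPI} = \frac{\sum P_n Q_0}{\sum P_0 Q_0} \times 100$$ [7.5]

$$\text{Laspeyres Quantity Index, LQI} = \frac{\sum Q_n P_0}{\sum Q_0 P_0} \times 100$$ [7.6]

$$\text{Paasche Price Index, PPI} = \frac{\sum P_n Q_n}{\sum P_0 Q_n} \times 100$$ [7.7]

$$\text{Paasche Quantity Index, PQI} = \frac{\sum Q_n P_n}{\sum Q_0 P_n} \times 100$$ [7.8]

$$\text{Fisher Price Index, FPI} = \sqrt{\text{LPI} \times \text{PPI}}$$ [7.9]

$$\text{Fisher Quantity Index, FQI} = \sqrt{\text{LQI} \times \text{PQI}}$$ [7.10]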

APPENDICES
Statistical Tables

List of Tables
1. Normal distribution
2. Student t distribution
