
KENYATTA UNIVERSITY

INSTITUTE OF OPEN LEARNING

CMS 200/STC-C208
BUSINESS STATISTICS

Phares B.O. Ochola

Department of Management Science


Module Summary

This module covers fundamental topics in Business Statistics. In Lesson One, the concept of
Statistical Investigation is discussed. Lesson Two covers Statistical Data Collection
methods, with special emphasis on the various sources of statistical data and their distinguishing
characteristics; types of data and their sources; and the design of various data collection techniques
and tools. In Lesson Three, Data Organization is discussed. In this lesson, data collection units
are defined, and the various approaches to statistical data classification, as well as the formation and
use of statistical tables and frequency distributions, are discussed.

In Lesson Four, both diagrammatic and graphical data presentation are described, together with
the construction of various statistical diagrams and graphs. Lesson Five then discusses
statistical measures of Central Tendency, starting with a definition of the concept of
descriptive statistics. Various Measures of Central Tendency are then determined and their values
interpreted. In Lesson Six, Measures of Dispersion are discussed, while
Lessons Seven and Eight cover Time Series Analysis and Index Numbers respectively.

Phares B.O. Ochola

Table of Content

Module Summary _________________________________________________________________ i


Table of Content _________________________________________________________________ ii
Lesson One: The Statistical Investigation_____________________________________________ 1
1.0 Objectives: ________________________________________________________________ 1
1.1 Introduction_______________________________________________________________ 1
1.1.1 What is Statistics? ________________________________________________________ 1
Activity 1a. ______________________________________________________________________ 2
Activity 1b: _____________________________________________________________________ 2
1.2 Variables and Constants_____________________________________________________ 2
1.3 Data Types ________________________________________________________________ 3
1.3.1 Quantitative Data ________________________________________________________ 3
Activity 1c:______________________________________________________________________ 3
1.3.2 Discrete Data ____________________________________________________________ 3
Activity 1d: _____________________________________________________________________ 3
1.3.3. Continuous Data _________________________________________________________ 3
Activity 1e:______________________________________________________________________ 3
1.3.4 Qualitative Data _________________________________________________________ 3
Activity 1f: ______________________________________________________________________ 4
1.3.5 Ordinal Data ____________________________________________________________ 4
Activity 1g:______________________________________________________________________ 4
1.3.6 Nominal Data____________________________________________________________ 4
Activity 1h: _____________________________________________________________________ 4
1.3.7 Time Series and Cross-Sectional Data_______________________________________ 4
Activity 1i: ______________________________________________________________________ 5
1.4. Statistical Investigation/Inquiry ______________________________________________ 5
1.4.1 Problem Definition _______________________________________________________ 5
Activity 1j: ______________________________________________________________________ 5
1.4.2 Determine a data collection technique _______________________________________ 5
1.4.3 Design data collection tools and administer these on the respondents _____________ 5
1.4.4 Organization of Collected Data ____________________________________________ 5

1.4.5 Analysis and Interpretation ________________________________________________ 5
1.4.6. Report Writing __________________________________________________________ 6
Further Reading _________________________________________________________________ 6
Exercise One: Statistical Investigation _______________________________________________ 7
Lesson Two: Statistical Data Collection ______________________________________________ 9
2.0 Objectives: ________________________________________________________________ 9
2.1 Introduction_______________________________________________________________ 9
2.1.1 Types and Sources of Statistical Data ________________________________________ 9
2.1.1.1. Primary data sources:___________________________________________________ 9
2.1.1.2 Secondary Data Sources:__________________________________________________ 9
2.2 Merits and Demerits of Primary and Secondary Data ____________________________ 9
2.2.1 Primary Data: ___________________________________________________________ 9
2.2.2 Secondary Data: ________________________________________________________ 10
Activity 2a _____________________________________________________________________ 10
2.3 Data Collection Techniques _________________________________________________ 11
2.3.1 Sampling Techniques ____________________________________________________ 11
2.3.1.1 Types of sampling techniques _____________________________________________ 11
2.3.1.1.1. Simple Random Sampling ______________________________________________ 11
2.3.1.1.2 Stratified Proportionate Random Sample _________________________________ 12
2.3.1.1.3 Systematic Sampling___________________________________________________ 13
2.3.1.1.4 Quota Sampling_______________________________________________________ 13
2.3.1.1.5 Census Technique _____________________________________________________ 14
2.3.2 Statistical Data Collection Units ___________________________________________ 14
2.3.3 Data Collection Tools ____________________________________________________ 14
2.3.3.1 Questionnaires__________________________________________________________ 15
Activity 2b _____________________________________________________________________ 15
Further Reading ________________________________________________________________ 15
Exercises 2: Statistical Data Collection______________________________________________ 16
Lesson Three: Data Organization __________________________________________________ 18
3.0 Objectives: _______________________________________________________________ 18
3.1 Introduction______________________________________________________________ 18
3.2 Data Classification ________________________________________________________ 18
3.3. Types Of Classification_____________________________________________________ 19

3.3.1 Geographical classification: _______________________________________________ 19
3.3.2 Chronological Classification ______________________________________________ 19
3.3.3. Qualitative Classification _________________________________________________ 20
3.3.4 Quantitative Classification________________________________________________ 20
3.3.5 Data Tabulation. ________________________________________________________ 20
Activity 3a _____________________________________________________________________ 21
3.3.6 Formation of Frequency Distribution Table _________________________________ 21
3.3.6.1 Grouped Frequency Distribution __________________________________________ 22
Further Reading ________________________________________________________________ 23
Exercise 3: Data Organization_____________________________________________________ 24
Lesson Four: Diagrammatic and Graphical Data Presentation__________________________ 26
4.0 Objectives________________________________________________________________ 26
4.1 Introduction______________________________________________________________ 26
4.2 Diagrammatic Data Presentation ____________________________________________ 26
4.2.1 Simple Bar Diagram _____________________________________________________ 26
Activity 4a:_____________________________________________________________________ 26
4.2.2. Relative Bar Charts _____________________________________________________ 27
4.2.2.1. Multiple/Stacked Bar Graphs ___________________________________________ 28
Activity 4b: ____________________________________________________________________ 28
4.2.2.2 Component Bar Chart ___________________________________________________ 29
Activity 4c:_____________________________________________________________________ 29
4.2.2.3 Pie Charts. _____________________________________________________________ 31
Activity 4d: ____________________________________________________________________ 31
Activity 4d: _____________________________________________________________________ 33
4.2.3 Histogram ____________________________________________________________ 34
Activity 4e:_____________________________________________________________________ 35
Activity 4d: ____________________________________________________________________ 36
4.3 Graphical Data Presentation ________________________________________________ 36
4.3.1 Frequency polygon ______________________________________________________ 36
4.3.2 Frequency Curves _______________________________________________________ 37
4.3.2.1 Cumulative Frequency Curve _____________________________________________ 37
Activity 4f: ______________________________________________________________________ 38
Further Reading ________________________________________________________________ 39

_______________________________________________________________________________ 40
Lesson Four Exercises: Diagrammatic and Graphical Data Presentation ______________ 40
Lesson Five: Measures of Central Tendency. _____________________________________ 41
5.0 Objectives _______________________________________________________________ 41
5.1 Introduction______________________________________________________________ 41
5.2 Measures Of Central Tendency______________________________________________ 41
5.2.1 Arithmetic Mean ________________________________________________________ 42
Activity 5a:_____________________________________________________________________ 42
Activity 5c:_____________________________________________________________________ 43
Activity 5d: ____________________________________________________________________ 43
5.2.2 The Median ____________________________________________________________ 45
Activity 5f. _____________________________________________________________________ 46
Activity 5g _____________________________________________________________________ 46
5.2.3 The Mode ______________________________________________________________ 50
(a) Mode of Ungrouped Data___________________________________________________ 50
Activity 5h: ____________________________________________________________________ 50
Activity 5i: _____________________________________________________________________ 52
Activity 5J:_____________________________________________________________________ 53
5.2.4 Harmonic Mean ________________________________________________________ 53
Activity 5k: ____________________________________________________________________ 54
Activity 5l: _____________________________________________________________________ 55
5.2.5 Geometric Mean ________________________________________________________ 56
Further Reading ________________________________________________________________ 56
Exercise 5: Descriptive Statistics __________________________________________________ 58
Lesson 6: Measures of Dispersion and Shape: ________________________________________ 60
6.0 Objectives________________________________________________________________ 60
6.1 Introduction______________________________________________________________ 60
6.2 Absolute Measures of Dispersion ____________________________________________ 60
6.2.1 Range _________________________________________________________________ 61
6.2.1.1 Range of Ungrouped Data __________________________________________________ 61
Activity 6a:_____________________________________________________________________ 61
Activity 6b: ____________________________________________________________________ 61
6.2.1.2 Range: Grouped Data. __________________________________________________ 61

Activity 6c:_____________________________________________________________________ 62
6.3 Mean Absolute Deviation (MAD) ____________________________________________ 62
Activity 6d: ____________________________________________________________________ 62
Activity 6e:_____________________________________________________________________ 63
6.4 Variance, hence The Standard Deviation ______________________________________ 64
6.4.1 Variance and Standard deviation: Generalized Procedure _____________________ 64
Activity 6f: _____________________________________________________________________ 65
6.4.1.1 Variation Measures of Grouped Data_______________________________________ 66
Activity 6g:_____________________________________________________________________ 66
6.1.4 Quartile Deviations ______________________________________________________ 67
6.2 Relative Measures of Dispersion _____________________________________________ 67
6.2.1 Relative Range, Rr ______________________________________________________ 67
6.2.2 Coefficient of Quartile Deviation, Cqd. _____________________________________ 68
Activity 6h: ____________________________________________________________________ 68
6.2.3 Coefficient of Variation, Cvar _____________________________________________ 69
Activity 6i: _____________________________________________________________________ 69
6.2.4 Semi Inter Quartile Range ( SIQR). ________________________________________ 70
Activity 6j _____________________________________________________________________ 70
6.3 Measures of Shape. ________________________________________________________ 70
6.3.1 Skewness ______________________________________________________________ 70
6.3.1.1 Measures of Skewness____________________________________________________ 73
Activity 6k: ____________________________________________________________________ 73
6.3.2 Kurtosis _______________________________________________________________ 74
6.3.2.1 Measurement of Kurtosis _________________________________________________ 75
Lesson 6L: _____________________________________________________________________ 75
Further Reading ________________________________________________________________ 76
Exercise 6: Measures of Dispersion and Shape _______________________________________ 77
Lesson 7: Time Series Analysis, TSA ____________________________________________ 79
7.0 Objectives________________________________________________________________ 79
7.1 Introduction______________________________________________________________ 79
7.2 Definition of Time Series and its Components __________________________________ 79
Activity 7a:_____________________________________________________________________ 79
7.3 Components of Time Series Data ____________________________________________ 79

Activity 7b: ____________________________________________________________________ 80
7.4 Time Series Models ________________________________________________________ 80
7.5 Secular Trend Analysis_____________________________________________________ 80
7.5.1 Trend Measurement Methods _____________________________________________ 81
Activity 7c:_____________________________________________________________________ 81
Activity 7d: ____________________________________________________________________ 83
Activity 7e:_____________________________________________________________________ 84
Activity 7f: _____________________________________________________________________ 85
Activity 7f: _____________________________________________________________________ 86
Activity 7g:_____________________________________________________________________ 88
7.6 Seasonal Variation. ________________________________________________________ 89
7.6.1 Seasonal Variation Measurement.__________________________________________ 89
Activity 7h: ____________________________________________________________________ 89
Further Reading ________________________________________________________________ 90
Exercise 7: Time Series Analysis ___________________________________________________ 91
Lesson Eight: Index Numbers_____________________________________________________ 95
8.0 Objectives________________________________________________________________ 95
8.1 Introduction______________________________________________________________ 95
8.2 Index number Construction_________________________________________________ 95
8.3 The relative method _______________________________________________________ 96
8.3.1 Price Relative index , PRI _________________________________________________ 96
Activity 8a:_____________________________________________________________________ 96
Activity 8b: ____________________________________________________________________ 97
8.4 Aggregate Method and Types _______________________________________________ 98
Further Reading _______________________________________________________________ 100
Exercise 8: Index Numbers ______________________________________________________ 101

1
________________________________________________________________________
Lesson One: The Statistical Investigation
________________________________________________________________________

1.0 Objectives:

At the end of this topic, the student should be able to appreciate the concept of statistical
investigation. Specifically, the student should be able to: -

• Describe the concept of statistics and what characterizes statistical
investigations.
• Distinguish, with examples, between variables and constants as
used in statistical investigation.
• Identify and define various data types, citing illustrative examples from
a business environment.
• List and briefly discuss the steps critical to undertaking a statistical
investigation or inquiry.

1.1 Introduction

1.1.1 What is Statistics?

Statistics may be considered as facts that are expressible in terms of numbers for
purposes of decision making, or as the “art and science” of gathering, organizing
and analyzing data and interpreting the resulting information as per the stated objectives.

Facts that are collected from some source before further processing are called raw data.
When processed, raw data is transformed into information. Simply stated, data is the
raw material [input] for information [output]. The conversion of data into information
can be represented diagrammatically as follows:

Data → Data Processing → Information
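
The conversion of data into information can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the monthly profit figures and the function name `process` are hypothetical and do not come from the module.

```python
from statistics import mean

# Hypothetical raw data: a small trader's monthly profit (in shillings)
# for 12 months. These figures are invented for illustration.
monthly_profit = [4200, 3900, 4500, 5100, 4800, 5300,
                  4700, 4400, 5000, 5200, 4900, 5600]


def process(raw):
    """Turn raw data (input) into summary information (output)."""
    return {
        "months": len(raw),
        "total_profit": sum(raw),
        "average_profit": mean(raw),
        "best_month": raw.index(max(raw)) + 1,  # 1-based month number
    }


information = process(monthly_profit)
print(information)
```

The list of twelve figures is the raw data; the totals and the average returned by `process` are the information on which a decision could be based.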

Examples of Statistical Data (Facts)

• Stock sales turnover
• Sales volume data
• Number of customers using various products
• Personnel records
• CPA student marks rates
• Yearly agricultural yields
• Outpatient records at the outpatient points
• Rates of interest and inflation over the years or across various countries
• Number of widows/widowers and orphans from Aids-related deaths
• Number of children dropping out of school in various schools, districts, provinces
or nationally over the year
• Infant mortality nationally
• Number of Matatus operating on various routes
• Percentage of political party supporters by province
• Number of males/females employed at Juakali and Sons Ltd.
• Temperature level and amount of ice-cream sold
• Amount of maize, fish and millet harvested for 18 years by 17 families

Activity 1a.
List other Examples of Statistical Data

Individual single facts, e.g. Atieno’s height/age or Mama Njeri’s profit from sales during the
month of June, do not form statistics. Statistical data must allow for some objective
interpretation or comparison to be made, e.g. Atieno’s height/age compared with that of
others, Atieno’s average height, Mama Njeri’s average profit or her profits for, say, 12 months.

Activity 1b:

• List six examples of data processing methods and the resulting output of
processing such data.

1.2 Variables and Constants

Generally, those quantities that change with changes in others are called VARIABLES
while those that remain the same whatever the change in other quantities are called
CONSTANTS.

If a statistical data set contains only one variable, e.g. cost, revenue or income, then it is
said to be one-variable or univariate data. If it has two variables, e.g.
temperature level and volume of ice-cream sold, amount of rainfall and crop yield, or
volume of stock turnover and interest rates, then it is said to be two-variable or bivariate data.
If the data has more than two variables, then it is said to be a many-variable or
multivariate data set, e.g. growth yield together with type of soil [fertilizer] and amount of rainfall,
among others. Based on these concepts, the scheme for representation of the statistical
data structures is as below:
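
As a small illustration of these three definitions, the Python sketch below labels a data set by the number of variables recorded per observation. The records, the variable names and the helper `classify` are hypothetical, chosen only to mirror the examples in the text.

```python
# Each record is one observation; its keys are the variables recorded.
# All values are invented for illustration.
univariate = [{"income": 12000}, {"income": 15500}, {"income": 9800}]
bivariate = [
    {"temperature_c": 24, "ice_cream_litres": 310},
    {"temperature_c": 29, "ice_cream_litres": 450},
]
multivariate = [
    {"yield_kg": 820, "soil_type": "loam", "rainfall_mm": 640},
    {"yield_kg": 610, "soil_type": "clay", "rainfall_mm": 480},
]


def classify(dataset):
    """Label a data set by how many variables each observation carries."""
    n_variables = len(dataset[0])
    if n_variables == 1:
        return "univariate"
    if n_variables == 2:
        return "bivariate"
    return "multivariate"


for name, data in [("incomes", univariate),
                   ("ice-cream sales", bivariate),
                   ("crop yields", multivariate)]:
    print(name, "->", classify(data))
```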

1.3 Data Types

1.3.1 Quantitative Data

These are data sets that are expressible in number form [quantities], e.g. profit, cost,
number of cars, number of females and number of students dropping out of school.
Quantitative data may take two forms, discrete and continuous, depending on the type of
values the variables take.

Activity 1c:
List examples of quantitative data types

1.3.2 Discrete Data

These are data sets obtained from variables that can only take whole-number (integer)
values.

Examples of Discrete Data

• Number of children proceeding to class 4
• Number of firms manufacturing food items
• Sex of an employee, if this is coded as either 0 or 1
• Number of advertisements per unit time on a national television station

Activity 1d:
List other examples of discrete data types

1.3.3. Continuous Data

These are data sets obtained from variables that may take any real-number value within a
range, both integral and fractional.

Examples

• Profits of a firm for b years
• Temperatures per day
• Sales volume
• Revenue generated after sale
• Student marks in percentage.
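
The difference between the two forms can be illustrated with a short Python sketch. The rule used here is only a rule of thumb and an assumption of this example: whether data is discrete or continuous is a property of the variable, not of the particular values observed, so a continuous variable whose readings happen to be whole numbers would be mislabelled. The function name `data_form` and the sample values are hypothetical.

```python
def data_form(values):
    """Rule of thumb: all whole numbers -> 'discrete', otherwise 'continuous'."""
    if all(float(v).is_integer() for v in values):
        return "discrete"
    return "continuous"


# Counts can only be whole numbers.
pupils_proceeding_to_class_4 = [42, 38, 45, 40]

# Measurements can take fractional values.
daily_temperature_c = [23.4, 25.1, 22.8, 24.0]

print(data_form(pupils_proceeding_to_class_4))  # discrete
print(data_form(daily_temperature_c))           # continuous
```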

Activity 1e:
List other examples of continuous data types

1.3.4 Qualitative Data

This is data classified on the basis of attributes rather than numbers, e.g. Beauty, Clever, Gender,
Strong, High. If the categories can only take two forms, then the variable is said to be
dichotomous, and one can assign the number 0 or 1 to each form, effectively converting
them into quantitative data, e.g. Male = 1, Female = 0. Broadly, there are two types of
qualitative data: ordinal and nominal data.
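
The 0/1 coding of a dichotomous variable can be sketched as follows. The list of employees is invented for illustration; only the coding Male = 1, Female = 0 comes from the text.

```python
# Hypothetical gender records for five employees.
gender = ["Male", "Female", "Female", "Male", "Female"]

# Coding scheme from the text: Male = 1, Female = 0.
codes = {"Male": 1, "Female": 0}
coded = [codes[g] for g in gender]

# Once coded, the data can be treated numerically:
# the mean of a 0/1 variable is the proportion coded 1.
proportion_male = sum(coded) / len(coded)

print(coded)            # [1, 0, 0, 1, 0]
print(proportion_male)  # 0.4
```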

Activity 1f:
List other examples of Qualitative data types

1.3.5 Ordinal Data

Data is ordinal if a meaningful order can be generated, e.g. 1st, 2nd, 3rd etc. Data may be
ranked in some order and these ranks used for subsequent analysis.

Examples

• Job description
• Rating student performance
• Ranking colleges/schools performances

Activity 1g:
List other examples of ordinal data types

1.3.6 Nominal Data

Data is nominal if no meaningful order can be assigned, i.e. only categories are used. No
meaningful numerical measures can be determined.

Examples

• Output from manufacturing firms: plastics, electronics, wood-based
• Estates of residence for all the residents of Nairobi
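
The contrast between ordinal and nominal data can be made concrete with a brief Python sketch. The rating scale and the observations are assumptions made for this example: ordinal categories can be ranked, whereas nominal categories can only be counted.

```python
from collections import Counter

# Ordinal: the categories have a meaningful order, so they can be ranked.
rating_order = ["Poor", "Fair", "Good", "Excellent"]  # assumed scale
ratings = ["Good", "Poor", "Excellent", "Fair", "Good"]
ranks = [rating_order.index(r) + 1 for r in ratings]
median_rank = sorted(ranks)[len(ranks) // 2]  # middle value of 5 ranks

# Nominal: no order exists, so only category counts are meaningful.
outputs = ["Plastics", "Electronics", "Wood-based", "Plastics", "Plastics"]
counts = Counter(outputs)
most_common_category = counts.most_common(1)[0][0]

print(ranks, median_rank)            # [3, 1, 4, 2, 3] 3
print(counts, most_common_category)
```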

Activity 1h:
List other examples of nominal data types

1.3.7 Time Series and Cross-Sectional Data

Data may be categorized as time-series data (if the data values are recorded in a meaningful
sequence) or cross-sectional data (if the sequence of recording is irrelevant).
For example, time series include profit from the sale of maize for the past 10 years and
weekly retail sales, while cross-sectional data include the number of hours of sleep measured
for 20 people to test the effectiveness of a sleeping pill and the number of phone calls
processed yesterday by each of the firm’s order-taking employees.
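
A short Python sketch, using invented figures, shows why the recording sequence matters for one kind of data and not the other. Week-to-week changes in a time series depend on the order of the values, while a summary of cross-sectional data is unaffected when the observations are shuffled.

```python
import random

# Time series: weekly retail sales, recorded in sequence (hypothetical).
weekly_sales = [120, 135, 150, 160, 170, 185]
# Week-to-week changes only make sense because the order is meaningful.
changes = [later - earlier
           for earlier, later in zip(weekly_sales, weekly_sales[1:])]

# Cross-sectional: hours of sleep for five people measured once (hypothetical).
hours_of_sleep = [6.5, 7.0, 8.0, 5.5, 7.5]
shuffled = hours_of_sleep[:]
random.Random(0).shuffle(shuffled)  # reorder the observations

mean_original = sum(hours_of_sleep) / len(hours_of_sleep)
mean_shuffled = sum(shuffled) / len(shuffled)

print(changes)                         # [15, 15, 10, 10, 15]
print(mean_original == mean_shuffled)  # True: order is irrelevant
```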

Activity 1i:
List other examples of Time Series and Cross Sectional data types

1.4. Statistical Investigation/Inquiry

Statistical investigation, like any other investigation, involves attempts to discover a
hidden cause of a problem or the reasons why some variables behave the way they do. It
involves sequentially interrelated components that should be followed, with the main
objective of using the result of the investigation to make an objective/rational decision.
In business, this is mandatory since complex decisions must be made hourly, daily,
monthly, yearly and so on to help the organization achieve its strategic objectives. The
components of statistical investigation include:

1.4.1 Problem Definition

Problem definition involves determining the scope [geographical and
content] that should be addressed as per the objective of the enquiry.

Examples of Problem Definition

Suppose the problem concerns the performance of firms manufacturing food products.
Should all the firms be investigated? Must all the food products be addressed? What
specifically should be addressed in order to address the problem of performance of the
firms producing food items?

Activity 1j:
Describe FOUR other examples of Problem Definition

1.4.2 Determine a data collection technique


Should the sampling or the census method be used? Sampling entails selecting part of
the population, from which data is collected, organized and analyzed, and the result of
which is generalized to the entire population. A census is a 100% investigation. The issue
here is to determine the sample size and the data collection units.
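
The choice between a census and a sample can be sketched in Python. The population of 200 firms, the sample size of 20 and both function names are hypothetical; the sample is drawn with the standard library's `random.sample`, which selects without replacement.

```python
import random

# Hypothetical population: 200 firms manufacturing food products.
population = [f"firm_{i:03d}" for i in range(1, 201)]


def census(pop):
    """A census is a 100% investigation: every unit is included."""
    return list(pop)


def draw_sample(pop, size, seed=None):
    """Select `size` distinct units at random from the population."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(pop, size)


all_units = census(population)
sample_units = draw_sample(population, size=20, seed=42)

print(len(all_units), "units in the census")
print(len(sample_units), "units in the sample")
```

The sample covers only 10% of the units here, which is why the sample size and the data collection units must be decided before collection starts.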

1.4.3 Design data collection tools and administer these on the respondents

The tools of data collection must be appropriate for the problem and the investigation. After
designing and piloting, the tools are then administered to the respondents (the individuals
from whom data is collected).

1.4.4 Organization of Collected Data


The raw data collected are then organized, using methods to be discussed in later lessons, for
purposes of analysis into information.

1.4.5 Analysis and Interpretation


Statistical methods [descriptive and inferential] are then used to obtain statistical evidence
for the behavior of a system, and the results are then interpreted.

1.4.6. Report Writing

Based on the results of the analysis and the interpretation arising, a report is then
presented discussing the evidence together with appropriate conclusions.

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co. Englewood Cliffs. Prentice Hall, New Jersey (Available at University
Library)

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Englewood Cliffs. Prentice Hall, New Jersey (Available at
University Library)

______________________________________________________________
Exercise One: Statistical Investigation
______________________________________________________________

1.1. Define the term “statistics”, explaining clearly what “statistical” and
“non-statistical” facts are.

1.2. Differentiate between data and information, illustrating the processing of data
into information using a practical example.

1.3. Briefly discuss the use of statistics in the following:


• Health
• Accounting
• Auditing
• Marketing

1.4. List the attributes that an individual item must possess to be a “statistic”.

1.5. Your consultancy firm, DSS, has been requested by various clients to help them
identify the data items specific to problems they propose to investigate.

1.6 Exhaustively enumerate the data items that in your opinion would help them
solve the problems indicated by each client below:

Client 1: Environment and Human Health Group Ltd.
This group wishes to obtain information in order to advise environment and
human health protection groups on the viability of banning the growing and use
of tobacco and its products.
Client 2: Kenya Society Against Violence on Women (K.S.A.V.W)
K.S.A.V.W proposes to determine the causes and extent of abuses against women.
Client 3: The Police Department
The head of traffic division would wish to determine the causes and impacts of
i) Road accident trends
ii) Traffic jams in Kenya’s city streets.
Client 4: Small Business Enterprises Advisory Group
A group of owners operating in the Eastlands area of Nairobi would wish to be
advised on the factors that would contribute toward profitability of their kiosks.
Client 5: School Heads Association
The currently increasing rate at which school pupils drop out before class 8 has aroused
the association’s concern.

1.7 What data items should an investigator collect in her attempt to investigate the
following:
• Increasing infant mortality
• Performance in education by high school students
• Household expenditure pattern.

• Effect of public awareness campaign on Aids and Aids related deaths
• Product performance in the market place.
• Opinion poll on political party performance on future elections.

1.8 After processing the data items identified in 2 and 3, discuss the information that
would result to help the user make logical decisions.

1.9 Distinguish between the following terms giving an example of each:


• Variable and Constant
• Dependent and independent variable
• Quantitative and Qualitative data

1.10 Discuss the following data types, providing specific examples:


• Ordinal data
• Nominal data
• Ratio data
• Scale data

1.11 State the criteria data items must meet prior to categorizing them as in 6.

1.12 Suppose you have been requested to investigate the following areas:
• The performance of kiosks in your sub-location.
• The changing crop yield that has contributed to increasing level of poverty.
• Causes of road accidents on the Nairobi-Kisumu Road

1.13 Discuss the steps you would take during your investigation to help your client
make an appropriate decision.

________________________________________________________________________
Lesson Two: Statistical Data Collection
________________________________________________________________________

2.0 Objectives:
At the end of this topic, the student should be able to describe the process of
statistical data collection. Specifically, the student should be able to: -

• Describe various sources of statistical data, citing their distinguishing
characteristics.
• Differentiate between primary and secondary data, highlighting their
merits and demerits.
• Describe and design various forms of data collection techniques, citing
the criteria that define their use.
• Discuss and design various data collection tools, with specific reference
to appropriateness of use.

2.1 Introduction
2.1.1 Types and Sources of Statistical Data
Statistical data may be regarded as belonging to two categories: primary and secondary. The
former are data collected from an original (primary) source while the latter are
those collected from a secondary source. Precisely, primary data are those
collected first hand for a specific purpose, while secondary data are those that
were originally collected for a different objective but are now required
for the present purpose, that is, second-hand data.

2.1.1.1. Primary data sources:


These include present recordings of, say, production volume, student
grades/marks, agricultural output/food consumption by various households,
outpatient daily turnover, drug utilization volume at various chemists and daily
revenue collected by Matatu owners, among others.

2.1.1.2 Secondary Data Sources:

These include statistical abstracts, journals, past periodicals such as Newsweek,
newspapers and accounts/sales records.

2.2 Merits and Demerits of Primary and Secondary Data

2.2.1 Primary Data:

The advantages of primary data include the following:

(a) Accuracy
(b) Less bias
(c) Objectivity and specificity

The demerit of primary data is that its collection and analysis are
time-consuming and costly.

2.2.2 Secondary Data:

The advantages of secondary data include:

• Low cost in terms of time/resources.
• Ease of collection.

Due to its nature, prior to using secondary data an investigator must take the
following precautions. It must be confirmed that the data:

• Were collected during periods of normalcy, e.g. in the absence of, say, floods,
famine, war, inflation, recession etc.
• Were not collected too far in the past, to minimize the impact of changes in
the surroundings.
• Cover the scope of the study under the present investigation.
• Provide the exact kind of information required and are in a suitable
form.

In summary, in order to use secondary data for a particular inquiry, the
investigator should determine:
• How the data was sourced,
• The criteria used,
• Its components and how the resultant data was put together.

Activity 2a

(a) List other sources of: -

i. Primary Data
ii. Secondary data

(b) Differentiate between Secondary and Primary Data.


(c) Enumerate the advantages and disadvantages of secondary and primary data.
(d) Assume you have been selected into a team that is investigating the factors that
influence the type of daily TV adverts appearing. Indicate the various sources
from which you may collect data, highlighting whether they are primary or
secondary data.

2.3 Data Collection Techniques

2.3.1 Sampling Techniques

A sample is part of a population, where the latter consists of all the items under
investigation, e.g. all the banks, all transport vehicles, all students or all brands of
cooking fat. The sampling approach entails selecting part of the population using
objective techniques in order to collect data. The data collected from the sample is
then analyzed, and the result is generalized to the whole population.

Merits:
• Quick results
• High quality interviews
• More skilled analysis
• Lower cost
• Error can be assessed
• Follow-up of non-response is easier.
Demerits:
• May be biased
• May give misleading conclusions if not properly undertaken, e.g. through
deliberate selection.
• May exclude key members of the population if the correct sampling
technique is not employed
• Selection may be haphazard

2.3.1.1 Types of sampling techniques

2.3.1.1.1. Simple Random Sampling

In this technique, units are selected such that each unit of the population has
exactly an equal chance of being included in the sample, so that the resulting
sample is expected to have the same characteristics as the population.

• Procedure One: Black Box

(a) Number all the units of the population of size N from 1 up to N. For
example, if there are 100 units in the population, number them from 1 up to 100.
(b) Place the numbered units in a “black box” or a closed drum.
(c) Mix the units thoroughly in the box or drum.
(d) Draw from the drum or the box one unit at a time until the required number
of units that make up the sample is obtained.

There are, however, two ways of drawing the units: sampling with replacement
and sampling without replacement.

o Drawing with replacement

This is the process where each unit drawn is put back into the drum containing
the elements of the population before the next draw. This process has the
advantage that the units in each draw have the same chance of being selected as
in the preceding draw, but has the disadvantage that a unit drawn earlier may be
re-drawn, lengthening the period of unit selection.

o Drawing without replacement

This is the process where one unit is drawn and put aside as selected, and
subsequent ones are drawn until the required number of items is obtained. The
process has the advantage that a previously selected unit cannot be re-drawn,
hence shortening the period of sample unit selection, but has the disadvantage
that the chance of selecting each remaining unit increases as the experiment
progresses, notwithstanding the fact that within each draw every remaining
element has an equal chance of being included in the sample.
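
The two ways of drawing from the black box can be imitated on a computer. The
sketch below is only an illustration: the function name draw_sample, the seed
values and the population of 100 numbered units are assumptions made for the
example, not part of the procedure itself.

```python
import random

def draw_sample(units, n, with_replacement, seed=None):
    """Draw n units one at a time from a 'black box' of numbered units."""
    rng = random.Random(seed)        # seeded so the draw can be repeated
    box = list(units)                # the numbered units placed in the box
    sample = []
    for _ in range(n):
        unit = rng.choice(box)       # every unit in the box is equally likely
        sample.append(unit)
        if not with_replacement:
            box.remove(unit)         # set the unit aside; it cannot be re-drawn
    return sample

population = range(1, 101)           # units numbered 1 up to 100
with_repl = draw_sample(population, 10, with_replacement=True, seed=1)
without_repl = draw_sample(population, 10, with_replacement=False, seed=1)
print("With replacement:   ", with_repl)
print("Without replacement:", without_repl)
```

Without replacement the ten numbers are always distinct; with replacement a
number may appear more than once.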

The procedure for selecting sample units by the above method is only appropriate
for populations of small size. As the population size increases, the procedure
becomes impractical, hence the use of the second procedure.

• Procedure Two: Use of Random Numbers

Random Numbers (RN) are one means of selecting random digits to represent the
number of each unit selected from the population. A random number [digit] table
is a list in which each of the digits 0 to 9 has an equal chance, 1/10, of being
independently selected. The procedure of selection is as below:

(a) Number the population from 1 to N using the same number of digits for every
unit: if N = 100 the system of numbering should be 001, 002, ..., 100; if N = 10,
then 01, 02, ..., 10.
(b) On the random number table, arbitrarily establish a place to begin reading
the numbers.
(c) Starting at the selected place, read the numbers in groups (usually from left
to right, top to bottom, or diagonally) as per the numbering system in (a),
recording the corresponding population unit selected.
(d) If the random number lies from 1 to N and has not already been chosen,
include it in the sample.

Note that if the RN is 0 or greater than N, discard it, as there will be no
corresponding unit. Likewise, if the RN has already been chosen, discard it,
since the sampling is without replacement, and proceed to the next.
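
The reading and discarding rules can be traced with a short sketch. The digit
string below is made up for illustration and merely stands in for a row of a
printed random number table; the function name select_by_random_numbers is
likewise an assumption.

```python
def select_by_random_numbers(N, n, digits, width):
    """Read `digits` in groups of `width` and keep numbers from 1 to N
    that have not been chosen before, until n units are selected."""
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        number = int(digits[start:start + width])
        # Discard 0, numbers above N, and numbers already chosen.
        if 1 <= number <= N and number not in chosen:
            chosen.append(number)
        if len(chosen) == n:
            break
    return chosen

# Hypothetical table row, read as 07 12 45 03 10 98 10 05 00 02
table_row = "07124503109810050002"
sample = select_by_random_numbers(N=10, n=5, digits=table_row, width=2)
print(sample)   # [7, 3, 10, 5, 2]
```

Here 12, 45 and 98 are discarded because they exceed N = 10, the second 10 is
discarded because it was already chosen, and 00 has no corresponding unit.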

2.3.1.1.2 Stratified Proportionate Random Sample

A stratum is assumed to have a uniform [homogeneous] structure. In statistical
populations, a stratum is assumed to consist of homogeneous units, e.g. if a
population is classified by gender, then two strata may be generated: a male and
a female stratum. Each stratum can further be divided with respect to different
groups possessing similar characteristics. Hence, when a population is
heterogeneous, stratification guards against selecting data units from only part
of the population, since it assures that each stratum is given an equal chance of
being selected.

Procedure
Divide the population into strata or blocks in such a manner that each block is as
homogeneous as possible. Each stratum is then sampled at random in proportion
to its size. Suppose a population of size 100 can be partitioned into 40 males and
60 females. The males can further be divided into < 15 years, 15-25 years and
> 25 years with sizes 20, 15 and 5 units respectively. Similarly, the females can be
partitioned into < 15 years, 15-25 years and > 25 years with respective sizes of
10, 40 and 10 units. Select a sample of size 20 from this population.

The procedure then is to develop a sample frame as below: -

Stratum           Population Size    Sample Size

MALE                    40           (40 × 20) / 100 = 8
  < 15 YEARS            20           (20 × 8) / 40   = 4
  15-25 YEARS           15           (15 × 8) / 40   = 3
  > 25 YEARS             5           (5 × 8) / 40    = 1
FEMALE                  60           (60 × 20) / 100 = 12
  < 15 YEARS            10           (10 × 12) / 60  = 2
  15-25 YEARS           40           (40 × 12) / 60  = 8
  > 25 YEARS            10           (10 × 12) / 60  = 2
TOTAL                  100           20
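
The proportionate allocation above can be checked with a few lines of code. This
is a minimal sketch; the function name proportional_allocation and the dictionary
layout are assumptions made for the illustration.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Share sample_size among the strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    return {name: size * sample_size / total
            for name, size in strata_sizes.items()}

# First level: gender strata, overall sample of 20 from a population of 100.
gender = proportional_allocation({"male": 40, "female": 60}, 20)

# Second level: age strata within each gender stratum.
male_ages = proportional_allocation(
    {"<15": 20, "15-25": 15, ">25": 5}, gender["male"])
female_ages = proportional_allocation(
    {"<15": 10, "15-25": 40, ">25": 10}, gender["female"])

print(gender)        # {'male': 8.0, 'female': 12.0}
print(male_ages)     # {'<15': 4.0, '15-25': 3.0, '>25': 1.0}
print(female_ages)   # {'<15': 2.0, '15-25': 8.0, '>25': 2.0}
```

In this example every allocation is a whole number; with other stratum sizes the
results may need rounding so that the stratum samples still add up to the
required sample size.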

2.3.1.1.3 Systematic Sampling

Obtain a list of all the units of the population. Based on the required sample
size, determine the interval, k, for item selection. Obtain the first entry using
simple random sampling and thereafter select every k-th item on the list; the
procedure results in a systematic sample.
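
A minimal sketch of systematic selection follows. It assumes the interval is
taken as the population size divided by the sample size (whole-number part) and
that the random start falls within the first interval; both choices, like the
function name systematic_sample, are assumptions for the illustration.

```python
import random

def systematic_sample(units, n, seed=None):
    """Pick a random start, then every k-th unit from the list."""
    k = len(units) // n                        # selection interval
    start = random.Random(seed).randrange(k)   # random position in first interval
    return [units[start + i * k] for i in range(n)]

units = list(range(1, 101))                    # a list of 100 numbered units
sample = systematic_sample(units, 20, seed=7)
print(sample)
```

With 100 units and a sample of 20, the interval is 5, so the selected units are
always 5 apart whatever the starting point.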

2.3.1.1.4 Quota Sampling

Partition the sample into quotas, each indicating the number of individuals to act
as respondents. Choose the quotas in such a way that the sample is representative
of the entire population. Note that the sample may be completely unrepresentative
in other aspects.

2.3.1.1.5 Census Technique

This is the process of 100% investigation, i.e. where each and every member (unit)
of the population is included in the investigation. The technique has one critical
advantage, that of being 100% accurate, and is only appropriate for populations of
small size. As the population size increases, the technique becomes
disadvantageous since:

• It will be costly in terms of time and manpower
• It will be impractical in terms of volume of work
• Follow-up of non-responses will be difficult.

2.3.2 Statistical Data Collection Units

Data collection units or statistical units are the units upon which data is
collected, measured or counted, analyzed and interpreted. Prior to data collection,
statistical units must be clearly defined.

Example:

Statistical units of collection include: household, number of packets of milk,
student marks, retail prices, daily temperature and so on. These are basically
variables. The units of measurement include size, volume, weight and
quantity among others, while those of analysis and interpretation include ratios,
percentages, measures of averages and dispersion etc.

A good statistical unit should have the following properties:-

• It must not change with place and time i.e. it must be stable.
• It must be precisely and concisely defined and unambiguous i.e. it must
have one and only one meaning.
• It must be relevant to the investigation at hand i.e. it should be
investigation objective specific.
• It must be uniform: Homogeneity assures that the unit does not mean
different things at different times and place.

2.3.3 Data Collection Tools

After determining the sampling technique, establishing the sample size and defining
the appropriate statistical units, the investigator must then determine the tools to be
used in the collection of data. A variety of data collection tools are available to the
investigator, depending on the objective and the scope of the investigation. These
tools can be used singly or in combination.

2.3.3.1 Questionnaires

This is a data collection tool where the details of the investigation are organized in
question form. It is usually arranged in 2 parts: the classification component and
the detail component. The classification component consists of the date of
interview, name of the interviewer and details of the respondent, while the second
part consists of the questions that solicit information on the objective of the
inquiry. A good questionnaire must have the following desired characteristics:

(a) Technical terms should be avoided i.e. the questions should be easily
understood
(b) Questions should be unambiguous i.e. they should have one and only one
interpretation.
(c) Questions must not contain words of vague meaning, e.g. words like unskilled,
large etc. should be avoided.
(d) Questions requiring calculations should be avoided.
(e) The respondent should not be expected to perform or decide upon
classification.
(f) Questions should not lead to biased answers.
(g) The questionnaire should not be too long
(h) Questionnaires should cover the exact object of the inquiry.

Activity 2b
Discuss the Interview and observation methods of data collection.

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Englewood Cliffs. Prentice Hall, New Jersey (Available at University
Library)

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Englewood Cliffs. Prentice Hall, New Jersey (Available at
University Library)

________________________________________________________________________
Exercises 2: Statistical Data Collection
________________________________________________________________________

2.1 Distinguish between the primary and secondary data, highlighting the main
sources of each.

2.2 Discuss the merits and demerits of:

(a) Primary data


(b) Secondary data

2.3 The use of secondary data requires that the investigator take extra
precautions to assure its appropriateness. Assuming you were the investigator, list
any five precautions you would take.

2.4 Indicate both the primary and secondary data sources if you are to undertake the
following investigations.

(a) Patient congestion problem at the Hospital's out - patient point.


(b) Performance of a firm’s products in the market place
(c) Factors that influence commuter preference of Matatu and Kenya bus service
Transport mode.
(d) Factors impacting on the performance of rice growing in Kisumu District.

2.5 Discuss the following data collection techniques highlighting their merits and
demerits: -
(a) Sampling technique
(b) Census method

2.6 Discuss using practical examples the Sampling Techniques below: -


(a) Simple random sampling
(b) Stratified random sampling
(c) Systematic random sampling
(d) Quota sampling
(e) Radial sampling
(f) Multi-stage sampling

2.7 Define “statistical units”

2.8 Describe, with specific examples, any three types of statistical units and the
properties of a good statistical unit.

2.9 Identify appropriate statistical units you would use if you were to investigate the
problem areas in item 2.4.

2.10 Discuss the following data collection tools, indicating the requirements for each:
(a) Interview
(b) Observation
(c) Questionnaires

2.11 List the characteristics of a good questionnaire.

________________________________________________________________________
Lesson Three: Data Organization
________________________________________________________________________

3.0 Objectives:

At the end of this topic, the student should be able to describe the data
organization and presentation procedures. Specifically, the student should be able
to: -

• Define data collection units for a statistical investigation.
• Apply various types of statistical data classification approaches to
organize collected data.
• Describe various types of statistical tables and apply the concept to
organize statistical data.
• Form frequency distribution tables.

3.1 Introduction

Data as collected first hand is usually in a form from which no meaningful
analysis and conclusions may be drawn. Hence there is a need to organize data
to unravel the hidden characteristics necessary for decision making.
Fundamentally, the reasons for organizing data are to:-

• Present data in a form that is analyzable.
• Unravel the hidden features of the data so that the system's
behaviour can be identified.
• Compare the performance of two or more phenomena so that meaningful
conclusions may be drawn.

3.2 Data Classification

This is the systematic grouping of data items into homogeneous classes whose
characteristics are similar. For instance, a group may be classified by gender and
then by result (pass or fail):

    Gender
        Male:   Pass, Fail
        Female: Pass, Fail

If an item can be classified into 2 groups then such a classification is called
dichotomous classification.

Example: Family Budget

Consider a situation where the problem involves the investigation of the
distribution and utilization of the family budget. The classification process would
be as below:

    Family Budget
        Food:     Drinks, Groceries
        Clothing: Adult (Male, Female), Children (Male, Female)
        Fees

3.3. Types Of Classification

There are generally 4 types of classification: -

3.3.1 Geographical classification:

This is dependent on geographical regions, e.g. Country, Province, District,
Location, University, Estate, Village and so on. The regions within a class must
have similar features based on the investigator's assumptions.

3.3.2 Chronological Classification

This type of classification is with respect to the time of occurrence i.e. it is time
based e.g. time of birth, date of manufacture.

Example: Crop Yield by year


Year Crop Yield
1999 10,000
2000 50,000
2001 90,789
2002 87,000

3.3.3. Qualitative Classification

This is the type of classification that is attribute based, i.e. the behaviour cannot be
expressed in numerical terms.

Example: Kenyan Students

    Male:   Intelligent, Stupid
    Female: Intelligent, Stupid

In most cases, for purposes of analysis, the attributes must be converted into
numerical form through coding.

Example

Let, Male = 1, Female = 0, Intelligent = 2, and Stupid = 3
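
Coding of this kind amounts to a simple look-up. In the sketch below the codes
are those given above, while the three student records are invented purely for
illustration.

```python
# Codes as defined in the example above.
codes = {"Male": 1, "Female": 0, "Intelligent": 2, "Stupid": 3}

# Hypothetical records: (gender, attribute) for three students.
records = [("Male", "Intelligent"), ("Female", "Stupid"), ("Female", "Intelligent")]

# Replace each attribute by its numerical code.
coded = [(codes[gender], codes[attribute]) for gender, attribute in records]
print(coded)   # [(1, 2), (0, 3), (0, 2)]
```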

3.3.4 Quantitative Classification

This type of classification is based on variables [discrete or continuous] that can
take numerical values. Prior to undertaking quantitative classification, the
variables must be defined, i.e. represented by a letter of the alphabet, by
subscripted letters e.g. x1, x2, ... or x11, x12, x13 etc., or by acronyms like
Kenyatta = Ken, Firm = Frm etc. This is the most commonly used type of
classification in data organization for statistical analysis and subsequent
interpretation, e.g. in the frequency distribution table.

3.3.5 Data Tabulation.

The second step after data classification during data organization is tabulation.
Statistical tabulation is data organization involving the arrangement of data using
the classes [groups]. The items are arranged into rows and columns, where the rows
run from left to right and the columns from top to bottom. Generally a statistical
table has the following components:-

• Heading: Summarizes the content of the table, e.g. Kenya’s population.
• Head Note: An extra comment on the content of the table, e.g. [1990-1997].
• Column Heading [caption]: Describes the content of each column. Each
column heading may have a sub-heading or sub-caption.
• Row Heading [stub]: Describes the content of each row and may have a
sub-row heading.
• Foot-Note: An extra comment, normally written below the table,
explaining some unusual item within the table, usually indicated by an
asterisk.
• Source-Note: Indicates the source of the table, normally for tables
obtained from a secondary source.
• Totals/Subtotals: These show the row/sub-row and column/sub-column
totals.
• Body: The intersection between the rows and columns where data entry
occurs. It is worth noting that the sum of the row totals = the sum of the
column totals = the Grand Total.

Activity 3a

A tobacco-manufacturing firm carried out a market survey in Nairobi and
Kisumu. The survey involved investigating the users of its 8 brands of
cigarettes. A sample of 200 was selected from each of the two towns with the
following results. Nairobi: ratio of Male to Female is 3:5, Users = 60%,
Female users = 60%. Kisumu: Male:Female = 4:6, Non-users = 40% and
Male users = 50%. Form a concise table of the information provided.

3.3.6 Formation of Frequency Distribution Table

Apart from data organization through classification and tabulation, raw data may
be organized by putting them in a frequency table. A frequency is the number of
occurrences of data items within a homogeneous grouping or class.

Procedure.
• Define the variable that represents the data items using any letter of the
alphabet or its subscripted form.
• Determine the number of homogeneous classes and the type of classes to use.

Example:
Consider the following data representing the number of packets of milk bought by
80 households in Nairobi:
20 55 55 21 25 25 26 30 30 35
30 30 30 30 30 30 30 30 30 30
30 30 25 25 25 35 30 30 30 35
35 35 40 40 40 35 21 60 65 20
40 40 40 40 40 40 25 25 25 25
40 40 40 40 40 45 45 45 45 45
50 50 50 55 55 60 60 60 35 30
35 30 35 40 50 40 30 26 26 26

Determine the Frequency Distribution of the above data.

Packets of Milk (X)    Tally                    Families Consuming (f)

20                     //                        2
21                     //                        2
25                     //// ////                 9
26                     ////                      4
30                     //// //// //// ////      20
35                     //// ////                 9
40                     //// //// //// /         16
45                     ////                      5
50                     ////                      4
55                     ////                      4
60                     ////                      4
65                     /                         1
                                            ∑f = 80
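
The tallying can be reproduced mechanically. The sketch below counts the 80
observations listed above using Python's collections.Counter; the variable names
are assumptions made for the illustration.

```python
from collections import Counter

# The 80 household observations, row by row as listed above.
packets = [
    20, 55, 55, 21, 25, 25, 26, 30, 30, 35,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 25, 25, 25, 35, 30, 30, 30, 35,
    35, 35, 40, 40, 40, 35, 21, 60, 65, 20,
    40, 40, 40, 40, 40, 40, 25, 25, 25, 25,
    40, 40, 40, 40, 40, 45, 45, 45, 45, 45,
    50, 50, 50, 55, 55, 60, 60, 60, 35, 30,
    35, 30, 35, 40, 50, 40, 30, 26, 26, 26,
]

frequency = Counter(packets)              # value -> number of occurrences
for value in sorted(frequency):
    print(f"{value:>3}  {frequency[value]:>3}")
print("Total", sum(frequency.values()))
```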

From this frequency distribution a number of observations are now possible:-

• How many packets of milk do most families consume?
• How many packets of milk do the fewest families consume?
• Do most families consume a larger or a smaller number of packets of milk?

This mode of forming a frequency distribution is only appropriate where the values of
the variable under investigation are not highly dispersed; otherwise it would not
condense the data as required. For such cases, a second mode of forming a frequency
distribution would be most appropriate:

3.3.6.1 Grouped Frequency Distribution

For this type of frequency distribution, the variable values are placed in
classes or groups, and the number of variable values falling in each class is
established.

Procedure

Group the variable values into classes; in theory, between 5 and 20 classes may
be appropriate. However, an objective number of classes may be obtained from
Sturges' rule.
Let the number of classes be η and the number of observations be N. Then
η = 1 + 3.322 log N
Since N = 80,
η = 1 + 3.322 log 80
  = 1 + 3.322 [log (8 × 10)]
  = 1 + 3.322 [3 log 2 + log 10]
  = 1 + 3.322 [3 × 0.3010 + 1]
  = 1 + 3.322 [1.903]
  = 7.32 ≈ 7, since the number of classes must be a whole number.
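
The calculation can be checked with a short sketch. It assumes the commonly
quoted form of Sturges' rule, 1 + 3.322 log N with logarithms to base 10
(equivalently 1 + log N to base 2); the function name sturges_classes is an
assumption for the illustration.

```python
import math

def sturges_classes(n_observations):
    """Number of classes suggested by Sturges' rule, rounded to a whole number."""
    return round(1 + 3.322 * math.log10(n_observations))

print(sturges_classes(80))   # 7
```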

Find the class size, c. The class size must have the same number of decimal
places as the variable values. It is obtained from
c = R / η, where the range R is the difference between the highest and the lowest
variable value. For example, for a lowest value of 20 and a highest value of 65,
c = 45 / 7 ≈ 6. Form the classes and hence the frequency distribution. In forming
the classes, start with the least value and proceed at an interval of the class size
until the highest variable value is included in the last class. Using the notation
above, let X = number of packets of milk. The 2 types of tables are then as below:-

(i) The inclusive type

X        Tally                        f
20-25    //// //// ///               13
26-31    //// //// //// //// ////    24
32-37    //// ////                    9
38-43    //// //// //// /            16
44-49    ////                         5
50-55    //// ///                     8
56-61    ////                         4
62-67    /                            1
                                 Σf = 80

(ii) The exclusive type

X        Tally                        f
20-26    //// //// ///               13
26-32    //// //// //// //// ////    24
32-38    //// ////                    9
38-44    //// //// //// /            16
44-50    ////                         5
50-56    //// ///                     8
56-62    ////                         4
62-68    /                            1
                                 Σf = 80
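
The grouping step can be sketched in code as well. The example below rebuilds
the 80 observations from the ungrouped frequency distribution of packets of milk
and places them in inclusive classes of size 6 starting at 20; the function name
grouped_frequencies and its parameters are assumptions for the illustration.

```python
from collections import Counter

def grouped_frequencies(values, lowest, class_size, n_classes):
    """Count values in inclusive classes lowest..lowest+class_size-1, and so on."""
    counts = Counter((v - lowest) // class_size for v in values)
    table = []
    for i in range(n_classes):
        lower = lowest + i * class_size
        upper = lower + class_size - 1
        table.append((lower, upper, counts.get(i, 0)))
    return table

# Ungrouped distribution of packets of milk: value -> number of families.
ungrouped = {20: 2, 21: 2, 25: 9, 26: 4, 30: 20, 35: 9,
             40: 16, 45: 5, 50: 4, 55: 4, 60: 4, 65: 1}
values = [x for x, f in ungrouped.items() for _ in range(f)]

table = grouped_frequencies(values, lowest=20, class_size=6, n_classes=8)
for lower, upper, f in table:
    print(f"{lower}-{upper}  {f}")
print("Total", sum(f for _, _, f in table))
```

For the exclusive type the same counts apply with each upper limit written one
unit higher, since a value equal to the upper limit belongs to the next class.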

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Englewood Cliffs. Prentice Hall, New Jersey (Available at University
Library)

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Englewood Cliffs. Prentice Hall, New Jersey (Available at
University Library)

________________________________________________________________________
Exercise 3: Data Organization
________________________________________________________________________

3.1 Discuss the concept of “Data Organization”, highlighting the output resulting
from the process.

3.2 What is data classification and what criteria should collected data satisfy prior to
its being classified?

3.3 Using specific examples illustrate the following types of classification:


(a) Chronological
(b) Geographical
(c) Qualitative
(d) Quantitative

3.4 Illustrate the classification structure of the following data variables, explaining
the criteria for your decision.

(a) Kenya’s crop yield pattern
(b) Population of matatus on Kenya's roads
(c) Manufacturing firms from 1910-1980
(d) Mother and child prevalent diseases.

3.5 Discuss the problems with data classification

3.6 State what is meant by "Statistical Tabulation"

3.7 List the components, and hence sketch the skeleton, of a Statistical Table.

3.8 Using a practical example illustrate the following types of a Statistical Table.
(a) One-way Table
(b) Two-way Table
(c) Three-way Table
(d) Manifold Table

3.9 Complete the following Statistical Table relating to the use and non-use of
Tobacco in the two indicated Kenyan towns by gender, showing all your
workings.

                                        Town
                        Kisumu                         Nairobi
Tobacco      Male   Female   Sub total     Male   Female   Sub total    Total
Users        500    -        900           -      700      -            -
Non users    -      100      -             300    -        -            3000
Total        -      -        2500          -      -        3500         6000

The ratio of Male:Female patients visiting Komala Health Centre each year from
1990-1994 is 2:8. Of the yearly male and female patients, the ratios of
Children:Adults are 7:5 and 5:4 respectively. Given that the 1990, 1991, 1992,
1993 and 1994 patients were 8000, 6000, 3600, 3200 and 9000 respectively, form a
well annotated Statistical Table based on the above information.

3.10 State what is meant by the following terms:


(a) Frequency
(b) Class limit
(c) Class interval
(d) Inclusive classes
(e) Exclusive classes
(f) Open-ended classes

3.11 If a frequency distribution has open-ended classes, State:


(a) The open-ended bridging procedure.
(b) The reason for bridging of the open-ends.

3.12 The following data represent the number of hours worked per month for 98
sugarcane cutters.

78 72 72 84 80 64 68 72 44 82 66 36 58 56
104 66 80 78 82 58 84 78 64 80 80 70 72 56
74 80 66 70 78 74 82 76 56 78 56 20 30 90
74 82 64 62 68 78 60 80 66 54 54 50 56 90
52 72 56 80 76 32 86 62 82 70 20 46 90 90
90 66 46 88 50 46 46 56 76 72 52 74 54 86
80 80 84 88 62 80 78 78 84 68 72 84 90 56

Form a frequency distribution for the above data

3.13 The data below indicate the heights of 20 coffee trees sampled from a large
scale farm: 4.8, 2.4, 2.8, 4.0, 2.8, 3.6, 4.2, 5.0, 4.4, 2.5, 4.9, 1.5, 4.7, 3.6, 2.8, 1.4,
3.5, 3.6, 4.6, 2.6. Construct a frequency distribution for the above.

________________________________________________________________
Lesson Four: Diagrammatic and Graphical Data Presentation
________________________________________________________________

4.0 Objectives

At the end of this lesson, the student should be able to diagrammatically, graphically and
analytically illustrate statistical data. Specifically, the student should be able to: -

Describe, construct and interpret the following diagrams used to illustrate statistical data:-
(a) Simple Bar Diagrams
(b) Component Bar Diagrams
(c) Stacked bar diagrams
(d) Simple and multiple pie charts.
(e) Histograms
(f) Frequency Polygons

4.1 Introduction
In this lesson, the procedures for constructing statistical diagrams and graphs will be
discussed. The students will be expected to cover diagrammatic data presentation
(simple, component and stacked bar graphs, and pie charts) and graphical presentation,
which includes histograms and frequency polygons. The power of diagrammatic and
graphical presentation as tools for interpreting the characteristics of data shall be
appreciated.

4.2 Diagrammatic Data Presentation


4.2.1 Simple Bar Diagram

In these diagrams, vertical or horizontal bars are drawn whose length is proportional to
the magnitude (size) of the variable. In order to construct bar charts, a suitable scale is
chosen either at the bottom (for horizontal bar charts) or on the side (for vertical bar
charts), the choice between the two being optional. Bars should have appropriate widths,
constant inter-bar widths, and heights that are proportional to the frequencies. Simple
bar charts do not allow for the comparison of more than one variable.

Activity 4a:
The data below represents the number of trade licenses issued to Jua Kali artisans from
1990-1995.

YEAR    NO. OF LICENCES (‘000)

1990    10
1991    15
1992    17
1993    25
1994    30
1995    40

Construct a bar chart to describe the above data and interpret the result.

Solution: -

The procedure is to:

(i). Decide whether to draw a horizontal or a vertical bar chart.
(ii). Select an appropriate scale for the variable (No. of licenses issued),
starting from zero.
(iii). Select appropriate bar width and inter-bar distances.
(iv). Construct the bars for all the years (see chart 4.1).
(v). Add a title and label the vertical and horizontal axes.

[Chart 4.1: Number of Licences. Vertical axis: Number of Licences, scaled from 0 to
45 in steps of 5; horizontal axis: Year.]

Interpretation

There is a general increase in the number of trade licenses issued from 1990-1995.
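
As a rough illustration of bar lengths being proportional to the data, the sketch
below prints a horizontal text bar chart of the licence data, one mark per
thousand licences. The helper name bar and the choice of scale are assumptions; a
plotting package would normally be used for a finished chart.

```python
# Trade licences issued ('000), 1990-1995, from the activity above.
licences = {1990: 10, 1991: 15, 1992: 17, 1993: 25, 1994: 30, 1995: 40}

def bar(value, units_per_mark=1):
    """Return a bar whose length is proportional to value, starting from zero."""
    return "#" * (value // units_per_mark)

bars = {year: bar(count) for year, count in licences.items()}

print("Number of Licences ('000) by Year")
for year, count in licences.items():
    print(f"{year} | {bars[year]} {count}")
```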

4.2.2. Relative Bar Charts

These consist of two types: Stacked/Multiple Bar Charts and Component Bar Charts.
Unlike the simple bar charts, they allow for comparing the characteristics of more than
one data set. The basis for their construction may either be absolute data values or
percentages.

4.2.2.1. Multiple/Stacked Bar Graphs

The data set below indicates the profits realized by three firms (I, II and III)
between 1990 and 1992.

FIRM 1990 1991 1992


I 10 40 20
II 20 20 40
III 10 40 30

Activity 4b:
Using the above data Construct the stacked bar graphs based on

i. Absolute Data Values


ii. Percentages

Solution Procedure

(i). Identify the highest variable value to enable you to construct either an
appropriate vertical or horizontal scale for the data starting from zero.
(ii). Determine appropriate bar widths and inter stacked bars distances. These
must be constant.
(iii). For each year, construct stacked bars for the three firms using the
measurements.
(iv). Label both the horizontal and vertical axes as well as the title
(v). Construct an appropriate key (legend) for the bars
(vi). Interpret the picture exhibited.

[Chart: Total Tea Output (1990-1992). Bars for firms I, II and III in each year;
vertical axis: Output, scaled from 0 to 45; horizontal axis: Year.]

Interpretation:

Interpretation may be within a year (comparing the firms) or for the same variable
across the years.

The procedure for the percentage-based stacked bar graphs is the same as above, only
that the percentage distribution for each year is used, as shown below:-

[Chart: Total Tea Output (1990-1992), percentage basis. Bars for firms I, II and III in
each year; vertical axis: % Output, scaled from 0 to 60; horizontal axis: Year.]
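
The percentage distribution for each year is obtained by dividing each firm's
value by that year's total. A minimal sketch using the data for firms I, II and III
follows; the variable names and the rounding to one decimal place are assumptions
for the illustration.

```python
# Values for firms I, II and III by year, from the data set above.
profits = {
    "I":   {1990: 10, 1991: 40, 1992: 20},
    "II":  {1990: 20, 1991: 20, 1992: 40},
    "III": {1990: 10, 1991: 40, 1992: 30},
}
years = [1990, 1991, 1992]

# Total across the three firms for each year.
totals = {year: sum(profits[firm][year] for firm in profits) for year in years}

# Each firm's share of the yearly total, as a percentage.
percentages = {
    firm: {year: round(100 * profits[firm][year] / totals[year], 1) for year in years}
    for firm in profits
}

print(totals)
for firm, shares in percentages.items():
    print(firm, shares)
```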

4.2.2.2 Component Bar Chart

Just like the multiple/stacked bar charts, they allow for comparing more than one data
set. Consider the data below indicating the profit margin in thousands of shillings for
three firms from 1990-1992.

FIRM 1992 1991 1990


I 20 40 10
II 40 20 20
III 30 40 10

Activity 4c:
Using the above data Construct the component bar graphs based on
(a) Absolute Data Values
(b) Percentages

Solution Procedure:

(i). For each year, find the cumulative totals across the firms to determine the
positions of the bars as part of the whole.

(ii). Using the highest cumulative total and starting from zero, determine the
appropriate scale for the height of the bars
(iii). Select a constant appropriate inter component bars distances and bar widths
for all the bars.
(iv). Interpret the result.
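
Step (i) can be illustrated with running totals. In the sketch below, each year's
list holds the values for firms I, II and III in that order, and the cumulative
totals give the heights at which each component of the bar ends; the variable
names are assumptions for the illustration.

```python
from itertools import accumulate

# Values for firms I, II and III (in that order) for each year.
profits_by_year = {
    1990: [10, 20, 10],
    1991: [40, 20, 40],
    1992: [20, 40, 30],
}

# Running totals: where each firm's component ends within the year's bar.
boundaries = {year: list(accumulate(values))
              for year, values in profits_by_year.items()}

# The tallest bar fixes the scale needed on the value axis.
highest_total = max(ends[-1] for ends in boundaries.values())

print(boundaries)      # {1990: [10, 30, 40], 1991: [40, 60, 100], 1992: [20, 60, 90]}
print(highest_total)   # 100
```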

[Chart: Total Tea Output (1990-1992). Component bars with firms I, II and III stacked
within each year; vertical axis: Output, scaled from 0 to 120; horizontal axis: Year.]

The procedure for constructing the percentage component bar graphs is similar to the
above, only that percentage values derived from the data are used, the result being as
below:

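[Chart: Total Tea Output (1990-1992), percentage basis. Component bars with firms I, II
and III stacked within each year; vertical axis: % Output, scaled from 0 to 100;
horizontal axis: Year.]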

4.2.2.3 Pie Charts.

A pie chart shows the relation of parts to the whole. Unlike multiple and component
bar charts, where the lengths of bars are compared, in a pie chart the areas of segments
[parts of a circle] are compared. Pie charts are of two types, simple and multiple, the
latter allowing for comparing more than one data set.

(a) Simple Pie Chart Construction Procedure

The procedure is to:

(i). Draw a circle.
(ii). Express the data values as degree equivalents.
(iii). Determine the proportion of the circle that each item takes.

Activity 4d:
The data below shows the use of a family budget. Construct a Pie Chart.

Item Expenditure [Ksh.]


Food 800
Clothing 1500
Rent 4500
Fees 3000
Health 1000

Solution Procedure is to: -

(a) Change the given values into degrees, i.e. (value ÷ total) × 360°

Item        Expenditure [Ksh.]    Degrees

Food           800                  27°
Clothing      1500                  50°
Rent          4500                 150°
Fees          3000                 100°
Health        1000                  33°
Total       10,800                 360°

(b) Draw a circle of a given radius


(c) Construct an initial base line.
(d) From the initial base line and using a protractor, measure out the proportions
above.
(e) Either shade and provide a key for each segment or indicate the names (labels) of
the items in each segment.
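
The conversion in step (a) gives each item the same share of 360° as its share of
the total expenditure. A short sketch of that arithmetic follows; rounding to the
nearest degree is an assumption made for the illustration.

```python
# Family budget in Ksh., from the activity above.
budget = {"Food": 800, "Clothing": 1500, "Rent": 4500, "Fees": 3000, "Health": 1000}

total = sum(budget.values())
# Angle of each segment: the item's share of the total, times 360 degrees.
degrees = {item: round(360 * amount / total) for item, amount in budget.items()}

print(total)     # 10800
print(degrees)   # {'Food': 27, 'Clothing': 50, 'Rent': 150, 'Fees': 100, 'Health': 33}
```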

[Pie chart: Family Budget [Ksh.]. Segments: Food 800, Clothing 1500, Rent 4500,
Fees 3000, Health 1000.]

(b) Multiple Pie Charts

If more than one data set is to be compared using pie charts, the steps are the same
as in (a) above except for the step of drawing the circle. To allow for a more
accurate comparison, the procedure for drawing the circles is to:

(i). Find the total of the items for each data set.
(ii). Find the ratio of the totals. These will be the radii of the pie charts for the
data sets.
(iii). Draw circles whose radii are equal to the sizes in the ratio.
(iv). Go to step (c) for each circle, and hence construct the pie charts.

Activity 4d:
Construct pie charts for the data below to allow you to compare expenditures for the 3
families and interpret.

FAMILIES
ITEM A B C
Food 2000 500 5000
Clothing 2000 400 7000
Rent 3000 1000 1500
Fees 800 200 2500
Health 1000 900 3000

Solution:
Following the above steps, the result is as below:-

Family
ITEM A B C
Food 2000 500 5000
Clothing 2000 400 7000
Rent 3000 1000 1500
Fees 800 200 2500
Health 1000 900 3000
Total 8800 3000 19000
Radius 3 1 6

[Figure: Three pie charts, "Family A: Household Budget" (Radius = 3), "Family B: Household Budget" (Radius = 1) and "Family C: Household Budget" (Radius = 6), each with segments Food, Clothing, Rent, Fees and Health.]

4.2.3 Histogram

A histogram is a diagram similar to, but not the same as, a bar diagram. The area of a bar
in a bar chart has no meaning while that of a bar in a histogram does. The histogram
displays the frequency density of occurrence of a range of data [vertically or horizontally].
A vertical histogram has the data values on the horizontal axis and the frequency density
on the vertical axis, while a horizontal histogram exhibits the data values [scaled by class
intervals] on the vertical axis with the frequency density [using the appropriate scale] on
the horizontal axis.

Procedure for construction

The procedure is as follows:

(i). If not already grouped, group the data values into a grouped frequency distribution.
(ii). Determine the frequency density of each class by dividing the class frequency by the class size.
(iii). Determine an appropriate scale for the frequency density axis. The data values
scale is given by the class intervals.
(iv). Draw either an exclusive class interval based histogram [inter bar distance =
0] or an inclusive class interval based histogram [inter bar distance = inter class
gap size].

Activity 4e:
Consider the data below relating the number of pills and sleeping time and construct a
histogram for the data.

Sleeping Time    Number of Pills

0-16                  32
16-48                108
48-80                 42
80-112                18

Solution:
Let c = Class size, X = Sleeping time, fi = Number of pill users

X          fi     c     fi/c
0-16       32    16    2
16-48     108    32    3.375
48-80      42    32    1.3125
80-112     18    32    0.5625
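
A minimal Python sketch of the frequency density step, assuming the class size is taken as the difference between the upper and lower boundary of each class. The variable names are illustrative only.

```python
# Class boundaries and frequencies from the sleeping time activity
classes = [(0, 16), (16, 48), (48, 80), (80, 112)]
frequencies = [32, 108, 42, 18]

# Frequency density = class frequency / class size
densities = [f / (upper - lower) for (lower, upper), f in zip(classes, frequencies)]

for (lower, upper), d in zip(classes, densities):
    print(f"{lower}-{upper}: {d}")
```

Because the bar height is the density, the area of each bar (height times class size) recovers the class frequency.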

[Figure: Histogram, "Sleeping Time". Vertical axis: Number of People as a Proportion of Class Size; horizontal axis: Sleeping Time Interval (0-16, 16-48, 48-80, 80-112).]

Activity 4d:
Histogram for ungrouped data: The data below represents the daily revenue (Ksh.'000)
obtained by 20 matatu owners plying the Kibera route:

5 6 4 5 7 8 4 3 9 4
6 4 6 8 9 8 7 3 4 5

(a) Form an exclusive class interval based frequency distribution.
(b) From the distribution in (a), construct a histogram.

4.3 Graphical Data Presentation

A graph is a representation of data movement that exhibits the relationship between two
variables. One of the variables is the independent variable and the other the dependent
variable. A decision maker can use a graph to explain some behaviour in data values.

4.3.1 Frequency polygon

If the class intervals are equal and the midpoints of these intervals are plotted against the
corresponding frequency density then the resulting graph is the frequency polygon. The
area under a frequency polygon and the histogram derived from the same data are equal.

[Figure: Frequency polygon, "Sleeping Time". Vertical axis: Number of People as a Proportion of Class Size (0 to 8); horizontal axis: Sleeping Time Interval (0-16, 16-48, 48-80, 80-112).]

4.3.2 Frequency Curves

The histogram and the frequency polygon tend to a frequency curve as the number of
observations becomes greater and greater and the class intervals become smaller and
smaller. Such curves can be obtained by smoothing the frequency polygon.

4.3.2.1 Cumulative Frequency Curve

Procedure:

The procedure is to:

(i). If the data is not grouped, group it into appropriate classes and hence
determine the less-than or the greater-than cumulative frequency distribution.
(ii). For the less-than (greater-than) cumulative frequency distribution, plot the upper
limit (lower limit) of every class against the corresponding cumulative frequency.
(iii). Join the resulting points with a smooth curve. The cumulative
frequency curves are sometimes called OGIVES.
(iv). Apart from using absolute values to construct the ogives, a percentage ogive
can be constructed. The procedure is to:
(a) If the data is not grouped, group it into classes.
(b) Compute the cumulative frequencies and convert them into % cumulative
frequencies.
(c) Plot the resulting % cumulative frequencies against the corresponding
limits and join the plotted points with a smooth curve.

Note: If the less-than and the greater-than ogives are plotted on the same graph, then the
value of the variable at the point where they intersect is the median of the distribution.

Activity 4f:
Construction of Cumulative Frequency Curves: Find the less-than and the greater-than
cumulative frequencies and hence plot the results against the appropriate limits using the
data in the table below:

Profit           Number of Firms

5.5 - 10.5             4
10.5 - 15.5           12
15.5 - 20.5            8
20.5 - 25.5            6

Solution:

Lower Limit   Upper Limit   fi    CFL   CFG

5.5           10.5           4     4    30
10.5          15.5          12    16    26
15.5          20.5           8    24    14
20.5          25.5           6    30     6

Where CFL is the less-than cumulative frequency while CFG is the greater-than cumulative
frequency.
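
The two cumulative frequency columns can be reproduced with a short Python sketch. The variable names are assumptions chosen for the illustration.

```python
from itertools import accumulate

lower_limits = [5.5, 10.5, 15.5, 20.5]
upper_limits = [10.5, 15.5, 20.5, 25.5]
freqs = [4, 12, 8, 6]

# Less-than cumulative frequency: running total from the first class
cfl = list(accumulate(freqs))
# Greater-than cumulative frequency: running total from the last class
cfg = list(accumulate(reversed(freqs)))[::-1]

# Points to plot: upper limits for the less-than ogive,
# lower limits for the greater-than ogive
less_than_points = list(zip(upper_limits, cfl))
greater_than_points = list(zip(lower_limits, cfg))
print(less_than_points)
print(greater_than_points)
```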

[Figure: Less-than and greater-than ogives, "Profit Accruing". Vertical axis: Cumulative Number of Firms (0 to 35); horizontal axis: Profit Values (0 to 30).]

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Shama, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Prentice Hall, Englewood Cliffs, New Jersey (Available at University
Library).

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Prentice Hall, Englewood Cliffs, New Jersey (Available at
University Library).

________________________________________________________________________
Lesson Four Exercises: Diagrammatic and Graphical Data Presentation
________________________________________________________________________

Use the data below to construct diagrams and graphs in questions 4.1-4.5 and interpret
the result.

Number of Employees
Income (Ksh.‘000) Firm A Firm B
5-9 8 15
10-14 12 20
15-19 14 25
20-24 30 30
25-29 25 15
30-34 22 35
35-39 18 10
40-44 15 15
45-49 10 12

4.1 Simple Bar Diagrams


4.2 Component Bar Diagrams
4.3 Stacked bar diagrams
4.4 Histograms
4.5 Frequency Polygons

The data below represents the daily revenue (‘Ksh.00) obtained by 50 Matatu owners
plying Kibera route:
12 25 36 40 35 75 58 64 13 24
29 44 36 74 46 78 39 28 67 43
44 15 25 36 15 55 45 29 50 40
23 45 54 10 56 23 45 29 30 28
23 57 78 67 70 59 60 23 45 67

4.6 Form an exclusive class interval based frequency distribution.
4.7 From the distribution in 4.6, construct a histogram and hence a frequency polygon.

________________________________________________________________
Lesson Five: Measures of Central Tendency.
________________________________________________________________

5.0 Objectives

At the end of this lesson, the student should be able to determine and interpret various
measures of Central Tendency. Specifically, the student should be able to: -

(a) Define the concept of descriptive statistics and measures of central tendency.
(b) Distinguish between various measures of central tendency.
(c) Calculate and interpret for both ungrouped and grouped data the following
measures of central tendency:
(i). Arithmetic Mean
(ii). Median
(iii). Mode
(iv). Harmonic Mean
(v). Geometric Mean
(d) Discuss the merits and demerits of the measures of Central Tendency cited
above.

5.1 Introduction

In this lesson, we will cover the single-value measures of statistical characteristics from
which statistical inferences can be made about the behaviour of a phenomenon on which
data is collected. These measures are broadly categorized into two: Measures of Central
Tendency (Averages) and Measures of Dispersion or Variability. A measure of central
tendency or average is a single value that represents, or is typical of, a set of data values
and about which the data items tend to cluster, while a measure of dispersion or variability
exhibits the extent of variation of individual data values from the average. Measures of
dispersion will be covered in Lesson Six.

5.2 Measures Of Central Tendency

These descriptive Measures of Central Tendency include: -

(a) Arithmetic mean


(b) Median
(c) Mode
(d) Harmonic Mean
(e) Geometric mean

5.2.1 Arithmetic Mean

The arithmetic mean of a sample of n values or observations x1, x2, …, xn, denoted by
$\bar{x}$, is given by $\bar{x} = \sum f_i x_i / \sum f_i$, where fi is the number of times
the ith value xi occurs. The arithmetic mean can be determined for both ungrouped and
grouped statistical data.

(a) Arithmetic Mean of ungrouped Data

Procedure:

(i). Given ungrouped data, find the sum of the values.


(ii). Divide the result in (i) by the number of values to obtain the arithmetic
mean.

Activity 5a:
Suppose a dairy farmer gets 10, 20, 8, 14, 16, 18, 12 and 41 litres from his 8 dairy cows every
week. Find the mean volume of milk obtained per week.

Solution:

Let X be the number of litres obtained, with x1 = 10, x2 = 20, x3 = 8, x4 = 14, x5 = 16,
x6 = 18, x7 = 12, x8 = 41. The average is obtained by summing all the values of X
and dividing by n = 8:

$\bar{x} = \sum x_i / n = 139 / 8 \approx 17.4$ litres per week.

Activity 5b: Data appearing with frequencies, f: The data below shows the marks
obtained by 20 students taking Business Statistics this open learning semester: 15, 30,
40, 15, 30, 30, 80, 50, 50, 80, 15, 30, 40, 20, 20, 70, 40, 50,70.

Solution

Let X = Mark scored and f = Number of students scoring mark x (Frequency).

X     Tally     Frequency (f)    xi f

15    ///            3             45
20    //             2             40
30    ///            3             90
40    /////          5            200
50    ///            3            150
70    //             2            140
80    //             2            160

                Σf = 20      Σxi f = 825

Since Σf = 20 and Σxi f = 825, $\bar{x} = \sum x_i f / \sum f = 825 / 20 = 41.25$.
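
The frequency table calculation can be checked with a small Python sketch. The dictionary `marks` simply restates the X and f columns of the table above; the name is an assumption for the example.

```python
# Mark scored -> number of students (frequency), taken from the table above
marks = {15: 3, 20: 2, 30: 3, 40: 5, 50: 3, 70: 2, 80: 2}

total_f = sum(marks.values())                    # sum of f
total_xf = sum(x * f for x, f in marks.items())  # sum of x * f
mean_mark = total_xf / total_f

print(total_f, total_xf, mean_mark)
```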

Activity 5c:
Generally, for ungrouped data, suppose the observations x1, x2, x3, …, xi, …, xn occur with
respective frequencies f1, f2, f3, …, fi, …, fn. Find the arithmetic mean.

Solution:
The arithmetic mean is determined as below.

X      F       XF

x1     f1      x1 f1
x2     f2      x2 f2
.      .       .
.      .       .
xi     fi      xi fi
.      .       .
.      .       .
xn     fn      xn fn

       ∑ fi    ∑ xi fi

Hence the Arithmetic Mean is $\bar{x} = \sum x_i f_i / \sum f_i$.

(b) Arithmetic Mean of Grouped Data

Whenever data is grouped, those values falling in a given class interval are assumed as
having the same value as that of class representative or mid point.

Activity 5d:
Mean of Grouped Data: From the frequency distribution table given below find the
arithmetic mean.

Wages ('000)     Number of Employees
2 - 4                  6
4 - 6                  2
6 - 8                  4
8 - 10                 6
10 - 12                2

Solution

(a) Method 1:

Let X = Wage class, x̂ = Mid-point and f = Number of employees.

X          x̂      f      x̂ f
2 - 4      3      6      18
4 - 6      5      2      10
6 - 8      7      4      28
8 - 10     9      6      54
10 - 12    11     2      22
           ∑ fi = 20     ∑ x̂i fi = 132

Hence the arithmetic mean $\bar{x} = \sum \hat{x}_i f_i / \sum f_i = 132 / 20 = 6.6$.

(b) Method 2: Coding Method:-

If the values being dealt with are large, then the best approach is to use the coding
method.

The procedure is to:-

(i). Select the midpoint (in the case of grouped data) or the actual data value (in the
case of ungrouped data) corresponding to the highest frequency.
(ii). Subtract the value selected in (i) from each of the values (midpoints or actual
values).
(iii). If the results in (ii) have a common factor, divide each of them by the
common factor.
(iv). Using the results in (iii), apply the standard method to find the arithmetic mean.
The result is called the false mean, denoted by $\bar{x}_F$.
(v). Determine the true mean $\bar{x}$ by multiplying the false mean by the common
factor used in the coding process and adding the value selected in (i).

The Coding Method Example

Let x̂c = x̂ − 9 and d = ½ x̂c.

X          x̂      f      x̂c     d      df
2 - 4      3      6      -6     -3     -18
4 - 6      5      2      -4     -2     -4
6 - 8      7      4      -2     -1     -4
8 - 10     9      6       0      0      0
10 - 12    11     2       2      1      2
           ∑ fi = 20                   ∑ di fi = -24

Hence the false mean $\bar{x}_F = -24 / 20 = -1.2$.

Therefore, $\bar{x} = (-1.2 \times 2) + 9 = 6.6$.
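
As a cross-check, the coding method and the direct method can be compared in Python. This is a sketch only; `assumed` (the value selected in step (i)) and `factor` (the common factor in step (iii)) are names introduced for the illustration.

```python
midpoints = [3, 5, 7, 9, 11]
freqs = [6, 2, 4, 6, 2]

assumed = 9   # midpoint selected in step (i)
factor = 2    # common factor used in step (iii)

# Steps (ii) and (iii): subtract the selected value, then divide by the common factor
coded = [(x - assumed) / factor for x in midpoints]

# Step (iv): false mean of the coded values
false_mean = sum(d * f for d, f in zip(coded, freqs)) / sum(freqs)

# Step (v): undo the coding to recover the true mean
true_mean = false_mean * factor + assumed

# Direct method, for comparison
direct_mean = sum(x * f for x, f in zip(midpoints, freqs)) / sum(freqs)

print(false_mean, true_mean, direct_mean)
```

Both routes give the same mean, which is the point of the coding method: it only simplifies the arithmetic.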

Advantages of Mean:

(i). It is a more representative average since it is based on all values.
(ii). It can be used for further mathematical treatment.
(iii). It is easy to calculate.

Disadvantages of Mean:

(i). It is not appropriate for open-ended classes, as it would have to use estimated
class representatives.
(ii). It is affected by outliers (extremely small/large values).
(iii). It is rigid.

5.2.2 The Median

The median is a positional average or measure of central tendency. The median Md of N
observations of a data set depends on whether N is odd or even.

(a) Median for Ungrouped Data

Case 1. When the number of Observations N is Odd

Md is the middle observation when the data is arranged in ascending or descending
order. The procedure for determining the median is to:-

(i). Arrange the data in ascending or descending order.
(ii). Select the middle value. This value is the median Md.

Activity 5e.

Odd number of Observations. Consider the following data: 10, 5, 10, 25, 12.

Solution. Arranging the data in ascending order results in: 5, 10, 10, 12, 25.
The median, Md = 10.

Case 2. When the number of Observations N is Even

Md is the arithmetic mean of the two middle values when the data is arranged in
ascending or descending order. The procedure is to:-

(i). Arrange the observations in ascending or descending order.
(ii). Select the two middle values.
(iii). Find the arithmetic mean of the values obtained in (ii). The result is the
median.

Activity 5f.
Even number of Observations: Consider the following data: 10, 5, 10, 25, 12, 25.

(i). Arrangement of the data in ascending order: 5, 10, 10, 12, 25, 25.
(ii). The median is the arithmetic mean of the two middle values. Hence the median
$Md = (10 + 12) / 2 = 11$.
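
The two cases can be combined in one short Python function. The function name `median` is chosen for the illustration; Python's standard library offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Median of ungrouped data: middle value (odd N) or mean of the two middle values (even N)."""
    ordered = sorted(values)          # step (i): arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # Case 1: N is odd
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # Case 2: N is even

md_odd = median([10, 5, 10, 25, 12])
md_even = median([10, 5, 10, 25, 12, 25])
print(md_odd, md_even)
```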

(b) Median For Grouped Data

There are two methods for determining the median of grouped data: the analytical and the
graphical methods.

(I) The Analytical Method.

The formula method uses the model $Md = L + \frac{c_m}{f}\left(\frac{N}{2} - CFL\right)$,
where L = Lower class boundary of the median class, c_m = class size of the median class,
N = Number of observations, CFL = Less-than cumulative frequency of the class preceding
the median class, and f = frequency of the median class.

Activity 5g
Analytical method Example: Consider the data below showing the weekly number of
packets of milk, X, bought by families, F, staying in a slum area.

X          x̂      F
2 - 4      3      6
4 - 6      5      2
6 - 8      7      4
8 - 10     9      6
           ∑ fi = 18

The median number of packets of milk will be that which is bought by the family
occupying the middle position.

Solution

The procedure is to:-

(i). Compute the less-than or the greater-than cumulative frequency.

X          x̂      f      CFL
2 - 4      3      6      6
4 - 6      5      2      8
6 - 8      7      4      12
8 - 10     9      6      18
           ∑ fi = 18

(ii). Determine the median class. Since N = 18, the median occupies the N/2 = 9th
position, which falls in the class 6 - 8. Given L = 6, c_m = 8 - 6 = 2, N = 18, f = frequency
of the median class (6 - 8) = 4 and CFL = cumulative frequency of the class preceding the
median class = 8, then
$Md = 6 + \frac{2}{4}\left(\frac{18}{2} - 8\right) = 6 + 0.5 = 6.5$

(II) Graphical Method

This method uses either the less-than or the greater-than ogive (cumulative frequency
graph). The procedure is to:-

(i). Draw either the less-than or the greater-than ogive.
(ii). On the vertical (cumulative frequency) axis, mark off the median
position.
(iii). Draw a horizontal line from the point obtained in (ii) above to the curve; then,
from the point where the line touches the ogive, draw a vertical line
perpendicular to the value axis.
(iv). Read off the median value on the horizontal axis (the point of intersection
between the line constructed and the horizontal axis).
(v). Alternatively,
• Draw both the less-than and the greater-than ogives. The two curves
intersect at a point.
• From the point of intersection, draw a vertical line to the value axis.
• Read off the median value as in (iv) above.

Method 1:

[Figure: Less-than ogive, "Graphical Method". Vertical axis: cumulative frequency (0 to 18); horizontal axis: X (0 to 12).]

Hence Md = 6.5
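
The analytical formula $Md = L + \frac{c_m}{f}\left(\frac{N}{2} - CFL\right)$ can be sketched in Python as follows. The function name `grouped_median` is hypothetical, and the class size is assumed to be the difference between the class boundaries (8 - 6 = 2 for the median class here).

```python
def grouped_median(boundaries, freqs):
    """Median of grouped data.

    boundaries: list of (lower, upper) class boundaries, in ascending order
    freqs:      class frequencies, in the same order
    """
    n = sum(freqs)
    cumulative = 0                      # less-than cumulative frequency so far
    for (lower, upper), f in zip(boundaries, freqs):
        if cumulative + f >= n / 2:     # this class contains the N/2-th position
            return lower + (upper - lower) / f * (n / 2 - cumulative)
        cumulative += f

packets = [(2, 4), (4, 6), (6, 8), (8, 10)]
families = [6, 2, 4, 6]
md = grouped_median(packets, families)
print(md)
```

Reading the less-than ogive at the 9th position with straight-line interpolation between the plotted points gives the same value.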

Method 2:

(i) Plot both the computed less-than and greater-than cumulative frequencies.
(ii) The value of the variable at the point of intersection, read to scale, is the
median.

X          x̂      F      CFL    CFG
2 - 4      3      6      6      18
4 - 6      5      2      8      12
6 - 8      7      4      12     10
8 - 10     9      6      18     6
           ∑ fi = 18

[Figure: "Alternative Method for Median Determination". Less-than and greater-than cumulative frequency curves. Vertical axis: Cumulative Frequencies (0 to 20); horizontal axis: Variable Value (0 to 12).]

Merits of the Median

The median is: -

(i). Not affected by outliers.
(ii). Appropriate for open-ended class intervals.
(iii). Rigidly defined.
(iv). Easy to calculate and interpret.

Demerits of the Median

The median is:-

(i). Not representative of all the observations since it is not based on all
values.
(ii). Not suitable for further mathematical treatment.
(iii). Affected by fluctuations in the data distribution.
5.2.3 The Mode

The mode is the value in a data set that occurs with the greatest frequency (most
frequently). If only one value in the data set has the highest frequency then the
distribution is unimodal; otherwise it is multimodal.

(a) Mode of Ungrouped Data

Activity 5h:

(i) A sample of 9 pineapples was selected and weighed, resulting in the
following weights: 3, 4, 5, 5, 5, 6, 6, 7. Find the modal weight.

(ii) Find the mode of the following distributions:-

x 1 2 3 4 5
f 4 3 9 12 7

[Figure: "Graphical Illustration (ii)". Frequency F (vertical axis, 0 to 14) plotted against X (horizontal axis, 0 to 6).]

The modal value = 4

(iii) The data below represents the time x taken to process different blends of an
item. Find the modal time taken and hence illustrate the distribution graphically.

x 2 4 6 8 10 12 14 16 18 20
f 4 9 13 20 10 8 12 15 20 9

The modal values are 8 and 18. This is a bimodal distribution.

[Figure: "Bimodal Distribution". Frequency F (vertical axis, 0 to 25) plotted against X (horizontal axis, 0 to 25).]
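
Both frequency tables above can be checked with a short Python sketch that returns every value sharing the highest frequency. The function name `modes` is an assumption for the example.

```python
def modes(freq_table):
    """Return all values whose frequency equals the highest frequency."""
    highest = max(freq_table.values())
    return [x for x, f in freq_table.items() if f == highest]

# Distribution with x = 1..5
unimodal = dict(zip([1, 2, 3, 4, 5], [4, 3, 9, 12, 7]))
# Processing time distribution with x = 2, 4, ..., 20
bimodal = dict(zip(range(2, 21, 2), [4, 9, 13, 20, 10, 8, 12, 15, 20, 9]))

print(modes(unimodal))   # one value: unimodal
print(modes(bimodal))    # two values: bimodal
```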

(b) Mode of Grouped Data

There are two methods for finding the mode of grouped data: the analytical and the
graphical methods.

Method 1: Analytical Method:

The procedure is to:-

(i). If the classes are of the inclusive type, convert them to the exclusive type;
otherwise proceed. (The conversion process involves determining the bridging factor
d = ½ gap, e.g. d = ½ x 1 = 0.5, and then subtracting the bridging factor from
every lower limit and adding it to every upper limit.)
(ii). Use the formula below to find the mode Mo:

$Mo = L + \frac{c_{mo}(f_{mo} - f_p)}{(f_{mo} - f_p) - (f_s - f_{mo})}$

where L = lower class boundary of the modal class, c_mo = class size of the modal class,
f_mo = the modal class frequency, f_p = frequency of the class preceding the modal class,
and f_s = frequency of the class succeeding the modal class.

Activity 5i:
Analytical Determination of the Mode: Consider the data below showing the
distribution of the number of 2kg Kimbo sold by 20 Kiosks in Eastlands per week.

Number of 2Kg Kimbo Number of Kiosks


0-4 2
5-9 12
10-14 4
15-19 2
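
A hedged Python sketch of the analytical method applied to the Kimbo data above. The function name `grouped_mode` is hypothetical, and treating a missing neighbouring class as having frequency 0 is an assumption of this sketch, not something stated in the module.

```python
def grouped_mode(limits, freqs, gap=1):
    """Mode of grouped data with inclusive class limits.

    limits: list of (lower limit, upper limit) per class
    freqs:  class frequencies
    gap:    gap between consecutive classes (1 for 0-4, 5-9, ...)
    """
    d = gap / 2                                   # bridging factor
    i = freqs.index(max(freqs))                   # position of the modal class
    lower = limits[i][0] - d                      # lower class boundary L
    upper = limits[i][1] + d                      # upper class boundary
    f_mo = freqs[i]
    f_p = freqs[i - 1] if i > 0 else 0            # assumption: 0 if no preceding class
    f_s = freqs[i + 1] if i < len(freqs) - 1 else 0   # assumption: 0 if no succeeding class
    return lower + (upper - lower) * (f_mo - f_p) / ((f_mo - f_p) - (f_s - f_mo))

kimbo_limits = [(0, 4), (5, 9), (10, 14), (15, 19)]
kiosks = [2, 12, 4, 2]
mo = grouped_mode(kimbo_limits, kiosks)
print(round(mo, 2))
```

Here the modal class is 5-9 with boundaries 4.5 and 9.5, so the mode falls between those two values.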

Method 2: Graphical Method.


The mode is determined using the histogram. The procedure is to:-

(i). Construct a histogram for the given distribution.
(ii). On the resulting diagram, identify the highest bar.
(iii). Join the top right corner of the highest bar to the top right corner of the
preceding bar, and the top left corner of the highest bar to the top left corner of the
succeeding bar. Mark the point of intersection E.
(iv). Draw a perpendicular line from E to the value axis (horizontal axis). Read off the
modal value, Mo.

[Figure: Histogram illustrating the graphical determination of the mode, Mo.]
Activity 5J:
Graphical Illustration: The data below shows the shoe sizes bought by female
customers at a shop in the month of December.

Shoe sizes (x) 3-4 5-6 7-8 9-10


Number of Customers (f) 4 9 13 20

Use the graphical method to determine the mode of the above distribution.

Merits and Demerits of the Mode

Merits

The mode is :-

(i). Not affected by outliers.
(ii). Appropriate for open-ended class based frequency distributions.
(iii). Easy to calculate and comprehend.

Demerits

The mode is :-

(i). Not based on all observations.
(ii). Not amenable to further mathematical treatment.
(iii). Affected most by fluctuations in the data distribution.

5.2.4 Harmonic Mean

The harmonic mean, Hm, is the reciprocal of the mean of the reciprocals of the variable
values in a data set.

(a) Harmonic Mean of Ungrouped Data

Given the variable X taking values x1, x2, …, xi, …, xn, occurring respectively with
frequencies f1, f2, f3, …, fi, …, fn, where N = ∑ fi, the procedure is to:-

(i). Find the reciprocals of the variable values, 1/x1, 1/x2, 1/x3, …, 1/xi, …, 1/xn.
Each reciprocal 1/xi appears with the same frequency fi as the value xi.
(ii). Determine $f_i / x_i$ for each value.
(iii). Find the sum of the reciprocals, $\sum_{i=1}^{n} f_i / x_i$.
(iv). Find the arithmetic mean of the reciprocals, $\frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i}$,
and hence the reciprocal of the result, given by
$Hm = \frac{1}{\frac{1}{N}\sum_{i=1}^{n} f_i / x_i}$. This is the Harmonic mean.

The above procedure may be summarized as in the table below:-

X      F       1/X      F/X
x1     f1      1/x1     f1/x1
x2     f2      1/x2     f2/x2
x3     f3      1/x3     f3/x3
x4     f4      1/x4     f4/x4
.      .       .        .
.      .       .        .
xi     fi      1/xi     fi/xi
.      .       .        .
.      .       .        .
xn     fn      1/xn     fn/xn
       ∑ fi             ∑ fi/xi

The mean of the reciprocals is defined by $\sum (f_i / x_i) / \sum f_i$. Hence the
Harmonic mean is: $Hm = \sum f_i / \sum (f_i / x_i)$.

Activity 5k:
Find the Harmonic Mean of the data set below: 40,20,25, 14, 10, 30.

(b) Harmonic Mean for Grouped Data

The procedure is as below:


(i) If not grouped, group the data.
(ii) Find the midpoint of each class. The midpoints will hence be the data
value estimates.
(iii) Use the procedure for ungrouped data to find the Harmonic Mean.

Activity 5l:
Harmonic Mean of Grouped Data: Find the Harmonic mean of the following
distribution.

Sleeping Time (T) Number of Users of Pill


2-4 2
4-6 1
6-8 3
8-10 2
10-12 2
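
The two harmonic mean activities above (the ungrouped data set and the grouped sleeping time distribution) can be worked through with the sketch below. The function name `harmonic_mean` and its optional `freqs` argument are assumptions made for the illustration; the standard library function `statistics.harmonic_mean` is used only as a cross-check.

```python
import statistics

def harmonic_mean(values, freqs=None):
    """Hm = (sum of f) / (sum of f / x); every f is 1 when no frequencies are given."""
    if freqs is None:
        freqs = [1] * len(values)
    return sum(freqs) / sum(f / x for x, f in zip(values, freqs))

# Ungrouped data set
hm_ungrouped = harmonic_mean([40, 20, 25, 14, 10, 30])

# Grouped data: class midpoints stand in for the data values
midpoints = [3, 5, 7, 9, 11]
users = [2, 1, 3, 2, 2]
hm_grouped = harmonic_mean(midpoints, users)

# Cross-check against the standard library
check = statistics.harmonic_mean([40, 20, 25, 14, 10, 30])
print(round(hm_ungrouped, 3), round(hm_grouped, 3), round(check, 3))
```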

Merits and Demerits of Harmonic Mean

Merits

Harmonic mean is:-

(i) Based on all observations.
(ii) Rigidly defined.
(iii) Appropriate for further mathematical treatment.
(iv) Not affected by fluctuations due to sampling.

Demerits.

Harmonic mean is:-

(i) Not easy to compute.
(ii) Difficult to interpret.
5.2.5 Geometric Mean

The geometric mean, Gm, of a distribution is the Nth root of the product of the variable
values of a data set. The general procedure is as follows: Let N = Number of observations
on variable x, with the value xi appearing with frequency fi. The table below illustrates the
procedure for determining the geometric mean.

X    x1   x2   x3   x4   x5   x6   ...   xi   ...   xn
F    f1   f2   f3   f4   f5   f6   ...   fi   ...   fn

$Gm = \left(x_1^{f_1} \times x_2^{f_2} \times x_3^{f_3} \times \dots \times x_i^{f_i} \times \dots \times x_n^{f_n}\right)^{1 / \sum_{i=1}^{n} f_i}$, where $\sum_{i=1}^{n} f_i = N$

In order to determine the geometric mean Gm, the logarithm approach is used as below:-

$\ln Gm = \ln \left(x_1^{f_1} \times x_2^{f_2} \times x_3^{f_3} \times \dots \times x_i^{f_i} \times \dots \times x_n^{f_n}\right)^{1/N}$

$= \frac{1}{N}\left(\ln x_1^{f_1} + \ln x_2^{f_2} + \ln x_3^{f_3} + \dots + \ln x_i^{f_i} + \dots + \ln x_n^{f_n}\right)$

$= \frac{1}{N}\left(f_1 \ln x_1 + f_2 \ln x_2 + f_3 \ln x_3 + \dots + f_i \ln x_i + \dots + f_n \ln x_n\right)$

$= \frac{1}{N}\sum_{i=1}^{n} f_i \ln x_i$

Hence $Gm = \operatorname{antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \ln x_i\right)$
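
The logarithm approach can be sketched in Python as below. The data in the example (values 2, 4, 8 with frequencies 1, 2, 1) are hypothetical and chosen only because the product, 2 x 4 x 4 x 8 = 256, has an exact 4th root; the function name `geometric_mean` is likewise an assumption.

```python
import math

def geometric_mean(values, freqs):
    """Gm = antilog( (1/N) * sum of f * ln x ), with N = sum of f. Values must be positive."""
    n = sum(freqs)
    log_mean = sum(f * math.log(x) for x, f in zip(values, freqs)) / n
    return math.exp(log_mean)        # the antilog of a natural logarithm

# Hypothetical data: product = 2 * 4**2 * 8 = 256, N = 4
gm = geometric_mean([2, 4, 8], [1, 2, 1])
print(round(gm, 6))
```

Working with logarithms avoids forming the very large product directly, which is why the module uses this route.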
Merits and Demerits

Merits
Geometric mean is:-
(i) Based on all observations.
(ii) Rigidly defined.
(iii) Appropriate for further mathematical treatment.
(iv) Not affected by fluctuations due to sampling.

Demerits.
Geometric mean is:-

(i) Not easy to compute.
(ii) Not appropriate if any of the variable values is zero.

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Shama, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Prentice Hall, Englewood Cliffs, New Jersey (Available at University
Library).

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Prentice Hall, Englewood Cliffs, New Jersey (Available at
University Library).

_____________________________________________________________________
Exercise 5: Descriptive Statistics
_____________________________________________________________________

5.1 Given N =7 and X i =2i +i2, Find (i) ΣXi (ii) 1/N ΣXi , where i=1,2,3,4,5,6,7.

5.2 For Zi = i -1, where i=2,4,6…10; Find (i) Σ Zi (ii) Σ(Zi)2 (iii)1/NΣZi

5.3 Suppose N = 4, and Xi is the ith variable value such that X1 = -2, X2 = 3,
X3 = -5, X4 = 7. Find (i) ΣXi (ii) Σ Xi (iii) ΣXi², (iv) 1/N ΣXi²

5.4 Give N=4, Xi=2i2-i, and yi=3i-1, for i=1,2,3,4.


Find i) Σ xi yi
ii) ΣxiΣyi
iii) Σxi+ Σyi
iv) Σ (xi+yi)

5.5 If k=5, xi=i+1, for i=1,2,3,4; Verify that Σkxi = kΣxi

5.6 The data below shows the marks obtained by 10 High school students in
mathematics in the first term (xi) and second term(yi)

Student First Term Second Term


1 40 55
2 50 60
3 60 70
4 80 90
5 30 40
6 90 80
7 20 40
8 55 60
9 70 90
10 60 60
Find: (i) Σxi (ii) Σyi (iii) Σ(xi+yi) (iv) Σxi² (v) Σyi²
(vi) 1/N Σxi (vii) 1/N Σxi, (viii) Σ(xi-yi), (ix) Σyi - Σxi

5.7 Discuss the following measures of location in exercise 5.7-5.12 highlighting their
merits, demerits, and application areas.

(a) Arithmetic mean


(b) mode
(c) Median
(d) Geometric mean
(e) Harmonic mean

5.8 The frequency distribution of the number of packets bought per month by 50
households residing in Kibera is as below:

No. of Packets of Milk 0-10 10-20 20-30 30-40 40-50 50-60


Number of Families 5 10 5 10 12 8

5.9 The table below shows the distribution of the services time (minutes) at 20 service
stations.

Service Time 2-4 4-6 6-8 8-10 >10


Number of Stations 4 2 6 2 6

5.10 The number of 2kg Kimbo cooking fat sold at a kiosk retail outlet in 40 weeks is
as below:
10 15 10 14 20 30 25 30 40 50
15 20 40 20 30 25 50 40 50 40
60 20 50 40 60 20 10 15 20 40
20 25 60 40 35 20 40 50 60 20

5.11 The frequency distribution of the number of bags of maize harvested by 50
farmers in Kitale is as below:

14 70 20 80 40 20 100 14 30 55
20 40 30 70 50 40 95 15 35 40
100 30 80 60 60 60 60 18 20 90
40 15 90 75 70 70 40 20 25 40
50 16 40 100 80 90 14 40 40 90

5.12 Indicate the best measures of location the decision makers would require
for the problems (a) – (f), stating your reasons.

(a) Shoe seller


(b) A public transporter:
(c) Dress seller
(d) Pharmacist
(e) Test evaluators
(f) Book distributors

________________________________________________________________
Lesson 6: Measures of Dispersion and Shape:
________________________________________________________________

6.0 Objectives

At the end of this lesson, the student should be able to determine and interpret various
measures of dispersion and shape. Specifically, the student should be able to: -

(a) Calculate and Interpret the:-

(i). Range, R
(ii). Mean Deviation, MD
(iii). Semi Inter-Quartile Range , SIQR
(iv). Variance, S² and σ²
(v). Standard deviation, S or σ
(vi). Coefficient of variations, CoV
(vii). Skewness
(viii). Kurtosis

(b) List and discuss the merits and demerits of the various measures of dispersion and
shape identified in (i) – (viii).

6.1 Introduction

This lesson covers various measures of dispersion, which include: Range, R;
Mean Deviation, MD; Semi Inter-Quartile Range, SIQR; Standard Deviation, S or σ; and
Coefficient of Variation. The lesson further deals with measures of the
shape of a statistical distribution: Skewness and Kurtosis.

6.2 Absolute Measures of Dispersion

Dispersion or variability of statistical data may be defined as scatteredness. It indicates
the extent to which the actual values of a variable are scattered around a given measure
of Central Tendency or average. Its study provides an idea about the homogeneity (less
dispersed) or heterogeneity (more dispersed) of a statistical distribution. For a complete
description of statistical data it is recognized that, since a measure of Central Tendency is
only a representative value, some actual data values will be above it, some equal to it and
some below it. The closer the actual values are to the average, the lower the dispersion or
variability. A good measure of dispersion should: -

(i). Be capable of further mathematical manipulation.
(ii). Not be affected by sampling fluctuations.
(iii). Be based on all observations, easy to comprehend and rigidly defined.

6.2.1 Range

6.2.1.1 Range of Ungrouped Data

The range, R, is the difference between the maximum and the minimum values in a
data set. Let Vmax = Maximum value, Vmin = Minimum value and Ra = Absolute Range. Then:
Ra = Vmax - Vmin.

(a) Merits

(i). Easy to calculate. The data set is scanned for the maximum and the minimum
values, from which the absolute and relative ranges are determined.
(ii). Useful when the extreme values (largest and smallest) of the data are important,
especially when what is required is the extent and not the variability of the
data.

(b) Demerits

(i). It is an insensitive measure of dispersion. That is, two distributions may have
the same range while their dispersion about the mean is significantly different.
(ii). It is not informative. It only quantifies the spread in the data as shown by the
extreme values. The best range would be that within which most of the data
would lie.

Activity 6a:
Consider the two data sets Set 1 and Set 2 defined as below:

Data Set1: 2, 4, 8, 50
Data Set2: 2,4, 9, 6, 15,50

The two data sets clearly have the same range, but different arithmetic means
hence different variability. Determine these two measures.
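
A small Python sketch of the comparison asked for in this activity. The helper names `absolute_range` and `mean` are assumptions made for the example.

```python
set1 = [2, 4, 8, 50]
set2 = [2, 4, 9, 6, 15, 50]

def absolute_range(values):
    """Ra = Vmax - Vmin."""
    return max(values) - min(values)

def mean(values):
    return sum(values) / len(values)

range1, range2 = absolute_range(set1), absolute_range(set2)
mean1, mean2 = mean(set1), mean(set2)
print(range1, range2)                        # the ranges coincide
print(round(mean1, 2), round(mean2, 2))      # the means differ
```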

Activity 6b:
Find the absolute range for the following Data Sets:

Data Set1: 2,4,6,8,12


Data Set2: 3,4,8,10,15

6.2.1.2 Range: Grouped Data.

The procedure for determining the range for grouped data is as follows:-

(i). If not given, form a Frequency Distribution for the data.
(ii). Find the lower class boundary of the first class, Vmin, and the upper class boundary
of the last class, Vmax.
(iii). Based on the values in (ii), find Ra and Rr.

Activity 6c:
Find the Absolute and Relative Range for the following Data relating to the
distribution of Student Marks in a Statistics Examination.

Marks Number of Students


15.5-20.5 10
20.5-25.5 10
25.5-30.5 8
30.5-35.5 5
35.5-40.5 20
40.5-45.5 5
45.5-50.5 15

6.3 Mean Absolute Deviation (MAD)

This is the mean of the absolute deviations of the actual data values from the Arithmetic
Mean. The procedure is to: -

(a) Find the arithmetic mean of the distribution by summing the actual values if
the data is ungrouped, or the class midpoints for grouped data, taking cognizance
of the frequencies.

(b) Find the deviations of the individual actual values (or midpoints for grouped
data) from the mean and hence their corresponding absolute deviations.

(c) Multiply the resulting absolute deviations by the corresponding frequencies
and find the sum.

(d) Find the Mean Absolute Deviation (MAD) by dividing the result in (c) by ∑fi.

Activity 6d:
Consider the following data showing the waiting time for service at a dispensary
clinic: 2,4,2,4,5,4,4,5,9,9,5,7,5,9.

Solution:
Let the waiting time of the ith service be xi and the corresponding number of
people waiting for treatment be fi.

xi    fi    xi fi    xi − x̄    |xi − x̄|    fi |xi − x̄|
2     2     4       -3.29     3.29       6.57
4     4     16      -1.29     1.29       5.14
5     4     20      -0.29     0.29       1.14
7     1     7        1.71     1.71       1.71
9     3     27       3.71     3.71      11.14
Σ fi = 14   Σ xi fi = 74                Σ fi |xi − x̄| = 25.71

From $\bar{x} = \sum x_i f_i / \sum f_i$, $\bar{x} = 74 / 14 \approx 5.29$.

Hence $MAD = \sum f_i |x_i - \bar{x}| / \sum f_i = 25.71 / 14 \approx 1.84$
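
The calculation can be verified directly from the raw waiting times with a few lines of Python. The variable names are illustrative only.

```python
# Waiting times from the activity above
waits = [2, 4, 2, 4, 5, 4, 4, 5, 9, 9, 5, 7, 5, 9]

n = len(waits)
mean_wait = sum(waits) / n
# Mean of the absolute deviations from the arithmetic mean
mad = sum(abs(x - mean_wait) for x in waits) / n

print(n, sum(waits), round(mean_wait, 2), round(mad, 2))
```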

Activity 6e:
Grouped Data M.A.D: Find the Mean Absolute Deviation given the data below
showing the distribution of profits made by a sample of 10 local banks during the
2003/2004 financial year

Profit (Ksh.000,000) Number of banks


100-200 2
200-300 4
300-400 3
400-500 1

Solution

Let P be the profit class, p̂ the midpoint of the profit class, Pa = 250 the assumed
mean and PT the true mean profit.

P          f    p̂     (p̂ − Pa)   d1 = (p̂ − Pa)/100   f d1   |p̂ − PT|   f |p̂ − PT|

100-200    2    150    -100          -1              -2      130        260
200-300    4    250       0           0               0       30        120
300-400    3    350     100           1               3       70        210
400-500    1    450     200           2               2      170        170
        Σ f = 10                             Σ f d1 = 3             Σ f |p̂ − PT| = 760

Therefore the false mean is $\sum d_i f_i / \sum f_i = 3 / 10 = 0.3$. Hence the true mean
$PT = 0.3 \times 100 + 250 = 280$. Therefore
$MAD = \sum f_i |\hat{p}_i - PT| / \sum f_i = 760 / 10 = 76$.
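
A short Python sketch of the grouped calculation, using the class midpoints and the frequencies of the solution table (2, 4, 3, 1). The names are assumptions for the example.

```python
midpoints = [150, 250, 350, 450]   # class midpoints (Ksh. million)
banks = [2, 4, 3, 1]               # frequencies used in the solution table

n = sum(banks)
mean_profit = sum(p * f for p, f in zip(midpoints, banks)) / n
mad_profit = sum(f * abs(p - mean_profit) for p, f in zip(midpoints, banks)) / n

print(mean_profit, mad_profit)
```

The direct computation agrees with the assumed mean (coding) route used in the solution.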

6.4 Variance, hence The Standard Deviation

The variance, and hence the standard deviation, is the most widely used measure of
variability. It is based on the most versatile measure of central tendency, the
Arithmetic Mean, and it effectively summarizes the typical distance of the actual
data values from the mean. The standard deviation depicts the scatter of the
individual data values about their common representative (their center of gravity).
Positive deviations indicate above-average individual values while negative
ones indicate below-average values. If these deviations were simply summed, they
would total zero, which is why they are squared before being averaged. Let di be
the deviation of the variable value xi from the mean, x̄. Graphically this may be
represented as below. As each di tends to zero, the individual dispersion decreases
and hence the total squared deviation tends to zero.

[Figure: data points X1 to X6 scattered about the mean]

6.4.1 Variance and Standard deviation: Generalized Procedure

Let x1, x2, x3, ..., xi, ..., xn, with corresponding frequencies f1, f2, f3, ..., fi,
..., fn, be the data values for ungrouped data of size N, or the midpoints of
grouped frequency classes, as in the table below. Let x̄ be the arithmetic mean.

Col.1   Col.2   Col.3    Col.4      Col.5         Col.6
xi      fi      xifi     xi − x̄     (xi − x̄)²     fi(xi − x̄)²
x1      f1      x1f1     x1 − x̄     (x1 − x̄)²     f1(x1 − x̄)²
x2      f2      x2f2     x2 − x̄     (x2 − x̄)²     f2(x2 − x̄)²
...
xi      fi      xifi     xi − x̄     (xi − x̄)²     fi(xi − x̄)²
...
xn      fn      xnfn     xn − x̄     (xn − x̄)²     fn(xn − x̄)²
        Σfi     Σxifi                             Σfi(xi − x̄)²
Note:
Col.1 = Individual values of ungrouped data variables or the midpoints of grouped data
classes.
Col.3 = The ith value xi, appearing fi times, contributes xifi to the total of the
individual variable values or the midpoints.
Therefore, in order to determine the Standard Deviation, σ, the procedure is to find:-
(a) The arithmetic mean using the standard methods, where
$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$ ............................................[Col.2 and Col.3]

(b) The deviation of the individual values/midpoints from the mean x̄ and hence
the square of the deviations ......................................[Col.4 and Col.5]

(c) The sum of the squares of the deviations of the data values, noting that the ith
squared deviation also occurs fi times, that is, find Σfi(xi − x̄)² ..............[Col.6]

(d) The arithmetic mean of the sum of the squares of the deviations, that is, the
Variance of the variable x, denoted by Var(x) or σ², given by:
$\text{Var}(x) = \sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$

Here σ² is the population variance; the sample variance, s², is obtained by
replacing the denominator with ∑fi − 1. Given that the Standard Deviation is defined
as the square root of the variance, the following relationships define the sample
and population standard deviations respectively:-

Sample Standard Deviation: $s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}}$

Population Standard Deviation: $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$

where ∑fi = N = number of observations.
It is also worth noting that if the data under consideration is a sample, then a
correction is made in the denominator of the model to take care of the number of
measures that have already been estimated from the data, in this case the arithmetic
mean. The correction results in the denominator being equal to ∑fi − 1.

Activity 6f:
Variance and Standard deviation of ungrouped data: The output from one of the
processing lines producing iron slabs resulted in the sample data below, defined by
weight (kg): 4,2,6,7,3,9,12,2. Find the Variance, hence the standard deviation.

Solution:

Let wi be the weight of the ith slab in kg and fi be the frequency of the ith weight.
Also let w̄ be the arithmetic mean of the weights, or the mean weight.

wi    Tally    fi    wifi    wi − w̄    (wi − w̄)²    fi(wi − w̄)²
2     //       2     4       -4        16           32
3     /        1     3       -3        9            9
4     /        1     4       -2        4            4
6     /        1     6        0        0            0
7     /        1     7        1        1            1
9     /        1     9        3        9            9
12    /        1     12       6        36           36
               Σfi = 8   Σwifi = 45                 Σfi(wi − w̄)² = 91

$\bar{w} = \frac{\sum w_i f_i}{\sum f_i} = \frac{45}{8} = 5.625 \approx 6$ (rounded to the nearest kilogram to
simplify the deviations in the table).

$\text{Var}(w) = \frac{\sum f_i (w_i - \bar{w})^2}{\sum f_i - 1} = \frac{91}{8 - 1} = \frac{91}{7} = 13$.

Hence the sample Standard Deviation is
$s = \sqrt{\frac{\sum f_i (w_i - \bar{w})^2}{\sum f_i - 1}} = \sqrt{13} \approx 3.6$ kg.
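
The generalized procedure (Col.1 to Col.6) can be sketched in Python as below. This is a minimal illustration, assuming the value-frequency pairs of the solution table for the iron slab weights; the function name and the `sample` flag are hypothetical. The code uses the unrounded mean, so its result differs slightly from a hand calculation that rounds the mean to a whole number.

```python
import math

def variance_from_table(table, sample=True):
    """table maps each value xi to its frequency fi.

    Returns (mean, variance); divides by Σfi - 1 for a sample,
    and by Σfi for a population.
    """
    n = sum(table.values())                                    # Σfi
    mean = sum(x * f for x, f in table.items()) / n            # Σxifi / Σfi
    ss = sum(f * (x - mean) ** 2 for x, f in table.items())    # Col.6 total
    return mean, ss / (n - 1 if sample else n)

# Frequencies as they appear in the solution table of the activity
slab_weights = {2: 2, 3: 1, 4: 1, 6: 1, 7: 1, 9: 1, 12: 1}
mean, var = variance_from_table(slab_weights)
sd = math.sqrt(var)
print(round(mean, 3), round(var, 2), round(sd, 2))
```

Passing `sample=False` gives the population variance, which is the only change needed to switch between the two standard deviation formulas.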

6.4.1.1 Variation Measures of Grouped Data

Activity 6g:
Variance and Standard Deviation of Grouped Data: The sales volume of bags
of processed sugar from a manufacturer to 20 destinations is as in the table below,
the volume being expressed in Ksh 000,000. Find the standard deviation of the
distribution.

Sales volume    Number of destinations
2-6             2
6-10            6
10-14           3
14-18           5
18-22           4

The procedure is to:-

a. Find the midpoints to act as an approximation (representative) of the members
of each class.

b. Follow the procedure for finding the standard deviation of ungrouped data.

Solution:

Let x be the sales volume, x̂i the class midpoint and fi the number of destinations.

x        x̂i    fi    x̂ifi    (x̂i − x̄)    (x̂i − x̄)²    fi(x̂i − x̄)²
2-6      4     2     8       -9          81           162
6-10     8     6     48      -5          25           150
10-14    12    3     36      -1          1            3
14-18    16    5     80       3          9            45
18-22    20    4     80       7          49           196
               Σfi = 20   Σx̂ifi = 252                 Σfi(x̂i − x̄)² = 556

Therefore, the mean is $\bar{x} = \frac{\sum \hat{x}_i f_i}{\sum f_i} = \frac{252}{20} = 12.6 \approx 13$. The sample
standard deviation is
$s = \sqrt{\frac{\sum f_i (\hat{x}_i - \bar{x})^2}{\sum f_i - 1}} = \sqrt{\frac{556}{19}} \approx 5.41$.

6.1.4 Quartile Deviations

This is sometimes called the Semi-Inter-Quartile Range (S.I.R). The procedure for
finding the S.I.R is to:-

(a) Derive the less-than cumulative frequency distribution.
(b) Determine the 25% and 75% positions, that is, 25% of N and 75% of N.
(c) Find the first and third quartiles, Q1 and Q3 respectively, where
$Q_1 = L + \frac{c_q}{f_q}\left(\frac{N}{4} - cf_q\right)$ and $Q_3 = L + \frac{c_q}{f_q}\left(\frac{3N}{4} - cf_q\right)$;
L is the lower class boundary of the quartile class; cq is the class size of the
quartile class; fq is the frequency of the quartile class; N is the sample or
population size; cfq is the cumulative frequency of the class before the quartile
class. It is noted that the quartile value that occupies the 50% position is the
Median.
(d) Find the S.I.R as half the difference between Q3 and Q1, that is, (Q3 − Q1)/2.

6.2 Relative Measures of Dispersion

The measures of variation discussed in the earlier sections (the Range, Mean
Deviation, Quartile Deviation and Standard Deviation) are expressed in the units
of the data and are therefore not appropriate for comparing the dispersion of two
or more datasets. The best measures for comparison purposes are the Relative
Measures of Dispersion: the relative range, coefficient of mean deviation,
coefficient of quartile deviation and coefficient of variation.

6.2.1 Relative Range, Rr

As earlier stated, the range, R, is the difference between the maximum and the
minimum values in a dataset. Let Vmax = Maximum value and Vmin = Minimum value.
The Relative Range, Rr, is defined as the ratio of (Vmax − Vmin) to (Vmax + Vmin),
that is, Rr = (Vmax − Vmin)/(Vmax + Vmin). Simply, the higher the Rr, the greater
the extent of dispersion.

6.2.2 Coefficient of Quartile Deviation, Cqd.

The procedure for determining the coefficient of quartile deviation has its basis on
Q1 and Q3, the 1st and 3rd quartiles respectively, and this is to: -

(a) Determine Q1 and Q3.
(b) Based on the results above, find the difference between and the sum of Q1 and
Q3.
(c) Find Cqd, where $C_{qd} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$. The higher the Cqd, the greater the
variability.

Activity 6h:
Find the CQD for the data below:-

Employee    Number of Employees
3-5         2
6-8         1
9-11        5
12-14       2

Solution

Since the classes are of the inclusive type, the gaps between the class intervals
must be bridged using the normal gap bridging procedure learnt earlier.

X            f    CFL
2.5-5.5      2    2
5.5-8.5      1    3
8.5-11.5     5    8
11.5-14.5    2    10

Hence, Q1 is at 25% of 10 = 2.5th position, while Q3 is at 75% of 10 =
7.5th position. The Q1 and the Q3 classes are respectively 5.5-8.5 and 8.5-11.5.
From $Q_1 = L + \frac{c_q}{f_q}\left(\frac{N}{4} - cf_q\right)$ and $Q_3 = L + \frac{c_q}{f_q}\left(\frac{3N}{4} - cf_q\right)$, substituting the
respective values from the table: Q1 = 5.5 + (3/1)(10/4 − 2) = 7, while
Q3 = 8.5 + (3/5)(3 × 10/4 − 3) = 8.5 + 2.7 = 11.2.
Hence CQD = (11.2 − 7)/(11.2 + 7) = 4.2/18.2 ≈ 0.23.
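
The quartile interpolation formula and the coefficient of quartile deviation can be sketched in Python as follows. This is an illustrative sketch only: it assumes classes are supplied as (lower boundary, upper boundary, frequency) triples after gap bridging, and the function name `grouped_quantile` is hypothetical.

```python
def grouped_quantile(classes, p):
    """Interpolated quantile for grouped data.

    classes: list of (lower_boundary, upper_boundary, frequency)
    p: proportion, e.g. 0.25 for Q1 and 0.75 for Q3
    """
    n = sum(f for _, _, f in classes)
    target = p * n                     # position of the quantile
    cum = 0                            # cumulative frequency before the class
    for lower, upper, f in classes:
        if f > 0 and cum + f >= target:
            # L + (c / f) * (position - cf)
            return lower + (upper - lower) / f * (target - cum)
        cum += f
    raise ValueError("p must lie between 0 and 1")

# Class boundaries and frequencies from the bridged table of the activity
classes = [(2.5, 5.5, 2), (5.5, 8.5, 1), (8.5, 11.5, 5), (11.5, 14.5, 2)]
q1 = grouped_quantile(classes, 0.25)
q3 = grouped_quantile(classes, 0.75)
cqd = (q3 - q1) / (q3 + q1)            # coefficient of quartile deviation
siqr = (q3 - q1) / 2                   # semi-interquartile range
print(round(q1, 2), round(q3, 2), round(cqd, 2), round(siqr, 2))
```

The same helper returns the median when called with `p = 0.5`.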
6.2.3 Coefficient of Variation, Cvar

This is the most efficient measure of relative dispersion for comparing variability
in more than one data set. It is defined by $C_{var} = \frac{\sigma}{\bar{x}} \times 100$. The procedure for
finding Cvar is to determine:-

(a) The arithmetic mean and the standard deviation of the given data.
(b) Cvar, the coefficient of variation, using the results in (a) on the basis
of the model given above.

Activity 6i:
The data below represents the lifespan of electric bulbs (in months) produced at
two different factories in Kisumu and Nairobi. Determine the Coefficient of
Variation for each factory and hence the factory producing the more dependable
bulbs.

            Number of Bulbs
Lifespan    Kisumu    Nairobi
2-4         8         2
4-6         10        25
6-8         20        3
8-10        2         2

Solution:

Let X̂ be the class midpoint, fK and fN the number of bulbs from Kisumu and Nairobi,
and X̄K and X̄N the respective mean lifespans.

x       X̂    fK    fN    X̂fK    X̂fN    fK(X̂ − X̄K)²    fN(X̂ − X̄N)²
2-4     3    8     2     24     6      62.72          10.58
4-6     5    10    25    50     125    6.40           2.25
6-8     7    20    3     140    21     28.80          8.67
8-10    9    2     2     18     18     20.48          27.38
Σ            40    32    232    170    118.40         48.88

X̄K = 232/40 = 5.8 and X̄N = 170/32 ≈ 5.3. Hence
$\sigma_K = \sqrt{\frac{118.4}{40}} = \sqrt{2.96} \approx 1.72$ and $\sigma_N = \sqrt{\frac{48.88}{32}} = \sqrt{1.53} \approx 1.24$.

The coefficient of variation for Kisumu is $\frac{\sigma_K}{\bar{X}_K} \times 100 = \frac{1.72}{5.8} \times 100 \approx 29.7$, while
the coefficient of variation for Nairobi is $\frac{\sigma_N}{\bar{X}_N} \times 100 = \frac{1.24}{5.3} \times 100 \approx 23.3$.

Since the CV of the bulb lifespan from Nairobi is lower than that of Kisumu, the
Nairobi bulbs have a lower level of unpredictability and are hence less subject to
random failure. Therefore the Nairobi bulbs are better in terms of lighting life
behaviour.
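
A comparison of this kind can be sketched in Python as below. The sketch is illustrative only: it assumes the class midpoints and the frequencies as they are used in the solution table, treats each factory's data as a population (divisor Σf), and uses a hypothetical function name. Because it works with unrounded means, the figures may differ slightly from a hand calculation.

```python
import math

def coefficient_of_variation(midpoints, freqs):
    """Return the coefficient of variation (in percent) for grouped data."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    var = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n
    return math.sqrt(var) / mean * 100

midpoints = [3, 5, 7, 9]                       # midpoints of 2-4, 4-6, 6-8, 8-10
cv_kisumu = coefficient_of_variation(midpoints, [8, 10, 20, 2])
cv_nairobi = coefficient_of_variation(midpoints, [2, 25, 3, 2])
more_dependable = "Nairobi" if cv_nairobi < cv_kisumu else "Kisumu"
print(round(cv_kisumu, 1), round(cv_nairobi, 1), more_dependable)
```

The factory with the smaller coefficient is the more consistent one relative to its own mean, which is why the ratio rather than the raw standard deviation is compared.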

6.2.4 Semi Inter Quartile Range ( SIQR).

SIQR is the quartile deviation that measures the extent of variability between the
data in the lower quartile and those in the upper quartile. The procedure
is to determine:

(a) Q1 and Q3 using the standard formulae.
(b) From the values of Q1 and Q3 in (a), find $\text{SIQR} = \frac{Q_3 - Q_1}{2}$.

Activity 6j
Find Q1 and Q3, hence the Semi Quartile Deviation, for each of the following:
(i) The data: 2,6,8,15,18,25,30
(ii) The data in the table below.

Waiting Time Number of Customers


12-16 15
17-21 10
22-26 17
27-31 10
32-36 18

6.3 Measures of Shape.

Measures of shape are statistical tools for determining the degree and direction of
the shape of the frequency distribution of statistical data values. The shape of a
distribution can either be symmetrical or asymmetrical in nature. If a frequency
polygon is such that a vertical line constructed from the center of its peak to
the independent variable axis (horizontal) partitions the area under the curve
but above the axis into two equal, mirror-image halves, then the distribution is
said to be symmetric; otherwise it is asymmetric. The statistical tools for
determining the degree and direction of shape are Skewness and Kurtosis.

6.3.1 Skewness

Skewness is a measure of the degree of asymmetry of a frequency distribution. This
measure has its basis on the Arithmetic Mean, x̄, the Median, Md, and the Mode,
Mo. The following cases illustrate the various shapes of a frequency polygon
according to its extent of skewness:-

Case 1: x̄ = Mo = Md. This is the case of a symmetric distribution.

[Figure: symmetric frequency polygon]

Case 2: x̄ > Md > Mo. This is the case of a positively skewed distribution
(skewed to the right).

Examples:-

(a) In an examination, the majority of the candidates scored lower marks, or
fewer candidates scored higher marks.

(b) In an investigation of profits made by selected firms, the majority of the
firms investigated realized lower profit margins.

[Figure: positively skewed frequency polygon]

Case 3: x̄ < Md < Mo. This is the case of a negatively skewed distribution
(skewed to the left).

Examples:-

(a) In an examination, the majority of the candidates scored higher marks, or
fewer candidates scored lower marks.

(b) In an investigation of profits made by selected firms, the majority of the
firms investigated realized higher profit margins.

[Figure: negatively skewed frequency polygon]

6.3.1.1 Measures of Skewness

The degree of skewness in a data set, as exhibited by its frequency polygon, is
measured by, among others, the Pearsonian Coefficient of Skewness, Sk, where
$Sk = \frac{3(\bar{x} - Md)}{\sigma}$. The measure Sk is such that −3 ≤ Sk ≤ 3. The Sk
discrimination is as below: -

Type of Skewness                    Discriminant
Negatively skewed                   −3 ≤ Sk < 0
Perfectly symmetric distribution    Sk = 0
Positively skewed                   0 < Sk ≤ 3

Activity 6k:
The time, x, in seconds needed to serve a sample of 50 customers at Wananchi
Bank in March is as below. Using the Pearsonian Coefficient of Skewness, Sk,
determine the degree and direction of skewness of the distribution and interpret.

X    10-20    20-30    30-40    40-50    50-60
F    4        10       20       8        8

Solution

Step-1
Determine the Arithmetic Mean, x̄, and the Standard Deviation, s.

x        x̂i    fi    x̂ifi    (x̂i − x̄)    (x̂i − x̄)²    fi(x̂i − x̄)²
10-20    15    4     60      -21         441          1764
20-30    25    10    250     -11         121          1210
30-40    35    20    700     -1          1            20
40-50    45    8     360      9          81           648
50-60    55    8     440      19         361          2888
               Σfi = 50   Σx̂ifi = 1810                Σfi(x̂i − x̄)² = 6530

Hence x̄ = Σx̂ifi/Σfi = 1810/50 = 36.2 ≅ 36.

Therefore Var(x) = Σfi(x̂i − x̄)²/Σfi = 6530/50 = 130.6. Hence s = √130.6 ≅ 11.4.

Step-2
Determine the Median, Md.

x        x̂i    fi    CfL
10-20    15    4     4
20-30    25    10    14
30-40    35    20    34
40-50    45    8     42
50-60    55    8     50
               Σfi = 50

The Median value Md lies at the 1/2 × 50 = 25th position, which falls in the class
30-40. Hence, from Md = L + (c/f)(N/2 − Cfb), Md = 30 + (10/20)(50/2 − 14) = 35.5.
Therefore Sk = 3(x̄ − Md)/s = 3(36 − 35.5)/11.4 ≅ 0.13. The distribution is hence
slightly positively skewed, that is, most people spent less time in the queue.
This partly implies that the tellers are efficient.
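
The two steps (mean and standard deviation, then median) can be combined in a short Python sketch. It is illustrative only: the function name is hypothetical, classes are assumed to be given as (lower boundary, upper boundary, frequency) triples, the population divisor Σf is used for the variance, and the mean is left unrounded, so the coefficient differs somewhat from one computed with a mean rounded to a whole number.

```python
import math

def pearson_skewness(classes):
    """Return (mean, median, Sk) for grouped data, Sk = 3(mean - median)/sd."""
    n = sum(f for _, _, f in classes)
    mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n
    var = sum(f * ((lo + hi) / 2 - mean) ** 2 for lo, hi, f in classes) / n
    cum = 0                                     # cumulative frequency so far
    median = None
    for lo, hi, f in classes:
        if f > 0 and cum + f >= n / 2:          # median class found
            median = lo + (hi - lo) / f * (n / 2 - cum)
            break
        cum += f
    return mean, median, 3 * (mean - median) / math.sqrt(var)

# Service times (seconds) for the 50 customers in the activity
service_times = [(10, 20, 4), (20, 30, 10), (30, 40, 20), (40, 50, 8), (50, 60, 8)]
mean, median, sk = pearson_skewness(service_times)
print(round(mean, 1), round(median, 1), round(sk, 2))
```

A positive value indicates a tail to the right and a negative value a tail to the left, matching the discriminant table for Sk.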

6.3.2 Kurtosis

Kurtosis is a measure of the “Peakedness” or “Flatness” of a frequency distribution
polygon. The criterion for assessing the peakedness or flatness of a polygon is the
normal distribution. There are 3 cases of kurtosis:-

Case 1: Leptokurtic
This is the shape of a distribution where most of the observations are highly
concentrated near the mode. The polygon is more peaked than the normal
distribution polygon.

Case 2: Mesokurtic (Normal Distribution)
The polygon is less peaked and has more observations in its shoulders than a
Leptokurtic distribution.

Case 3: Platykurtic
The polygon is a flatter, plateau-like frequency curve.

[Figure: Leptokurtic, Normal Distribution and Platykurtic frequency curves drawn on the same axes]

6.3.2.1 Measurement of Kurtosis


The measure of kurtosis is KUR, where $KUR = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^4}{N\sigma^4}$, N
is the number of observations, σ the standard deviation and x̄ the arithmetic
mean. The greater the KUR, the more peaked the distribution; the smaller the KUR,
the flatter the distribution.

Activity 6l:
Using the example above for the determination of Skewness, find the Kurtosis for
the above distribution.

Solution:

x xmid fi
10-20 15 4
20-30 25 10
30-40 35 20
40-50 45 8
50-60 55 8
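
One way to carry out this calculation is sketched below in Python. It is a sketch under stated assumptions: kurtosis is taken as the fourth moment about the mean divided by the square of the population variance (that is, Σf(x − x̄)⁴ / (Nσ⁴)), the mean is left unrounded, and the function name is hypothetical.

```python
def kurtosis(midpoints, freqs):
    """Moment coefficient of kurtosis: m4 / m2**2 for a frequency table."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(midpoints, freqs)) / n
    m2 = sum(f * (x - mean) ** 2 for x, f in zip(midpoints, freqs)) / n  # variance
    m4 = sum(f * (x - mean) ** 4 for x, f in zip(midpoints, freqs)) / n  # 4th moment
    return m4 / m2 ** 2

# Midpoints and frequencies from the table above
kur = kurtosis([15, 25, 35, 45, 55], [4, 10, 20, 8, 8])
print(round(kur, 2))
```

Under this definition a normal distribution has a value of 3, so a result below 3 points to a flatter (platykurtic) shape and a result above 3 to a more peaked (leptokurtic) one.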

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Englewood Cliffs. Prentice Hall, New Jersey (Available at University
Library).

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Englewood Cliffs. Prentice Hall, New Jersey (Available at
University Library).

________________________________________________________________________
Exercise 6: Measures of Dispersion and Shape
_______________________________________________________________________

6.1 In exercises 1 - 10, find (i) the Pearson’s Skewness coefficient and (ii) the
Kurtosis; hence draw the distributions, using the same horizontal axis scale for
each data set in numbers 1 - 9, and interpret.

1. 3,3,2,15,7,2                 6. -2,-4,4,2
2. 10,20,25,5,15,0              7. 9,10,12,12,10,12,10
3. 8,2,2,2,4,3                  8. 4,2,3,4,6,4
4. 0,0,2,1,4,5,6                9. 6,4,4,8,-2
5. 6.5 4,4,4,2,4,4,4            10. -2,-2,-4,2,3

6.2 In exercise (i)-(iv), calculate and Interpret

(a) Pearson's Skewness coefficient


(b) Kurtosis.

(i) The frequency distribution of waiting time, to the nearest minute, prior to
voting in a polling station for a sample of 100 potential voters is as
below:-

Waiting time: 15-20 20-25 25-30 30-35 35-40 40-45


No. of voters 10 2 10 40 13 25

(ii) Mastermind Tobacco Co. undertook to investigate its tobacco leaf
development at its Leaf Research Center. 32 leaves were selected at
random and their width measured in millimeters, resulting in the data
below:

15.4 30.4 16.7 50.1 14.8 19.1 32.3 44.3
16.3 15.5 25.5 16.1 25.5 14.6 14.5 35.5
15.2 16.5 30.4 14.2 16.5 15.4 17.8 14.3
20.0 14.0 40.4 17.3 17.3 20.7 45.5 15.5

(iii) The data below show the monthly salaries for a sample of 50 employees in
the public and private sector.

                    Sectors
Salary (Ksh.000)    Public    Private
2-4                 4         5
4-6                 8         10
6-8                 12        10
8-10                14        15
10-12               2         5
>12                 10        15

(iv) The end of semester examination results awarded to 40 Economics students by the
Lecturers in Microeconomics and Quantitative Techniques is as in the table
below:

COURSE UNIT
Marks Microeconomics Quantitative techniques
10-20 1 5
20-30 2 4
30-40 0 2
40-50 10 8
50-60 5 8
60-70 15 8
70-80 5 3
80-90 2 2

6.3 Discuss how the measures of Skewness can be used in decision making in :
(a) Teacher evaluation
(b) Auditing
(c) Manufacturing processes

________________________________________________________________
Lesson 7: Time Series Analysis, TSA
________________________________________________________________

7.0 Objectives

At the end of this lesson, the student should be able to determine and interpret measures
of Time Series Data. Specifically, the student should be able to: -

(a) Define Time Series data and its component citing various practical examples.
(b) Calculate and Interpret the:-
(i) Trend Line,
(ii) Secular Variation,
(iii) Cyclic variation
(iv) Seasonal variation.

7.1 Introduction

In this lesson, we will learn what time series data is, the components of a Time
Series, and how these components are measured and interpreted. In order to
objectively undertake Time Series Analysis, various models will be derived.

7.2 Definition of Time Series and its Components

Statistical data is considered as time series if the figures are generated and recorded
chronologically. Time Series Analysis involves the use of various models to analyze
time-dependent statistical data by isolating the various characteristics defining the
movements in such data. Examples of Time Series data include: yearly sales volume,
quarterly profit, monthly revenue and yearly school dropout rates. Time series data
consist of four components: the Secular Trend, Cyclic Variation, Seasonal Variation
and Irregular Variation. TSA is hence the analysis of the impact of these four
components on the performance of time series data.

Activity 7a:
List six characteristics that a data set must meet to qualify as time series data.

7.3 Components of Time Series Data

(a) Secular (long term) Trend, T

This describes the general increase or decrease (movement) in the variable value over a
long period of time. The steady long-term increase in the cost of living, say as measured
by the Consumer Price Index (CPI), is an example of secular trend.

Examples: Increase of population, and steady decrease in agricultural production.

(b) Cyclic Variation, C

These are recurrent upward and downward movements about the trend that extend over
periods longer than a year and do not follow a fixed, predictable pattern. In business,
the peaks and slumps of business performance, and in an economy, recession, growth,
decline and stagnation, are some of the examples.

(c) Seasonal Variations, S

These are patterns of change that occur within a year and are repeated yearly, e.g.
Christmas sales, the volume of school textbooks bought at the start of every school
term, and the volume of umbrellas bought during the rainy season. Due to the regularity
of Seasonal Variations, this component is normally used in, for example, business
forecasting.

(d) Irregular Variation, I

This is a component of Time Series caused by random fluctuations and non-recurring,
irregular influences, e.g. floods and wars.

Activity 7b:
Discuss the four components of Time Series, citing examples for each.

7.4 Time Series Models

The four components of Time Series data are known to interact either
multiplicatively or additively to produce the observed values that characterize time
series data. Three models hence arise from these interactions, as shown below.

Let OTSD = Observed Time Series Data values.

Multiplicative Model: OTSD = T×C×S×I
Additive Model: OTSD = T+C+S+I
Mixed Model: OTSD = (T×C)+S+I, etc.

For purposes of our analysis in this lesson, the Multiplicative model will be used. Further,
it is noted that, to be able to analyze the impact of each component while holding the
others constant, the rest of the components are eliminated and the effect of the candidate
component is analyzed as discussed in the sections that follow.

7.5 Secular Trend Analysis

Trend is analyzed so as to allow the decision maker to describe the historical pattern in
data with a view to:
(i) Evaluating the performance (success or failure) of past policy.
(ii) Projecting the past patterns into the future so as to help in the estimation of
some future values.
(iii) Eliminating the trend component from the time series so as to study the other
components.
Trend may be linear or non-linear (parabolic, exponential, logarithmic), increasing or
decreasing.

7.5.1 Trend Measurement Methods

This involves determining the constants of the equation that has been chosen to
represent the trend: $y = \beta_0 + \beta_1 x$, $y = \beta_0 + \beta_1 x + \beta_2 x^2$, or $y = \beta_0 \beta_1^{x}$. The latter is
linearized by taking its logarithm, resulting in $\ln y = \ln\beta_0 + x\ln\beta_1$, where $\ln\beta_1$ is
the slope and $\ln\beta_0$ is the intercept on the vertical axis.

(a) The Free hand method

The trend line is determined by inspection, by drawing through the graph the line that
best fits the series. From this judgmental line, a trend equation is determined through
the determination of the constants of the appropriate equation: linear, parabolic, or
logarithmic. Consider, for example, the general linear equation y = β0 + β1x. The
procedure for the free hand method of determining the trend line is as below: -

(i) Read the trend value of the 1st and the nth period.
(ii) Determine the difference, denoted by ∆y, between the y values of the 1st
and the nth periods.
(iii) Find the slope, β1 = ∆y/∆x, and hence read β0, the y-intercept
(= the value of the 1st period if it is at the origin).

The main disadvantage of this method is that it is subjective, since there is no statistical
criterion for ascertaining its adequacy; this is apart from its being time consuming.

(b) Semi Average Method, S.A.M


There are two cases regarding the Semi-Average Method.

Case 1: When the number of periods, N, is even.

The procedure is to:-

(i) Partition the series under consideration into two equal halves.
(ii) Find the arithmetic mean of each segment.
(iii) Draw the straight line passing through the two averages from the
partitioned series.

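Activity 7c:
Determination of Trend Line: SAM Method

Year    Profit (Ksh.’000,000)
1990    46
1991    55
1992    120
1993    80
1994    45
1995    90
1996    80
1997    75

Solution:

The semi-averages are (46+55+120+80)/4 = 75.25 for 1990-1993 and
(45+90+80+75)/4 = 72.5 for 1994-1997. Plotting each at the middle of its half and
joining the two points gives the trend line.

[Figure: semi-average trend line for profit, 1990-1997]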

Case 2: When the number of periods, N, is odd. In this case there are three methods
for separating the series.

(i) Method 1: Add half of the value of the middle period to the value of each part.
(ii) Method 2: Add the total value of the middle period to the total value of each
part.
(iii) Method 3: Drop the value of the middle period from the computation of the
average.

When the S.A.M. method is used, the middle point is considered as the origin. The
y-intercept, β0, and the slope, β1, are derived from:
$\beta_0 = \frac{S_1 + S_2}{t_1 + t_2}$ and $\beta_1 = \frac{S_2 - S_1}{t_1 (N - t_2)}$;
where S1 and S2 are the partial sums of the two segments, t1 and t2 are the numbers of
time units in the 1st and 2nd segments, and N is the total number of periods.

Activity 7d:
Determination of Trend Line: Find the trend line by the S.A.M method using the
data below:

Year                   90    91    92    93    94    95
Sales vol. (000,000)   14    20    15    18    19    21

Solution
Year    Sales Volume    S.A.M.
90      14
91      20              (14+20+15)/3 = 16.33
92      15
93      18
94      19              (18+19+21)/3 = 19.33
95      21

(i) Plot the points (91, 16.33) and (94, 19.33) together with the
actual data. From the resulting trend values find the trend line.

[Figure: Sales Volume (1990-1995), sales volume plotted against year together with the semi-average trend line]

(ii) From β0 = (S1+S2)/(t1+t2) and β1 = (S2−S1)/(t1(N−t2)), with S1 = 49 and S2 = 58,
β0 = (49+58)/(3+3) ≈ 17.83 and β1 = (58−49)/(3(6−3)) = 9/9 = 1. Hence, from
y = β0 + β1x, y = 17.83 + x, where x is measured in years from the middle of the
series (between 92 and 93).
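
The semi-average formulas for an even number of periods can be sketched in Python as follows. This is a minimal sketch: the function name is hypothetical, the origin is assumed to be the middle of the series, and only the even case (t1 = t2 = N/2) is handled.

```python
def semi_average_trend(values):
    """Return (b0, b1) of the semi-average trend line y = b0 + b1*x.

    x is measured in periods from the middle of the series.
    Only an even number of periods is handled in this sketch.
    """
    n = len(values)
    if n % 2:
        raise ValueError("this sketch handles an even number of periods only")
    t = n // 2                                  # t1 = t2 = N/2
    s1, s2 = sum(values[:t]), sum(values[t:])   # partial sums S1 and S2
    b0 = (s1 + s2) / (t + t)                    # (S1 + S2) / (t1 + t2)
    b1 = (s2 - s1) / (t * (n - t))              # (S2 - S1) / (t1 (N - t2))
    return b0, b1

sales = [14, 20, 15, 18, 19, 21]                # sales volume, years 90 to 95
b0, b1 = semi_average_trend(sales)
# x runs from -2.5 (year 90) to +2.5 (year 95) about the mid-point
trend = [b0 + b1 * (i - (len(sales) - 1) / 2) for i in range(len(sales))]
print(round(b0, 2), b1, [round(v, 2) for v in trend])
```

As a check, the trend line passes through the two semi-averages at the centres of the two halves (years 91 and 94).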

Activity 7e:
Find S.A.M. and the resulting trend line based on the data below:

Year 90 91 92 93 94 95 96
School dropouts 200 90 150 80 110 250 200

The main drawbacks of the Semi Average Method are that:-

(i) Outliers affect it, since it depends on the arithmetic mean.
(ii) If either of the two halves of the series contains more depressions than the
other, then the trend line is not a true representation of the secular movement.

(c) Moving Average Method.(M.A.M)

A moving average is a series in which each period's figure is replaced by the mean of the
value of that period and those of a number of preceding and succeeding periods.
The procedure involves the determination of moving averages based on n-yearly moving
totals. The procedure for determining the moving averages is to:-

(i) Find the first moving total by computing the total of the first n (odd or
even) data values.
(ii) By omitting the first data item in the preceding group of data values and
including the next one, generate the subsequent moving totals. This is
repeated until the last data value has been included in a total.
(iii) Determine the moving averages (trend values) by dividing each
resulting moving total by the period size, n. Each total, and hence the
corresponding average, is then centered on the appropriate point in time.

Activity 7f:
Odd Period Moving Total: Compute the Trend Values using the Three-year Moving
Totals.

Year    Sales    3-YR Moving Total (MT)    3-YR Moving Average (MA)
1       10
2       12       37                        12
3       15       47                        16
4       20       42                        14
5       7        39                        13
6       12       29                        10
7       10

3-Yr MT denotes the three-yearly moving totals, determined as follows: 37 = (10+12+15),
47 = (12+15+20), 42 = (15+20+7), 39 = (20+7+12), 29 = (7+12+10). 3-Yr MA
denotes the three-yearly moving averages (the trend values), determined by dividing each
of the resulting 3-Yr MT by n = 3 and rounding to the nearest whole number. The value of
each centered year is replaced by the mean of the values of three successive years. Both
the three-year moving total and moving average are centered as indicated above. The above
procedure is true for all odd numbers of moving average periods. By plotting the trend
values and the actual values on the same graph, the graph below is obtained.

[Figure: Sales Volume ('000) against Year Code, showing Actual Sales and the 3 yr MA]
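
The odd-period procedure can be sketched in a few lines of Python. This is an illustrative sketch using the seven sales figures of the activity; the function name is hypothetical.

```python
def moving_totals_and_averages(values, window):
    """Return (moving totals, moving averages) for an odd window size.

    Each result is centred on the middle period of its window.
    """
    totals = [sum(values[i:i + window])
              for i in range(len(values) - window + 1)]
    averages = [t / window for t in totals]
    return totals, averages

sales = [10, 12, 15, 20, 7, 12, 10]             # years 1 to 7
totals, averages = moving_totals_and_averages(sales, 3)
# The first total is centred on year 2, the last on year 6
print(totals, [round(a) for a in averages])
```

Note that a window of n periods loses (n − 1)/2 trend values at each end of the series, which is why years 1 and 7 have no moving average.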

Activity 7f:
Even Period Moving Total: The data below represent the sales volume of
100kg sacks of maize, in thousands of shillings, from Jan-October. Determine the
two-yearly moving averages and plot the result together with the original data on the
same graph.

Year     1    2    3    4    5    6    7    8    9    10
Sales    2    4    4    3    2    4    6    5    6    2

Solution:

The procedure is to
(a) Determine the two-yearly moving totals using the standard method discussed
for the case when n is odd. However, the two-yearly moving totals, just as for
any even-numbered period, fall between two periods and are not centered as
required.

(b) Center the result by adding each pair of successive two-yearly moving totals
(2-Yr Moving Total Centered).

(c) Divide the result in (b) by 2n to obtain the centered moving averages, the trend
values. Each centered average is thus a weighted average of three successive
data values, with the middle value counted twice, as below.

Year    Sales    2-Yr MT    2-Yr MTC    2-Yr MA
1       2
                 6
2       4                   14          3.5
                 8
3       4                   15          3.75
                 7
4       3                   12          3
                 5
5       2                   11          2.75
                 6
6       4                   16          4
                 10
7       6                   21          5.25
                 11
8       5                   22          5.5
                 11
9       6                   19          4.75
                 8
10      2

The graphical illustration of the above result is as below:-

[Figure: Sales Volume (10 Years), Sales and 2-YR MA plotted against Years]
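
The centering step for an even period can be sketched in Python as below. The sketch is illustrative, assuming the ten sales figures of the activity; the function name is hypothetical.

```python
def centred_moving_average(values, window):
    """Centred moving averages for an even window size.

    Successive moving totals are added in pairs and divided by 2 * window.
    """
    totals = [sum(values[i:i + window])
              for i in range(len(values) - window + 1)]
    centred_totals = [a + b for a, b in zip(totals, totals[1:])]
    return [c / (2 * window) for c in centred_totals]

sales = [2, 4, 4, 3, 2, 4, 6, 5, 6, 2]          # periods 1 to 10
trend = centred_moving_average(sales, 2)
# The first value is centred on period 2 and the last on period 9
print(trend)
```

The same function applies unchanged to a four-quarter window, where each centred total is divided by 8.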

(d) Method of Least Squares:

The above three methods are inefficient when applied to forecasting the performance
of, say, a business entity. The method of least squares is a more rigorous mathematical
technique for determining trend values and hence for forecasting. The
assumption of the Least Squares Method (LSM) is that the line fitted through the
scattered data values is the line of best fit. Consider the general line y = β0 + β1x, where
the best linear estimators for the parameters β0 and β1 are β̂0 and β̂1. LSM is the most
appropriate tool for determining the trend line for data fitting linear and non-linear
models. The procedure is similar to that used for regression analysis. That is, given
y = β0 + β1x, the normal regression equations Σy = nβ0 + β1Σx and Σxy = β0Σx + β1Σx² are
solved simultaneously. However, the solution is simplified by noting that for time series
data β0 and β1 can be obtained by taking the middle of the series as the origin. In this
case, and noting that the time units are usually uniform and consecutive, ∑x = 0, so
that Σy = nβ0 and Σxy = β1Σx². Hence, β0 = Σy/n while β1 = Σxy/Σx².

Activity 7g:
Determination of Trend Lines using Method of Least Squares: Determine the trend
line using the method of least squares for the following crop yield data from 1985 - 1991.

Year     85    86    87    88    89    90    91
Yield    10    12    4     6     8     6     10

Solution:
Assuming the origin to be the mid point, 1988 will be assigned a value of zero.

Year    x     y     x²    xy
1985    -3    10    9     -30
1986    -2    12    4     -24
1987    -1    4     1     -4
1988     0    6     0      0
1989     1    8     1      8
1990     2    6     4      12
1991     3    10    9      30
        Σy = 56    Σx² = 28    Σxy = -8

Therefore β0 = Σy/N = 56/7 = 8, and β1 = Σxy/Σx² = -8/28 = -2/7. Hence, substituting
these values into the general equation y = β0 + β1x, the resulting trend line is
y = 8 − (2/7)x. When, say, x = 0 (year = 1988), the trend value of the crop yield is
8 units. In many practical situations, linear trend lines do not always result. A case
in point is that of growth and decay situations. These normally result in non-linear
models. To be able to fit the trend lines, the curves must be linearized through the use
of logarithms. For example, for the exponential trend line $y = \beta_0 \beta_1^{x}$, the procedure
would be to find the logarithm of the equation:-

$\ln y = \ln(\beta_0 \beta_1^{x}) = \ln\beta_0 + x\ln\beta_1$, so that $\sum \ln y = n\ln\beta_0 + \ln\beta_1 \sum x$.

Taking the origin at the mid point and using the normal equations: -
$\sum \ln y = n\ln\beta_0 + \ln\beta_1 \sum x$ ..........................................(i)
$\sum x\ln y = \ln\beta_0 \sum x + \ln\beta_1 \sum x^2$ ......................................(ii)
Due to the assumption of Σx = 0, $\sum \ln y = n\ln\beta_0$; hence $\ln\beta_0 = \sum \ln y / n$ and
$\beta_0 = \text{Antilog}(\sum \ln y / n)$. Similarly $\ln\beta_1 = \sum x\ln y / \sum x^2$; hence
$\beta_1 = \text{Antilog}(\sum x\ln y / \sum x^2)$.
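
Both the linear and the exponential least squares trend lines can be sketched in Python under the mid-point origin assumption (Σx = 0). The function names are hypothetical, and the exponential series at the end is synthetic data made up purely to illustrate the linearization; it does not come from the module.

```python
import math

def centred_linear_trend(y):
    """Return (b0, b1) for y = b0 + b1*x with the origin at the mid-point.

    With x centred so that Σx = 0: b0 = Σy / n and b1 = Σxy / Σx².
    """
    n = len(y)
    x = [i - (n - 1) / 2 for i in range(n)]     # ..., -1, 0, 1, ... for odd n
    b0 = sum(y) / n
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
    return b0, b1

def centred_exponential_trend(y):
    """Return (b0, b1) for y = b0 * b1**x by fitting a line to ln y."""
    ln_b0, ln_b1 = centred_linear_trend([math.log(v) for v in y])
    return math.exp(ln_b0), math.exp(ln_b1)     # antilogarithms

crop_yield = [10, 12, 4, 6, 8, 6, 10]           # 1985 to 1991, origin at 1988
b0, b1 = centred_linear_trend(crop_yield)
print(b0, round(b1, 4))

# Synthetic series y = 2 * 1.5**x for x = -2..2 (illustrative assumption only)
growth = [2 * 1.5 ** x for x in range(-2, 3)]
g0, g1 = centred_exponential_trend(growth)
print(round(g0, 4), round(g1, 4))
```

For an even number of periods the centred x values fall on half-units (for example −0.5 and 0.5), but Σx is still zero, so the same simplified formulas apply.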

7.6 Seasonal Variation.

These are the pressures on Time Series data that result from man-made or natural
phenomena. Seasonal Variations are usually repetitive and periodic, with a period of
not more than one year, for example a week, a month or a quarter. Seasonal Variation
studies aid decision makers to:-

(a) Establish the pattern of past changes.
(b) Project future patterns based on past performance.
(c) Eliminate seasonal variation so as to allow for the study of cyclic variation. This
process is called deseasonalisation.

7.6.1 Seasonal Variation Measurement.

The Ratio-to-Moving-Average method is used to measure Seasonal Variation. It
describes the degree of seasonal variation. The procedure is to:-

(a) Identify the seasonal component of the series.

(b) Express the result in (a) as a percentage.

Activity 7h:

Determination of Seasonal Variation. Given the data in the first three columns of the
table below, determine the measures of seasonal variation using the procedure below:-

(a) Determine the four-quarter moving totals for the series. The first total falls
between quarters II and III.
(b) Compute the four-quarter centered totals by adding each pair of successive moving
totals.
(c) Divide each of the centered moving totals by 2n = 8.
(d) Plot the original data and the smoothed values in (c) on the same graph.
(e) Express each actual value as a percentage of its moving average, then collect all
the percentages and arrange them by quarter.

Each 4-QMT entry below is written on the row of the quarter immediately before the
point on which it is centered.

Year    Quarter    Turnover    4-QMT    4-QMT Centered    4-QMA Centered
85      I          6
        II         4           15
        III        3           13       28                3.50
        IV         2           15       28                3.50
86      I          4           15       30                3.75
        II         6           18       33                4.13
        III        3           16       34                4.25
        IV         5           15       31                3.88
87      I          2           16       31                3.88
        II         5           13       29                3.63
        III        4
        IV         2
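
Steps (a) to (e) of the ratio-to-moving-average procedure can be sketched in Python as follows. This is an illustrative sketch only: the function name is hypothetical, the series is assumed to start in quarter I, and each quarter's index is taken as the simple mean of its percentages, which is one common convention rather than a rule stated in the module.

```python
def ratio_to_moving_average(values, period=4):
    """Return (centred moving averages, mean percentage ratio per season).

    Seasons are numbered 0..period-1, with 0 the first season of the series.
    """
    totals = [sum(values[i:i + period])
              for i in range(len(values) - period + 1)]          # step (a)
    cma = [(a + b) / (2 * period)
           for a, b in zip(totals, totals[1:])]                  # steps (b), (c)
    offset = period // 2            # first centred value sits on this index
    by_season = {}
    for k, avg in enumerate(cma):
        idx = k + offset
        pct = values[idx] / avg * 100                            # step (e)
        by_season.setdefault(idx % period, []).append(pct)
    index = {s: sum(p) / len(p) for s, p in sorted(by_season.items())}
    return cma, index

# Quarterly turnover for 85, 86 and 87 (quarters I to IV of each year)
turnover = [6, 4, 3, 2, 4, 6, 3, 5, 2, 5, 4, 2]
cma, index = ratio_to_moving_average(turnover)
print(cma)
print({q: round(v, 1) for q, v in index.items()})
```

A quarter whose index is above 100 tends to lie above the trend, and one below 100 tends to lie below it.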

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Prentice Hall, Englewood Cliffs, New Jersey (Available at University
Library)

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Prentice Hall, Englewood Cliffs, New Jersey (Available at
University Library)

________________________________________________________________________
Exercise 7: Time Series Analysis
________________________________________________________________________

7.0 Discuss the use of time series in studying:-

(a) Business activities
(b) Health care performance
(c) Agricultural production
(d) Weather pattern
(e) Educational development

7.1 Describe the following components of time series data illustrating the components
graphically.
(a) Secular trend
(b) Cyclical variation
(c) Seasonal variation
(d) Irregular variation

7.2 State any four causes of each of the components of a time series listed in 7.1.

7.3 Discuss the models of studying time series assuming the components interact;
(a) Multiplicatively
(b) Additively
(c) Both Multiplicatively and Additively

7.4: State the main practical reasons for studying;


(a) Secular trend
(b) Cyclic variation
(c) Seasonal variation
(d) Irregular variation

7.5: The maize output in the Rift Valley from 1993 to 1997, in tons per quarter, is
given below:

OUTPUT (in Tons)

YEAR  QTR I  QTR II  QTR III  QTR IV
1993  2      -       8        2
1994  4      2       6        5
1995  3      1       8        2
1996  4      4       6        5
1997  6      1       4        1

Determine the trend value using:

(a) The semi-average method
(b) The moving average method
(c) The method of least squares

7.6: The data below is the industrial output index for Kenya with 1980 as the base
year.
Year          1975  1976  1977  1979  1980  1981  1982  1983
Output index  120   130   140   150   141   150   155   145
Fit:
(a) A linear trend line to the index
(b) A parabolic trend line to the index

7.7: Given below is the performance of the Komala share index at the NSE:
YEAR     SHARE INDEX
1980-81  120
1981-82  90
1982-83  111
1983-84  115
1984-85  121
1985-86  119
1986-87  118
1987-88  121
Fit:
(a) A linear trend line for the index
(b) A parabolic trend line for the index
(c) Based on the models resulting from (a) and (b), find the share index for the
years 1994 and 1996

7.6: The data below represents the number of passes in a national examination by
students of Komala High School.
Year    80  81  82  83  84  85  86  87  88  89
Passes  40  58  78  98  75  65  63  45  78  87
(a) Determine the trend values using 2-, 3-, 4- and 5-year moving average periods.
(b) Using the results in (a), draw graphs of the original and the trend values on the
same axes.

7.7: The quarterly wholesale prices for a chain of wholesale outlets in
Western Kenya are as below:

Yearly Wholesale Prices

Quarter  1990  1991  1992  1993  1994
I        400   550   450   480   550
II       200   600   250   600   650
III      450   450   550   450   600
IV       350   500   650   450   555

Determine the seasonal indices using:

(a) Ratio to trend method
(b) Method of link averages
(c) Using the results in (a) compute the deseasonalised data values
(d) Determine the trend line using the moving average method.

7.8: Given the following data on fish output at Kendu Bay Beach, Nyanza:
Fish yield (1000 kg)
Year  Jan  Feb  Mar  April  May  June  July  Aug  Sept  Oct  Nov  Dec
1990  84   61   50   48     72   45    90    70   55    72   40   45
1991  75   55   45   60     62   84    61    85   60    55   80   90

Determine the seasonal indices, and hence the deseasonalised data values, based on the
moving average method using 2 monthly period for each year.

7.9: A straight-line trend was fitted by the method of least squares to the rice yield of
Kano Plains for the six years 1980-1985.
(a) The equation showed that the yield would be 20,000 in 1980 and 17,500 in 1985.
Find the equation using 1983 as the origin.
(b) Determine the monthly annual yield in 1984.
(c) Change the annual equation in (a) to a monthly equation with the origin at June
15th, 1984.

7.10. The fuel consumption volume (millions of shillings) of Oginga Odinga Petrol
Station along the Ahero-Othoro rural road is as in the table below:
Year    80   81   82   83   84    85   86   87
Volume  4.6  8.5  2.8  5.2  10.1  8.3  5.9  6.7

(a) Determine the trend values using:
(i) A three-yearly moving average method
(ii) The least squares method, assuming a parabolic fit.
(b) The cyclic variation using the trend values in (a)
(c) The seasonal indices hence the deseasonalized data values.
(d) Predict the consumption volume for 1988 and 1989 using the results in (a).

7.11 The public’s concern about the performance of the judiciary has prompted the A.G. to
order an investigation to determine the rate of disposal of criminal cases in a
sample of Kenya’s courts. The data below shows the proportion of cases disposed
of, based on the potential cases, between 1985 and 1989:
Year                 1985  1986  1987  1988  1989
Proportion of cases  0.4   0.6   0.25  0.45  0.35
(a) Fit a linear trend line to the above data.
(b) Find the trend values using a yearly moving average method.
(c) Based on the results of (a) and (b), determine the cyclic fluctuations.
(d) Find the seasonal data values hence the deseasonalised indices.
(e) Find the projected number of criminal cases disposed of using the models in (a)
and (b) for 1985, given a potential number of cases of 20,000.

7.12 The data below indicates the number of cases at the outpatient casualty facility
of the new Nyanza General Hospital between 1990 and 1997 (cases being those suffering
from malaria).
Year      1990  1991  1992  1993  1994  1995  1996  1997
Patients  1000  999   840   1500  1040  1200  1000  1150
Fit:
(a) A linear trend line to the data
(b) A parabolic trend line to the data
(c) Use the following methods to determine the trend values:
(i) Semi-average method
(ii) The yearly moving average
(d) Graph the results of (a) and (b) together with the actual data.
(e) Using the values in (a) and (b), find:
(i) The cyclic variation
(ii) The seasonal indices hence the deseasonalised indices.
(f) Based on the results Find the :

________________________________________________________________
Lesson Eight: Index Numbers
________________________________________________________________

8.0 Objectives

At the end of this lesson, the student should be able to determine and interpret various
types of Index Numbers. Specifically, the student should be able to: -

(i) Define the concept of Index Numbers and the various types of Index
Numbers.
(ii) List and discuss Index Number construction considerations.
(iii) Distinguish between Simple and Aggregated Index Numbers, citing
their advantages and disadvantages.
(iv) Calculate and interpret the Laspeyres and Paasche Index Numbers and
their derivatives.

8.1 Introduction

An index number is a relative measure of the performance of a phenomenon (Business or
Non-Business) between two temporal or spatial points. Whether temporal or spatial, the
reference point for comparison is called the base period, while the point at which the
comparison is made is called the current period. Hence, generally, an index number is used to
measure the changes in magnitude between the two periods.

8.2 Index number Construction


Prior to constructing Index Numbers, the following must be taken into consideration:-
(a) The items included in the index number should be specific
to the variable and representative of the population. Because of time and cost
considerations, only a representative sample of items is usually included; the
sample size should allow the constructed index number to achieve its purpose.
(b) An index number should be reliable. The reliability of an index number
hinges on the values of its constituent items. Hence the data source should
provide accurate data values.
(c) Comparability: Data items forming the index number should be
comparable. A compromise must be reached between an index whose components
are fixed and one whose components keep changing.
(d) Data source and collection: The base period should not be far from the current
period, because of the inevitable changes that affect any given situation over time.
(e) The base period should be a period in which the phenomenon was performing at the
100% level.

Further, the analyst must ask herself the following fundamental questions: -

(a) What is the purpose of the index number?
(b) Is the required index number price, volume or value based?
(c) Who will use the constructed index number, i.e. family or household; men,
women or children; industry or country, etc.?
(d) What is the status of the potential user: rich or poor, large or small?
(e) What is the index number's start date or point, i.e. the base period?

Generally there are two approaches for constructing an index number: The Relative and
the Aggregate methods.

8.3 The relative method

A relative index may be a price, volume, or value relative. It involves a comparison between
the price, volume, or value of a number of items/commodities in the base period
and in the current period. That is, the price, volume, or value in the current
period is expressed as a percentage of that in the base period.

8.3.1 Price Relative Index, PRI

The procedure is as below: -

(a) Let Poi = price of item i in the base period o, and Pni = price of item i in the
current period n. The Price Relative of the ith commodity is then given by
PRIi = Pni/Poi for every item i.
(b) Find the sum of the Price Relatives: ∑PRIi = ∑(Pni/Poi). Divide this sum
by the number of items N to obtain the Arithmetic Mean of the Relatives, and
multiply by 100 to express it as a percentage. Therefore PRI = (∑(Pni/Poi)/N) × 100.

8.3.2 The Quantity Relative Index, QRI

The procedure for computing QRI is the same as that for PRI.
Therefore QRI = (∑(qni/qoi)/N) × 100, where QRI is the Quantity Relative Index,
qni is the quantity of the ith item in the current period n and qoi is the quantity of the ith
item in the base period o.

Activity 8a:
Suppose that households still purchase the same quantities in 1999 as they did in 1997, but
that the prices of the items used have changed over the years as in the table below:

PRICE (Ksh.’000)
ITEM        1997 (100%)  1999
Milk        18           20
Matches     2            3
Salt        5            6
Sugar       20           24
Unga Ugali  45           55
Charcoal    300          450

Find the Price Relative Index and interpret the results. (The 100% indicated in parentheses
for 1997 shows that this year is the base year, while 1999 is the current year.)

Solution:

ITEM      Poi  Pni  Pni/Poi
Milk      18   20   1.11
Matches   2    3    1.50
Salt      5    6    1.20
Sugar     20   24   1.20
Ugali     45   55   1.22
Charcoal  300  450  1.50
∑(Pni/Poi) = 7.73

$$P_{on} = \frac{\sum (p_{ni}/p_{oi})}{N} \times 100$$

$$P_{on} = \frac{7.73}{6} \times 100 \approx 129$$

Interpretation
There was, on average, a 29% increase in prices between 1997 and 1999.
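
The arithmetic of Activity 8a can be reproduced with a short Python sketch. The function name is an illustrative choice, not part of the module. The same function gives a quantity relative index if quantities are passed instead of prices. Because it works with unrounded relatives, the result is 128.9.

```python
def relative_index(base, current):
    """Arithmetic mean of the relatives current/base, expressed as a percentage."""
    relatives = [c / b for b, c in zip(base, current)]
    return 100 * sum(relatives) / len(relatives)

# Activity 8a prices: milk, matches, salt, sugar, unga ugali, charcoal
p_1997 = [18, 2, 5, 20, 45, 300]  # base period
p_1999 = [20, 3, 6, 24, 55, 450]  # current period

pri = relative_index(p_1997, p_1999)
print(round(pri, 1))  # 128.9
```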

Activity 8b:
Now suppose that the households still pay the same price per unit in 1999 as they did in 1997,
but that the quantities they use have changed over the years as indicated below:-

Quantity
ITEM      1997 (100%)  1999
Milk      300          240
Matches   500          450
Salt      250          500
Sugar     500          650
Ugali     450          500
Charcoal  3120         100

Find the Quantity Relative Index and interpret.

ITEM      qoi   qni  qni/qoi
Milk      300   240  0.80
Matches   500   450  0.90
Salt      250   500  2.00
Sugar     500   650  1.30
Ugali     450   500  1.11
Charcoal  3120  100  0.03
∑(qni/qoi) = 6.14

qoi = ith item quantity in base year o while qni = ith item quantity in current period n.

Therefore qon = (∑(qni/qoi)/N) × 100, where N = 6

= (6.14/6) × 100 ≈ 102

Interpretation:

On average there was a 2% increase in the quantity purchased and consumed between 1997
and 1999.

Disadvantages of Relative Indices

Relative index numbers are sensitive to the units the items are measured in.
Choosing the measure of central tendency to use in their determination is difficult and may
be subjective, hence biased.

8.4 Aggregate Method and Types

The basis for the aggregate Index Numbers is the simple aggregate price/quantity Index
Numbers, PIN = Pon and QIN = Qon. The procedure is similar to that of the price relatives;
however, instead of adding the individual price/quantity relatives and obtaining their mean,
the current-period prices (or quantities) are summed and expressed as a percentage of the
sum of the base-period prices (or quantities):

Pon = (∑Pni × 100)/∑Poi

Qon = (∑qni × 100)/∑qoi

However, Pon is highly affected by items with very high or very low prices, and Qon is
similarly affected by items with very high or very low quantities. The units in which the
items are measured also have some impact on Pon and Qon. These shortcomings of Pon and Qon
are eliminated through aggregation by weights.
From the onset, it must be stated that prices are usually weighted by quantities and
quantities by prices.
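
The effect of a single high-priced item on the simple aggregate index can be seen with the price data of Activity 8a. The sketch below is illustrative only and the function name is an assumption.

```python
def simple_aggregate_index(base, current):
    """Sum of current-period values as a percentage of the sum of base-period values."""
    return 100 * sum(current) / sum(base)

# Activity 8a prices: milk, matches, salt, sugar, unga ugali, charcoal
p_1997 = [18, 2, 5, 20, 45, 300]
p_1999 = [20, 3, 6, 24, 55, 450]

with_charcoal = simple_aggregate_index(p_1997, p_1999)
without_charcoal = simple_aggregate_index(p_1997[:-1], p_1999[:-1])
print(round(with_charcoal, 1))     # 143.1
print(round(without_charcoal, 1))  # 120.0
```

Charcoal, the most expensive item, pulls the index from 120 up to about 143, which illustrates why weights are introduced.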
(a) LASPEYRES PRICE INDEX, $P^{L}_{on}$

The Laspeyres Price Index uses the base year quantities as weights.

$$P^{L}_{on} = \frac{\sum p_{ni} q_{oi}}{\sum p_{oi} q_{oi}} \times 100$$

(b) LASPEYRES QUANTITY INDEX, $Q^{L}_{on}$

The Laspeyres Quantity Index uses the base year prices as weights.

$$Q^{L}_{on} = \frac{\sum q_{ni} p_{oi}}{\sum q_{oi} p_{oi}} \times 100$$

(c) PAASCHE PRICE INDEX, $P^{P}_{on}$

The Paasche Price Index uses the current year quantities as weights.

$$P^{P}_{on} = \frac{\sum p_{ni} q_{ni}}{\sum p_{oi} q_{ni}} \times 100$$

(d) PAASCHE QUANTITY INDEX, $Q^{P}_{on}$

The Paasche Quantity Index uses the current year prices as weights.

$$Q^{P}_{on} = \frac{\sum q_{ni} p_{ni}}{\sum q_{oi} p_{ni}} \times 100$$

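
The four weighted indices in (a) to (d) can be sketched as follows. The two-item data set is hypothetical and chosen only to keep the arithmetic easy to follow; the function and variable names are likewise illustrative assumptions.

```python
def value(prices, quantities):
    """Total value: the sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres_price(p0, pn, q0):
    return 100 * value(pn, q0) / value(p0, q0)  # base year quantities as weights

def laspeyres_quantity(q0, qn, p0):
    return 100 * value(p0, qn) / value(p0, q0)  # base year prices as weights

def paasche_price(p0, pn, qn):
    return 100 * value(pn, qn) / value(p0, qn)  # current year quantities as weights

def paasche_quantity(q0, qn, pn):
    return 100 * value(pn, qn) / value(pn, q0)  # current year prices as weights

# Hypothetical data for two items (not from the module)
p0, pn = [10, 20], [15, 22]  # base and current prices
q0, qn = [6, 3], [4, 4]      # base and current quantities

print(round(laspeyres_price(p0, pn, q0), 2))     # 130.0
print(round(paasche_price(p0, pn, qn), 2))       # 123.33
print(round(laspeyres_quantity(q0, qn, p0), 2))  # 100.0
print(round(paasche_quantity(q0, qn, pn), 2))    # 94.87
```

In this example consumption shifts away from the item whose price rose most, so the Laspeyres price index (130.0) is higher than the Paasche price index (123.33).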
Note: LASPEYRES VS PAASCHE INDEX NUMBERS

The Laspeyres index attempts to determine the value of the base year commodities
when valued at current year prices, while the Paasche index determines the value of
the current year commodities when valued at base year prices. But which of the
two is appropriate for computing price/quantity indices? The Laspeyres index number
usually tends to overstate changes in prices, while the Paasche index number tends to
understate them. Theoretically, there is a true index number that lies between the
Laspeyres and Paasche index numbers and which satisfies the tests of an ideal
index number: the time reversal, factor reversal, circular and proportionality tests.

(e) OTHER INDEX NUMBERS

(i) Marshall-Edgeworth index numbers.

The Marshall-Edgeworth Price Index number, MEPI, uses the sum of the current year and
base year quantities, qn and qo respectively, as weights. The Marshall-Edgeworth
Quantity Index number, MEQI, uses the sum of the base and current year prices as
weights. The Marshall-Edgeworth price index lies between the Laspeyres and Paasche price
indices.

(ii) Fisher's ideal index number

It is the geometric mean of the Laspeyres and Paasche index numbers.

(iii) Chain Base Index Numbers

This involves the determination of an index for one period based on changes from the
previous period. It is appropriate for situations where the gap between the
base period and the current period is large, so that a fixed-base index does not respond
appropriately to changes in consumption. The procedure is to determine the PRIMARY indices
Po1, P12, P23, P34, P45, ... and multiply them together: Pon = Po1 × P12 × P23 × ...
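
The three index numbers in (i) to (iii) can be sketched in Python as below. The data are hypothetical, the function names are illustrative, and the chain calculation assumes that each primary index is quoted as a percentage (so each link is divided by 100 before multiplying).

```python
import math

def value(prices, quantities):
    """Total value: the sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

def marshall_edgeworth_price(p0, pn, q0, qn):
    weights = [a + b for a, b in zip(q0, qn)]  # base plus current quantities
    return 100 * value(pn, weights) / value(p0, weights)

def fisher_price(p0, pn, q0, qn):
    laspeyres = 100 * value(pn, q0) / value(p0, q0)
    paasche = 100 * value(pn, qn) / value(p0, qn)
    return math.sqrt(laspeyres * paasche)  # geometric mean of the two

def chain_index(links):
    """Multiply successive period-to-period indices, each given as a percentage."""
    result = 100.0
    for link in links:
        result *= link / 100
    return result

# Hypothetical data for two items (not from the module)
p0, pn = [10, 20], [15, 22]
q0, qn = [6, 3], [4, 4]

print(round(marshall_edgeworth_price(p0, pn, q0, qn), 1))  # 126.7
print(round(fisher_price(p0, pn, q0, qn), 1))              # 126.6
print(round(chain_index([110, 105, 120]), 1))              # 138.6
```

For these data the Laspeyres and Paasche price indices are 130.0 and 123.3, and both the Marshall-Edgeworth and Fisher values fall between them.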

Further Reading

Digamber Patri. Business Statistics and Quantitative Methods. Kalyani Publishers,
New Delhi, India, 2002.

Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge
University Press.

Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.

Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and
Schuster Co., Prentice Hall, Englewood Cliffs, New Jersey (Available at University
Library)

Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Prentice Hall, Englewood Cliffs, New Jersey (Available at
University Library)

___________________________________________________________________
Exercise 8: Index Numbers
___________________________________________________________________

8.1 Differentiate between


(a) Simple and Weighted Index Numbers
(b) Fixed and Chain Base Index Numbers

8.2: Discuss the basic requirements of an Index Number.

8.3: List the basic considerations for choosing the BASE period

8.4: The data below gives the unit prices (in thousands of Kenya Shillings) and values
of four commodities sold at Tom Mboya Chain Stores.

           1990 (100%)         1994
Commodity  Unit Price  Value   Unit Price  Value
Tea        200         3200    150         4500
Sorghum    100         1800    180         900
Maize      8           400     10          1000
Potato     12          3600    5           1500

Given that commodity Value = Unit Price x Quantity, find: -

(a) Price relative
(b) Quantity relative
(c) Laspeyres price index
(d) Laspeyres quantity index
(e) Paasche price index
(f) Paasche quantity index
(g) Marshall-Edgeworth price index
(h) Marshall-Edgeworth quantity index
(i) Fisher's price index
(j) Fisher's quantity index

8.5: For each of the Indices in (a) to (j) in 8.4, perform the Ideal Index Number Test
using the following criteria:
(a) Time Reversal test
(b) Factor Reversal Test
(c) Circular Test
(d) Proportionality Test

8.6: Discuss with specific examples the use of Index Numbers in the following
decision making areas.
(a) Industrial production
(b) Consumer purchasing power
(c) Agricultural production
(d) Commodity prices

(e) Retail prices
(f) Foreign trade

8.7: Discuss the problems in using Index Numbers in practice.

8.8: The data in the table below gives the prices and the average monthly quantities
of five beauty items purchased by female customers at a retail outlet.

             1986             1989             1990
Beauty Item  Price  Quantity  Price  Quantity  Price  Quantity
Perfumant    20     6         5      15        30     9
Hairsofta    2      9         9      9         5      7
Lipstrock    7      2         6      7         15     7
Nailplane    5      3         10     8         8      6
Eyelasher    5      8         7      5         6      12

Using 1986 as the base year, calculate the following price and quantity indices for
1989 and 1990.
(a) Laspeyres model
(b) Paasche model
(c) Fisher's model
(d) Marshall-Edgeworth model

8.9: For the crop years 1990-1994, the sugar production (in 100 kg sacks) from four
provinces, Nyanza, Western, Coast and Rift Valley, was as follows: Nyanza
2,000,000; Western 1,400,000; Rift Valley 200,000; Coast 1,500,000. Calculate
the Simple Relatives for each province using Nyanza as the base.
Is there any justification for using Link Relatives? Explain.
