CMS 200/STC-C208
BUSINESS STATISTICS
This module covers fundamental topics in Business Statistics. In Lesson One, the concept of
Statistical Investigation is discussed. Lesson Two covers Statistical Data Collection methods,
with special emphasis on the various sources of statistical data and their distinguishing
characteristics, types of data and their sources, and the design of various data collection
techniques and tools. In Lesson Three, Data Organization is discussed. In this lesson, data
collection units are defined, and various types of statistical data classification approaches as
well as the formation of frequency distribution tables are described.
In Lesson Four, both diagrammatic and graphical data presentation are described, together with
the construction of various statistical diagrams and graphs. Lesson Five then discusses
statistical measures of central tendency, starting off with a definition of the concept of
descriptive statistics. Various measures of central tendency are then determined and their value
In Lessons Seven and Eight, Time Series Analysis and Index Numbers are covered respectively.
Table of Contents
1.4.5 Analysis and Interpretation
1.4.6. Report Writing
Further Reading
Exercise One: Statistical Investigation
Lesson Two: Statistical Data Collection
2.0 Objectives
2.1 Introduction
2.1.1 Types and Sources of Statistical Data
2.1.1.1. Primary Data Sources
2.1.1.2 Secondary Data Sources
2.2 Merits and Demerits of Primary and Secondary Data
2.2.1 Primary Data
2.2.2 Secondary Data
Activity 2a
2.3 Data Collection Techniques
2.3.1 Sampling Techniques
2.3.1.1 Types of Sampling Techniques
2.3.1.1.1. Simple Random Sampling
2.3.1.1.2 Stratified Proportionate Random Sample
2.3.1.1.3 Systematic Sampling
2.3.1.1.4 Quota Sampling
2.3.1.1.5 Census Technique
2.3.2 Statistical Data Collection Units
2.3.3 Data Collection Tools
2.3.3.1 Questionnaires
Activity 2b
Further Reading
Exercises 2: Statistical Data Collection
Lesson Three: Data Organization
3.0 Objectives
3.1 Introduction
3.2 Data Classification
3.3. Types of Classification
3.3.1 Geographical Classification
3.3.2 Chronological Classification
3.3.3. Qualitative Classification
3.3.4 Quantitative Classification
3.3.5 Data Tabulation
Activity 3a
3.3.6 Formation of Frequency Distribution Table
3.3.6.1 Grouped Frequency Distribution
Further Reading
Exercise 3: Data Organization
Lesson Four: Diagrammatic and Graphical Data Presentation
4.0 Objectives
4.1 Introduction
4.2 Diagrammatic Data Presentation
4.2.1 Simple Bar Diagram
Activity 4a
4.2.2. Relative Bar Charts
4.2.2.1. Multiple/Stacked Bar Graphs
Activity 4b
4.2.2.2 Component Bar Chart
Activity 4c
4.2.2.3 Pie Charts
Activity 4d
Activity 4d
4.2.3 Histogram
Activity 4e
Activity 4d
4.3 Graphical Data Presentation
4.3.1 Frequency Polygon
4.3.2 Frequency Curves
4.3.2.1 Cumulative Frequency Curve
Activity 4f
Further Reading
Lesson Four Exercises: Diagrammatic and Graphical Data Presentation
Lesson Five: Measures of Central Tendency
5.0 Objectives
5.1 Introduction
5.2 Measures of Central Tendency
5.2.1 Arithmetic Mean
Activity 5a
Activity 5c
Activity 5d
5.2.2 The Median
Activity 5f
Activity 5g
5.2.3 The Mode
(a) Mode of Ungrouped Data
Activity 5h
Activity 5i
Activity 5J
5.2.4 Harmonic Mean
Activity 5k
Activity 5l
5.2.5 Geometric Mean
Further Reading
Exercise 5: Descriptive Statistics
Lesson 6: Measures of Dispersion and Shape
6.0 Objectives
6.1 Introduction
6.2 Absolute Measures of Dispersion
6.2.1 Range
6.2.1.1 Range of Ungrouped Data
Activity 6a
Activity 6b
6.2.1.2 Range: Grouped Data
Activity 6c
6.3 Mean Absolute Deviation (MAD)
Activity 6d
Activity 6e
6.4 Variance, hence the Standard Deviation
6.4.1 Variance and Standard Deviation: Generalized Procedure
Activity 6f
6.4.1.1 Variation Measures of Grouped Data
Activity 6g
6.1.4 Quartile Deviations
6.2 Relative Measures of Dispersion
6.2.1 Relative Range, Rr
6.2.2 Coefficient of Quartile Deviation, Cqd
Activity 6h
6.2.3 Coefficient of Variation, Cvar
Activity 6i
6.2.4 Semi Inter-Quartile Range (SIQR)
Activity 6j
6.3 Measures of Shape
6.3.1 Skewness
6.3.1.1 Measures of Skewness
Activity 6k
6.3.2 Kurtosis
6.3.2.1 Measurement of Kurtosis
Lesson 6L
Further Reading
Exercise 6: Measures of Dispersion and Shape
Lesson 7: Time Series Analysis, TSA
7.0 Objectives
7.1 Introduction
7.2 Definition of Time Series and its Components
Activity 7a
7.3 Components of Time Series Data
Activity 7b
7.4 Time Series Models
7.5 Secular Trend Analysis
7.5.1 Trend Measurement Methods
Activity 7c
Activity 7d
Activity 7e
Activity 7f
Activity 7f
Activity 7g
7.6 Seasonal Variation
7.6.1 Seasonal Variation Measurement
Activity 7h
Further Reading
Exercise 7: Time Series Analysis
Lesson Eight: Index Numbers
8.0 Objectives
8.1 Introduction
8.2 Index Number Construction
8.3 The Relative Method
8.3.1 Price Relative Index, PRI
Activity 8a
Activity 8b
8.4 Aggregate Method and Types
Further Reading
Exercise 8: Index Numbers
________________________________________________________________________
Lesson One: The Statistical Investigation
________________________________________________________________________
1.0 Objectives:
At the end of this topic, the student should be able to appreciate the concept of statistical
investigation. Specifically, the student should be able to: -
1.1 Introduction
Statistics may be considered as facts that are expressible in terms of numbers for
purposes of decision making, or as the “art and science” of gathering, organizing,
analyzing and interpreting the resulting information as per the stated objectives.
Facts that are collected from some source before further processing are called raw data;
when processed, raw data is transformed into information. Simply stated, data is the
raw material [input] for information [output]. The diagrammatic representation of the
conversion of data into information is as shown below:-
[Diagram: Raw data (input) → Processing → Information (output)]
Examples of statistical data include:
• Outpatient records at outpatient points.
• Rates of interest and inflation over the years or across various countries.
• Number of widows/widowers and orphans from AIDS-related deaths.
• Number of children dropping out of school in various schools, districts, provinces
or nationally over the years.
• Infant mortality nationally.
• Number of Matatus operating on various routes.
• Percentage of political party supporters by province.
• Number of males/females employed at Juakali and Sons Ltd.
• Temperature level and amount of ice-cream sold.
• Amount of maize, fish and millet harvested over 18 years by 17 families.
Activity 1a.
List other Examples of Statistical Data
Individual single facts, e.g. Atieno’s height/age or Mama Njeri’s profit from sales during the
month of June, do not form statistics. Statistical data must allow for some objective
interpretation or comparison to be made, e.g. Atieno’s height/age compared with that of
others, Atieno’s average height, or Mama Njeri’s average profit or profits over, say, 12 months.
Activity 1b:
• List six examples of data processing methods and the resulting output of
processing such data.
Generally, those quantities that change with changes in others are called VARIABLES
while those that remain the same whatever the change in other quantities are called
CONSTANTS.
If a statistical data set contains only one variable, e.g. cost, revenue or income, then it is
said to be one-variable or univariate data. However, if it has two variables, e.g.
temperature level and volume of ice-cream sold, amount of rainfall and crop yield, or
volume of stock turnover and interest rates, then it is said to be two-variable or bivariate.
If the data has more than two variables, then it is said to be a many-variable or
multivariate data set, e.g. growth yield with type of soil [fertilizer] and amount of rainfall,
among others. Based on these concepts, the scheme for representation of statistical
data structures is as below: -
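As an illustration only, the three structures can be held in plain Python values: a list of numbers (univariate), a list of pairs (bivariate) and a list of records (multivariate). The figures and the helper names `number_of_variables` and `data_structure` are hypothetical, and the sketch assumes every observation in a data set records the same variables.

```python
# Hypothetical values, invented purely to illustrate the three structures.

# Univariate: one variable per observation, e.g. monthly cost.
costs = [1200.0, 1350.0, 1100.0, 1500.0]

# Bivariate: two variables per observation,
# e.g. (temperature level, volume of ice-cream sold).
temperature_vs_sales = [(22.0, 140), (25.0, 180), (28.0, 230), (31.0, 260)]

# Multivariate: more than two variables per observation,
# e.g. yield together with fertilizer used and amount of rainfall.
yield_records = [
    {"yield": 18, "fertilizer": 50, "rainfall": 600},
    {"yield": 22, "fertilizer": 75, "rainfall": 650},
]


def number_of_variables(observation) -> int:
    """Count the variables recorded in one observation."""
    if isinstance(observation, (dict, tuple, list)):
        return len(observation)
    return 1  # a bare number is a single variable


def data_structure(dataset) -> str:
    """Label a data set by the number of variables in its first observation."""
    k = number_of_variables(dataset[0])
    if k == 1:
        return "univariate"
    if k == 2:
        return "bivariate"
    return "multivariate"


print(data_structure(costs))                 # univariate
print(data_structure(temperature_vs_sales))  # bivariate
print(data_structure(yield_records))         # multivariate
```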
1.3 Data Types
Quantitative data are data sets that are expressible in number form [quantities], e.g. profit,
cost, number of cars, number of females and number of students dropping out of school.
Quantitative data may take two forms, discrete and continuous, depending on the type of
values the variables take.
Activity 1c:
List examples of quantitative data types
Discrete data are data sets obtained from variables that can only take whole-number
(integral) values.
Activity 1d:
List other examples of discrete data types
Continuous data are data sets obtained from variables which may take any value within a
range, both integral and fractional.
Examples
Activity 1e:
List other examples of continuous data types
Qualitative data is data classified on the basis of attributes rather than numbers, e.g. Beauty,
Clever, Gender, Strong, High. If the categories can only take two forms, then the variable is
said to be dichotomous and one can assign the number 0 or 1 to each form, effectively
converting them into quantitative data, e.g. Male = 1, Female = 0. Broadly, there are two
types of qualitative data: ordinal and nominal data.
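A minimal sketch of dichotomous coding, using the Male = 1, Female = 0 scheme from the text on a hypothetical list of responses. It also shows why the coding is useful: once coded, the mean of the 0/1 codes is simply the proportion of respondents coded 1.

```python
# Hypothetical responses; the coding Male = 1, Female = 0 follows the text.
responses = ["Male", "Female", "Female", "Male", "Female"]
CODES = {"Male": 1, "Female": 0}

coded = [CODES[r] for r in responses]
print(coded)  # [1, 0, 0, 1, 0]

# Once coded, the data can be treated quantitatively: the mean of the
# 0/1 codes is the proportion of respondents coded 1 (here, males).
proportion_male = sum(coded) / len(coded)
print(proportion_male)  # 0.4
```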
Activity 1f:
List other examples of Qualitative data types
Data is ordinal if a meaningful order can be generated, e.g. 1st, 2nd, 3rd etc. Data may be
ranked in some order and these ranks used for subsequent analysis.
Examples
• Job description
• Rating student performance
• Ranking colleges/schools performances
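Ranking can be sketched as follows for a hypothetical set of school mean scores (the scores and the function name `rank_descending` are invented for illustration). The highest score is ranked 1st; tied scores share the average of the positions they occupy, which is one common convention rather than the only one.

```python
# Hypothetical mean scores for four schools, invented for illustration.
scores = {"School A": 62.5, "School B": 71.0, "School C": 58.0, "School D": 71.0}


def rank_descending(values: dict) -> dict:
    """Rank items so that the highest value is 1st.

    Tied items share the average of the positions they occupy
    (one common convention; others exist).
    """
    ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Advance j over every item tied with ordered[i].
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


print(rank_descending(scores))
# {'School B': 1.5, 'School D': 1.5, 'School A': 3.0, 'School C': 4.0}
```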
Activity 1g:
List other examples of ordinal data types
Data is nominal if no meaningful order can be assigned, i.e. only categories are used.
Meaningful numerical measures, such as an average, cannot be determined.
Examples
Activity 1h:
List other examples of nominal data types
Data may be categorized as time-series data (if the data values are recorded in a meaningful
sequence) or cross-sectional data (if the sequence of recording is not relevant).
For example, time series data include profit from the sale of maize for the past 10 years and
weekly retail sales, while cross-sectional data include the number of hours of sleep measured
for 20 people to test the effectiveness of a sleeping pill and the number of phone calls
processed yesterday by each of the firm’s order-taking employees.
Activity 1i:
List other examples of Time Series and Cross Sectional data types
The problem definition involves the determination of the scope [geographical and
content] that should be addressed as per the objective of the enquiry.
Activity 1j:
Describe FOUR other examples of Problem Definition
1.4.3 Design data collection tools and administer these on the respondents
The tools of data collection must be appropriate for the problem and investigation. After
designing and piloting, the tools are then administered to the respondents (the individuals
from whom data is collected).
1.4.6. Report Writing
Based on the results of the analysis and the interpretation arising from it, a report is then
presented discussing the evidence together with appropriate conclusions.
Further Reading
Shrivastava, U.K., Shenoy, G.V. and Shama, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.
Newbold, P. Statistics for Business and Economics. 4th Edition. Englewood Cliffs,
New Jersey: Prentice Hall, a Simon and Schuster Company. (Available at University
Library)
Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Englewood Cliffs, New Jersey: Prentice Hall. (Available at
University Library)
______________________________________________________________
Exercise One: Statistical Investigation
______________________________________________________________
1.1. Define the term “statistics”, explaining clearly what “statistical” and “non-statistical”
facts are.
1.2. Differentiate between data and information, illustrating the processing of data
into information using a practical example.
1.3 List the attributes that an individual item must possess to be regarded as “statistics”.
1.5. Your consultancy firm, DSS, has been requested by various clients to help them
identify the data items specific to problems they propose to investigate.
1.6 Exhaustively enumerate the data items that, in your opinion, would help them
solve the problems indicated by each client below:
1.7 What data items should an investigator collect in her attempt to investigate the
following:
• Increasing infant mortality
• Performance in education by high school students
• Household expenditure pattern.
• Effect of public awareness campaigns on AIDS and AIDS-related deaths
• Product performance in the market place.
• Opinion poll on political party performance in future elections.
1.8 After processing the data items identified in 2 and 3, discuss the resulting information
that would help the user make logical decisions.
1.11 State the criteria data items must meet prior to categorizing them as in 6.
1.12 Suppose you have been requested to investigate the following areas:
• The performance of kiosks in your sub-location.
• The changing crop yield that has contributed to increasing level of poverty.
• Causes of road accidents on the Nairobi-Kisumu Road
1.13 Discuss the steps you would take during your investigation to help your client
make an appropriate decision.
________________________________________________________________________
Lesson Two: Statistical Data Collection
________________________________________________________________________
2.0 Objectives:
At the end of this topic, the student should be able to describe the process of
statistical data collection. Specifically, the student should be able to: -
• Discuss and design various data collection tools with specific reference
to appropriateness of use.
2.1 Introduction
2.1.1 Types and Sources of Statistical Data
Statistical data may be classified as belonging to two categories: primary and secondary.
The former are data collected from an original (primary) source, while the latter are
those collected from a secondary source. Precisely, primary data are those
collected first-hand for a specific purpose, while secondary data are those that
were originally collected for a different objective but are currently required for
the present purpose, that is, second-hand data.
The merits of primary data include:
(a) Accuracy
(b) Less bias
(c) Objectivity and specificity
The demerits of primary data lie in the fact that its collection and analysis are
time-consuming and costly.
Due to its nature, prior to using secondary data an investigator must take the
following precautions. It must be confirmed that the data:
• Was collected during a period of normalcy, e.g. in the absence of floods,
famine, war, inflation, recession etc.
• Was not collected in the too distant past, to minimize the impact of
changes in the surroundings.
• Covers the scope of the present investigation.
• Provides the exact kind of information required, in a suitable form.
In summary, in order to use secondary data for a particular inquiry, the
investigator should determine:
• How the data was sourced,
• The criteria used,
• Its components and how the resultant data was put together.
Activity 2a
i. Primary Data
ii. Secondary data
2.3 Data Collection Techniques
A sample is part of a population, where the latter consists of all the items under
investigation, e.g. all the banks, transport vehicles, students or brands of
cooking fat. The sampling approach entails selecting part of the population using
objective techniques to collect data. The data collected from the sample is then
analyzed and the results are generalized to the whole population.
Merits:
• Quick results
• High-quality interviews
• More skilled analysis
• Lower cost
• Error can be assessed
• Non-response is easier to handle
Demerits:
• May be biased
• May give misleading conclusions if not properly undertaken, e.g. through
deliberate selection
• May exclude key members of the population if the correct sampling
technique is not employed
• May suffer from haphazard selection
In simple random sampling, each unit of the population is selected such that the
resulting sample has the same characteristics as the population and each unit has
exactly an equal chance of being included in the sample.
There are, however, two ways of drawing the units: sampling with replacement and
sampling without replacement.
In sampling with replacement, each unit drawn is replaced back into the drum
containing the elements of the population before the next draw. This process has
the advantage that the units in each draw have the same chance of being selected
as in the preceding draw, but has the disadvantage that a unit drawn earlier may
be re-drawn, lengthening the period of unit selection.
The procedure for selecting sample units by the above method is only appropriate
for populations of small size; as the population size increases, the procedure
becomes impractical, hence the use of the second procedure.
Note that if the RN is less than 1 or greater than N, discard it, as there will be no
corresponding unit. Similarly, if the RN has already been chosen, discard it, since
the sampling is without replacement, and proceed to the next.
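The random-number rules above can be sketched in Python, assuming units are numbered 1 to N and that random numbers are read with as many digits as N, as from a random number table. The function names and the population size N = 80 are hypothetical; a fixed `seed` makes the draw reproducible. The with-replacement variant is included for comparison.

```python
import random


def srs_without_replacement(N: int, n: int, seed: int = 1) -> list:
    """Select n distinct unit numbers from 1..N using the random-number rule.

    Numbers are generated with as many digits as N (imitating reading a
    random number table). A number outside 1..N has no corresponding
    unit and is discarded; a number already chosen is also discarded.
    """
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    rng = random.Random(seed)
    digits = len(str(N))
    chosen = []
    while len(chosen) < n:
        rn = rng.randint(0, 10 ** digits - 1)  # next random number
        if rn < 1 or rn > N:
            continue  # no corresponding unit: discard
        if rn in chosen:
            continue  # already selected: discard (without replacement)
        chosen.append(rn)
    return chosen


def srs_with_replacement(N: int, n: int, seed: int = 1) -> list:
    """Each drawn unit goes back into the 'drum', so repeats are possible."""
    rng = random.Random(seed)
    return [rng.randint(1, N) for _ in range(n)]


print(srs_without_replacement(N=80, n=10))
print(srs_with_replacement(N=80, n=10))
```

In everyday use, Python's standard `random.sample(range(1, N + 1), n)` draws without replacement directly; the explicit loop is kept only to mirror the discard rules in the text.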
Procedure
Divide the population into strata or blocks in such a manner that each block is as
homogeneous as possible. A random sample is then drawn from each stratum in
proportion to its size. Suppose a population of size 100 can be partitioned into 40
males and 60 females. The males can further be divided into <15 years, 15-25 years
and >25 years with sizes 20, 15 and 5 units respectively. Similarly, the females can
also be partitioned into <15 years, 15-25 years and >25 years with respective sizes of
10, 40 and 10 units. Select a sample of size 20 from this population.
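Under proportionate allocation, each stratum h of size N_h receives n_h = n × N_h / N sample units. For the example above (n = 20, N = 100) this gives 4, 3 and 1 males and 2, 8 and 2 females across the three age groups, a total of 20. The sketch below computes this and then draws a simple random sample within each stratum; the function names and the serial numbering of units are hypothetical assumptions.

```python
import random

# Stratum sizes N_h from the example above (N = 100).
strata = {
    ("Male", "<15"): 20,
    ("Male", "15-25"): 15,
    ("Male", ">25"): 5,
    ("Female", "<15"): 10,
    ("Female", "15-25"): 40,
    ("Female", ">25"): 10,
}


def proportional_allocation(sizes: dict, n: int) -> dict:
    """Sample size per stratum: n_h = n * N_h / N."""
    N = sum(sizes.values())
    return {h: n * N_h / N for h, N_h in sizes.items()}


def stratified_sample(sizes: dict, n: int, seed: int = 1) -> dict:
    """Draw a simple random sample of round(n_h) units within each stratum.

    Units are given hypothetical serial numbers 1..N, stratum by stratum.
    """
    rng = random.Random(seed)
    allocation = proportional_allocation(sizes, n)
    sample, start = {}, 1
    for h, N_h in sizes.items():
        units = range(start, start + N_h)
        start += N_h
        sample[h] = sorted(rng.sample(units, round(allocation[h])))
    return sample


for stratum, n_h in proportional_allocation(strata, n=20).items():
    print(stratum, n_h)
# ('Male', '<15') 4.0
# ('Male', '15-25') 3.0
# ('Male', '>25') 1.0
# ('Female', '<15') 2.0
# ('Female', '15-25') 8.0
# ('Female', '>25') 2.0
```

Here every n_h is a whole number; in general the allocations must be rounded and adjusted so that they still sum to n.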
Partition the sample into quotas, each indicating the number of individuals to act
as respondents. Choose the quotas in such a way that the sample is representative of
the entire population. Note that the sample may be completely unrepresentative in
other respects.
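Quota filling can be sketched as a loop that accepts respondents in the order the interviewer meets them until each quota is full. The quotas, respondents and the function name `fill_quotas` are hypothetical. Note that selection within each quota is by availability rather than at random, which is why the sample may be unrepresentative in other respects.

```python
def fill_quotas(respondents, quotas):
    """Accept respondents in the order they are met until every quota is full.

    respondents: iterable of (respondent_id, group) pairs
    quotas: dict mapping group -> number of respondents required
    Selection is by availability, not at random.
    """
    remaining = dict(quotas)
    selected = []
    for respondent_id, group in respondents:
        if remaining.get(group, 0) > 0:
            selected.append(respondent_id)
            remaining[group] -= 1
        if all(count == 0 for count in remaining.values()):
            break  # all quotas filled
    return selected, remaining


# Hypothetical quotas and respondents, in the order the interviewer meets them.
quotas = {"Male": 2, "Female": 3}
met = [(1, "Female"), (2, "Male"), (3, "Female"), (4, "Male"),
       (5, "Male"), (6, "Female"), (7, "Female")]

selected, unfilled = fill_quotas(met, quotas)
print(selected)  # [1, 2, 3, 4, 6]
print(unfilled)  # {'Male': 0, 'Female': 0}
```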
2.3.1.1.5 Census Technique
This is the process of 100% investigation, i.e. where each and every member (unit)
of the population is included in the investigation. The technique has only one critical
advantage, that of being 100% accurate in the sense that there is no sampling error, and
is only appropriate for populations of small size. As the population size increases, the
technique becomes disadvantageous since:
Data collection units, or statistical units, are the units upon which data is
collected, measured or counted, analyzed and interpreted. Prior to data collection,
statistical units must be clearly defined.
A good statistical unit must meet the following requirements:
• It must not change with place and time, i.e. it must be stable.
• It must be precisely and concisely defined and unambiguous, i.e. it must
have one and only one meaning.
• It must be relevant to the investigation at hand, i.e. it should be specific to
the objective of the investigation.
• It must be uniform: homogeneity ensures that the unit does not mean
different things at different times and places.
After determining the sampling technique, establishing the sample size and defining
the appropriate statistical units, the investigator must then determine the tools to be
used in the collection of data. A variety of data collection tools are available to the
investigator; depending on the objective and scope of the investigation, these
tools can be used singly or in combination.
2.3.3.1 Questionnaires
This is a data collection tool where the details of the investigation are organized in
question form. It is usually arranged in two parts: the classification component and
the detail component. The classification component consists of the date of the
interview, the name of the interviewer and details of the respondent, while the second
part consists of the questions that solicit information on the objective of the inquiry. A
good questionnaire should have the following desirable characteristics:
(a) Technical terms should be avoided, i.e. the questions should be easily
understood.
(b) Questions should be unambiguous, i.e. they should have one and only one
interpretation.
(c) Questions must not contain words of vague meaning, e.g. words like unskilled,
large etc. should be avoided.
(d) Questions requiring calculations should be avoided.
(e) The respondent should not be expected to perform or decide upon
classification.
(f) Questions should not lead to biased answers.
(g) The questionnaire should not be too long.
(h) Questionnaires should cover the exact object of the inquiry.
Activity 2b
Discuss the Interview and observation methods of data collection.
Further Reading
Shrivastava, U.K., Shenoy, G.V. and Shama, S.C. Quantitative Techniques for
Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.
Newbold, P. Statistics for Business and Economics. 4th Edition. Englewood Cliffs,
New Jersey: Prentice Hall, a Simon and Schuster Company. (Available at University
Library)
Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern
Approach. 5th Edition. Englewood Cliffs, New Jersey: Prentice Hall. (Available at
University Library)
________________________________________________________________________
Exercises 2: Statistical Data Collection
________________________________________________________________________
2.1 Distinguish between the primary and secondary data, highlighting the main
sources of each.
2.3 The use of secondary data requires that the investigator take extra
precautions to ensure its appropriateness. Assuming you were the investigator, list
any five precautions you would take.
2.4 Indicate both the primary and secondary data sources if you are to undertake the
following investigations.
2.5 Discuss the following data collection techniques highlighting their merits and
demerits: -
(a) Sampling technique
(b) Census method
2.8 Describe with specific examples any three types of statistical units
a good statistical unit.
2.9 Identify appropriate statistical units you would use if you were to investigate the
problem areas in item 4.
2.10 Discuss the following data collection tools, indicating the requirements for each:
(a) Interview
(b) Observation
(c) Questionnaires
________________________________________________________________________
Lesson Three: Data Organization
________________________________________________________________________
3.0 Objectives:
At the end of this topic, the student should be able to describe the data
organization and presentation procedures. Specifically, the student should be able
to: -
3.1 Introduction
[Diagram: dichotomous classification into Male and Female]
If an item can be classified into two groups, then such a classification is called a
dichotomous classification.
[Diagram: classification tree with categories Food, Clothing, Fees, Drinks, Groceries, Adult, Children, Male and Female]
Chronological classification is with respect to the time of occurrence, i.e. it is time
based, e.g. time of birth, date of manufacture.
19
3.3.3. Qualitative Classification
This type of classification is attribute-based, i.e. the characteristic cannot be expressed in numerical terms.
Example:
[Figure: Kenyan Students classified into Male and Female]
In most cases, for purposes of analysis, the attributes must be converted into numerical form through coding.
Example
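As a minimal sketch of coding (the codes Male = 1 and Female = 2 and the five-student list are hypothetical, chosen only for illustration), each attribute value is simply replaced by its agreed number:

```python
# Hypothetical codes; the numbers are labels only and carry no magnitude.
GENDER_CODES = {"Male": 1, "Female": 2}

def code_attributes(values, codes):
    """Replace each attribute value by its numerical code."""
    return [codes[value] for value in values]

students = ["Male", "Female", "Female", "Male", "Female"]  # hypothetical records
coded = code_attributes(students, GENDER_CODES)
print(coded)  # [1, 2, 2, 1, 2]
```

Any distinct numbers would serve; what matters is that the coding scheme is recorded so the codes can be translated back into attributes.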
The second step in data organization, after classification, is tabulation. Statistical tabulation is the organization of data by arranging them according to their classes (groups). The items are arranged in rows and columns, where the rows run from left to right and the columns from top to bottom. Generally, a statistical table has the following components:
• Column Heading [caption]: Describes the heading of each column. Each column heading may have a sub-heading or sub-caption.
• Row Heading [stub]: Describes the heading of each row and may have sub-row headings.
• Foot-Note: An extra comment, normally written below the table, explaining an unusual item within the table; the item is usually marked with an asterisk.
• Source-Note: Indicates the source of the table, normally for tables obtained from a secondary source.
• Totals/Subtotals: These show the row/sub-row and column/sub-column totals.
• Body: The intersection of the rows and columns, where the data are entered.
It is worth noting that the sum of the row totals = the sum of the column totals = the Grand Total.
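The totals rule can be checked with a short Python sketch. It borrows, purely for illustration, the family budget figures from the multiple pie chart activity in Lesson Four, treating the items as rows and the families as columns:

```python
# Illustrative two-way table body: rows are items, columns are families A, B, C
# (figures borrowed from the family budget data in Lesson Four).
body = {
    "Food": [2000, 500, 5000],
    "Clothing": [2000, 400, 7000],
    "Rent": [3000, 1000, 1500],
    "Fees": [800, 200, 2500],
    "Health": [1000, 900, 3000],
}

# Row totals: add across each row
row_totals = {item: sum(cells) for item, cells in body.items()}
# Column totals: zip(*rows) regroups the cells column by column
column_totals = [sum(column) for column in zip(*body.values())]
grand_total = sum(row_totals.values())

print(row_totals)
print(column_totals)  # [8800, 3000, 19000]
print(grand_total)    # 30800
# The row totals and the column totals both add up to the grand total
print(sum(column_totals) == grand_total)  # True
```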
Activity 3a
Apart from organizing data through classification and tabulation, raw data may be organized by putting them in a frequency table. A frequency is the number of occurrences of data items within a homogeneous group or class.
Procedure.
• Define the variable that represents the data items using any letter of the
alphabet or its subscripted form.
• Determine the number of homogeneous classes
• Find the number of classes and the type of classes to use.
Example:
Consider the following data representing the number of packets of milk bought by 80 households in Nairobi:
20 55 55 21 25 25 26 30 30 35
30 30 30 30 30 30 30 30 30 30
30 30 25 25 25 35 30 30 30 35
35 35 40 40 40 35 21 60 65 20
40 40 40 40 40 40 25 25 25 25
40 40 40 40 40 45 45 45 45 45
50 50 50 55 55 60 60 60 35 30
35 30 35 40 50 40 30 26 26 26
Determine the Frequency Distribution of the above data.
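A minimal Python sketch of this first way of forming a frequency distribution, listing each distinct value with the number of times it occurs, using `collections.Counter` from the standard library on the 80 values above:

```python
from collections import Counter

# Number of packets of milk bought by 80 households in Nairobi (data above)
packets = [
    20, 55, 55, 21, 25, 25, 26, 30, 30, 35,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 25, 25, 25, 35, 30, 30, 30, 35,
    35, 35, 40, 40, 40, 35, 21, 60, 65, 20,
    40, 40, 40, 40, 40, 40, 25, 25, 25, 25,
    40, 40, 40, 40, 40, 45, 45, 45, 45, 45,
    50, 50, 50, 55, 55, 60, 60, 60, 35, 30,
    35, 30, 35, 40, 50, 40, 30, 26, 26, 26,
]

frequency = Counter(packets)  # value -> number of occurrences
for x in sorted(frequency):
    print(x, frequency[x])
print("Total:", sum(frequency.values()))  # Total: 80
```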
This way of forming a frequency distribution, in which each distinct value is listed with its frequency, is only appropriate where the values of the variable under investigation are not highly dispersed; otherwise it would not condense the data as required. For such cases, a second way of forming a frequency distribution is more appropriate: the variable values are placed in classes or groups, and the number of values falling in each class is established.
Procedure
Group the variable values into classes; as a rule of thumb, between 5 and 20 classes are usually appropriate. A more objective number of classes may be obtained from Sturges' rule.
Let the number of classes be η and the number of observations be N. Sturges' rule gives
$$\eta = 1 + 3.322\log_{10} N$$
Since N = 80,
η = 1 + 3.322 log 80
= 1 + 3.322 [log (8 × 10)]
= 1 + 3.322 [3 log 2 + log 10]
= 1 + 3.322 [3(0.3010) + 1] = 1 + 3.322 (1.903)
= 7.32 ≈ 7, since the number of classes must be a whole number.
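The same computation as a small Python sketch. It assumes the usual form of Sturges' rule with coefficient 3.322 (that is, 1/log10 2) and rounds both the number of classes and the class size to whole numbers, as required for whole-number data; the function names are illustrative:

```python
import math

def sturges_classes(n):
    """Number of classes by Sturges' rule, 1 + 3.322 log10(N), to a whole number."""
    return round(1 + 3.322 * math.log10(n))

def class_size(lowest, highest, k, decimals=0):
    """Class size c = range / k, rounded to the data's number of decimal places."""
    size = (highest - lowest) / k
    return round(size) if decimals == 0 else round(size, decimals)

k = sturges_classes(80)     # 1 + 3.322 x 1.903 = 7.32, giving 7
c = class_size(20, 65, k)   # 45 / 7 = 6.43, giving 6
print(k, c)                 # 7 6
```

Because the class size is rounded down to 6, seven classes starting at 20 only reach 61, so an eighth class is needed to include the highest value, 65.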
Find the class size, c. The class size must have the same number of decimal places as the variable values. Following Sturges' rule, c = R/η, where the range R is the difference between the highest and the lowest variable values. Here the lowest value is 20 and the highest is 65, so c = 45/7 ≈ 6.43, which is rounded to 6. Form the classes, and hence the frequency distribution: start with the lowest value and step up by the class size until the highest variable value is included in the last class. For the data above, let X = number of packets of milk. The two types of table (inclusive and exclusive class limits) are as below:
X (inclusive)   X (exclusive)   Tally                           f
20-25           20-26           ///// ///// ///                13
26-31           26-32           ///// ///// ///// ///// ////   24
32-37           32-38           ///// ////                      9
38-43           38-44           ///// ///// ///// /            16
44-49           44-50           /////                           5
50-55           50-56           ///// ///                       8
56-61           56-62           ////                            4
62-67           62-68           /                               1
                                                          Σf = 80
(Each ///// is a group of five tally marks.)
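A sketch of the grouping step in Python, assuming inclusive classes of width 6 starting at 20 (as in the left-hand class column) and counting directly from the 80 values:

```python
# Number of packets of milk bought by 80 households in Nairobi (data above)
packets = [
    20, 55, 55, 21, 25, 25, 26, 30, 30, 35,
    30, 30, 30, 30, 30, 30, 30, 30, 30, 30,
    30, 30, 25, 25, 25, 35, 30, 30, 30, 35,
    35, 35, 40, 40, 40, 35, 21, 60, 65, 20,
    40, 40, 40, 40, 40, 40, 25, 25, 25, 25,
    40, 40, 40, 40, 40, 45, 45, 45, 45, 45,
    50, 50, 50, 55, 55, 60, 60, 60, 35, 30,
    35, 30, 35, 40, 50, 40, 30, 26, 26, 26,
]

def grouped_frequencies(data, start, width, n_classes):
    """Return (lower, upper, f) for inclusive classes of the given width."""
    table = []
    for j in range(n_classes):
        lower = start + j * width
        upper = lower + width - 1  # inclusive upper limit, e.g. 20-25
        f = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, f))
    return table

table = grouped_frequencies(packets, start=20, width=6, n_classes=8)
for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
print("Σf =", sum(f for _, _, f in table))  # Σf = 80
```

The class counts come out as 13, 24, 9, 16, 5, 8, 4 and 1. For whole-number data the exclusive classes 20-26, 26-32, … give the same counts, since each excludes its upper limit.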
Further Reading
Shrivastava, U.K., Shenoy, G.V and Shama, S.C. Quantitative Techniques for Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.
Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and Schuster Co. Englewood Cliffs, Prentice Hall, New Jersey (Available at University Library)
Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern Approach. 5th Edition. Englewood Cliffs, Prentice Hall, New Jersey (Available at University Library)
________________________________________________________________________
Exercise 3: Data Organization
________________________________________________________________________
3.1 Discuss the concept of "Data Organization", highlighting the output resulting from the process.
3.2 What is data classification and what criteria should collected data satisfy prior to
its being classified?
3.3 Illustrate the classification structure of the following data variables explaining the
criteria for your decision.
3.7 List the components of a statistical table, and hence sketch its skeleton.
3.8 Using a practical example, illustrate the following types of statistical table:
(a) One-way Table
(b) Two-way Table
(c) Three-way Table
(d) Manifold Table
3.9 Complete the following statistical table relating to the use and non-use of tobacco in the two indicated Kenyan towns by gender, showing all your workings.
                             Town
              Kisumu                          Nairobi
Tobacco     Male   Female   Sub-total    Male   Female   Sub-total    Total
Users        500     -         900         -      700        -          -
Non-users     -     100         -         300      -         -        3000
Total         -      -        2500         -       -       3500       6000
The ratio of male to female patients visiting Komala Health Centre each year from 1990 to 1994 is 2:8. Among the yearly male and female patients, the ratios of children to adults are 7:5 and 5:4 respectively. Given that the numbers of patients in 1990, 1991, 1992, 1993 and 1994 were 8000, 6000, 3600, 3200 and 9000 respectively, form a well-annotated statistical table based on the above information.
3.12 The following data represent the number of hours worked per month for 98
sugarcane cutters.
78 72 72 84 80 64 68 72 44 82 66 36 58 56
104 66 80 78 82 58 84 78 64 80 80 70 72 56
74 80 66 70 78 74 82 76 56 78 56 20 30 90
74 82 64 62 68 78 60 80 66 54 54 50 56 90
52 72 56 80 76 32 86 62 82 70 20 46 90 90
90 66 46 88 50 46 46 56 76 72 52 74 54 86
80 80 84 88 62 80 78 78 84 68 72 84 90 56
3.13 The data below indicate the heights of 20 coffee trees sampled from a large-scale farm: 4.8, 2.4, 2.8, 4.0, 2.8, 3.6, 4.2, 5.0, 4.4, 2.5, 4.9, 1.5, 4.7, 3.6, 2.8, 1.4, 3.5, 3.6, 4.6, 2.6. Construct a frequency distribution for the above data.
________________________________________________________________
Lesson Four: Diagrammatic and Graphical Data Presentation
________________________________________________________________
4.0 Objectives
At the end of this lesson, the student should be able to diagrammatically, graphically and
analytically illustrate statistical data. Specifically, the student should be able to: -
Describe, construct and interpret the following diagrams used to illustrate statistical data:-
(a) Simple Bar Diagrams
(b) Component Bar Diagrams
(c) Stacked bar diagrams
(d) Simple and multiple pie charts.
(e) Histograms
(f) Frequency Polygons
4.1 Introduction
In this lesson, the procedures for constructing statistical diagrams and graphs will be discussed. Students will cover diagrammatic data presentation (simple, component and stacked bar graphs, and pie charts) and graphical presentation (histograms and frequency polygons). The power of diagrammatic and graphical presentation as tools for interpreting the characteristics of data will also be appreciated.
In these diagrams, vertical or horizontal bars are drawn whose lengths are proportional to the magnitude (size) of the variable. To construct a bar chart, a suitable scale is chosen either at the bottom (for horizontal bar charts) or on the side (for vertical bar charts); the orientation is a matter of choice. The bars should have equal widths, constant gaps between them, and lengths proportional to the values represented. Simple bar charts do not allow more than one variable to be compared.
Activity 4a:
The data below represents the number of trade licenses issued to Jua Kali artisans from
1990-1995.
1995 40
Construct a bar chart to describe the above data and interpret the result.
Solution:
[Figure: simple bar chart "Number of Licenses": Number of Licenses (0-45) against Year]
Interpretation
There is a general increase in the number of trade licenses issued from 1990-1995.
These are of two types: stacked (multiple) bar charts and component bar charts. Unlike simple bar charts, they allow the characteristics of more than one data set to be compared. They may be constructed from either absolute data values or percentages.
The data set below indicates the profits realized by three firms, I, II and III, between 1990 and 1992.
Activity 4b:
Using the above data, construct the stacked bar graphs based on (a) absolute data values and (b) percentages.
Solution Procedure
(i). Identify the highest variable value to enable you to construct either an
appropriate vertical or horizontal scale for the data starting from zero.
(ii). Determine appropriate bar widths and distances between the groups of stacked bars. These must be constant.
(iii). For each year, construct stacked bars for the three firms using these measurements.
(iv). Label both the horizontal and vertical axes, and give the chart a title.
(v). Construct an appropriate key (legend) for the bars
(vi). Interpret the picture exhibited.
[Figure: stacked bar chart of Output (scale 0-45) for firms I, II and III against Year (1990, 1991, 1992)]
Interpretation:
Interpretation may be within a year or for the same variable across the years.
The procedure for percentage-based stacked bar graphs is the same as above, except that the percentage distribution for each year is used, as shown below:
[Figure: percentage stacked bar chart of % Output (scale 0-60) for firms I, II and III against Year (1990, 1991, 1992)]
Just like stacked (multiple) bar charts, component bar charts allow more than one data set to be compared.
Consider the data below indicating the profit margin, in thousands of shillings, of three firms from 1990 to 1992.
Activity 4c:
Using the above data, construct the component bar graphs based on:
(a) Absolute Data Values
(b) Percentages
Solution Procedure:
(i). For each year, find the cumulative totals across the firms to determine the position of each component as part of the whole bar.
(ii). Using the highest cumulative total and starting from zero, determine an appropriate scale for the heights of the bars.
(iii). Select an appropriate bar width and a constant distance between bars, the same for all the bars.
(iv). Interpret the result.
[Figure: component bar chart of Output (scale 0-120), with components I, II and III, against Year (1990, 1991, 1992)]
The procedure for constructing percentage component bar graphs is similar to the above, except that the percentage values derived from the data are used. The result is as below:
[Figure: percentage component bar chart "Total Tea Output (1990-1992)": % Output (scale 0-100), with components I, II and III, against Year (1990, 1991, 1992)]
A pie chart shows the relation of parts to the whole. Unlike multiple and component bar charts, where the lengths of bars are compared, in a pie chart the areas of the segments (sectors of a circle) are compared. Pie charts are of two types, simple and multiple; the latter allows more than one data set to be compared.
Activity 4d:
The data below show how a family budget is used. Construct a pie chart.
(a) Change the given values into degrees.
[Figure: pie chart of the family budget with segments labelled Food, Clothing, Rent, Fees and Health, and segment values 4500, 3000, 1500, 1000 and 800]
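Each value's angle is its share of the total multiplied by 360°. A minimal Python sketch using the five segment values shown in the budget figure (total 10 800); which item each value belongs to is not assumed:

```python
def to_degrees(values):
    """Angle of each pie segment: value / total x 360 degrees."""
    total = sum(values)
    return [value * 360 / total for value in values]

segments = [4500, 3000, 1500, 1000, 800]  # segment values from the figure
angles = to_degrees(segments)
print([round(a, 2) for a in angles])  # [150.0, 100.0, 50.0, 33.33, 26.67]
```

The angles always add up to 360°, which is a useful check before drawing the segments with a protractor.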
(b) Multiple Pie Charts
If more than one data set is to be compared using pie charts, the steps are the same as in 4.4.1 above except for step 2. For an accurate comparison, the areas of the circles should be proportional to the totals of the data sets, so the radii should be proportional to the square roots of the totals. The procedure for drawing the circles is to:
(i). Find the total of the items for each data set.
(ii). Find the ratio of the square roots of the totals. These give the relative radii of the pie charts.
(iii). Draw circles whose radii are in this ratio.
(iv). Go to step (c) for each circle, and hence construct the pie charts.
Activity 4d:
Construct pie charts for the data below to compare the expenditures of the three families, and interpret the result.
               FAMILIES
ITEM         A       B       C
Food       2000     500    5000
Clothing   2000     400    7000
Rent       3000    1000    1500
Fees        800     200    2500
Health     1000     900    3000
Solution:
Following the above steps, the result is as below:
                          Family
ITEM                  A        B        C
Food               2000      500     5000
Clothing           2000      400     7000
Rent               3000     1000     1500
Fees                800      200     2500
Health             1000      900     3000
Total              8800     3000    19000
√Total             93.8     54.8    137.8
Relative radius     1.7        1      2.5
[Figure: three pie charts, "Family A: Household Budget" (radius = 1.7), "Family B: Household Budget" (radius = 1) and "Family C: Household Budget" (radius = 2.5), each with segments Food, Clothing, Rent, Fees and Health]
4.2.3 Histogram
A histogram is a diagram similar to, but not the same as, a bar diagram. The area of a bar in a bar chart has no meaning, while the area of each bar of a histogram is proportional to the class frequency. The histogram displays the frequency density of a range of data values, either vertically or horizontally. A vertical histogram has the data values on the horizontal axis and the frequency density on the vertical axis, while a horizontal histogram shows the data values (scaled by class intervals) on the vertical axis and the frequency density (using an appropriate scale) on the horizontal axis.
Procedure for construction
If the data are not already grouped, group them into a grouped frequency distribution.
(i). Determine the frequency density of each class by dividing the class frequency by the class size.
(ii). Choose an appropriate scale for the frequency density axis; the data-value axis is scaled by the class intervals.
(iii). Draw either an exclusive-class-interval histogram (inter-bar distance = 0) or an inclusive-class-interval histogram (inter-bar distance = the gap between classes).
Activity 4e:
Consider the data below relating the number of pill users to sleeping time, and construct a histogram for the data.
Solution:
Let c = class size, X = sleeping time and f = number of pill users. Since the classes are of unequal size, each frequency is divided by its own class size:
X          f      c      f/c
0-16       32     16     2
16-48     108     32     3.375
48-80      42     32     1.3125
80-112     18     32     0.5625
[Figure: histogram "Sleeping Time": frequency density against Sleeping Time Interval (0-16, 16-48, 48-80, 80-112)]
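A short Python sketch of step (i) for the pill-user data, computing each class size from the class limits rather than assuming one common size; it also checks that the total area of the bars equals the total frequency:

```python
# Activity 4e: (lower limit, upper limit, number of pill users)
classes = [(0, 16, 32), (16, 48, 108), (48, 80, 42), (80, 112, 18)]

densities = []
for lower, upper, f in classes:
    c = upper - lower            # class size taken from the class limits
    densities.append(f / c)      # frequency density = f / c
    print(f"{lower}-{upper}: c = {c}, f/c = {f / c}")

# Bar area = density x class size, so the total area equals the total frequency
total_area = sum(d * (upper - lower)
                 for d, (lower, upper, _) in zip(densities, classes))
print(total_area)  # 200.0
```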
Activity 4d:
Histogram for ungrouped data: The data below represent the daily revenue (Ksh '000) obtained by 20 matatu owners plying the Kibera route:
5 6 4 5 7 8 4 3 9 4 6 4 6 8 9 8 7 3 4 5
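If the class intervals are equal, and the midpoints of these intervals are plotted against the corresponding frequency densities and joined with straight lines, the resulting graph is the frequency polygon. When the polygon is closed at both ends by joining it to the horizontal axis at the midpoints of the empty classes just below and just above the data, the area under the frequency polygon equals the area of the histogram derived from the same data.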
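As a sketch of the equal-area property, the Python below builds a frequency polygon for the 20 matatu revenue values above. It treats each revenue value as a class of width 1 (an assumption made so that the midpoints are the values themselves) and closes the polygon at zero on both sides:

```python
from collections import Counter

# Daily revenue (Ksh '000) of 20 matatu owners on the Kibera route
revenue = [5, 6, 4, 5, 7, 8, 4, 3, 9, 4, 6, 4, 6, 8, 9, 8, 7, 3, 4, 5]
freq = Counter(revenue)

# Assumption: each value is a class of width 1 centred on the value, so the
# class midpoints are the values themselves and density = frequency.
# The polygon is closed at zero one class below and one class above the data.
xs = range(min(revenue) - 1, max(revenue) + 2)
points = [(x, freq[x]) for x in xs]  # Counter returns 0 for absent values
print(points)

# Area under the polygon by the trapezium rule
area = sum((y0 + y1) / 2 * (x1 - x0)
           for (x0, y0), (x1, y1) in zip(points, points[1:]))
print(area)  # 20.0, the same as the histogram area N x width = 20 x 1
```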
[Figure: frequency polygon "Sleeping Time": frequency density against Sleeping Time Interval (0-16, 16-48, 48-80, 80-112)]
The histogram and the frequency polygon tend towards a frequency curve as the number of observations becomes larger and the class intervals become smaller. Such curves can be obtained by smoothing the frequency polygon.
Procedure:
(i). If the data are not grouped, group them into appropriate classes, and hence determine the less-than or the greater-than cumulative frequency distribution.
(ii). For the less-than (greater-than) cumulative frequency distribution, the upper limit (lower limit) of every class is plotted against the corresponding cumulative frequency.
(iii). The resulting points are then joined with a smooth curve. The cumulative frequency curves are sometimes called ogives.
(iv). Apart from using absolute values to construct the ogives, a percentage ogive can be constructed. The procedure is to group the data into classes, if not already grouped, and then:
(a) Compute the cumulative frequencies and convert them into percentage cumulative frequencies.
(b) Plot the resulting percentage cumulative frequencies against the corresponding limits, and join the plotted points with a smooth curve.
Note: If the less-than and the greater-than ogives are plotted on the same graph, the point where they intersect is significant: its value on the horizontal axis is the median of the distribution.
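A Python sketch of the less-than, greater-than and percentage cumulative frequencies. Because the table for Activity 4f is not reproduced here, it reuses the pill-user classes from the histogram activity purely as an illustration:

```python
# (lower limit, upper limit, frequency) for the pill-user classes of Activity 4e
classes = [(0, 16, 32), (16, 48, 108), (48, 80, 42), (80, 112, 18)]
N = sum(f for _, _, f in classes)

# Less-than cumulative frequencies (CFL), plotted against upper limits
less_than, running = [], 0
for _, upper, f in classes:
    running += f
    less_than.append((upper, running))

# Greater-than cumulative frequencies (CFG), plotted against lower limits
greater_than, remaining = [], N
for lower, _, f in classes:
    greater_than.append((lower, remaining))
    remaining -= f

# Percentage ogive: each CFL as a percentage of N
percent_less_than = [(x, 100 * cf / N) for x, cf in less_than]

print(less_than)          # [(16, 32), (48, 140), (80, 182), (112, 200)]
print(greater_than)       # [(0, 200), (16, 168), (48, 60), (80, 18)]
print(percent_less_than)  # [(16, 16.0), (48, 70.0), (80, 91.0), (112, 100.0)]
```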
Activity 4f:
Construction of Cumulative Frequency Curves: Find the less-than and the greater-than cumulative frequencies, and hence plot the results against the appropriate limits, using the data in the table below:
Solution:
Here CFL is the less-than cumulative frequency and CFG is the greater-than cumulative frequency.
[Figure: "Profit Accruing": Less-Than-Ogive and Greater-Than-Ogive, Cumulative Number of Firms (0-35) against Profit Values (0-30)]
Further Reading
Shrivastava, U.K., Shenoy, G.V and Shama, S.C. Quantitative Techniques for Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.
Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and Schuster Co. Englewood Cliffs, Prentice Hall, New Jersey (Available at University Library)
Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern Approach. 5th Edition. Englewood Cliffs, Prentice Hall, New Jersey (Available at University Library)
________________________________________________________________________
Lesson Four Exercises: Diagrammatic and Graphical Data Presentation
________________________________________________________________________
Use the data below to construct the diagrams and graphs in questions 4.1-4.5, and interpret the results.
                       Number of Employees
Income (Ksh '000)     Firm A     Firm B
5-9                      8         15
10-14                   12         20
15-19                   14         25
20-24                   30         30
25-29                   25         15
30-34                   22         35
35-39                   18         10
40-44                   15         15
45-49                   10         12
The data below represent the daily revenue (Ksh '00) obtained by 50 matatu owners plying the Kibera route:
12 25 36 40 35 75 58 64 13 24
29 44 36 74 46 78 39 28 67 43
44 15 25 36 15 55 45 29 50 40
23 45 54 10 56 23 45 29 30 28
23 57 78 67 70 59 60 23 45 67
________________________________________________________________
Lesson Five: Measures of Central Tendency.
________________________________________________________________
5.0 Objectives
At the end of this lesson, the student should be able to determine and interpret various
measures of Central Tendency. Specifically, the student should be able to: -
(a) Define the concept of descriptive statistics and measures of central tendency.
(b) Distinguish between various measures of central tendency.
(c) Calculate and interpret for both ungrouped and grouped data the following
measures of central tendency:
(i). Arithmetic Mean
(ii). Median
(iii). Mode
(iv). Harmonic Mean
(v). Geometric Mean
(d) Discuss the merits and demerits of the measures of central tendency cited above.
5.1 Introduction
In this lesson, we will cover single-value measures of statistical characteristics from which statistical inferences can be made about the behaviour of the phenomenon on which data are collected. These measures are broadly categorized into two: measures of central tendency (averages) and measures of dispersion or variability. A measure of central tendency, or average, is a single value that represents, or is typical of, a set of data values around which the data items tend to cluster, while a measure of dispersion or variability shows the extent to which individual data values vary from the average. Measures of dispersion will be covered in Lesson Six.
5.2.1 Arithmetic Mean
Procedure:
Activity 5a:
Suppose a dairy farmer gets 10, 20, 8, 14, 16, 18, 12 and 42 litres of milk from his 8 dairy cows every week. Find the mean volume of milk obtained per cow per week.
Solution:
Let X be the number of litres obtained, with x1 = 10, x2 = 20, x3 = 8, x4 = 14, x5 = 16, x6 = 18, x7 = 12 and x8 = 42. The average is obtained by summing all the values of X and dividing by n = 8:
x̄ = 140/8 = 17.5 litres.
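The same arithmetic as a minimal Python check:

```python
milk = [10, 20, 8, 14, 16, 18, 12, 42]  # litres per cow per week, Activity 5a
mean = sum(milk) / len(milk)            # sum of X divided by n = 8
print(sum(milk), len(milk), mean)       # 140 8 17.5
```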
Activity 5b: Data appearing with frequencies, f: The data below show the marks obtained by 20 students taking Business Statistics this open learning semester: 15, 30, 40, 15, 30, 30, 80, 50, 50, 80, 15, 30, 40, 20, 20, 70, 40, 50, 70.
Solution
Σf = 20 and Σxifi = 825, hence
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{825}{20} = 41.25$$
Activity 5c:
Generally, for ungrouped data, suppose the observations x1, x2, x3, …, xi, …, xn occur with respective frequencies f1, f2, f3, …, fi, …, fn. Find the arithmetic mean.
Solution:
The arithmetic mean is determined as below.
X      F      XF
x1     f1     x1f1
x2     f2     x2f2
.      .      .
xi     fi     xifi
.      .      .
xn     fn     xnfn
       Σfi    Σxifi
Hence the arithmetic mean is
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$
(b) Arithmetic Mean of Grouped Data
When data are grouped, all the values falling in a given class interval are assumed to have the same value as the class representative, i.e. the class midpoint.
Activity 5d:
Mean of Grouped Data: From the frequency distribution table given below, find the arithmetic mean.
Wages ('000)    Number of Employees
2-4             6
4-6             2
6-8             4
8-10            6
10-12           2
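Solution
(a) Method 1:
Let X = wage class, x̂ = class midpoint and f = number of employees.
X        x̂     f     x̂f
2-4      3     6     18
4-6      5     2     10
6-8      7     4     28
8-10     9     6     54
10-12   11     2     22
         Σfi = 20    Σx̂ifi = 132
Hence the arithmetic mean is
$$\bar{x} = \frac{\sum \hat{x}_i f_i}{\sum f_i} = \frac{132}{20} = 6.6$$
If the values being dealt with are large, the best approach is the coding method.
(b) Method 2 (coding):
Take the assumed mean A = 9 and the class size c = 2, and code each midpoint as u = (x̂ − 9)/2. This gives u = −3, −2, −1, 0 and 1 with fu = −18, −4, −4, 0 and 2, so that Σfu = −24. Hence the mean of the coded values is
ū = Σfu/Σf = −24/20 = −1.2
Therefore, x̄ = (−1.2 × 2) + 9 = 6.6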
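Both methods can be checked with a short Python sketch; the assumed mean A = 9 and class size c = 2 are the values used in the coding step above:

```python
# Activity 5d: (lower limit, upper limit, number of employees), wages in '000
classes = [(2, 4, 6), (4, 6, 2), (6, 8, 4), (8, 10, 6), (10, 12, 2)]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
freqs = [f for _, _, f in classes]
N = sum(freqs)

# Method 1: mean = sum(x f) / sum(f), with x the class midpoints
direct = sum(x * f for x, f in zip(midpoints, freqs)) / N

# Method 2 (coding): u = (x - A) / c, then mean = A + c * (sum(u f) / sum(f))
A, c = 9, 2
coded = [(x - A) / c for x in midpoints]                   # -3, -2, -1, 0, 1
coded_mean = sum(u * f for u, f in zip(coded, freqs)) / N  # -24 / 20 = -1.2
via_coding = A + c * coded_mean

print(N, round(direct, 2), round(via_coding, 2))  # 20 6.6 6.6
```

The coding method changes only the arithmetic, not the answer: any assumed mean and the common class size give the same mean.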
Advantages of Mean:
Disadvantages of Mean:
(i). It is not appropriate for open-ended classes, since it would use estimated class representatives.
(ii). It is affected by outliers (extremely small/large values)
(iii). It is rigid.
Activity 5e.
Odd number of Observations. Consider the following data: 10, 5, 10, 25, 12.
Case 2. When the number of observations N is even
Md is the arithmetic mean of the two middle values when the data are arranged in ascending or descending order. The procedure is to:
Activity 5f.
Even number of observations: Consider the following data: 10, 5, 10, 25, 12, 25.
(i). Arrange the data in ascending order: 5, 10, 10, 12, 25, 25.
(ii). The median is the arithmetic mean of the two middle values. Hence the median is Md = (10 + 12)/2 = 11.
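A minimal Python sketch covering both cases (odd and even N); Python's own `statistics.median` follows the same rule:

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if N is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                 # Case 1: N odd
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # Case 2: N even

print(median([10, 5, 10, 25, 12]))      # 10
print(median([10, 5, 10, 25, 12, 25]))  # 11.0
```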
There are two methods for determining the median of grouped data: the analytical and the
graphical methods.
Activity 5g
Analytical method example: Consider the data below showing the weekly number of packets of milk, X, bought by families, F, staying in a slum area.
X        x̂     F
2-4      3     6
4-6      5     2
6-8      7     4
8-10     9     6
         Σfi = 18
The median number of packets of milk will be that bought by the family occupying the middle position.
Solution
(i). Form the less-than cumulative frequency (CFL) distribution:
X        x̂     f     CFL
2-4      3     6      6
4-6      5     2      8
6-8      7     4     12
8-10     9     6     18
         Σfi = 18
(ii). Determine the median class. Since N = 18, the median occupies the N/2 = 9th position, which falls in the class 6-8. Given L = lower limit of the median class = 6, c = class size = 2, f = frequency of the median class = 4 and CFp = cumulative frequency of the class preceding the median class = 8,
$$Md = L + \frac{c}{f}\left(\frac{N}{2} - CF_p\right) = 6 + \frac{2}{4}\left(\frac{18}{2} - 8\right) = 6 + 0.5 = 6.5$$
(i). Either draw the less-than or the greater-than ogive.
(ii). On the vertical (cumulative frequency) axis, mark off the median position.
(iii). Draw a horizontal line from the point obtained in (ii) to the curve; then, from the point where the line touches the ogive, draw a vertical line perpendicular to the value axis.
(iv). Read off the median value on the horizontal axis (the point of intersection between the line constructed and the horizontal axis).
(v). Alternatively,
• Draw both the less-than and the greater-than ogives. The two curves intersect at a point.
• From the point of intersection, draw a vertical line to the value axis.
• Read off the median value as in (iv) above.
Method 1:
[Figure: "Graphical Method": less-than ogive, cumulative frequency (0-18) against X (0-12)]
Hence Md = 6.5
Method 2:
X        x̂     f     CFL    CFG
2-4      3     6      6     18
4-6      5     2      8     12
6-8      7     4     12     10
8-10     9     6     18      6
         Σfi = 18
[Figure: "Alternative Method for Median Determination": less-than and greater-than ogives, Cumulative Frequencies (0-20) against Variable Value (0-12)]
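The interpolation formula as a Python sketch. It takes the class size from the class limits (here 2) and locates the median class as the first class whose cumulative frequency reaches N/2:

```python
def grouped_median(classes):
    """Md = L + (c / f) * (N/2 - CFp) for (lower, upper, frequency) classes."""
    N = sum(f for _, _, f in classes)
    half = N / 2
    cumulative = 0  # cumulative frequency of the classes before the current one
    for lower, upper, f in classes:
        if cumulative + f >= half:  # first class whose CF reaches N/2
            c = upper - lower
            return lower + (c / f) * (half - cumulative)
        cumulative += f
    raise ValueError("empty distribution")

milk = [(2, 4, 6), (4, 6, 2), (6, 8, 4), (8, 10, 6)]  # Activity 5g
print(grouped_median(milk))  # 6.5
```

The same value is obtained graphically: on the straight-line less-than ogive, the 9th position lies a quarter of the way from (6, 8) to (8, 12).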
(i). It is not representative of all the observations since it is not based on all
values.
(ii). It may not be used for further mathematical treatment.
(iii). Affected by fluctuation in the data distribution.
5.2.3 The Mode
The mode is the value in a data set that occurs with the greatest frequency (most frequently). If only one value has the highest frequency, the distribution is unimodal; otherwise it is multimodal.
Activity 5h:
x 1 2 3 4 5
f 4 3 9 12 7
[Figure: frequency graph of F against X (0-6)]
(ii) The data below represent the time, x, taken to process different blends of an item. Find the modal time taken, and hence illustrate the distribution graphically.
x   2   4   6   8   10   12   14   16   18   20
f   4   9   13  20  10   8    12   15   20   9
Graphical Illustration
[Figure: "Bimodal Distribution": F against X (0-25)]
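A minimal Python sketch that returns every value sharing the highest frequency, so it reports one value for a unimodal distribution and several for a multimodal one:

```python
def modes(xs, fs):
    """Return every value whose frequency equals the highest frequency."""
    top = max(fs)
    return [x for x, f in zip(xs, fs) if f == top]

print(modes([1, 2, 3, 4, 5], [4, 3, 9, 12, 7]))  # [4], unimodal
print(modes([2, 4, 6, 8, 10, 12, 14, 16, 18, 20],
            [4, 9, 13, 20, 10, 8, 12, 15, 20, 9]))  # [8, 18], bimodal
```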
There are two methods for finding the mode of grouped data: the analytical and the
graphical methods.
(i). If the classes are of the inclusive type (e.g. 5-9, 10-14), convert them to the exclusive type; otherwise proceed. (The conversion involves determining the bridging factor, d = ½ × gap = ½ × 1 = 0.5, then subtracting d = 0.5 from every lower limit and adding it to every upper limit.)
(ii). Use the formula below to find the mode, Mo:
$$Mo = L + \frac{c_{mo}(f_{mo} - f_p)}{(f_{mo} - f_p) - (f_s - f_{mo})}$$
where L = lower class boundary of the modal class, c_mo = class size of the modal class, f_mo = frequency of the modal class, f_p = frequency of the class preceding the modal class, and f_s = frequency of the class succeeding the modal class.
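A Python sketch of the analytical method, including the bridging-factor adjustment for inclusive classes. The Activity 5i table is not reproduced here, so the sketch uses, as an illustration only, the Firm A income distribution from the Lesson Four exercises (inclusive classes with a gap of 1). If two classes share the highest frequency, it simply takes the first:

```python
def grouped_mode(classes, gap=0):
    """Mo = L + c(fmo - fp) / ((fmo - fp) - (fs - fmo)).

    classes: list of (lower limit, upper limit, frequency).
    gap: distance between a class's upper limit and the next lower limit
         (1 for inclusive classes such as 5-9, 10-14; 0 for exclusive ones).
    """
    d = gap / 2                                   # bridging factor
    freqs = [f for _, _, f in classes]
    m = freqs.index(max(freqs))                   # first modal class
    lower, upper, f_mo = classes[m]
    L = lower - d                                 # lower class boundary
    c = (upper + d) - (lower - d)                 # class size from boundaries
    f_p = freqs[m - 1] if m > 0 else 0            # preceding class frequency
    f_s = freqs[m + 1] if m + 1 < len(freqs) else 0  # succeeding class frequency
    return L + c * (f_mo - f_p) / ((f_mo - f_p) - (f_s - f_mo))

# Firm A, Lesson Four exercises: income (Ksh '000) against number of employees
firm_a = [(5, 9, 8), (10, 14, 12), (15, 19, 14), (20, 24, 30), (25, 29, 25),
          (30, 34, 22), (35, 39, 18), (40, 44, 15), (45, 49, 10)]
print(round(grouped_mode(firm_a, gap=1), 2))  # 23.31
```

Here L = 19.5, c = 5, f_mo = 30, f_p = 14 and f_s = 25, so Mo = 19.5 + 5(16)/21.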
Activity 5i:
Analytical Determination of the Mode: Consider the data below showing the
distribution of the number of 2kg Kimbo sold by 20 Kiosks in Eastlands per week.
Mo
Activity 5J:
Graphical Illustration: The data below shows the shoe sizes bought by female
customers at a shop in the month of December.
Use the graphical method to determine the mode of the above distribution.
Merits
The mode is :-
Demerits
The mode is :-
The harmonic mean, Hm is the reciprocal of the mean of the reciprocals of the variable
values in a data set.
Given the variable X taking values x1, x2, …, xi, …, xn, occurring respectively with frequencies f1, f2, …, fi, …, fn, where N = Σfi:
(i). Find the reciprocals of the variable values, 1/x1, 1/x2, …, 1/xi, …, 1/xn. Each reciprocal occurs with the same frequency as the value it comes from.
(ii). Determine fi/xi for each value.
(iii). Find the sum of these weighted reciprocals, Σ fi/xi.
(iv). Find the arithmetic mean of the reciprocals, (1/N) Σ fi/xi, and hence its reciprocal. This is the harmonic mean:
$$H_m = \frac{1}{\frac{1}{N}\sum_{i=1}^{n} \frac{f_i}{x_i}}$$
The above procedure may be summarized in the table below:
X      F      1/X      F/X
x1     f1     1/x1     f1/x1
x2     f2     1/x2     f2/x2
x3     f3     1/x3     f3/x3
x4     f4     1/x4     f4/x4
.      .      .        .
xi     fi     1/xi     fi/xi
.      .      .        .
xn     fn     1/xn     fn/xn
       Σfi             Σ fi/xi
The mean of the reciprocals is (Σ fi/xi)/(Σfi). Hence the harmonic mean is:
$$H_m = \frac{\sum_{i=1}^{n} f_i}{\sum_{i=1}^{n} \frac{f_i}{x_i}}$$
Activity 5k:
Find the Harmonic Mean of the data set below: 40,20,25, 14, 10, 30.
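A minimal Python sketch of the harmonic mean formula, using `fractions.Fraction` so the result stays exact; frequencies default to 1 for ungrouped data:

```python
from fractions import Fraction

def harmonic_mean(xs, fs=None):
    """Hm = N / sum(f_i / x_i), with N = sum(f_i); frequencies default to 1."""
    if fs is None:
        fs = [1] * len(xs)
    N = sum(fs)
    reciprocal_sum = sum(Fraction(f, x) for x, f in zip(xs, fs))  # sum of f_i / x_i
    return N / reciprocal_sum

hm = harmonic_mean([40, 20, 25, 14, 10, 30])
print(hm, round(float(hm), 2))  # 25200/1343 18.76
```

The sum of the reciprocals is 1343/4200, so Hm = 6 × 4200/1343 ≈ 18.76.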
(b) Harmonic Mean for Grouped Data
Activity 5l:
Harmonic Mean of Grouped Data: Find the Harmonic mean of the following
distribution.
Merits
Demerits.
5.2.5 Geometric Mean
The geometric mean, Gm, of a distribution is the Nth root of the product of the N variable values of a data set. The general procedure is as follows: let the variable X take the values x1, x2, …, xn with frequencies f1, f2, …, fn, and let N = Σfi be the number of observations. The table below illustrates the data used in determining the geometric mean.
X   x1   x2   x3   x4   x5   x6   …   xi   …   xn
F   f1   f2   f3   f4   f5   f6   …   fi   …   fn
$$G_m = \left(x_1^{f_1} \times x_2^{f_2} \times x_3^{f_3} \times \cdots \times x_i^{f_i} \times \cdots \times x_n^{f_n}\right)^{\frac{1}{N}}, \qquad \text{where } N = \sum_{i=1}^{n} f_i$$
In order to determine the geometric mean, Gm, the logarithm approach is used, as below:
$$\ln G_m = \ln\left(x_1^{f_1} \times x_2^{f_2} \times x_3^{f_3} \times \cdots \times x_i^{f_i} \times \cdots \times x_n^{f_n}\right)^{\frac{1}{N}} = \frac{1}{N}\sum_{i=1}^{n} f_i \ln x_i$$
Hence
$$G_m = \operatorname{antilog}\left(\frac{1}{N}\sum_{i=1}^{n} f_i \ln x_i\right)$$
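A Python sketch of the logarithm approach. It uses natural logarithms, so the antilog is `math.exp`, and compares the result with `statistics.geometric_mean` (available from Python 3.8); the data reuse Activity 5k purely for illustration:

```python
import math
import statistics

def geometric_mean(xs, fs=None):
    """Gm = antilog[(1/N) * sum(f_i * ln x_i)] for positive values x_i."""
    if fs is None:
        fs = [1] * len(xs)
    N = sum(fs)
    mean_log = sum(f * math.log(x) for x, f in zip(xs, fs)) / N  # (1/N) sum f ln x
    return math.exp(mean_log)  # antilog of a natural log is e raised to it

data = [40, 20, 25, 14, 10, 30]
gm = geometric_mean(data)
print(round(gm, 2))
# Cross-check against the standard library (Python 3.8+)
print(round(statistics.geometric_mean(data), 2))

# With frequencies: x = 2 once and x = 4 twice is the same as [2, 4, 4]
print(geometric_mean([2, 4], [1, 2]), statistics.geometric_mean([2, 4, 4]))
```

All values must be positive, since the logarithm of zero or of a negative number is undefined.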
Merits and Demerits
Merits
Geometric mean is:-
(i) Based on all observations.
(ii) Rigidly defined.
(iii) Appropriate for further mathematical treatment.
(iv) Not much affected by fluctuations due to sampling.
Demerits.
Geometric mean is:-
Further Reading
Donald Waters. Quantitative Methods for Business. Addison-Wesley, Cambridge University Press
Shrivastava, U.K., Shenoy, G.V and Shama, S.C. Quantitative Techniques for Managerial Decisions. 2nd Edition. New Age International (P) Ltd Publishers.
Paul Newbold. Statistics for Business and Economics. 4th Edition. A Simon and Schuster Co. Englewood Cliffs, Prentice Hall, New Jersey (Available at University Library)
Freund J.E., William F.J., Benjamin M.P. Elementary Business Statistics: Modern Approach. 5th Edition. Englewood Cliffs, Prentice Hall, New Jersey (Available at University Library)
_____________________________________________________________________
Exercise 5: Descriptive Statistics
_____________________________________________________________________
5.1 Given N = 7 and Xi = 2i + i², find (i) ΣXi (ii) (1/N)ΣXi, where i = 1, 2, 3, 4, 5, 6, 7.
5.2 For Zi = i − 1, where i = 2, 4, 6, …, 10, find (i) ΣZi (ii) Σ(Zi)² (iii) (1/N)ΣZi.
5.3 Suppose N = 4, and Xi is the ith variable value such that X1 = −2, X2 = 3, X3 = −5, X4 = 7. Find (i) ΣXi (ii) Σ|Xi| (iii) ΣXi² (iv) (1/N)ΣXi².
5.6 The data below show the marks obtained by 10 high school students in mathematics in the first term (xi) and the second term (yi).
5.7 Discuss the following measures of location in exercises 5.7-5.12, highlighting their merits, demerits and application areas.
5.8 The frequency distribution of the number of packets bought per month by 50 households residing in Kibera is as below:
5.9 The table below shows the distribution of service times (minutes) at 20 service stations.
5.10 The number of 2kg Kimbo cooking fat sold at a kiosk retail outlet in 40 weeks is
as below:
10 15 10 14 20 30 25 30 40 50
15 20 40 20 30 25 50 40 50 40
60 20 50 40 60 20 10 15 20 40
20 25 60 40 35 20 40 50 60 20
14 70 20 80 40 20 100 14 30 55
20 40 30 70 50 40 95 15 35 40
100 30 80 60 60 60 60 18 20 90
40 15 90 75 70 70 40 20 25 40
50 16 40 100 80 90 14 40 40 90
5.12 Indicate the best measure of location that decision makers would require for problems (a)-(b), stating your reasons.
________________________________________________________________
Lesson 6: Measures of Dispersion and Shape:
________________________________________________________________
6.0 Objectives
At the end of this lesson, the student should be able to determine and interpret various
measures of dispersion and shape. Specifically, the student should be able to: -
(i). Range, R
(ii). Mean Deviation, MD
(iii). Semi Inter-Quartile Range , SIQR
(iv). Variance, S² and σ²
(v). Standard deviation, S or σ
(vi). Coefficient of variations, CoV
(vii). Skewness
(viii). Kurtosis
(b) List and discuss the merits and demerits of the various measures of dispersion and
shape identified in (i) – (viii).
6.1 Introduction
In this lesson we will cover various measures of dispersion, including the range, R; mean deviation, MD; semi inter-quartile range, SIQR; variance, S² or σ²; standard deviation, S or σ; and coefficient of variation, CoV. The lesson will further deal with measures of the shape of a statistical distribution: skewness and kurtosis.
6.2.1 Range
The range, R, is the difference between the maximum and the minimum values in a data set. Let Vmax = maximum value, Vmin = minimum value and Ra = absolute range. Then:
Ra = Vmax-Vmin,
(a) Merits
(i). Easy to calculate. The data set is scanned for the maximum and the minimum
values which are then selected and the absolute and relative ranges are
determined.
(ii). Used when extreme values (largest and lowest) of data values are important
especially when what is required is the extent and not the variability of the
data.
(b) Demerits
(i). It is an insensitive measure of dispersion. That is, two distributions may have the same range but significantly different dispersion about the mean.
(ii). It is not informative. It only quantifies the spread in data as shown by the
extreme values. The best range would be that within which most of the data
would lie.
Activity 6a:
Consider the two data sets Set 1 and Set 2 defined as below:
Data Set1: 2, 4, 8, 50
Data Set2: 2,4, 9, 6, 15,50
The two data sets clearly have the same range, but different arithmetic means and different variability about those means. Determine these two measures (the range and the arithmetic mean) for each data set.
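A minimal Python sketch computing both measures for the two data sets:

```python
data_sets = {
    "Set 1": [2, 4, 8, 50],
    "Set 2": [2, 4, 9, 6, 15, 50],
}

results = {}
for name, data in data_sets.items():
    absolute_range = max(data) - min(data)  # Ra = Vmax - Vmin
    mean = sum(data) / len(data)            # arithmetic mean
    results[name] = (absolute_range, mean)
    print(name, "range =", absolute_range, "mean =", round(mean, 2))
# Set 1 range = 48 mean = 16.0
# Set 2 range = 48 mean = 14.33
```

The range is identical because it depends only on the two extreme values, 2 and 50; the values in between, which drive the mean and the spread about it, are ignored.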
Activity 6b:
Find the absolute range for the following Data Sets:
The procedure for determining the range for grouped data is as follows:
(i). If not given, form a Frequency Distribution for the data.
(ii). Find the lower class boundary of the first class, Vmin, and the upper class boundary of the last class, Vmax.
(iii). Based on the values in (ii), find Ra and Rr.
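The grouped-data procedure can be sketched as follows. As an assumption for illustration, it uses the sales-volume classes from Activity 6g (continuous classes, so the class limits are also the class boundaries) and the relative range Rr = (Vmax − Vmin)/(Vmax + Vmin) defined later in this lesson; `grouped_ranges` is a hypothetical helper name.

```python
# Hypothetical helper: absolute and relative range of a grouped distribution,
# using the class limits of Activity 6g as an example.
classes = [(2, 6), (6, 10), (10, 14), (14, 18), (18, 22)]


def grouped_ranges(class_bounds):
    """Return (Ra, Rr) from the lower boundary of the first class (Vmin)
    and the upper boundary of the last class (Vmax)."""
    v_min = class_bounds[0][0]
    v_max = class_bounds[-1][1]
    ra = v_max - v_min                      # absolute range
    rr = (v_max - v_min) / (v_max + v_min)  # relative range
    return ra, rr


ra, rr = grouped_ranges(classes)
print(ra, round(rr, 3))  # 20 0.833
```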
Activity 6c:
Find the absolute and relative ranges for the following data on the distribution of student marks in a Statistics examination.
The Mean Absolute Deviation (MAD) is the arithmetic mean of the absolute deviations of the data values from their arithmetic mean. The procedure is to:
(a) Find the arithmetic mean of the distribution, using the actual values if the data are ungrouped or the class midpoints if the data are grouped, weighting each value by its frequency.
(b) Find the deviation of each actual value (or class midpoint) from the mean, and hence the corresponding absolute deviation.
(c) Multiply each absolute deviation by the corresponding frequency and sum the products.
(d) Find the MAD by dividing the result in (c) by ∑fi:
$$\text{MAD} = \frac{\sum f_i\,|x_i - \bar{x}|}{\sum f_i}$$
Activity 6d:
Consider the following data showing the waiting time for service at a dispensary clinic: 2, 4, 2, 4, 5, 4, 4, 5, 9, 9, 5, 7, 5, 9.
Solution:
Let xi be the ith distinct waiting time and fi the number of people who waited that long for treatment.
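xi   fi   xi fi   xi − x̄   |xi − x̄|   fi|xi − x̄|
2    2    4       −3.29     3.29       6.57
4    4    16      −1.29     1.29       5.14
5    4    20      −0.29     0.29       1.14
7    1    7        1.71     1.71       1.71
9    3    27       3.71     3.71      11.14
Σfi = 14   Σxi fi = 74   Σfi|xi − x̄| = 180/7 ≈ 25.71

From
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{74}{14} \approx 5.29$$
Hence
$$\text{MAD} = \frac{\sum f_i\,|x_i - \bar{x}|}{\sum f_i} = \frac{25.71}{14} \approx 1.84$$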
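A short Python sketch of the MAD procedure applied to the Activity 6d waiting times; it builds the frequency table with `collections.Counter` and applies MAD = Σfi|xi − x̄|/Σfi. Working from the raw data gives Σxi fi = 74 and x̄ = 74/14 ≈ 5.29.

```python
from collections import Counter

# Activity 6d waiting times
waiting_times = [2, 4, 2, 4, 5, 4, 4, 5, 9, 9, 5, 7, 5, 9]


def mean_absolute_deviation(freq):
    """MAD = sum(f_i * |x_i - mean|) / sum(f_i) for a table {x_i: f_i}."""
    n = sum(freq.values())
    mean = sum(x * f for x, f in freq.items()) / n
    return sum(f * abs(x - mean) for x, f in freq.items()) / n


freq = Counter(waiting_times)  # frequencies: 2->2, 4->4, 5->4, 7->1, 9->3
print(round(mean_absolute_deviation(freq), 3))  # 1.837
```

The same function works for grouped data if the keys are class midpoints and the values are class frequencies.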
Activity 6e:
Grouped Data M.A.D: Find the Mean Absolute Deviation given the data below
showing the distribution of profits made by a sample of 10 local banks during the
2003/2004 financial year
Solution
Let P be the profit, p̂ the midpoint of a profit class, p̄ the mean profit and Pa the assumed mean, with step deviations di = (p̂i − Pa)/100. Then
$$\frac{\sum d_i f_i}{\sum f_i} = \frac{3}{10} = 0.3$$
Hence the true mean is
$$\bar{p} = P_a + 100 \times 0.3 = 250 + 30 = 280$$
Therefore
$$\text{MAD} = \frac{\sum f_i\,|\hat{p}_i - \bar{p}|}{\sum f_i} = \frac{760}{10} = 76$$
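The assumed-mean (step-deviation) method used above can be sketched in Python. The profit table for Activity 6e is not shown in this lesson, so, as an illustration only, the sketch uses the sugar-sales classes of Activity 6g with an assumed mean A = 12 and class width c = 4, and checks the result against the direct mean.

```python
# Step-deviation (assumed-mean) method for a grouped mean:
#   mean = A + c * sum(f_i * d_i) / sum(f_i), with d_i = (m_i - A) / c
midpoints = [4, 8, 12, 16, 20]   # Activity 6g class midpoints (illustration)
freqs = [2, 6, 3, 5, 4]


def step_deviation_mean(mids, f, assumed_mean, width):
    d = [(m - assumed_mean) / width for m in mids]
    return assumed_mean + width * sum(fi * di for fi, di in zip(f, d)) / sum(f)


direct = sum(m * fi for m, fi in zip(midpoints, freqs)) / sum(freqs)
print(step_deviation_mean(midpoints, freqs, assumed_mean=12, width=4), direct)
```

Any choice of A gives the same mean; the method only simplifies hand arithmetic.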
The variance, and hence the standard deviation, is the most widely used measure of variability. It is based on the most versatile measure of central tendency, the arithmetic mean, and summarizes the typical distance of the actual data values from the mean. The standard deviation depicts the scatter of the individual data values about their common representative value (their center of gravity). Positive deviations indicate above-average values and negative deviations below-average values; taken together, the deviations from the mean always sum to zero, which is why they are squared before being averaged. Let di be the deviation of the variable value xi from the mean x̄. Graphically this may be represented as below. As each di tends to zero, the individual dispersion decreases, and hence the total dispersion tends to zero.
[Figure: data values x1 to x6 scattered about the mean x̄]
Note:
Col.1 = individual values of ungrouped data, or the midpoints of grouped data classes.
Col.3 = xi fi, the total contributed by the ith value (or midpoint) xi occurring fi times.
Therefore, in order to determine the standard deviation, σ, the procedure is to find:
(a) The arithmetic mean, using the standard methods, where
$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} \quad \text{[Col.2 and Col.3]}$$
(b) The deviation of each individual value/midpoint from the mean x̄, and hence the square of each deviation [Col.4 and Col.5].
(c) The sum of the squared deviations, noting that the ith squared deviation occurs fi times, that is Σfi(xi − x̄)² [Col.6].
(d) The arithmetic mean of the squared deviations, that is, the variance of the variable x, denoted by Var(x) or σ²:
$$\text{Var}(x) = \sigma^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}$$
Here σ² is the population variance; the sample variance s² uses the divisor ∑fi − 1 instead of ∑fi, as explained below. Since the standard deviation is defined as the square root of the variance, the sample and population standard deviations are respectively:
$$s = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}}, \qquad \sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$$
where ∑fi = N is the number of observations.
If the data under consideration are a sample, a correction is made in the denominator to take care of the quantity already estimated from the same data, in this case the arithmetic mean. The correction results in the denominator being equal to ∑fi − 1.
Activity 6f:
Variance and Standard Deviation of Ungrouped Data: One of the processing lines producing iron slabs gave the following sample of slab weights (kg): 4, 2, 6, 7, 3, 9, 12. Find the variance and hence the standard deviation.
Solution:
Let wi be the weight of the ith slab in kg and fi the frequency of the ith weight (here each fi = 1, so ∑fi = 7). Also let w̄ be the arithmetic mean weight.
$$\bar{w} = \frac{\sum w_i f_i}{\sum f_i} = \frac{43}{7} \approx 6.14$$
$$\text{Var}(w) = s^2 = \frac{\sum f_i (w_i - \bar{w})^2}{\sum f_i - 1} = \frac{74.86}{6} \approx 12.48$$
Hence the standard deviation is
$$s = \sqrt{12.48} \approx 3.53$$
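A minimal Python sketch of the sample variance and standard deviation for the Activity 6f weights, computed directly from the definition and cross-checked against `statistics.variance`, which also uses the n − 1 divisor.

```python
import statistics

weights = [4, 2, 6, 7, 3, 9, 12]  # kg, Activity 6f (each value occurs once)

n = len(weights)
mean = sum(weights) / n
ss = sum((w - mean) ** 2 for w in weights)  # sum of squared deviations
sample_var = ss / (n - 1)                   # s^2 uses the divisor n - 1
sample_sd = sample_var ** 0.5

print(round(mean, 2), round(sample_var, 2), round(sample_sd, 2))  # 6.14 12.48 3.53
# The standard library gives the same sample variance:
print(round(statistics.variance(weights), 2))  # 12.48
```

For a population, replace `n - 1` with `n` (or use `statistics.pvariance`).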
Activity 6g:
Variance and Standard Deviation of Grouped Data: The sales volume of bags of processed sugar from a manufacturer to 20 destinations is given in the table below, with the volume expressed in millions of Kenya shillings (Ksh 000,000). Find the standard deviation of the distribution.
Sales volume (Ksh 000,000)   Number of destinations
2-6 2
6-10 6
10-14 3
14-18 5
18-22 4
Hint: take the class midpoints as the xi values and follow the procedure for finding the standard deviation of ungrouped data.
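Solution:
The class midpoints are x̂i = 4, 8, 12, 16 and 20, so Σx̂i fi = 252 and
$$\bar{x} = \frac{\sum \hat{x}_i f_i}{\sum f_i} = \frac{252}{20} = 12.6$$
The sum of the squared deviations is Σfi(x̂i − x̄)² = 552.8, so the sample standard deviation is
$$S = \sqrt{\frac{\sum f_i (\hat{x}_i - \bar{x})^2}{\sum f_i - 1}} = \sqrt{\frac{552.8}{19}} \approx 5.39$$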
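The grouped calculation for Activity 6g can be sketched in Python using the class midpoints as the xi values; both the sample (divisor Σf − 1) and population (divisor Σf) standard deviations are shown.

```python
import math

midpoints = [4, 8, 12, 16, 20]  # midpoints of 2-6, 6-10, 10-14, 14-18, 18-22
freqs = [2, 6, 3, 5, 4]

n = sum(freqs)
mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
ss = sum(f * (m - mean) ** 2 for m, f in zip(midpoints, freqs))  # sum f(x - mean)^2
sample_sd = math.sqrt(ss / (n - 1))
population_sd = math.sqrt(ss / n)
print(mean, round(ss, 1), round(sample_sd, 2), round(population_sd, 2))  # 12.6 552.8 5.39 5.26
```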
The measures of variation discussed in the earlier sections (the range, mean deviation, quartile deviation and standard deviation) are absolute measures expressed in the units of the data, so they are not appropriate for comparing the dispersion of two or more datasets. The best measures for comparison purposes are the relative measures of dispersion: the relative range, coefficient of mean deviation, coefficient of quartile deviation and coefficient of variation.
As stated earlier, the range, R, is the difference between the maximum and the minimum values in a dataset. Let Vmax = maximum value and Vmin = minimum value. The Relative Range, Rr, is the ratio of (Vmax − Vmin) to (Vmax + Vmin):
$$R_r = \frac{V_{max} - V_{min}}{V_{max} + V_{min}}$$
The higher Rr, the greater the extent of dispersion.
The coefficient of quartile deviation, CQD, is based on Q1 and Q3, the 1st and 3rd quartiles respectively:
$$\text{CQD} = \frac{Q_3 - Q_1}{Q_3 + Q_1}$$
Activity 6h:
Find the CQD for the data below:-
Solution
Since the classes are of the inclusive type, the class intervals must be bridged using the gap-bridging procedure learnt earlier, giving the class boundaries below (CF = cumulative "less than" frequency).
Class boundaries   f   CF
2.5-5.5 2 2
5.5-8.5 1 3
8.5-11.5 5 8
11.5-14.5 2 10
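A Python sketch of the CQD for this table. It assumes the usual grouped-data interpolation Qk = L + (kN/4 − CF)/f × c, where L is the lower boundary of the quartile class, CF the cumulative frequency before it, f its frequency and c its width; `grouped_quartile` is a hypothetical helper name.

```python
# Grouped quartiles by linear interpolation: Q_k = L + (k*N/4 - CF) / f * c
boundaries = [(2.5, 5.5), (5.5, 8.5), (8.5, 11.5), (11.5, 14.5)]
freqs = [2, 1, 5, 2]


def grouped_quartile(k, bounds, f):
    n = sum(f)
    target = k * n / 4          # position of the k-th quartile
    cum = 0                     # cumulative frequency before the current class
    for (lower, upper), fi in zip(bounds, f):
        if cum + fi >= target:
            return lower + (target - cum) / fi * (upper - lower)
        cum += fi
    raise ValueError("quartile position outside the table")


q1 = grouped_quartile(1, boundaries, freqs)
q3 = grouped_quartile(3, boundaries, freqs)
cqd = (q3 - q1) / (q3 + q1)
print(q1, q3, round(cqd, 4))
```

Here Q1 = 7.0 and Q3 = 11.2, giving CQD = 4.2/18.2 ≈ 0.2308.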
6.2.3 Coefficient of Variation, Cvar
This is the most widely used measure of relative dispersion for comparing the variability of two or more data sets. It is defined by
$$C_{var} = \frac{\sigma}{\bar{x}} \times 100$$
The procedure for finding Cvar is to determine:
(a) The arithmetic mean and the standard deviation of the given data.
(b) Cvar, from the results in (a), using the formula above.
Activity 6i:
The data below represent the lifespan of electric bulbs (in months) produced at two different factories, in Kisumu and Nairobi. Using the coefficient of variation, determine which factory produces the more dependable bulbs.
Lifespan (months)   Number of bulbs: Kisumu   Nairobi
2-4 8 2
4-6 10 25
6-8 20 3
8-10 2 10
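Solution:
Let X̂ be the class midpoint, fK and fN the Kisumu and Nairobi frequencies, and X̄K and X̄N the corresponding means. (The worked solution takes fN = 2 for the 8-10 class, so that ΣfN = 32.)
Class   X̂   fK   fN   X̂fK   X̂fN   fK(X̂ − X̄K)²   fN(X̂ − X̄N)²
2-4     3    8    2    24     6      62.72          10.58
4-6     5    10   25   50     125    6.40           2.25
6-8     7    20   3    140    21     28.80          8.67
8-10    9    2    2    18     18     20.48          27.38
Total        40   32   232    170    118.40         48.88

X̄K = 232/40 = 5.8 and X̄N = 170/32 ≈ 5.3. Hence
$$\sigma_K = \sqrt{\frac{118.4}{40}} = \sqrt{2.96} \approx 1.720, \qquad \sigma_N = \sqrt{\frac{48.88}{32}} \approx \sqrt{1.528} \approx 1.236$$
The coefficients of variation are
$$CV_K = \frac{\sigma_K}{\bar{X}_K} \times 100 = \frac{1.720}{5.8} \times 100 \approx 29.7, \qquad CV_N = \frac{\sigma_N}{\bar{X}_N} \times 100 = \frac{1.236}{5.3} \times 100 \approx 23.3$$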
Since the CV of the Nairobi bulb lifespans is lower than that of the Kisumu bulbs, the Nairobi bulbs have a lower level of unpredictability and are hence less subject to random failure. Therefore the Nairobi bulbs are better in terms of consistency of lifespan.
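A Python sketch of the Activity 6i comparison. It uses the frequencies from the worked solution (fN = 2 for the 8-10 class) and the population standard deviation (divisor Σf), as in the solution; `coefficient_of_variation` is an illustrative name.

```python
import math

midpoints = [3, 5, 7, 9]
kisumu = [8, 10, 20, 2]
nairobi = [2, 25, 3, 2]  # as used in the worked solution (sum = 32)


def coefficient_of_variation(mids, f):
    """CV = sigma / mean * 100, with sigma computed using the divisor sum(f)."""
    n = sum(f)
    mean = sum(m * fi for m, fi in zip(mids, f)) / n
    var = sum(fi * (m - mean) ** 2 for m, fi in zip(mids, f)) / n
    return math.sqrt(var) / mean * 100


cv_k = coefficient_of_variation(midpoints, kisumu)
cv_n = coefficient_of_variation(midpoints, nairobi)
print(round(cv_k, 1), round(cv_n, 1))  # 29.7 23.3
```

Note that σ is the square root of the variance: using the variance (2.96) in place of σ would overstate the Kisumu CV.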
The Semi Inter-Quartile Range (SIQR), or quartile deviation, is half the distance between the upper and lower quartiles, and so measures the spread of the middle 50% of the data:
$$\text{SIQR} = \frac{Q_3 - Q_1}{2}$$
Activity 6j
(i) Consider the following data: 2, 6, 8, 15, 18, 25, 30
(ii) Consider the following data; hence find the Semi Inter-Quartile Range.
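A Python sketch for part (i) of Activity 6j. It assumes the common textbook convention of locating Q1 and Q3 at positions (n + 1)/4 and 3(n + 1)/4 of the sorted data, interpolating linearly when a position is fractional; other conventions can give slightly different quartiles. `positional_quantile` is a hypothetical helper.

```python
import statistics


def positional_quantile(sorted_vals, p):
    """Value at position p*(n+1) (1-based), interpolating linearly between neighbours."""
    pos = p * (len(sorted_vals) + 1)
    lo = int(pos)          # whole part of the position
    frac = pos - lo        # fractional part
    if lo < 1:
        return sorted_vals[0]
    if lo >= len(sorted_vals):
        return sorted_vals[-1]
    return sorted_vals[lo - 1] + frac * (sorted_vals[lo] - sorted_vals[lo - 1])


data = sorted([2, 6, 8, 15, 18, 25, 30])
q1 = positional_quantile(data, 0.25)
q3 = positional_quantile(data, 0.75)
siqr = (q3 - q1) / 2
print(q1, q3, siqr)  # 6.0 25.0 9.5
# Cross-check: the standard library's default 'exclusive' method uses the same positions.
print(statistics.quantiles(data, n=4))
```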
Measures of shape are statistical tools for determining the degree and direction of the shape of the frequency distribution of statistical data values. The shape of a distribution can be either symmetrical or asymmetrical. If a vertical line drawn from the peak of a frequency polygon down to the horizontal (independent variable) axis divides the area under the curve into two mirror-image halves, the distribution is said to be symmetric; otherwise it is asymmetric. The statistical tools for determining the degree and direction of shape are Skewness and Kurtosis.
6.3.1 Skewness
Case 1: x̄ = Md = Mo. This is the case of a symmetric distribution.
Case 2: x̄ > Md > Mo. This is the case of a positively skewed distribution (skewed to the right).
Examples:-
Case 3: x̄ < Md < Mo. This is the case of a negatively skewed distribution (skewed to the left).
Examples:-
Activity 6k:
The time, x, in seconds needed to serve a sample of 50 customers at Wananchi Bank in March is given below. Using Pearson's coefficient of skewness, Sk, determine the degree and direction of skewness of the distribution and interpret the result.
X 10-20 20-30 30-40 40-50 50-60
F 4 10 20 8 8
Solution
Step 1: Determine the arithmetic mean, x̄, and the standard deviation, S.
Step 2: Determine the median, Md.
Class   x̂i   fi   CF (less than)
10-20 15 4 4
20-30 25 10 14
30-40 35 20 34
40-50 45 8 42
50-60 55 8 50
Σfi = 50
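A Python sketch completing Activity 6k. It assumes Pearson's coefficient in its median form, Sk = 3(x̄ − Md)/S, with S the sample standard deviation and the median found by interpolation in the median class; the lesson names the coefficient but does not state its formula here.

```python
import math

# Activity 6k: service times (seconds), class width 10
mids = [15, 25, 35, 45, 55]
freqs = [4, 10, 20, 8, 8]
width = 10

n = sum(freqs)
mean = sum(m * f for m, f in zip(mids, freqs)) / n
ss = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs))
s = math.sqrt(ss / (n - 1))  # sample standard deviation, S


def grouped_median(mids, freqs, width):
    """Md = L + (N/2 - CF) / f * c, where L is the lower boundary of the
    median class and CF the cumulative frequency before it."""
    half = sum(freqs) / 2
    cum = 0
    for m, f in zip(mids, freqs):
        if cum + f >= half:
            return (m - width / 2) + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")


median = grouped_median(mids, freqs, width)
sk = 3 * (mean - median) / s  # Pearson's coefficient of skewness (median form)
print(round(mean, 1), median, round(s, 2), round(sk, 3))  # 36.2 35.5 11.54 0.182
```

A small positive Sk (about 0.18) indicates a slight skew to the right, consistent with x̄ > Md.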
6.3.2 Kurtosis
Case 1: Leptokurtic
This is the shape of a distribution where most of the observations are highly
concentrated near the mode. The polygon is more peaked than the normal
distribution polygon.
Case 2: Mesokurtic
This is the shape of the normal distribution. The polygon is less peaked than a Leptokurtic distribution and has more observations in its shoulders.
Case 3: Platykurtic
The polygon is a flatter, plateau-like frequency curve.
Activity 6l:
Using the example above for the determination of Skewness, find the Kurtosis for
the above distribution.
Solution:
Class   x̂i   fi
10-20 15 4
20-30 25 10
30-40 35 20
40-50 45 8
50-60 55 8
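A Python sketch for Activity 6l. The lesson does not say which kurtosis measure it intends, so this assumes the moment coefficient β2 = m4/m2², where m2 and m4 are the second and fourth central moments, compared with the value 3 for the normal (mesokurtic) curve.

```python
mids = [15, 25, 35, 45, 55]
freqs = [4, 10, 20, 8, 8]

n = sum(freqs)
mean = sum(m * f for m, f in zip(mids, freqs)) / n
m2 = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n  # 2nd central moment
m4 = sum(f * (m - mean) ** 4 for m, f in zip(mids, freqs)) / n  # 4th central moment
beta2 = m4 / m2 ** 2

if beta2 > 3:
    shape = "leptokurtic"
elif beta2 < 3:
    shape = "platykurtic"
else:
    shape = "mesokurtic"
print(round(beta2, 2), shape)  # 2.36 platykurtic
```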
Further Reading
Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for Managerial Decisions. 2nd edition. New Age International (P) Ltd Publishers.
Newbold, P. Statistics for Business and Economics. 4th edition. Prentice Hall (A Simon and Schuster Co.), Englewood Cliffs, New Jersey. (Available at University Library)
Freund, J.E., Williams, F.J. and Perles, B.M. Elementary Business Statistics: Modern Approach. 5th edition. Prentice Hall, Englewood Cliffs, New Jersey. (Available at University Library)
________________________________________________________________________
Exercise 6: Measures of Dispersion and Shape
_______________________________________________________________________
6.1 For each of the data sets below, find (i) Pearson's coefficient of skewness and (ii) the kurtosis; hence draw the distributions using the same horizontal axis scale for each data set, and interpret the results.
(i) The frequency distribution of waiting time, to the nearest minute, prior to
voting in a polling station for a sample of 100 potential voters is as
below:-
(iii) The data below show the monthly salaries for a sample of 50 employees in the public and private sectors.
Salary (Ksh '000)   Public   Private
2-4 4 5
4-6 8 10
6-8 12 10
8-10 14 15
10-12 2 5
>12 10 15
(iv) The end-of-semester examination marks awarded to 40 Economics students by the lecturers in Microeconomics and Quantitative Techniques are given in the table below:
Marks   Microeconomics   Quantitative Techniques
10-20 1 5
20-30 2 4
30-40 0 2
40-50 10 8
50-60 5 8
60-70 15 8
70-80 5 3
80-90 2 2
6.3 Discuss how measures of skewness can be used in decision making in:
(a) Teacher evaluation
(b) Auditing
(c) Manufacturing processes
________________________________________________________________
Lesson 7: Time Series Analysis, TSA
________________________________________________________________
7.0 Objectives
At the end of this lesson, the student should be able to determine and interpret measures
of Time Series Data. Specifically, the student should be able to: -
(a) Define time series data and its components, citing various practical examples.
(b) Calculate and Interpret the:-
(i) Trend Line,
(ii) Secular Variation,
(iii) Cyclic variation
(iv) Seasonal variation.
7.1 Introduction
In this lesson, we will learn what time series data are, identify their components, and see how these components are measured and interpreted. Various models will be derived in order to undertake time series analysis objectively.
Statistical data are considered a time series if the figures are generated and recorded chronologically. Time Series Analysis (TSA) involves the use of various models to analyze time-dependent statistical data by isolating the characteristics that define the movements in such data. Examples of time series data include yearly sales volume, quarterly profit, monthly revenue and yearly school dropout rates. Time series data consist of four components: the Secular Trend, Cyclic Variation, Seasonal Variation and Irregular Variation. TSA is hence the analysis of the impact of these four components on the behaviour of time series data.
Activity 7a:
List six characteristics that a data set must meet to qualify as time series data.
(a) Secular Trend, T
This describes the general increase or decrease (movement) in the variable value over a long period of time. The steady long-term increase in the cost of living, as measured for example by the Consumer Price Index (CPI), is an example of a secular trend.
(b) Cyclic Variation, C
These are wave-like fluctuations about the trend, usually spanning more than one year, that do not follow a fixed pattern, so their timing and magnitude are hard to predict. In business, examples are the peaks and slumps of business performance; in an economy, the phases of growth, recession, decline and stagnation.
(c) Seasonal Variation, S
These are patterns of change within a year that are repeated every year, e.g. Christmas sales, the volume of school textbooks bought at the start of each school year, and the volume of umbrellas bought during the rainy season. Because of the regularity of seasonal variations, this component is widely used in, for example, business forecasting.
Activity 7b:
Discuss the four components of a time series, citing examples of each.
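The four components of a time series are known to interact either multiplicatively or additively to produce the observed values that characterize time series data. Three models hence arise from these interactions: the additive model, Y = T + C + S + I; the multiplicative model, Y = T × C × S × I; and mixed models in which some components combine additively and others multiplicatively.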
Trend is analyzed so as to allow the decision maker to describe the historical pattern in the data with a view to:
(i) Evaluating the performance (success or failure) of past policy.
(ii) Projecting past patterns into the future so as to help in the estimation of future values.
(iii) Eliminating the trend component from the time series so as to study the other components.
Trend may be linear or non-linear (parabolic, exponential, logarithmic), increasing or decreasing.
Measuring trend involves determining the constants of the equation chosen to represent it:
$$y = \beta_0 + \beta_1 x, \qquad y = \beta_0 + \beta_1 x + \beta_2 x^2, \qquad y = \beta_0 \beta_1^{x}$$
The last (exponential) form is linearized by taking logarithms:
$$\ln y = \ln \beta_0 + x \ln \beta_1$$
where ln β1 is the slope and ln β0 is the intercept on the vertical axis.
In the free-hand method, a trend line is drawn through the plotted series by inspection so that it appears to fit the series best. From this judgemental line, a trend equation is obtained by determining the constants of the appropriate equation: linear, parabolic or logarithmic.
Consider, for example, the general linear equation y = β0 + β1x. The procedure for determining the trend line by the free-hand method is as follows:
(i) Read the trend values of the 1st and the nth periods from the line.
(ii) Determine the difference, ∆y, between these two trend values, and the corresponding difference in time, ∆x.
(iii) Find the slope β1 = ∆y/∆x, and hence read β0, the y-intercept (the trend value of the 1st period if that period is taken as the origin).
The main disadvantage of this method is that it is subjective, since there is no statistical criterion for ascertaining its adequacy; it is also time-consuming.
Activity 7c:
Determination of Trend Line: S.A.M. Method
1993 80
1994 45
1995 90
1996 80
1997 75
Solution:
[Figure: the series plotted against year, 1989-1998]
Case 2: Number of periods N odd. In this case there are three methods for separating the series.
When the S.A.M. method is used, the middle point is taken as the origin. The y-intercept, β0, and the slope, β1, are derived from
$$\beta_0 = \frac{S_1 + S_2}{t_1 + t_2}, \qquad \beta_1 = \frac{S_2 - S_1}{t_1 (N - t_2)}$$
where S1 and S2 are the partial sums of the 1st and 2nd segments, t1 and t2 are the numbers of time units in the 1st and 2nd segments, and N is the total number of periods.
Activity 7d:
Determination of Trend Line: Find the trend line by the S.A.M. method using the data below:
Year 90 91 92 93 94 95
Sales vol. (000,000) 14 20 15 18 19 21
Solution
Year   Sales volume   Semi-average
90     14
91     20             (14+20+15)/3 = 16.33
92     15
93     18
94     19             (18+19+21)/3 = 19.33
95     21
(i) Plot the points (91, 16.33) and (94, 19.33) together with the actual data. From the resulting trend values find the trend line.
[Figure: Sales Volume (1990-1995): actual sales and trend line; x-axis: Year, y-axis: Sales Volume]
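(ii) From β0 = (S1 + S2)/(t1 + t2) and β1 = (S2 − S1)/[t1(N − t2)], with S1 = 14 + 20 + 15 = 49 and S2 = 18 + 19 + 21 = 58:
$$\beta_0 = \frac{49 + 58}{3 + 3} \approx 17.83, \qquad \beta_1 = \frac{58 - 49}{3(6 - 3)} = 1$$
Hence, from y = β0 + β1x, the trend line is y = 17.83 + x, with the origin midway between 1992 and 1993 and x measured in years.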
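A Python sketch of the S.A.M. formulas for equal halves (t1 = t2 = t), with the origin at the middle of the series and x in periods. As an assumption, when N is odd the middle value is dropped before splitting, which is one of the common ways of separating an odd series; `semi_average_trend` is a hypothetical name.

```python
def semi_average_trend(values):
    """Semi-average trend y = b0 + b1*x, origin at the middle of the series.
    For odd N the middle value is dropped (an assumed convention)."""
    n = len(values)
    t = n // 2                          # periods in each half
    first, second = values[:t], values[n - t:]
    s1, s2 = sum(first), sum(second)    # partial sums S1, S2
    b0 = (s1 + s2) / (2 * t)            # (S1 + S2) / (t1 + t2)
    b1 = (s2 - s1) / (t * (n - t))      # (S2 - S1) / [t1 (N - t2)]
    return b0, b1


sales = [14, 20, 15, 18, 19, 21]        # Activity 7d, 1990-1995
b0, b1 = semi_average_trend(sales)
print(round(b0, 2), round(b1, 2))       # 17.83 1.0
```

The trend value for 1995 (x = 2.5 years from the origin) would then be about 17.83 + 2.5 = 20.33.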
Activity 7e:
Find S.A.M. and the resulting trend line based on the data below:
Year 90 91 92 93 94 95 96
School dropouts 200 90 150 80 110 250 200
A moving average is a series in which each period's figure is replaced by the mean of that value and a fixed number of preceding and succeeding values. The procedure involves determining moving averages from n-yearly moving totals, as follows:
(i) Find the first moving total by adding the first n (odd or even) data values.
(ii) Generate each subsequent moving total by dropping the first value of the preceding group and adding the next value in the series. Repeat until the last data value has been included.
(iii) Determine the moving averages (trend values) by dividing each moving total by the period size, n. Each total, and hence the corresponding average, is then centered opposite the middle of the periods it covers.
Activity 7f:
Odd Period Moving Total: Compute the Trend Values using the Three-year Moving
Totals.
Year   Sales   3-Yr moving total (MT)   3-Yr moving average (MA)
1      10
2      12      37                       12.33
3      15      47                       15.67
4      20      42                       14.00
5      7       39                       13.00
6      12      29                       9.67
7      10
3-Yr MT denotes the three-yearly moving totals, determined as follows: 37 = (10+12+15), 47 = (12+15+20), 42 = (15+20+7), 39 = (20+7+12) and 29 = (7+12+10). 3-Yr MA denotes the three-yearly moving averages (the trend values), obtained by dividing each 3-Yr MT by n = 3. The value of each centered year is thus replaced by the mean of the values of three successive years, and both the three-year moving total and moving average are centered as shown above. The same procedure applies for any odd number of moving-average periods. Plotting the trend values and the actual values on the same graph gives the graph below.
[Figure: Sales Volume ('000): actual sales and 3-yr MA; x-axis: Year Code, y-axis: Sales Volume]
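A Python sketch of the odd-period procedure for the Activity 7f data; `moving_average` is an illustrative name and entry i of each returned list belongs opposite the middle year of its window.

```python
def moving_average(values, n):
    """Moving totals and averages for odd n; each entry is centred on the
    middle period of its window."""
    if n % 2 == 0:
        raise ValueError("use a centred moving average for even n")
    totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
    return totals, [t / n for t in totals]


sales = [10, 12, 15, 20, 7, 12, 10]  # Activity 7f
totals, averages = moving_average(sales, 3)
print(totals)                           # [37, 47, 42, 39, 29]
print([round(a, 2) for a in averages])  # [12.33, 15.67, 14.0, 13.0, 9.67]
```

With n = 3, the first average sits opposite year 2 and the last opposite year 6, so no trend value is available for the first and last years.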
Activity 7f:
Even Period Moving Total: The data below represent the sales volume of 100 kg sacks of maize, in thousands of shillings, from January to October. Determine the two-yearly moving averages and plot the result together with the original data on the same graph.
Year 1 2 3 4 5 6 7 8 9 10
Sales 2 4 4 3 2 4 6 5 6 2
Solution:
The procedure is to:
(a) Determine the two-yearly moving totals using the standard method discussed for the case when n is odd. However, the two-yearly moving totals, as for any even number of periods, are not centered on a particular year.
(b) Center the results by adding each pair of successive two-yearly moving totals (2-Yr Moving Total Centered, 2-Yr MTC).
(c) Divide the result in (b) by 2n = 4 to obtain the centered moving averages, the trend values. Each centered total is the sum of four data values: three successive values, with the middle one counted twice.
Year   Sales   2-Yr MT   2-Yr MTC   2-Yr MA
1      2
               6
2      4                 14         3.50
               8
3      4                 15         3.75
               7
4      3                 12         3.00
               5
5      2                 11         2.75
               6
6      4                 16         4.00
               10
7      6                 21         5.25
               11
8      5                 22         5.50
               11
9      6                 19         4.75
               8
10     2
[Figure: Sales Volume: sales and 2-yr MA; x-axis: Years]
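A Python sketch of the centred moving average for an even period n: adjacent n-period moving totals are added and the sum is divided by 2n. `centred_moving_average` is an illustrative name; with n = 2 the first result is centred on year 2.

```python
def centred_moving_average(values, n):
    """Centred moving average for even n: add successive n-period moving
    totals in pairs, then divide by 2n. Result i is centred on 0-based
    period i + n // 2."""
    totals = [sum(values[i:i + n]) for i in range(len(values) - n + 1)]
    centred = [a + b for a, b in zip(totals, totals[1:])]
    return [c / (2 * n) for c in centred]


sales = [2, 4, 4, 3, 2, 4, 6, 5, 6, 2]
print(centred_moving_average(sales, 2))
# [3.5, 3.75, 3.0, 2.75, 4.0, 5.25, 5.5, 4.75]
```

The same function with n = 4 gives the centred four-quarter moving averages used for seasonal analysis.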
(d) Method of Least Squares:
The three methods above are inefficient when applied to forecasting, say, the performance of a business entity. The method of least squares (LSM) is a more rigorous mathematical technique for determining trend values and hence for forecasting. LSM fits the line that minimizes the sum of the squared vertical deviations of the data values from the line; this line is taken as the line of best fit. Consider the general line y = β0 + β1x, where the best linear estimators of the parameters β0 and β1 are β̂0 and β̂1. LSM is an appropriate tool for determining trend lines for data fitting both linear and non-linear models. The procedure is similar to that used in regression analysis: given y = β0 + β1x, the normal equations
$$\sum y = n\beta_0 + \beta_1 \sum x, \qquad \sum xy = \beta_0 \sum x + \beta_1 \sum x^2$$
are solved simultaneously. For time series data the solution is simplified by taking the middle of the series as the origin. Since the time units are usually uniform and consecutive, Σx = 0, so that Σy = nβ0 and Σxy = β1Σx². Hence
$$\beta_0 = \frac{\sum y}{n}, \qquad \beta_1 = \frac{\sum xy}{\sum x^2}$$
Activity 7g:
Determination of Trend Lines using the Method of Least Squares: Determine the trend line using the method of least squares for the following crop yield data from 1985 to 1991.
Year 85 86 87 88 89 90 91
Yield 10 12 4 6 8 6 10
Solution:
Taking the middle year as the origin, 1988 is assigned the value x = 0.
Year   x   y   x²   xy
1985 -3 10 9 -30
1986 -2 12 4 -24
1987 -1 4 1 -4
1988 0 6 0 0
1989 1 8 1 8
1990 2 6 4 12
1991 3 10 9 30
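Σx = 0   Σy = 56   Σx² = 28   Σxy = −8
Hence β0 = 56/7 = 8 and β1 = −8/28 ≈ −0.29, so the trend line is y = 8 − 0.29x, with the origin at 1988 and x in years.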
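A Python sketch of the simplified least-squares trend with the origin at the middle of an odd-length series, so that Σx = 0, β0 = Σy/n and β1 = Σxy/Σx². `least_squares_trend` is an illustrative name; an even-length series would need half-period x values instead.

```python
def least_squares_trend(y):
    """Linear trend y = b0 + b1*x with the origin at the middle of an
    odd-length series, so that sum(x) = 0."""
    n = len(y)
    if n % 2 == 0:
        raise ValueError("this sketch assumes an odd number of periods")
    x = [i - n // 2 for i in range(n)]  # ..., -1, 0, 1, ...
    b0 = sum(y) / n
    b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
    return b0, b1


yield_data = [10, 12, 4, 6, 8, 6, 10]  # 1985-1991, origin 1988
b0, b1 = least_squares_trend(yield_data)
print(b0, round(b1, 3))  # 8.0 -0.286
```

For example, the trend value for 1992 (x = 4) would be about 8 − 0.286 × 4 ≈ 6.86.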
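Non-linear trend curves must be linearized through the use of logarithms. For example, for the exponential trend line y = β0β1^x, the procedure would be to find ln y for each period, fit a straight line to ln y against x by least squares, and take antilogarithms of the intercept and slope to obtain β0 and β1.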
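A Python sketch of fitting an exponential trend by least squares on ln y, with the origin at the middle of an odd-length series. The series used is hypothetical (it doubles each period from 5 at the origin), chosen so that the fit should recover β0 = 5 and β1 = 2.

```python
import math


def exponential_trend(y):
    """Fit y = b0 * b1**x via ln y = ln b0 + x ln b1 (origin at the middle, odd n)."""
    n = len(y)
    if n % 2 == 0 or any(v <= 0 for v in y):
        raise ValueError("needs an odd number of positive values")
    x = [i - n // 2 for i in range(n)]
    ln_y = [math.log(v) for v in y]
    ln_b0 = sum(ln_y) / n  # intercept, since sum(x) = 0
    ln_b1 = sum(xi * li for xi, li in zip(x, ln_y)) / sum(xi * xi for xi in x)
    return math.exp(ln_b0), math.exp(ln_b1)


# Hypothetical series 5 * 2**x for x = -2..2
series = [1.25, 2.5, 5, 10, 20]
b0, b1 = exponential_trend(series)
print(round(b0, 6), round(b1, 6))  # 5.0 2.0
```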
Seasonal variations are movements in a time series that result from natural or man-made phenomena. They are repetitive and periodic, with a period of one year or less, and are usually measured by week, month or quarter. Studies of seasonal variation aid decision makers to:
The ratio-to-moving-average method is used to measure seasonal variation; it describes the degree of seasonal variation. The procedure is to:
Activity 7h:
Determination of Seasonal Variation: Given the data in the first three columns of the table below, determine the measures of seasonal variation using the following procedure:
(a) Determine the four-quarter moving totals for the series. The first total is written between quarters II and III.
(b) Compute the centered four-quarter totals by adding each pair of successive moving totals.
(c) Divide each centered moving total by 2n = 8.
(d) Plot the original data and the smoothed values in (c) on the same graph.
(e) Express each actual value as a percentage of the corresponding moving average, and arrange these percentages by quarter.
Year   Quarter   Turnover   4-Q MT   4-Q MT centred   4-Q MA centred
85     I         6
       II        4
                            15
       III       3                   28               3.50
                            13
       IV        2                   28               3.50
                            15
86     I         4                   30               3.75
                            15
       II        6                   33               4.13
                            18
       III       3                   34               4.25
                            16
       IV        5                   31               3.88
                            15
87     I         2                   31               3.88
                            16
       II        5                   29               3.63
                            13
       III       4
       IV        2
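A Python sketch of steps (a) to (c) and (e) for the Activity 7h turnover data. The final step, averaging each quarter's percentages and rescaling so that the four seasonal indices sum to 400, is a common convention and is assumed here rather than taken from the procedure above.

```python
# Activity 7h: quarterly turnover 1985-1987
quarters = ["I", "II", "III", "IV"]
turnover = [6, 4, 3, 2, 4, 6, 3, 5, 2, 5, 4, 2]

# (a)-(c) 4-quarter moving totals, centred by adding successive pairs, divided by 2n = 8
totals = [sum(turnover[i:i + 4]) for i in range(len(turnover) - 3)]
centred_ma = [(a + b) / 8 for a, b in zip(totals, totals[1:])]

# (e) actual value as a percentage of its centred moving average, grouped by quarter.
# The first centred average falls on quarter III of the first year (index 2).
ratios = {q: [] for q in quarters}
for k, ma in enumerate(centred_ma):
    i = k + 2
    ratios[quarters[i % 4]].append(turnover[i] / ma * 100)

# Assumed final step: average each quarter's percentages and rescale so the
# four seasonal indices sum to 400.
raw = {q: sum(v) / len(v) for q, v in ratios.items()}
scale = 400 / sum(raw.values())
seasonal_index = {q: raw[q] * scale for q in quarters}

print(centred_ma)
print({q: round(v, 1) for q, v in seasonal_index.items()})
```

Quarter II has the highest index for this data, i.e. turnover is typically highest in the second quarter.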
Further Reading
Shrivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for Managerial Decisions. 2nd edition. New Age International (P) Ltd Publishers.
Newbold, P. Statistics for Business and Economics. 4th edition. Prentice Hall (A Simon and Schuster Co.), Englewood Cliffs, New Jersey. (Available at University Library)
Freund, J.E., Williams, F.J. and Perles, B.M. Elementary Business Statistics: Modern Approach. 5th edition. Prentice Hall, Englewood Cliffs, New Jersey. (Available at University Library)
________________________________________________________________________
Exercise 7: Time Series Analysis
________________________________________________________________________
7.1 Describe the following components of time series data illustrating the components
graphically.
(a) Secular trend
(b) Cyclical variation
(c) Seasonal variation
(d) Irregular variation
7.2 State any four causes of each of the following components of a time series.
7.3 Discuss the models for studying time series, assuming the components interact:
(a) Multiplicatively
(b) Additively
(c) Both Multiplicatively and Additively
7.5: The maize output data for the Rift Valley from 1993 to 1997, in tons per quarter, are given below:
(c) The method of least squares
7.6: The data below is the industrial output index for Kenya with 1980 as the base
year.
Year 1975 1976 1977 1979 1980 1981 1982 1983
Output index 120 130 140 150 141 150 155 145
Fit
(a) Linear trend to the index
(b) Parabolic trend line to the index
7.7: Given below is the performance of the Komala share index at the NSE:
YEAR SHARE INDEX
1980-81 120
1981-82 90
1982-83 111
1983-84 115
1984-85 121
1985-86 119
1986-87 118
1987-88 121
Fit:
(a) The linear trend lines for the index
(b) Parabolic trend line for the index
(c) Based on the model resulting in (a) and (b) find the share index for the
years 1994 and 1996
7.6: The data below represents the number of passes in a national examination by
students of Komala High school.
Year 80 81 82 83 84 85 86 87 88 89
Passes 40 58 78 98 75 65 63 45 78 87
(a) Determine the trend values using 2-, 3-, 4- and 5-year moving averages.
(b) Using the results in (a), plot the original data and the trend values on the same axes.
7.7: The quarterly wholesale prices for a chain of wholesale outlets in Western Kenya are as below:
(c) Using the results in (a) compute the deseasonalised data values
(d) Determine the trend line using the moving average method.
7.8: Given the following data on fish output at Kendu Bay Beach, Nyanza:
Fish yield (1000kg)
Year Jan Feb Mar April May June July Aug Sept Oct Nov Dec
1990 84 61 50 48 72 45 90 70 55 72 40 45
1991 75 55 45 60 62 84 61 85 60 55 80 90
Determine the seasonal indices, and hence the deseasonalised data values, using the moving average method with a 2-month period for each year.
7.9: A straight-line trend was fitted by the method of least squares to the rice yield of the Kano Plains for the six years 1980-1985.
(a) The equation showed that the yield would be 20,000 in 1980 and 17,500 in 1985. Find the equation using 1983 as the origin.
(b) Determine the monthly yield in 1984.
(c) Change the annual equation in (a) to a monthly equation with the origin at 15 June 1984.
7.10. The fuel consumption volume (millions of shillings) of Oginga Odinga Petrol Station along the Ahero-Othoro rural road is as in the table below:
station along Ahero-Othoro Rural road is as in the table below:
Year 80 81 82 83 84 85 86 87
Volume 4.6 8.5 2.8 5.2 10.1 8.3 5.9 6.7
7.11 The public's concern about the performance of the judiciary has prompted the A.G. to order an investigation into the rate of disposal of criminal cases in a sample of Kenya's courts. The data below show the proportion of potential cases disposed of between 1985 and 1989:
Year 1985 1986 1987 1988 1989
Proportion of cases 0.4 0.6 0.25 0.45 0.35
(a) Fit a linear trend line on the above data.
(b) Find the trend values using a yearly moving average method.
(c) Based on the results of (a) and (b), determine the cyclic fluctuations.
(d) Find the seasonal data values hence the deseasonalised indices.
(e) Find the projected number of criminal cases disposed of using the models in a)
and b) for 1985 given a potential number of cases of 20,000.
7.12 The data below indicate the number of malaria cases seen at the outpatient casualty facility of the new Nyanza General Hospital between 1990 and 1997:
Year 1990 1991 1992 1993 1994 1995 1996 1997
Patients 1000 999 840 1500 1040 1200 1000 1150
Fit
(a) Linear trend line on the data
(b) A parabolic trend line on the data
(c) Use the following methods to determine the trend values.
(i) Semi-average method
(ii) The yearly moving average
(d) Graph the results (a) and (b) together with the actual data.
(e) Using the values in (a) and (b)
(i) The cyclic variation
(ii) The seasonal indices, hence the deseasonalised indices.
(f) Based on the results Find the :
________________________________________________________________
Lesson Eight: Index Numbers
________________________________________________________________
8.0 Objectives
At the end of this lesson, the student should be able to determine and interpret various
types of Index Numbers. Specifically, the student should be able to: -
(i) Define the Concept of Index Numbers and various Types of Index
Numbers.
(ii) List and Discuss Index Number construction considerations.
(iii) Distinguish between Simple and Aggregated Index Numbers citing
their advantages and disadvantages.
(iv) Calculate and interpret Paasche's and Laspeyres' index numbers and their derivatives.
8.1 Introduction
Further, the analyst must ask herself the following fundamental questions: -
(c) Who will use the constructed index number, e.g. a family or household; men, women or children; an industry or a country?
(d) What is the status of the potential user: rich or poor, large or small?
(e) What is the starting point of the index number, i.e. the base period?
Generally there are two approaches for constructing an index number: The Relative and
the Aggregate methods.
A relative may be a price, quantity (volume) or value relative. It compares the price, quantity or value of each item in the current period with that in the base period, expressing the current-period figure as a percentage of the base-period figure.
(a) Let P0i = price of item i in the base period 0 and Pni = price of item i in the current period n. The price relative of the ith item is then
$$PRI_i = \frac{P_{ni}}{P_{0i}}$$
(b) Find the sum of the price relatives, ∑PRIi = ∑(Pni/P0i), and divide it by the number of items, N, to obtain the arithmetic mean of the relatives, expressed as a percentage:
$$PRI = \frac{1}{N}\sum \frac{P_{ni}}{P_{0i}} \times 100$$
Activity 8a:
Suppose that households still purchase the same quantities in 1999 as they did in 1997, but that the prices of the items have changed over the years as in the table below.
Item   Price 1997 (Ksh '000, base = 100%)   Price 1999 (Ksh '000)
Milk 18 20
Matches 2 3
Salt 5 6
Sugar 20 24
Unga Ugali 45 55
Charcoal 300 450
Find the simple average of price relatives index and interpret the result. (The "100%" shown for 1997 indicates that it is the base year, while 1999 is the current year.)
Solution:
$$P_{0n} = \frac{1}{N}\sum \frac{p_{ni}}{p_{0i}} \times 100$$
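A Python sketch of the simple average of price relatives for the Activity 8a prices; `mean_of_relatives` is an illustrative name, and the same function applies unchanged to quantity relatives.

```python
p1997 = {"Milk": 18, "Matches": 2, "Salt": 5, "Sugar": 20, "Unga Ugali": 45, "Charcoal": 300}
p1999 = {"Milk": 20, "Matches": 3, "Salt": 6, "Sugar": 24, "Unga Ugali": 55, "Charcoal": 450}


def mean_of_relatives(base, current):
    """P0n = (1/N) * sum(p_ni / p_0i) * 100, the simple mean of the relatives."""
    rel = [current[k] / base[k] for k in base]
    return sum(rel) / len(rel) * 100


print(round(mean_of_relatives(p1997, p1999), 2))  # 128.89
```

On this measure, prices rose by about 28.9% on average between 1997 and 1999.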
Activity 8b:
Now suppose that the households still pay the same price per unit in 1999 as they did in 1997, but that the quantities they use have changed over the years as indicated below:
Item   Quantity 1997 (base = 100)   Quantity 1999
Milk 300 240
Matches 500 450
Salt 250 500
Sugar 500 650
Ugali 450 500
Charcoal 3120 100
q0i = quantity of the ith item in the base year 0, while qni = quantity of the ith item in the current period n.
Interpretation:
On average there was a 19% increase in quantity purchased and consumed between 1997
and 1999.
Relative index numbers are sensitive to the units in which the items are measured.
The choice of the measure of central tendency to use in their determination is difficult and may be subjective, hence biased.
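The basis for the aggregate index numbers is the simple aggregate price and quantity indices:
$$P_{0n} = \frac{\sum p_{ni}}{\sum p_{0i}} \times 100, \qquad Q_{0n} = \frac{\sum q_{ni}}{\sum q_{0i}} \times 100$$
The procedure is the same as that for price relatives, except that instead of averaging the price/quantity relatives, the prices (or quantities) themselves are summed for each period and the totals compared.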
However, $P_{on}$ is heavily influenced by items with very high (or very low) prices, and $Q_{on}$
is affected in the same way by items with very large (or very small) quantities. The units in
which the items are measured also affect $P_{on}$ and $Q_{on}$. These shortcomings of $P_{on}$ and
$Q_{on}$ are overcome by weighting the aggregates.
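The sensitivity of the simple aggregate index to a single high-priced item can be seen with a short Python sketch on the Activity 8a prices. The function name `simple_aggregate_index` is hypothetical; dropping charcoal is done only to show its influence on the result.

```python
# Simple aggregate price index: (sum of current prices / sum of base prices) * 100.
# Prices in Ksh.'000 from the Activity 8a table (1997 = base, 1999 = current).
prices_1997 = {"Milk": 18, "Matches": 2, "Salt": 5, "Sugar": 20,
               "Unga Ugali": 45, "Charcoal": 300}
prices_1999 = {"Milk": 20, "Matches": 3, "Salt": 6, "Sugar": 24,
               "Unga Ugali": 55, "Charcoal": 450}


def simple_aggregate_index(base, current):
    """Return (sum of P_ni / sum of P_oi) * 100 over the items listed in `base`."""
    return sum(current[item] for item in base) / sum(base.values()) * 100


all_items = simple_aggregate_index(prices_1997, prices_1999)

# Drop charcoal, the highest-priced item, and recompute on the remaining five.
without_charcoal = {item: p for item, p in prices_1997.items() if item != "Charcoal"}
reduced = simple_aggregate_index(without_charcoal, prices_1999)

print(f"All six items:     {all_items:.2f}")  # 143.08
print(f"Without charcoal:  {reduced:.2f}")    # 120.00
```

Charcoal alone moves the index from 120 to about 143, which is the sensitivity to high-priced items described above.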
From the outset, note that prices are usually weighted by quantities, and quantities by
prices.
Laspeyres' Quantity Index: this index uses the base-year prices as weights.

$$L^{q}_{on} = \frac{\sum q_{ni} P_{oi}}{\sum q_{oi} P_{oi}} \times 100$$
Paasche's Quantity Index: this index uses the current-year prices as weights.

$$P^{q}_{on} = \frac{\sum q_{ni} P_{ni}}{\sum q_{oi} P_{ni}} \times 100$$
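The two quantity formulas can be checked with a short Python sketch. The two-item basket below is hypothetical (invented purely for illustration, not taken from the module), and the function names are likewise illustrative.

```python
# Laspeyres and Paasche quantity indices for a hypothetical two-item basket.
# Each tuple is (p_o, p_n, q_o, q_n): base price, current price,
# base quantity, current quantity. The figures are illustrative only.
basket = [
    (10, 12, 5, 6),   # item A
    (4, 5, 10, 8),    # item B
]


def laspeyres_quantity(items):
    """Quantities weighted by base-year prices: sum(q_n*p_o) / sum(q_o*p_o) * 100."""
    num = sum(qn * po for po, pn, qo, qn in items)
    den = sum(qo * po for po, pn, qo, qn in items)
    return num / den * 100


def paasche_quantity(items):
    """Quantities weighted by current-year prices: sum(q_n*p_n) / sum(q_o*p_n) * 100."""
    num = sum(qn * pn for po, pn, qo, qn in items)
    den = sum(qo * pn for po, pn, qo, qn in items)
    return num / den * 100


print(f"Laspeyres quantity index: {laspeyres_quantity(basket):.2f}")  # 102.22
print(f"Paasche quantity index:   {paasche_quantity(basket):.2f}")    # 101.82
```

The only difference between the two functions is which year's prices serve as weights, mirroring the two formulas.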
Note: LASPEYRES VS PAASCHE INDEX NUMBERS
Laspeyres' price index compares the cost of the base-year basket of commodities at
current-year prices with its cost at base-year prices, while Paasche's price index compares
the cost of the current-year basket at current-year prices with its cost at base-year prices.
But which of the two is appropriate for computing price/quantity indices? Laspeyres' index
usually tends to overstate price changes, while Paasche's index tends to understate them.
In theory, there is an ideal index number (Fisher's index, the geometric mean of Laspeyres'
and Paasche's indices) that lies between the two and satisfies the time reversal, factor
reversal and proportionality tests of an ideal index number, although it does not satisfy
the circular test.
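The Marshall-Edgeworth price index uses the sum of the base-year and current-year
quantities as weights, and the Marshall-Edgeworth quantity index uses the sum of the
base-year and current-year prices as weights. Each lies between the corresponding
Laspeyres' and Paasche's indices:

$$ME^{p}_{on} = \frac{\sum P_{ni}(q_{oi} + q_{ni})}{\sum P_{oi}(q_{oi} + q_{ni})} \times 100, \qquad ME^{q}_{on} = \frac{\sum q_{ni}(P_{oi} + P_{ni})}{\sum q_{oi}(P_{oi} + P_{ni})} \times 100$$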
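A minimal Python sketch comparing the price indices discussed in this note, on a hypothetical two-item basket (illustrative figures only). It assumes the standard definitions: Laspeyres weights by base-year quantities, Paasche by current-year quantities, Marshall-Edgeworth by their sum, and Fisher's index is the geometric mean of Laspeyres and Paasche. The last lines check the factor reversal test for Fisher's index (price index × quantity index = value index). All function and variable names are illustrative.

```python
import math

# Hypothetical two-item basket (illustrative only): (p_o, p_n, q_o, q_n) per item,
# i.e. base price, current price, base quantity, current quantity.
basket = [(10, 12, 5, 6), (4, 5, 10, 8)]


def weighted_price_index(items, weight):
    """sum(p_n * w) / sum(p_o * w) * 100, where w = weight(q_o, q_n)."""
    num = sum(pn * weight(qo, qn) for po, pn, qo, qn in items)
    den = sum(po * weight(qo, qn) for po, pn, qo, qn in items)
    return num / den * 100


def weighted_quantity_index(items, weight):
    """sum(q_n * w) / sum(q_o * w) * 100, where w = weight(p_o, p_n)."""
    num = sum(qn * weight(po, pn) for po, pn, qo, qn in items)
    den = sum(qo * weight(po, pn) for po, pn, qo, qn in items)
    return num / den * 100


laspeyres = weighted_price_index(basket, lambda qo, qn: qo)                # base-year quantities
paasche = weighted_price_index(basket, lambda qo, qn: qn)                  # current-year quantities
marshall_edgeworth = weighted_price_index(basket, lambda qo, qn: qo + qn)  # sum of both
fisher = math.sqrt(laspeyres * paasche)                                    # geometric mean of L and P

for name, value in [("Laspeyres", laspeyres), ("Paasche", paasche),
                    ("Fisher", fisher), ("Marshall-Edgeworth", marshall_edgeworth)]:
    print(f"{name:<20}{value:.2f}")

# Factor reversal test: Fisher price index x Fisher quantity index = value index.
fisher_q = math.sqrt(weighted_quantity_index(basket, lambda po, pn: po)
                     * weighted_quantity_index(basket, lambda po, pn: pn))
value_index = (sum(pn * qn for po, pn, qo, qn in basket)
               / sum(po * qo for po, pn, qo, qn in basket) * 100)
print(math.isclose(fisher * fisher_q / 100, value_index))  # True
```

With these figures Laspeyres gives about 122.22 and Paasche about 121.74, while both the Fisher and Marshall-Edgeworth indices come out at about 121.98, between the two.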
Chain index numbers (link relatives): this approach determines the index for each period on
the basis of the previous period, and is appropriate where the gap between the base period and
the current period is large, so that a fixed-base index no longer responds appropriately to
changes in consumption. The procedure is to determine the link indices $P_{o1}, P_{12}, P_{23},
P_{34}, P_{45}, \ldots$ (expressed as ratios) and then chain them:

$$P_{on} = P_{o1} \times P_{12} \times P_{23} \times \cdots \times P_{(n-1)n}$$
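The chaining step can be sketched in Python for a single item whose yearly prices are hypothetical (illustrative only). It uses `math.prod`, which requires Python 3.8 or later.

```python
import math

# Hypothetical yearly prices of one item for periods 0, 1, 2, 3 (illustrative only).
prices = [50, 55, 60, 66]

# Link relatives P_(t-1)t: each period's price as a ratio of the previous period's.
links = [curr / prev for prev, curr in zip(prices, prices[1:])]

# Chain them: P_o3 = P_o1 x P_12 x P_23.
chain = math.prod(links)

print(f"Chain index:       {chain * 100:.2f}")                   # 132.00
print(f"Fixed-base index:  {prices[-1] / prices[0] * 100:.2f}")  # 132.00
```

For a single item the chained and fixed-base results agree; for a basket whose composition or weights change between periods they generally differ, which is what allows a chain index to follow changes in consumption.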
Further Reading
Srivastava, U.K., Shenoy, G.V. and Sharma, S.C. Quantitative Techniques for Managerial
Decisions. 2nd edition. New Age International (P) Ltd Publishers.
Newbold, P. Statistics for Business and Economics. 4th edition. Prentice Hall (a Simon &
Schuster Company), Englewood Cliffs, New Jersey. (Available at University Library)
Freund, J.E., Williams, F.J. and Perles, B.M. Elementary Business Statistics: The Modern
Approach. 5th edition. Prentice Hall, Englewood Cliffs, New Jersey. (Available at
University Library)
___________________________________________________________________
Exercise 8: Index Numbers
___________________________________________________________________
8.3: List the basic considerations for choosing the BASE period
8.4: The data below give the unit prices (in thousands of Kenya Shillings) and the values
of four commodities sold at Tom Mboya Chain stores.
8.5: For each of the Indices in (a) to (j) in 8.4, perform the Ideal Index Number Test
using the following criteria:
(a) Time Reversal test
(b) Factor Reversal Test
(c) Circular Test
(d) Proportionality Test
8.6: Discuss with specific examples the use of Index Numbers in the following
decision making areas.
(a) Industrial production
(b) Consumer purchasing power
(c) Agricultural production
(d) Commodity prices
(e) Retail prices
(f) Foreign trade
8.8: The data in the table below give the prices and the average monthly quantities
of five beauty items purchased by female customers at a retail outlet.
Using 1986 as the base year, calculate the following price and quantity indices for
1987 and 1990:
(a) Laspeyres' model
(b) Paasche's model
(c) Fisher's model
(d) Marshall-Edgeworth model
8.9: For the crop year 1990-1994, sugar production (in 100 kg sacks) from four
provinces, Nyanza, Western, Coast and Rift Valley, was as follows: Nyanza
2,000,000; Western 1,400,000; Rift Valley 200,000; Coast 1,500,000. Calculate
the simple relatives for each province using Nyanza as the base.
Is there any justification for using link relatives? Explain.