BBA - Business Statistics 201

BACHELOR OF BUSINESS
ADMINISTRATION
BUSINESS STATISTICS 201
MODULE GUIDE
Copyright © 2022
REGENT BUSINESS SCHOOL
All rights reserved; no part of this book may be reproduced in any form or by any means, including
photocopying machines, without the written permission of the publisher.

Table of Contents
INTRODUCTION TO BUSINESS STATISTICS 201
CHAPTER ONE: STATISTICS DEFINITIONS
CHAPTER TWO: VISUAL REPRESENTATION OF DATA
CHAPTER THREE: DATA DESCRIPTORS
CHAPTER FOUR: ELEMENTARY PROBABILITY
CHAPTER FIVE: LINEAR CORRELATION AND REGRESSION
CHAPTER SIX: TIME SERIES ANALYSIS AND FORECASTING
BIBLIOGRAPHY

List of Figures and Tables
Figure 1.1: Branches of statistics
Figure 1.2: Probability sampling
Figure 1.3: Nonprobability sampling
Figure 2.1: South Africans between 18-25 years old that play soccer, rugby, and tennis
Figure 2.2: Bar chart of points scored by rugby teams in ten matches
Figure 2.3: Points scored and those conceded by the four rugby teams
Figure 2.4: Type of medication people use to cure a common cold
Figure 2.5: Negative linear relationship
Figure 2.6: Negative linear relationship
Figure 2.7: Non-linear relationship
Figure 2.8: Non-linear relationship
Figure 2.9: No relationship
Figure 2.10: Histogram on the weight of teenagers
Figure 2.11: Frequency Polygon for the weight of teenagers
Figure 2.12: Ogive for the weight of teenagers
Figure 2.13: PivotTable Field List dialog box
Figure 2.14: Excel’s Data Analysis dialog box (first 10 techniques)
Figure 2.15: Excel’s Data Analysis dialog box (remaining nine techniques)
Figure 2.16: Histogram data input preparation dialog box
Figure 3.1: Symmetric distribution
Figure 3.2: Histogram with uniform distribution
Figure 3.3: Distribution skewed to the right
Figure 3.4: Distribution skewed to the left
Figure 3.5: Empirical rule
Figure 3.6: The descriptive statistics option in Data Analysis add-in
Figure 4.1: Probability value
Figure 4.2: Mutually exclusive events
Figure 4.3: Complement of events
Figure 4.4: Union of events
Figure 4.5: Intersection of events
Figure 4.6: A decision tree
Figure 5.1: Scatter plot for the number of Hi-Fi sold and the number of advertisements
placed in local newspapers
Figure 5.2: Fitted regression line for the number of Hi-Fi sold and the number of
advertisements placed in local newspapers
Figure 5.3: Interpretation of correlation coefficients
Figure 5.4: Perfect positive correlation
Figure 5.5: Perfect negative correlation
Figure 5.6: Positive (direct) linear correlation
Figure 5.7: Negative (indirect) linear correlation
Figure 5.8: Positive linear correlation
Figure 5.9: Negative linear correlation
Figure 5.10: No linear correlation
Figure 5.11: Regression output for Example 5.1 using the Data Analysis add-in
Figure 5.12: Scatter plot for Example 5.1
Figure 6.1: Time series for number of customers served
Figure 6.2: Trend line
Figure 6.3: Moving average
Figure 6.4: Time series plot and the fitted trend line
Table 1.1: Level of measurements with examples
Table 2.1: Sport discipline in South Africa and the age groups
Table 2.2: Rugby teams and their scores
Table 2.3: Points scored and those conceded by the four rugby teams
Table 2.4: Type of medication people use to cure a common cold
Table 2.5: Income vis-à-vis education
Table 2.6: Frequency distribution table format
Table 2.7: Frequency distribution table of the number of transactions
Table 2.8: Grouped frequency distribution table format
Table 2.9: The weight of 20 teenagers measured in kilograms
Table 2.10: Frequency table for the mass of teenagers
Table 2.11: Cumulative frequency for the weight of teenage students
Table 3.1: The duration of colds in Eswatini
Table 3.2: Calculation of the mean duration of colds in Eswatini
Table 3.3: Changes in the number of the South African population
Table 3.4: Calculation of modal courier delivery time
Table 3.5: Calculation of the median parcel delivery time
Table 3.6: Levels of measurement and corresponding measure of dispersions
Table 3.7: Levels of measurement and their appropriate graphical representation
Table 3.8: Observed temperatures
Table 3.9: Calculation of the standard deviation for raw data using the first method
Table 3.10: Calculation of the standard deviation for raw data using the second method
Table 3.11: Market value of six-year-old small sedan
Table 3.12: Calculation of the mean from grouped data
Table 3.13: Calculation of the standard deviation from grouped data using the first method
Table 3.14: Calculation of the standard deviation from grouped data using the second method
Table 5.1: Example of independent and dependent variables
Table 5.2: The number of Hi-Fi sold and the number of advertisements placed in local
newspapers
Table 5.3: Calculation of the straight-line equation
Table 6.1: Customers per quarter
Table 6.2: Sales data
Table 6.3: Valley Estates Quarterly House Sales study
Table 6.4: Sequential numbering method
Table 6.5: Coding when n is an odd number
Table 6.6: Coding scheme for t_i when n is even
Table 6.7: Zero-sum
Table 6.8: Share portfolio performance
Table 6.9: Share portfolio performance


INTRODUCTION TO BUSINESS STATISTICS 201
1. Introduction
Welcome to the Bachelor of Business Administration programme. As part of your
studies, you are required to study and successfully complete a course on Business
Statistics. We are delighted to have you join us for the Business Statistics module. To
make sure that you share our passion for this area of study, we encourage you to
read this overview thoroughly. Refer to it as often as necessary, because it will make
learning this module much simpler. The goal of this module is to increase both your
competence in business statistics and your confidence in using it.
Business statistics is a dynamic and challenging field. The learning materials,
exercises, and self-study questions in this manual will therefore give you the chance
to examine recent developments in this area and help you discover the field of
Business Statistics as it is currently practised.
2. Module Overview
The purpose of this module is to introduce statistical techniques used in business, to
equip you to deal with basic business statistics in the real world, and to facilitate and
inform sound business decisions. This module covers basic managerial statistics and
the higher-level topics outlined below.
3. Aim of the Module
The main goal of this module is to serve as an introduction to statistical techniques
used in business. It will provide you with the knowledge and abilities you need to deal
with fundamental business statistics in the real world and to facilitate and inform sound
business decisions. This module tackles entry-level business statistics methods such
as Time Series Forecasting, Index Numbers, Seasonal Indices, and Probability, as well
as fundamental descriptive statistics.

4. Essential (Prescribed) Reading
Your essential (prescribed) reading comprises the following:
4.1. Prescribed Reading
Davis, G. W., Pecar, B., & Santana, L. (2017). Business statistics using Excel: A first
course for South African students. Oxford University Press Southern Africa.
4.2. Recommended Reading
Croucher, J. (2016). Introductory mathematics and statistics for business (6th ed.).
McGraw-Hill.
Wegner, T. (2016). Applied business statistics (3rd ed.). Juta and Co, Ltd.
5. How to use the Module
This module should be studied using the recommended and prescribed textbook/s and
the relevant sections of this module. You must read about the topic that you intend to
study in the appropriate section before you start reading the textbook/s in detail.
Ensure that you make your own notes as you work through both the textbook/s and
this module. You will find a list of objectives and outcomes at the beginning of each
section. These outline the main points that you need to understand when you have
completed the section/s. The purpose of this guide is to help you study. It is important
for you to work through all the tasks and self-assessment exercises as they provide
guidelines for examination purposes.

6. Navigational Icons
Think Point
When you see this icon, you should think about and reflect on the
issues/challenges/themes presented.
Tasks
When you see this icon, you will know that you are required to perform
a task to gauge how well you remember or understand what you have
read or how good you are at applying what you have learnt.
Definitions
This icon will alert you to a specific definition related to the topic under
discussion.
Case Studies
Case studies are often used to illustrate a concept within the setting
of a real-life scenario. Answer the questions that follow to ensure that
you have a proper understanding of what has been discussed.

7. Specific Outcomes and Chapter Alignment
SPECIFIC PROGRAMME OUTCOMES AND CHAPTER ALIGNMENT
SO 1: Gather appropriate evidence and apply evidence-based solutions utilising
appropriate statistical methods. Chapter alignment: Chapters 2, 3, 4, 5, 6.
SO 2: Appreciate and identify relevant sources of data and survey methods.
Chapter alignment: Chapters 1, 2.
SO 3: Understand forecasting techniques. Chapter alignment: Chapters 5 and 6.
SO 4: Understand statistical software. Chapter alignment: Chapters 3, 5, 6.
8. Specific Outcomes and Assessment Criteria
SPECIFIC PROGRAMME OUTCOMES AND ASSESSMENT CRITERIA
For each outcome, the student should demonstrate the ability to meet the criteria listed.
SO 1: Gather appropriate evidence and apply evidence-based solutions utilising
appropriate statistical methods.
• Correctly identify ways in which statistics can be useful within business
environments.
• Recognise appropriately the relevance of particular types of statistical tests, and
their use within particular contexts, and in relation to particular business problems.
• Calculate accurately the mean, median, mode and standard deviation on both raw
(ungrouped) and grouped data with an aim to solve business problems.
SO 2: Appreciate and identify relevant sources of data and survey methods.
• Manipulate data optimally in ways that inform business decision making.
• Be able to accurately calculate descriptive statistics.
• Accurately identify applications of statistical analysis in business practice.
SO 3: Understand forecasting techniques.
• Interpret the output with reference to the scenario.
• Utilise appropriately learned statistical techniques to generate and interpret reports
that can be used for decision making.
• Determine the appropriate statistical techniques necessary for the task at hand, and
the best informed decisions that need to be made.
• Be able to interpret and apply concepts.
• Effectively utilise learned statistical techniques to generate and interpret reports
that can be used for decision making.
SO 4: Understand statistical software.
• Use MS Excel to carry out calculations and statistical analyses.
• Correctly interpret the output with reference to the scenario.


CHAPTER ONE
STATISTICS DEFINITIONS
Learning Outcomes
Upon completion of this chapter, the learner should be able to:
• Clearly describe the difference between primary and secondary data.
• Clearly describe the difference between quantitative and qualitative data.
• Demonstrate a thorough knowledge of sources of primary and secondary data.
• Demonstrate a thorough understanding of several sampling techniques and
their applications.
1.1. Introduction to Statistics Definitions
One question that often troubles students is ‘What is statistics?’ A common
misconception is that the field of statistics is only concerned with recording and
tabulating numerical values, such as those seen in sport (for example, a cricketer’s
batting statistics, or the ball possession statistics shown during televised football
matches). This view, while not entirely incorrect, is a little myopic, since the field of
statistics covers so much more. We can compare statistics to a subject like zoology,
the study of animals, or botany, the study of plants, since we can describe statistics
as the study of data. This definition includes all aspects related to data, such as
collection, tabulation, analysis, and so on. The definition in 1.2 below perhaps
expresses the idea more clearly. To fully understand what statistics is all about, study
this chapter carefully, as it explains fundamental statistical concepts.
1.2. The meaning of statistics
Statistics is the scientific method that enables us to collect, organise, analyse, and
interpret data in order to make decisions as responsibly as possible. Statistics can also
be defined as “a way to get information from data” (Davis, Pecar and Santana, 2017).

1.3. Branches of Statistics
The study of statistics has two major branches: descriptive statistics and inferential
statistics. Figure 1.1 presents these two main branches.
Statistics
• Descriptive statistics: Involves the organisation, summarisation, and display of data.
• Inferential statistics: Involves using a sample to draw conclusions about a population.
Figure 1.1: Branches of statistics
Source: Davis and Pecar (2013)
1.3.1. Descriptive Statistics
Descriptive statistics is the meaningful presentation of data such that its characteristics
can be effectively observed.
1.3.2. Statistical Inference
Statistical inference uses data from a sample to draw conclusions about a population.
It relates to decision-making and leads to future action rather than an inspection of
the past.
EXAMPLE 1.1
In a recent study, volunteers who had less than 6 hours of sleep were four times more
likely to answer incorrectly on a science test than participants who had at least 8
hours of sleep. Decide which part is the descriptive statistic and what conclusion
might be drawn using inferential statistics.

Solution
The statement “four times more likely to answer incorrectly” is a descriptive statistic.
An inference drawn from the sample is that all individuals who sleep less than 6 hours
are more likely to answer science questions incorrectly than individuals who sleep at
least 8 hours.
1.4. Key concepts in statistics
Below are some of the key definitions and concepts in statistics:
1.4.1 Data
Data is a ‘scientific’ term for raw facts and figures. Data are the raw materials for
data processing, e.g.
• The number of tourists who visit South Africa each year.
• The sales turnover of all restaurants in Durban.
• The number of people with black hair who pass their driving test each year.
1.4.2 Information
Information is data that has been processed in such a way as to be meaningful to the
person who receives it. Information is anything that is communicated.
Information is sometimes referred to as processed data. The terms ‘information’ and
‘data’ are often used interchangeably, but the following example shows the difference.
Many companies providing a product or service like to research consumer opinion and
employ market research organisations to do so.
A typical market research survey employs a number of researchers who ask a
sample of the public to answer questions relating to the product. Several hundred
questionnaires may be completed. The questionnaires are used as input to a system.
Once every questionnaire has been input, a number of processing operations are
performed on the data. A report which summarises the results and discusses their
significance is sent to the company that commissioned the survey.
Individually, a completed questionnaire would not tell the company very much, only
the views of one consumer. In this case, the individual questionnaires are data. Once
they have been processed and analysed, the resulting report is information. The
company will use it to inform its decisions regarding the product. If the report were
wrong, for example by stating that consumers liked a product that they in fact disliked,
the company would be misinformed and poor decisions would be made.
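The step from data to information can be sketched in a few lines of Python. This is an illustration only: the list `responses`, its values, and the function name `summarise` are assumptions made for this sketch and do not come from a real survey.

```python
from collections import Counter

# Hypothetical raw data: one answer per completed questionnaire to the
# question "How do you rate the product?" (values invented for illustration).
responses = ["like", "dislike", "like", "neutral", "like",
             "dislike", "like", "like", "neutral", "like"]


def summarise(answers):
    """Turn raw answers (data) into percentage shares (information)."""
    counts = Counter(answers)
    total = len(answers)
    return {answer: 100 * count / total for answer, count in counts.items()}


summary = summarise(responses)
for answer, share in sorted(summary.items()):
    print(f"{answer}: {share:.0f}%")
```

Each entry in `responses` on its own says little, whereas the percentages in `summary` are in a form that a decision maker can use.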
1.4.3 Quantitative and qualitative data
Quantitative data is data that can be measured. Each measurement is recorded as a
numerical value of a variable. Qualitative data is data that cannot be measured but
which reflects some quality of what is being observed. Such data are said to have
(non-numeric) attributes.
1.4.4. Quantitative and Qualitative information

Quantitative information is information which is capable of being expressed in numbers.
It is usually financial in nature, e.g. turnover for a year of $4 million. Qualitative
information is information which may not be expressed very easily in terms of
numbers. Information of this nature is more likely to reflect the quality of something,
e.g. “the standard of the books produced was very high”. This information cannot
easily be expressed in terms of numbers, as the standard of something is usually
described as being very high, quite low, average, and so on.
1.5 Types of Data
Data may be further classified as follows:
• Primary and secondary data.

• Discrete and continuous data.
• Sample and population data.

1.5.1 Primary data & Secondary data
Primary data is collected especially for the purpose of whatever survey is being
conducted. Raw data is primary data which has not been processed at all, and which
is still just a list of numbers. Secondary data is data which has already been
collected elsewhere, for some other purpose, but which can be used or adapted for
the survey being conducted.
1.5.2 Discrete data & Continuous data
Discrete data is data which can only take on a finite or countable number of values
within a given range, e.g. the number of goals scored by Arsenal could be 0, 1, 2 or 4
goals but not 1½ or 2½. Continuous data is data which can take on any value within a
given range. Such data are measured rather than counted, e.g. the heights of all
members of your family, as these can take on any value, such as 1.542 m, 1.639 m and
1.492 m.
1.5.3 Sample & Population
A sample is a subset, portion or part of a population whose characteristics are studied
to gain information about the entire population. A population is the entire aggregation
of items from which samples can be drawn.
1.5.4 Level of measurement for Statistical Data
All data measurements can be classified into the following categories:
(a) Nominal data.
(b) Ordinal Data.
(c) Interval Data.
(d) Ratio-scaled data.
These terms are used to denote the nature of data and the measurement level at which
such data has been acquired.

1.5.4.1 Nominal Data
This is the weakest level of measurement. It entails classifying data qualitatively by
name, hence the term “nominal”. For example, when data is labelled using the two
categories “men” and “women”, the categories can be distinguished only by name.
Meat can be classified as “fresh” or “stale”. Names like Caroline, Khumalo, Kadija,
and so on, are classifications on the nominal scale. If you classify cats as “black” or
“white”, you are measuring them using the nominal scale.
Analysing and manipulating this data requires statistical techniques that can handle
names and categories. The Chi-Square statistic in Chapter five of this module is one
of the few applications of this level of data.
1.5.4.2 Ordinal Data
This is data categorised by qualities that can be ranked by size. In other words, the
data is transitive: it has magnitude and direction. Thus, data classified as big, bigger,
biggest, or large, larger, largest, and similar qualities, has been acquired and arranged
in an ordinal manner. It is ordinal data, and the level of measurement for such data is
ordinal. Chi-Square and Analysis of Variance (to be learned later), together with any
other measures and statistics that can handle this type of data, are used to manipulate
and to make deductions with this kind of data.
1.5.4.3 Interval Data
This is data acquired through a process of measurement in which equal measuring
units are employed. Such data has magnitude and direction (is transitive), and the size
of the interval between each observation and the one above it is the same for all
observations. This data therefore has all the characteristics of nominal and ordinal
data. In addition, the scale of measurement moves uniformly in equal intervals up and
down the respective sizes of the data, hence the name “interval” data. The only
weakness of this kind of data is that the position of zero is not clear unless it is
assumed. Thus, data like 2001, 2002, 2003, and so on, is interval data; the zero year
can then be assumed to be 2001. Data like temperature readings have an absolute zero
so far away that it is not practical to find it and use it in everyday data manipulation.
The same applies to time in hours or even in minutes, and so on. The statistics used
for analysis include analysis of variance and regression-correlation. However, ratios
are not meaningful for this kind of data.
1.5.4.4 Ratio-scaled Data
This is the highest level of measurement. It has transitivity (magnitude and direction)
and equal-interval qualities, and the zero can be identified and used conveniently. It is
possible to perform all mathematical manipulations using this data, whereas with other
data such exercises are not possible due to the lack of a true zero. Division and ratio
computation between one group of observations and another is possible, hence the
use of the word ratio. All the known statistical techniques are useful with this kind of
data. This is the kind of data most people can handle with ease, because the
observations are countable and divisible.
Table 1.1 summarizes the different measurement scales with examples provided of
these different scales.
Table 1.1: Level of measurements with examples
Measurement scale / Recognizing a measurement scale
Nominal data
1. Classification data, e.g., male or female, red or black car.
2. Arbitrary labels, e.g., m or f, r or b, 0 or 1.
3. No ordering, e.g., it makes no sense to state that r > b.
Ordinal data
1. Ordered list, e.g., student satisfaction scale of 1, 2, 3, 4, and 5.
2. Differences between values are not important, e.g., political parties can be given
labels far left, left, mid, right, far right, etc. and student satisfaction scale of
1, 2, 3, 4, and 5.
Interval data
1. Ordered, constant scale, with no natural zero, e.g. temperature, dates.
2. Differences make sense, but ratios do not, e.g., temperature difference.
Ratio-scaled data
1. Ordered, constant scale, and a natural zero, e.g. length, height, weight, and age.
Source: Davis and Pecar (2010)
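As a rough illustration of Table 1.1, the Python sketch below applies to each scale a summary that suits it. All observations and variable names are hypothetical and chosen only to make the point; the choice of summary per scale (mode, median, difference, ratio) is one common convention, not the only one.

```python
from statistics import median, mode

# Hypothetical observations, one list per measurement scale.
car_colours = ["red", "black", "red", "red", "black"]  # nominal
satisfaction = [1, 3, 4, 4, 5]                         # ordinal (scale of 1 to 5)
temps_celsius = [10.0, 20.0]                           # interval
weights_kg = [40.0, 80.0]                              # ratio

# Nominal: categories can only be counted, so the mode is a suitable summary.
nominal_mode = mode(car_colours)

# Ordinal: values can be ranked, so the median is meaningful.
ordinal_median = median(satisfaction)

# Interval: differences are meaningful.
interval_difference = temps_celsius[1] - temps_celsius[0]

# Ratio: a natural zero exists, so "twice as heavy" is meaningful.
weight_ratio = weights_kg[1] / weights_kg[0]

# A ratio on an interval scale depends on where zero happens to be placed:
# the same two temperatures give different ratios in Celsius and in kelvin.
celsius_ratio = temps_celsius[1] / temps_celsius[0]
temps_kelvin = [t + 273.15 for t in temps_celsius]
kelvin_ratio = temps_kelvin[1] / temps_kelvin[0]

print(nominal_mode, ordinal_median, interval_difference, weight_ratio)
print(round(celsius_ratio, 3), round(kelvin_ratio, 3))
```

The Celsius ratio is 2, yet the same two temperatures expressed in kelvin have a ratio of about 1.04, which is why ratios are not meaningful on an interval scale.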

1.6 Sampling
It is rare to have access to all the information that we would like to know about a given
situation. Usually, we need to examine some portion (a sample) of the total system (a
population), and then extend our knowledge of that portion to the total system
(inference).
It may not be practical to analyse all the given data in a particular set since:
• It may not be physically possible to collect the data.

• It may be too expensive to collect or process the data.
• Data may be processed more quickly by using a sample.
• The situation may change over time, so sampling needs to be done quickly.
EXAMPLE 1.2
A company that manufactures matches claims that in each box the ‘average’ content
is 50 matches. Suppose you wish to test whether such a claim is true. It would certainly
be impractical to take every box of matches, count the contents of each and arrive at
an average figure. Instead, you might draw a sample of (say) 100 boxes from the
population of all the boxes manufactured and measure the mean contents of these
100 boxes. You could then decide based on your results as to whether the claim is
valid. In this case, you would be using a sample to make an inference about the
population.
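The idea in Example 1.2 can be sketched in Python. The population below is simulated, not real production data: the number of boxes (10,000), the range of contents (48 to 52 matches), and the seed are all assumptions made for illustration.

```python
import random
from statistics import mean

random.seed(1)  # fixed seed so that the sketch is repeatable

# Hypothetical population: the contents of 10,000 boxes of matches.
population = [random.randint(48, 52) for _ in range(10_000)]

# Draw a sample of 100 boxes without replacement and measure its mean.
sample = random.sample(population, 100)
sample_mean = mean(sample)

# In practice the population mean is unknown; it is computed here only
# to show how close the sample estimate comes to it.
population_mean = mean(population)

print(f"Sample mean:     {sample_mean:.2f}")
print(f"Population mean: {population_mean:.2f}")
```

Counting 100 boxes instead of 10,000 gives an estimate of the average content that can then be compared with the claimed figure of 50.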
1.6.2. Bias
A sample should be unbiased (i.e., should be genuinely representative of the

population). If there is any bias the result will be valueless. In an interviewing situation,
the wording of questions, and the appearance or behaviour of the interviewer can
distort the outcome of sample surveys.

Example 1.3
Consider homes situated close to an airport. If a survey were conducted on whether
the noise levels were intolerable, the result would depend entirely on which sample
of people was interviewed. Suppose, for example, that the people interviewed work
away from the airport the whole day and hence are not affected by the noise. They
would probably say the noise level is tolerable simply because they do not hear most
of it, being away from their homes so much. Making a judgment based solely on these
people’s views would be totally biased and the result of the survey worthless.
1.6.3 Simple rules to help eliminate bias in sampling
There are several rules that can be used to help eliminate bias in sampling. These
include:
• Do not only use people who volunteer to be in a sample.

• Do not choose a sample using a method that omits segments of the population.
• Do not use people in the sample only because they are handy or available.
• The person selecting the sample should not have a vested interest in the
results.
1.6.4 Sampling errors
Even if a sampling process were completely free of bias, there would still be
fluctuations due to naturally occurring variation. In general, no two samples will be
identical, and it is necessary to assess how much variation can be expected to occur
from one sample to another. This variation between samples means that information
from any one sample will not be an exact representation of the population.
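Sampling error can be seen by drawing several samples from the same population and comparing their means. The sketch below assumes a hypothetical population of the values 1 to 1,000; the sample size, the number of repeats, and the helper name `sample_means` are also illustrative choices.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so that the sketch is repeatable

# Hypothetical population: the values 1 to 1,000 (population mean 500.5).
population = list(range(1, 1001))


def sample_means(pop, sample_size, repeats):
    """Return the mean of each of `repeats` independent random samples."""
    return [mean(random.sample(pop, sample_size)) for _ in range(repeats)]


means = sample_means(population, sample_size=50, repeats=5)
for i, m in enumerate(means, start=1):
    print(f"Sample {i}: mean = {m:.1f}")
```

Every sample is drawn without bias, yet the five means differ from one another and from the population mean. That spread is the sampling error described above.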

1.6.5 Selection of a random sample
It is often assumed that samples are drawn randomly. Choosing a random sample
from a population means that each member of the population has the same chance
of being selected. No member is favoured over another. This is intended
to ensure that the sample is unbiased, although of course there will still be sampling
error resulting from random fluctuations.
When selecting any sample, you must answer several questions before taking any
action:
1. What precisely are the members from which the sample is to be chosen (i.e. the
population)?
2. What should the size of the sample be?
3. How should the members be selected for inclusion?
Sample size depends on the type of problem and the desired accuracy of any
conclusion.
Example 1.4
A clothing designer has designed a new style of men’s socks and would like to test the
possible market reaction. She selects a random sample of adult males and finds that
a large proportion of those chosen state that they wear such socks. However, when
the product comes onto the market, very few pairs are sold. The problem here may be
one of population identification. In practice, although men will be the ones who wear
the socks, they may not be the ones who buy them (it could be their wives or girlfriends).
In this case, the user is not the one who makes the purchase.
Most firms would find it either impractical or too expensive to survey all their customers
or to carefully examine every item that flows from their production line. Instead, they
usually resort to selecting a sample from the whole group or, as it is often called, the
population. We then go on to consider the distribution of sample means which, as we
shall see, can under certain conditions be regarded as being normally distributed. This
is used as the basis for testing various theories or hypotheses.
1.6.6 Sampling methods
Sampling methods are categorised as follows:
• Probability sampling methods.

• Non-probability sampling methods.
1.6.6.1 Probability sampling
The idea behind probability sampling is random selection. More
specifically, each sample from the population of interest has a known probability of
selection under a given sampling scheme.
There are four categories of probability samples, as illustrated in Figure 1.2:
simple random, systematic, stratified and cluster sampling.
Figure 1.2: Probability sampling
(a) Simple random sampling
The most widely known type of a random sample is the simple random sample. This
is characterized by the fact that the probability of selection is the same for every case
in the population. Simple random sampling is a method of selecting n units from
a population of size N such that every possible sample of size n has an equal chance
of being drawn.

Example 1.5
Consider the situation that a marketing researcher will experience when selecting a
random sample of 200 shoppers who shop at a supermarket during a particular time
period. The researcher notes that the supermarket would like to seek the views of its
customers on a proposed re-development of the store and the total footfall (the number
of people visiting a shop or a chain of shops in a period of time is called its footfall)
within this time period is 10,000.
With a footfall (or population) of this size we could employ a number of ways to select
an appropriate sample of 200 from the potential 10,000. For example, we could place
10,000 consecutively numbered pieces of paper (1–10000) in a box, draw a number
at random from the box, shake, and select another number to maximize the chances
of the second pick being random, shake, and continue the process until all 200
numbers are selected. These would then be used to select a customer entering the
store with the customer chosen based upon the number selected from the random
process. To maximize the chances that customers selected would agree to complete
the survey we could enter them into a prize draw. These 200 customers will form our
sample with each number in the selection having the same probability of being chosen.
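The paper-in-a-box procedure can be mimicked in software. The sketch below is illustrative only: it assumes shoppers are numbered 1 to 10,000 as in Example 1.5, and the fixed seed is an assumption made purely so that the draw can be repeated.

```python
import random

# Setup mirroring Example 1.5: shoppers are numbered 1..10000.
POPULATION_SIZE = 10_000
SAMPLE_SIZE = 200

rng = random.Random(42)  # fixed seed (an assumption) so the draw is repeatable

# random.sample draws without replacement, so no shopper number is picked twice
# and every shopper number has the same chance of being chosen.
population = range(1, POPULATION_SIZE + 1)
sample = rng.sample(population, SAMPLE_SIZE)

print(sorted(sample)[:10])  # the ten smallest selected shopper numbers
```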
When collecting data via random sampling we generally find it
difficult to devise a selection scheme that guarantees a random sample. For
example, the population from which we select might not be the total population that we
wish to measure or, during the time period when the survey is conducted, we may find
that the customers sampled are unrepresentative of the population as a result of
unforeseen circumstances.
(b) Systematic random sampling
With systematic random sampling we create a list of every member of the population.
We divide the population size N by the desired sample size n to obtain the sampling
interval k, randomly select the first sample element from the first k entries on the
list, and thereafter select every kth entry on the list. The method proceeds as follows:
1. Step 1: Divide the number of cases in the population by the desired sample size.
For example, with a population of 10,000 and a desired sample of 200, the value is 50.
2. Step 2: Select a random number between one and the value attained in step 1. For
example, we could pick the number 28.
3. Step 3: Starting with the case number chosen in step 2, take every record at the
interval found in step 1; in this example, records 28, 78, 128, and so on.
The advantage of systematic sampling compared to simple random sampling is that
the sample is easier to draw from the population. The disadvantage is that not every
possible sample of size n is equally likely to be drawn.
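The three steps can be sketched in a few lines of code. This is an illustration under stated assumptions rather than a prescribed procedure: it assumes the usual convention that the selection interval is the value found in step 1, reuses the figures 10,000 and 200 from Example 1.5, and uses a hypothetical function name `systematic_sample`.

```python
import random

def systematic_sample(population_size, sample_size, rng):
    """Return the 1-based list positions chosen by systematic sampling."""
    interval = population_size // sample_size  # step 1: sampling interval
    start = rng.randint(1, interval)           # step 2: random start within the first interval
    # Step 3: take every interval-th case, beginning at the random start.
    return [start + i * interval for i in range(sample_size)]

rng = random.Random(7)  # fixed seed (an assumption) so the run is repeatable
positions = systematic_sample(10_000, 200, rng)
print(positions[:5])
```

Only the starting point is random; once it is chosen, the rest of the sample is fully determined.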
(c) Stratified random sampling
With stratified random sampling, the population is divided into two or more mutually
exclusive groups, where the grouping depends on the research area of interest.
The sampling procedure is to organize the population into homogeneous subsets before
sampling, then draw a random sample from each group. The groups (subpopulations or
strata) do not overlap, and together they comprise the entire population. As an
example, suppose we conduct a national survey in the Netherlands. We might divide
the population into groups (or strata) based on the regions of the Netherlands. Then
we would randomly select from each group (or stratum). The advantage of this method
is the guarantee that every group within the population is represented in the sample,
and it provides an opportunity to undertake group comparisons.
Example 1.7
To illustrate, consider the situation where we wish to sample the views of graduate job
applicants to a major financial institution. The nature of this survey is to collect data on
the application process from the applicants’ perspective. The survey will therefore
have to collect the views from the different specified groups within the identified
population.

For example, this could be based on gender, race, type of employment requested (full-
or part-time), or whether an applicant is classified as disabled. If we use simple random
sampling it is possible that we may miss a representative sample from one of these
groups as a result, for example, of the small size of the group relative to the
population. In this case, we would employ stratified random sampling to ensure that
appropriate numbers of sample values are drawn from each group in proportion to the
percentage of the population as a whole. Stratified sampling offers several advantages
over simple random sampling:
(i) It guards against an unrepresentative sample (e.g., all males from a predominantly
female group).
(ii) It provides sufficient group data for separate group analysis, and it requires a smaller
sample.
(iii) Greater precision is achievable compared with simple random sampling for a
sample of the same size.
Stratified random sampling nearly always results in a smaller variance for the
estimated mean or other population parameters of interest. The main disadvantage of
a stratified sample is that it may be more costly to collect and process the data
compared with a simple random sample. Two different categories of stratified random
sampling are available:
• Proportionate stratification: With proportionate stratification, the sample size
of each stratum is proportionate to the population size of the stratum (same
sampling fraction). The method provides greater precision than simple
random sampling with the same sample size; the gain in precision is greatest when
the characteristic being measured is similar (homogeneous) within each stratum.
• Disproportionate stratification: With disproportionate stratification the sampling
fraction may vary from one stratum to the next. If the variability of the
characteristics being measured differs across strata, then disproportionate
stratification can provide better precision than proportionate stratification, when
the sample points are correctly allocated to strata. In general, given similar
costs you would always choose proportionate stratification.
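Proportionate stratification can be sketched as follows. The strata names and sizes are hypothetical (they do not come from the guide), as is the function name `proportionate_allocation`; the point is only that each stratum receives the same sampling fraction and is then sampled at random.

```python
import random

def proportionate_allocation(strata_sizes, sample_size):
    """Give each stratum a share of the sample equal to its share of the population."""
    total = sum(strata_sizes.values())
    # Rounding can make the shares miss the target by a unit or two when the
    # figures do not divide evenly; they divide evenly in this example.
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}

# Hypothetical strata (e.g. regions) and their population sizes.
strata = {"North": 5_000, "South": 3_000, "East": 2_000}
allocation = proportionate_allocation(strata, 200)
print(allocation)  # {'North': 100, 'South': 60, 'East': 40}

# Draw a simple random sample of the allocated size inside every stratum.
rng = random.Random(1)  # fixed seed (an assumption) so the run is repeatable
sample = {name: rng.sample(range(size), allocation[name])
          for name, size in strata.items()}
```

Every stratum has a sampling fraction of 200 / 10 000 = 2%, which is what "same sampling fraction" means.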

(d) Cluster sampling
Cluster sampling is a sampling technique in which the entire population of interest is

divided into groups, or clusters, and a random sample of these clusters is selected.
Each cluster must be mutually exclusive and together the clusters must include the
entire population.
After clusters are selected, then all data points within the clusters are selected. No
data points from non-selected clusters are included in the sample. This differs from
stratified sampling, in which some data values are selected from each group. When
all the data values within a cluster are selected, the technique is referred to as one-
stage cluster sampling. If a subset of units is selected randomly from each selected
cluster, it is called two-stage cluster sampling. Cluster sampling can also be made in
three or more stages: it is then referred to as multistage cluster sampling. The main
reason for using cluster sampling is that it is usually much cheaper and more
convenient to sample the population in clusters rather than randomly. In some cases,
constructing a sampling frame that identifies every population element is too
expensive or impossible. Cluster sampling can also reduce cost when the population
elements are scattered over a wide area.
(e) Multistage sampling
With multistage sampling, we select a sample by using combinations of different

sampling methods. For example, in stage 1, we might use cluster sampling to choose
clusters from a population. Then, in stage 2, we might use simple random sampling to
select a subset of elements from each chosen cluster for the final sample.
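The two-stage example can be sketched as below. The population is entirely hypothetical (20 clusters of 50 units each, with made-up labels) and the stage sizes of 4 clusters and 10 units are assumptions chosen for illustration.

```python
import random

rng = random.Random(3)  # fixed seed (an assumption) so the run is repeatable

# Hypothetical population: 20 clusters (e.g. stores), each holding 50 labelled units.
clusters = {c: [f"cluster{c}-unit{u}" for u in range(50)] for c in range(20)}

# Stage 1 (cluster sampling): choose 4 whole clusters at random.
chosen_clusters = rng.sample(sorted(clusters), 4)

# Stage 2 (simple random sampling): choose 10 units inside each chosen cluster.
final_sample = [unit
                for c in chosen_clusters
                for unit in rng.sample(clusters[c], 10)]

print(chosen_clusters, len(final_sample))
```

No unit from a non-selected cluster can appear in the final sample, which is the feature that distinguishes cluster-based designs from stratified sampling.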
1.6.6.2 Non-probability sampling
In many situations it is not possible to select the kinds of probability samples used in
large-scale surveys. For example, we may be required to seek the views of local,
family-run businesses that have experienced financial difficulties during the bank credit
crunch of 2007–2012. In this situation there are no easily accessible lists of businesses
experiencing difficulties, or there may never be a list created or available. In this
situation a sample can be obtained by using non-probability sampling
methods to collect the required sample data. Figure 1.3 illustrates the four primary
types of non-probability sampling methods.
[Diagram: non-probability sampling divided into convenience, purposive, quota and snowball sampling]
Figure 1.3: Nonprobability sampling
(a) Convenience sampling
Convenience sampling is a method of choosing subjects who are
available or easy to find. This method is also sometimes referred to as haphazard,
accidental, or availability sampling. The primary advantage of the method is that it is
very easy to carry out relative to other methods. The main problem with this
method is that you can never guarantee that the sample is representative of the
population. Convenience sampling is a popular method with researchers and provides
some data that can be analysed, but the type of statistics that can be applied to the data
is compromised by uncertainties over the nature of the population that the survey data
represents.
(b) Purposive sampling
Purposive sampling is a sampling method in which elements are chosen based on the
purpose of the study. Purposive sampling may involve studying the entire population
of some limited group (accounts department at a local engineering firm) or a subset of
a population (chartered accountants). As with other non-probability sampling methods,
purposive sampling does not produce a sample that is representative of a larger
population, but it can be exactly what is needed in some cases, such as the study of an
organization, community, or some other clearly defined and relatively limited group.
Two popular purposive sampling methods are quota sampling and snowball
sampling.

(c) Quota sampling
Quota sampling is designed to overcome the most obvious flaw of convenience (or
availability) sampling. Rather than taking just anyone, quotas are set to ensure that
the sample you get represents certain characteristics in proportion to their prevalence
in the population. Note that for this method you have to know something about the
characteristics of the population ahead of time. There are two types of quota sampling:
(1) proportional and (2) non-proportional.
In proportional quota sampling you want to represent the major characteristics of the
population by sampling a proportional amount of each. For instance, if you know the
population has 25% women and 75% men, and that you want a total sample size of
400, you will continue sampling until you get those percentages and then you will stop.
So, if you’ve already got 100 women for your sample, but not 300 men, you will
continue to sample men, even if legitimate women respondents come along—you will
not sample them because you have already ‘met your quota’.
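The quota arithmetic and the stopping rule in this example can be sketched as follows. The stream of respondents is simulated and the assumption that women and men arrive in equal numbers is purely illustrative; `quota_targets` is a hypothetical helper name.

```python
import random

def quota_targets(proportions, sample_size):
    """Convert population proportions into the number of respondents needed per group."""
    return {group: round(share * sample_size)
            for group, share in proportions.items()}

targets = quota_targets({"women": 0.25, "men": 0.75}, 400)
print(targets)  # {'women': 100, 'men': 300}

rng = random.Random(0)  # fixed seed (an assumption) so the run is repeatable
counts = {group: 0 for group in targets}
approached = 0
while counts != targets:
    # Hypothetical passer-by; a 50/50 arrival pattern is assumed for illustration.
    respondent = rng.choice(["women", "men"])
    approached += 1
    if counts[respondent] < targets[respondent]:
        counts[respondent] += 1  # quota not yet met: include this respondent
    # Otherwise the respondent is turned away because that quota is already met.

print(counts, approached)
```

The number of people approached exceeds 400 because respondents from a group whose quota is full are turned away.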
The primary problem with this form of sampling is that even when we know that a quota
sample is representative of the particular characteristics for which quotas have been
set, we have no way of knowing if the sample is representative in terms of any other
characteristics. If we set quotas for age, we are likely to attain a sample with good
representativeness on age, but one that may not be very representative in terms of
gender, education, or other pertinent factors.
In non-proportional quota sampling you specify the minimum number of sampled data
points you want in each category. In this case you are not concerned with having the
correct proportions, but with achieving the numbers in each category. This method is
the non-probabilistic analogue of stratified random sampling in that it is typically used
to assure that smaller groups are adequately represented in your sample. Finally,
researchers often introduce bias when allowed to self-select respondents, which is
usually the case in this form of survey research. In choosing males, interviewers are
more likely to choose those that are better-dressed, or who seem more approachable
or less threatening. That may be understandable from a practical point of view, but it
introduces bias into research findings.

(d) Snowball sampling
In snowball sampling, you begin by identifying someone who meets the criteria for
inclusion in your study. You then ask them to recommend others who they may know
who also meet the criteria. Thus, the sample group appears to grow like a rolling
snowball.
This sampling technique is often used in hidden populations, which are difficult for
researchers to access, including firms with financial difficulties or students struggling
with their studies. The method creates a sample with questionable representativeness,
and it can be difficult to judge how a sample compares with a larger population.
Furthermore, an issue arises in who the respondents refer you to: for example, friends
will refer you to friends but are less likely to refer you to people they do not consider
friends, for whatever reason. This creates a further bias within the sample that makes it
difficult to say anything about the population.
Note: The primary difference between probability methods of sampling and
nonprobability methods is that in the latter you do not know the likelihood that any
element of a population will be selected for study.
SUMMARY
In this chapter, we discussed key concepts in statistics, and introduced data as the
raw material of statistics. We distinguished between primary and secondary data. We
also looked at the different data types and different types of measurement scales.
Finally, we covered the methods of sampling.

CHAPTER TWO
VISUAL REPRESENTATION OF DATA
Learning Outcomes
Identify sources of information.

Construct tables.
Illustrate data using a graph.
Illustrate data using a pie chart.
Illustrate data using a bar chart.
Illustrate data using a pictogram.
Condense raw data using a frequency distribution.
Condense raw data using a grouped frequency distribution.
Construct a histogram and frequency polygon.
Construct and apply ogive curves.
Explain how statistics are misused.
2.1. Introduction to visual representation of data

In this Chapter we examine visual representation of data using tables and graphs.
The type of visual representation you choose to use depends on the type of variables
you are dealing with within your data set, for example, category (or nominal), ordinal
or interval (or ratio) data. Guidelines to construct tables and graphs are also discussed
in this chapter.
2.2 Data preparation rules

Data presented must always be:
• Factual.
• Relevant.

Before presentation always check the source of the data to ensure that the:
• Data has been accurately transcribed.
• Figures are relevant to the problem.
2.3 Methods of visual presentation of data

Data is usually presented in the form of:
1. Tables.
2. Graphs.
2.3.1 Tables
When constructing a table, it is important to show which relationships are being
emphasised. It is often wise to have more than one table relating the same set of
data. Some of the important points that should be followed when constructing a table
are:
• Every table should have a clear label, such as Table 1, or Table 1.1, depending
on the table number and the chapter in which it appears.
• Every table should have a proper title that describes the type of information
given in the table itself.
• Rows and columns should be precise, and the units of the values included.
• Categories should not overlap, i.e., the same item should not appear in more
than one category.
• The correctness of any calculations should be verified. For example, check
that the sum of the column totals equals the sum of the row totals.
• Omit any unnecessary and/or irrelevant data.
• Clearly state the units.
• Use your imagination and common sense.
• Compute and show percentages and ratios where appropriate.
Example 2.1
Below is information concerning participation in soccer, rugby, and tennis by South
African adults under the age of 50 years. South Africa has a total of 2 055 000
adults under 50 years of age who participate in soccer, rugby, and tennis, with
1 700 000 soccer players, 270 000 rugby players and 85 000 tennis players. The actual
breakdown of these numbers (in different age groups) is 600 000 soccer players,
120 000 rugby players and 30 000 tennis players between 18 and 25 years; 800 000 soccer
players, 100 000 rugby players and 40 000 tennis players between 26 and 35 years;
and 300 000 soccer players, 50 000 rugby players and 15 000 tennis players between
36 and 49 years. Present the data in a tabular form.
Solution
It is difficult to gain any meaningful information from this narrative. Instead, we can
construct a table from the above data as shown below.
The numbers of adults in South Africa under the age of 50, who participate in soccer,
rugby, or tennis, are shown in Table 2.1 below.
Table 2.1: Sport discipline in South Africa and the age groups
AGE (YEARS)   SOCCER       RUGBY      TENNIS    TOTAL
18-25         600 000      120 000    30 000    750 000
26-35         800 000      100 000    40 000    940 000
36-49         300 000      50 000     15 000    365 000
TOTAL         1 700 000    270 000    85 000    2 055 000
The table is much more compact and useful than the narrative and helps us
distinguish patterns, namely that, in general, the most active adult South Africans
are between 26 and 35 years of age.
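The check recommended in section 2.3.1, that the sum of the row totals equals the sum of the column totals, can be carried out on the figures of Example 2.1 as sketched below. The dictionary layout and variable names are illustrative choices, not part of the guide.

```python
# Figures from Example 2.1: participants by age group and sport.
table = {
    "18-25": {"Soccer": 600_000, "Rugby": 120_000, "Tennis": 30_000},
    "26-35": {"Soccer": 800_000, "Rugby": 100_000, "Tennis": 40_000},
    "36-49": {"Soccer": 300_000, "Rugby": 50_000, "Tennis": 15_000},
}
sports = ["Soccer", "Rugby", "Tennis"]

# Row totals: one total per age group.
row_totals = {age: sum(counts.values()) for age, counts in table.items()}
# Column totals: one total per sport.
column_totals = {sport: sum(table[age][sport] for age in table) for sport in sports}

grand_total = sum(row_totals.values())
# Verification: both ways of adding up must give the same grand total.
assert grand_total == sum(column_totals.values())

print(row_totals)
print(column_totals)
print(grand_total)  # 2055000
```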
2.3.2 Graphs
Graphs are picture representations of data, and better depict relationships between
variables than a table does. It is also possible to depict more than one set of data on
the same graph.
When constructing a graph illustrating data:
• Give the graph a clear and appropriate title.

• Label the axes (x and y) clearly, with the units of measurement of the
quantities.
• Do not plot too many curves on the same set of axes, as it will confuse the
relationships between the variables.
• It is recommended to accompany the graph with the table of data; and,
• Try, as much as possible to show the zero on each scale.
2.3.2.1 Pie chart
A pie chart is often used to give a visual presentation of data to indicate the
proportions that make up a given total. It is one of a number of so-called area
diagrams, which consist of geometrical figures (for example, a square or rectangle). The pie
chart is a circle that is divided into sectors by lines, in such a way that the area of
each sector is proportional to the size of the quantity represented by that sector
(Croucher, 2016).
Example 2.2
Consider the previous table. Draw a pie chart that illustrates the percentage of
the total number of South Africans between 18 and 25 years that play soccer, rugby
and tennis.
Solution
Percentage that play soccer = $\frac{600\,000}{750\,000} \times 100 = 80\%$
Percentage that play rugby = $\frac{120\,000}{750\,000} \times 100 = 16\%$
Percentage that play tennis = $\frac{30\,000}{750\,000} \times 100 = 4\%$
These percentages can be represented on a pie chart as follows:

[Pie chart with three sectors: Soccer 80%, Rugby 16%, Tennis 4%]
Figure 2.1: South Africans between 18-25 years old that play soccer, rugby, and tennis
The sector for soccer is 80 % of the total pie. The sector for rugby is 16 % of the
total pie. The sector for tennis is 4 % of the total pie. The sum of the individual
percentages must be 100 % (i.e. the whole pie).
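The sector percentages in Example 2.2 can be reproduced with a short calculation. This is a sketch using the 18-25 row of Table 2.1; the variable names are illustrative.

```python
# Participants aged 18-25, taken from Table 2.1.
counts = {"Soccer": 600_000, "Rugby": 120_000, "Tennis": 30_000}
total = sum(counts.values())  # 750 000

# Each sector's share of the pie, as a percentage of the total.
percentages = {sport: count * 100 / total for sport, count in counts.items()}

for sport, share in percentages.items():
    print(f"{sport}: {share:.0f}%")

# The sectors together must make up the whole pie.
assert sum(percentages.values()) == 100
```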
2.3.2.2 Bar charts

A bar chart consists of a series of rectangular bars where the length of each bar
represents the actual magnitude of the respective quantities. There are a number of
different types of bar charts, which may be used depending on the type of data to be
represented. Below are the guidelines for preparing a bar chart:
• Make the widths of each bar equal, since it is only the lengths which are being
compared.
• Clearly label the axes.
• Include footnotes, sources of data and tables.
2.3.2.3 Simple bar charts

A simple bar chart is one in which the bars represent one quantity or variable
only. The length of the bar is indicative of the number of items in that category.

EXAMPLE 2.3
Table 2.2 below shows the total number of points scored by four rugby teams
in ten matches. Draw a bar chart to represent the given information.
Table 2.2: Rugby teams and their scores
RUGBY TEAM    POINTS SCORED
CATS          120
SHARKS        144
STORMERS      216
BULLS         103
TOTAL         583
Solution
The points are represented on a bar chart in Figure 2.2 below:
Figure 2.2: Bar chart of points scored by rugby teams in ten matches

2.3.2.4 Multiple bar charts

A multiple bar chart is one in which the bars are displayed side by side, often in pairs
or triples, to emphasize comparison. This is usually done by drawing two separate
bar charts of the quantities.
In the simple bar chart above, it is also useful to include the number of points
conceded by each of the teams since it is often the points difference that determines
the success of the team.
EXAMPLE 2.4
The number of points conceded by each team is shown in Table 2.3 below. Using
information from Table 2.2 in Example 2.3 above, draw a multiple bar chart
to represent the points scored and those conceded by the four rugby teams.
Table 2.3: Points scored and those conceded by the four rugby teams
RUGBY TEAM    POINTS SCORED    POINTS CONCEDED
CATS          120              177
SHARKS        144              97
STORMERS      216              123
BULLS         103              230
TOTAL         583              627
Solution
The points are represented on a multiple bar chart as follows:

Figure 2.3: Points scored and those conceded by the four rugby teams
Source: Croucher, 2016
From this multiple bar chart, it is easy to see that the Stormers and the Sharks
are the better performing sides, with both teams scoring more points than they concede.
The Bulls are the worst performing team, conceding more than double the
number of points that they score.
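The points difference that the multiple bar chart makes visible can also be computed directly from Table 2.3. The sketch below is illustrative; the variable names are not from the guide.

```python
# Figures from Table 2.3.
scored = {"CATS": 120, "SHARKS": 144, "STORMERS": 216, "BULLS": 103}
conceded = {"CATS": 177, "SHARKS": 97, "STORMERS": 123, "BULLS": 230}

# Points difference = points scored minus points conceded.
difference = {team: scored[team] - conceded[team] for team in scored}

for team, diff in sorted(difference.items(), key=lambda item: item[1], reverse=True):
    print(f"{team}: {diff:+d}")
# STORMERS: +93
# SHARKS: +47
# CATS: -57
# BULLS: -127
```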
2.3.3 Pictograms
A pictogram is a graph in which data is displayed using pictures, rather than
traditional methods discussed earlier. The pictures are indicative of the type of data
being presented.
EXAMPLE 2.5
A survey was conducted by a local pharmaceutical company to try and determine the
most common type of medication people use to cure a common cold. The survey
was conducted on a group of 36 randomly chosen South Africans. The research
resulted in the data in Table 2.4 below:

Table 2.4: Type of medication people use to cure a common cold
MEDICATION    NUMBER OF PEOPLE
Panado        13
Degoran       11
Flutex        9
Other         3
Total         36
A pictogram of the 36 randomly chosen South Africans is shown in Figure 2.4.
[Pictogram with one row of symbols for each medication: Panado, Degoran, Flutex, Other]
Figure 2.4: Type of medication people use to cure a common cold
2.3.4 Scatter diagrams

The relationship between two quantitative variables can be depicted in a scatter
diagram. Economists, for example, are interested in the relationship between inflation
rates and unemployment rates. Business owners are interested in many variables,
including the relationship between their advertising expenditures and sales levels.
A scatter diagram is a plot of all pairs of values (x, y) for the variables x and y.
EXAMPLE 2.6
An educational economist wants to establish the relationship between an individual's
income and education. She takes a random sample of 10 individuals and asks for their
income (in R '000s) and education (in years).

Table 2.5: Income vis-à-vis education
X (years of education)    Y (income in R '000)
11                        25
12                        33
11                        22
15                        41
8                         18
10                        28
11                        32
11                        24
17                        53
11                        26
If we feel the value of one variable (such as income) depends to some degree on
the value of the other variable (such as years of education), the first variable
(income) is called the dependent variable and is plotted on the vertical or y-axis.
The second variable is the independent variable and is plotted on the x-axis.
TIP: Think of the independent variable (x-axis) as the ‘cause’ and the dependent
variable (y-axis) as the ‘effect’.
Figure 2.5: Positive linear relationship


The scatter diagram allows us to observe two characteristics about the relationship
between education (x) and income (y). As these two variables move together, i.e.
their values tend to increase together and decrease together; there is a positive
relationship between the two variables. The relationship between income and
years of education appears to be linear, since we can imagine drawing a straight
line (as opposed to a curved line) through the scatter diagram that approximates
the positive relationship between the two variables. The pattern of a scatter
diagram provides us with information about the relationship between two
variables.
If two variables move in opposite directions and the scatter diagram consists of
points that appear to cluster around a straight line, then the variables have a
negative linear relationship (Figure 2.6).
Figure 2.6: Negative linear relationship

In Figures 2.7 and 2.8, there is no linear relationship between x and y.

Figure 2.7: Non-linear relationship

Figure 2.8: Non-linear relationship

In figure 2.9, there is no relationship between x and y.
Figure 2.9: No relationship


In Chapter Five we will compute numerical measures of the strength (or correlation) of
the linear relationship between two variables.
2.4 Frequency distributions
Often, in practice, data is collected and is initially presented with the observations
(i.e. sample values) in some random order. Various techniques are used for
condensing data into comprehensible form.
Example 2.4
The number of transactions per month at an Automated Teller Machine (ATM) for 10
account holders at a particular bank is as follows:
05 01 09 17 11 05 10 03 19 04
This data is presented in no particular order.
2.4.1 Array
The raw data from above can be arranged in some meaningful order. As it stands,
not much information can be gauged from the data, except that each account holder
made fewer than 20 transactions.
One such method is to arrange the data in increasing order of magnitude i.e. from
smallest to largest (ascending order).
This is shown below:
01 03 04 05 05 09 10 11 17 19
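Arranging the raw data into an array is a simple sort. The sketch below uses the ATM data from the example above; the variable names are illustrative.

```python
# Raw ATM transaction counts for the 10 account holders, in the order collected.
raw = [5, 1, 9, 17, 11, 5, 10, 3, 19, 4]

# An array is the same data arranged in ascending order of magnitude.
array = sorted(raw)
print(array)  # [1, 3, 4, 5, 5, 9, 10, 11, 17, 19]

# Once sorted, the smallest and largest values are simply the first and last entries.
print(array[0], array[-1])  # 1 19
```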
2.4.2. Frequency tables

Raw data, which is unprocessed data that we obtain in the collection phase, is often
difficult to interpret because there are usually too many values to easily distinguish

any pattern in the data. The solution is to summarise the data by creating tables that
report how often certain sections of the data appear in the data set. We do this by
drawing a frequency table.
Presenting raw data in a table can make even the most comprehensive collection
of data more readily understandable. Apart from taking up less room, a table allows
us to locate figures more quickly, makes it easy to compare different classes and
reveals patterns that we might otherwise not have noticed.
Frequency tables come in a variety of formats and range from a simple table that
contains frequencies of categories to frequency tables for continuous data that contain
groupings of values (or classes). Tables allow us to summarise data in a form that
allows us to access important information (Davis, Pecar and Santana, 2014).
The frequency table has the following elements:

The classes or class intervals represent the possible data values that the variables
can assume. In the case of a categorical variable, the classes are simply the unique
data values or categories that the variable can assume. In the case of a quantitative
variable (discrete or continuous), the classes represent intervals of values. These
intervals partition the entire observed range of values in such a way that each
observation in the data falls into exactly one of these intervals. The total number of
classes is k.
The class midpoint represents the middle value of a class interval. This component
of a grouped frequency distribution table only applies when we use quantitative data
in a table with class intervals.
The frequencies (or the count) represent the number of times that an observation
falls within a specified class. Each observation must fall into exactly one class.
Hence, the sum of these frequencies is equal to the total sample size. When the
frequencies are expressed as a percentage of the total sample size, i.e.
$\frac{\text{frequency}}{\text{sample size}} \times 100$, they are called the percentage frequencies. It is worth noting
that the percentage frequencies will add up to 100%. The class frequencies
(percentage frequencies) are denoted $f_j$ ($p_j$). The total number of observations is
denoted by $n = \sum_{j=1}^{k} f_j$.
The general form of a frequency table is presented in Table 2.6 below.
Table 2.6: Frequency distribution table format
Classes    Class midpoint    Frequency
Class 1    m1                f1
Class 2    m2                f2
Class 3    m3                f3
...        ...               ...
Class k    mk                fk
Total                        $n = \sum_{j=1}^{k} f_j$
Notes on Table 2.6:
• The Classes column displays the variable or class interval.
• The Class midpoint column represents the middle value of the class interval.
• The Frequency column displays the number of times that data values from the original
raw data set fall into the specified class.
• The Total row shows the total number of observations in the data set.

Table 2.7: Frequency distribution table of the number of transactions
NUMBER OF TRANSACTIONS    FREQUENCY ($f_i$)
01                        1
03                        1
04                        1
05                        2
09                        1
10                        1
11                        1
17                        1
19                        1
TOTAL ($k = 9$)           $\sum_{i=1}^{k} f_i = 10$
NOTE
When constructing a frequency table, always check that the total frequency equals
the number of observations in the raw data, n = 10 in this case.
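The frequency table for the ATM data, together with the percentage frequencies defined earlier, can be built with a counter. This is a minimal sketch; `collections.Counter` is a standard Python tool chosen here for illustration, not a method prescribed by the guide.

```python
from collections import Counter

# Raw ATM transaction counts for the 10 account holders.
transactions = [5, 1, 9, 17, 11, 5, 10, 3, 19, 4]

# Count how often each distinct value occurs (the class frequencies).
frequency = Counter(transactions)

n = sum(frequency.values())  # total number of observations
k = len(frequency)           # number of classes (distinct values)

# Percentage frequency = frequency / sample size x 100.
percentage = {value: count * 100 / n for value, count in sorted(frequency.items())}

print("VALUE  FREQUENCY  PERCENTAGE")
for value in sorted(frequency):
    print(f"{value:02d}     {frequency[value]}          {percentage[value]:.0f}%")
print(f"k = {k}, n = {n}")  # k = 9, n = 10

# The check from the note: total frequency must equal the number of raw observations.
assert n == len(transactions)
```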
NOTE
When constructing a frequency table from continuous data, also known as a grouped
frequency table, several rules should be followed:
• The class intervals must never overlap.
• In most instances, the intervals should be of the same width.
• The first and/or last class interval may be open-ended (e.g. ‘under 100’ or
‘over 200’), but this should be avoided as much as possible.
• If there are no observations in a particular interval, it should still be included
with a frequency of zero to avoid a misleading impression of the data.
• It should be checked that the sum of the frequencies equals the total number
of observations.
Table 2.8 shows an example of a grouped frequency distribution. The data are the
total sales registered by 15 till points operated by cashiers working in a large
supermarket. In this case the number of classes is k = 5.
Table 2.8: Grouped frequency distribution table format
Classes              Class midpoint    Frequency
[R0; R400)           R200              1
[R400; R800)         R600              3
[R800; R1 200)       R1 000            5
[R1 200; R1 600)     R1 400            4
[R1 600; R2 000)     R1 800            2
Total                                  15
Notes on Table 2.8:
• The Classes column displays all possible data values (in classes) for the variable.
• The class midpoints are equal to the middle value of the class intervals.
• The Frequency column displays the number of cashiers that made sales that fall within
the specified class interval.
• The total number of observations in the data set is 15.
2.4.3 Creating a frequency table for quantitative data1
When constructing a frequency table for continuous data, we need to set up class
intervals in order to group the values into classes. Naturally, the number of classes will
affect the size of each of the class intervals and vice versa, as each class is one part
of the whole observed range.
The ease of reading the table depends on choosing the right number and width of
classes. With too many classes, the table is not much better than the raw data, whereas
too few classes drastically reduce the amount of meaningful information that we can
glean from the table.
1 This section relies heavily on Davis, Pecar and Santana (2014).
The following is a list of steps involved in constructing a frequency table from a set of
raw continuous data.
Step 1: Determine the sample size, n.
Step 2: Determine the difference between the largest observed value in the data
set (xmax) and the lowest observed value in the data set (xmin). The
difference xmax - xmin is the range of the data. Note that it is easier to work
with ‘round’ numbers when constructing a frequency table, but the values
of xmax and xmin are often decimals. To overcome this, we often replace
xmin with the largest ‘round’ number smaller than xmin and we replace xmax
with the smallest ‘round’ number larger than xmax. We then calculate
the (approximate) range from these two new rounded numbers.
Step 3: Determine the number of classes, k, for the table. It is sometimes
difficult to decide how many classes there should be, as there are no
‘rules’ as such. Usually, we choose the number of classes that conveys
an appropriate amount of information. A rule of thumb called Sturges’
rule states that the number of classes is:
$$k = 1 + \frac{\log(n)}{\log(2)}$$
We then round off the result to the nearest integer value. However,
this rule can sometimes force you to use too few classes
(especially for sample sizes smaller than 50).
In general we use between 5 and 12 classes to ensure that we have what
Davis, Pecar and Santana (2014:22) call an ‘aesthetically pleasing’
table.
Step 4: The width of each class, or class width (CW), is calculated as
$$CW = \frac{x_{max} - x_{min}}{k}$$
Again, we prefer to work with round numbers, and so we round off
these values. Class widths are easier to handle if they are in multiples
of 2, 5 or 10 units.
Step 5: Next we determine the class limits for each class. We can do this in one
of two ways:
(i) Calculate or choose the number of classes, k (using the guidelines
in the steps above). From this value calculate the appropriate class
width, CW. Then start with the smallest chosen value, xmin, and add the
value of CW to obtain the first class, i.e., the first class will
be xmin to (xmin + CW). Each subsequent class is then created by adding
the class width to the upper boundary of the previous class. For example,
suppose that for a raw data set we choose k = 4, xmin = 10 and xmax = 34.
Using these values, we find that CW = (34 - 10)/4 = 6. The k = 4 classes
that we construct will then be [10; 16), [16; 22), [22; 28) and [28; 34).
(ii) Alternatively, you may decide that the class width must equal a
specific value. In this case, you will determine the class width, CW, and
then reverse the equation for CW to get a value for k:
$$k = \frac{x_{max} - x_{min}}{CW}$$
Then, carry on with the procedure as above, i.e., start with the smallest
chosen value, xmin, add the value of CW to obtain the first class, and so
on, so that each subsequent class is then created by adding the
class width to the upper boundary of the previous class. For example,
suppose that we choose CW = 25, xmin = 150 and xmax = 275. The number
of classes would then be k = (275 - 150)/25 = 5, and the classes would be
[150; 175), [175; 200), [200; 225), [225; 250) and [250; 275).
NOTE: The square brackets represent ‘inclusion’ and the round brackets
represent ‘exclusion’, i.e. the class [10; 16) represents all the values starting with
and including 10, up to, but excluding, 16. In doing this we are able to avoid
overlap in classes.
Step 6: The class midpoints for each class interval are equal to the average of
the lower and upper stated class limits, i.e. the class midpoint for the jth
class is:
$$m_j = \frac{LCB_j + UCB_j}{2}$$
where $UCB_j$ and $LCB_j$ are the upper class and lower class bounds of the
jth class.
Step 7: Finally, we use the practical limits to determine the frequencies for
each class by tallying the observations from the raw data set that fall
within the class limits.
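Steps 3 to 6 can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sturges_classes` and `class_intervals` are hypothetical and do not come from the prescribed text, and the sketch assumes that xmin and xmax have already been rounded as described in Step 2.

```python
import math


def sturges_classes(n):
    """Number of classes from Sturges' rule, rounded to the nearest integer."""
    return round(1 + math.log(n) / math.log(2))


def class_intervals(x_min, x_max, k):
    """Return k equal-width classes as (lower, upper, midpoint) tuples.

    Each class is read as [lower; upper): lower included, upper excluded.
    """
    width = (x_max - x_min) / k          # Step 4: class width CW
    classes = []
    for j in range(k):
        lower = x_min + j * width        # Step 5: add CW to the previous bound
        upper = lower + width
        midpoint = (lower + upper) / 2   # Step 6: average of the two bounds
        classes.append((lower, upper, midpoint))
    return classes


print(sturges_classes(20))
for lower, upper, midpoint in class_intervals(10, 34, 4):
    print(f"[{lower:g}; {upper:g})  midpoint = {midpoint:g}")
```

With k = 4, xmin = 10 and xmax = 34 the sketch reproduces the classes [10; 16), [16; 22), [22; 28) and [28; 34) from Step 5(i).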
Example 2.5
Consider the following data set in Table 2.9 depicting the weight of 20 teenagers
(sorted from smallest to largest).
Table 2.9: The weight of 20 teenagers measured in kilograms
41.49     48.22     61.63     65.03     79.99
81.71     81.75     83.52     87.59     89.77
93.29     94.10     104.67    105.60    108.36
117.12    119.79    128.50    132.82    139.96
Using the seven steps given above, we can construct the frequency table.
Step 1: The sample size is n =20

Step 2: The smallest observed value is 41.49, but we will select xmin =40. The
largest observed value is 139.96, but we will select xmax =140. The
range is then 140-40 = 100.
Step 3: Using Sturges’ rule, the number of classes that will be used is:
$$k = 1 + \frac{\log(20)}{\log(2)} \approx 5$$
Step 4: The class width is calculated as
$$CW = \frac{\text{range}}{k} = \frac{100}{5} = 20$$
Step 5: The classes for the table are then [40; 60), [60; 80), [80; 100), [100;
120) and [120; 140).
Step 6: The class midpoints are the average of the lower and the upper class
limits, for example, $\frac{40 + 60}{2} = 50$. The midpoints are then 50, 70, 90, 110
and 130.
Step 7: The frequencies are then tallied for each class and given in Table 2.10.
Table 2.10: Frequency table for the mass of teenagers
WEIGHT OF TEENAGERS (kg)    CLASS MIDPOINT    TALLY       FREQUENCY
[40; 60)                    50                II          2
[60; 80)                    70                III         3
[80; 100)                   90                IIIII II    7
[100; 120)                  110               IIIII       5
[120; 140)                  130               III         3
TOTAL                                                     $\sum_{j=1}^{5} f_j = 20$
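The tallying in Step 7 can also be checked with a short Python sketch. The variable names are illustrative only, and the sketch assumes equal-width classes of the form [lower; upper) starting at the rounded minimum of 40.

```python
# Weights of the 20 teenagers from Table 2.9 (kg)
weights = [
    41.49, 48.22, 61.63, 65.03, 79.99,
    81.71, 81.75, 83.52, 87.59, 89.77,
    93.29, 94.10, 104.67, 105.60, 108.36,
    117.12, 119.79, 128.50, 132.82, 139.96,
]

x_min, x_max, k = 40, 140, 5
width = (x_max - x_min) // k        # class width CW = 20

# Tally each observation into the class [lower; upper) that contains it
frequencies = [0] * k
for w in weights:
    j = int((w - x_min) // width)   # index of the class containing w
    frequencies[j] += 1

for j, f in enumerate(frequencies):
    lower = x_min + j * width
    upper = lower + width
    print(f"[{lower}; {upper})  midpoint = {(lower + upper) // 2}  frequency = {f}")

# The total frequency must equal the number of observations
print(sum(frequencies) == len(weights))
```

The final line applies the check recommended for every frequency table: the frequencies must add up to the sample size n = 20.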
2.4.4 Cumulative frequency table

A cumulative frequency table is similar to a standard frequency table, except that the
class intervals do not have lower bounds. For example, the classes in a frequency table
typically have the form [50; 70), meaning that the class covers the values between
50 (inclusive) and 70 (exclusive). The corresponding cumulative frequency table only
uses the upper bound of each class. In other words, the classes in a cumulative
frequency table are characterised by upper bounds, for example, <70 (this
cumulative frequency class upper bound means that it covers all values that are less
than 70). Note, however, that when the data are discrete in nature, we use the notation
≤ and not <.
NOTE: Cumulative frequency table. The cumulative frequency for a value x is
the total number of observed values that are less than x. For discrete data, it is
the number of values less than or equal to x.

2.4.4.1 Frequency Histograms
One way of presenting a graphical picture of a grouped frequency distribution is to
construct a histogram, which is basically a bar chart in which the frequency of the
observations within a class interval is represented by the corresponding bar. The
height of the bar is proportional to the frequency in that particular class. A histogram
does not have gaps between bars.
Example 2.6
A histogram for the grouped frequency distribution for the weight of students in Table
2.10 is shown in Figure 2.10 below.
[Figure: histogram titled ‘Weight of Teenagers’. Horizontal axis: weight in kg, with
classes [40; 60), [60; 80), [80; 100), [100; 120) and [120; 140). Vertical axis:
number of students, from 0 to 8.]
Figure 2.10: Histogram on the weight of teenagers
2.5 Frequency polygon

A frequency polygon is a line graph that emphasises the continuous change in
frequency. It is formed from the histogram by joining the midpoints of the tops of the
bars with straight lines. The midpoints of the first and last classes are joined to the
x-axis on either side at a distance equal to half the class width of the first and last
classes.

Example 2.7
A frequency polygon for the grouped frequency distribution for the weight of students
in Table 2.10 is shown in Figure 2.11 below.
[Figure: line graph titled ‘Frequency Polygon for the Weight of Teenagers’.
Horizontal axis: weight, from 0 to 140. Vertical axis: frequency, from 0 to 8.]
Figure 2.11: Frequency Polygon for the weight of teenagers
2.6 Ogive or cumulative frequency polygon

An ogive, also known as a cumulative frequency polygon, is a line graph that
displays the cumulative frequency of each class at its upper boundary. Constructing
this graph is relatively simple if you have already constructed the cumulative
frequency table.
Step 1: Construct a frequency distribution that includes cumulative frequencies
as one of the columns.
Step 2: Specify the horizontal and vertical scales.
▪ The horizontal scale consists of the upper class boundaries.
▪ The vertical scale measures cumulative frequencies.
▪ An ogive is connected to a point on the x-axis representing the
actual lower limit of the first class.
Step 3: Plot points that represent the upper class boundaries and their
corresponding cumulative frequencies.
Step 4: Connect the points in order from left to right.
Step 5: The graph should start at the lower boundary of the first class
(cumulative frequency is zero) and should end at the upper boundary
of the last class (cumulative frequency is equal to the sample size).
Example 2.8
Construct an ogive for the weight of teenagers shown in Table 2.10.
Step 1: First we construct a cumulative frequency column as shown in
Table 2.11 below.
Table 2.11: Cumulative frequency for the weight of teenage students
WEIGHT OF TEENAGERS    CUMULATIVE FREQUENCY    CALCULATIONS
[40; 60)               2                       2
[60; 80)               5                       2+3
[80; 100)              12                      2+3+7
[100; 120)             17                      2+3+7+5
[120; 140)             20                      2+3+7+5+3
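Step 2: Plot the values of the cumulative frequency class bounds against their
cumulative frequencies:
• Plot the point 40 on the horizontal axis against the point 0 on the
vertical axis (where the ogive is connected to the x-axis).
• Plot the point 60 on the horizontal axis against the point 2 on the
vertical axis.
• Plot the point 80 on the horizontal axis against the point 5 on the
vertical axis.
• Plot the point 100 on the horizontal axis against the point 12 on
the vertical axis.
• Plot the point 120 on the horizontal axis against the point 17 on
the vertical axis.
• Plot the point 140 on the horizontal axis against the point 20 on
the vertical axis.
Step 3: Connect the points with straight lines. The results are shown in
Figure 2.12.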
[Figure: line graph titled ‘Ogive for the Weight of Teenagers’. Horizontal axis:
weight, from 0 to 160. Vertical axis: cumulative frequencies, from 0 to 25.]
Figure 2.12: Ogive for the weight of teenagers
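The cumulative frequencies and the points to be plotted for the ogive can be generated with a short Python sketch. The variable names are illustrative assumptions; the frequencies are those of the frequency table for the weight of teenagers.

```python
from itertools import accumulate

lower_bound_first_class = 40
upper_bounds = [60, 80, 100, 120, 140]   # upper boundary of each class
frequencies = [2, 3, 7, 5, 3]            # class frequencies

# Running total of the frequencies gives the cumulative frequencies
cumulative = list(accumulate(frequencies))

# The ogive starts on the x-axis at the lower boundary of the first class
ogive_points = [(lower_bound_first_class, 0)] + list(zip(upper_bounds, cumulative))

for x, y in ogive_points:
    print(f"plot {x} on the horizontal axis against {y} on the vertical axis")
```

The last cumulative frequency equals the sample size, which is a quick check that no class was skipped.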
2.7 Using Excel to Produce Summary Tables and Charts2
The tools available in Excel for creating summary tables and charts are outlined
below.
2.7.1 The Pivot Table

The Pivot Table option within the Insert tab can be used to construct both categorical
frequency tables (called a one-way pivot table in Excel) and cross-tabulation tables
(called a two-way pivot table in Excel). The Chart option is used to display these tables
graphically. Follow these steps to create a one-way pivot table:
• Highlight the data range of the categorical variable(s) to be summarised.
• From the menu bar in Excel, select Insert, then select the PivotTable icon.
• In the Create PivotTable input screen, check that the correct data range is
selected.
• From the PivotTable Field List box, drag the categorical variable to the Row
Labels box (or the Column Labels box) and then again drag it to the Σ Values
box. The one-way pivot table is constructed as the variable is dragged to each
box in turn.
• Check that the Σ Values box displays Count of <variable name>. If not, click
the down arrow in the Σ Values box and select Count from the Value Field
Settings dialog box.
2 This section relies heavily on Wegner (2017).
The PivotTable Field List dialog box is used to create the category frequency table
(one-way pivot table) as shown in Figure 2.13 below:
Figure 2.13: PivotTable Field List dialog box
To construct a cross-tabulation table (or two-way pivot table), follow the same steps
as for a one-way pivot table, but drag one of the categorical variables to the Row
Labels box and the other categorical variable to the Column Labels box. Then drag
either of these two variables to the Σ Values box, and again check that the Count
operation is displayed.

2.7.1 Bar Charts and Pie Charts

To construct a chart from a pivot table (pie chart, column bar chart, stacked bar chart
or multiple bar chart), place the cursor in the pivot table area and select the Insert tab
and the Chart option from the menu bar. Then select the chart type (Column, Pie or
Bar) to display the pivot table graphically. The charts that are produced are the same
as those shown in Figure 2.1 (Pie Chart), Figure 2.2 (Column Bar Chart), and Figure
2.3 (Multiple Bar Chart).
2.7.2 Scatter Plots and Line Graphs

For numeric data, the scatter plot and the trend line graphs can be generated by
highlighting the data range of the numeric variables to be displayed and selecting the
Chart option within the Insert tab in the menu bar. The Scatter chart type will produce
the scatter plot for two numeric variables (see Figure 2.5).
2.7.3 The Data Analysis Add-In

Excel offers a data analysis add-in that extends the range of statistical analyses that
can be performed to include more advanced statistical techniques. To add this module,
follow this sequence in Excel (2007): click the Office button, select Excel Options and
then Add-Ins. At the Manage option, select Excel Add-ins > Go and, in the Add-Ins
dialog box, tick Analysis ToolPak and click OK.
To use any of the statistical tools within the data analysis add-in, select the Data tab
and then the Data Analysis option in the Analysis section of the menu bar.
Figure 2.14 and Figure 2.15 show all 19 statistical techniques available in Data
Analysis

Figure 2.14: Excel’s Data Analysis dialog box (first 10 techniques)
Figure 2.15: Excel’s Data Analysis dialog box (remaining nine techniques)
2.7.4 Numeric Frequency Distribution and Histogram
The Histogram option within Data Analysis is used to create numeric frequency
distributions, histograms, ogives (both count and percentages) and the Pareto curve.
To apply the histogram option, first create a data range consisting of a label heading
and the upper limits of each interval in column format. Excel calls this data range of
interval upper limits a Bin Range. Then complete the data input preparation dialog box
for the histogram, which is shown in Figure 2.16.

Figure 2.16: Histogram data input preparation dialog box
To produce a numeric frequency distribution and histogram, complete the following
inputs:
• The Input Range defines the dataset (include the variable name).
• The Bin Range defines the data range of upper limits of each interval.
• Tick the Labels box (to indicate that the variable names have been included in
each of the Input Range and Bin Range).
• Tick the Chart Output box to display the histogram in the output.

Conclusion
This chapter outlined different modes of displaying data and conveying the information
from statistical analyses. Charts such as pie charts and bar charts vividly display data
associated with qualitative (categorical) random variables. Examples of how data may
be presented using bar charts, pie charts, histograms, scatter plots, frequency tables
and ogives were shown in this chapter.
Additionally, this chapter covered how to use pivot tables, or summary tables, in Excel
to display data graphically. In conclusion, when presenting statistical findings to
management, graphical representations should always be taken into account.
Compared to written reports and tables, a graphical depiction encourages quicker
assimilation of the information to be communicated.

Self-Assessment Questions
2.1 Consider the following data set of the number of calls received per day by a
call centre for motor vehicle insurance claims during the month of December.
31 13 21 25 61
14 19 17 30 72
14 13 21 246 298
8 29 21 217
9 25 17 80
7 17 30 118
17 25 37 80
2.1.1 According to Sturges’ rule of thumb, how many class intervals should
be used when summarising this data set?
2.1.2 Determine the class width and where the first class interval should
start.
2.1.3 Summarise the data set using a frequency table.
2.1.4 Use the frequency table constructed in question 2.1.3 to draw a
histogram.
2.1.5 Calculate the midpoint of every class.
2.1.6 Use the table from question 2.1.5 to draw a frequency polygon.
2.1.7 Calculate the cumulative frequencies.
2.1.8 Use the table from question 2.1.7 to draw an ogive.

2.2 In an internationally based firm, each employee can be classified as
living in one of four parts of the metropolitan area. The percentages of
employees residing in each area are shown in Table 2.12 below:
Table 2.12
AREA PERCENTAGE (%)
North 31.7
South 22.4
East 17.3
West 28.6
Total 100.0
2.2.1 Illustrate the data using a pie chart.
2.2.2 There were 2 000 employees. Convert the data into frequency form.
2.3. The number of Namibians aged 15 years and older attending a sporting
event or competition increased from 6.1 million in 1995 to 6.4 million in 1999.
Table 2.13 below shows the six sports events that had the largest
change in attendance during that period along with their attendance rates:
Table 2.13: Namibians aged 15 years and older attending a sporting event
ATTENDANCE RATE
SPORT 1995 1999
Soccer 29.6 32.3
Rugby 11.4 19.7
Cricket 9.8 12.2
Tennis 6.3 7.1
Golf 6.1 6.6
Bowls 2.1 1.4
Total 65.3 79.3
Illustrate the data using a multiple bar chart.

2.4. The number of burglaries reported in a major city each day over a period of
20 days is shown below:
17 16 26 36 19 8 41 32 26 27
8 38 41 19 26 17 54 16 36 26
2.4.1 Sort the data in ascending order.
2.4.2 Write down the data in the form of a frequency distribution.
2.5 Discuss the two main differences between a bar chart and a histogram.
2.6 The following table shows the number of passenger cars sold by each
manufacturer in each half-year (first and second half) of last year.
Manufacturer First half Second half Annual sales

Toyota 42 661 54 298 96 959
Nissan 35 376 27 796 63 172
Volkswagen 45 774 42 796 88 028
Delta 26 751 36 045 62 796
Ford 32 628 41 527 74 155
MBSA 19 975 17 293 37 268
BMW 24 201 27 518 51 724
MMI 14 307 11 047 25 354
2.6.1 Construct a multiple bar chart showing the number of new car sales by
the manufacturers between the first and the second half of last year.
(Use Excel’s Column (Bar) chart option in the Insert > Chart tab.)
2.6.2 By inspection of the multiple bar chart, identify which car manufacturers
performed better in terms of new car sales in the first half of the year
compared to the second half of the year.
2.6.3 Also by inspection of the multiple bar chart, identify which car
manufacturer showed the largest percentage change (up or down) in
sales from the first half to the second half of the year.

CHAPTER THREE
DATA DESCRIPTORS
Learning Outcomes
• Understand the concept of central tendency.
• Recognise the three measures of central tendency and carry out calculations
involving them.
• Recognise when to use different measures of central tendency.
• Recognise the concept of dispersion.
• Recognise the measures of dispersion and carry out calculations involving
them.
• Determine the significance of the skewness of a distribution.
3.1 Introduction to data descriptors

This chapter looks at statistical measures that enable us to describe and summarise
a data set using values calculated from the data set. We will distinguish between values
that are calculated from ungrouped data and from grouped data. Furthermore, differences
between values that are calculated from samples and values that are calculated from
entire populations are discussed. The central tendency of a data set is the centre or
middle location of the values, while the dispersion (or variation) of a data set indicates
the degree of spread of the data values about the central value.

3.2 Measures of central tendency

Often it is convenient to describe a set of numbers by using a single number.
Calculating a single number is one of the most frequently encountered methods of
condensing data, such as when we work out the ‘average’ of a whole mass of
numbers. The most common measures of central tendency are the mode, the median
and the mean (Davis, Pecar and Santana, 2014).
3.2.1. The arithmetic mean

The most typical indicator of central tendency is the mean, also known as the average.
Most of you are probably already familiar with the idea because it is frequently used
to summarize academic success, i.e., your average or mean mark is usually calculated
at the end of each semester. The mean can be used as an indicator for the "middle"
or "center" of a set of values, like all measures of central tendency.
3.2.1.1 The arithmetic mean for ungrouped data

The most widely used average, the arithmetic mean, is defined as the sum of the
observations divided by the number of observations. The formula for computing
the arithmetic mean is:
$$\text{Mean} = \frac{\text{Sum of all observations}}{\text{Total number of observations}}$$
The notation for the mean depends on whether the observations under consideration
are from a sample or a population. When the mean is calculated from sample
observations, it is denoted by $\bar{x}$ (pronounced ‘x bar’) and when it is calculated from
the entire population, it is denoted by μ (pronounced ‘mu’).

NOTE: The population is the entire set of observations. A sample is a representation or
portion of the population.
Population mean, μ
The population mean is given by:
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$$
Where:
μ = Population mean.
N = Population size.
xi = The ith observation for the random variable x.
Σ = Symbol for ‘the sum of’.
$\sum_{i=1}^{N} x_i$ = Sum of all the observations starting from the first (i = 1) to the last (i = N).
Sample mean, $\bar{x}$
The sample mean, $\bar{x}$, is given by:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Where:
$\bar{x}$ = Sample mean.
n = Sample size.
xi = The ith observation for the random variable x.
Σ = Symbol for ‘the sum of’.
$\sum_{i=1}^{n} x_i$ = Sum of all the observations starting from the first (i = 1) to the last (i = n).

Example 3.1
The number of workdays lost due to illness in a business per week is given below
(for a 10-week period):
36 28 33 29 28 32 33 33 34 32
Calculate the mean number of days lost per week during the above period.
Solution
The number of observations is N = 10 (assuming that this is the population size):
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{36+28+33+29+28+32+33+33+34+32}{10} = \frac{318}{10} = 31.8$$
Therefore, the mean number of working days lost per week due to illness is 31.8, or
about 32 days.
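The calculation in Example 3.1 can be verified with a minimal Python sketch (the variable names are illustrative only):

```python
# Workdays lost per week over the 10-week period in Example 3.1
days_lost = [36, 28, 33, 29, 28, 32, 33, 33, 34, 32]

total = sum(days_lost)          # sum of all observations
N = len(days_lost)              # number of observations
mean = total / N                # arithmetic mean

print(total, N, mean)
```

The observations listed add up to 318, so the mean is 31.8 days, which rounds to about 32 days per week.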
3.2.2. The arithmetic mean for grouped data

It is not possible to calculate the exact mean of the original data in a grouped frequency
distribution since information is lost when the data are grouped. The mean can only be
estimated.
In this case we assume that the observations are spread evenly throughout each class
interval. This essentially means that the calculations are based on the assumption that
all observations occur at the midpoint (m) of their class, so the formula for the
calculation of the mean from a frequency distribution may be used. That is, in the
equation for calculating the mean from a frequency distribution, we replace x by fm.
This yields the formula:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i}$$
Where:
$\bar{x}$ = Sample mean.
mi = The midpoint of the ith class interval.
fi = The frequency of the ith class interval.
fi mi = The product of the midpoint, mi, of the ith class interval and the
frequency, fi, of the ith class interval.
$\sum_{i=1}^{k} f_i m_i$ = Estimated sum of all the observations in the sample.
$\sum_{i=1}^{k} f_i$ = Sum of the frequencies.
k = The number of class intervals.
Example 3.2
In mid-winter 2018 in Eswatini, all common colds in a group of locals wore off in less
than 10 days. A summary of the duration of these colds is shown in the grouped
frequency distribution in Table 3.1 below. Estimate the mean duration of a cold.
Table 3.1: The duration of colds in Eswatini
DURATION OF COLD (DAYS)    NUMBER OF CASES
[0; 1)                     0
[1; 2)                     3
[2; 5)                     8
[5; 10)                    4
Total                      15
Solution
Construct a grouped frequency distribution table as shown in Table 3.2 below:
Table 3.2: Calculation of the mean duration of colds in Eswatini
DURATION OF COLD (DAYS)    NUMBER OF CASES fi    MIDPOINT mi    fi mi
[0; 1)                     0                     0.5            0
[1; 2)                     3                     1.5            4.5
[2; 5)                     8                     3.5            28
[5; 10)                    4                     7.5            30
Total                      15                                   $\sum_{i=1}^{k} f_i m_i = 62.5$
Using the equation for calculating the mean from a grouped frequency distribution:
$$\bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{0 + 4.5 + 28 + 30}{15} = \frac{62.5}{15} \approx 4.2$$
Thus, the common cold lasted a mean of approximately 4.2 days.
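The same estimate can be reproduced with a short Python sketch, assuming (as the method does) that every observation sits at the midpoint of its class. The variable names are illustrative.

```python
# Class intervals [lower; upper) and their frequencies from Table 3.1
classes = [(0, 1), (1, 2), (2, 5), (5, 10)]
frequencies = [0, 3, 8, 4]

# Midpoint of each class: average of the lower and upper limits
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Sum of f_i * m_i, then divide by the total frequency
weighted_sum = sum(f * m for f, m in zip(frequencies, midpoints))
grouped_mean = weighted_sum / sum(frequencies)

print(midpoints, weighted_sum, round(grouped_mean, 1))
```

Note that the classes do not need to have equal widths for this calculation, since only the midpoints and frequencies are used.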
3.1.2 Geometric mean
Under certain conditions, other types of means are used. For example, the geometric
mean is used in economic data to average ratios or rates of change. On some
occasions, when we are dealing with quantities that change over a period of time, we
would like to know the rate of change. Examples may include the mean growth rate of
savings over several years and the ratios of annual price fluctuations. In these
circumstances, the arithmetic mean would be misleading and an alternative measure,
known as the geometric mean should be used.
The geometric mean of n observations is the nth root of their product.
If there are n observations $x_1, x_2, x_3, \ldots, x_n$, the geometric mean is given by:
$$\text{Geometric mean} = \sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} = (x_1 \times x_2 \times x_3 \times \cdots \times x_n)^{\frac{1}{n}}$$

Example 3.3
The estimated change in population as at 30 June in the five years from 1998 to
2002 for South Africa is shown in Table 3.3:
Table 3.3: Changes in the number of the South African population
YEAR    PERCENTAGE CHANGE
1998    1.250
1999    1.056
2000    1.203
2001    1.109
2002    1.250
Calculate the geometric mean.
Solution
The geometric mean is given by:
$$\sqrt[n]{x_1 \times x_2 \times x_3 \times \cdots \times x_n} = \sqrt[5]{1.250 \times 1.056 \times 1.203 \times 1.109 \times 1.250} = \sqrt[5]{2.201\,309\,55} = 1.170\,944\,264$$
Therefore, the geometric mean of the percentage change is 1.171.
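The calculation can be checked with a short Python sketch using the five values from Table 3.3. `math.prod` is part of the Python standard library from version 3.8; the variable names are illustrative.

```python
import math

# The five values from Table 3.3
changes = [1.250, 1.056, 1.203, 1.109, 1.250]

product = math.prod(changes)                    # x1 * x2 * ... * xn
geometric_mean = product ** (1 / len(changes))  # nth root of the product

print(round(product, 8), round(geometric_mean, 3))
```

For comparison, the arithmetic mean of the same five values is slightly larger, which is always the case when the values are positive and not all equal.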

NOTE:
Properties of the mean
The mean is a very popular measure of central tendency and is widely used
because it is easy to understand and easy to calculate.
• The mean can only be used for quantitative data: We can only calculate
the mean of numerical, quantitative data. It does not apply to qualitative
data, or even qualitative data that has been numerically coded.
• The mean makes use of all observations: When calculating a mean,
whether it is a sample mean or a population mean, we make use of every
available observation value. This implies that the mean makes use of all
available information to determine the central tendency of the data (in fact,
of the three measures of central tendency, it is the only one that makes use
of all available information).
• Each variable in a data set will have only one mean value: The mean is
unique to a collection of data, in that there can only be one mean value per
variable in an observed data set.
• The mean is sensitive to outliers: Unfortunately, since the mean is based
on all data points, it can easily be influenced by values that have been
erroneously recorded or are simply unusually large or small. These
extreme values are called outliers and they have a tendency to skew the
data distribution. In data sets that have outliers, we would use a different
measure to calculate the value of central tendency. For example, if we
included an eleventh observation in the workdays example (Example 3.1), and
the number of days recorded was 182 (which is very large
compared to the other data entries), we would find that the new mean
would be:
$$\frac{\sum_{i=1}^{N} x_i}{N} = \frac{36+28+33+29+28+32+33+33+34+32+182}{11} = \frac{500}{11} = 45.45 \approx 45$$
Clearly, introducing this outlier significantly increases the mean (Davis,
Pecar and Santana, 2014).

3.2.2 The Mode
The mode is defined as the value that occurs most frequently in a data set. It can be
used for both quantitative and qualitative data variables.
3.2.2.1 The Mode for ungrouped data
Finding the entry that appears most often is all that is required to determine the
mode of ungrouped data. If two entries occur with the same greatest frequency, each
entry is a mode and the data set is said to be bimodal. If no entry is repeated, the
data set has no mode.
Example 3.4
Find the mode of the following data set:
3 6 4 12 5 7 9 3 5 1 5
Solution
The most frequently occurring number is 5 (which occurs three times). Hence the
mode, Mo = 5.
Sometimes, a set of data may have two modes if there are two numbers that appear
more often than any of the others and they appear the same number of times. For
example, if the number 3 was added to the above set, the new set would look like:
3 6 4 12 5 7 9 3 5 1 5 3
The numbers 5 and 3 both appear three times. In this particular case, the modes are
Mo = 5 and Mo = 3.
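Both cases in Example 3.4 can be checked in Python with `statistics.multimode` (available in the standard library from Python 3.8), which returns every value that shares the highest frequency. The variable names are illustrative.

```python
from statistics import multimode

data = [3, 6, 4, 12, 5, 7, 9, 3, 5, 1, 5]

# One mode: 5 occurs three times
print(multimode(data))

# Adding another 3 makes the data set bimodal: 3 and 5 each occur three times
bimodal_data = data + [3]
print(sorted(multimode(bimodal_data)))
```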

3.1.3.2 Calculation of the mode from grouped data
It is not possible to calculate the exact value of the mode of the original data in a
grouped frequency distribution since information is lost when the data are grouped.
However, it is possible to make an estimate of the mode.
The class interval with the largest frequency is called the modal class. The
estimate of the mode itself is given by the equation below:
$$Mo = L + \left(\frac{f_{mode} - f_{below}}{2f_{mode} - f_{below} - f_{above}}\right) c$$
Where:
Mo = Mode.
L = Lower limit of the modal interval.
c = The width of the modal interval.
$f_{mode}$ = Frequency of the modal interval.
$f_{below}$ = Frequency of the interval preceding the modal interval.
$f_{above}$ = Frequency of the interval following the modal interval.
The modal formula weights (‘pulls’) the modal value from the midpoint position
towards the adjacent interval with the higher frequency count. If the interval to the
left of the modal interval (preceding the modal interval) has a higher frequency count
than the interval to the right of the modal interval (following the modal interval),
then the modal value is pulled down below the midpoint value, and vice versa
(Croucher, 2016).
Example 3.5
A courier company recorded 30 delivery times (in minutes) to deliver parcels to its
clients from its depot. The data are summarised in the grouped frequency distribution
below. Calculate the mode.
Table 3.4: Calculation of modal courier delivery time
TIME (MINUTES)    FREQUENCY (fi)
[10; 20)          3
[20; 30)          5
[30; 40)          9
[40; 50)          7
[50; 60)          6
Total             $\sum_{i=1}^{k} f_i = 30$
Solution
From the frequency distribution, the modal interval (interval with the highest
frequency) is [30; 40) minutes. The midpoint of 35 can be used as an approximate
modal courier delivery time.
To calculate a more representative modal value, apply the modal formula with:
L = 30
c = 10
$f_{mode}$ = 9
$f_{below}$ = 5
$f_{above}$ = 7
$$Mo = L + \left(\frac{f_{mode} - f_{below}}{2f_{mode} - f_{below} - f_{above}}\right) c = 30 + \frac{(9 - 5)}{(2(9) - 5 - 7)} \times 10 = 30 + 6.67 = 36.67 \text{ minutes}$$
Thus, the most common courier delivery time from the depot to customers is 36.67
minutes.
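The modal formula can be sketched as a small Python function. The name `grouped_mode` is hypothetical, and the sketch makes two assumptions that the text does not state: there is a single modal class, and a missing neighbouring class (when the modal class is first or last) is treated as having a frequency of zero.

```python
def grouped_mode(lower_limits, width, frequencies):
    """Estimate the mode of a grouped frequency distribution.

    Assumes equal class widths and a single modal class. A missing
    neighbour of the modal class is treated as frequency 0 (an assumption).
    """
    i = frequencies.index(max(frequencies))      # position of the modal class
    f_mode = frequencies[i]
    f_below = frequencies[i - 1] if i > 0 else 0
    f_above = frequencies[i + 1] if i < len(frequencies) - 1 else 0
    fraction = (f_mode - f_below) / (2 * f_mode - f_below - f_above)
    return lower_limits[i] + fraction * width


# Delivery times (minutes) from Table 3.4
lower_limits = [10, 20, 30, 40, 50]
frequencies = [3, 5, 9, 7, 6]

mode_estimate = grouped_mode(lower_limits, 10, frequencies)
print(round(mode_estimate, 2))
```

Because the class after the modal class (frequency 7) is larger than the class before it (frequency 5), the estimate lies above the class midpoint of 35.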

3.2.3 The Median
The median is another measure of central tendency and is denoted by $\tilde{x}$. It may be
described as the ‘middle’ observation in a set. In fact, the median is defined as the
centre value when the data are arranged in order of magnitude. That is, 50 % of the
data have a value less than the median and 50 % of the data have a value more than
the median (Brechner, 2020).
3.2.2.1 Calculation of the median from ungrouped data
The method of calculating the median from a set of raw data (once it has been
arranged in ascending order of magnitude) is as follows:
If n is the number of observations, the median is the value of the
$\left(\frac{n+1}{2}\right)^{th}$ observation.
This is the same as:
1. If n is even, the median is the average of the $\left(\frac{n}{2}\right)^{th}$ observation and the
$\left(\frac{n}{2} + 1\right)^{th}$ observation, i.e., adding these two values and dividing by two.
2. If n is odd, the median is the middle observation.
Example 3.6
The manager of a hairdressing salon was interested to know how long her customers
had to wait before getting their hair cut and styled. On a certain day she recorded the
waiting times (in minutes) of 15 randomly chosen customers. The times were as
follows:
14 28 36 15 29 16 9 40 16 21 36 17 4 15 22
Find the median waiting time.

Solution
First, arrange the data in increasing order of magnitude.
4 9 14 15 15 16 16 17 21 22 28 29 36 36 40
The median waiting time is the:
$$\left(\frac{n+1}{2}\right)^{th} = \left(\frac{15+1}{2}\right)^{th} = \left(\frac{16}{2}\right)^{th} = 8^{th} \text{ observation}$$
The 8th observation = 17 minutes
NB: In this case, n = 15 (this is an odd number), so the middle value is the 8th observation,
which is 17 minutes.
Example 3.7
For the data recorded in example 3.6 above, suppose, for some reason that the
manager realized that the last waiting time was not worthy to be recorded. What is the
median waiting time for the 14 observations recorded?
Solution
The raw data will be as follows:
14 28 36 15 29 16 9 40 16 21 36 17 4 15
Arranging the data in ascending order gives the following array:
4 9 14 15 15 16 16 17 21 28 29 36 36 40
The median waiting time is the:
\[ \left(\frac{n+1}{2}\right)^{th} = \left(\frac{14+1}{2}\right)^{th} = \left(\frac{15}{2}\right)^{th} = 7.5^{th} \text{ observation} \]
The 7.5th observation (median) is between the 7th and the 8th observations.
The 7th observation = 16
The 8th observation = 17
The median is the average of the 7th and the 8th observations and is calculated as
follows:
\[ \tilde{x} = \frac{16 + 17}{2} = 16.5 \]
Therefore, the median waiting time is 16.5 minutes.
NB: When n is even, the median is the mean of the \(\left(\frac{n}{2}\right)^{th}\) observation and the
\(\left(\frac{n}{2}+1\right)^{th}\) observation.
3.2.2.3 Calculation of the median from grouped data

It is not possible to calculate the exact value of the median of the original data
in a grouped frequency distribution since information is lost when the data are
grouped. However, it is possible to make an estimate of the median. The class
interval that contains the median is called the median class.
The estimate of the median itself is given by the equation below:

\[ \tilde{x} = L + \frac{C\left(\frac{n+1}{2} - F_{below}\right)}{f_{med}} \]
Where:
L = Lower limit of the median class.
C = The median class width.
n = The total frequency (number of observations).
\(f_{med}\) = The frequency count of the median class.
\(F_{below}\) = The cumulative frequency below the median class (i.e. the number of
observations less than the lower limit of the median class).
\(F_{below}\) can also be described as the cumulative frequency count of all intervals before
the median class interval.
Example 3.8
Refer to example 3.5 for the problem description and Table 3.5 for the sample data
of 30 delivery times that have been summarised into a grouped frequency distribution.
Find the median delivery time of parcels to clients from the courier service’s depot.
Table 3.5: Calculation of the median parcel delivery time
Time (minutes)    Frequency (f_i)    Cumulative frequency f(<)
[10; 20)          3                  3
[20; 30)          5                  8
[30; 40)          9                  17
[40; 50)          7                  24
[50; 60)          6                  30
Total             \(\sum_{i=1}^{k} f_i = 30\)
Estimate the median delivery time of parcels to clients by this courier company.

Solution
Since n = 30, the median delivery time will lie in the
\(\left(\frac{30+1}{2}\right)^{th} = 15.5^{th}\) ordered data position. The cumulative frequencies
show that the 9th to the 17th ordered values fall in the [30; 40) minutes interval, so
this is the median class. An approximate median delivery time for parcels is therefore
35 minutes (the interval midpoint).
However, a more representative median value can be found by using the formula of
the median for grouped frequency distribution data where:
L = 30 minutes
C = 10 minutes
n = 30 deliveries (the sample size)
\(f_{med}\) = 9 deliveries
\(F_{below}\) = 8 deliveries
\[ \tilde{x} = L + \frac{C\left(\frac{n+1}{2} - F_{below}\right)}{f_{med}} \]
\[ \tilde{x} = 30 + \frac{10\left(\frac{30+1}{2} - 8\right)}{9} = 30 + 8.33 = 38.33 \text{ minutes} \]
Thus, the median parcel delivery time is 38.33 minutes. This means that half the
deliveries occurred within 38.33 minutes while the other half took longer than 38.33
minutes.
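As a check on Example 3.8, the grouped-median estimate can be sketched in Python. The function name `grouped_median` and the (lower limit, upper limit, frequency) layout of the classes are choices made for this illustration only.

```python
def grouped_median(classes):
    """Estimate the median from (lower_limit, upper_limit, frequency) classes.

    Classes must be listed in ascending order. Uses the (n + 1) / 2 position
    and linear interpolation inside the median class.
    """
    n = sum(freq for _, _, freq in classes)
    position = (n + 1) / 2
    cumulative = 0                      # F_below for the class being examined
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            width = upper - lower       # C, the median class width
            return lower + width * (position - cumulative) / freq
        cumulative += freq
    raise ValueError("no classes supplied")


delivery_times = [(10, 20, 3), (20, 30, 5), (30, 40, 9), (40, 50, 7), (50, 60, 6)]
median_estimate = grouped_median(delivery_times)
print(round(median_estimate, 2))
```

The loop finds the median class by accumulating frequencies, which mirrors reading down the cumulative column of Table 3.5.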
3.2.3 The percentiles and quartiles for raw data
Quartiles divide data into four equal parts and are often used with scores for aptitude
tests, examinations and other testing situations. They are also used in commerce and
industry when a large number of observations are involved.
If the data are divided into four equal parts, the points of separation are:
1. First quartile — Q1
There are 25% of the observations below Q1 and 75% above Q1

Q1 is sometimes called the lower quartile.
2. Second quartile — Q2
There are 50% of observations below Q2 and 50% above Q2 .

Q2 is the median, and hence the name ‘second quartile’ is seldom
used.
3. Third quartile — Q3
There are 75% of observations below Q3 and 25% above Q3 .
Q3 is often called the upper quartile.
The following rules should be applied when calculating quartiles in small
samples:
Let n = the number of observations. The positions of the quartiles are given
as follows:
Q1 is the \(\left(\frac{n+1}{4}\right)^{th}\) position
Q2 is the \(\left(\frac{2(n+1)}{4}\right)^{th}\) position
Q3 is the \(\left(\frac{3(n+1)}{4}\right)^{th}\) position
To determine these positions, first sort the data in ascending order. The values
of the positions can be either whole numbers or decimal numbers.
• If the value of the position is a whole number, simply take the observation
in that position.
• If the positions for Q1, Q2 and Q3 are not whole numbers, use
interpolation to find the values of Q1, Q2 and Q3.
Similarly, the percentiles divide a sample into 100 equal-sized groups, and the xth
percentile is defined as the value in the data set such that x% of the observations are
smaller than it and (100 - x)% of the observations are larger than it. It is denoted by
Px. For example, the 33rd percentile is denoted by P33.
The quartiles (and median) are simply special cases of percentiles. For example, the
50th percentile is the same as the median (or second quartile), i.e., P50 = Q2 = \(\tilde{x}\).
Likewise, Q1 = P25 and Q3 = P75.
As we can express all of these values in terms of percentiles, we will only focus on
calculating the percentile values and then use the above relationships to determine
the quartiles and the median.
Calculating these percentiles is similar to the calculation used to determine the
median, except that each percentile uses a different index value.
To determine the value of the xth percentile in a data set, it is important to first sort
the data set from the smallest value to the largest value. Once sorted, we can
obtain the xth percentile, Px, using the expression:
\[ P_x = \frac{x}{100}(n+1)^{th} \text{ value in the ordered data set.} \]
For example, in a sample of n = 30, the 15th percentile, P15, will be the
\(\frac{15}{100}(30+1) = 4.65^{th}\) value in the ordered data set.
As with the median, the index value \(\frac{x}{100}(n+1)\) can result in a fraction. In order to
determine the correct value associated with the index, you will need to carry out further
steps.
The index value \(\frac{x}{100}(n+1)\) is a fractional number that can be broken down into
two parts:
• A ‘whole number’ part (for example 0, 1, 2, 3, etc.) denoted by w.
• A ‘decimal’ part (for example, 0.12, 0.67, 0.78, etc.) denoted by d.
For example, the index number 7.85 can be broken up into w = 7 and d = 0.85 (note
that w + d = 7 + 0.85 = 7.85). We can use this partitioning of the index value to find the
percentile value for a set of ordered data values \(X_1, X_2, X_3, \ldots, X_n\) as follows.
If the index number \(\frac{x}{100}(n+1)\) is equal to w + d, then the \(\frac{x}{100}(n+1)^{th}\)
value in the ordered data set (i.e. the xth percentile) is given by the following:

\[ P_x = X_w + d \times (X_{w+1} - X_w) \]
Where:
\(X_w\) = The wth ordered value in the data set.
\(X_{w+1}\) = The (w + 1)th ordered value in the data set.
w and d = The whole number and decimal parts of the index number \(\frac{x}{100}(n+1)\),
respectively.
Example 3.9
Below are the marks obtained by 13 students in their Statistics examination.

Determine the sample 25th percentile (Q1), the 33rd percentile (P33) and the third quartile
of the data.
52 60 36 27 24 85 60 90 48 95 53 52 53
Solution
First the data are ordered from the smallest value to the largest value.
24 27 36 48 52 52 53 53 60 60 85 90 95
We obtain Q1 from this sorted data set by first calculating the index:
\[ \frac{25}{100}(n+1) = \frac{25}{100}(13+1) = \frac{25}{100}(14) = 3.5 \]
Note that we can express this index in terms of a whole number, w = 3, and a decimal
number, d = 0.5.
Now, since w = 3 and w + 1 = 4, we must find the values \(X_w = X_3 = 36\) and
\(X_{w+1} = X_4 = 48\). From the formula, the 25th percentile is then:
\[ Q_1 = P_{25} = X_w + d \times (X_{w+1} - X_w) = X_3 + 0.5 \times (X_4 - X_3) = 36 + 0.5 \times (48 - 36) = 42 \]
The first quartile of the Statistics examination marks is 42.
Similarly, to obtain the 33rd percentile, we first determine the index value:
\[ \frac{33}{100}(n+1) = \frac{33}{100}(13+1) = \frac{33}{100}(14) = 4.62 \]
The index number expressed in terms of a whole number and a decimal number is
w + d = 4 + 0.62, i.e. w = 4 and d = 0.62. Note that \(X_w = X_4 = 48\) and
\(X_{w+1} = X_5 = 52\). The 33rd percentile is then:
\[ P_{33} = X_w + d \times (X_{w+1} - X_w) = X_4 + 0.62 \times (X_5 - X_4) = 48 + 0.62 \times (52 - 48) = 50.48 \]
Finally, for the third quartile the index value is \(\frac{75}{100}(14) = 10.5\), so w = 10 and
d = 0.5. With \(X_{10} = 60\) and \(X_{11} = 85\):
\[ Q_3 = P_{75} = X_{10} + 0.5 \times (X_{11} - X_{10}) = 60 + 0.5 \times (85 - 60) = 72.5 \]
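The interpolation formula can be tried out on the marks of Example 3.9 with the sketch below. The function name `percentile` and the clamping of positions that fall outside the data (below the 1st or above the nth value) are assumptions made for this illustration; the module guide does not discuss that edge case.

```python
import math


def percentile(values, x):
    """xth percentile using the x/100 * (n + 1) index and linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    index = x * (n + 1) / 100
    w = math.floor(index)     # whole-number part of the index
    d = index - w             # decimal part of the index
    if w < 1:                 # assumption: clamp to the smallest value
        return ordered[0]
    if w >= n:                # assumption: clamp to the largest value
        return ordered[-1]
    # X_w is ordered[w - 1] because Python lists are 0-based
    return ordered[w - 1] + d * (ordered[w] - ordered[w - 1])


marks = [52, 60, 36, 27, 24, 85, 60, 90, 48, 95, 53, 52, 53]
q1 = percentile(marks, 25)
p33 = percentile(marks, 33)
q3 = percentile(marks, 75)
print(q1, round(p33, 2), q3)
```

Python's standard library function `statistics.quantiles`, called with `method="exclusive"`, is documented as positioning data points in the same (n + 1) manner, so it may serve as an independent cross-check.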
3.2.4 Percentile and quartile for grouped data
Determining the values of individual percentiles from a grouped frequency distribution
table is almost identical to the method used to find the median from
grouped frequency distribution data.
To estimate the value of the percentile, Px, begin by first determining the ‘percentile
class’, i.e., the class that contains the \(\frac{x}{100}(n+1)^{th}\) value of the data set. Once you have
found the percentile class, you can use the formula below to estimate the value of the
xth percentile in the frequency table.
\[ P_x = L + \left(\frac{\frac{x}{100}(n+1) - F_{below}}{f_x}\right) \times C \]
Where:
L = Lower limit of the percentile class.
C = The percentile class width.
n = The total frequency (number of observations).
\(f_x\) = The frequency count of the percentile class.
\(F_{below}\) = The cumulative frequency below the percentile class.

Example 3.10
To calculate the 10th percentile for Example 3.8 we find that x = 10, n = 30 and the
position of the 10th percentile is the \(\frac{10}{100}(30+1)^{th} = 3.1^{th}\) number in the data set. The
percentile class is then [20; 30) and the elements of the formula above are then L = 20,
C = 10, \(F_{below}\) = 3 and \(f_x\) = 5. The 10th percentile is then:
\[ P_{10} = L + \left(\frac{\frac{x}{100}(n+1) - F_{below}}{f_x}\right) \times C = 20 + \left(\frac{\frac{10}{100}(30+1) - 3}{5}\right) \times 10 = 20.2 \text{ minutes} \]
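The grouped percentile formula can be sketched in the same way. The function name `grouped_percentile` and the (lower limit, upper limit, frequency) layout are illustrative choices, and the sketch simply raises an error if the requested position lies beyond the last observation.

```python
def grouped_percentile(classes, x):
    """Estimate the xth percentile from (lower_limit, upper_limit, frequency) classes."""
    n = sum(freq for _, _, freq in classes)
    position = x * (n + 1) / 100
    cumulative = 0                       # F_below for the class being examined
    for lower, upper, freq in classes:
        if cumulative + freq >= position:
            width = upper - lower        # C, the percentile class width
            return lower + (position - cumulative) / freq * width
        cumulative += freq
    raise ValueError("percentile position lies beyond the last class")


delivery_times = [(10, 20, 3), (20, 30, 5), (30, 40, 9), (40, 50, 7), (50, 60, 6)]
p10 = grouped_percentile(delivery_times, 10)
p50 = grouped_percentile(delivery_times, 50)
print(round(p10, 2), round(p50, 2))
```

Setting x = 50 reproduces the grouped median of Example 3.8, which illustrates that the median is simply the 50th percentile.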
3.2.5 Skewness and the shape of distributions
A frequency distribution can assume any one of a large number of shapes. The
skewness or the shape of distributions can be measured by comparing the relative
positions of the mean, median and mode.
The most common shapes of a frequency distribution are:
• Symmetric.
• Uniform (or rectangular).
• Skewed.
A symmetric distribution is identical on both sides of its central point. If the distribution
is symmetrical:
Mean = Median = Mode.
An example of a symmetric distribution is shown in Figure 3.1

Figure 3.1: Symmetric distribution
A uniform or rectangular distribution is a symmetric distribution with equal or

approximately equal frequencies for each class. An example of a uniform distribution
is shown in Figure 3.2 below:
Figure 3.2: Histogram with uniform distribution
A skewed distribution is non-symmetrical with the tail on one side longer than the
tail on the other side. It can be skewed to the left or to the right.
• A skewed-to-the-right distribution (positively skewed) has a longer tail on the right
side. If the distribution is skewed to the right, the mean is to the right of the median.
This means that:
Mode < Median < Mean.
An example of a positively skewed distribution is shown in Figure 3.3.
Figure 3.3: Distribution skewed to the right
• A skewed-to-the-left distribution (negatively skewed) has a longer tail on the
left side. If the distribution is skewed to the left, the mean is to the left of the median.
This means that:
Mean < Median < Mode.
An example of a negatively skewed distribution is shown in Figure 3.4.
Figure 3.4: Distribution skewed to the left
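As a rough illustration of these comparisons, the sketch below classifies a data set by comparing its mean with its median. The function name `skew_direction` is hypothetical, and comparing only two measures is a rule of thumb rather than a formal measure of skewness.

```python
from statistics import mean, median


def skew_direction(values):
    """Indicate the direction of skew by comparing the mean with the median."""
    m, med = mean(values), median(values)
    if m > med:
        return "skewed to the right (positive)"
    if m < med:
        return "skewed to the left (negative)"
    return "approximately symmetric"


# Waiting times from Example 3.6: mean 21.2 minutes, median 17 minutes
waiting_times = [14, 28, 36, 15, 29, 16, 9, 40, 16, 21, 36, 17, 4, 15, 22]
direction = skew_direction(waiting_times)
print(direction)
```

For the salon waiting times the mean exceeds the median, which suggests a longer tail on the right: a few customers waited much longer than most.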

3.3 Measures of Dispersion or spread
As a rule of thumb, when central tendency is reported, so should be the measures of
dispersion or spread. Once you have your central tendency it is easy to ascertain the
degree to which cases are dispersed around it. Dispersion (or spread) refers to the
extent to which the data values of a numeric random variable are scattered about their
central location value (Wegner, 2012).
Previously, the concept of central location was introduced. The variability among data
is one characteristic to which averages are not sensitive. It is possible to have two
datasets with identical measures of central location but with very different spreads of data.
Once again, the level of measurement will determine how we gauge the spread of a
distribution. Table 3.6 summarises the levels of measurement and their measures of
dispersion.
Table 3.6: Levels of measurement and corresponding measures of dispersion
Level of          Index of       Range        Interquartile   Standard
measurement       dispersion                  range           deviation
Nominal           Yes            No           No              No
Ordinal           Sometimes      Sometimes    Yes             No
Interval/Ratio    No             Yes          Yes             Yes
The best ways to analyse the spread of the distribution for each level of measurement
are as follows:

Table 3.7: Levels of measurement and their appropriate graphical representation
Level of measurement    Representation
Nominal                 Table or frequency distribution showing frequencies.
Ordinal                 Tables/frequency distribution, but choosing a single measure is
                        problematic. Use the interquartile range if a single measure is chosen.
Interval/Ratio          Graphic dispersion; standard deviation, provided cases have an
                        approximately normal distribution.
When there is a possibility that the underlying distribution may not be normal, the
interquartile range is a good alternative.
EXAMPLE 3.11
Consider two groups of data:
Dataset A Dataset B
65 42
66 54
67 58
68 62
71 67
73 77
74 77
77 85
77 93
77 100
Computed measures of central location:
Mean = 71.5 Mean = 71.5
Median = 72 Median = 72
Mode = 77 Mode = 77
Although there is no difference in the computed central measures between the two

groups, the scores of dataset B are much more widely scattered than those of dataset
A. The measures that are used to measure dispersion are:
• Range.
• Variance.
• Standard deviation.
• Interquartile range.
• Quartile deviation.
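The point of Example 3.11 can be verified with Python's standard `statistics` module. The helper name `summarise` is introduced for this sketch only.

```python
from statistics import mean, median, mode, stdev

dataset_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
dataset_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]


def summarise(values):
    """Central location and spread measures for one data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "stdev": stdev(values),   # sample standard deviation (divides by n - 1)
    }


summary_a = summarise(dataset_a)
summary_b = summarise(dataset_b)
for name, summary in (("A", summary_a), ("B", summary_b)):
    print(name, summary)
```

Both data sets share the same mean, median and mode, yet the range and standard deviation of dataset B are several times larger than those of dataset A.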
3.3.1. The Range
The simplest measure of dispersion is the range, which is defined as simply the
difference between the largest and smallest values in a set of data.
The range is useful for situations like daily temperature fluctuations or share price
movements. It is considered primitive as it considers only the extreme values which
may not be useful indicators of the bulk of the population.
The formula is:
Range = largest observation - smallest observation
Range = xmax – xmin
Note: If the largest observation = the smallest observation, the range is zero.
Example 3.12
The observed temperatures (in degrees Celsius) in a certain country for a week
are given in Table 3.8 below:

Table 3.8: Observed temperatures
DAYS TEMPERATURE (ºc)

Day one 22
Day two 26
Day three 21
Day four 18
Day five 30
Day six 24
Day seven 27
Calculate:
3.12.1. The mean week temperature
3.12.2. The range of the temperatures
Solution
3.12.1. The mean is calculated as:

\[ \bar{x} = \frac{22 + 26 + 21 + 18 + 30 + 24 + 27}{7} = \frac{168}{7} = 24 \]
Hence the mean temperature for the week is 24 °C.
3.12.2. Range = highest temperature - lowest temperature = 30 °C - 18 °C = 12 °C.
3.3 Standard deviation
Standard deviation is the measure of spread most commonly used in statistics when
the mean is used to calculate central tendency. The variance and standard deviation
provide a measure of how dispersed the data values (x) are about the mean value (𝑋̅).
Because of its close links with the mean, the standard deviation can be greatly affected
if the mean gives a poor measure of central tendency.
If we calculated \((X - \bar{X})\) for each data value, then some differences would be positive and
some negative. Thus, if we were to sum all these differences we would find that
\(\sum (X - \bar{X}) = 0\), i.e., the positive and negative values would cancel out. To avoid this
problem we square each individual difference before undertaking the
summation.
3.3.1 Standard deviation of a population (𝝈 )
The aim is basically to find an ‘average’ measure of the distance of each observation from the
mean of the set of observations. If the actual residuals are used (i.e. retaining their
signs), their mean is always zero and therefore quite useless. If we take the mean of
the absolute values of the residuals, we obtain the mean deviation with the
troublesome absolute value signs.
An alternative approach is to work with the squares of the residuals. This will
eliminate the effect of the signs, since squares of numbers cannot be negative. A
first step is therefore to find the sum of the squares of the residuals, and then to find
the mean. To take into account the fact that we have squared the residuals, we take
the square root of this mean. The result of this technique is the population
standard deviation, which is denoted by the Greek letter sigma, \(\sigma\). In
most cases the standard deviation is derived from the variance (\(\sigma^2\)), which is given
by:
\[ \sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} \]
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}} = \sqrt{\sigma^2} = \sqrt{\text{population variance}} \]
Where:
\(\sigma^2\) = Population variance.
\(x_i\) = ith observation.
\(\mu\) = Population mean.
N = Number of observations in the population.
\(\sum_{i=1}^{N}\) = Sum of, starting from i = 1 up to i = N.
The population standard deviation is the square root of the population variance.
NOTE: The square root of a given number is the same as raising that number to
the power of \(\frac{1}{2} = 0.5\): \(\sigma = \sqrt{\sigma^2} = (\sigma^2)^{\frac{1}{2}} = (\sigma^2)^{0.5}\)
The square of the population standard deviation is called the variance.
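A short sketch of the population formulas follows. It reuses the seven temperatures of Example 3.12 and assumes, purely for illustration, that this week is the entire population of interest.

```python
import math
import statistics

# Assumption for this sketch: the seven daily temperatures form the whole population
temperatures = [22, 26, 21, 18, 30, 24, 27]

N = len(temperatures)
mu = sum(temperatures) / N                                  # population mean
squared_residuals = [(x - mu) ** 2 for x in temperatures]   # (x_i - mu)^2
population_variance = sum(squared_residuals) / N            # divide by N
population_sd = math.sqrt(population_variance)              # sigma

print(mu, population_variance, round(population_sd, 3))
```

The standard library offers `statistics.pvariance` and `statistics.pstdev` for the same calculation, which can be used to confirm the manual steps.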
3.3.2 Standard deviation of a sample (s)
In practice, most populations are very large, and it is more common to calculate the
sample standard deviation (denoted by s) rather than the population standard
deviation.
The formula is quite similar to that for calculating the population standard deviation.
However, instead of dividing by N as above, we divide by n-1, i.e. we divide by one
less than the total number of sample observations. For the mean we now use 𝑥̅
instead of 𝜇.
As in the population, the sample standard deviation (s) is also derived from the
sample variance (\(s^2\)), which is given by:
\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} \]

Where:
\(s^2\) = Sample variance.
\(x_i\) = ith observation.
\(\bar{x}\) = Sample mean.
n = Number of observations in the sample (also called the sample size).
\(\sum_{i=1}^{n}\) = Sum of, starting from i = 1 up to i = n.
The simplest way to calculate the variance is to use the statistical mode of your
scientific calculator and then recall the values and substitute them into the following
formula:
\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} \]
The sample standard deviation (s) is given by:
\[ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{s^2} = \sqrt{\text{sample variance}} = (\text{sample variance})^{\frac{1}{2}} \]
3.3.3 Method to calculate sample variance:

Entering the values into a calculator can simplify this method further, but let
us go through the method step by step (steps 1 to 5). Step 6 shows how to find
the standard deviation from the variance.
1. Calculate the mean, i.e. find \(\bar{x}\).
2. Calculate the residual for each \(x_i\), i.e. find \((x_i - \bar{x})\).
3. Square the residuals, i.e. find \((x_i - \bar{x})^2\).
4. Calculate the sum of the squares, i.e. find \(\sum_{i=1}^{n}(x_i - \bar{x})^2\).
5. Divide the sum by (n - 1), i.e. find \(\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}\).
6. To find the standard deviation, take the square root of the quantity in
step 5, i.e. find \(\sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}\).
NOTE: To make calculations easier and avoid complications, we
first find the variance and then take the square root at the end in order to obtain
the standard deviation.
Example 3.13
A market researcher was interested in the discrepancy in the prices charged by
supermarkets for a leading brand of pet food. To check this, he selected a random
sample of 12 stores and recorded the price displayed for the same brand of a 400 g
can. The prices (in US cents) were:
89 72 77 78 82 94 76 78 73 80 88 85
Calculate the:
3.13.1 Mean price.

3.13.2 Range of the prices.

3.13.3 Standard deviation of the prices.
Solution
3.13.1 The mean price \(\bar{x}\) is given by:

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{89 + 72 + 77 + 78 + 82 + 94 + 76 + 78 + 73 + 80 + 88 + 85}{12} = \frac{972}{12} = 81 \]
Therefore the mean price is 81 cents.
3.13.2 To find the range it is wise to first arrange the values in ascending
order as follows:
72 73 76 77 78 78 80 82 85 88 89 94
The range of the prices is:

Range = \(x_{max} - x_{min}\) = 94 - 72 = 22
Therefore the range of the prices is 22 cents.
3.13.3 To calculate the standard deviation, we use
either of the following two methods:
Method 1:
We construct Table 3.9 as follows:

Table 3.9: Calculation of the standard deviation for raw data using the first method
PRICE (CENTS)    RESIDUAL              SQUARE OF RESIDUAL
\(x_i\)          \(x_i - \bar{x}\)     \((x_i - \bar{x})^2\)
89               8                     64
72               -9                    81
77               -4                    16
78               -3                    9
82               1                     1
94               13                    169
76               -5                    25
78               -3                    9
73               -8                    64
80               -1                    1
88               7                     49
85               4                     16
\(\sum x_i = 972\)    \(\sum (x_i - \bar{x}) = 0\)    \(\sum (x_i - \bar{x})^2 = 504\)
After constructing the table, the variance \(s^2\) is given by:

\[ s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1} = \frac{64 + 81 + 16 + \ldots + 1 + 49 + 16}{12 - 1} = \frac{504}{11} = 45.818 \]
Hence the sample standard deviation is given by:

\[ s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}} = \sqrt{s^2} = \sqrt{45.818} = 6.769 \]
Therefore, the standard deviation is 6.8 cents.
Method 2:
The sample variance can also be calculated using the formula:

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} \]
Add a column of \(x_i^2\) as shown in Table 3.10 below:
Table 3.10: Calculation of the standard deviation for raw data using the second method
PRICE (CENTS)    SQUARES
\(x_i\)          \(x_i^2\)
89               7 921
72               5 184
77               5 929
78               6 084
82               6 724
94               8 836
76               5 776
78               6 084
73               5 329
80               6 400
88               7 744
85               7 225
\(\sum x_i = 972\)    \(\sum x_i^2 = 79\,236\)
Substituting the values into the formula:

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{79\,236 - 12(81)^2}{12 - 1} = \frac{504}{11} = 45.818 \]
The sample standard deviation is given by:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}} = \sqrt{s^2} = \sqrt{45.818} = 6.769 \]

That is, the sample standard deviation is 6.8 cents.

This is the same value as that obtained with the first method above.
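Both methods of Example 3.13 can be reproduced with a few lines of Python. The variable names below are chosen for this sketch only.

```python
import math

prices = [89, 72, 77, 78, 82, 94, 76, 78, 73, 80, 88, 85]   # cents

n = len(prices)
mean_price = sum(prices) / n

# Method 1: sum of squared residuals about the mean
ss_method_1 = sum((x - mean_price) ** 2 for x in prices)

# Method 2: sum of squares minus n times the squared mean
ss_method_2 = sum(x ** 2 for x in prices) - n * mean_price ** 2

sample_variance = ss_method_1 / (n - 1)    # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(mean_price, ss_method_1, ss_method_2)
print(round(sample_variance, 3), round(sample_sd, 3))
```

The two sums of squares agree, which is why either table leads to the same variance. With less convenient numbers, Method 2 can lose precision in floating-point arithmetic, so Method 1 is usually the safer choice in software.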
3.3.4 Calculation of the standard deviation from grouped data
As in calculating the mean deviation, when calculating s from a grouped frequency
distribution we should assume that the observations in each class interval are
concentrated at the midpoint of the interval. The procedure below can then be
followed.
As with the previous examples, we start by calculating the variance of the grouped
data and then find the square root to obtain the standard deviation.
The formula for estimating the variance from a grouped frequency distribution is
given by:
\[ s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} \qquad s = \sqrt{\frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1}} \]
or, equivalently:
\[ s^2 = \frac{\sum_{i=1}^{k} f_i m_i^2 - n\bar{x}^2}{\sum_{i=1}^{k} f_i - 1} \qquad s = \sqrt{\frac{\sum_{i=1}^{k} f_i m_i^2 - n\bar{x}^2}{\sum_{i=1}^{k} f_i - 1}} \]
Where:
\(\bar{x}\) = The estimated mean of the sample.
\(m_i\) = The midpoint of the ith class interval.
\(f_i\) = The frequency of the ith class interval.
k = Number of class intervals.
n = \(\sum_{i=1}^{k} f_i\), the total frequency.
Example 3.14
One of the ways in which the market value of a used motor vehicle can be estimated
is by using the prices that vehicles of the same type bring in at auctions. Each vehicle
may be classified to be in one of several conditions (e.g. quite good; good; very
good). Much of the guesswork is taken away from car sale yards and insurance
companies, since all they have to do is consult a book (which is usually updated
each month) to find the market value for any make, model and year of any vehicle
in a particular condition. Suppose that we had the task of estimating the current
market value of a six-year-old small sedan of which 50 had been sold at auction the
previous month.
The data supplied are shown in the Table 3.11 below:
Table 3.11: Market value of six-year-old small sedans
SALE PRICE (R)       NUMBER SOLD
[8 000; 8 500)       4
[8 500; 9 000)       10
[9 000; 9 500)       21
[9 500; 10 000)      12
[10 000; 10 500)     3
Total                50
From the sale prices presented in the table above calculate:

3.14.1 The mean.
3.14.2 Standard deviation.

Solution
3.14.1 The mean is found from the equation for calculating the mean from a grouped
frequency distribution. The required calculations are summarized in
Table 3.12 below. Note that the midpoint of each class
interval (\(m_i\)) is found by taking the mean of the two endpoints
of the interval.
Table 3.12: Calculation of the mean from grouped data
SALE PRICE (R)       FREQUENCY    MIDPOINT
Class interval       \(f_i\)      \(m_i\)     \(f_i m_i\)
[8 000; 8 500)       4            8 250       33 000
[8 500; 9 000)       10           8 750       87 500
[9 000; 9 500)       21           9 250       194 250
[9 500; 10 000)      12           9 750       117 000
[10 000; 10 500)     3            10 250      30 750
k = 5                \(\sum f_i = 50\)        \(\sum f_i m_i = 462\,500\)
The sample mean is given by:

\[ \bar{x} = \frac{\sum_{i=1}^{k} f_i m_i}{\sum_{i=1}^{k} f_i} = \frac{462\,500}{50} = 9\,250 \]
Therefore, the mean sale price is R9 250.
3.14.2 Method 1:
The required calculations for the standard deviation using the first
method are summarized in Table 3.13 below.

Table 3.13: Calculation of the standard deviation from grouped data using
the first method
Class interval       \(f_i\)   \(m_i\)    \(f_i m_i\)   \(m_i - \bar{x}\)   \((m_i - \bar{x})^2\)   \(f_i (m_i - \bar{x})^2\)
[8 000; 8 500)       4         8 250      33 000        -1 000              1 000 000               4 000 000
[8 500; 9 000)       10        8 750      87 500        -500                250 000                 2 500 000
[9 000; 9 500)       21        9 250      194 250       0                   0                       0
[9 500; 10 000)      12        9 750      117 000       500                 250 000                 3 000 000
[10 000; 10 500)     3         10 250     30 750        1 000               1 000 000               3 000 000
k = 5                50                   462 500                                                   12 500 000
The grouped variance using the first method is given by the formula:
\[ s^2 = \frac{\sum_{i=1}^{k} f_i (m_i - \bar{x})^2}{\sum_{i=1}^{k} f_i - 1} \]
From the table, \(\sum_{i=1}^{k} f_i (m_i - \bar{x})^2 = 12\,500\,000\), hence:
\[ s^2 = \frac{12\,500\,000}{50 - 1} = 255\,102.0408 \]
\[ s = \sqrt{s^2} = \sqrt{255\,102.0408} = 505.076 \]
Therefore, the standard deviation for the grouped frequency distribution is R505.08.
Method 2:
To calculate the sample variance using this method, add another column for \(m_i^2\) as
shown in Table 3.14.

Table 3.14: Calculation of the standard deviation from grouped data using the second
method
SALE PRICE (R)       FREQUENCY    MIDPOINT
Class interval       \(f_i\)      \(m_i\)     \(m_i^2\)       \(f_i m_i^2\)
[8 000; 8 500)       4            8 250       68 062 500      272 250 000
[8 500; 9 000)       10           8 750       76 562 500      765 625 000
[9 000; 9 500)       21           9 250       85 562 500      1 796 812 500
[9 500; 10 000)      12           9 750       95 062 500      1 140 750 000
[10 000; 10 500)     3            10 250      105 062 500     315 187 500
                     50                                       4 290 625 000
From the tables: \(\sum f_i = 50\), \(\bar{x} = 9\,250\) and \(\sum f_i m_i^2 = 4\,290\,625\,000\).
The sample variance for this method is given by:

\[ s^2 = \frac{\sum_{i=1}^{k} f_i m_i^2 - n\bar{x}^2}{\sum_{i=1}^{k} f_i - 1} = \frac{4\,290\,625\,000 - 50(9\,250)^2}{50 - 1} = \frac{12\,500\,000}{49} = 255\,102.0408 \]
\[ s = \sqrt{s^2} = \sqrt{255\,102.0408} = 505.076 \]
Therefore, the standard deviation for the grouped frequency distribution is
R505.08.
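The grouped calculations of Example 3.14 can be checked in the same way. The (lower limit, upper limit, frequency) layout below is a choice made for this sketch.

```python
import math

# (lower limit, upper limit, number sold) for each class of Table 3.11
classes = [
    (8000, 8500, 4),
    (8500, 9000, 10),
    (9000, 9500, 21),
    (9500, 10000, 12),
    (10000, 10500, 3),
]

midpoints = [(lower + upper) / 2 for lower, upper, _ in classes]
frequencies = [freq for _, _, freq in classes]
n = sum(frequencies)

# Estimated mean: observations are assumed to sit at the class midpoints
grouped_mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Method 1: frequency-weighted squared residuals
ss_method_1 = sum(f * (m - grouped_mean) ** 2 for f, m in zip(frequencies, midpoints))

# Method 2: frequency-weighted squares minus n times the squared mean
ss_method_2 = sum(f * m ** 2 for f, m in zip(frequencies, midpoints)) - n * grouped_mean ** 2

grouped_variance = ss_method_1 / (n - 1)
grouped_sd = math.sqrt(grouped_variance)

print(grouped_mean, ss_method_1, ss_method_2, round(grouped_sd, 2))
```

Because the individual sale prices are replaced by class midpoints, the result is an estimate of the standard deviation rather than its exact value.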
3.3.5 Interpretation of the standard deviation: Empirical rule

The standard deviation can be used to compare the variability of several
distributions and to make a statement about the general shape of a distribution. If the
histogram is bell shaped (symmetric), we can use the Empirical Rule, which states:
• About 68% of the data lie within one standard deviation of the mean.
• About 95% of the data lie within two standard deviations of the mean.
• About 99.7% of the data lie within three standard deviations of the mean.
The empirical rule for the variability of a symmetric distribution is shown in
Figure 3.5.
Figure 3.5: Empirical rule
Example 3.15
Suppose that the mean and standard deviation of last year’s mid-term test marks
are 70 and 5, respectively. If the histogram is bell-shaped then we know that:
• Approximately 68% of the marks fell between 65 and 75 (1 standard
deviation from the mean, 70 ± 5).
• Approximately 95% of the marks fell between 60 and 80 (2 standard deviations
from the mean, 70 ± (2 x 5)).
• Approximately 99.7% of the marks fell between 55 and 85 (3 standard deviations
from the mean, 70 ± (3 x 5)).
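The three intervals of Example 3.15 follow directly from the mean and standard deviation, as the sketch below shows. The function name `empirical_rule_intervals` is hypothetical, and the percentages apply only when the distribution is approximately bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals covering roughly 68%, 95% and 99.7% of bell-shaped data."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}   # standard deviations -> approximate share
    return {label: (mean - k * sd, mean + k * sd) for k, label in coverage.items()}


intervals = empirical_rule_intervals(70, 5)
for label, (low, high) in intervals.items():
    print(f"About {label} of marks lie between {low} and {high}")
```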

3.3.6 The Standard score
The standard score (z-score) represents the number of standard deviations a given
value (x) falls from the mean (µ).
\[ z = \frac{\text{value} - \text{mean}}{\text{standard deviation}} = \frac{x - \mu}{\sigma} \]
EXAMPLE 3.16
In 2007, Forest Whitaker won the Best Actor Oscar at age 45 for his role in the movie
The Last King of Scotland. Helen Mirren won the Best Actress Oscar at age 61 for
her role in The Queen. The mean age of all best actor winners is 43.7, with a
standard deviation of 8.8. The mean age of all best actress winners is 36, with
standard deviation of 11.5, (Larson and Farber, 2019).
Find the z-score that corresponds to the age for the actor and actress. Then
compare the results.
Solution
• Forest Whitaker
\[ z = \frac{x - \mu}{\sigma} = \frac{45 - 43.7}{8.8} \approx 0.15 \]
This is 0.15 standard deviations above the mean.
• Helen Mirren
\[ z = \frac{x - \mu}{\sigma} = \frac{61 - 36}{11.5} \approx 2.17 \]
This is 2.17 standard deviations above the mean.
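The two z-scores of Example 3.16 can be computed with a one-line function. The name `z_score` and the threshold of two standard deviations used to flag an unusual value are taken as working assumptions for this sketch.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


z_actor = z_score(45, 43.7, 8.8)      # Forest Whitaker
z_actress = z_score(61, 36, 11.5)     # Helen Mirren

for name, z in (("Best Actor", z_actor), ("Best Actress", z_actress)):
    # Assumption: values more than 2 standard deviations from the mean are flagged
    note = "unusual" if abs(z) > 2 else "not unusual"
    print(f"{name}: z = {z:.2f} ({note})")
```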
The z-score corresponding to the age of Helen Mirren is more than two standard
deviations from the mean, so it is considered unusual. Compared to other Best
Actress winners, she is relatively older, whereas the age of Forest Whitaker is only
slightly higher than the average age of other Best Actor winners (Larson and Farber,
2019).
3.3.7. Coefficient of Variation
The coefficient of variation is a measure of relative variability. It is used to measure
the changes that have taken place in a population over time, or to compare the
variability of two populations that are expressed in different units of measurement. It
is expressed as a percentage rather than in terms of the units of the particular data.
The formula for the coefficient of variation is:
\[ \text{Coefficient of variation} = \frac{s}{\bar{x}} \times 100\% \]
Where:
\(\bar{x}\) = Sample mean.
s = Sample standard deviation.
EXAMPLE 3.17
Calculate the coefficient of variation if the mean sale price of a sample of used sedans
is R9 250 and the standard deviation is R505.076 (see Example 3.14).
Solution
The coefficient of variation is given by:

\[ \text{Coefficient of variation} = \frac{s}{\bar{x}} \times 100\% = \frac{505.076}{9\,250} \times 100\% = 5.46\% \]
Therefore, the coefficient of variation is 5.46%.
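Because the coefficient of variation is unit-free, it allows a comparison between data sets measured in different units. The sketch below contrasts the sedan prices (in rand) with the pet food prices of Example 3.13 (in cents); the function name is chosen for illustration only.

```python
def coefficient_of_variation(sd, mean):
    """Relative variability: standard deviation as a percentage of the mean."""
    return sd / mean * 100


cv_sedans = coefficient_of_variation(505.076, 9250)   # rand, Example 3.14
cv_pet_food = coefficient_of_variation(6.769, 81)     # cents, Example 3.13

print(round(cv_sedans, 2), round(cv_pet_food, 2))
```

Although the sedan prices have a far larger standard deviation in absolute terms, the pet food prices vary more relative to their mean.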
3.4 Using Microsoft Excel to calculate data descriptors
Excel functions and the Data Analysis add-in can be used to compute all the
descriptive statistical measures of location, non-central location, spread and shape.

3.4.1 Excel functions
Below is a list of Excel functions used for descriptive statistical measures.
3.4.1.1 Central location measures
The following functions can be used:

• Arithmetic mean =AVERAGE(data range).
• Median =MEDIAN(data range).
• Mode =MODE(data range).
• Geometric mean =GEOMEAN(percentage change data).
Note: For the geometric mean, each percentage change must be entered as a
decimal growth factor (e.g. a 12% increase is entered as 1.12, while an 8% decrease is
entered as 0.92).
For example:
=GEOMEAN(1.12,1.08,1.16) = 1.1195 (i.e. an 11.95% average increase).
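For readers without Excel, the same geometric mean can be reproduced in Python as a cross-check. The sketch uses the standard library function `statistics.geometric_mean`, available from Python 3.8, alongside the definition as the nth root of the product.

```python
import math
from statistics import geometric_mean

growth_factors = [1.12, 1.08, 1.16]   # 12%, 8% and 16% increases

g_library = geometric_mean(growth_factors)

# The same result from the definition: nth root of the product of the factors
g_manual = math.prod(growth_factors) ** (1 / len(growth_factors))

average_increase = (g_library - 1) * 100
print(round(g_library, 4), round(average_increase, 2))
```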
3.4.1.2 Non-central location measures

• Minimum data value =MIN(data range).
• Lower quartile =QUARTILE(data range,quart) (quart = 1 refers to Q1).
• Upper quartile =QUARTILE(data range,quart) (quart = 3 refers to Q3).
• Maximum data value =MAX(data range).
• Percentiles =PERCENTILE(data range,percent (as a decimal)).
The QUARTILE function can be used to compute all the values of the five-number
summary (i.e., min, Q1, median, Q3, max) by assigning different numeric codes to the
‘quart’ term in the function as follows: 0 = minimum data value; 1 = Q1; 2 = median; 3
= Q3; and 4 = maximum data value.
3.4.1.3 Dispersion and shape

• Variance =VAR(data range).
• Standard deviation =STDEV(data range).
• Skewness =SKEW(data range).
3.4.2 Data Analysis Add-in
The Descriptive Statistics option in the Data Analysis add-in as shown in Figure 3.6
will compute all the above descriptive measures, except the quartiles.
Figure 3.6: The descriptive statistics option in Data Analysis add-in

Conclusion
This chapter discussed two categories of data descriptors, namely measures of central
tendency and measures of dispersion. In the former category were the mean, median
and mode; in the latter were the range, interquartile range and standard deviation.
Measures of central tendency describe a typical or representative score, while
measures of variability describe the spread or dispersion of scores about a central
measure. Most descriptive measures can be computed in Excel by using either the
appropriate functions or the Descriptive Statistics option in the Data Analysis add-in.

3.1. A supporter of bicycle lanes around the city recorded the number of
bicycles that passed by her house between 7 am and 9 am for each of 10
successive weekdays. The results were:
15 18 23 22 21 18 14 20 25 12
3.1.1 Find the mean, median and range of these data.
3.1.2 Find the mean deviation.
3.1.3 Find the standard deviation.
3.2 The systolic blood pressure (in mmHg) was recorded for 40 salespeople at 5 pm
on a Friday afternoon. The results are shown in Table 3.15 below:
Table 3.15: The systolic blood pressure of 40 salespeople
Blood pressure (mmHg)    Frequency (f_i)
110-under 120            5
120-under 130            9
130-under 140            15
140-under 150            11
Total                    40
For the blood pressures of the salespeople, estimate the:
3.2.1 Mean, mode and median.
3.3. A set of data has a standard deviation 1.5 times that of the mean. What is
the value of the coefficient of variation?
3.4. A company secretary has recorded the amount of petty cash used by
the records department for each of the past 10 weeks. The amounts were (in
dollars):

32.50 56.00 48.00 43.20 78.00
21.60 34.90 66.00 40.00 18.80
For this data calculate the:
3.4.1 Range.
3.4.2 Mean deviation.
3.4.3 Interquartile range.
3.4.4 Quartile deviation.
3.4.6 Coefficient of variation.
3.5. Consider an investment whose return is normally distributed with a mean of
10% and a standard deviation of 5%.
3.5.1 Determine the z-score of earning zero percent on this investment.
3.5.2 Find the z-score of earning 12%, when the standard deviation is equal
to 10%.
3.6 The daily electricity consumption in kilowatt hours (kWh) by a sample of 20
households in Rustenburg is recorded in Table 3.16 below. Use Excel to find
descriptive statistics for the data
Table 3.16: Daily electricity consumption for 20 households
58 50 33 51 38 43 60 55 46 43
51 47 40 37 43 48 61 55 44 35

CHAPTER FOUR:
ELEMENTARY PROBABILITY
Learning Outcomes
• Understand elementary probabilities and their usefulness in estimating the
likelihood of events occurring.
• Distinguish between mutually exclusive and independent events and their
application in assessing the likelihood of events occurring.
• Calculate conditional probabilities and their application in assessing the
likelihood of events occurring.
• Understand and use the general addition law for probabilities and its usefulness
in assessing the likelihood of events occurring.
4.1 Introduction to probability
We are now going to move away from the summary and analysis of data and look
at a new topic, probability. ‘The likelihood of rain this afternoon is fifty percent’ warns
the weather report from your radio alarm clock. ‘There’s no chance of you catching
that bus’ grunts the helpful soul as you puff up the hill. The headline on your
newspaper screams ‘Odds of Rainbow Party winning the election rise to one in
four’. There are a number of ways of analysing uncertainty. Underlying all of these
methods is, however, one concept: probability. An understanding of the concept of
probability is vital if you are to take account of uncertainty.
4.2 Basic concept
The fundamental idea of probability is captured in a number of words and
expressions, including chance, probable, odds, and so forth. In all cases we are
faced with a degree of uncertainty and concerned with the likelihood of a particular
event happening. Statistically, these words and phrases are too vague; we need
some measure of the likelihood of an event occurring. This measure is termed
probability and is measured on a scale ranging from 0 to 1.
From Figure 4.1 we observe that probability values lie between 0 and 1, with 0
representing an event that cannot occur and 1 representing an event that is
certain to occur.
Figure 4.1: Probability value
In practice, the probability of most events lies strictly between 0 and 1. In order to determine
the probability of an event occurring, data has to be obtained. This can be achieved
through, for example, experience, observation, or empirical methods.
The procedure or situation that produces a definite result (or outcome) is termed a
random experiment. Tossing a coin, rolling a die, recording the income of a factory
worker, and determining defective items on an assembly line are all examples of
experiments. The characteristics of a random experiment are:
• Each experiment is repeatable.
• All possible outcomes can be described.
• Although individual outcomes appear haphazard, continual repeats of the
experiment will produce a regular pattern.
The result of an experiment is called an outcome. It is a single possible result of
an experiment; for example, tossing a coin produces ‘a head’, and rolling a die gives a ‘3’.
If we accept the proposition that an experiment can produce a finite number of
outcomes, then we could, in theory, define all these outcomes.
The set of all possible outcomes is defined as the sample space. For example, the
experiment of rolling a die could produce the outcomes 1, 2, 3, 4, 5, and 6, which
would thus define the sample space.

Another basic notion is the concept of an event, which is simply a set of possible
outcomes. This implies that an event is a subset of the sample space. For example,
in the experiment of rolling a die, the event of obtaining an even number would
be defined as the subset {2, 4, 6}.
Finally, two events are said to be mutually exclusive if they cannot occur together.
Thus, in rolling a die, the event ‘obtaining a two’ is mutually exclusive of the event
‘obtaining a three’. The event ‘obtaining a two’ and the event ‘obtaining an even
number’ are not mutually exclusive as both can occur together, i.e. {2} is a subset
of {2, 4, 6}.
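The ideas of sample space, event and mutual exclusivity can be illustrated with sets. The short Python sketch below is only an illustration (the variable and function names are our own and do not come from the prescribed textbooks); it treats the sample space of a die roll as a set and events as subsets of it.

```python
# Sample space for one roll of a six-sided die
sample_space = {1, 2, 3, 4, 5, 6}

# Events are subsets of the sample space
even = {2, 4, 6}   # 'obtaining an even number'
two = {2}          # 'obtaining a two'
three = {3}        # 'obtaining a three'


def mutually_exclusive(event_a, event_b):
    """Two events are mutually exclusive if they share no outcomes."""
    return event_a.isdisjoint(event_b)


print(even <= sample_space)             # True: an event is a subset of the sample space
print(mutually_exclusive(two, three))   # True: a roll cannot be both 2 and 3
print(mutually_exclusive(two, even))    # False: a roll of 2 is also even
```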
Definitions
Probability: Probability provides a quantitative description of the likely
occurrence of a particular event.
Chance: Chance is the unknown and unpredictable element in
happenings that seems to have no assignable cause.
Uncertainty: Uncertainty is a state of having limited knowledge where it
is impossible to describe exactly the existing state or future outcome of
a particular event occurring.
Event: An event is any collection of outcomes of an experiment.
Outcome: An outcome is the result of an experiment or other situation
involving uncertainty.
Random experiment: A random experiment is an experiment, trial, or
observation that can be repeated numerous times under the same
conditions. It can also be defined as a situation, procedure or activity that
will give rise to a clearly defined result.
Sample space: The sample space is an exhaustive list of all the
possible outcomes of an experiment.
Mutually exclusive: Mutually exclusive represents events that cannot
occur at the same time.
4.3. The anatomy of probability problems
Any probability problem can be dissected into its basic elements. Data can be obtained through,
for instance, experimentation, observation, or experience. A random experiment is a
process or circumstance that produces clearly defined results, and these results
are referred to as outcomes. The
word “experiment” may be deceptive in this context since it can refer to a wide range
of methods or circumstances in which the results are not known in advance; it
does not relate only to experiments carried out by researchers.
Definition
Outcomes: The potential result of a random experiment. The exact value is
unknown before the experiment but known after the experiment has been
conducted.
4.3.1 Sample space
An exhaustive list of all possible outcomes of a certain type for a given experiment is
called the sample space of the experiment. We use the symbol Ω to denote this
complete sample space. For example, consider a simple random experiment in which
we toss two coins. The outcome recorded in the experiment is the pair of faces that
show on the two coins after they have landed. The complete set of possible outcomes for
this experiment is given by:
Ω = {heads & heads, heads & tails, tails & heads, tails & tails}
4.3.2 An event
In typical probability calculations, we are usually interested in the probability that a
particular condition is satisfied or that one or more of the outcomes listed in the sample
space occurs. We call these conditions or occurrences ‘events’. Formally, an
event is a collection of one or more outcomes from an experiment that satisfies
some predetermined condition. We use uppercase letters to denote such a subset
or event, that is, A, B, C and so on. You will often be interested in the likelihood
of a particular group of the possible outcomes of an experiment. Recalling the random
experiment in which we tossed two coins, one event that we could be
interested in is that at least one head appears on the two coins. This event can
be expressed as follows:
A = {heads & heads, heads & tails, tails & heads}
This event contains three of the four outcomes that appear in the sample space Ω.
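As an illustration only, the two-coin experiment can be enumerated in Python. The sketch below assumes two distinguishable coins and uses the standard library function itertools.product to list every outcome; the names omega and event_a are our own labels for the sample space and for the event ‘at least one head’.

```python
from itertools import product

faces = ("heads", "tails")

# Sample space: every ordered pair of faces for coin 1 and coin 2
omega = list(product(faces, repeat=2))

# Event A: at least one head appears on the two coins
event_a = [outcome for outcome in omega if "heads" in outcome]

print(omega)
print(event_a)
print(len(event_a), "of", len(omega), "outcomes belong to event A")
```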
4.4 Simple Probability
If A is an event, the probability that it occurs is denoted by P(A). When all outcomes
are equally likely, the probability (or chance) that an event A occurs is the proportion
of possible outcomes in the sample space that yield the event A. That is:
$$P(A) = \frac{\text{Number of outcomes that yield event } A}{\text{Number of possible outcomes}}$$
EXAMPLE 4.1
What is the probability of rolling an odd number on a die (i.e. rolling a 1, 3 or 5)?
Solution
$$P(A) = \frac{\text{Number of outcomes that yield event } A}{\text{Number of possible outcomes}} = \frac{3}{6} = \frac{1}{2}$$
EXAMPLE 4.2
In a deck of 52 cards, what is the probability of drawing an 'ace of spades'?
Solution

In a deck of 52 cards, there are 4 aces. Out of those 4 aces there is just
one ace of spades. Let A be the event of drawing an ace of spades. Then
$$P(A) = \frac{\text{Number of outcomes that yield event } A}{\text{Number of possible outcomes}} = \frac{1}{52}$$
Therefore, the probability of drawing ‘an ace of spades’ is 1/52.
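Examples 4.1 and 4.2 can be checked with a few lines of Python. This is a minimal sketch that assumes every outcome is equally likely; the function name probability and the way the deck is represented are our own choices for illustration.

```python
from fractions import Fraction
from itertools import product


def probability(event, sample_space):
    """Favourable outcomes divided by possible outcomes (equally likely case)."""
    return Fraction(len(event), len(sample_space))


# Example 4.1: rolling an odd number on a six-sided die
die = [1, 2, 3, 4, 5, 6]
odd = [face for face in die if face % 2 == 1]
p_odd = probability(odd, die)

# Example 4.2: drawing the ace of spades from a 52-card deck
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))
ace_of_spades = [card for card in deck if card == ("A", "spades")]
p_ace_of_spades = probability(ace_of_spades, deck)

print(p_odd)            # 1/2
print(p_ace_of_spades)  # 1/52
```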
4.4.1 Mutually exclusive events
Two events A and B are said to be mutually exclusive if they cannot occur
simultaneously (at the same time).
Figure 4.2: Mutually exclusive events (a sample space S containing two non-overlapping events, Y and N)
As you can see from Figure 4.2 above, Y and N are mutually exclusive events,
i.e., Y and N have no points in common.
Example
A coin is tossed. If events A and B are defined as follows:
A = Outcome is a head.
B = Outcome is a tail.
Are events A and B mutually exclusive?

Solution
When you toss a coin there are just two possibilities at a time. That is, one can get
a head or tail and never both on one toss.
Since A and B cannot occur at the same time, these events are mutually exclusive.
EXAMPLE 4.3
A person is selected at random from the population. Define events A and B to be:
A = Person is a vegetarian.
B = Person is a non-vegetarian.
Are these two events mutually exclusive?
Solution
Once a person is said to be a vegetarian, we cannot call the same person
a non-vegetarian. In other words, one person cannot be both a vegetarian and a
non-vegetarian at the same time.
Since A and B cannot occur at the same time, the events are mutually
exclusive.
Suppose that A1, A2, A3, ..., An are n mutually exclusive events. Then the following
relationship holds:
$$P(A_1 \text{ or } A_2 \text{ or } A_3 \text{ or } \dots \text{ or } A_n) = P(A_1) + P(A_2) + P(A_3) + \dots + P(A_n)$$
That is, the probability that event A1 or A2 or A3 or ... or An occurs is
the sum of the individual probabilities that each event occurs.

EXAMPLE 4.4
A card is drawn from a deck of 52 cards. Let events A and B be:

A = Card is a face card.
B = Card has a points value less than 6.
What is the probability that the card is a face card or has a points value less than 6?
Solution
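The events A and B are clearly mutually exclusive since they cannot both occur
at the same time.
The deck contains 12 face cards and 16 cards with a points value less than 6, so we have:
$$P(A) = \frac{12}{52} = \frac{3}{13}$$
$$P(B) = \frac{16}{52} = \frac{4}{13}$$
Hence,
$$P(A \text{ or } B) = P(A) + P(B) = \frac{3}{13} + \frac{4}{13} = \frac{7}{13}$$
Therefore, the probability that the card is a face card or has a points value less than six is 7/13.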
EXAMPLE 4.5
A card is drawn from a deck of 52 cards. Define events

A = Card is 2 or 3 or 4.

B = Card is 7 or 8.
C = Card is a face card.
Calculate P (A or B or C).
Solution
Events A, B and C are mutually exclusive. Their probabilities are as follows:
$$P(A) = \frac{12}{52} = \frac{3}{13}$$
$$P(B) = \frac{8}{52} = \frac{2}{13}$$
$$P(C) = \frac{12}{52} = \frac{3}{13}$$
Since the events A, B and C are mutually exclusive,
$$P(A \text{ or } B \text{ or } C) = \frac{3}{13} + \frac{2}{13} + \frac{3}{13} = \frac{3 + 2 + 3}{13} = \frac{8}{13}$$
Therefore, the probability that the card is {2 or 3 or 4} or {7 or 8} or {a face card} is 8/13.
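The addition rule for mutually exclusive events used in Example 4.5 can be verified by counting cards directly. The sketch below is illustrative only: the deck representation and the helper function p are our own, and the face cards are assumed to be the jack, queen and king of each suit.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

# Events from Example 4.5, written as sets of cards
event_a = {card for card in deck if card[0] in {"2", "3", "4"}}
event_b = {card for card in deck if card[0] in {"7", "8"}}
event_c = {card for card in deck if card[0] in {"J", "Q", "K"}}


def p(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))


# Sum of the individual probabilities (valid because the events do not overlap)
p_sum = p(event_a) + p(event_b) + p(event_c)

# Probability of the combined event, counted directly
p_direct = p(event_a | event_b | event_c)

print(p_sum, p_direct)  # 8/13 8/13
```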
4.4.2 Independent events
Two events A and B are independent if the occurrence of one does not alter the
likelihood of the other event occurring. Events that are not independent are called
dependent.
Suppose that A1, A2, A3, ..., An are n independent events. Then:
$$P(A_1 \text{ and } A_2 \text{ and } A_3 \text{ and } \dots \text{ and } A_n) = P(A_1) \times P(A_2) \times P(A_3) \times \dots \times P(A_n)$$

Think Point
It is common to mix up the terms ‘statistically independent’ events and ‘mutually
exclusive’ events. They are two very different concepts. The distinction between
them is as follows:
• When two events are mutually exclusive, they cannot occur simultaneously.
• When two events are statistically independent, they can occur together, but
the occurrence of one does not affect the likelihood of the other (Wenger,
2017).
EXAMPLE 4.6
Two coins are tossed independently. Find the probability that both coins show
heads.
Solution
Let’s define events A and B as follows:
A = Head on first coin.
B = Head on second coin.
For a fair coin,
$$P(A) = P(B) = \frac{1}{2} = 0.5$$
Since events A and B are independent, the probability that both coins show heads is:
$$P(A \text{ and } B) = P(A) \times P(B) = \frac{1}{2} \times \frac{1}{2} = \frac{1}{4}$$

EXAMPLE 4.7
Suppose a coin is tossed, a six-sided die is rolled, and a card is drawn
randomly from a deck of 52 cards. Define events A, B and C as follows:
A = Coin shows a tail.
B = Die shows a 3.
C = Card is a heart.
Find P(A and B and C).
Solution
The probabilities of A, B and C are given as follows:
$$P(A) = \frac{1}{2}, \quad P(B) = \frac{1}{6}, \quad P(C) = \frac{13}{52} = \frac{1}{4}$$
Since these events are independent, to find the probability that all three occur
we use:
$$P(A \text{ and } B \text{ and } C) = P(A) \times P(B) \times P(C) = \frac{1}{2} \times \frac{1}{6} \times \frac{1}{4} = \frac{1}{48}$$
Hence, the probability that all three events will occur is 1/48.
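The multiplication rule in Example 4.7 can be cross-checked by listing every combined outcome. The following sketch is an illustration under the assumption of a fair coin, a fair die and a well-shuffled deck; because each suit has 13 cards, the card is represented by its suit only, which keeps all combined outcomes equally likely.

```python
from fractions import Fraction
from itertools import product

coin = ["head", "tail"]
die = [1, 2, 3, 4, 5, 6]
suits = ["hearts", "diamonds", "clubs", "spades"]  # each suit is equally likely

# Multiplication rule for independent events
p_rule = Fraction(1, 2) * Fraction(1, 6) * Fraction(1, 4)

# Direct count over all combined outcomes (coin, die, suit)
outcomes = list(product(coin, die, suits))
favourable = [o for o in outcomes if o == ("tail", 3, "hearts")]
p_count = Fraction(len(favourable), len(outcomes))

print(p_rule, p_count)  # 1/48 1/48
```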
4.4.3 Complementary events
The complement of an event consists of those outcomes of the sample space for which the
event does not occur. Two events that are complements of each other are said to be
complementary. A and Ac in Figure 4.3 are complements.
Figure 4.3: Complement of events

EXAMPLE 4.8
Suppose that a six-sided die is thrown. Let us define an event A as
follows:
A = Outcome is 1 or 2.
Then the complement of event A would be event B, where:
B = Outcome is 3 or 4 or 5 or 6.
If two events A and B are complementary (and therefore mutually exclusive), then:
$$P(A) + P(B) = 1$$
In Example 4.8, P(A) = 2/6 and P(B) = 4/6, which sum to 1.
4.5. Conditional Probabilities

The probability that event A occurs, given that an event B has already occurred, is
called the conditional probability of A given B. The notation for this conditional
probability is P(A | B), and it is given by:
$$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$$
or
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$$
provided that P(B) is greater than 0.
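A small worked illustration may help here. The sketch below uses a hypothetical example that does not appear in the text: a fair die is rolled, A is the event ‘the roll is a 2’ and B is the event ‘the roll is even’. It applies the conditional probability formula with sets, where A & B is the intersection of the two events.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
event_a = {2}          # hypothetical event A: the roll is a 2
event_b = {2, 4, 6}    # hypothetical event B: the roll is even


def p(event):
    """Probability of an event when all six faces are equally likely."""
    return Fraction(len(event), len(sample_space))


# P(A | B) = P(A and B) / P(B)
p_a_given_b = p(event_a & event_b) / p(event_b)

print(p(event_a))    # 1/6 before anything is known about the roll
print(p_a_given_b)   # 1/3 once the roll is known to be even
```

Knowing that B has occurred reduces the possible outcomes from six to three, which is why the probability of A rises from 1/6 to 1/3.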
4.6 General Addition Law

When two events are not mutually exclusive, we should use the following general
addition law:
$$P(A \text{ or } B) = P(A) + P(B) - P(A \text{ and } B)$$
Which is the same as:
$$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$
Note: If the events A and B are mutually exclusive, P(A ∩ B) = 0.
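To see why the overlap must be subtracted, consider a hypothetical example that is not taken from the text: drawing one card and asking for the probability that it is a heart or a face card. The sketch below assumes the face cards are the jack, queen and king of each suit, and compares the general addition law with a direct count.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))

hearts = {card for card in deck if card[1] == "hearts"}           # 13 cards
face_cards = {card for card in deck if card[0] in {"J", "Q", "K"}}  # 12 cards


def p(event):
    """Probability of an event when every card is equally likely."""
    return Fraction(len(event), len(deck))


# General addition law: subtract the cards counted twice (face cards that are hearts)
p_law = p(hearts) + p(face_cards) - p(hearts & face_cards)

# Direct count of the union
p_direct = p(hearts | face_cards)

print(p_law, p_direct)  # 11/26 11/26
```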
4.7. Union and intersection of events
Sample spaces and events are often presented in a visual display called a Venn
diagram. While there are several variations as to how these diagrams are drawn,
we will use the following conventions.
1. A sample space is represented by a rectangle.
2. Events are represented by regions within the rectangle. This is usually
done using circles (or parts of circles).
The union of two events A and B is the set of all outcomes that are in event A or
event B (or both). An example of a union of events is shown in Figure 4.4 below:
Figure 4.4: Union of events
The union of events A and B is written A ∪ B.
The intersection of two events A and B is the set of all outcomes that are in both
event A and event B.
Figure 4.5: Intersection of events
The intersection of events A and B is written A ∩ B.
Hence, we could write, for example, P(A ∩ B) instead of P(A and B).
Venn diagrams are used to assist in presenting a picture of the union and
intersection of events, and in the calculation of probabilities.
EXAMPLE 4.9
For events A, B and C use a Venn diagram to represent the following:
(i) Event A.
(ii) Event B.
(iii) The complement of event A.
(iv) The union of events A and B.
(v) The intersection of events A and B.
(vi) The intersection of events A, B, and C.
Solution
[Venn diagrams for parts (i) to (vi)]
4.8 COUNTING TECHNIQUES
There is a reverse way of looking at probabilities. Instead of asking what the probability
is that an isolated event may occur, one is interested in the number of alternative ways
a set of events may turn out. Of course, if the total number n of equally likely alternative
ways that an event can turn out is known, then the probability that one particular way
turns out is 1/n. Consider the twenty-six letters of the alphabet. One may be interested
in the alternative ways of arranging them. Computation shows that there are
26! ≈ 4.03 × 10^26 (roughly 400 trillion trillion) ways of arranging these letters. The way which we
have memorized since we were small is just one of these. The probability that this
particular order is selected at random is therefore about 1/(4.03 × 10^26). Such problems and
many more are the concern of the counting techniques of statistics.
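The number of arrangements of the alphabet is 26 factorial, which is easy to confirm by computer. The sketch below is a simple illustration using Python's standard math.factorial function; the variable names are our own.

```python
import math
from fractions import Fraction

# Number of ways of arranging the 26 letters of the alphabet
arrangements = math.factorial(26)

# Probability of picking one particular order (e.g. A to Z) at random
p_alphabetical = Fraction(1, arrangements)

print(arrangements)
print(f"{arrangements:.3e}")
print(float(p_alphabetical))
```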

One method of approaching simpler problems of this type is by means of decision trees.
This involves dividing the problem into
several events (or stages) and drawing the various interconnections between them. Suppose,
for example, that a car assembly line produces cars with three
engine capacities: 1100cc, 1500cc, and 2000cc. Suppose further that
each engine capacity is available in three colours (yellow,
blue and cream) and in three body types (Saloon, Pick-Up
and Station-Wagon). One may wonder what the probability is of obtaining one car
which is 1500cc, yellow in colour and a saloon. This is a typical simple problem which
can be solved by means of a decision tree, as shown in Figure 4.6.
Figure 4.6: A decision tree
One can count the end points of the decision tree and find that there are
3 × 3 × 3 = 27 alternatives available. If each alternative is equally likely, the probability
of obtaining one particular path of alternatives, such as 1500cc, yellow and then saloon,
is exactly 1/27.
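The end points of the decision tree can also be listed by computer. The sketch below is illustrative: it assumes that every combination of engine, colour and body type is equally likely, and it uses itertools.product to generate each path through the tree.

```python
from fractions import Fraction
from itertools import product

engines = ["1100cc", "1500cc", "2000cc"]
colours = ["yellow", "blue", "cream"]
bodies = ["saloon", "pick-up", "station-wagon"]

# Each tuple is one path through the three stages of the decision tree
paths = list(product(engines, colours, bodies))

target = ("1500cc", "yellow", "saloon")
p_target = Fraction(paths.count(target), len(paths))

print(len(paths))  # 27
print(p_target)    # 1/27
```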
Decision trees are convenient only for evaluating simple alternatives like the one
given. As the number of stages grows, it becomes impractical to keep
drawing tree diagrams, because they become prohibitively
complicated. This is where alternative counting techniques are necessary.
Using these counting techniques and appropriate computer programs is how we are
able to count all the possible arrangements of the 26 letters of the alphabet. These
are the methods that we briefly introduce in the discussion that follows.
Learners are advised to read more in the prescribed textbooks.
4.8.1 Permutations
This is one technique that can save us from the pain of
always drawing decision trees. It answers questions about
how many arrangements of n objects, taken r at a time, are
possible. In permutations the order of arrangement is important. The
formula used to compute permutations is:
$${}^{n}P_{r} = \frac{n!}{(n - r)!}$$
The formula reads: the number of permutations of n objects, taken r at a time, is equal
to the ratio of n! to (n − r)!. Here n is the total number of objects, r is the number of
objects selected and arranged each time, and P is the number of permutations, that is,
the number of alternative arrangements.
EXAMPLE 4.10
Find the number of all possible arrangements of five objects, taking three at a time,
where the order in which the objects are arranged matters.

Solution
Here the order of arrangement matters, so we use the permutation formula with
n = 5 and r = 3:
$${}^{5}P_{3} = \frac{5!}{(5 - 3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{2!} = 5 \times 4 \times 3 = 60$$
This means you can arrange five objects, taking three objects at a time, in 60 different
ways.
Learners must try as many problems as possible to familiarize themselves with the use
of permutations. If every arrangement is equally likely, the probability of obtaining any
one particular arrangement is 1/60 = 0.016667.
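Example 4.10 can be confirmed both from the formula and by listing the arrangements. The sketch below is an illustration only; the five objects are given the hypothetical labels A to E, and math.perm (available from Python 3.8) is used as a cross-check on the hand-written formula.

```python
import math
from itertools import permutations


def n_p_r(n, r):
    """Number of permutations of n objects taken r at a time: n! / (n - r)!"""
    return math.factorial(n) // math.factorial(n - r)


objects = ["A", "B", "C", "D", "E"]  # hypothetical labels for the five objects

count_formula = n_p_r(5, 3)
count_listed = len(list(permutations(objects, 3)))  # order matters: ABC differs from ACB
count_builtin = math.perm(5, 3)

print(count_formula, count_listed, count_builtin)  # 60 60 60
```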
4.8.2 Combinations
We use combinations when the order of arrangement of the objects is not important. In
that case, the formula is as follows:
$${}^{n}C_{r} = \frac{n!}{r!\,(n - r)!}$$
If we examine the formula, we note that it is similar to the one for computing
permutations, but there is an additional factor of r! in the denominator.
The formula reads: the number of combinations of n objects, taking r objects at a time,
is the ratio of n! to the product of r! and (n − r)!.
EXAMPLE 4.11
Find the number of all possible selections of five objects, taking two at a time,
where the objects need not be arranged in any specific order.
Solution
Here no specific order is prescribed, so we use the combination formula with
n = 5 and r = 2:
$${}^{5}C_{2} = \frac{5!}{2!\,(5 - 2)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{(2 \times 1)(3 \times 2 \times 1)} = \frac{5 \times 4}{2} = 10$$
This means that there are ten different ways of selecting two objects from five
when the order of the objects does not matter. If every selection is equally likely, the
probability of obtaining any one particular combination is therefore 1/10 = 0.1.
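Example 4.11 can be checked in the same way as the permutation example. In this illustrative sketch the five objects again carry the hypothetical labels A to E; math.comb (available from Python 3.8) is used as a cross-check, and the comparison with permutations shows the effect of dividing by r!.

```python
import math
from itertools import combinations


def n_c_r(n, r):
    """Number of combinations of n objects taken r at a time: n! / (r! (n - r)!)"""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))


objects = ["A", "B", "C", "D", "E"]  # hypothetical labels for the five objects

count_formula = n_c_r(5, 2)
selections = list(combinations(objects, 2))  # order ignored: AB and BA count once
count_builtin = math.comb(5, 2)

print(count_formula, len(selections), count_builtin)  # 10 10 10
print(math.perm(5, 2) // math.factorial(2))           # 20 ordered pairs / 2! = 10
```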

4.9 SUMMARY
This chapter covered probability. Probability is the likelihood of something occurring.
The chapter introduced the basic concepts of probability (random experiments, outcomes,
sample spaces and events), the rules for mutually exclusive, independent, complementary
and conditional events, the general addition law, and counting techniques.
Probability is important to business in that we are able to
estimate the likelihood of certain outcomes occurring and take the necessary
precautions.

4.1. A coin is tossed 3 times. Find the probability that the
outcome is:
4.1.1 All heads.
4.1.2 2 heads and 1 tail.
4.2. A six-sided die is rolled 3 times. Find the probability that all 3 outcomes are
greater than 4.
4.3. A box contains 3 red balls, 5 black balls and 8 blue balls.
Find the probability that a ball chosen at random is:
4.3.1 Red.
4.3.2 Blue.
4.3.3 Black.
4.3.4 Blue or red.
4.4 Consider sets A and B as follows:
A = {1; 5; 7; 13; 22}    B = {2; 6; 13; 31; 47}
Illustrate the following on a Venn diagram:
4.4.1 A ∪ B
4.4.2 A ∩ B
4.5. Suppose we randomly select two persons from the members of a club and
observe whether the person selected each time is a man or a woman. Write
all the outcomes for this experiment. Draw the Venn and tree diagrams for
this experiment.

CHAPTER FIVE
LINEAR CORRELATION AND REGRESSION
Learning Outcomes
• Carry out calculations on correlation and regression analysis for business
research.
• Carry out correlation analysis in order to quantify the degree to which two
variables are related.
• Carry out regression analysis in order to assess the relationship between an
outcome variable and one or more risk factors or confounding variables.
5.1 Introduction to regression and correlation analysis
Sales performance analysis and sales forecasting are two very important activities
within the business management function. Each involves examining the impacts of the
various elements of the marketing mix (price, promotion, place, and product) on the
level of sales volumes. If the relationship between sales volumes and the various
marketing mix elements can be measured and quantified, marketing managers will
have a powerful tool at their disposal to influence sales volumes through their
decisions on pricing.
This unit examines the likely relationship between numeric random variables (e.g. sales
volume and advertising expenditure). Understanding and quantifying such
relationships can considerably enhance the planning function of marketing
management. Regression analysis and correlation analysis are two statistical methods
which together describe and measure the strength of possible relationships
between business management related variables.

5.2 Simple Linear Regression
Simple linear regression analysis finds a straight-line relationship between two
numerically scaled random variables. One variable is called the independent or
predictor variable and the other is termed the dependent or response variable.
5.2.1 Independent variable x
The independent variable is represented by the symbol x. It is the variable which is
assumed to influence the outcome of the other variable. Hence it is also called the
predictor variable. Its values are usually known or easily determined. In certain
instances, the independent variable’s values can be controlled or manipulated. In
marketing, for instance, the independent variable could be one of:
• The marketing mix variables, such as price, in-store promotion expenditure, number
of advertisement insertions, size of advertisements, or level of advertising expenditure.
• Consumer attributes, such as personal disposable income, family size, or age of
breadwinner.
• Economic indicators (e.g., GNP, CPI, export/import volumes).
5.2.2 Dependent variable y
The other random variable is called the dependent variable and is represented by the
symbol y. The dependent variable is assumed to be influenced (or determined) by the
independent variable.
Examples of independent and dependent variables
Table 5.1: Examples of independent and dependent variables
Independent variable x            Dependent variable y
(Potential predictor of y.)       (Variable to be estimated from x.)
Advertising levels.               Company turnover.
Training effort.                  Labour productivity.
Speed.                            Fuel consumption.
Hours worked.                     Machine output.
Daily temperature.                Electricity demand.
Hours studied.                    Statistics result.
Product X’s price.                Product X’s sales level.
Bond interest rate.               Number of bond defaulters.
Cost of living.                   Poverty.
Shelf space allocated.            Sales volume of brand.
Consumer age.                     Cinema attendance.
Household disposable income.      Number of cars owned.
Consumer attitude to a brand.     Brand sales levels.
In all the above examples, the independent variable, x, is considered to influence the
outcome of the dependent variable, y.
5.3 Regression Analysis
Regression analysis is concerned with mathematically quantifying the structural
relationship between two numeric random variables, while correlation analysis
identifies the strength of this association.
5.3.1 Regression equation
The equation of the line or curve that is fitted through the scatter plot of two variables is
called the regression equation. A regression equation can take a linear, quadratic, cubic or
exponential form. In this module we will focus on the straight-line regression equation, which is
given as follows:
y = a + bx
Where:
y = Dependent variable.
x = Independent variable.
a = y-intercept (a regression coefficient).
b = Gradient or slope of the regression equation (a regression coefficient).

Due to the fact that the regression coefficients a and b will be estimated, the values
of y will also be estimates and are denoted by ŷ instead of y. The regression equation
then becomes:
ŷ = a + bx
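The regression coefficients are obtained by the method of least squares (the derivation is
beyond the scope of this module) and are given as follows:
$$\hat{y} = a + bx$$
$$a = \bar{y} - b\bar{x}$$
$$b = \frac{S_{xy}}{S_{xx}}$$
where $S_{xy} = \sum xy - n\bar{x}\bar{y}$ and $S_{xx} = \sum x^2 - n\bar{x}^2$, or equivalently
$$b = \frac{\sum xy - n\bar{x}\bar{y}}{\sum x^2 - n\bar{x}^2}$$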
The values of a and b found from the above formulae define the best-fit linear
regression line. This means that no other straight line can be found that will give a
better fit, in the least-squares sense, than the regression line.
EXAMPLE 5.1
Music Centre, an electronics retail company in Durban, has kept records of the number
of Hi-Fi systems sold within a week of placing advertisements in local newspapers.
Table 5.2 shows the number of Hi-Fi systems sold each week and the corresponding
number of advertisements placed in local newspapers for 12 periods.
a) Draw a scatter plot to display the relationship between the number of
insertions and Hi-Fi systems sold.
b) Find the straight-line regression equation to estimate the number of Hi-Fi
systems that Music Centre can expect to sell each week, based on the number
of local newspaper advertisements placed.
c) Fit the regression equation onto the scatter plot.
Table 5.2: The number of Hi-Fi systems sold and the number of advertisements placed in
local newspapers
Week    Number of insertions in local newspaper    Hi-Fi systems sold
1       4                                          24
2       4                                          28
3       3                                          23
4       2                                          18
5       5                                          36
6       4                                          22
7       5                                          36
8       4                                          25
9       4                                          28
10      5                                          32
11      3                                          30
12      5                                          34
Solution
Identify the dependent and independent variables first from the problem description.
In this example, sales of Hi-Fi systems are dependent upon the number of
advertisements placed. Hence sales of Hi-Fi systems are the dependent variable y_i,
and the number of advertisements placed is the independent variable x_i.

a) Scatter plot
Figure 5.1: Scatter plot of the number of Hi-Fi systems sold (vertical axis) against the
number of advertisements placed in local newspapers (horizontal axis)
b) Table
Table 5.3: Calculation of the straight-line equation
x_i (insertions in local newspaper)    y_i (Hi-Fi systems sold)    x_i^2    x_i y_i
4                                      24                          16       96
4                                      28                          16       112
3                                      23                          9        69
2                                      18                          4        36
5                                      36                          25       180
4                                      22                          16       88
5                                      36                          25       180
4                                      25                          16       100
4                                      28                          16       112
5                                      32                          25       160
3                                      30                          9        90
5                                      34                          25       170
Total: ∑ x_i = 48                      ∑ y_i = 336                 ∑ x_i^2 = 202    ∑ x_i y_i = 1393

$$\bar{x} = \frac{\sum x_i}{n} = \frac{48}{12} = 4$$
$$\bar{y} = \frac{\sum y_i}{n} = \frac{336}{12} = 28$$
$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 1393 - 12(4)(28) = 49$$
$$S_{xx} = \sum x^2 - n\bar{x}^2 = 202 - 12(4)^2 = 10$$
$$b = \frac{S_{xy}}{S_{xx}} = \frac{49}{10} = 4.9$$
$$a = \bar{y} - b\bar{x} = 28 - 4.9(4) = 28 - 19.6 = 8.4$$
$$\hat{y} = a + bx = 8.4 + 4.9x$$
c) Fitted regression equation
Figure 5.2: Fitted regression line, y = 8.4 + 4.9x, for the number of Hi-Fi systems sold
(vertical axis) and the number of advertisements placed in local newspapers (horizontal axis)
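The hand calculation in Example 5.1 can be reproduced with a short program. The sketch below is only an illustration of the least squares formulae given in Section 5.3.1, applied to the data in Table 5.2; the variable names are our own, and in practice the same result can be obtained in Excel.

```python
# Data from Table 5.2
x = [4, 4, 3, 2, 5, 4, 5, 4, 4, 5, 3, 5]              # insertions in local newspaper
y = [24, 28, 23, 18, 36, 22, 36, 25, 28, 32, 30, 34]  # Hi-Fi systems sold

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# S_xy = sum(xy) - n * x_bar * y_bar and S_xx = sum(x^2) - n * x_bar^2
s_xy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
s_xx = sum(xi ** 2 for xi in x) - n * x_bar ** 2

b = s_xy / s_xx          # slope
a = y_bar - b * x_bar    # y-intercept

print(f"S_xy = {s_xy:.1f}, S_xx = {s_xx:.1f}")
print(f"y-hat = {a:.1f} + {b:.1f}x")
```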

The purpose of constructing a regression equation is to use it to estimate values of y
from known values of x. Estimates of y are found by substituting a given x value into
the regression equation. In business management, the x variable is usually a
controllable marketing mix variable such as price, advertising expenditure, size of
sales staff, number of television spots, and so on, that is readily available or can be
decided by the marketing manager.
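As an illustration of substituting an x value into the regression equation, the sketch below uses the fitted line from Example 5.1, ŷ = 8.4 + 4.9x. The function name and the warning about the observed range are our own additions: in Table 5.2 the number of insertions ranged from 2 to 5, and an estimate for an x value outside that range should be treated with caution.

```python
def estimate_sales(insertions, a=8.4, b=4.9, observed_range=(2, 5)):
    """Estimate weekly Hi-Fi sales from the fitted line y-hat = a + b * x."""
    low, high = observed_range
    if not low <= insertions <= high:
        # The line was fitted only on 2 to 5 insertions, so this is an extrapolation
        print(f"Warning: {insertions} insertions is outside the observed range {low}-{high}.")
    return a + b * insertions


print(round(estimate_sales(3), 1))  # 23.1 systems for 3 insertions
print(round(estimate_sales(4), 1))  # 28.0 systems for 4 insertions
```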
5.4 Correlation Analysis
Correlation analysis is a statistical technique used to measure the closeness of the
linear relationship between two numerically scaled variables. The purpose of correlation
analysis is to measure the strength of the relationship between two variables. The correlation
coefficient, denoted by r, is a number between -1 and +1 that reflects the degree to which two
variables have a linear relationship.
A business manager can have great confidence in estimates based on the regression
line if there is a strong relationship between the x and the y random variables. A
strong relationship will produce a more accurate and reliable estimate of y, meaning
that the estimated y value for a given value of x is likely to be close
to the actual y value.
There are two types of correlation coefficients, namely Pearson’s correlation coefficient
and Spearman’s rank correlation coefficient. Their interpretation is the same.
5.4.1 Interpretation of correlation coefficients
Figure 5.3 below gives a rough guide to the interpretation of the correlation
coefficients:

Figure 5.3: Interpretation of correlation coefficients
Any interpretation should take the following two points into account:
• A low correlation does not necessarily imply that the variables are unrelated,
but simply that the relationship is poorly described by a straight line.
• A correlation does not necessarily imply a cause-and-effect relationship,
merely an observed association.
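The first point can be demonstrated with a small hypothetical data set that is not taken from the text. In the sketch below, y is completely determined by x through y = x², yet the correlation coefficient is zero because the relationship is not a straight line. The function pearson_r is our own implementation of the standard product-moment (Pearson) correlation formula.

```python
import math


def pearson_r(x, y):
    """Standard product-moment (Pearson) correlation coefficient."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [xi ** 2 for xi in x]      # exact, but curved, relationship
y_line = [2 * xi + 1 for xi in x]    # exact straight-line relationship

r_curve = pearson_r(x, y_curve)
r_line = pearson_r(x, y_line)

print(round(r_curve, 4))  # 0.0: no linear association, although y depends on x
print(round(r_line, 4))   # 1.0: perfect positive linear correlation
```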
The following diagrams illustrate how scatter plots can be used to interpret the
correlation coefficient.
a) Perfect positive correlation
Figure 5.4 below shows a perfect positive linear correlation ( r = +1 ). All the data points
of a scatter plot will lie on a positive straight line.
Figure 5.4: Perfect positive correlation

b) Perfect negative correlation
Figure 5.5 shows a perfect negative linear correlation r = -1. All the data points will
again lie on a straight line, but in an inverse direction (i.e. as x increases, y decreases
and vice versa).
Figure 5.5: Perfect negative correlation
c) Positive (direct) linear correlation
Figure 5.6 shows positive (direct) linear correlation 0 < r < +1 with r approaching +1.
An increase (decrease) in x results in an increase (decrease) in y (a direct
relationship).
Figure 5.6: Positive (direct) linear correlation

d) Negative (indirect) linear correlation
Figure 5.7 shows negative (indirect) linear correlation -1 < r < 0 with r approaching -1.
An increase (decrease) in x results in a decrease (increase) in y (an indirect
relationship).
Figure 5.7: Negative (indirect) linear correlation
e) Positive (direct) linear correlation
Figure 5.8 shows positive (direct) linear correlation 0 < r < +1 with r approaching 0
from the direction of +1.
Figure 5.8: Positive linear correlation
f) Negative (indirect) linear correlation
Figure 5.9 shows negative (indirect) linear correlation -1 < r < 0 with r approaching 0
from the direction of -1. An increase (decrease) in x results in a decrease (increase)
in y (an indirect relationship).

Figure 5.9: Negative linear correlation
g) No linear correlation
Fig 5.10 shows no linear correlation ( r = 0). The values of x are of no value in
estimating values of y. The data points are randomly scattered. If such a scenario
exists between sales and a particular marketing variable, a marketer should seek other
numerically scaled marketing mix, consumer behaviour, financial or economic
variables x that are more likely to have an association with the dependent variable y.
Figure 5.10: No linear correlation
5.4.2 Pearson’s correlation
Pearson’s correlation coefficient computes the correlation between two numerically
scaled random variables only. Pearson’s correlation coefficient is defined by:
$$r_p = \frac{\sum xy - n\bar{x}\bar{y}}{\sqrt{\left(\sum x^2 - n\bar{x}^2\right)\left(\sum y^2 - n\bar{y}^2\right)}}$$
Where
rp = The Pearson’s correlation coefficient.
x = The values of the independent variable.
y = The values of the dependent variable.
n = The number of paired data points in the sample.
While doing calculations, substituting everything at once can result in mathematical
errors. The following intermediate stages will help you to reduce mistakes and
increase accuracy.
$$r_p = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}}$$
Where:
$$S_{xy} = \sum xy - n\bar{x}\bar{y},$$
$$S_{xx} = \sum x^2 - n\bar{x}^2$$ and
$$S_{yy} = \sum y^2 - n\bar{y}^2$$
Pearson’s correlation coefficient formula is derived from the least squares regression
approach; hence its formula has similar terms to the regression coefficients. The
calculation of Pearson’s correlation coefficient and its interpretation are illustrated
in the following example.
EXAMPLE 5.2
Use the information given in Table 5.2 of Example 5.1 on the number of Hi-fi systems
that Music Centre can expect to sell in a given week to:
a) Calculate the Pearson’s correlation coefficient for the two variables.
b) Comment on the strength of the identified relationship.

Solution
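a) From Table 5.2 we obtained the following:
$$\sum xy = 1393;\quad \sum x^2 = 202;\quad \sum y^2 = 9778;\quad \bar{x} = 4;\quad \bar{y} = 28;\quad n = 12$$
Now:
$$S_{xy} = \sum xy - n\bar{x}\bar{y} = 1393 - 12(4)(28) = 49$$
$$S_{xx} = \sum x^2 - n\bar{x}^2 = 202 - 12(4)^2 = 10$$
$$S_{yy} = \sum y^2 - n\bar{y}^2 = 9778 - 12(28)^2 = 370$$
$$r = \frac{S_{xy}}{\sqrt{S_{xx} \times S_{yy}}} = \frac{49}{\sqrt{10 \times 370}} = 0.8056$$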
b) For management interpretation: The sample correlation coefficient of 0.8056
is close to +1, hence the statistical association between x (the number of
print advertisements placed) and y (sales of sound systems) is a strong
positive relationship. Thus, the number of print advertisements placed is of
reasonable value in estimating the actual number of systems that Music
Centre can expect to sell in the following week.
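The intermediate-stage calculation in Example 5.2 can be checked with a few lines of Python. This is only a sketch: the function name pearson_from_sums is an illustrative choice and not part of the module, and the inputs are the sums quoted in the solution above.

```python
import math

def pearson_from_sums(sum_xy, sum_x2, sum_y2, mean_x, mean_y, n):
    """Pearson's r computed from the summary sums (illustrative helper name)."""
    s_xy = sum_xy - n * mean_x * mean_y   # S_xy
    s_xx = sum_x2 - n * mean_x ** 2       # S_xx
    s_yy = sum_y2 - n * mean_y ** 2       # S_yy
    return s_xy / math.sqrt(s_xx * s_yy)

# Sums taken from the solution to Example 5.2
r = pearson_from_sums(sum_xy=1393, sum_x2=202, sum_y2=9778,
                      mean_x=4, mean_y=28, n=12)
print(round(r, 4))  # 0.8056
```

Working through S_xy, S_xx and S_yy separately mirrors the advice above about not substituting everything at once.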
5.5 Using Excel for Regression Analysis
Regression and correlation analysis can be performed in Excel using the Regression
option in the Data Analysis add-in module. The output for Example 5.1 is shown in
Figure 5.11, with the regression equation coefficients (a and b), the correlation
coefficient, r, and the coefficient of determination, r².

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.805555
R Square             0.648919
Adjusted R Square    0.613811
Standard Error       3.604164
Observations         12

ANOVA
             df    SS       MS       F          Significance F
Regression   1     240.1    240.1    18.48345   0.001563
Residual     10    129.9    12.99
Total        11    370

               Coefficients   Standard Error   t Stat     P-value    Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept      8.4            4.676163         1.796345   0.102662   -2.01914    18.81914    -2.01914      18.81914
X Variable 1   4.9            1.139737         4.299238   0.001563   2.360508    7.439492    2.360508      7.439492
Figure 5.11: Regression output for Example 5.1 using the Data Analysis add-in
A scatter plot between x and y can also be produced using the Chart – Scatter option
in the Insert tab. A scatter plot of the x–y data in Example 5.1 is shown in Figure 5.12.
[Scatter plot: Hi-Fi systems sold (y-axis) against number of insertions (x-axis), with
fitted trend line y = 4.9x + 8.4 and R² = 0.6489]
Figure 5.12: Scatter plot for example 5.1
In addition, the regression equation can be computed and superimposed on the scatter
plot, together with the coefficient of determination. To do this, right-click on any scatter
point, select Add Trendline from the drop-down option list. Then select the Linear
option, and tick the boxes Display Equation on the chart and Display R-squared on the
chart. The graph, regression equation and R² are shown in Figure 5.12. Insert the
y-axis label ‘Number of units sold per week’ and the x-axis label ‘Number of weekly
advertisements placed’.
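The main figures in the Excel output can be reproduced by hand from the summary quantities of Example 5.2. The Python sketch below assumes the values S_xy = 49, S_xx = 10, S_yy = 370, x̄ = 4, ȳ = 28 and n = 12 from that example; the variable names are illustrative.

```python
import math

# Summary quantities from Example 5.2 (n = 12 paired observations)
n = 12
mean_x, mean_y = 4, 28
s_xy, s_xx, s_yy = 49, 10, 370

b = s_xy / s_xx                         # slope ("X Variable 1")
a = mean_y - b * mean_x                 # intercept
r_squared = s_xy ** 2 / (s_xx * s_yy)   # coefficient of determination

ss_regression = b * s_xy                # explained variation (Regression SS)
ss_residual = s_yy - ss_regression      # unexplained variation (Residual SS)
standard_error = math.sqrt(ss_residual / (n - 2))

print(f"y = {a:.1f} + {b:.1f}x")
print(f"R Square       = {r_squared:.6f}")
print(f"Regression SS  = {ss_regression:.1f}")
print(f"Residual SS    = {ss_residual:.1f}")
print(f"Standard Error = {standard_error:.6f}")
```

The printed values can be compared line by line with the Intercept, X Variable 1, R Square, SS and Standard Error entries in Figure 5.11.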

Conclusion
This chapter focussed on linear regression and correlation. We distinguished
between dependent and independent variables and showed how they are connected by the
linear regression equation. We also looked at how to assess the strength of the
relationship visually by means of a scatter plot and how to calculate it using Pearson’s
correlation coefficient. The worked example from the chapter showed how to use Excel's
Regression option in Data Analysis. Managers can benefit from regression analysis
because it is employed as a forecasting and planning tool.

QUESTION ONE
5.1 A property analyst is examining the relationship between the town council’s
valuation of residential properties in Krugersdorp and the market value (selling
price) of the properties. A random sample of 12 properties was examined. The
data is shown in Table 5.4 below:
Table 5.4: Council valuation versus Market value
Town council valuation    Market value
18                        176
48                        200
32                        110
50                        305
28                        196
44                        426
18                        116
40                        248
38                        117
24                        198
46                        208
28                        258

5.1.1 Find the degree of association (Pearson’s correlation coefficient)
between Krugersdorp town council property valuations and market
values.
5.1.2 Is this a good statistical association? Explain your answer.
5.1.3 Find the regression equation for the above data.
5.2 The managers of an insurance firm are interested in finding out if the number
of new clients a broker brings into the firm affects the sales generated by the
broker. They sample 12 brokers and determine the number of new clients each
has enrolled in the last year and their sales amounts in thousands of Rand. The
data are presented in the table that follows:
Broker clients(X) sales(Y)
1 22 52
2 6 37
3 37 64
4 28 55
5 10 29
6 10 34
7 20 58
8 25 48
9 12 31
10 17 38
5.2.1 Identify the dependent and the independent variable.
5.2.2 Find the least squares regression equation.
5.2.3 Calculate the correlation coefficient and comment on your results.

CHAPTER SIX
TIMESERIES ANALYSIS AND FORECASTING
Learning Outcomes
• Understand how to use and recalculate index numbers.

• Know how to use indices to deflate prices.
• Understand the time series fundamentals.
• Inspect and prepare data for forecasting.
• Graph the data and visually identify patterns.
• Fit an appropriate model using the time series approach.
• Understand the concept of smoothing.
• Define an index number and its purpose.
• Differentiate between price and quantity indices.
• Calculate both price and quantity indices.

6.1 INTRODUCTION: TIMESERIES ANALYSIS
When a marketing department of a company records its turnover daily, weekly, or
monthly, it is compiling a time series of sales data. For example, magazines such as
Fairlady receive audited circulation figures twice a year (for the periods ending June
and December) from the Audit Bureau of Circulation (ABC), which reflect their sales
per issue. This data, recorded over a number of years, makes up a time series.
A time series is a variable that is measured and recorded at equally spaced intervals
of time. Inflation is a good illustration: it can be tracked on a monthly, quarterly,
or annual basis, and each of the three data sets is a time series. In other words, it
does not matter which time units we use, as long as they are consistent and sequential.
By consistent, we mean that different time units may not be combined (daily with
monthly data, or minute with hourly data, for example). By sequential, we mean that no
time period may be skipped or left with a zero or empty value. If a value is missing,
we can attempt to estimate it by finding the average of the two values closest to it
or by using any other suitable technique.
What does time series analysis seek to accomplish? Predicting a variable's future
values is the primary goal of most time series analysis techniques. A number of
auxiliary approaches have also been developed to determine whether the right
forecasting method has been applied; these are also part of time series analysis.
Forecasting remains the primary goal, though.
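The suggestion of estimating a missing value from the two values closest to it can be sketched as follows. The function name fill_isolated_gaps and the monthly sales figures are hypothetical and serve only to illustrate the idea; the sketch handles a single missing value that has a recorded value on either side.

```python
def fill_isolated_gaps(values):
    """Replace each isolated None with the mean of its two neighbours."""
    filled = list(values)
    for i, value in enumerate(values):
        if value is None:
            if i == 0 or i == len(values) - 1:
                raise ValueError("cannot average neighbours at the ends of the series")
            before, after = values[i - 1], values[i + 1]
            if before is None or after is None:
                raise ValueError("two consecutive values are missing")
            filled[i] = (before + after) / 2
    return filled

# Hypothetical monthly sales with the third month not recorded
monthly_sales = [120, 132, None, 150, 161]
print(fill_isolated_gaps(monthly_sales))  # [120, 132, 141.0, 150, 161]
```

Longer gaps, or gaps at the start or end of a series, would need one of the other techniques the text alludes to.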
Definition
Time series: A variable measured and recorded per unit of time.
Forecasting: A method of predicting the future values of a variable, usually
represented as time series values.

6.2 STATIONARY AND NON-STATIONARY TIME SERIES
Figure 6.1 illustrates a time series plot, and what jumps out at us immediately is that
one of the time series seems to be moving upwards and the other one is following
some horizontal line. The first is called a non-stationary time series, while the second
one, following a horizontal line, is called a stationary time series.
Figure 6.1: A time series plot
Definition
Non-stationary time series: A time series that does not have a constant mean
and oscillates around a moving mean.
Stationary time series: A time series that has a constant mean and
oscillates around this mean.
In general, all time series fall into the first or the second category. A variety of
methods have been developed to handle either stationary or non-stationary time
series.

Think Point
Visualization and charting of a time series is not an optional extra, but one of the
most essential steps in time series analysis. You can learn a lot about a variable
just by looking at the time series graph.
6.3 THE COMPONENTS OF TIME SERIES
Time series analysis assumes that environmental forces, individually and collectively,
determine the value of a time series random variable (such as sales or share price)
in any time period. These environmental influences are known as:
• Trend (T).
• Cyclical influences (C).
• Seasonal influences (S).
• Irregular or Random effects (I).
6.3.1 THE TREND
Trend is the long-term, smooth underlying movement in a time series. It is the
underlying direction (an upward or downward tendency) and rate of change in a time
series, when allowance has been made for the other components.
It summarises the effect of long-term factors that tend to operate fairly gradually and
in one direction for a considerable period of time. Consequently, the trend component is
usually described by a smooth, continuous curve or straight line. Marketers should be
aware of the environmental factors that cause long-term movements in a specific
direction in a time series, as these influences, individually and collectively, result in
increased (or decreased) demand (consumption) of various categories of consumer
goods.

Some of the more important causes of long-term trend movements include:
• Population growth.
• Urbanisation.
• Technological improvements.
• Economic advancements and developments.
• Consumer shifts in habits and attitudes.
Trend analysis is the statistical technique used to isolate this underlying long-
term movement.
6.3.2 SEASONAL VARIATIONS
Seasonal variation is the component of variation in a time series which is dependent
on the time of year. It describes any regular fluctuations with a period of less than one
year. For example, the costs of various types of fruits and vegetables, unemployment
figures and average daily rainfall all show marked seasonal variation.
Some other examples of seasonal variations are as follows:
a) Sales of ice cream will be higher in summer than in winter, and sales of
overcoats will be higher in autumn than in spring.
b) Shops might expect higher sales shortly before Christmas.
c) Sales might be higher on Friday and Saturday than on Monday.
d) The telephone network may be heavily used at certain times of the day (such
as mid morning and mid afternoon) and much less used at other times (such as
in the middle of the night).
EXAMPLE 6.1
The number of customers served by a company of travel agents over the past four
years is shown in Table 6.1 below:

Table 6.1: Customers per quarter
Year    Quarter 1    Quarter 2    Quarter 3    Quarter 4
2000    6            3            9            4
2001    6            4            10           5
2002    7            3            11           6
2003    8            5            12           7
Draw a time series plot of the number of customers served.

Add a trend line to the time series plot.
Solution
First determine the dependent variable (number of customers) and the independent
variable (time), and then the scale on each axis, as shown in Fig 6.1 below.
Figure 6.1: Time series for number of customers served
The trend line shows the general direction of the number of customers served, which in
this case is a general increase, as shown in Fig 6.2 below:
Figure 6.2: Trend line

In this example, there would appear to be large seasonal fluctuations in demand, but
there is also a basic upward trend.
6.3.3 CYCLICAL VARIATIONS
In weekly or monthly data, the cyclical component describes regular fluctuations that
are not seasonal. It varies in a recognizable cycle. Cyclical variations are longer
term than seasonal variations: they describe any regular fluctuations with a period of
more than one year.
Cyclical variations are medium-term changes in results caused by circumstances
which repeat in cycles. In business, cyclical variations are commonly associated with
economic cycles, that is, successive booms and slumps in the economy. Economic cycles
may last a few years.
6.3.4 IRREGULAR OR RANDOM VARIATIONS
The irregular component is what is left over when the other components of the series
(trend, seasonal and cyclical) have been accounted for.
6.4 SUMMARISING THE COMPONENTS
The components of a time series can be summarized by the following equation, which
is called an additive formula since it is a sum of the components:
Y=T+S+C+I
Where:
Y = The actual time series.
T = The trend component.
S = The seasonal component.
C = The cyclical component.
I = The random or irregular component.

Though you should be aware of the cyclical component, you will not be expected to
carry out any calculations connected with it. The mathematical model which we will
use, the additive model, therefore excludes any reference to C. The model will
therefore be as follows:
Y=T+S+ I
6.4.1 FINDING THE TREND
The main problem we are concerned with in time series analysis is how to identify
the trend and seasonal variations.
There are three main methods of finding a trend:
a) A line of best fit (the trend line) can be drawn on a graph. The concept is the
same as that used for finding a regression line (chapter 5).
b) A statistical technique known as linear regression by the least squares method
can be used (chapter 5).
c) A technique known as moving average can be employed to make the long-term
trends of a time series clearer by smoothening the data.
6.4.2 The moving average method.
A moving average is an average of the results of a fixed number of periods. It is
a technique used to find the trend when it is difficult to see the underlying trend.
This method attempts to remove seasonal variations from actual data by a
process of averaging.

Example 6.2
The data below shows the number of sales units recorded by a manufacturing
company manager.
Table 6.2: Sales data
Year Sales Units

2015 390
2016 380
2017 460
2018 450
2019 470
2020 440
2021 500
a) What determines the period over which a moving average should be taken?
b) Calculate the moving averages of the annual sales over a period of three years.
c) Draw a time series plot of the original data and the moving averages on the
same axis.
Solution
a) The most appropriate moving average period depends on the circumstances and the
nature of the time series.
b) The three-point moving averages are calculated as follows:

Year    Sales units    3-point moving total     3-point moving average
2015    390
2016    380            390+380+460 = 1230       1230/3 = 410
2017    460            380+460+450 = 1290       1290/3 = 430
2018    450            460+450+470 = 1380       1380/3 = 460
2019    470            450+470+440 = 1360       1360/3 = 453
2020    440            470+440+500 = 1410       1410/3 = 470
2021    500

c) The time series plot of the original data and the moving averages is shown in
Figure 6.2 as follows:
[Line chart: sales units and the 3-point moving average plotted against year]
Figure 6.2: Moving average
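The three-point moving averages for the Table 6.2 sales data can be reproduced with a short Python sketch. The function name moving_average is an illustrative choice; each average is placed against the middle year of its three-year window.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Sales data from Table 6.2
years = [2015, 2016, 2017, 2018, 2019, 2020, 2021]
sales = [390, 380, 460, 450, 470, 440, 500]

averages = moving_average(sales, 3)
middle_years = years[1:-1]   # a 3-point average belongs to the middle year

for year, average in zip(middle_years, averages):
    print(year, round(average, 1))
```

Seven observations give five three-point averages, because the first and last years have no complete window centred on them.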
Note the following points:
I. The moving average series has five figures relating to the years from 2016
to 2020. The original series had seven figures for the years from 2015 to
2021.
II. There is an upward trend in sales, which is more noticeable from the series
of moving averages than from the series of actual sales each year.
The above example averaged over a three-year period. Some of the points
to consider when choosing the period include the following:
I. A moving average which takes an average of the results in many time
periods will represent results over a longer term than a moving average of
two or three periods.
II. On the other hand, with a moving average of results in many time periods, the
last figure in the series will be out of date by several periods. In our example,
the most recent average related to 2020. With a moving average of five years’
results, the final figure in the series would relate to 2019.
III. When there is a known cycle over which seasonal variations occur, such as
all the days in the week or all the seasons in the year, the most suitable
moving average would be one which covers one full cycle.
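The point about covering one full cycle can be illustrated with the quarterly data of Table 6.1, where the cycle is four quarters. The sketch below is only an illustration under that assumption; moving_average is a hypothetical helper name.

```python
def moving_average(values, window):
    """Average of each run of `window` consecutive values."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

# Customers per quarter from Table 6.1 (2000 Q1 to 2003 Q4)
customers = [6, 3, 9, 4,
             6, 4, 10, 5,
             7, 3, 11, 6,
             8, 5, 12, 7]

four_quarter_average = moving_average(customers, 4)
print(four_quarter_average)
# [5.5, 5.5, 5.75, 6.0, 6.25, 6.5, 6.25, 6.5, 6.75, 7.0, 7.5, 7.75, 8.0]
```

Because every window contains exactly one of each quarter, the large quarter-to-quarter swings cancel out and the averages rise steadily from 5.5 to 8.0. Note that each average of an even number of periods falls midway between two quarters rather than on a quarter.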
6.4.3 The regression
A trend line isolates the trend (T) component only. It shows the general direction
(upward, downward, or constant) in which the series is moving. It is therefore best
represented by a straight line. The method of least squares from regression analysis
is used to find the trend line of best fit to a time series. The dependent variable y
is the time series (e.g. sales), while the independent variable t is time. To use time
as an independent variable in regression analysis, it must be coded, for example
1 = 2014; 2 = 2015, etc.
Alternative approaches for coding the time variable t are:
• The sequential numbering method.

• The zero-sum method.
The trend line is estimated the same way as in Chapter 5. The only difference is that
the independent variable is now denoted by t instead of x. Therefore, the regression
equation is given by:
y = a + bt where
y = Dependent variable.
a = Regression coefficient (constant or y intercept).
b = Regression coefficient (slope or gradient of line).

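The regression coefficients are obtained as follows:
$$y = a + bt$$
$$a = \bar{y} - b\bar{t}$$
$$b = \frac{S_{ty}}{S_{tt}}$$
where
$$S_{ty} = \sum ty - n\bar{t}\bar{y}$$
and
$$S_{tt} = \sum t^2 - n\bar{t}^2$$
or
$$b = \frac{\sum ty - n\bar{t}\bar{y}}{\sum t^2 - n\bar{t}^2}$$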
6.4.3.1 Sequential numbering method
Each time period t of the time series y is assigned an integer value beginning with
1 for the first time period, 2 for the second time period, 3 for the third time period, etc.
Example 6.3
The number of houses sold quarterly by Valley Estates in the Cape Peninsula is
recorded for the period 1996 to 1999 as shown in Table 6.3 below. The sales
director has requested a trend analysis of this sales data to determine the general
direction (trend) of future quarterly housing sales.
Table 6.3: Valley Estates Quarterly House Sales study
Year    Quarter 1    Quarter 2    Quarter 3    Quarter 4
1996    54           58           94           70
1997    55           61           87           66
1998    49           55           95           74
1999    60           64           99           80

a) Find the trend line equation for the quarterly house sales data from 1996 to
1999 using the sequential method for coding the time variable.
b) Draw a time series plot and fit the trend line on to the graph.
Example 6.4
Arrange the sales values in one column as shown in Table 6.4 below. Add
columns for t, t² and ty.
Table 6.4: Sequential numbering method

b) A
Figure 6.4: Time series plot and the fitted trend line
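The sequential numbering calculation for the Valley Estates data of Table 6.3 can be sketched in Python as shown below. The function name trend_line is illustrative, and the resulting coefficients come from this sketch applied to Table 6.3 using the S_ty and S_tt formulas of Section 6.4.3; they are not quoted from the guide's own worked table.

```python
def trend_line(t, y):
    """Least squares trend line y = a + b*t using S_ty and S_tt."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    s_ty = sum(ti * yi for ti, yi in zip(t, y)) - n * t_bar * y_bar
    s_tt = sum(ti ** 2 for ti in t) - n * t_bar ** 2
    b = s_ty / s_tt
    a = y_bar - b * t_bar
    return a, b

# Quarterly house sales from Table 6.3, 1996 Q1 to 1999 Q4, in time order
sales = [54, 58, 94, 70,
         55, 61, 87, 66,
         49, 55, 95, 74,
         60, 64, 99, 80]
t = list(range(1, len(sales) + 1))   # sequential codes 1, 2, ..., 16

a, b = trend_line(t, sales)
print(f"y = {a:.2f} + {b:.4f}t")
```

Under these assumptions the slope is positive, which agrees with the mildly upward direction of the series.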

6.4.3.2 The zero-sum method
Table 6.5: Coding when n is an odd number
Coding when n is an even number
To code t when the number of time periods, n, is even, assign a value of t = -(n - 1) to
the first time period. For each subsequent period, add 2 to the previous period’s t
value. Table 6.6 illustrates this coding rule for n = 8 quarterly periods of caravan sales
from 1998, quarter 1, to 1999, quarter 4. Based on a time series of 8 periods: assign
t = -(8 - 1) = -7 to 1998 quarter 1 (the first time period).
Table 6.6: Coding scheme for t i when n is even
For any number of time periods, the sequence is consistent with incremental
steps of +2. This zero-sum coding scheme simplifies the formulae for
computing the trend line since one of the terms in the formulae, namely $\sum t_i$,
equals 0.
Regression formulae for a trend line when $\sum t_i = 0$
When $\sum t_i = 0$, it implies that $\bar{t} = 0$ (see Chapter 5). Substituting
$\bar{t} = 0$ into the formulae of Section 6.4.3 simplifies the regression
coefficients to:
$$a = \bar{y} \quad \text{and} \quad b = \frac{\sum ty}{\sum t^2}$$
Example 6.5
Using the information given in Table 6.4 of Example 6.4, find a trend line equation for
the house sales data of Valley Estates using the zero-sum coding scheme for the time
variable x.
Table 6.7: Zero-sum

The regression coefficients can be obtained as follows:
The regression trend line equation will now be given as follows:
Where x = -1 in 1997 Q4
+1 in 1998 Q1
+3 in 1998 Q2
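The zero-sum coding for an even number of periods can be sketched for the same Valley Estates data (16 quarters). The function name zero_sum_codes_even is illustrative, the code variable t plays the role of the coded time variable x in the example, and the coefficients are produced by this sketch rather than quoted from the guide.

```python
def zero_sum_codes_even(n):
    """Codes -(n-1), -(n-3), ..., +(n-1) for an even number of periods."""
    if n % 2 != 0:
        raise ValueError("this coding rule is for an even number of periods")
    return [-(n - 1) + 2 * i for i in range(n)]

# Quarterly house sales from Table 6.3, 1996 Q1 to 1999 Q4, in time order
sales = [54, 58, 94, 70,
         55, 61, 87, 66,
         49, 55, 95, 74,
         60, 64, 99, 80]
t = zero_sum_codes_even(len(sales))

# With sum(t) == 0 the mean of t is 0, so the coefficients simplify
b = sum(ti * yi for ti, yi in zip(t, sales)) / sum(ti ** 2 for ti in t)
a = sum(sales) / len(sales)

print(t[7], t[8], t[9])           # codes for 1997 Q4, 1998 Q1, 1998 Q2
print(f"y = {a:.4f} + {b:.4f}t")
```

The codes for 1997 Q4, 1998 Q1 and 1998 Q2 come out as -1, +1 and +3, matching the values listed above. The slope is half the sequential-coding slope because each period is now two code units apart, so both codings describe the same trend line.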

6.5 INDEX NUMBERS
An index number is a summary measure of the change in the level of activity of a
single item or collection (often referred to as a basket) of related items from one time
period to another. It is constructed by expressing the value of an item in the current
period as a ratio of its value in the base period. In percentage terms:
$$\text{Index Number} = \frac{\text{Current Period Value}}{\text{Base Period Value}} \times 100\%$$
The base period is normally given a value of 100.
Some of the better-known index numbers in South Africa are:
• JSE Actuaries Indices – all share index.

• Gold index, industrial index.
• CPI – Consumer Price Index (1985 = 100).
• PPI – Production Price Index (1980 = 100).
There are two major categories of index numbers – price and quantity. In both cases,
a single or composite index may be used.
6.5.1 A Price Index
A price index measures the percentage change in price between any two periods of
time. For a single item, the relative price change from one time period to another is
found by computing its price relative:
$$\text{Price relative} = \frac{p_1}{p_0} \times 100\%$$
Where:
$p_1$ = Current period price.
$p_0$ = Base period price.

6.5.2 A Quantity Index
A quantity index measures the percentage change in consumption level of either an
individual item or a basket of items from one time period to another. For a single item,
the relative quantity change from one time period to another is found by computing its
quantity relative:
$$\text{Quantity relative} = \frac{q_1}{q_0} \times 100\%$$
Where:
$q_1$ = Current period quantity.
$q_0$ = Base period quantity.
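As a brief illustration of price and quantity relatives, the Python sketch below uses the figures for share A that appear in Example 6.6 later in this chapter (price 65 in the base year and 115 in the current year; quantity 350 and 300). The function name relative is an illustrative choice.

```python
def relative(current, base):
    """Current period value as a percentage of the base period value."""
    return current / base * 100

# Share A from Example 6.6: base year 1986, current year 1992
price_relative = relative(current=115, base=65)
quantity_relative = relative(current=300, base=350)

print(round(price_relative, 1))     # 176.9
print(round(quantity_relative, 1))  # 85.7
```

A relative above 100 indicates an increase since the base period (here the price rose by 76.9%), while a relative below 100 indicates a decrease (the quantity fell by 14.3%).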
6.5.3 Weighted Index Numbers
Whereas simple index numbers grant equal importance to all items regardless of what
share they hold, weighted index numbers weigh or load items according to their
relative importance. For example, when calculating the price index number if the price
of a unit of petrol is fifteen times the price of a unit of rice, then the petrol will be
weighed in as ‘15’ whereas rice will be weighed in as ‘1’. It creates a more realistic
picture of the real state of affairs than simple index numbers.
There are two types of weighted indexes namely, the fixed weight index, and the
simple weighted (aggregative) index.
6.5.3.1 Fixed Weighted Index
This utilises weights that are based on a period/s considered representative. The
weight and base prices do not necessarily have to be drawn from the same period.
The formula is as follows:
$$\text{Price Index} = \frac{\sum p_1 w}{\sum p_0 w} \times 100\%$$

6.5.3.2 Simple Weighted (Aggregative) Index
This places the current-period prices and quantities in the numerator and the
base-period prices and quantities in the denominator. It does, however, run the risk
of lack of representation in the base period. The formula is as follows:
$$\frac{\sum P_1 Q_1}{\sum P_0 Q_0} \times 100\%$$
6.6 Composite Index
A composite index combines the relative prices and quantities. Two commonly used
composite indices are the Laspeyres index and the Paasche index.
6.6.1 The Laspeyres price index
The Laspeyres price index is given by:
$$L_p = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100\%$$
where quantities are held constant at base period levels. The Laspeyres quantity
index is given by:
$$L_q = \frac{\sum p_0 q_1}{\sum p_0 q_0} \times 100\%$$
where prices are held constant at base period levels.

Example 6.6
Using 1986 as the base year, calculate the Laspeyres price and quantity indices,
and interpret your answer for the portfolio of shares provided below:

         1986 (Base year)      1992
Share    p0       q0           p1       q1
A        65       350          115      300
B        200      240          120      60
C        1260     50           1890     100
Solution
Using 1986 as base year, the Laspeyres composite indices are calculated as follows:
Table 6.8: Share portfolio performance

Share    p0      q0     p1      q1     p0q0      p1q0      p0q1
A        65      350    115     300    22750     40250     19500
B        200     240    120     60     48000     28800     12000
C        1260    50     1890    100    63000     94500     126000
Total                                  133750    163550    157500

Laspeyres price index: $$L_p = \frac{163550}{133750} \times 100\% = 122.3\%$$
Laspeyres quantity index: $$L_q = \frac{157500}{133750} \times 100\% = 117.8\%$$

Interpretation
This implies that the value of share units (price) increased, on average, by 22.3%.
The number of share units (quantity) increased, on average, by 17.8%.
The price index indicates the increase in the value of the portfolio if all quantities of
shares remain the same. Conversely, the quantity index indicates the increase in the
number of shares bought if all prices are held constant. Index numbers are generally
based on samples of items; hence sampling errors are introduced. Furthermore,
technological changes, product quality changes and changes in consumer purchasing
patterns can individually and collectively make comparisons over time unreliable.
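The Laspeyres calculation in Example 6.6 can be checked with a short Python sketch. The function name laspeyres_indices and the tuple layout (p0, q0, p1, q1) are illustrative choices; the data are the share figures from Table 6.8.

```python
def laspeyres_indices(items):
    """Return (price index, quantity index) in percent, base-period weighted."""
    base_value = sum(p0 * q0 for p0, q0, p1, q1 in items)
    price_index = sum(p1 * q0 for p0, q0, p1, q1 in items) / base_value * 100
    quantity_index = sum(p0 * q1 for p0, q0, p1, q1 in items) / base_value * 100
    return price_index, quantity_index

# (p0, q0, p1, q1) for shares A, B and C from Table 6.8
portfolio = [
    (65, 350, 115, 300),
    (200, 240, 120, 60),
    (1260, 50, 1890, 100),
]

lp, lq = laspeyres_indices(portfolio)
print(round(lp, 1), round(lq, 1))  # 122.3 117.8
```

Both indices share the same denominator, the base-period portfolio value of 133750; only the numerator changes between the price and the quantity index.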
6.6.2 Paasche index
The Paasche index is an example of a weighted aggregate index which uses current
time period weights. It is useful when the relative importance of the items making up
the basket of goods is continuously changing due to a change in the quantity for each
year. It is more accurate than the Laspeyres index as it reflects what the industry is
actually using in the current year, and therefore takes account of the price changes
and the quantity changes.
The Paasche price index is given by:
$$\text{Paasche Price Index} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100\%$$
The Paasche quantity index is given by:
$$\text{Paasche Quantity Index} = \frac{\sum p_1 q_1}{\sum p_1 q_0} \times 100\%$$
Example 6.7
Calculate Paasche’s price and quantity indices for the same share portfolio in example
6.6 and interpret the results.

Solution
Table 6.9: Share portfolio performance

Share    p0      q0     p1      q1     p1q1      p1q0      p0q1
A        65      350    115     300    34500     40250     19500
B        200     240    120     60     7200      28800     12000
C        1260    50     1890    100    189000    94500     126000
Total                                  230700    163550    157500

$$\text{Paasche Price Index} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100\% = \frac{230700}{157500} \times 100\% = 146.48\%$$
This represents an increase in price of 46.48%.
$$\text{Paasche Quantity Index} = \frac{\sum p_1 q_1}{\sum p_1 q_0} \times 100\% = \frac{230700}{163550} \times 100\% = 141.06\%$$
This represents an increase in quantity of 41.06%.
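The Paasche calculation in Example 6.7 can be checked in the same way. The sketch below assumes the same share data as Table 6.9; the function name paasche_indices and the tuple layout (p0, q0, p1, q1) are illustrative.

```python
def paasche_indices(items):
    """Return (price index, quantity index) in percent, current-period weighted."""
    current_value = sum(p1 * q1 for p0, q0, p1, q1 in items)
    price_index = current_value / sum(p0 * q1 for p0, q0, p1, q1 in items) * 100
    quantity_index = current_value / sum(p1 * q0 for p0, q0, p1, q1 in items) * 100
    return price_index, quantity_index

# (p0, q0, p1, q1) for shares A, B and C from Table 6.9
portfolio = [
    (65, 350, 115, 300),
    (200, 240, 120, 60),
    (1260, 50, 1890, 100),
]

pp, pq = paasche_indices(portfolio)
print(round(pp, 2), round(pq, 2))  # 146.48 141.06
```

Here both indices share the same numerator, the current-period portfolio value of 230700, and differ only in the denominator. For this portfolio the Paasche indices are noticeably higher than the Laspeyres indices of Example 6.6, because share C, whose price rose, also doubled in quantity.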
6.7 Using Excel for Time Series Analysis
Excel can be used in the analysis of time series data. It can produce the trendline
graph of time series data using the Insert > Chart > Line option, as illustrated in
Chapter 2 and construct and superimpose the regression trendline equation on the
line graph, as illustrated in Chapter 5.

Conclusion
This chapter focused on time series analysis as a primary tool for extrapolating and
forecasting. The components of the time series and the various methods used to
identify trends in time series were examined. The chapter also discussed how to
compute and interpret price and quantity indices used to measure trends, and track
changes between time points. Index numbers are useful measures of the change in
the activity of an item or a collection of items from one time period to another.

6.1 The average prices of a 30 cm ruler, a 5 cm eraser, an HB pencil and a black
felt-tip pen were recorded in both 2009 and 2019 as shown below. (Prices are in Rand.)
Table 6.10: Average prices and quantities for year 2009 and year 2019
          2009                                2019
Item      Price (R)    Number sold            Price (R)    Number sold
                       (in millions)                       (in millions)
Ruler     17           7.5                    23.7         9.6
Eraser    7.3          4.2                    12.5         6.2
Pencil    5.5          12.6                   8.7          1.65
Pen       15.8         8.5                    22.9         2.68
6.1.1 Calculate and interpret the Laspeyres price and quantity indices for
2019 using a base year of 2009.
6.1.2 Calculate and interpret the Paasche price and quantity indices for 2019
using a base year of 2009.
6.2 You want to invest money in a company. The following table represents the
portfolio of shares. Use the Laspeyres and Paasche price indices to
determine how the shares fared. Conclude your calculations with a summary
of how the shares fared, and whether or not you would consider investing in
the company.

Table 6.11: Share prices and volume
         Base Year         2018
Share    p0      q0        p1      q1
A        120     210       180     125
B        150     220       175     212
C        180     250       205     220

BIBLIOGRAPHY
Brechner, R. (2020). Contemporary mathematics for business and consumers. 9th

edition. Cengage Learning.
Croucher, J. (2016). Introductory Mathematics and Statistics for Business. 6th

edition. McGraw-Hill
Davis, G. W., & Pecar, B., Santana, L. (2013). Business statistics using Excel.
Oxford University Press United Kingdom. Available at: https://b-
ok.africa/book/3719720/f36822
Davis, G. W., Pecar, B., & Santana, L. (2017). Business statistics using Excel: A first
course for South African students. Oxford University Press Southern Africa.
Larson, R., & Farber, B. (2019). Elementary statistics. Pearson Education Canada.
Mann, P. S. (2007). Introductory statistics. John Wiley & Sons.
Wegner T. (2016). Applied Business Statistics. 3rd edition. Juta and Co, Ltd.


Upon completion of this chapter, the learner should be able to:

Clearly describe the difference between primary and secondary data.

1.1. Introduction to Statistics Definitions

One question that often hounds students is ‘What is statistics?’ A common

1.2. The meaning of statistics

BACHELOR OF BUSINESS ADMINISTRATION 11

1.3. Branches of Statistics

Descriptive statistics: Inferential statistics:

Figure 1.1: Branches of statistics

Source: Davis and Pecar (2013)

1.3.1. Descriptive Statistics