Module Guide
Copyright © 2021
MANCOSA
All rights reserved; no part of this book may be reproduced in any form or by any means, including photocopying machines, without the
written permission of the publisher. Please report all errors and omissions to the following email address:
modulefeedback@mancosa.co.za
This Module Guide,
Statistical Techniques in Business (STB7)
will be used across the following programmes:
Preface
References
Statistical Techniques in Business
List of Contents
List of Tables
Table 3.4: Extended table to calculate statistics using grouped frequency data
Table 3.6: Suggested statistics and graphical representation of varying levels of measurement
Table 5.1: Table calculating Laspeyres Price and Quantity Index
Table 5.2: Table of prices and quantities in millions of shares for three different car brands
Table 7.2: Table of demand quarterly, for three consecutive years
Figure 3.1: Bar Graph graphically illustrating years of work experience
Figure 3.2: Pie chart of the percentage of males and females (Gender)
Figure 3.17: Graphical illustration of skewness (Source: Groebner et al., 2011)
Figure 3.18: Graphical illustration of a normal distribution using IQ as an example (Source: MANCOSA)
Figure 4.6: Graphical representation of the theory underlying the sampling distribution of the mean
Preface
A. Welcome
Dear Student
It is a great pleasure to welcome you to Statistical Techniques in Business (STB7). To make sure that you
share our passion for this area of study, we encourage you to read this overview thoroughly. Refer to it as
often as you need to, since it will certainly make studying this module a lot easier. The intention of this module is
to develop both your confidence and your proficiency in statistical techniques.
The field of Statistical Techniques in Business is extremely dynamic and challenging. The learning content,
activities and self-study questions contained in this guide will therefore provide you with opportunities to explore
the latest developments in this field and help you to discover the field of Statistical Techniques in Business as
it is practised today.
This is a distance-learning module. Since you do not have a tutor standing next to you while you study, you need
to apply self-discipline. You will have the opportunity to collaborate with each other via social media tools. Your
study skills will include self-direction and responsibility. However, you will gain a lot from the experience! These
study skills will contribute to your life skills, which will help you to succeed in all areas of life.
MANCOSA does not own or purport to own, unless explicitly stated otherwise, any intellectual property rights in or
to multimedia used or provided in this module guide. Such multimedia is copyrighted by the respective creators
thereto and used by MANCOSA for educational purposes only. Should you wish to use copyrighted material from
this guide for purposes of your own that extend beyond fair dealing/use, you must obtain permission from the
copyright owner.
B. Module Overview
The purpose of this module is to serve as an introduction to Statistical Techniques in Business, to
orientate and equip you to deal with basic business statistics in the real world, and to facilitate and
inform sound business decisions.
This module covers basic managerial statistics such as basic descriptives, and explores other business
statistics tools such as Time Series Forecasting, Index Numbers, Seasonal Indices and Probability.
These basic managerial statistics constitute a key contributor to your programme efficacy by equipping
you for the realities of the business world.
The module is a 20-credit module at NQF level 7.
Statistics is a subject that is best learned through doing. Thus, the suggested order of learning for this
module is first to grapple with the theoretical underpinnings and uses of statistical tests, and thereafter to work
through the examples and the activities provided. Try to work towards your own answers, and keep
practising.
Learning Outcomes:
Perform statistical analyses in practice and extract additional information from business data;
Manipulate gathered (grouped and ungrouped) data through various statistical methods to generate useful information to support management decisions;
Prepare and interpret reports expressed in statistical terms; and
Assess the validity of statistical findings and the relevance and reliability of results
Associated Assessment Criteria:
Explain why statistics is important within the management context
Identify ways in which managers need to rely on business statistics
Identify ways in which statistics can be useful within business environments
See the relevance of particular types of statistical tests, their use within particular contexts, and in relation to particular business problems
Calculate the mean, median, mode and standard deviation on both raw and grouped data
Interpret the output with reference to the scenario
D Acronyms
IV Independent Variable
DV Dependent Variable
The purpose of the Module Guide is to allow you the opportunity to integrate the theoretical concepts from the
prescribed textbook and recommended readings. We suggest that you skim through the entire guide
to get an overview of its contents. At the beginning of each Unit, you will find a list of Learning Outcomes and
Associated Assessment Criteria. This outlines the main points that you should understand when you have
completed the Unit/s. Do not attempt to read and study everything at once. Each study session should be 90
minutes without a break.
This module should be studied using the prescribed and recommended textbooks/readings and the relevant
sections of this Module Guide. You must read about the topic that you intend to study in the appropriate section
before you start reading the textbook in detail. Ensure that you make your own notes as you work through both
the textbook and this module. In the event that you do not have the prescribed and recommended
textbooks/readings, you must make use of any other source that deals with the sections in this module. If you
want to do further reading, and want to obtain publications that were used as source documents when we wrote
this guide, you should look at the reference list and the bibliography at the end of the Module Guide. In addition,
at the end of each Unit there may be a link to the PowerPoint presentation and other useful reading.
G. Study Material
The study material for this module includes tutorial letters, programme handbook, this Module Guide, a list of
prescribed and recommended textbooks/readings which may be supplemented by additional readings.
In addition to the prescribed textbook, the following should be considered for recommended books/readings:
Durrheim, K., and Tredoux, C. (2012). Numbers, Hypotheses & Conclusions: A Course in Statistics for
the Social Sciences. (2nd ed.). Cape Town: UCT Press.
I. Special Features
In the Module Guide, you will find the following icons together with a description. These are designed to help you
study. It is imperative that you work through them as they also provide guidelines for examination purposes.
LEARNING OUTCOMES: The Learning Outcomes indicate aspects of the particular Unit you have to master.
THINK POINT: A Think Point asks you to stop and think about an issue. Sometimes you are asked to apply a concept to your own experience or to think of an example.
ACTIVITY: You may come across Activities that ask you to carry out specific tasks. In most cases, there are no right or wrong answers to these activities. The purpose of the activities is to give you an opportunity to apply what you have learned.
READINGS: At this point, you should read the references supplied. If you are unable to acquire the suggested readings, then you are welcome to consult any current source that deals with the subject.
PRACTICAL APPLICATION OR EXAMPLES: Practical Application or Examples will be discussed to enhance understanding of this module.
Unit 1: Introduction to Business Statistics
1.3. Types of statistics Define and differentiate between the types of available statistics
Prescribed Readings
Weiers, R. M. (2011). Introduction to Business Statistics. 7th ed. South
Western, Cengage Learning.
Numbers are useful in that they not only provide a clear, precise and objective measure, but they can also be
manipulated via calculations in order to arrive at particular answers. These answers can be used to increase our
understanding of things, and thus more accurately inform decisions.
The usefulness of statistics varies widely, ranging from the largely superfluous to the highly useful. For example, in
2011 the Pew Internet & American Life Project study found that 8% of internet users do not complete their
searches using search engines. This could possibly imply that they are completing searches by typing URLs in directly.
An important finding by the University of Connecticut in Mansfield debunked the cholesterol myth. Traditionally,
cholesterol was attributed to saturated fats clogging arteries, leading to coronary heart disease. However, it was
found that clogged arteries may be a result of bacteria, not diet. So you can see that statistics vary in
importance and are used in many arenas, including the political, medical, educational and social sciences, but in business they are
a formidable and invaluable tool for assimilating and using information in ways that facilitate
sound business decisions.
For example, in marketing research, our behaviour as consumers will generate statistics that will inform
companies about what products can be retained, discontinued or modified.
Think Point
Inferential statistics, on the other hand, move beyond merely describing data to make inferences or
generalisations about the population from which the sample was drawn. For example, based on a survey of the
preferences of 80% of its users, Netflix might decide to cancel a show, under the assumption that the entire Netflix-watching
population holds similar views regarding that series.
Activity
1.1. Research reveals that men are twice as likely to watch soccer on TV in
South Africa. Is this statistic descriptive or inferential? Why?
1.4 Conclusion
This chapter provides an introduction to statistics and defines the role of, and need for, statistics, with special
reference to business contexts.
Summary
This chapter describes what is meant by terms such as statistics, and aims to make a case
for statistics, particularly in business contexts.
Unit 2: Types of Data
2.4 Variables Understand what is meant by the term variable, and discuss
the various types and characteristics of variables
2.5 Levels of Measurement List and understand the four levels of measurement
Continuous variables can take on any value on the number line from negative to positive infinity, and include
decimals and fractions.
Mutually exclusive means that by belonging to one category, an item is automatically excluded from belonging to
another.
Independent variables - variables which are under the control of the researcher, and are manipulated in order
to bring about changes to dependent variables.
Dependent variables - outcome variables, which are observed to see if they change when the independent variable
changes.
Durrheim, K., and Tredoux, C. (2002). Numbers, Hypotheses & Conclusions: A Course in
Statistics for the Social Sciences. Cape Town: UCT Press.
TYPES OF DATA
[Figure: data is processed into information, presented in a useful form, which managers use to make decisions.]
When we turn data into usable information, we generally take four steps:
1. Sampling
2. Data Collection
3. Processing data using descriptive and inferential statistics
4. Presenting results
2.2 Sampling
We sample because we often do not have sufficient resources, such as time or money, to collect data from
everyone in the population. Please note that when we refer to the population, we do not mean everyone in South
Africa; we are referring rather to all of the people we wish to say something about. For example, if I wished to say
something about MANCOSA employees, my target population would be all MANCOSA employees, from which I would draw
the sample.
When drawing a sample – you need to consider:
Who will be surveyed? (The Sample)
How many people will be surveyed? (Sample Size)
How should the sample be chosen? (Sampling Methodology)
At all times, you should guard against bias and error.
There are two approaches to sampling, probability and non-probability sampling.
Say, for example, you wished to draw a simple random sample from your place of
work, which has 467 employees. A comprehensive list of all
employees should be compiled, and a key identifier (a number) assigned to each. Thereafter,
you place all the numbers in a hat and randomly draw numbers. Alternatively, you
could use a computerised random number generator, or a table of random
numbers, to select the employees to whom the questionnaire should be sent.
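The computerised draw can be sketched in a few lines of Python. This is only an illustration, assuming employees are numbered 1 to 467; the sample size of 50 and the seed are hypothetical choices made for the example and do not come from this guide.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Draw employee identifiers at random, without replacement."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    identifiers = range(1, population_size + 1)  # key identifiers 1..N
    return sorted(rng.sample(identifiers, sample_size))

# 467 employees as in the example; the sample size of 50 is hypothetical.
sample = simple_random_sample(467, 50, seed=1)
print(sample)
```

Because the draw is made without replacement, no employee can be selected twice, and every employee has the same chance of being chosen.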
For example, because it is impossible to list all of the people in, say,
KwaZulu-Natal, you could sample the people living within a radius of x km.
Those people represent a cluster of subjects, similar in certain
characteristics associated with living in proximity to each other. Your clusters
need to be randomly chosen from the population of clusters, and all members of
each chosen cluster need to be included.
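The two requirements of cluster sampling (clusters chosen at random, and every member of a chosen cluster included) can be illustrated with a short Python sketch. The area names and residents below are hypothetical and serve only to show the mechanics.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include every member of each."""
    rng = random.Random(seed)
    # Random choice is made among cluster names, not among individuals.
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

# Hypothetical clusters: residents grouped by the area in which they live.
areas = {
    "Area A": ["A1", "A2", "A3"],
    "Area B": ["B1", "B2"],
    "Area C": ["C1", "C2", "C3", "C4"],
    "Area D": ["D1", "D2"],
}
chosen, members = cluster_sample(areas, 2, seed=1)
print(chosen, members)
```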
Non-probability, or non-random, sampling is used when random sampling is either not possible, not
permissible, or not required. In non-random sampling, not every element in the target population has an
equal chance of being selected.
You are a market researcher, and are interested in determining people’s preferences for
coffee brands. You pop down to your nearest shopping centre and interview shoppers as
they walk past.
There are several means by which to collect data. Some will involve primary data, and others, secondary data.
Primary data refers to data that has been collected and captured for the first time by the researcher themselves,
with a particular purpose in mind. Secondary data is data that has already been collected for purposes
other than the current study, and is accessed for the purposes of that
study. Primary data is useful in that it is directly related to the research problem, and the researcher
exercises greater control over the data collection process, and can therefore better ensure the accuracy and
credibility of the manner in which the data was collected. Despite these benefits, though, primary data can be
somewhat cumbersome to collect, often taking time and eliciting a poor response rate. It can also generally cost
more to collect than data that is already in existence.
Secondary data has the advantage of already being in existence, with short access times and lower
expenditure, but because it was not collected for the purposes of the study, it may not
be entirely relevant. Secondary data may also be dated, and its accuracy and reliability
may be difficult to assess.
If the researcher intends to collect primary data, quantitative data collection instruments include
questionnaires, rating scales and observations. Secondary sources of quantitative data may include market
research figures, financial statements, census data, government data, economic indexes, and epidemiological and
population statistics, to mention a few. Qualitative primary data collection instruments include interviews (often
face-to-face and semi-structured), qualitative observations, and focus groups.
As previously mentioned, there exist several means by which we collect primary data.
2.3.1. Questionnaires
A questionnaire is most often used in research to collect data from several people using a standard or ordered
list of questions in written form. A researcher is able to email, personally administer, telephonically administer or
post questionnaires. Although questionnaires are useful in collecting large amounts of data in a relatively short
period of time, researchers often battle with poor response rates, and questionnaires are highly inflexible in that you are
unable to ask follow-up questions, or seek clarification of responses. The reliability of answers to questionnaires is
highly influenced by the wording and understanding of the questions therein, the layout, the level of
literacy of respondents and so forth.
2.3.2. Observation
This method of data collection involves directly observing and counting instances of an event, taking
measurements, or determining how things work, or have changed as a result of an intervention of some sort.
Observation can be either a qualitative or a quantitative method of data collection. For quantitative observations,
the researcher is first required to select the aspect of behaviour they wish to observe, then define the
characteristics of that behaviour, and develop a system to quantify observations and procedures to record behaviour.
Whereas the observation involved in quantitative research is systematic and structured, qualitative
observers aim to describe a particular behaviour within the setting in which it naturally occurs, and observation
tends to take place over extended periods of time (more so than quantitative observation). Unlike quantitative
observation, there are no previously determined hypotheses, and whereas quantitative observation involves the
use of checklists and other tools developed prior to investigation, qualitative researchers rely on narratives and
words about the setting, behaviours and interactions.
Direct observation is made to ascertain the extent to which a particular behaviour is demonstrated, and the
number of occurrences is recorded. Before the behaviour can be recorded, there is a need to identify the
behaviour of interest, define markers thereof and devise a procedure to identify, categorise, and record it.
Observation is useful in that it provides a record of behaviour, as it occurs, without having to ask test subjects
what they feel or think, and is therefore particularly useful for studies involving young children who cannot as yet
effectively articulate and interpret their feelings or communicate. In addition, observation can occur in natural
settings, like the classroom or playground. Although directly observing and gathering data is reliable and relatively
easy to do, human observers are prone to get tired, make mistakes, get distracted and misinterpret what they are
observing.
2.3.3. Interviews
Interviews are often the most widely used qualitative data collection technique, and usually involve a
conversation between two people. They allow for the collection, in words, of opinions, beliefs, and feelings about situations
that cannot otherwise be collected through, for example, observation. The difference between
a normal, day-to-day conversation and an interview is that an interview is a conversation with a particular
topic or focus in mind. Interviews are often recorded and transcribed, and the words are analysed in order to find
common themes and findings across the data. The advantages of interviews are that they allow the respondent to
relax (unlike questionnaires, where respondents may feel like they are being tested) and you can often gain in-depth insight
and understanding into the phenomenon at hand. They are highly flexible in that you can ask probing and follow-up
questions based on interviewee answers, and can provide clarification where necessary.
2.4. Variables
Variables are constructs or characteristics that can take on different values or scores. Variables are created by
converting constructs into a measurable form. Researchers study variables, and are particularly interested in the
relationships that exist between them. The variable under the control of the researcher, and which he/she
manipulates is known as the independent variable. The observed, measured or outcome variable is known as
the dependent variable.
There are two types of variables, namely qualitative and quantitative variables. Qualitative variables indicate that
the person or object belongs to a particular category. Examples of categorical variables include gender
(male/female), and whether or not a person is in possession of a car. Quantitative variables, on the other hand, can be either discrete or
continuous. In short, categorical variables represent qualitative data, and take on names or labels, whereas
quantitative data is essentially numerical.
Discrete variables take on whole numbers (integers), whereas continuous variables can take on any value on
the number line from negative to positive infinity, and include decimals and fractions. Height is a good example
of a continuous variable. Discrete variables represent categories, and are essentially whole numbers when
represented by numbers. The categories are mutually exclusive in the sense that by belonging to one category, a case is
automatically excluded from belonging to another. Take driver’s licences, for example: a person may be in possession
of a valid driver’s licence, or not. They cannot be in possession of a valid driver’s licence whilst at the same time
not possessing one. It is important to consider the type of variable, as this determines the level and type
of statistics that can be run on the data. For categorical data, you would produce frequencies and percentages, and
measures of centrality like the mode.
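As a minimal sketch of the statistics suggested for categorical data, the Python snippet below produces frequencies, percentages and the mode. The eight responses are hypothetical and are used only to show the calculation.

```python
from collections import Counter

def summarise_categorical(values):
    """Return frequencies, percentages and mode(s) for categorical data."""
    counts = Counter(values)
    total = len(values)
    percentages = {category: 100 * n / total for category, n in counts.items()}
    highest = max(counts.values())
    # More than one category can share the highest frequency.
    modes = [category for category, n in counts.items() if n == highest]
    return counts, percentages, modes

# Hypothetical responses to "Do you hold a valid driver's licence?"
responses = ["yes", "no", "yes", "yes", "no", "yes", "yes", "no"]
counts, percentages, modes = summarise_categorical(responses)
print(counts)
print(percentages)
print(modes)
```

A mean or standard deviation would be meaningless here, because the categories are labels rather than quantities.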
In research we are particularly interested in independent and dependent variables. Independent variables are
those variables which are under the control of the researcher, and are manipulated in order to bring about
changes to dependent variables. Dependent variables are outcome variables, which are observed to see if they change when
the independent variable changes. For example, imagine a researcher were interested in the effect of
training on performance. In this case, we have two variables:
Independent variable (IV) - Training
Dependent variable (DV) - Performance
The researcher divides subjects into two groups, one of which receives training and the other no training. He
then observes and compares performance (DV) between the groups to see if there is any difference in
performance as a result of having received training.
Ratio
– Highest level of measurement
– True zero, numbers mean something, it is numeric data with a zero origin
– Has equal intervals
– Can take on any number on the number line, from negative to positive infinity, including fractions and
decimals
e.g. age, distance, time, mass, sales, units and income
2.6. Constructs
Constructs refer to phenomena that are not directly observable. A construct is an abstract idea, characteristic or subject
matter that one wishes to measure. Intelligence is an example of a construct. It is not directly observable, but in
order to account for differences in scholastic performance, researchers came up with the idea that
this thing, called intelligence, accounts for these differences. Other examples of constructs include motivation,
school and reading readiness, creativity, emotional intelligence and so forth. Constructs are defined according to
their general meaning and characteristics, and according to how they will be measured or manipulated in a study.
When we define a construct by its general meaning, with a formal definition much like the ones provided in a
dictionary, this is known as a constitutive definition. If we were to look at the constitutive definition of intelligence,
it may be defined as the ability to acquire and apply knowledge and skills. Although this definition is useful in
conveying the general meaning of the construct, it is insufficient for the purposes of research because it lacks the
level of specificity required to replicate the study.
If, on the other hand, we define a construct by the operations by which it will be measured, this is known as an
operational definition. It defines how researchers are to measure a construct, and determines how to collect the
data relevant to that observable event. It serves the purpose of delimiting a term to ensure that everyone knows
what is meant by that term, and provides indicators of the construct.
ACTIVITY 2.1.
1. In each of the following examples, think about the level of measurement of the
data, and whether it is categorical/discrete or continuous:
a) A person’s gender
b) A person’s height (in cm)
c) A student’s business stats average for the semester
d) The number of children in a class
e) An index of learner motivation aggregated from three measures
f) The items on a 5-point Likert scale ranging from 1 (strongly agree) to 5
(strongly disagree)
2. Define what a variable is, and provide an example to illustrate your answer
3. Define a construct and provide an example to illustrate your answer
4. What is the difference between a qualitative variable, and a quantitative variable?
2.7 Conclusion
This chapter provided a brief introduction to the types and characteristics of data that need to be considered
when deciding on the most appropriate statistics to run in order to make sense of data in meaningful and
accurate ways.
Variables are constructs or characteristics that can take on different values or scores. Age, sex, business
income and expenses, country of birth, capital expenditure, class grades, eye colour and vehicle type are
examples of variables. We sometimes differentiate between two types of variables, independent and
dependent variables, e.g. IV = Hours slept, DV = Test scores.
Constructs refer to phenomena that are not directly observable. A construct is an abstract idea, characteristic or
subject matter that one wishes to measure. Motivation, for example, is not directly observable, so
we create indices to measure this unobservable phenomenon.
Qualitative variables indicate that the person or object belongs to a particular category, whereas quantitative
variables can be either discrete or continuous. Qualitative variables are only categorical,
and where numbers are used they merely represent categories; they are very limited in their mathematical and statistical use as a
result. Quantitative variables, by contrast, are expressed in actual numbers, and therefore possess more
mathematical and statistical properties because of their numeric nature.
Unit 3: Management Statistics
3.2 Presentation of Data Understand and discuss the various ways in which data can be
presented
3.4 Measures of Dispersion or Define and understand what is implied by spread, and why it’s
spread important
3.5 The shape of the distribution Determine the need to assess the shape of distributions
3.6 Inferential Statistics Define inferential statistics, and when they are necessary
3.1 Introduction
In previous chapters, it was mentioned that statistics is the science of collecting, analysing and presenting data in
informative ways. Statistical analysis of numerical data can be done descriptively and/or
inferentially. Descriptive statistics involves summarising the data by looking at the central
tendency, dispersion/spread, and shape of the distribution. Inferential statistics, on the other hand, involves
analysing sample data in such a way as to make inferences or say something about the population from which
the sample is drawn. As such, this chapter will cover the various methods with which data can be presented and
analysed.
For example, suppose we are trying to assess the number of years employees have worked at an organisation, and receive raw data as
follows:
0.5 0.5 0.5 1 1 1 1 1 2 2 3 3 3 2 2 3
2 3 4 4 5 5 4 4 5 4 5 4 6 6 7
7 7 8 6 7 6 7 9 10
An example of a frequency table output of working experience in SPSS (statistical software for data analysis) is
as follows:
From the table above, 20% of respondents have been working for 0-1 years, 25% for 2-3 years, 25% for 4-5
years, and 30% for 6 years or more (n = 40).
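The same frequency table can be reproduced outside SPSS. The following Python sketch is one possible way of doing so, using the 40 raw values listed earlier; the class labels and the helper name `frequency_table` are choices made for this illustration, with the last class covering 6 years or more.

```python
years = [0.5, 0.5, 0.5, 1, 1, 1, 1, 1, 2, 2, 3, 3, 3, 2, 2, 3,
         2, 3, 4, 4, 5, 5, 4, 4, 5, 4, 5, 4, 6, 6, 7,
         7, 7, 8, 6, 7, 6, 7, 9, 10]

# Class label, inclusive lower limit, inclusive upper limit.
# The last class is open-ended. These limits suit this data set, which has
# no values falling between the classes (for example 1.5).
classes = [("0-1", 0, 1), ("2-3", 2, 3), ("4-5", 4, 5), ("6+", 6, float("inf"))]

def frequency_table(values, classes):
    """Count the values in each class and convert the counts to percentages."""
    n = len(values)
    table = []
    for label, lower, upper in classes:
        count = sum(1 for v in values if lower <= v <= upper)
        table.append((label, count, 100 * count / n))
    return table

table = frequency_table(years, classes)
for label, count, percent in table:
    print(f"{label:>4} {count:>3} {percent:5.1f}%")
```

The counts come to 8, 10, 10 and 12 respondents, which correspond to the 20%, 25%, 25% and 30% reported for n = 40.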
Figure 3.1. Bar Graph graphically illustrating years of work experience
The frequency distribution indicates a moderate left skew, with the bulk of frequencies falling to the right, i.e. 6 years
or more. Skewness and kurtosis will be discussed a little further on in the chapter. A graphic presentation of a frequency
distribution of categorical data such as this is known as a bar chart. The two most
commonly used charts for presentations are bar charts and pie charts; both convey a large amount of information very clearly
and simply. We will look at the bar chart first. A bar chart consists of a series of bars,
the length of each bar representing the value of the variable being plotted. The bars can be drawn either
vertically or horizontally. When we graphically present continuous data, the chart used becomes a
histogram, and there are no gaps between the rectangles used to represent the classes, as histograms deal with
continuous data.
When we convert raw data into frequency distributions, there are a number of decisions we need to make when
grouping the data.
You need to first decide on the classes. A class refers to each category of the frequency distribution. Class
limits refer to the boundaries for each class, and determine which scores fall within that class. The class
interval refers to the width of each class, and is delineated by the lower and upper limits. When we are trying to
ascertain the optimal class width we can use the following formula:
$$\text{Approximate class width} = \frac{\text{Highest value in the raw data} - \text{Lowest value in the raw data}}{\text{Number of desired classes}}$$
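As a worked illustration of the formula, the Python sketch below applies it to the years-of-experience data, whose lowest and highest values are 0.5 and 10. The choice of four classes is an assumption made for the example.

```python
def approximate_class_width(values, desired_classes):
    """(highest value - lowest value) / number of desired classes."""
    return (max(values) - min(values)) / desired_classes

# Only the extremes matter for the formula: lowest 0.5, highest 10.
# Four classes is an assumed choice for this example.
width = approximate_class_width([0.5, 10], 4)
print(width)  # 2.375
```

In practice the result is usually rounded to a convenient width, so that the class limits are easy to read.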
3.2.2. Cross-tabulations
Whereas frequency tables are used to summarise a single categorical variable, cross-tabulations (also called
contingency tables) are used to summarise the relationship between two categorical variables. A cross-tabulation
(or crosstab for short) is a table that depicts the number of times each of the possible category
combinations occurred in the sample data.
From the table above, you can see that the left-hand column of the table shows task grades, and the top row shows the extent to which the learner successfully performed the task grade. Each cell contains the number of times a particular combination of task grade and level of success occurs.
[Pie chart: Male 11%, Female 89%]
Figure 3.2 Pie chart of the percentage of males and females (Gender)
From the pie chart above, it is evident that females constitute the majority of the sample.
Activity 3.1
                              Male (%)   Female (%)
Leisure/holiday               1.5        1.1
Shopping - business           3.1        2.2
Shopping - personal           35.0       39.6
Shopping - spectator          1.1        0.5
Visit friends/family          20.8       15.3
Medical                       3.2        5.2
Wellness (Spa, health farm)   2.2        0.2
Religious                     5.0        6.5
Wedding                       1.9        2.3
Other                         26.2       27.1
3.1.5 Using the example above to illustrate your answer – provide a definition for
mutually exclusive and exhaustive.
3.1.6 If you were to make sense of this data, what type of statistical activities
would you perform?
51 55 92 71 87 73 52 62 40 54
40 41 31 16 38 11 23 25 23 9
3.1.8 a. Construct a frequency distribution
3.1.8 b. Is your distribution a histogram, or a bar Chart? Why?
In the example below, we wished to ascertain the rate of productivity before and after 2015. Company X noticed that productivity rates were dropping following an acquisition in 2015. Management hypothesised that, due to the acquisition, motivation, morale and performance were dropping. In response, they ran multiple teambuilding sessions and increased communication and transparency with all employees in order to secure more trust and decrease overall uncertainty among staff. They decided to gauge the success of these initiatives by measuring productivity pre- and post-intervention.
Year Productivity
2014 89%
2015 76%
2016 80%
2017 90%
2018 93%
To illustrate the relationship between the year and productivity, a scatterplot was produced.
Figure 3.3: Scatterplot of productivity by year
The scatterplot clearly illustrates a decline in productivity in 2015, followed by a steady recovery over the next three years after changes were implemented in response to the perceived negative effects of the acquisition. Think of the independent variable (x-axis) as the ‘cause’ and the dependent variable (y-axis) as the ‘effect’. The scatter diagram allows us to observe the relationship between year (x) and productivity (y). From 2015 onwards the two variables move together, i.e. productivity increases as the year increases, so over that period there is a positive relationship between the two variables.
If we feel that the value of one variable (such as productivity) depends to some degree on the value of the other variable (such as the intervention), the first variable (productivity) is the dependent variable and is plotted on the vertical or y-axis. The second variable is the independent variable and is plotted on the x-axis. The pattern of a scatter diagram provides us with information about the relationship between two variables.
[Figures: example scatter diagrams of Y against X, including panels titled "Non-linear relationship" and "No relationship"]
If we wish to determine the overall trend, we can fit a “best-fit” line to the data, i.e. the line that minimises the average distance between the points and the line. By fitting a best-fit line, we can determine how the variables are related. We can also develop a straight-line equation to describe the relationship and make future predictions. This will be covered in further detail later in the Study Guide. For example:
[Figure: scatter diagram of Y against X with the fitted line y = -1.4448x + 28.649]
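As a rough sketch of how a best-fit line can be obtained, the Python below fits a straight line to the productivity figures from the table of productivity by year. It assumes ordinary least squares, a common way of fitting such a line; the text does not name the method at this point, and the function name `fit_line` is purely illustrative.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Productivity (%) by year, taken from the table of productivity by year.
years = [2014, 2015, 2016, 2017, 2018]
productivity = [89, 76, 80, 90, 93]

slope, intercept = fit_line(years, productivity)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

A positive slope indicates that, over the whole period, productivity tends to rise as the year increases, even though it dipped in 2015.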
In addition to frequencies and counts, there are three general ways in which we can summarise data:
1. Central tendency, or the single value that best describes the sample (Mean, Median, Mode)
2. The spread of the distribution (Variance and Standard Deviation)
3. The shape of the distribution (Skewness and Kurtosis)
On their own, raw data make very little intuitive sense to us. They need to be processed and converted into an intelligible form. By looking at the highest and most frequently occurring scores, and at the shape and spread of the data, we can make a good start at turning data into more useful and understandable forms.
Central tendency describes the most typically occurring values. There are three commonly used numerical
measures of central tendency or central location of a dataset: the mean, the median and the mode. You are
expected to know how to compute each of these measures for a given dataset. Moreover, you are expected to
know the advantages and disadvantages of each of these measures, as well as the type of data for which each
is an appropriate measure.
An easy way to remember the difference between the mean, median and mode is through the following useful
riddle:
Hey diddle diddle,
The median is the middle
You add and divide for the mean
The mode is the one that appears the most
And the range is the difference in between
3.3.1.1. Mean
The formula for calculating the mean of a set of individual scores is:
\[ \bar{x} = \frac{\text{the sum of all of the observations}}{\text{the number of observations}} \quad \text{OR} \quad \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
To illustrate, take for example the marks received for a spelling test by seven learners: 48, 52, 64, 64, 75, 80 and 85.
The mean is the sum of all data divided by the total number of scores (n = 7):
\[ \text{Mean} = \frac{48 + 52 + 64 + 64 + 75 + 80 + 85}{7} = \frac{468}{7} = 66.86\% \]
The problem depicted in the illustration above is that the mean is susceptible to extreme values, which pull the mean up or down. So whereas the majority of the scores lie to the left of the see-saw, the extreme value on the right is pulling the mean up. One way to overcome this problem is to use the median.
3.3.1.2. Median
The median is the score that falls in the middle of an ordered dataset, with as many values above it as below it. So you need to order the dataset from smallest to largest (or largest to smallest), and the median will be the score occurring in the middle of the ordered dataset. If there is an even number of scores in the dataset, you average the two middle scores. In the example above, the ordered marks are 48, 52, 64, 64, 75, 80 and 85, so the median is the fourth mark, 64%.
3.3.1.3. Mode
The mode is the most frequently occurring score. In the case of the marks above, 64% occurs twice, so the mode is 64%.
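The three measures of central tendency for the spelling-test marks can be checked with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# Spelling-test marks (%) for the seven learners in the example.
marks = [48, 52, 64, 64, 75, 80, 85]

mean_mark = statistics.mean(marks)      # sum of the marks divided by n
median_mark = statistics.median(marks)  # middle value of the ordered marks
mode_mark = statistics.mode(marks)      # most frequently occurring mark

print(round(mean_mark, 2), median_mark, mode_mark)
```

Here the median and mode coincide, while the mean is pulled slightly upwards by the higher marks.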
Activity 3.2.
3.2.3. Group the data into a grouped frequency table with a lower limit of 30
and a class width of 10.
These are the numbers of cars sold at 20 different dealerships throughout Durban during the month of February. These numbers represent raw scores. If I were to group them based on their frequency, the result would look as follows:
Interval Frequency
10 - 19 2
20 - 29 5
30 - 39 6
40 - 49 4
50 - 59 3
Σ 20
Table 3.3: Grouped frequencies of cars sold
You convert raw data to grouped data by counting the number of scores falling within each category or interval. For example, 2 scores fell between 10 and 19, i.e. 17 and 19. The interval refers to the range of scores into which you are summarising the data, and you always include the first number in the interval, i.e. 10, when determining the interval width. The interval width is the actual number of values that each interval consists of; in this case the width is 10 units. The width remains the same for each row.
Because we are now dealing with grouped data rather than individual scores, the formulae for calculating the mean, median and mode change. Because we are dealing with tables, the starting point is to create an even bigger table (below). Then:
1. Calculate the mid-point: Add the lower limit to the upper limit of each interval and divide by two. For example, using the table below, the mid-point of the first row is (10 + 19)/2 = 14.5. We know the width is 10, so you can add ten to arrive at each subsequent mid-point.
2. Multiply each column by the next: The mid-point is denoted by x and the frequency by f. Multiply the frequency by the mid-point to give fx, then multiply the mid-point by fx to give fx².
3. Calculate the cumulative frequency: The cumulative frequency is the running total of the frequencies, adding each frequency cumulatively to the next, line by line.
4. Summate the scores: Work out the summation (Σ) of fx and fx². Check that the final cumulative frequency equals the sum of the frequency column to ensure all data have been accounted for.
5. Determine the modal class and median interval: The modal class is the row with the highest frequency; the median interval is the row into which the (n/2)th observation falls. If you can, highlight these rows, as they will inform all subsequent substitutions.
Table 3.4: Extended table to calculate statistics using grouped frequency data
Interval   Frequency (f)   Midpoint (x)   fx      fx²       Cum. Freq.
10 - 19    2               14.5           29      420.5     2
20 - 29    5               24.5           122.5   3001.25   7
30 - 39    6               34.5           207     7141.5    13
40 - 49    4               44.5           178     7921      17
50 - 59    3               54.5           163.5   8910.75   20
Σ          20                             700     27395
Mean:
\[ \bar{x} = \frac{\sum fx}{n} = \frac{700}{20} = 35 \text{ cars sold} \]
Median:
\[ M_e = O_{me} + \frac{c\left[\frac{n}{2} - f(<)\right]}{f_{me}} \]
Where:
O_me = lower limit of the median class
f_me = absolute frequency of the median interval
f(<) = cumulative absolute frequency of the interval before the median interval
n = sample size
c or i = interval or class width
\[ M_e = 30 + \frac{10(10 - 7)}{6} = 35 \text{ cars sold} \]
Mode:
\[ M_o = O_{mo} + \frac{c\,(f_m - f_{m-1})}{2f_m - f_{m-1} - f_{m+1}} \]
Where:
O_mo = lower limit of the modal class
f_m = frequency of the modal class
f_(m-1) = the frequency of the class before or above the modal class
f_(m+1) = the frequency of the class after or below the modal class
c or i = interval or class width
Check that the answers you’ve calculated for all three (the mean, median and
mode) are roughly the same. If they are very different – you may have made a
mistake with your calculations.
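The grouped-data calculations for the car-sales table can be sketched in Python as follows. The median uses the same interpolation as the quartile formula used later in this unit. The mode uses the usual grouped-data interpolation built from the symbols defined above; that formula is an assumption here, since it is not printed in the text. Variable names are illustrative.

```python
# (lower limit, upper limit, frequency) for each interval of cars sold.
table = [(10, 19, 2), (20, 29, 5), (30, 39, 6), (40, 49, 4), (50, 59, 3)]
width = 10  # class width (c)

freqs = [f for _, _, f in table]
midpoints = [(lo + hi) / 2 for lo, hi, _ in table]
n = sum(freqs)

# Mean: sum of f*x divided by n.
grouped_mean = sum(f * x for f, x in zip(freqs, midpoints)) / n

# Median: find the interval containing the (n/2)th observation, then interpolate.
cumulative = 0
for lo, _, f in table:
    if cumulative + f >= n / 2:
        grouped_median = lo + width * (n / 2 - cumulative) / f
        break
    cumulative += f

# Mode (assumed formula): interpolate inside the class with the highest frequency.
m = freqs.index(max(freqs))
f_before = freqs[m - 1] if m > 0 else 0
f_after = freqs[m + 1] if m < len(freqs) - 1 else 0
grouped_mode = table[m][0] + width * (freqs[m] - f_before) / (2 * freqs[m] - f_before - f_after)

print(grouped_mean, grouped_median, round(grouped_mode, 2))
```

The three results are close to one another, which is the consistency check suggested in the tip above.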
As a rule of thumb, when central tendency is reported, so should spread. Once you have your central tendency it
is easy to ascertain the degree with which cases are dispersed around it.
Activity 3.3.
Activity 3.4.
3.4.1. Expand the table to include the mid-point, fm, fm² and the cumulative
frequency
3.4.2. Determine the mean and modal age
Once again, the level of measurement will determine how we gauge the spread of a distribution (GAO, 1992:41).
Table 3.5: Suggested statistics for varying levels of measurement
                       Use of Measure
Level of measurement   Index of Dispersion   Range       Interquartile Range   Std Deviation
Nominal                Yes                   No          No                    No
Ordinal                Sometimes             Sometimes   Yes                   No
Interval/Ratio         No                    Yes         Yes                   Yes
The best ways to illustrate the spread of the distribution for each level of measurement are as follows:
Table 3.6: Suggested statistics and graphical representation of varying levels of measurement
Level of measurement   Representation
Nominal                Table or frequency distribution showing frequencies
Ordinal                Tables/frequency distribution, but choosing a single measure is problematic. Use interquartile range if a single measure is chosen.
Interval/Ratio         Graphic dispersion; standard deviation, provided cases have an approximately normal distribution.
When there is a possibility that the underlying distribution may not be normal, interquartile range is a good alternative.
Consider two groups of data:
Dataset A Dataset B
65 42
66 54
67 58
68 62
71 67
73 77
74 77
77 85
77 93
77 100
Computed measures of central location
Mean = 71.5 Mean = 71.5
Median = 72 Median = 72
Mode = 77 Mode = 77
Figure 3.10. Histogram illustrating a roughly normally shaped distribution around mean 4.72, with a small deviation of 2.207
Figure 3.11. Histogram illustrating a roughly normally shaped distribution around mean 9.22, with a larger deviation of 4.463
If you look at the distributions for each company packing their containers:
Company A: The mean is 4.721 Tonnes with a standard deviation of 2.21
Company B: The mean is 9.219 Tonnes with a standard deviation of 4.46
Company C: The mean is 10.642 Tonnes with a standard deviation of 1.80
Company A has the smallest mean tonnage of goods packed into containers. Although Company B has one of the highest means, it also has the highest standard deviation, which means it is the most inconsistent when it comes to packing containers efficiently. Company C is the most efficient, as it has not only the highest mean but also the smallest standard deviation. So, of all the companies, Company C is the most economical and cost-saving.
3.4.2. Range
The range measures the difference between the highest and lowest values in a dataset. It is considered a rough
measure of spread as it depends on only two values. It is affected by outliers or extreme values and gives no
indication of the clustering of the data.
Formula: Range for ungrouped data:
\[ \text{range} = \text{highest value} - \text{lowest value} \]
EXAMPLE
For the data in a previous example:
Dataset A: range = 77 − 65 = 12
Dataset B: range = 100 − 42 = 58
Table 3.8 Table indicating the range calculations for datasets A and B
The ranges indicate that the data in dataset B are more widely spread than those in dataset A.
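A short Python sketch of the range calculation, using Datasets A and B as listed earlier in this section (the function name `data_range` is illustrative):

```python
# Datasets A and B from the comparison of two groups of data.
dataset_a = [65, 66, 67, 68, 71, 73, 74, 77, 77, 77]
dataset_b = [42, 54, 58, 62, 67, 77, 77, 85, 93, 100]

def data_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

range_a = data_range(dataset_a)
range_b = data_range(dataset_b)
print(range_a, range_b)
```

Because the range depends only on the two most extreme values, a single outlier in either dataset would change it considerably.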
Notation:
s = standard deviation of a set of sample scores.
σ = standard deviation of a set of population scores.
s² = variance of a set of sample scores.
σ² = variance of a set of population scores.
The variance (s²) measures the average squared deviation from the mean for a dataset.
Formula: Variance for ungrouped data:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
or
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} \]
Where:
x is each value of the dataset
𝑥̅ is the mean of the dataset
n is the sample size
When dealing with ungrouped data, the following steps should be followed:
1. Calculate the mean: Find the mean of the dataset
2. Subtract the mean from each score: Use a table to help you subtract the mean from each individual score,
create a new column for the result
3. Square it: Square the newly acquired figure from step two
4. Summate it: Add all of the squared deviations together
5. Divide it: Divide this figure by n-1
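Let’s look at the ages of a sample of 7 cars sold in the previous example of car sales:
7 9 10 12 13 15 18
1. Calculate the mean:
\[ \bar{x} = \frac{\sum x}{n} = \frac{7 + 9 + 10 + 12 + 13 + 15 + 18}{7} = \frac{84}{7} = 12 \text{ years old} \]
Steps 2 to 4: Subtract the mean from each score, square the result, and add the squared deviations in the final column:
Car age (x)   x − x̄   (x − x̄)²
7             −5      25
9             −3      9
10            −2      4
12            0       0
13            1       1
15            3       9
18            6       36
Σ             0       84
5. Divide by n − 1:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{84}{7 - 1} = \frac{84}{6} = 14 \]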
Alternatively, using the second formula:
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} \]
Step 1: Compute the mean: Calculate the mean of all the raw scores
Step 2: Square all of the raw scores
Step 3: Summate the squared values
Step 4: Substitute the results into the formula
For illustrative purposes:
Step 1: Calculate the mean:
\[ \bar{x} = \frac{\sum x}{n} = \frac{7 + 9 + 10 + 12 + 13 + 15 + 18}{7} = \frac{84}{7} = 12 \text{ years old} \]
Step 2: Square each raw score (x)
Step 3: Add all the squared values together
Car age (x)   x²
7             49
9             81
10            100
12            144
13            169
15            225
18            324
Σ             1092
Step 4: Substitute into the formula:
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{1092 - (7)(12)^2}{7 - 1} = \frac{1092 - 1008}{6} = 14 \]
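Both variance formulae can be checked against each other in Python. The sketch below uses the seven car ages from the example and also compares the result with `statistics.variance`, which computes the sample variance (dividing by n − 1). Variable names are illustrative.

```python
import statistics

# Ages (in years) of the seven cars in the example.
ages = [7, 9, 10, 12, 13, 15, 18]
n = len(ages)
mean_age = sum(ages) / n

# Formula 1: sum of squared deviations from the mean, divided by n - 1.
var_definition = sum((x - mean_age) ** 2 for x in ages) / (n - 1)

# Formula 2: (sum of x squared - n * mean squared), divided by n - 1.
var_shortcut = (sum(x ** 2 for x in ages) - n * mean_age ** 2) / (n - 1)

# Library check: statistics.variance uses the n - 1 (sample) denominator.
var_library = statistics.variance(ages)

print(var_definition, var_shortcut, var_library)
```

The second formula avoids computing each deviation separately, which is why it is convenient for hand calculation with a table of x and x².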
Practical Application
Calculate the variance of the sample scores: 2, 3, 5, 6, 9, 17
Both variance formulae are used in this example, with all the necessary table columns included
for both formulae.
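First it is necessary to calculate the mean:
\[ \bar{x} = \frac{\sum x}{n} = \frac{42}{6} = 7 \]
\[ \bar{x}^2 = 49 \]
x      x̄    x − x̄   (x − x̄)²   x²
2      7    −5      25         4
3      7    −4      16         9
5      7    −2      4          25
6      7    −1      1          36
9      7    2       4          81
17     7    10      100        289
Σ 42        0       150        444
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{150}{6 - 1} = 30 \]
or
\[ s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{444 - (6)(49)}{6 - 1} = \frac{150}{5} = 30 \]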
For grouped data, the original dataset values are changed to the interval midpoints.
Formula: Variance for grouped data:
\[ s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} \]
Where:
f is the interval frequency
x is the interval midpoint
n is the sample size
The steps for calculating the variance for grouped data are as follows:
Step 1: Calculate the mean
Step 2: Determine the mid-point
Step 3: Multiply the mid-point by the frequency
Step 4: Square the mid-point
Step 5: Multiply the frequency by the squared mid-point
Step 6: Summate the columns f, fx and fx²
Step 7: Substitute the summed values into the formula
Whenever you see the ∑ sign in an equation, you will need a column in the
table for the expression immediately following the ∑ sign. Consider also having
columns for each of the components of the expression.
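Remember: when dealing with grouped data, it is important to draw a table, as illustrated below.
Step 1: Calculate the mean:
\[ \bar{x} = \frac{\sum fx}{n} = \frac{5080}{100} = 50.8 \]
Step 2: Calculate the mid-point
Steps 3-5: Multiply out the columns (Step 4: square the mid-point)
Step 6: Summate the columns
Interval   f     x      fx     x²        fx²
...
91-100     1     95.5   95.5   9120.25   9120.25
Σ          100          5080             281355.00
Step 7: Substitute the summed values into the formula:
\[ s^2 = \frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1} = \frac{281355 - \frac{5080^2}{100}}{100 - 1} = \frac{281355 - 258064}{99} = 235.26 \]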
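The final substitution can be verified with a short Python sketch that takes only the column totals from the table (Σf = 100, Σfx = 5080, Σfx² = 281355). The function name `grouped_variance` is illustrative; the square root at the end gives the corresponding standard deviation.

```python
import math

def grouped_variance(sum_fx, sum_fx2, n):
    """Sample variance for grouped data from the column totals of the table."""
    return (sum_fx2 - sum_fx ** 2 / n) / (n - 1)

# Column totals from the worked table.
n = 100
sum_fx = 5080
sum_fx2 = 281355.00

variance = grouped_variance(sum_fx, sum_fx2, n)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 2))
```

Working from the totals in this way mirrors Step 7: once the table is complete, only three numbers are needed.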
Activity 3.5
Find the standard deviation of the sample scores in a practical example above.
For grouped data, the original dataset values have been changed to the interval midpoints.
Formula: Standard deviation for grouped data:
\[ s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n - 1}} \]
Where:
f is the interval frequency
x is the interval midpoint
𝑛 is the sample size
Note: Articles in professional journals and reports often use SD for standard deviation and Var for variance.
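Let the mean of the data be 33, and the standard deviation 11.21:
\[ \text{mean, } \bar{x} = \frac{\sum fx}{n} = 33 \]
\[ \text{standard deviation, } s = 11.21 \]
\[ \text{coefficient of variation, } CV = \frac{s}{\bar{x}} \times 100\% = \frac{11.21}{33} \times 100\% = 33.97\% \]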
Interpretation: the data are moderately dispersed around the mean.
All the measures of dispersion described so far have dealt with a single set of data. In practice, it is often
important to compare two or more sets of data with different means, sample sizes or measurement units
and the coefficient of variation can be used to do this.
The higher the coefficient of variation result, the more variability there is in a set of data.
Calculate the coefficient of variation for each filling machine and determine which machine is more consistent.
\[ \text{coefficient of variation for 1 000 ml product, } CV = \frac{s}{\bar{x}} \times 100\% = \frac{5}{1\,000} \times 100\% = 0.5\% \]
\[ \text{coefficient of variation for 500 ml product, } CV = \frac{s}{\bar{x}} \times 100\% = \frac{4}{500} \times 100\% = 0.8\% \]
Interpretation
Although the machine filling the smaller bottle has a lower standard deviation,
the CVs indicate that the machine filling the larger bottle is relatively more
consistent.
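The comparison of the two filling machines can be sketched in Python as follows. The means and standard deviations are those used in the calculation above; the function and variable names are illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Coefficient of variation as a percentage: (s / mean) * 100."""
    return std_dev / mean * 100

# 1 000 ml machine: s = 5 ml; 500 ml machine: s = 4 ml.
cv_large = coefficient_of_variation(5, 1000)
cv_small = coefficient_of_variation(4, 500)

# The machine with the lower CV is relatively more consistent.
more_consistent = "1 000 ml machine" if cv_large < cv_small else "500 ml machine"
print(round(cv_large, 2), round(cv_small, 2), more_consistent)
```

Because the CV is a ratio, it allows datasets measured on different scales or with different means to be compared directly.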
Activity 3.6
Two growers of apples have obtained statistics regarding the mass of their current
crops:
Grower A: x̄ = 300 g with s = 20 g
Grower B: x̄ = 280 g with s = 40 g
Which grower’s apples are more uniform in mass?
• An inter-percentile or mid-percentile range excludes a certain percentage of values at the lowest and
highest ends of the dataset.
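[Diagram: the ordered data divided into four segments of 25% each between the lowest and highest data values; the interquartile range spans the middle two segments]
\[ \text{interquartile range} = Q_3 - Q_1 \]
where Q3 is the third or upper quartile and Q1 is the first or lower quartile.
Time taken (minutes)   Number of people (f)   Cumulative frequency f(<)
0 < 5                  2                      2
5 < 10                 2                      4
10 < 15                3                      7
15 < 20                5                      12
20 < 25                5                      17
25 < 30                18                     35
30 < 35                85                     120
35 < 40                92                     212
40 < 45                37                     249
45 < 50                1                      250
Total                  250
Calculate the interquartile range (the first and third quartiles were calculated in a previous self-assessment exercise).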
SOLUTION
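Quartiles already calculated in a previous self-assessment activity:
\[ Q_1 = L_{Q_1} + \frac{c\left[\frac{1 \times n}{4} - f(<)\right]}{f_{Q_1}} = 30 + \frac{5(62.5 - 35)}{85} = 31.62 \text{ minutes} \]
\[ Q_3 = L_{Q_3} + \frac{c\left[\frac{3 \times n}{4} - f(<)\right]}{f_{Q_3}} = 35 + \frac{5(187.5 - 120)}{92} = 38.67 \text{ minutes} \]
\[ \text{interquartile range} = Q_3 - Q_1 = 38.67 - 31.62 = 7.05 \text{ minutes} \]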
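The quartile interpolation can be sketched as a small Python function. The frequencies are those of the time-taken table for 250 employees given in the revision exercises; the function name `grouped_percentile` and its parameters are illustrative, not notation from this guide.

```python
def grouped_percentile(table, width, p):
    """Interpolated p-th percentile (0-100) for grouped data.

    table is a list of (lower limit, frequency) pairs in ascending order.
    """
    n = sum(f for _, f in table)
    position = p * n / 100
    cumulative = 0  # cumulative frequency before the current interval, f(<)
    for lower, f in table:
        if cumulative + f >= position:
            return lower + width * (position - cumulative) / f
        cumulative += f
    raise ValueError("percentile must lie between 0 and 100")

# Time taken (minutes) for 250 employees: (lower limit, frequency).
times = [(0, 2), (5, 2), (10, 3), (15, 5), (20, 5),
         (25, 18), (30, 85), (35, 92), (40, 37), (45, 1)]

q1 = grouped_percentile(times, 5, 25)
q3 = grouped_percentile(times, 5, 75)
iqr = q3 - q1
print(round(q1, 2), round(q3, 2), round(iqr, 2))
```

The same function can be reused for any percentile, which is convenient when calculating mid-percentile ranges.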
The mid-percentile range is the range covered by a stated percentage of the values lying exactly in the middle of the dataset.
To calculate the upper and lower percentiles required for the upper and lower limits of the range:
\[ \text{lower percentile of range} = \frac{100\% - \text{required range percentile}}{2} \]
\[ \text{upper percentile of range} = \text{lower percentile of range} + \text{required range percentile} \]
Calculate the required positions and values for these percentiles, then:
\[ \text{interpercentile or mid-percentile range} = \text{value of upper percentile of range} - \text{value of lower percentile of range} \]
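The first step, finding which percentiles bound the required range, is simple arithmetic and can be sketched as follows. The function name `mid_range_percentiles` is illustrative.

```python
def mid_range_percentiles(required_range):
    """Return the (lower, upper) percentiles that bound a mid-percentile range.

    required_range is the percentage of values to keep in the middle, e.g. 70.
    """
    lower = (100 - required_range) / 2
    upper = lower + required_range
    return lower, upper

lower_70, upper_70 = mid_range_percentiles(70)
lower_60, upper_60 = mid_range_percentiles(60)
print(lower_70, upper_70, lower_60, upper_60)
```

The values of these two percentiles are then found from the grouped table in the same way as the quartiles, and the mid-percentile range is their difference.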
Activity 3.7
Using the data from the example in previous section, calculate the mid-70% range:
Interval Frequency Cum. Freq.
(weight in lbs)
140 < 150 1 1
150 < 160 4 5
160 < 170 8 13
170 < 180 7 20
180 < 190 5 25
Calculate the upper and lower percentiles required for the upper and lower limits
of the range
where,
IQR is the interquartile range
𝑄3 is the third or upper quartile
𝑄1 is the first or lower quartile
3.5.1. Skewness
If there are large extreme values in the data, the mean is pulled to the right or left and we say that the distribution exhibits skewness.
For a symmetrical distribution or normal distribution, the mean, median and mode will be about the same.
𝑚𝑒𝑎𝑛 ≈ 𝑚𝑒𝑑𝑖𝑎𝑛 ≈ 𝑚𝑜𝑑𝑒
[Figure: histogram of a symmetrical distribution]
For a distribution that is skewed to the right, the mode will be less than the median and the median will be less
than the mean.
𝑚𝑜𝑑𝑒 < 𝑚𝑒𝑑𝑖𝑎𝑛 < 𝑚𝑒𝑎𝑛
[Figure: histogram of a distribution skewed to the right]
For a distribution that is skewed to the left, the mean is the smallest, followed by the median, while the mode is
the largest.
TIP: A negatively skewed distribution (skewed to the left) has the mean, median and mode in alphabetical order.
[Figure: histogram of a distribution skewed to the left]
As a general rule, the difference between the median and the mode is about twice the difference between the
mean and the median.
If the data are skewed to the left, there are some outliers on the left (small values). If the data are skewed to the
right, then there are some large outliers.
Revision:
For a dataset that is approximately symmetrical with one mode, the mean, median and mode tend to have about
the same value. For a dataset that is obviously asymmetrical, it is preferable to report both the mean and
median. The mean is relatively reliable; that is, when samples are drawn from the same population, the sample
means tend to be more consistent than other averages.
A comparison of the mean and median can reveal information about skewness. Data can be identified as skewed
to the left, symmetrical or skewed to the right. Data skewed to the left will have the mean and median to the left
of the mode:
If we were to extend our (level of measurement permitting) analysis beyond merely descriptive statistics, we
delve into the world of inferential statistics.
Why?
Can you imagine having to collect information from the whole population to draw conclusions? This would be near impossible! Statisticians have derived means by which samples can be drawn so that generalisations can be made to populations. In other words, we are able to use sub-sets of the population in order to say something about that population.
Once you have set up your research questions, it is necessary to ascertain what level of measurement is needed
to yield data of a certain complexity in order to answer them. This should be done from the outset of your
research project.
Inferential statistics include methods for answering questions about cases we have no observations for by using
a sample of that population of interest to make inferences about that population.
Quick Recollection!
Why do we collect samples and not information about the whole population of interest?
1. Time constraints
2. Financial constraints
3. Practically infeasible
4. Populations are just too large – can’t handle and process ALL that data
We can’t just draw the sample in any haphazard way. It needs to be drawn in such a way that the sample is representative of our population, and therefore generalisable to the larger population, and the manner in which we select our sample should reveal any inherent biases of that sample that may affect our outcomes and conclusions. Sampling that ensures these criteria are met is called probabilistic sampling, or statistical sampling: every member of the population has a known and equal chance of being selected in the sample, and random selection is key. Random assignment helps to ensure that the groups to which cases are randomly assigned are approximately equal with respect to the variables in question. If we were to take, as a crude example, the population distribution of IQ:
Figure 3.18 Graphical illustration of a normal distribution using IQ as an example (Source: MANCOSA)
The mean population IQ is 100, with a standard deviation of 15. This means that most people will have an IQ of between 85 and 115. Similarly, not many people (as demonstrated in the figure above) have an IQ above 130 or below 50. This means that, if I were to randomly select a sample, the more common a value is in the population, the more likely it is to be included in my sample. Similarly, the rarer a value is, the less likely it is to appear in the sample at a high frequency. So the theory is that if the sample is randomly chosen, with each unit in the population having an equal chance of being selected (randomness), the sample is more likely to be representative of the population.
Furthermore, the larger our sample, the more likely we can reduce sampling errors, and thereby ensure that the
difference we are seeing actually exists, as our sample is representative of the population from which it was
drawn.
There are two estimates that we can make about population parameters based on statistics. When we decide on
what statistics to produce, and what output is required to yield evidence related to answering our research
questions, achieving our objectives, and ultimately the aim of our study – it is important that we choose the
correct statistical tests to run. The most useful technique for doing so, is through the use of the Decision Making
Tree (Adapted from Tredoux and Durrheim, 2002: 427).
THINK POINT
Now that you are familiar with inferential statistics, can you see why it is important
to have large, representative samples in quantitative research?
If the wrong decision is made, and the inappropriate tests are run, we will have erroneous output. As a result,
decisions made based on that output will be problematic.
3.7 Summary
This chapter covered the fundamentals of business statistics. It covered ways of presenting data and descriptive statistics (central tendency, dispersion/spread, and the shape of the distribution). It demonstrated, through worked examples, the calculations for both grouped and ungrouped data, and the calculations for spread. These statistics are at the heart of basic reporting and presentation of findings used to support decision-making in business contexts.
Revision exercises
1.1. What is meant by the term “central location”?
1.2. You are given the following marks:
12% 40% 48% 52% 56% 56% 56% 58% 64% 72% 90%
a) The mode
b) The median
c) The mean
1.3. State which is the most appropriate for this range of scores, and state why.
2. If the data were graphically presented as follows:
∑ 50
1-13 7
14-26 6
27-39 5
40-52 9
53-65 15
66-78 12
7. [Diagram: the mid-70% range lying between the lowest data value and the highest data value]
8. The time taken to complete an assembling task has been measured for 250 employees:
Time taken (minutes) Number of people (𝒇) Cumulative frequency 𝒇(<)
0<5 2 2
5 < 10 2 4
10 < 15 3 7
15 < 20 5 12
20 < 25 5 17
25 < 30 18 35
30 < 35 85 120
35 < 40 92 212
40 < 45 37 249
45 < 50 1 250
Total 250
Calculate the upper and lower percentiles required for the upper and lower limits of the range and
calculate the required positions and values for these percentiles.
9. The time taken to complete an assembling task has been measured for 250 employees:
READINGS
Prescribed Textbook:
Recommended Reading:
3.1. A frequency distribution indicates the number of instances in which a variable takes each of its possible values. It is used to summarise a single categorical variable. We generate frequency tables as a basic descriptor of the number of times a particular response/outcome occurs. They assist by representing data graphically, thus making it more intelligible.
3.1.3.a. Crosstabulation
3.1.4a Gender, Colours (purple, red, yellow, green etc)
3.1.4b colour of the wall (red, blue, green, yellow), breed of dog (Collie, Alsatian, Labrador, Staffie etc)
3.1.4c Number of cars sold, units of pens sold, number of people in a classroom
3.1.5 Mutually exclusive means that belonging to one category automatically rules out belonging to another. Using the example, if you spend the money on a wedding, that money cannot also be allotted to leisure. Exhaustive means that all elements in the population have been represented. This is shown by the categories summing to 100% of the sample in the breakdown.
3.1.6 I would look at the highest frequencies, I would look at the minimum and maximum activity on which the
money was spent. I would generate a clustered bar chart. I would do a crosstabulation. I would compare males
and females, and their expenditure.
3.1.7
Males spend more on visiting family, business-related shopping, and wellness than females do. Females spend more on weddings, medical expenses, personal shopping and religious activities than their male counterparts. Both males and females spend the highest percentage on personal shopping.
3.1.8
Activity 3.2.
3.2.1.
Descriptive Statistics
N Minimum Maximum Mean Std. Deviation
Closing_Price 15 31.69 121.44 64.8141 24.93420
Valid N (listwise) 15
3.2.2.
Statistics
Closing_Price
N Valid 15
Missing 0
Median 65.5000
Activity 3.3.
3.3.1
3.4.1.
Ages (years)   Frequency   m    fm     fm²      cf
1-13           7           7    49     343      7
14-26          6           20   120    2400     13
27-39          5           33   165    5445     18
40-52          9           46   414    19044    27
53-65          15          59   885    52215    42
66-78          12          72   864    62208    54
Σ              54               2497   141655
3.5.
𝑠 = √𝑠 2 = √235.26 = 15.34
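3.6. SOLUTION TO ACTIVITY
\[ \text{coefficient of variation for grower A, } CV = \frac{s}{\bar{x}} \times 100\% = \frac{20}{300} \times 100\% = 6.67\% \]
\[ \text{coefficient of variation for grower B, } CV = \frac{s}{\bar{x}} \times 100\% = \frac{40}{280} \times 100\% = 14.29\% \]
Grower A’s apples have the lower CV and are therefore more consistent in mass.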
3.7.
\[ \text{lower percentile of range} = \frac{100\% - \text{required range percentile}}{2} = \frac{100\% - 70\%}{2} = 15\% \]
\[ \text{upper percentile of range} = \text{lower percentile of range} + \text{required range percentile} = 15\% + 70\% = 85\% \]
1.1. Central location refers to the centre of a dataset, i.e. the most typical or most frequently occurring scores in a given dataset. It indicates the “average” or the centre of a dataset or group of scores.
1.2. a. Mode – 56%
b. Median – 56%
c. Mean – 54.91%
1.3. The median or the mode. The mean is being dragged down by the outlier/extreme score – 12%.
2. a. It is not skewed, and it is rather peaked.
b. Average – and normally distributed – unimodal and bell-shaped. The test was fair as student marks were
normally distributed.
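3.
$$\text{standard deviation, } s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}} = \sqrt{\frac{720\,425 - \frac{4\,235^2}{25}}{25-1}} = \sqrt{\frac{720\,425 - 717\,409}{24}}$$

$$\text{standard deviation, } s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}} = \sqrt{\frac{9\,158}{7-1}} = 39.07 \approx 39 \text{ errors}$$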
5.
Interval Frequency (f) Midpoint (x) 𝒙𝟐 𝒇𝒙 𝒇𝒙𝟐
(time in hours per week)
0<3 14 1.5 2.25 21.00 31.50
3<6 6 4.5 20.25 27.00 121.50
6<9 6 7.5 56.25 45.00 337.50
9 < 12 7 10.5 110.25 73.50 771.75
12 < 15 14 13.5 182.25 189.00 2 551.50
15 < 18 3 16.5 272.25 49.50 816.75
∑ 50 54.0 643.50 405.00 4 630.50
$$\text{standard deviation, } s = \sqrt{\frac{\sum fx^2 - \frac{(\sum fx)^2}{n}}{n-1}} = \sqrt{\frac{4\,630.50 - \frac{405^2}{50}}{50-1}} = \sqrt{\frac{4\,630.50 - 3\,280.50}{49}}$$
6.
$$\bar{x} = \frac{\sum x}{n} = \frac{252}{7} = 36$$
7.
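$$\text{85th percentile position} = \frac{j \times n}{100} = \frac{85 \times 25}{100} = 21.25$$
$$\text{15th percentile position} = \frac{j \times n}{100} = \frac{15 \times 25}{100} = 3.75$$
$$\text{upper percentile, } P_{85} = L_{P_{85}} + \frac{c\left[\frac{85 \times n}{100} - f(<)\right]}{f_{P_{85}}} = 180 + \frac{10(21.25 - 20)}{5} = 182.5 \text{ lbs}$$
$$\text{lower percentile, } P_{15} = L_{P_{15}} + \frac{c\left[\frac{15 \times n}{100} - f(<)\right]}{f_{P_{15}}} = 150 + \frac{10(3.75 - 1)}{4} = 156.88 \text{ lbs}$$
$$\text{interpercentile (mid-percentile) range} = \text{value of upper percentile of range} - \text{value of lower percentile of range} = 182.5 - 156.88 = 25.62 \text{ lbs}$$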
8.
$$\text{lower percentile of range} = \frac{100\% - \text{required range percentile}}{2} = \frac{100\% - 60\%}{2} = 20\%$$
$$\text{upper percentile of range} = \text{lower percentile of range} + \text{required range percentile} = 20\% + 60\% = 80\%$$
[Diagram: number line from the lowest data value (minimum) to the highest data value (maximum), with the mid-60% range lying between the 20th percentile and the 80th percentile]
9. SOLUTION TO SELF-ACTIVITY
The interquartile range was already calculated in the previous self-assessment activity:
$$\text{interquartile range} = 7.05 \text{ minutes}$$
$$\text{quartile deviation} = \frac{IQR}{2} = \frac{7.05}{2} = 3.53 \text{ minutes}$$
Unit 4: Probability and Probability Distributions
4.1 An introduction to probability    Introduces the topic area for the unit
4.4 The Normal Distribution Understand the underlying characteristics and purpose of
normal distributions
Estimating probability using normal distributions
4.5 The Standard Normal Understand the characteristics, theory and application of
4.6 Sampling distribution of the Distinguish between the normal and sampling distribution of the
mean mean
Demonstrate a theoretical understanding of sampling
The Standard Normal Distribution - the standard normal distribution has standardised z-values. z-scores are
standardised scores.
Sampling distribution of the mean - a variant of the normal distribution. A frequency distribution of sample
means and not individual scores
So, for example, if we were to roll a die (with 6 sides), the probability of getting a "3" is:
$$\text{Probability} = \frac{1}{1+1+1+1+1+1} = \frac{1}{6}$$
Similarly, the probability of throwing a head when tossing a coin is $\frac{1}{2}$.
Although the classic probability theory is pertinent to games of chance, it has limited real-world applications,
where all possible outcomes are not equally likely, or where there is little known about the underlying processes.
That is where the relative frequency approach steps in.
The relative frequency approach is calculated as the proportion of times an event is observed to occur in a very
large number of trials. The formula is as follows:
$$\text{Probability} = \frac{\text{Number of trials in which an event occurs}}{\text{Total number of trials}}$$
The assumption underlying this approach is known as the law of large numbers. The law of large numbers
assumes that as you increase the number of trials, the overall probability starts tending towards the actual
probability with which that event is supposed to occur. So for example, if we were to toss a coin, the probability of
getting heads or tails will start to converge towards 0.5 the larger the number of trials.
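The convergence described by the law of large numbers can be seen in a small simulation. The sketch below is illustrative only: the function name, the fixed seed and the chosen numbers of tosses are assumptions made for this example, not part of the module.

```python
import random


def relative_frequency_of_heads(num_tosses: int, seed: int = 0) -> float:
    """Toss a simulated fair coin num_tosses times and return the proportion of heads."""
    rng = random.Random(seed)  # fixed seed (an assumption) so that runs are repeatable
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


if __name__ == "__main__":
    # The relative frequency should drift towards 0.5 as the number of trials grows.
    for n in (10, 100, 10_000, 1_000_000):
        print(n, relative_frequency_of_heads(n))
```

With only a handful of tosses the proportion can be far from 0.5; with many tosses it should sit very close to it.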
The term odds is often used to express the likelihood that something will happen. For example, if someone
says the odds of something are 4 to 1 in its favour, they are saying that it is four times as likely to occur as
not to occur, which corresponds to a probability of 4/5 = 0.8.
proportions are easy to discern and facilitate comparisons. When we ascribe a number to the probability with
which something is likely to occur, we give a measure of confidence in our assertion. The higher the
probability, the greater our confidence.
If we are 100% sure or confident in the outcomes, we will ascribe the probability a 1. Alternatively, if we are
utterly sure that the event will not occur, we ascribe the probability a value of 0. Between 0 – 1, the stronger the
evidence, the closer to 1 the ascribed probability will be. For example, given the ongoing increases in petrol
prices, and given that the current petrol price has hit an all-time high of R15.73 with a pending increase of 71c,
the probability of it dropping to R6.92 (the price as of July 2008) is very small and probably closer to 0. The
probability, however, of it increasing another 20c to 30c within the next few months, given previous trends, is
quite high, with a probability closer to 1.
When discussing probabilities, the probability that something will occur is denoted p, whereas the
probability of that thing not occurring is denoted q.
It can therefore be expressed as:
p+q=1
So,
q=1–p
p=1–q
When we are interested in a particular outcome (p), achieving that outcome is known as a success.
Sometimes, the probability of success does not hinge on a one-shot attempt, but rather on a succession of
events. For example, consider the probability of throwing exactly two heads in three tosses of a coin. In this
case, the possible outcomes are HHT, TTH, HTH, THT, HHH, TTT, HTT and THH. In other words, there
are eight possible outcomes when flipping our coin thrice. Three of them (HHT, HTH and THH) contain
exactly two heads, so the probability is 3/8 = 37.5%.
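The coin-toss count above can be checked by listing every outcome and counting the favourable ones. This is a minimal sketch; the function name `prob_exact_heads` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction
from itertools import product


def prob_exact_heads(num_tosses: int, num_heads: int) -> Fraction:
    """Classical probability of exactly num_heads heads in num_tosses fair coin tosses."""
    # Every sequence of H and T is one equally likely outcome.
    outcomes = list(product("HT", repeat=num_tosses))
    favourable = [o for o in outcomes if o.count("H") == num_heads]
    return Fraction(len(favourable), len(outcomes))


if __name__ == "__main__":
    print(prob_exact_heads(3, 2))
```

Enumeration like this is only practical for a small number of tosses, which is why a counting formula becomes useful for larger problems.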
As mentioned, sometimes there are more than the two possible outcomes assumed by the binomial distribution.
If we think of the example of rolling a die, there are six possible outcomes. If we think of the probability of
selecting a Diamond from a pack of cards, we know there are 52 cards and four suits, which makes 13 cards in
each suit. Therefore, the probability of selecting a Diamond from a pack of cards is as follows:
$$p = \frac{a}{n}$$
Where:
a = number of outcomes counted as a success
n = total number of equally possible outcomes
so,
$$p(\text{drawing a diamond}) = \frac{13}{52}$$
p = 0.25, or 25%.
A jar contains 100 marbles, identical except that 30 are red, 20 black, 5 green
and the rest white.
If a marble is taken from the jar at random, what is the probability that the
marble is:
a. red ?
b. black or green?
c. not red?
d. multicolor?
SOLUTION:
A. 3/10
B. ¼
C. 7/10
D. 0
Sometimes, though, we need to consider overlapping probabilities, and as such need to distinguish between the
probability laws of conjunctions and disjunctions. Take the instance of drawing a card from a deck
of cards. You need to consider whether or not, once you have drawn the card, you replace it in the pack. If you
draw a card and put it back afterwards, it is known as sampling with replacement. If you draw a card but do not
put it back in the deck before the next draw, you are sampling without replacement, and you have now altered the
probability of drawing a particular card in the next draw. In this instance, your draws are not independent, in that
drawing one card affects the probability of the next draw.
When you have independent events, the law of conjunctions applies, and you multiply the probabilities. In other
words, you are looking at the probability of two jointly occurring events, i.e. a and b.
Alternatively, when you are looking at the probability of either of two mutually exclusive events, you concern
yourself with the law of disjunctions, and add the probabilities, i.e. a or b.
For example, consider the probability of drawing two Diamonds in two successive draws. When we replace the
card after having drawn it (sampling with replacement), the probability will be:
p = 0.25 x 0.25 (a 1/4 chance of selecting a diamond on each draw)
p = 0.0625 = 6.25%
However, if we were to draw the cards without replacement, then we alter the probabilities:
$$p(\text{first draw}) = \frac{13}{52}$$
$$p(\text{second draw}) = \frac{12}{51}$$
$$p(a \text{ and } b) = \frac{13}{52} \times \frac{12}{51}$$
p (a and b) = 0.059
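The two card-drawing cases can be compared side by side with exact fractions. The sketch below assumes a standard 52-card deck with 13 diamonds, as in the text; the function name is hypothetical.

```python
from fractions import Fraction


def prob_two_diamonds(with_replacement: bool) -> Fraction:
    """Probability of drawing two diamonds in two successive draws from a 52-card deck."""
    deck_size, diamonds = 52, 13
    first = Fraction(diamonds, deck_size)
    if with_replacement:
        # The card goes back, so the second draw has the same probability as the first.
        second = Fraction(diamonds, deck_size)
    else:
        # One diamond has left the deck, so both counts shrink by one.
        second = Fraction(diamonds - 1, deck_size - 1)
    return first * second


if __name__ == "__main__":
    print(float(prob_two_diamonds(with_replacement=True)))
    print(round(float(prob_two_diamonds(with_replacement=False)), 3))
```

Using `Fraction` keeps the arithmetic exact, so the only rounding happens when the result is printed.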
However, when we are looking at disjunctions, we are trying to estimate the probability of either of two
events occurring. The events need to be mutually exclusive (belonging to one category automatically
excludes belonging to another). So, for example, if you were to estimate the probability of drawing a heart
or a diamond in a single draw, the probability would be:
$$p(\text{hearts}) = \frac{1}{4}$$
$$p(\text{diamonds}) = \frac{1}{4}$$
Therefore:
$$p(\text{Heart or Diamond}) = \frac{1}{4} + \frac{1}{4} = \frac{1}{2}$$
Half the faces of a fair die are painted blue, half yellow. The die is rolled twice. What is
the probability the die will turn up blue both times? Can you cite a probability “rule” that
models your answer?
SOLUTION
The probability is ¼. The rule that applies is the multiplication rule rather than the additive rule:
p(A and B) = p(A) x p(B) for independent events A and B, so p(blue and blue) = ½ x ½ = ¼.
Where:
n = number of trials or events
r = number of successes
! = factorial: multiply the number by every positive whole number below it, e.g. 3! = 3 x 2 x 1 = 6
So when you are determining the potential number of possible outcomes, you use the following formula:
So,
Step 1: Work out the total possible number of outcomes
Step 2: Define what is meant by a success, and how many successes in n trials
Step 3: Determine what the probability is associated with each success
Step 4: Substitute it into the probabilities formula
If we had a four-sided die, each side representing a suit (hearts, diamonds, spades, clubs), then when trying to
determine the probability of obtaining 2 hearts in 6 rolls of the die:
Step 1: The total number of ways of obtaining 2 successes in 6 rolls is:
$$\binom{n}{r} = \frac{n!}{r!(n-r)!}$$
$$\binom{6}{2} = \frac{6!}{2!(6-2)!} = \frac{720}{2! \times 4!} = \frac{720}{2 \times 24} = 15 \text{ possible outcomes}$$
Step 4:
$$p(\text{2 hearts in 6 rolls}) = \frac{n!}{r!(n-r)!} \times p^r \times q^{n-r}$$
$$p(\text{2 hearts in 6 rolls}) = \frac{720}{2 \times 24} \times 0.25^2 \times 0.75^{6-2} = 15 \times 0.0625 \times 0.3164 \approx 0.2966$$
Please note that the difference between a permutation and a combination is that for permutations the order
matters, whilst for combinations it does not.
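The combination count and the binomial probability formula used in the four-sided die example can be reproduced with Python's standard library. This is a sketch for checking the arithmetic; the helper name `binomial_probability` is an assumption made for this example.

```python
from math import comb, perm


def binomial_probability(n: int, r: int, p: float) -> float:
    """Probability of exactly r successes in n independent trials with success probability p."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, r) * p**r * q ** (n - r)


if __name__ == "__main__":
    print(comb(6, 2))  # combinations: order does not matter
    print(perm(6, 2))  # permutations: order matters, so the count is larger
    print(round(binomial_probability(6, 2, 0.25), 4))
```

The binomial formula uses combinations because it does not matter on which of the rolls the successes fall, only how many there are.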
The Binomial distribution is a hypothetical frequency distribution that allows us to estimate the probability of one
out of two mutually exclusive and jointly exhaustive possible outcomes. Although it looks the same as other
distributions, such as the sampling distribution of the mean, and the normal distribution, it has the following
characteristics:
Events must be independent
Events must only have two possible outcomes
Equally probable outcomes or two outcomes of unequal probability
Discrete variables only
Mutually Exclusive: Occurrence of one event makes the occurrence of all other events impossible
Exhaustive: All possible outcomes or states of phenomena are represented
Estimates r occurrences of successful outcomes in n events
The binomial question is “What is the probability that r successes will occur in n trials of the process
under study?”
There are six steps to work through in a binomial story problem:
1. Define Success first - Success must be for a single trial
2. Define the probability of success
3. Find the probability of failure
4. Define the number of trials
5. Define the number of successes out of those trials
6. Plug all values into the formula
A car hire firm rents out only Toyota and VW cars. Experience has shown that one in
four clients choose a VW. If 5 reservations are randomly selected from today’s
bookings, what is the probability that 2 will have requested a VW?
SOLUTION:
Step 4 & 5: We want to know the probability of 2 success outcomes, i.e. we require
p(2) and we have 5 observations, thus n = 5.
Step 6:
$$p(2) = \binom{5}{2} \times 0.25^2 \times 0.75^{5-2} = \frac{5!}{2!(5-2)!} \times 0.25^2 \times 0.75^{5-2} = 0.2637$$
The normal distribution is a smooth continuous curve representing the form a binomial distribution would take for
an infinite number of events with equiprobable outcomes. The characteristics of a Normal Distribution are as
follows:
Bell-shaped curve
Symmetrical
Unimodal (Mean, Median, Mode all coincide)
Asymptotic - Tails extend indefinitely to the left and right
It is a model of the shape of the frequency distribution of many naturally occurring phenomena and helps us
understand the “relative position” of a case relative to other cases. In other words it allows us to determine where
an individual score lies in relation to other scores. The area under the curve of a normal distribution represents
probability.
Each normally distributed phenomenon has its own distribution (different means and variances), but all share the
same shape (normal shape). Think, for example, of female shoe sizes. In South Africa, the average (mean) shoe
size for females is size 6, with a standard deviation of 2.5. In China, on the other hand, the average female shoe
size is 3.5, with a standard deviation of 2. If we were to plot them graphically:
You can see from the histograms above that both distributions are normally distributed despite having different
means and standard deviations. Distributions allow us to predict probability or proportion from an individual
score, but in order for us to do so, we need three pieces of information:
Mean
Variance
Shape
But, because there are so many different types of distributions – each distribution has a different proportion of
cases falling below any particular score so it becomes difficult to determine positions when all the distributions
are different. As such, we standardise distributions in order to determine absolute positions.
z-tables are used to determine the exact proportion of cases falling above and below a particular score, and
contain z-scores and proportions. You will find the z-scores in the horizontal and vertical margins, and their
associated proportions as columns and rows. When we convert the raw scores from the normal distribution into
z-scores we facilitate comparisons across distributions with different means and std deviations, and make real-
world distributions comparable. In order to convert raw scores into z-scores, we use the following formula:
$$z = \frac{x - \mu}{\sigma}$$
Where:
z = z-score
x = score in real-world distribution
µ = population mean
σ = population standard deviation
When dealing with a z-score related problem, you follow the following steps:
Step 1: Determine the mean and standard deviation
Step 2: Plug the statistics into the formula and derive a z-value
Step 3: Draw the standard normal distribution in order to determine the proportion (larger or smaller) you are
interested in
Step 4: Determine the actual proportion using the column determined above using your z-table
Step 5: Conclude with reference to the scenario
In the United States, the average IQ is 100, with a standard deviation of 15. What
percentage of the population would you expect to have an IQ lower than 85?
SOLUTION
Step 3: Plot -1 on the curve. Because we are interested in scores smaller than this, look to
the left. The shaded area to the left of -1 is the smaller portion of the
whole. This means you are interested in the smaller p.
Step 4: You can ignore the sign (+/-) as the distribution is symmetrical – look in the
z-column value 1, and column smaller p - you will read off a value of 0.15866 =
15.87%.
Step 5: Conclude with reference to the scenario ~ About 16% of the population has
an IQ score lower than 85
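The z-table lookup in the IQ example can be cross-checked numerically. The sketch below uses the error function from Python's standard library to compute the area to the left of a z-score; the function names are hypothetical and the approach stands in for the printed z-table, it is not the module's prescribed method.

```python
from math import erf, sqrt


def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert a raw score into a standardised z-score."""
    return (x - mu) / sigma


def proportion_below(z: float) -> float:
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


if __name__ == "__main__":
    z = z_score(85, mu=100, sigma=15)
    print(z, round(proportion_below(z), 5))
```

Because the curve is symmetrical, the area below -1 equals the area above +1, which is why the table can be read using the unsigned z-value.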
Activity 4.1.
1. 1.17
2. -0.85
3. 2.07
4. -1.37
1. 2.24
2. -1.65
3. 1.47
4. -0.47
4.1.4. For the numbers below find percent of cases falling between the two z-
scores:
Because we use samples to make inferences, we need sampling distributions. To draw scientific inferences, we
need to know where a sample mean stands in a distribution relative to other sample means. The mean of the
sampling distribution is equal to the mean of the population. In other words, as we draw more and more of the
available samples, the average of the sample means will tend towards the population mean.
[Diagram: sample means x̅1 to x̅6 scattered around the population mean µ]
Figure 4.6. Graphical representation of the theory underlying the sampling distribution of the mean
When we sample repeatedly from the same population, we expect the means of these samples to be different.
Plotting the means of an infinite number of samples of size n, drawn from a population, will give us a sampling
distribution of the mean. This tendency for repeated sample statistics to tend towards the population parameters
is known as the Central Limit Theorem. The Central Limit Theorem states that
"Given a population with a mean µ and a variance σ², the sampling distribution of the mean will have a mean
equal to µ and a variance σ²/n. The shape of the sampling distribution approaches normal as the sample size (n)
increases".
Therefore, the mean of the sampling distribution of the mean is equal to the population mean:
$$\mu_{\bar{x}} = \mu$$
The variance of the sampling distribution of the mean is equal to the population variance divided by n:
$$\sigma^2_{\bar{x}} = \frac{\sigma^2}{n}$$
The distribution will be approximately normally distributed as long as the sample size is not too small.
Using the Central Limit theorem concept, we can compute the proportion of cases lying above or below a
specified value, and ask:
What proportion of samples have a mean greater or smaller than a particular value?
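For an individual score:
$$z = \frac{x - \mu}{\sigma}$$
For a sample mean:
$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$$
The variance of the sampling distribution is not the same as that of the population, so you need to substitute the
formula for $\sigma_{\bar{x}}$:
$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}}$$
Step 2: Calculate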
From years of testing, we know that IQ scores for individuals are normally
distributed with a mean of 100 and a standard deviation of 15. If we select a
random sample of 10 secondary school pupils, what is the probability that their
mean is less than 95?
SOLUTION
$$z = \frac{95 - 100}{\frac{15}{\sqrt{10}}}$$
$$z = \frac{-5}{\frac{15}{3.162}} = \frac{-5}{4.744}$$
$$z = -1.054$$
A proportion of 0.146 lies above a z-score of 1.054. This means that a proportion
of 0.146 lies below a z-score of –1.054.
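The sample-mean calculation above can be sketched with `statistics.NormalDist` from Python's standard library (available from Python 3.8). The function name is hypothetical, and using a software distribution object instead of the z-table is a substitution made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def prob_sample_mean_below(value: float, mu: float, sigma: float, n: int) -> float:
    """Probability that the mean of a random sample of size n falls below value."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sampling distribution
    return NormalDist(mu, standard_error).cdf(value)


if __name__ == "__main__":
    print(round(prob_sample_mean_below(95, mu=100, sigma=15, n=10), 3))
```

Note that the only change from the individual-score case is the smaller spread: the standard deviation is divided by the square root of the sample size.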
4.7 Summary
This chapter covered probability. Probability is the likelihood of something occurring. It introduced and explained
the types and use of the theoretical distributions underlying probability calculations. Probability is important to
business in that we are able to estimate the likelihood of certain outcomes occurring, and take the necessary
precautions.
Solutions to activities
4.1.1. 0.37900
4.1.2. 0.30234
4.1.3. 0.48077
4.1.4. 0.41466
4.2. SOLUTION
4.3. SOLUTION
4.4. SOLUTION
Both Pos.:
From µ:
Revision Questions
1. Serena Williams is known to serve an ace at Wimbledon 70% of the time. If she continues to serve at
the same rate for her next match, and serves 5 times, what is the probability that:
1.1. All five serves will be aces
1.2. At least 2 serves will be aces
2. What is the area under the standard normal distribution between z = -0.6 and z = 2.4?
3. Research has shown that children can concentrate on average for four minutes, with a standard deviation
of 1 minute. What is the probability that the child will be able to concentrate for:
3.1. Between 5-6 minutes
3.2. Less than 2 minutes
1.1.
$$P(r) = \frac{n!}{r!(n-r)!} \times p^r \times q^{n-r}$$
$$P(5) = \frac{5!}{5! \times 0!} \times 0.70^5 \times 0.30^0 = 0.1681$$
1.2.
$$P(0) = \frac{5!}{0! \times 5!} \times 0.70^0 \times 0.30^5 = 0.00243$$
$$P(1) = \frac{5!}{1! \times 4!} \times 0.70^1 \times 0.30^4 = 0.0284$$
P(at least 2 aces) = 1.0 - P(0) - P(1) = 1.0 - 0.00243 - 0.0284 = 0.969
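The "at least 2" answer relies on the complement rule: subtract the probabilities of the unwanted counts (0 and 1) from 1. A minimal sketch of that idea follows; the function names are hypothetical.

```python
from math import comb


def prob_exactly(n: int, r: int, p: float) -> float:
    """Binomial probability of exactly r successes in n trials."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)


def prob_at_least(n: int, minimum: int, p: float) -> float:
    """Probability of at least `minimum` successes, using the complement rule."""
    # Sum the probabilities of every count below the minimum and subtract from 1.
    below = sum(prob_exactly(n, r, p) for r in range(minimum))
    return 1 - below


if __name__ == "__main__":
    print(round(prob_exactly(5, 5, 0.70), 4))
    print(round(prob_at_least(5, 2, 0.70), 3))
```

Working through the complement needs only two terms here, whereas adding P(2), P(3), P(4) and P(5) directly would need four.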
2. 0.2257 + 0.4918 = 0.7175
3.1. P(5 < x < 6) = P((5 - 4)/1 < z < (6 - 4)/1)
= P(1.00 < z < 2.00)
= 0.4772 - 0.3413 = 0.1359
= 13.59%
3.2. P(x < 2) = P(z < (2 - 4)/1) = P(z < -2.00)
= 0.5 - 0.4772
= 0.0228
= 2.28%
Unit 5: Index Numbers
Index number - relative figure, expressed as a percentage, which is used to measure how much an economic
variable changes over time or differs between two locations. It is a summary measure of the change in the
activity of an item or a collection of items (known as a basket) from one time period to another.
Base Period - Point in time to which the comparison is made
Price Index - Measures the percentage change in price between any two periods of time
Quantity index - Measures the percentage change in consumption level of individual items or baskets of items
from one time period to another
Composite index - Combines both the relative prices and the quantities
Recommended Reading:
Goodridge, P. (2007). Methods explained: Index Numbers. Available from:
https://www.ons.gov.uk/ons/rel/elmr/economic-and-labour-market-review/no--3--
march-2007/methods-explained--index-numbers.pdf
An index is constructed by expressing the value of an item in the current period as a ratio of its value in the base
period. This is then expressed as a percentage.
There are generally two major categories of index numbers – price and quantity.
For both, we can use a single or composite index.
Single index
According to the Concise Encyclopaedia of Statistics (2008), a simple index number is "the ratio of two values
representing the same variable, measured in two different situations or in two different periods". Price is one
example: a price index expresses the change in price between the current and the reference period. Other
examples of simple index numbers include quantity and value.
$$\text{Price relative} = \frac{p_1}{p_0} \times 100$$
Where:
𝑝1 = Price in the current period
𝑝0 = Price in the base period
$$\text{Quantity relative} = \frac{q_1}{q_0} \times 100$$
Where:
𝑞1 = Quantity in the current period
𝑞0 = Quantity in the base period
There are two types of weighted indexes, the fixed weight index, and the simple weighted (aggregative) index.
P w o
PQ 1 1
100
P Q o o
Commonly used composite indexes using weighted aggregate indexes include the Laspeyres index and the
Paasche index. The most commonly used composite index is the Laspeyres Index.
$$\text{Laspeyres Index} = \frac{\sum (p_1 \times q_0)}{\sum (p_0 \times q_0)} \times 100$$
The price index indicates an increase in the value of the portfolio if all quantities of shares remain the same.
Alternatively, the quantity index indicates an increase in the number of shares bought, since all prices have been
kept constant in the calculation.
Index numbers are based on samples and are thus error prone, and changes in technology, purchasing
behaviours, quality changes etc can create inconsistencies.
Share    p0      q0     p1      q1     p0*q0     p1*q0     p0*q1
A        70      362    120     305    25340     43440     21350
B        202     250    122     72     50500     30500     14544
C        1280    52     1900    102    66560     98800     130560
TOTAL                                  142400    172740    166454
$$\text{Laspeyres Price Index} = \frac{\sum (p_1 \times q_0)}{\sum (p_0 \times q_0)} \times 100 = \frac{172740}{142400} \times 100 = 121.31$$
$$\text{Laspeyres Quantity Index} = \frac{\sum (p_0 \times q_1)}{\sum (p_0 \times q_0)} \times 100 = \frac{166454}{142400} \times 100 = 116.89$$
This means that the number of units of shares held increased by roughly 16.89%
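The Laspeyres calculation for shares A, B and C can be reproduced in a few lines. This is a sketch for checking the table arithmetic; the function name and list layout are assumptions made for this example.

```python
def laspeyres_indices(p0, q0, p1, q1):
    """Return (price index, quantity index) using base-period weights."""
    base_value = sum(p * q for p, q in zip(p0, q0))  # sum of p0 * q0
    price_index = sum(p * q for p, q in zip(p1, q0)) / base_value * 100
    quantity_index = sum(p * q for p, q in zip(p0, q1)) / base_value * 100
    return price_index, quantity_index


if __name__ == "__main__":
    # Prices and quantities for shares A, B and C from the table above.
    p0, q0 = [70, 202, 1280], [362, 250, 52]
    p1, q1 = [120, 122, 1900], [305, 72, 102]
    price, quantity = laspeyres_indices(p0, q0, p1, q1)
    print(round(price, 2), round(quantity, 2))
```

Both indices share the same denominator (the base-period value), which is what makes the Laspeyres index a base-weighted measure.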
Paasche index
The Paasche index is an example of a weighted aggregate index which uses current time period weights. It is
useful when the relative importance of the items making up the basket of goods is continuously changing because
quantities differ from year to year. It is more accurate than the Laspeyres Index as it reflects what
the industry is actually using in the current year, and therefore takes account of both the price changes and the
quantity changes.
The formula for Paasche's Price Index is:
$$\frac{\sum (p_1 q_1)}{\sum (p_0 q_1)} \times 100$$
The formula for Paasche's Quantity Index is:
$$\frac{\sum (p_1 q_1)}{\sum (p_1 q_0)} \times 100$$
The table shows the 2005 and 2006 prices and volumes in millions of shares for Toyota,
VW, and BMW. Calculate the Paasche Index using 2005 as the base period
        Toyota              VW                  BMW
Year    Price    Quantity   Price    Quantity   Price    Quantity
2005    45.51    0.8        13.17    7          36.81    5.6
2006    61.41    0.2        7.51     10         30.72    6.1
Table 5.2. Table of prices and quantities in millions of shares for three different car brands
The calculation for Paasche's Price Index is:
$$\frac{\sum (p_1 q_1)}{\sum (p_0 q_1)} \times 100 = \frac{0.2(61.41) + 10(7.51) + 6.1(30.72)}{0.2(45.51) + 10(13.17) + 6.1(36.81)} \times 100 = \frac{274.774}{365.343} \times 100 = 75.2$$
2006 prices represent a 24.8% (100 - 75.2) decrease from 2005 (assuming quantities
were at 2006 levels for both periods).
The calculation for Paasche's Quantity Index is:
$$\frac{\sum (p_1 q_1)}{\sum (p_1 q_0)} \times 100 = \frac{0.2(61.41) + 10(7.51) + 6.1(30.72)}{61.41(0.8) + 7.51(7) + 30.72(5.6)} \times 100 = \frac{274.774}{273.73} \times 100 = 100.381$$
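The Paasche figures for Table 5.2 can be verified in the same way, this time weighting by the current period. The function name and data layout below are assumptions made for this sketch.

```python
def paasche_indices(p0, q0, p1, q1):
    """Return (price index, quantity index) using current-period weights."""
    current_value = sum(p * q for p, q in zip(p1, q1))  # sum of p1 * q1
    price_index = current_value / sum(p * q for p, q in zip(p0, q1)) * 100
    quantity_index = current_value / sum(p * q for p, q in zip(p1, q0)) * 100
    return price_index, quantity_index


if __name__ == "__main__":
    # Toyota, VW and BMW: 2005 is the base period, 2006 the current period.
    p0, q0 = [45.51, 13.17, 36.81], [0.8, 7, 5.6]
    p1, q1 = [61.41, 7.51, 30.72], [0.2, 10, 6.1]
    price, quantity = paasche_indices(p0, q0, p1, q1)
    print(round(price, 1), round(quantity, 3))
```

Here both indices share the same numerator (the current-period value), in contrast to the Laspeyres index, where the shared term is the base-period denominator.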
5.5 Summary
This unit covered price and quantity indices used to measure trends, and track changes between time points.
Indices are useful measures of the change in the activity of an item or a collection of items (known as a basket)
from one time period to another.
Revision questions
5.1. The following table represents the portfolio for shares. Use the Laspeyres price and quantity indices to
determine how the shares fared.
Share Base Year 1997
𝒑𝟎 𝒒𝟎 𝒑𝟏 𝒒𝟏
A 60 350 118 299
B 180 221 115 68
C 1113 49 1750 97
5.2. The following table represents the portfolio for shares. Use the Laspeyres price and quantity indices to
determine how the shares fared.
Share Base Year 2000
𝒑𝟎 𝒒𝟎 𝒑𝟏 𝒒𝟏
A 80 300 100 250
B 100 100 120 60
C 200 50 800 85
5.3. Given the following prices and quantities for copper and steel for the following period:
           Copper                      Steel
Period     Price    Quantity (Tons)    Price    Quantity (Tons)
Base       1000     200                130      8700
Current    1010     190                120      9000
Calculate Paasche’s Price and Quantity Index.
5.1.
$$\text{Laspeyres Price Index} = \frac{152465}{115317} \times 100 = 132.2138\%$$
This means that the value of the shares increased by roughly 32.21%
$$\text{Laspeyres Quantity Index} = \frac{138141}{115317} \times 100 = 119.7924\%$$
This means that the number of units of shares held increased by roughly 19.79%
5.2.
$$\text{Laspeyres Price Index} = \frac{82000}{44000} \times 100 = 186.363\%$$
This means that the value of the shares increased by roughly 86.363%
$$\text{Laspeyres Quantity Index} = \frac{43000}{44000} \times 100 = 97.727\%$$
This means that the number of units of shares held decreased by roughly 2.273%
5.3.
Paasche's Price Index = 93.5%
Paasche's Quantity Index = 102.079%
Unit 6: Linear Correlation and Regression
6.1 Introduction to Correlation and Regression    Introduce topic areas for the unit
Correlation - allows us to gauge the strength and direction of a relationship
Regression - observes the spread of scores to create a mathematical summary of what we think the relationship
between the two variables might be. We can use this mathematical relationship to make predictions
Recommended Reading:
6.2 Regression
Regression presents a refined way of analysing scatterplots, and observes the overall shape of plotted points. It
observes the spread of scores, and creates a best fitting line that can be drawn through the points on a
scatterplot. When the lines that fit the data best are straight – we refer to it as linear regression. When the best fit
line is curved – it is non-linear. A regression equation is essentially a mathematical summary of what we think the
relationship between the two variables might be. We can use this mathematical relationship to make predictions,
though not without some danger of making a mistake
In order to fit a straight line to the data, 2 pieces of information are required:
– Slope
– Intercept: the point on the graph where the line crosses the y-axis
The straight line formula is:
$$y = a + bx$$
y represents the criterion variable (the variable being predicted)
x represents the predictor variable
a and b represent the two pieces of information required to fit the line (i.e. b is the slope, and a is the
intercept). They are known as the regression coefficients.
When calculating regression coefficients:
$$s_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{n-1}$$
Where:
n: the number of pairs of values
Σx: the sum of the x values
Σy: the sum of the y values
Σx²: the sum of the squares of the x values
Σxy: the sum of x multiplied by y values
These intermediate values are substituted into the following equation to find the covariance, 𝑠𝑥𝑦 , and following
this, the slope, b:
$$b = \frac{s_{xy}}{s_x^2}$$
OR
$$b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}$$
Practical Application
$$s_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{n-1}$$
$$s_{xy} = \frac{146229 - \frac{1253 \times 4518}{40}}{40 - 1}$$
$$s_{xy} = 120.58$$
b = 1.528
Step 5: Calculate a
$$a = \frac{\sum y - b \sum x}{n} = \frac{4518 - 1.53(1253)}{40}$$
a = 65.072
Equation: y = 65.07 + 1.53x
How it works:
So for the scores 37, 21, and 48, the predicted values would be calculated as follows:
Predicted score = 65.07 + 1.53 x 37 = 121.68 ~ 122
Predicted score = 65.07 + 1.53 x 21 = 97.2 ~ 97
Predicted score = 65.07 + 1.53 x 48 = 138.51 ~ 139
Whilst the regression line is a useful statement of the underlying trend, it tells us nothing about the
strength of the relationship. Correlation is a measure of the strength of linear association between two variables.
6.3. Correlation
Correlations on the other hand allow us to gauge the strength and direction of a relationship. Correlations are
calculated on the basis of how far the points lie from the ‘best-fit’ regression line. Correlations are measured
using the correlation coefficient, and symbolised by the small letter r.
r will fall within the range –1 to +1:
-1 means a perfect negative correlation (a perfect inverse relationship, where, as the value of x rises, so
the value of y falls)
+1 means a perfect positive correlation (where the values of x and y rise or fall together)
An r of 0 means zero correlation, which means that there is no relationship between x and y
OR
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{(n \sum x^2 - (\sum x)^2)(n \sum y^2 - (\sum y)^2)}}$$
Activity 6.1
2. Draw the table, and calculate 𝑠𝑥𝑦 and state the linear regression equation
6.4 Summary
This unit provided an overview of how we analyse bivariate, or paired data using correlation and regression.
Whereas regression can help to make predictions, correlations allow us to observe the relative strength and
direction of the relationship between variables. These measures are useful in order to determine the relationship
between variables, and how strong the relationship is. Just remember, though, that a relationship between
two variables does not imply causality.
Answers to activities
[Figure: scatterplot of y (vertical axis, 0 to 450) against x (horizontal axis, 0 to 12)]
x y xy x² y²
5.77 385 2221.45 33.2929 148225
6.55 321 2102.55 42.9025 103041
9.9 265 2623.5 98.01 70225
7.21 256 1845.76 51.9841 65536
6.37 287 1828.19 40.5769 82369
6.51 309 2011.59 42.3801 95481
5.77 370 2134.9 33.2929 136900
48.08 2193 14767.94 342.4394 701777
$$s_{xy} = \frac{\sum xy - \frac{\sum x \sum y}{n}}{n-1}$$
$$s_{xy} = \frac{14767.94 - \frac{(48.08)(2193)}{7}}{7 - 1}$$
$$s_{xy} = -49.14$$
Step 3: Calculate b
$$b = \frac{7(14767.94) - (48.08)(2193)}{7(342.4394) - (48.08)^2}$$
$$b = \frac{-2063.86}{85.3894}$$
b = -24.169
Step 4: Calculate a
$$a = \frac{2193 - (-24.169)(48.08)}{7}$$
a = 479.29
y = a + bx
y = 479.29 - 24.169x
Calculate r
3.
$$r = \frac{n(\sum xy) - (\sum x)(\sum y)}{\sqrt{(n \sum x^2 - (\sum x)^2)(n \sum y^2 - (\sum y)^2)}}$$
$$r = \frac{7(14767.94) - (48.08)(2193)}{\sqrt{(7(342.44) - (48.08)^2)(7(701777) - (2193)^2)}}$$
r = −0.695
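The covariance, slope, intercept and correlation coefficient in this answer can be recomputed directly from the seven (x, y) pairs in the table. The sketch below follows the computational formulas used in this unit; the function name and return order are assumptions made for this example.

```python
from math import sqrt


def fit_line_and_correlation(xs, ys):
    """Return (s_xy, a, b, r) for paired data using the sums-based formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    s_xy = (sum_xy - sum_x * sum_y / n) / (n - 1)  # covariance
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)  # slope
    a = (sum_y - b * sum_x) / n  # intercept
    r = (n * sum_xy - sum_x * sum_y) / sqrt(
        (n * sum_x2 - sum_x**2) * (n * sum_y2 - sum_y**2)
    )
    return s_xy, a, b, r


if __name__ == "__main__":
    xs = [5.77, 6.55, 9.9, 7.21, 6.37, 6.51, 5.77]
    ys = [385, 321, 265, 256, 287, 309, 370]
    s_xy, a, b, r = fit_line_and_correlation(xs, ys)
    print(round(s_xy, 2), round(a, 2), round(b, 3), round(r, 3))
```

Small differences in the last decimal place compared with the hand calculation are to be expected, because the hand calculation rounds intermediate values.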
Revision Questions
1. You are interested in investigating the relationship between hours spent studying, and test performance. You
collect data from 5 of your fellow students, and it looks as follows:
2. An estate agent is interested in determining the relationship between the average distance a house (in KM’s)
is from the Durban CBD, and the average rent paid (in thousands of rands). He samples houses from five
different sub-areas throughout Durban’s surrounds.
1.2.
         x     y     xy     x²     y²
         8     65    520    64     4225
         2     30    60     4      900
         4     45    180    16     2025
         12    70    840    144    4900
         9     67    603    81     4489
Total    35    277   2203   309    16539
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$$
$$r = \frac{(5 \times 2203) - (35)(277)}{\sqrt{(5 \times 309 - 35^2)(5 \times 16539 - 277^2)}} = 0.955$$
2.1.
         x     y     xy     x²     y²
         20    12    240    400    144
         16    10    160    256    100
         30    15    450    900    225
         25    16    400    625    256
         34    22    748    1156   484
Total    125   75    1998   3337   1209
2.2.
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$$
$$r = \frac{(5 \times 1998) - (125)(75)}{\sqrt{(5 \times 3337 - 125^2)(5 \times 1209 - 75^2)}} = 0.9217$$
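Both correlation answers can be reproduced from the raw x and y columns. The following is a minimal sketch in Python, assuming the two answer tables above are the full datasets; `pearson_r` is a hypothetical helper name, not something the guide defines.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson's r from the computational formula used in the answers above."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# x and y columns from the answer table for revision question 1
x1 = [8, 2, 4, 12, 9]
y1 = [65, 30, 45, 70, 67]

# x and y columns from the answer table for revision question 2
x2 = [20, 16, 30, 25, 34]
y2 = [12, 10, 15, 16, 22]

print(f"Question 1: r = {pearson_r(x1, y1):.3f}")
print(f"Question 2: r = {pearson_r(x2, y2):.4f}")
```

Working from the raw columns rather than the totals row also gives a quick check that the table sums were added up correctly.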
3. Correlation and regression both deal with bivariate data and the relationship between the two variables. The
difference is that regression uses the relationship to make predictions, whereas correlation allows
you to observe the strength and direction of the relationship.
Unit 7: Time Series Forecasting
7.1 Introduction to time series and forecasting - Introduce topic areas for the unit
7.2 Decomposition and Smoothing - Determine the need for, and uses of, time series forecasts
Exponential Smoothing - allows us to calculate a “smoothed average” which consists of two parts:
– The most recent demand (new information) and
– The historical smoothed average (old information)
Moving average - removes the short-term fluctuations in a time series by taking successive averages of groups
of observations
Seasonal Analysis - deseasonalising data by removing seasonal fluctuations or patterns in the data in order to
make predictions about potential future values
Recommended Reading:
Forecast (Model: VAR00002-Model_1)
Period     Forecast   UCL      LCL
Q1 2016    50.37      51.92    48.83
Q2 2016    50.76      52.63    48.89
Q3 2016    51.14      53.43    48.86
Q4 2016    51.53      54.32    48.74
Q1 2017    51.91      55.28    48.55
Q2 2017    52.30      56.29    48.31
Q3 2017    52.68      57.35    48.02
Q4 2017    53.07      58.46    47.68
Q1 2018    53.45      59.61    47.30
For each model, forecasts start after the last non-missing in the range of the requested estimation period, and end at the
last period for which non-missing values of all the predictors are available or at the end date of the requested forecast
period, whichever is earlier.
Table 7.1. Table illustrating the forecasted figures
As you can see from the above graphical model utilising the Holt Model, and the associated forecast table, SPSS
used the existing data collected every quarter from 2004 – 2015 in order to make forecasts into the first quarter
of 2018. It also produced the upper and lower confidence limits (UCL and LCL), but we are most interested in the
actual forecasted figures. Using those forecasts, we are then able to make decisions regarding expected growth, resource
requirements to accommodate the growth, and so forth. This is a typical example of a time series. Time series
analysis assumes that the actual values of a random variable in a time series are influenced by a variety of
environmental forces operating over time. As such, it attempts to isolate and quantify the influence of these
forces on the time series as a number of different components. Four underlying forces
individually and collectively determine the random variable's value:
– Trend (T)
– Cyclical Variations (C)
– Seasonal Variations (S)
– Random (irregular) variation (R)
Each of these accounts for a type of variation that causes a fluctuation in the data.
7.1.1 Trend
Trend is denoted T, and is defined as a long-term smooth underlying movement in a time series. It describes the
effect that long-term factors have on the series. These long-term factors tend to operate fairly gradually and in
one direction for a long period of time, typically longer than a year. Trends may be linear or non-linear.
The four components of the time series (the trend, seasonal, cyclical and random variations) combine in
different ways. Using time series analysis, we try to isolate the influence of each of the four components in the
series.
There are two models for doing so: the additive and the multiplicative models.
Additive:
Y = T + S + C + R
In additive models, the seasonal, cyclical and random variations are absolute deviations from the trend, and do
not depend on the level of the trend.
Multiplicative:
Y = T × S × C × R
In multiplicative models, the seasonal, cyclical and random variations are relative deviations from the trend; thus,
the higher the trend, the larger the variations.
To illustrate:
Figure 7.5. Illustration of the additive model
Figure 7.6. Illustration of the multiplicative model
Both of these time series have a general upward trend, but the fluctuations in the additive model have
roughly the same size throughout, whereas the fluctuations in the multiplicative model become increasingly
large as the trend rises.
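The contrast between the two models can also be seen numerically. The sketch below uses Python with invented numbers (the trend, the seasonal deviations and the seasonal factors are all hypothetical), and it leaves out the cyclical and random components so that only T and S are combined.

```python
# Hypothetical components, chosen only for illustration.
trend = [100 + 10 * t for t in range(12)]        # steadily rising trend T
seasonal_abs = [15, -5, -20, 10] * 3             # absolute deviations (additive S)
seasonal_rel = [1.15, 0.95, 0.80, 1.10] * 3      # relative factors (multiplicative S)

# Cyclical and random components are omitted here for simplicity.
additive = [t + s for t, s in zip(trend, seasonal_abs)]         # Y = T + S
multiplicative = [t * s for t, s in zip(trend, seasonal_rel)]   # Y = T x S

# Size of the deviation from the trend in each period
add_dev = [abs(y - t) for y, t in zip(additive, trend)]
mul_dev = [abs(y - t) for y, t in zip(multiplicative, trend)]

print("Additive deviations:      ", add_dev)
print("Multiplicative deviations:", [round(d, 1) for d in mul_dev])
```

In the additive series the first-quarter deviation is the same in every year, while in the multiplicative series it grows as the trend rises.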
There are generally two types of methods used to identify the underlying pattern, namely smoothing and
decomposition.
Practical Application
Let’s say we sold 30 fidget spinners during the month of September. We want to estimate
what our sales will be for October. Our best guess might be that we will sell 30 fidget spinners
during October. In effect, we have used a “one month moving average” as our forecast.
When we want to forecast for November, we may want to take into account what happened
during September and October. Let’s say we had sales of 40 during October. If we took a
two-month moving average, our forecast for November would be:
$$FC_{Nov} = \frac{Actual_{Sept} + Actual_{Oct}}{2}$$
$$FC_{Nov} = \frac{30 + 40}{2}$$
$$FC_{Nov} = 35$$
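If we sold 30 in November, we will have three types of moving averages available to us when
we try to make the next prediction:
Option 1: 1 month moving average
$$FC_{Dec} = Actual_{Nov} = 30$$
Option 2: 2 month moving average
$$FC_{Dec} = \frac{Actual_{Oct} + Actual_{Nov}}{2} = \frac{40 + 30}{2} = 35$$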
Option 3: 3 month moving average
$$FC_{Dec} = \frac{Actual_{Sept} + Actual_{Oct} + Actual_{Nov}}{3}$$
$$FC_{Dec} = \frac{30 + 40 + 30}{3}$$
$$FC_{Dec} = 33.3$$
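The December forecasts can be reproduced with a few lines of code. This is a minimal sketch in Python; the function name `moving_average_forecast` is hypothetical and simply averages the most recent actual values.

```python
def moving_average_forecast(actuals, window):
    """Forecast the next period as the mean of the last `window` actual values."""
    if window < 1 or window > len(actuals):
        raise ValueError("window must be between 1 and the number of observations")
    recent = actuals[-window:]
    return sum(recent) / window


sales = [30, 40, 30]  # actual sales for September, October and November

for window in (1, 2, 3):
    forecast = moving_average_forecast(sales, window)
    print(f"{window}-month moving average forecast for December: {forecast:.1f}")
```

The same function gives the November forecast of 35 when it is called with only the September and October figures and a window of 2.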
Activity 7.1.
7.1.1. You are trying to estimate the number of earphones using the moving averages
method. From the following amounts, draw a table to estimate the figures using 1, 3 and 5
month moving averages, then draw and comment on the graph:
Months Earphones
1 30
2 42
3 50
4 43
5 42
6 48
7 52
8 36
9 40
10 37
11 40
12 53
13 52
14 51
15 55
16 58
17 52
18 53
19 59
20 61
The moving average technique has the advantage that it is simple to use and easy to understand. Two of its
major disadvantages, however, are:
$$F_{t+1} = F_t + a(D - F_t)$$
Where:
$F_{t+1}$ = Forecast for the next period (t + 1)
$F_t$ = Forecast for the latest period (t)
$a$ = Smoothing coefficient
$D$ = Actual demand for period t
The objective is to select a suitable value for the smoothing coefficient. The error measure used to test a
candidate value is the Mean Absolute Deviation (MAD), calculated as:
$$MAD = \frac{\sum_{t=1}^{n} |D - F|}{n}$$
Where:
$|D - F|$ = absolute value of the error
$n$ = number of periods reviewed
Practical Application
Let’s say you estimated that you would sell 100 fidget spinners, but actually only sold 90.
You chose the smoothing coefficient to equal 0.2.
$F_t$ = 100
$a$ = 0.2
$D$ = 90
$$F_{t+1} = F_t + a(D - F_t)$$
$$F_{t+1} = 100 + 0.2(90 - 100) = 98$$
$$MAD = \frac{|90 - 100|}{1}$$
$$MAD = 10$$
The best way in which to forecast using different coefficients is to create a table:
Given the following dataset:
Month Actual Demand
1 22
2 18
3 23
4 21
5 17
6 24
7 20
8 19
9 18
10 21
Step 1: Create a first forecast – we usually do this using the first known measure
Step 2: For each later period, take the previous forecast and add alpha × (the previous actual demand minus
the previous forecast): $F_{t+1} = F_t + a(D - F_t)$
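The two steps can be sketched in code as follows. This is an illustration in Python, not the guide's own worksheet: the function names are hypothetical, the first forecast is assumed to equal the first actual demand (one reading of "the first known measure"), and the alpha values 0.1 and 0.3 are chosen only for comparison.

```python
def next_forecast(prev_forecast, actual_demand, alpha):
    """One smoothing step: F(t+1) = F(t) + alpha * (D - F(t))."""
    return prev_forecast + alpha * (actual_demand - prev_forecast)


def smoothed_forecasts(demand, alpha):
    """Return forecasts for periods 1..n+1, seeding F(1) with the first actual demand."""
    forecasts = [demand[0]]  # Step 1 (assumed seed: first known measure)
    for actual in demand:
        forecasts.append(next_forecast(forecasts[-1], actual, alpha))  # Step 2
    return forecasts


def mean_absolute_deviation(demand, forecasts):
    """Average of |D - F| over the periods that have both an actual and a forecast."""
    # Period 1 contributes an error of 0 because of the seeding assumption.
    errors = [abs(d - f) for d, f in zip(demand, forecasts)]
    return sum(errors) / len(errors)


demand = [22, 18, 23, 21, 17, 24, 20, 19, 18, 21]  # months 1 to 10

for alpha in (0.1, 0.3):
    forecasts = smoothed_forecasts(demand, alpha)
    mad = mean_absolute_deviation(demand, forecasts)
    print(f"alpha = {alpha}: forecast for month 11 = {forecasts[-1]:.2f}, MAD = {mad:.2f}")
```

Running the loop over several candidate coefficients and keeping the one with the smallest MAD is one simple way to choose alpha.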
Activity 7.2.
Using exponential smoothing, calculate the forecasts with alpha = 0.2 for the following values,
and include the forecast for the 11th month:
So far, we have covered two ways in which we can separate the various elements (trend, cyclical,
seasonal and random) in data in order to discover underlying trends. Seasonal indices are one way in which
we can deseasonalise data by removing seasonal fluctuations or patterns in the data in order to make
predictions about potential future values. We develop a seasonal index as the ratio of the demand for a particular
season to the demand for an average season.
If demand is for 100 units in an average season and demand for the summer season is 80, the summer season
index is 80 / 100 = 0.8.
There are four seasons in a year. Thus, the mean seasonal demand is calculated using a four-period moving
average, and is centred on the middle of a given season, that is a month and a half into the season. It includes
demands going back six months and forward six months from that point.
Practical Application
You own a beach store in Scottburgh, and you sell mainly beachwear, especially costumes.
You are interested in estimating how many costumes you can expect to sell going forward into
2019, and look back on sales data since the inception of your store in 2015.
Step 1: Create a mean seasonal demand by adding ½ of the first season + the middle 3 seasons +
½ of the last season, and dividing the total by 4. Your first mean will fall in the middle season.
Calculate as follows:
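As a rough sketch of Step 1 in Python, a centred four-season mean gives half weight to the two outer seasons and full weight to the three middle ones, then divides by four. The demand figures below are invented purely for illustration, and `centred_seasonal_means` is a hypothetical name.

```python
def centred_seasonal_means(demand):
    """Centred four-season moving average.

    For season i, the seasons two steps away get a weight of 1/2, the three
    middle seasons get a weight of 1, and the total is divided by 4. The
    first two and last two seasons have no centred mean, so None is used.
    """
    means = [None] * len(demand)
    for i in range(2, len(demand) - 2):
        total = (0.5 * demand[i - 2] + demand[i - 1] + demand[i]
                 + demand[i + 1] + 0.5 * demand[i + 2])
        means[i] = total / 4
    return means


# Hypothetical seasonal demand (two years, four seasons each), for illustration only.
demand = [80, 120, 60, 40, 90, 130, 70, 50]
means = centred_seasonal_means(demand)

for season, (actual, mean) in enumerate(zip(demand, means), start=1):
    print(f"Season {season}: actual = {actual}, mean seasonal demand = {mean}")
```

The half weights are what centre the mean on the middle season: together the five terms still cover exactly four seasons of demand.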
Step 2: Create a seasonal index by dividing the actual demand by the mean seasonal
demand:
You do this by adding together the index scores for each season, and dividing by the total
number of seasons included in that period. In this case, it was 3. So you average the index
score to generate an index for the year you wish to forecast to. The resultant table looked as
follows:
Step 4: Based on the mean of previous years’ demands, you estimate that you are going to
sell roughly 356 costumes in 2019, an average of 89.94 costumes a season = 90.
Then you multiply this average number of costumes per season by each season’s
index number to give it a weighting.
Activity 7.3.
You are the owner of a painting company. Painting is best done when conditions are dry, and
thus favours Autumn/Winter seasons. You have owned the business since 2015, and wish to
make projections to 2019 for paint sales. Using a seasonal analysis, calculate the forecasted
seasonal paint estimates for 2019.
When we are dealing with quarters, rather than actual seasons, it is not imperative that we deal with half seasons
as per the examples above. If we plot the data, and it is apparent that there is a pattern in the data that recurs
in each quarter every year, seasonality is present. For example:
You can see that the data varies with the same/similar patterns annually. In other words, it dips every fourth
quarter, and peaks every second quarter across each year from year 1 to year 3. In these instances, you do not
have to find the mean seasonal demand, but rather the mean of each quarter, which simply involves averaging
quarters 1 – 4.
Practical Application
[Figure: chart of Actual Demand; vertical axis 0 to 100]
You can see a general pattern: the data peaks around quarter 4, and dips around quarter 3.
There is a similar pattern across the four quarters in years 1-3, so the data is seasonal.
Year Average
2015 68.5
2016 73.5
2017 76.5
Step 2: Work out the proportion or index for each quarter by dividing the actual demand by the
average/mean demand for that year.
This yields an index for each quarter for each year:
Year Quarter Actual Demand Seasonal Index
Year 1 Quarter 1 72 1.051094891
Year 1 Quarter 2 64 0.934306569
Year 1 Quarter 3 63 0.919708029
Year 1 Quarter 4 75 1.094890511
Year 2 Quarter 1 75 1.020408163
Year 2 Quarter 2 66 0.897959184
Year 2 Quarter 3 64 0.870748299
Year 2 Quarter 4 89 1.210884354
Year 3 Quarter 1 76 0.993464052
Year 3 Quarter 2 68 0.888888889
Year 3 Quarter 3 67 0.875816993
Year 3 Quarter 4 95 1.241830065
You can see that any index over 1 indicates that demand in that quarter was above the average for that year.
Calculate the annual proportion for each year.
Your seasonal indices will always add up to the number of time periods i.e. quarters = 4,
Months = 12 etc.
For example – if we were to add the seasonal indices for the above they tally to four.
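The index table and the "tally to four" check can be reproduced directly from the actual demand column. This is a small sketch in Python; the dictionary layout is simply a convenient assumption.

```python
# Actual demand per quarter, taken from the table above.
demand = {
    "Year 1": [72, 64, 63, 75],
    "Year 2": [75, 66, 64, 89],
    "Year 3": [76, 68, 67, 95],
}

seasonal_index = {}
for year, quarters in demand.items():
    year_mean = sum(quarters) / len(quarters)                  # mean demand for the year
    seasonal_index[year] = [q / year_mean for q in quarters]   # actual / mean

for year, indices in seasonal_index.items():
    formatted = ", ".join(f"{i:.4f}" for i in indices)
    print(f"{year}: {formatted}  (sum = {sum(indices):.1f})")
```

Each year's four indices add up to 4 because each is the quarter's demand divided by one quarter of the year's total.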
Step 4: Calculate the deseasonalised values. This is done by dividing the actual value by the
seasonal index.
When you plot the seasonal data against the deseasonalised data, it looks as follows:
From the graph above it is evident that once you remove the seasonality, the line is a lot
smoother. When the data has the seasonality, there are rather large fluctuations, with high
peaks and troughs because of its seasonality. When you take the seasonality out of the data,
you get to see the overall trend. From the above, you can see a slight upward trend, noting a
dip in quarter 4, but only in the first year. This means that given that time of the year – sales
were relatively low.
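Step 5: Calculate the forecast for the first quarter of the fourth year, assuming the expected
annual sales will be 312 (an average of 312 / 4 = 78 per quarter). Multiply this quarterly
average by the average seasonal index for quarter 1:
78 × 1.0217
= 79.68914476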
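The remaining steps can be sketched in Python as below. Two points are assumptions rather than statements from the guide: the index used for deseasonalising and forecasting is taken to be each quarter's index averaged across the three years, and the quarterly average is taken to be 312 / 4 = 78, which is the value that reproduces the 79.689 figure.

```python
# Actual demand per quarter for years 1 to 3 (from the seasonal index table).
demand = [
    [72, 64, 63, 75],
    [75, 66, 64, 89],
    [76, 68, 67, 95],
]

# Index for each quarter in each year: actual demand / mean demand for that year
indices_by_year = [[q / (sum(year) / len(year)) for q in year] for year in demand]

# Assumed step: average each quarter's index across the three years
avg_index = [
    sum(year[q] for year in indices_by_year) / len(indices_by_year)
    for q in range(4)
]

# Step 4: deseasonalise by dividing each actual value by its quarter's index
deseasonalised = [
    [actual / avg_index[q] for q, actual in enumerate(year)]
    for year in demand
]

# Step 5: forecast quarter 1 of year 4 from the expected annual sales
expected_annual_sales = 312
quarterly_average = expected_annual_sales / 4
forecast_q1 = quarterly_average * avg_index[0]

print("Average seasonal indices:", [round(i, 4) for i in avg_index])
print(f"Forecast for quarter 1 of year 4: {forecast_q1:.2f}")
```

The other three quarters of year 4 follow in the same way by multiplying the quarterly average by `avg_index[1]`, `avg_index[2]` and `avg_index[3]`.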
Activity 7.4.
7.4 Summary
This final unit explored the various ways in which we can analyse temporal data, often to make predictions based
on past values. It demonstrated the various ways in which we can decompose data in order to discover
underlying trends. This included seasonal decomposition for seasonal data. Time series forecasting presents a
very powerful way of analysing and using historical data in order to plan for the future.
Answers to activities
7.1.1.
[Figure: graph over months 1 to 20; vertical axis 0 to 70]
7.1.2. The problem with moving averages is that they will always lag. The first estimate is only available
at month 2 for the 1-month, month 4 for the 3-month and month 6 for the 5-month moving
average. This means that there will be no forecasts available for months 1 – 5 for the 5-month moving
average and, similarly, none for months 1 – 3 for the 3-month moving average.
The more months included in the estimate, the smoother the curve.
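Both points, the lag and the smoothing, can be checked against the earphone data from Activity 7.1. The sketch below is in Python; the function names are hypothetical, and the average month-to-month change in the forecasts is used here as one simple, assumed way of measuring smoothness.

```python
# Earphone figures for months 1 to 20 (from Activity 7.1).
earphones = [30, 42, 50, 43, 42, 48, 52, 36, 40, 37,
             40, 53, 52, 51, 55, 58, 52, 53, 59, 61]


def moving_average_forecasts(actuals, window):
    """Forecast for each month = mean of the `window` months before it (None if too few)."""
    forecasts = []
    for t in range(len(actuals)):
        if t < window:
            forecasts.append(None)  # not enough history yet
        else:
            forecasts.append(sum(actuals[t - window:t]) / window)
    return forecasts


def mean_absolute_change(values):
    """Average size of the change between consecutive values."""
    changes = [abs(later - earlier) for earlier, later in zip(values, values[1:])]
    return sum(changes) / len(changes)


for window in (1, 3, 5):
    forecasts = moving_average_forecasts(earphones, window)
    available = [f for f in forecasts if f is not None]
    first_month = len(forecasts) - len(available) + 1
    print(f"{window}-month: first forecast in month {first_month}, "
          f"average month-to-month change = {mean_absolute_change(available):.2f}")
```

A smaller average change between consecutive forecasts corresponds to a flatter, smoother line on the graph.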
7.2.1.
7.3.
Season         Estimated projection   Rounded up
Spring 2019    114.0569               115
Summer 2019    127.7186               128
Autumn 2019    263.855                264
Winter 2019    304.3711               305
Total                                 812
7.4.1. You can see a general pattern: the data peaks around quarter 2, and dips around quarter 4. There is a
similar pattern across the four quarters in years 1-3, so the data is seasonal.
Year Year 1 Year 2 Year 3
Q1 145 140 145
Q2 185 190 188
Q3 132 135 130
Q4 94 90 95
TOTAL 556 555 558
AVG 139 138.75 139.5
Revision Questions
1. You are a distributor of pop-grips, and wish to determine how many pop-grips you will need to order from
China. You decide to use the moving average method to estimate the numbers using a 1 month, 3
month and 5 month moving average.
Months Pop-Grips
1 30
2 42
3 50
4 43
5 42
6 48
7 52
8 36
9 40
10 37
11 40
12 53
13 52
14 51
15 55
1.1.
1.2.
Trend (T)
Trend is defined as a long-term smooth underlying movement in a time series. It describes the effect that long-term
factors have on the series. These long-term factors tend to operate fairly gradually and in one direction for a long
period of time.
Cyclical Variations (C)
The most common form of cycle is the business cycle, which moves between periods of relatively good and
relatively poor economic activity. The causes of these cycles are difficult to determine. Actions by government, trade
unions and world organisations induce levels of pessimism and optimism in the economy, which are reflected in
changes in the time series levels. Index numbers are used to describe cyclical fluctuations.
Random (irregular) variation (R)
These variations follow no specific pattern and cannot be analysed statistically, and thus cannot be incorporated
into forecasts.
References
Brownlee, J. (2016). What is Time Series Forecasting? Accessed September 26, 2018, from:
https://machinelearningmastery.com/time-series-forecasting/
Durrheim, K., and Tredoux, C. (2002). Numbers, Hypotheses & Conclusions: A Course in Statistics for the
Social Sciences. Cape Town: UCT Press.
Durrheim, K., and Tredoux, C. (2012). Numbers, Hypotheses & Conclusions: A Course in Statistics for the
Social Sciences. (2nd ed.). Cape Town: UCT Press.
GAO. (1992). Quantitative Data Analysis: An Introduction. Accessed February 4th, 2018, from:
http://archive.gao.gov/t2pbat6/146957.pdf
Groebner, D.F., Shannon, P.W., Fry, P.C., and Smith, K.D. (2011). Business Statistics: A Decision-Making
Approach. 8th ed. Boston: Prentice Hall.
Wegner, T. (2012). Applied Business Statistics: Methods and Excel-Based Applications. (3rd ed.). Cape Town: Juta
& Company Limited.
Wegner, T. (2015). Applied Business Statistics. (4th ed.). Cape Town: Juta.
Weiers, R. M. (2011). Introduction to Business Statistics. 7th ed. South Western, Cengage Learning. Chapter 18
pages 688-715.
Yaffee, R.A. & McGee, M. Introduction to Time Series Analysis and Forecasting:
https://core.ac.uk/download/pdf/44191640.pdf
APPENDICES
APPENDIX 1 – z-table