BACHELOR OF COMMERCE
(GENERIC)
QUANTITATIVE TECHNIQUES
STUDY GUIDE
2023
COPYRIGHT © EDUCOR, 2023
All rights reserved. No part of this publication may be reproduced, distributed, or transmitted
in any form or by any means, including photocopying, recording, or other electronic or
mechanical
methods, without the prior written permission of Educor Holdings. Individuals found guilty
of copyright infringement will be prosecuted and will be held liable for damages.
Quantitative Techniques Damelin ©
TABLE OF CONTENTS
1. ABOUT THE BRAND
Damelin knows that you have dreams and ambitions. You’re thinking about the future, and
how the next chapter of your life is going to play out. Living the career you’ve always dreamed
of takes some planning and a little bit of elbow grease, but the good news is that Damelin will
be there with you every step of the way.
We’ve been helping young people to turn their dreams into reality for over 70 years, so rest
assured, you have our support.
As South Africa’s premier education institution, we’re dedicated to giving you the education
experience you need and have proven our commitment in this regard with a legacy of
academic excellence that’s produced over 500 000 world-class graduates! Damelin alumni
are redefining industry in fields ranging from Media to Accounting and Business, from
Community Service to Sound Engineering. We invite you to join this storied legacy and write
your own chapter in Damelin’s history of excellence in achievement.
A Higher Education and Training (HET) qualification provides you with the necessary step in
the right direction towards excellence in education and professional development.
• Taking multiculturality into account in a responsible manner that seeks to foster an
appreciation of diversity, build mutual respect and promote cross-cultural learning
experiences that encourage students to display insight into and appreciation of
differences.
Icons
The icons below act as markers that will help you make your way through the study guide.
Additional information
Find the recommended information listed.
Case study/Caselet
Apply what you have learnt to the case study presented.
Example
Examples of how to perform a calculation or activity with the
solution / appropriate response.
Practice
Practice the skills you have learned.
Reading
Read the section(s) of the prescribed text listed.
Revision questions
Complete the compulsory revision questions at the end of each
unit.
Self-check activity
Check your progress by completing the self-check activity.
Think point
Reflect, analyse and discuss, journal or blog about the idea(s).
Video / audio
Access and watch/listen to the video/audio clip listed.
Vocabulary
Learn and apply these terms.
Welcome to Quantitative Techniques. This course is Business Statistics in nature and is part
of every management education programme offered today by academic institutions and
business schools. Statistics provides evidence-based information, which makes it an important
decision support tool in management.
Although students are encouraged to use this guide, it must be used in conjunction with other
prescribed and recommended texts.
3.1 Module Information
Qualification title Bachelor of Commerce (Generic)
Module Title Quantitative Techniques
NQF Level 6
Credits 10
Notional hours 100
3.3 Outcomes
At the end of this module, students should be able to:
• Describe the role of Statistics in management decision making and the importance of
data in statistical analysis.
• Describe the meaning of and be able to calculate the mean, confidence intervals for
the mean, standard deviation, standard error, median, interquartile range, and mode.
• Summarise tables (pivot tables) and graphs providing a broad overview of the profile
of random variables, identifying the location, spread, and shape of the data.
• Describe and make use of probability distributions that occur most often in
management situations and that describe patterns of outcomes for both discrete and
continuous events.
• Review the different methods of sampling and the concept of the sampling distribution.
• Describe the concept of interval estimation
• Describe hypothesis testing and construct the null and an alternate hypothesis.
• Test the difference between correlated and uncorrelated sample means using a t-test
for two means and analysis of variance of several means.
• Express the relationship between two variables by regression and calculating their
correlation.
• Analyse the dependence of one variable upon another by regression.
• Solve well-defined but unfamiliar problems using correct procedures and appropriate
evidence.
• Describe the time series analysis using a statistical approach to quantify the factors
that influence and shape time series data and apply it to making forecasts of future
levels of activity of the time series variables.
3.4 Assessment
You will be required to complete both formative and summative assessment activities.
Formative assessment:
These are activities you will do as you make your way through the course. They are designed
to help you learn about the concepts, theories and models in this module. This could be
through case studies, practice activities, self-check activities, study group / online forum
discussions and think points.
You may also be asked to blog / post your responses online.
Summative assessment:
You are required to write an exam at the end of the semester. For online students, the tests
are made up of the revision questions at the end of each unit. A minimum of five revision
questions will be selected to contribute towards your test mark.
Mark allocation
The marks are derived as follows for this module:
Test 20%
MCQ’s 10%
Assignments 10%
Exam 60%
TOTAL 100%
3.5 Planning Your Studies
You will have registered for one or more modules in the qualification and it is important that
you plan your time. To do this look at the modules and credits and units in each module.
Create a timetable / diagram that will allow you to get through the course content, complete
the activities, and prepare for your tests, assignments and exams. Use the information
provided above (How long will it take me?) to do this.
4. PRESCRIBED READING
• www.statisticssa.gov.za
Video / Audio
• https://www.youtube.com/watch?v=8SHnJfPQ9qc
5. MODULE CONTENT
You are now ready to start your module! The following diagram indicates the topics that will
be covered. These topics will guide you in achieving the outcomes and the purpose of this
module. Please make sure you complete the assessments as they are specifically designed
to support you in your learning.
Unit 7: CONFIDENCE INTERVAL ESTIMATION
Unit 8: HYPOTHESES TESTS – SINGLE POPULATION (PROPORTIONS & MEANS)
Unit 9: CHI-SQUARED HYPOTHESES TESTS
Unit 10: SIMPLE LINEAR REGRESSION AND CORRELATION ANALYSIS
Unit 11: TIME SERIES ANALYSIS: A FORECASTING TOOL
5.1 Study Unit 1: Statistics in Management

Purpose: The purpose of this unit is to introduce the common terms, notations and concepts in statistical analysis.
Time: It will take you 4 hours to make your way through this unit.
Important terms and definitions
• Information: organised (collected, collated, summarized, analysed and presented) data values that are meaningful and can be used to make business decisions
• Variable
5.1.1 Introduction
This unit focuses on describing what statistics is and the role it plays in management and
decision making. The importance of data in statistics is also discussed, and some basic
statistical terms and concepts are explained.
Information
In order to make sound and viable business decisions, managers need high-quality
information. Information must be relevant, adequate, timeous, accurate and easy to access.
Information is organised (collected, collated, summarized, analysed and presented) data
values that are meaningful and can be used to make business decisions. Most often the
information is not readily available in the formats required by the decision makers (Wegner,
2016).
Data
Data consist of individual values, for instance, observations or measurements on an issue,
e.g. R400.50, 5 days, 70 meters, strongly agree, etc. Data is readily available but carries
little useful and usable information for decision makers (Wegner, 2016).
Statistics
The use of statistics empowers managers with confidence and quantitative reasoning skills
that enhance decision-making capabilities and provide an advantage over colleagues who do
not possess them (Black, 2013).
Transformation process from data to information
INPUT → PROCESS → OUTPUT → BENEFIT
Video / audio
Watch the YouTube link below and write a brief essay on why
you think statistics is important.
https://www.youtube.com/watch?v=yxXsPc0bphQ
5.1.3 The terminology of Statistics
Some essential terms and concepts are:
Random Variable is any attribute or characteristic being measured or observed. It takes
different values at each point of measurement e.g. Years of experience of an employee.
Data, these are real values or outcomes drawn from a random variable e.g. Years of experience of
an employee might be (2, 4, 1, 3, 2, 6).
See below some examples of random variables and related data:
• Travel distances of delivery vehicles (data: 22 km, 18 km, and 29 km)
• Daily occupancy rates of hotels in Pretoria (data: 34%, 48%, and 34%)
• Duration a machine spends working (data: 13 min, 21 min, and 18 min)
• Brand of washing powder preferred (data: Sunlight, OMO, and Aerial).
Sampling Unit, this will be the item being measured, observed or counted with respect to the
random variable under study. e.g. employees.
Population represents every possible item that contains a data value (measurement or
observation) of the random variable under study. The sampling units should possess the
characteristics that are relevant to the problem. e.g. all employees of Damelin Randburg.
Population Parameter, the actual value of a random variable in a population. It’s derived
from all data values on the random variable in the population. It is constant. e.g. about 57%
of MTN employees have more than 5 years’ experience.
The sample is a subset of items drawn from a population. e.g. employees in the Finance
department of MTN.
Sample Statistic, It is a value of a random variable derived from sample data. It is NOT
constant as its value always depends on the values included in each sample drawn.
Table 1.1 Examples of population and associated samples

Random variable | Population | Sampling unit | Sample
Size of bank overdraft | All current accounts with Absa | An Absa client with a current account | 400 randomly selected Absa clients’ current accounts
Mode of daily commuter transport to work | All commuters to Cape Town’s central business district (CBD) | A commuter to Cape Town’s CBD | 600 randomly selected commuters to Cape Town’s CBD
TV programme preferences | All TV viewers in Gauteng | A TV viewer in Gauteng | 2000 randomly selected TV viewers in Gauteng
Measure | Sample statistic | Population parameter
Mean | x̄ | µ
Standard deviation | s | σ
Variance | s² | σ²
Size | n | N
Proportion | p | π
Correlation | r | ρ
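The contrast between population parameters (constant) and sample statistics (varying from sample to sample) can be illustrated with a short Python sketch using only the standard library. The employee data below is hypothetical:

```python
import statistics

# Hypothetical population: years of experience of all 10 employees in a department
population = [2, 4, 1, 3, 2, 6, 5, 3, 4, 2]

# A sample drawn from that population (the first 5 values, for illustration)
sample = population[:5]

# Population parameters: constant for a given population
mu = statistics.mean(population)       # population mean (mu)
sigma = statistics.pstdev(population)  # population standard deviation (divides by N)

# Sample statistics: their values depend on which sample was drawn
x_bar = statistics.mean(sample)        # sample mean (x-bar)
s = statistics.stdev(sample)           # sample standard deviation (divides by n - 1)

print(f"population: mu = {mu}, sigma = {sigma:.3f}")
print(f"sample:     x-bar = {x_bar}, s = {s:.3f}")
```

Drawing a different sample (say, the last five values) would change x̄ and s, while µ and σ stay fixed, which is exactly the distinction the table draws.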
▪ Human Resources
▪ Operation/Logistics
▪ Economics
5.1.6 Statistics and Computers
The invention of computers has opened many new opportunities for statistical analysis. A
computer allows for storage, retrieval, and transfer of large data sets. Some widely used
statistical techniques, such as multiple regression, are so tedious and cumbersome to compute
manually that they were of little practical use to researchers before computers were developed.
Some widely used statistical software packages include R, Minitab, SAS, and SPSS (Wegner, 2016).
Study group
The link below takes you through various statistical packages that
are used in industry. Identify three statistical packages that are of
interest to you and carry out a brief research writing what features
they entail. https://www.youtube.com/watch?v=hHywVkLwLzg
Nominal is the lowest level of data measurement, followed by ordinal, interval, and ratio. Ratio
is the highest level of data measurement.
Figure 1.2 Hierarchy of Levels of Data
The Internet
In the last 20 years, the Internet has become an almost limitless source of business and economic
data. With the help of powerful search engines, even the casual user has instant access to data
that once would have required hours, if not weeks, of painstaking research (Bergquist et al,
2013).
on topics of narrower interest, are also important sources of business and economic data
(Bergquist et al, 2013).
Original Studies
If the data you need doesn't already exist, you may have to design an original study. In
business, this often means setting up an experiment, conducting a survey, or leading a focus
group. To estimate the effect of price on sales of your company's new 3D camera, for
example, you might set up a three- month experiment in which different prices are charged
in each store or region in which your product is sold. At the end of the experiment, differences
in sales volume can be compared and analyzed. Surveys typically involve designing and
administering a questionnaire to gauge the opinions and attitudes of a specific target group.
Although this may sound like a simple data-gathering procedure, you'll need to take care. The
way in which questions are phrased, and the order by which questions are asked, can have
a significant impact on the answers you get (Bergquist et al, 2013).
Internal versus External Sources
▪ Internal data refers to data available from within an organization, for example financial,
production and human resources data.
▪ External data refers to data available from outside an organization, for example
employee associations, research institutions and government bodies.
▪ Direct observation: data is collected by directly observing the respondent or object in
action, e.g. a vehicle traffic survey.
▪ Desk research (abstraction): extracting secondary data from a variety of source
documents, e.g. books, publications, newspapers, etc.
▪ Survey Methods:
Primary data is gathered through the direct questioning of respondents.
Personal interviews: involve face-to-face contact with a respondent, during which a
questionnaire is completed.
Postal surveys: involve posting questionnaires to respondents for completion.
Reading
Read T. Wegner, 4th edition, Chapter 1, p 2 – 19 for a
comprehensive coverage of this unit on statistics in management.
5.1.11 Summary
In this unit, the importance of statistics has been explained as a support tool in management
decision making. The language of statistics will be key as it allows for interaction with various
statistical data analysts. Components of statistics were discussed as well as the different
statistical packages that may be used in data analysis. Data quality was discussed in terms of
the four factors that affect it, namely data type, data source, data collection method and the
process used to prepare it. The manner in which each data type influences the choice of an
appropriate statistical method will be addressed more fully in later units. As shown in this unit,
data sources, data-gathering methods, and data
preparation issues all have an influence on the quality of data, and hence the accuracy,
reliability, and validity of statistical findings.
Think point
Explain, in your opinion, whether the statistics presented in this report are
exact figures or estimates. Describe possibilities of how and where the
analysts could have gathered such data. Discuss which levels of data
measurement are represented by the data on rural India (Black, 2013).
5.1.12 Revision
Let’s see what you have learned so far by taking this short self-assessment.
Self-check activity
The following types of questions sometimes appear on a survey. To
determine how providers can better serve their clientele, hospital
administrators sometimes administer a quality satisfaction survey
to their patients after the patient is released. What level of data
measurement will result from these questions?
1. How long has it been since you were released from the hospital?
2. Which type of unit did you spend all of the time in during
your stay?
a) Coronary care
b) Intensive care
c) Maternity care
d) Medical unit
e) Pediatric/children unit
f) Surgical Unit
3. When choosing a hospital, how important is the hospital’s
location?
a) Very important  b) Somewhat important  c) Not very important
Purpose: The purpose of this unit is to explain how to summarise data into table format and how to display the results in an appropriate graph or chart.
Time: It will take you 9 hours to make your way through this unit.
Important terms and definitions
• Histogram: a graphic display of a numeric frequency distribution
• Pie chart: a circular display of data where the area of the whole circle represents 100% of the data and slices of the circle represent a percentage breakdown of the different sublevels
• Frequency polygon: a graphical display of class frequencies. However, instead of using rectangles like a histogram, in a frequency polygon each class frequency is plotted as a dot at the class midpoint, and the dots are connected by a series of line segments
• Ogive: a graph of cumulative class frequencies
5.2.1 Introduction
This study unit aims to give students an understanding of the common ways in which
statistical findings are conveyed. The most commonly used means of displaying statistical
results are summary tables and graphs. Summary tables can be used to summarise single
random variables as well as to examine the relationship between two random variables. The
choice of summary table and graphic depends on the type of data to be displayed. Managers
benefit from statistical findings when the information is easily interpreted and communicated
effectively to them. Tables and graphs convey information much more efficiently and quickly
than a written report; for graphs and tables there is much truth in the old adage ‘a picture is
worth a thousand words’.
In practice, analysts should, most if not all of the time, consider using summary tables and
graphical displays rather than written text. Profiles consisting of a single random variable
(e.g. most-preferred TV channel by viewers or pattern of delivery times), or examinations of
the relationship between two random variables (e.g. between gender and newspaper
readership), can easily be summarised by summary tables and graphs.
Brand | Count | %
Toyota | 3 | 20
Nissan | 2 | 13
Isuzu | 4 | 27
Total | 15 | 100
Electronics R211.89
Clothing R134.40
Misc. R93.72
Pie Chart
A circular display of data where the area of the whole circle represents 100% of the data and
slices of the circle represent a percentage breakdown of the different sublevels is known as
a pie chart. Pie charts are widely used in business, mostly for showing things such as budget,
market share, ethnic group and time/resource allocation categories. However, because pie
charts allow less accuracy than is possible with other types of graphs, their use is minimized
in the sciences and technology. In general, it is harder for the viewer to interpret the relative
sizes of angles in a pie chart than to judge the lengths of rectangles in a bar chart (Black, 2013).
Construction of a Pie chart
• Divide a circle into segments.
• Size of each segment should be proportional to the frequency count/ percentage of its
category.
• The sum of the segment frequency must equal to the whole.
Figure 2.3 Pie Chart for Table 2.1
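The construction steps above can be sketched numerically: each segment's angle is its category's share of the total, multiplied by 360 degrees. The brand counts below are illustrative:

```python
# Hypothetical category counts (e.g. preferred vehicle brands)
counts = {"Mazda": 6, "Toyota": 3, "Nissan": 2, "Isuzu": 4}
total = sum(counts.values())  # the whole circle represents 100% of the data

# Each segment's size is proportional to its frequency:
# angle = (frequency / total) * 360 degrees
angles = {brand: f / total * 360 for brand, f in counts.items()}

for brand, angle in angles.items():
    print(f"{brand}: {counts[brand] / total:.0%} -> {angle:.1f} degrees")

# The sum of the segments must equal the whole circle
assert abs(sum(angles.values()) - 360) < 1e-9
```

With 15 observations in total, Mazda's 6 observations take 40% of the circle (144 degrees), and all four segments together close the circle at 360 degrees.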
Brand | 2007 | 2008 | Total
Mazda | 6 | 5 | 11
Toyota | 3 | 2 | 5
Nissan | 2 | 1 | 3
Isuzu | 4 | 7 | 11
Total | 15 | 15 | 30
Data from a cross-tabulation table can be displayed as a component bar chart or a multiple
bar chart.
Figure 2.4 Component (Stacked) Bar Chart for Table 2.2
Numeric frequency table
A numeric frequency table (distribution) is a summary table which groups numeric data into
intervals and reports the frequency count of numbers assigned to each interval.
Construction of a frequency table:
• Determine data range, often defined as the difference between the largest and smallest
numbers
• Decide on a number of classes, one rule of thumb is to select between 5 and 15 classes.
If the frequency distribution contains too few classes, the data summary may be too general
to be useful. Too many classes may result in a frequency distribution that does not
aggregate the data enough to be helpful. The final number of classes is arbitrary. The
business researcher arrives at a number by examining the range and determining the
number of classes that will span the range adequately and also be meaningful to the user
• Determine class width, an approximation of the class width can be calculated by dividing
the range by the number of classes
• Determine class limits, which are selected so that no value of the data can fit into more
than one class

Table 2.4 Frequency Table of Office Data
Interval | Frequency | % | Cumulative % (less than) | Cumulative % (more than) | Midpoint
145-<160 | 6 | 15 | 45 | 70 | 152.5
160-<175 | 12 | 30 | 75 | 55 | 167.5
175-<190 | 8 | 20 | 95 | 25 | 182.5
Total | 40 | 100
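The four construction steps for a frequency table can be sketched in Python. The data values below are illustrative, and the class count follows the 5-to-15 rule of thumb:

```python
import math

# Hypothetical raw observations (e.g. office floor areas in square metres)
data = [148, 152, 161, 158, 170, 173, 149, 166, 181, 155,
        177, 163, 186, 159, 168, 171, 150, 164, 179, 157]

# Step 1: determine the data range (largest minus smallest value)
data_range = max(data) - min(data)

# Step 2: decide on a number of classes (rule of thumb: between 5 and 15)
k = 5

# Step 3: approximate the class width (range / number of classes, rounded up)
width = math.ceil(data_range / k)

# Step 4: set class limits so that no value fits into more than one class
lower = min(data)
counts = []
for i in range(k):
    lo, hi = lower + i * width, lower + (i + 1) * width
    # half-open interval [lo, hi) ensures each value falls in exactly one class
    f = sum(lo <= x < hi for x in data)
    counts.append((f"{lo}-<{hi}", f))

for interval, f in counts:
    print(interval, f)

# Every observation must be counted exactly once
assert sum(f for _, f in counts) == len(data)
```

Rounding the width up rather than down ensures the last class still covers the maximum value.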
Frequency polygon
A frequency polygon, like the histogram, is a graphical display of class frequencies. However,
instead of using rectangles like a histogram, in a frequency polygon, each class frequency is
plotted as a dot at the class midpoint, and the dots are connected by a series of line segments.
Construction of a frequency polygon begins by scaling class midpoints along the horizontal
axis and the frequency scale along the vertical axis. A dot is plotted for the associated
frequency value at each class midpoint. Connecting these midpoint dots completes the graph
Black (2013).
interested in controlling costs, an ogive could depict cumulative costs over a fiscal year. Steep
slopes in an ogive can be used to identify sharp increases in frequencies Black (2013).
Stem-and-Leaf Plots
Another way to organize raw data into groups besides using a frequency distribution is a
stem-and-leaf plot. This technique is simple and provides a unique view of the data. A
stem-and-leaf plot is constructed by separating the digits for each number of the data into two
groups, a stem, and a leaf. The leftmost digits are the stem and consist of the higher valued
digits. The rightmost digits are the leaves and contain the lower values. If a set of data has
only two digits, the stem is the value on the left and the leaf is the value on the right. For
example, if 34 is one of the numbers, the stem is 3 and the leaf is 4. For numbers with more
than two digits, division of stem and leaf is a matter of researcher preference Black (2013)
76 92 47 88 67
23 59 72 75 83
77 68 82 97 89
81 75 74 39 67
79 83 70 78 91
68 49 56 94 81
Stem | Leaf
2 | 3
3 | 9
4 | 7 9
5 | 6 9
6 | 7 7 8 8
7 | 0 2 4 5 5 6 7 8 9
8 | 1 1 2 3 3 8 9
9 | 1 2 4 7
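A stem-and-leaf plot for two-digit data like the scores above can be built programmatically; a minimal sketch using only the standard library:

```python
from collections import defaultdict

# The 30 two-digit scores from the example above
scores = [76, 92, 47, 88, 67, 23, 59, 72, 75, 83,
          77, 68, 82, 97, 89, 81, 75, 74, 39, 67,
          79, 83, 70, 78, 91, 68, 49, 56, 94, 81]

# Split each two-digit number: tens digit -> stem, units digit -> leaf
plot = defaultdict(list)
for x in scores:
    plot[x // 10].append(x % 10)

# Print stems in order, with their sorted leaves
for stem in sorted(plot):
    leaves = " ".join(str(leaf) for leaf in sorted(plot[stem]))
    print(f"{stem} | {leaves}")
```

Sorting the leaves within each stem makes the plot double as an ordered array, which is why the median can be read off a stem-and-leaf display directly.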
5.2.4 Summary
In this unit, we have explained a categorical frequency table and how different charts (pie and
bar) are constructed from it. The cross-tabulation table was also explained and how different
bar charts (stacked and multiple) are constructed from it. The unit also looked at the
summarization of one or two numeric variables. The construction of numeric frequency
distribution tables, histograms, ogives, frequency polygons, scatter plot, box plot, trend line
graphs and Lorenz curves were explained.
Think point
If you are a shipping container industry specialist and you are requested
to prepare a brief report displaying the leading shipping companies by both
number of ships and TEU shipping capacity, what would be the best way to
statistically display information from this shipping container company? Is
the raw data adequate? Can you graphically display the data?
5.2.5 Revision
Self-Assessment
Let’s see what you have learned so far by taking this short self-assessment.
STEP 2. For an odd number of terms, find the middle term of the ordered array. It is the median.
STEP 3. For an even number of terms, find the average of the middle two terms. This average
is the median.
Mean = x̄ = Σ(f x) / n, where x = mid-point of each class interval, f = frequency of each class
interval, and n = total frequency.
2. The Mode
The mode for grouped data is the class midpoint of the modal class. The modal class is the
class interval with the greatest frequency. The formula for the mode of grouped data follows
Modal Formula:

Mo = Omo + c(fm − fm−1) / (2fm − fm−1 − fm+1)

where:
Omo = lower limit of the modal class interval
c = the width of the modal class interval
fm = the frequency of the modal class
fm+1 = the frequency of the class after the modal class
fm−1 = the frequency of the class preceding the modal class.
3. The Median
The median for ungrouped or raw data is the middle value of an ordered array of numbers.
For grouped data, solving for the median is considerably more complicated. The calculation
of the median for grouped data is done by using the following formula.
Median Formula:

Me = Ome + c(n/2 − f(<)) / fme

where:
Ome = lower limit of the median class interval
c = the width of the median class interval
fme = the frequency of the median class interval
f(<) = the cumulative frequency counts of all intervals before the median interval.
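The grouped-data formulas for the mean, mode and median can be sketched in Python. The frequency table below is illustrative (not from the guide), with class intervals of width 10:

```python
# Hypothetical grouped data: class intervals [lower, upper) with frequencies
intervals = [(10, 20, 4), (20, 30, 9), (30, 40, 12), (40, 50, 8), (50, 60, 7)]
n = sum(f for _, _, f in intervals)  # total frequency

# Grouped mean: sum of (frequency x class midpoint) divided by n
mean = sum(f * (lo + hi) / 2 for lo, hi, f in intervals) / n

# Mode: the modal class is the interval with the greatest frequency
m = max(range(len(intervals)), key=lambda i: intervals[i][2])
o_mo, hi_mo, f_m = intervals[m]
c = hi_mo - o_mo                                     # width of the modal class
f_prev = intervals[m - 1][2] if m > 0 else 0         # frequency of preceding class
f_next = intervals[m + 1][2] if m < len(intervals) - 1 else 0  # frequency of next class
mode = o_mo + c * (f_m - f_prev) / (2 * f_m - f_prev - f_next)

# Median: find the interval containing the (n/2)-th observation
cum = 0  # cumulative frequency of all intervals before the median interval
for lo, hi, f in intervals:
    if cum + f >= n / 2:
        median = lo + (hi - lo) * (n / 2 - cum) / f
        break
    cum += f

print(round(mean, 2), round(mode, 2), round(median, 2))
```

Here n = 40, the modal class is 30–<40 (frequency 12), and the median interval is also 30–<40 since the cumulative count passes 20 there.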
i.e. QD = (Q3 − Q1) / 2
• Range
• Variance
• Standard deviation
• Coefficient of variation
• Coefficient of skewness
Range
Sample variance:

s² = Σ f(xi − x̄)² / (n − 1)

Population standard deviation:
SK = [n / ((n − 1)(n − 2))] × Σ((xi − x̄) / s)³
Interpretation:
SK = 0: Complete symmetry
SK > 0: Positively skewed
SK < 0: Negatively skewed
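One common form of the sample coefficient of skewness (the moment-based form with an (n − 1)(n − 2) correction; conventions vary between texts) can be sketched as follows, with small illustrative data sets for each case in the interpretation above:

```python
import statistics

def skewness(data):
    """Sample coefficient of skewness:
    SK = n / ((n - 1)(n - 2)) * sum(((x - x_bar) / s) ** 3)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    return n / ((n - 1) * (n - 2)) * sum(((x - x_bar) / s) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]    # SK = 0: complete symmetry
right_tail = [1, 1, 2, 2, 10]  # SK > 0: positively skewed (long right tail)
left_tail = [1, 9, 9, 10, 10]  # SK < 0: negatively skewed (long left tail)

print(round(skewness(symmetric), 4))
print(round(skewness(right_tail), 4))
print(round(skewness(left_tail), 4))
```

The sign of SK matches the direction of the longer tail, which is exactly the interpretation rule listed above.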
An approximation test for skewness
Box-and-whisker plot is another way to describe a distribution of data. Sometimes referred
to as a box plot, it is a depiction of the upper and lower quartiles together with the median
and the two extreme values to show a distribution graphically. The median is enclosed by the
box. The box extends outward from the median along a band to the lower and upper quartiles,
encompassing not only the median but also the middle 50% of the data. From the lower and
upper quartiles, lines referred to as whiskers are stretched out from the box toward the
outermost data values. The box-and-whisker plot is determined from five specific numbers
sometimes referred to as the five-number summary (Black, 2013).
1. median (Q2)
2. lower quartile (Q1)
3. upper quartile (Q3)
4. smallest value in the distribution
5. largest value in the distribution.
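A sketch of computing the five-number summary (and the quartile deviation mentioned earlier) with Python's standard library. The data is illustrative, and note that `statistics.quantiles` uses one particular quartile convention ('exclusive' by default); textbook methods may differ slightly:

```python
import statistics

# Hypothetical ordered sample
data = [12, 15, 17, 19, 21, 23, 24, 27, 30, 45]

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number_summary = {
    "smallest value": min(data),
    "lower quartile (Q1)": q1,
    "median (Q2)": q2,
    "upper quartile (Q3)": q3,
    "largest value": max(data),
}
print(five_number_summary)

# Quartile deviation: half the interquartile range, QD = (Q3 - Q1) / 2
qd = (q3 - q1) / 2
print(qd)
```

These five numbers are exactly what a box-and-whisker plot draws: the box from Q1 to Q3 with the median inside, and whiskers out to the smallest and largest values.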
Data type and the shape of the histogram (for numeric variables only) determine the choice
of a statistical measure to describe the profile of any random variable. In the case of
categorical-type data (e.g. gender, job, marital status), the categorical frequency table (count
and percentage), the bar chart, the pie chart and the modal category are the only valid
statistical measures; dispersion and skewness measures do not exist and have no meaning
(Wegner, 2016).
For numeric data types, the following are the appropriate descriptive statistical measures:
This unit covered most of the numeric descriptive statistical measures that can be used to
profile sample data. These measures expressed the location, spread, and shape of the
sample data in numeric terms. Each measure was defined and calculated and the conditions
under which each would be appropriate to use were identified. The influence of data type
and the presence of outliers are identified as the primary criteria determining the choice of
a suitable numeric descriptive measure to describe sample data.
Case study
Laundry Statistics
According to Procter & Gamble, 35 billion loads of laundry are run in
South Africa each year. Every second 1,100 loads are started. Statistics
show that one person in South Africa generates a quarter of a ton of
dirty clothing each year. South Africans appear to be spending more
time doing laundry than they did 40 years ago. Today, the average South
African woman spends seven to nine hours a week on laundry.
However, industry analysis shows that the result is dirtier laundry than
in other developed countries. Various companies market new and
improved versions of washers and detergents. Yet, South Africans seem
to be resistant to manufacturers' innovations in this area. In South Africa,
the average washing machine uses about 25 litres of water. In Europe,
the figure is about 20 litres. The average wash cycle of a South African
wash is about 35 minutes compared to 90 minutes in Europe. South
Africans prefer top-loading machines because they do not have to bend
over, and the top- loading machines are larger. Europeans use the
smaller front-loading machines because of smaller living spaces.
Source: Black (2011)
Think point
Most of the statistics presented here are drawn from studies or surveys.
Let’s say a study of laundry usage is done in 50 South African households
that have washers and dryers. Water measurements are taken for the
number of litres of water used by each washing machine in completing
a cycle. The data presented below are the number of litres used by
each washing machine during the washing cycle. Summarise the data so
that the study findings can be reported.
5.3.8 Revision
Self- check activity
1. Select the appropriate central location measure (mean,
median, mode) referred to in each of the following statements.
(a) A quarter of our lecturers have more than 10 years’ work
experience.
(b) The wealthiest city in South Africa is Johannesburg.
(c) The average time taken by a runner to finish the 200 m race
is 17 seconds.
2. Identify the statements for which the arithmetic mean would
be inappropriate as a measure of central location. (Give a
reason.) State which measure of central location would be
more appropriate, if necessary.
(a) The ages of children at a playschool
(b) The number of cars using a parking garage daily
(c) The brand of cereal preferred by consumers
(d) The value of transactions in a clothing store
3. A candy shop owner documented the daily revenue of his
outlet for 300 trading days as shown in the frequency table.

Daily revenue (R) | Number of days
500 – < 750 | 15
750 – < 1 000 | 23
1 000 – < 1 250 | 55
1 250 – < 1 500 | 92
1 500 – < 1 750 | 65
1 750 – < 2 000 | 50

(a) Calculate and interpret the (approximate) average daily
revenue of the candy shop.
(b) Find the median daily revenue of the candy shop and
interpret its meaning.
(c) What is the modal daily revenue of the candy shop?

5.4 Study Unit 4: Basic Probability Concepts
Combinations •
5.4.1 Introduction
Most business decisions are made under ambiguous conditions. Probability theory provides
the underpinning for quantifying and assessing uncertainty. It is used to estimate the
dependability in making inferences from samples to populations, as well as to quantify the
uncertainty of future occurrences.
Subjective Probability
The subjective method of assigning probability is based on the feelings or insights of the
person determining the probability. Subjective probability comes from the person's intuition
or intellect. Although not a scientific approach to probability, the subjective method is often
based on the accrual of knowledge, understanding, and experience stored and processed
in the human mind. At times it is merely a supposition. At other times, subjective
probability can potentially yield accurate probabilities. Subjective probability can be used to
exploit the background of experienced workers and managers in decision making. It is based
on an educated guess, expert belief or value judgement. This type of probability cannot be
confirmed statistically, hence it has limited use, e.g. the probability that it will rain in Cape Town
tomorrow is 0.15 (Wegner, 2016).
Objective Probability
Objective probability is derived logically from the properties of the random process, or
empirically from recorded data, and can therefore be verified statistically.
e.g. A container contains 3 red balls and 2 black balls. If a ball is picked at random from the
bag, what is the probability that it is: (i) red, (ii) black?
Solution:
(i) P(Red) = 3/5 (ii) P(Black) = 2/5
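The container example can be checked with a short Python sketch that counts equally likely outcomes (an optional illustration, not part of the prescribed text):

```python
from fractions import Fraction

# A container holds 3 red balls and 2 black balls; each ball is
# equally likely to be drawn, so P(event) = favourable / total.
balls = ["red"] * 3 + ["black"] * 2

p_red = Fraction(balls.count("red"), len(balls))
p_black = Fraction(balls.count("black"), len(balls))

print(p_red)    # 3/5
print(p_black)  # 2/5
```

Note that the two probabilities sum to 1, since red and black are collectively exhaustive outcomes.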
Structure of probability
In the study of probability, developing a language of terms and symbols is helpful. The structure
of probability provides a common framework within which the topics of probability can be
explored
Experiment
An experiment is a procedure that produces outcomes. Examples of business-oriented
experiments with outcomes that can be statistically analyzed might include the following.
• Interviewing 10 randomly selected consumers and asking them which brand of washing
powder they prefer
• Sampling every 100th bottle of KOO beans from an assembly line and weighing the
contents
• Testing new antibiotic drugs on samples of HIV patients and measuring the patients'
improvement
• Auditing every 5th account to detect any errors
• Recording the S&P 500 index on the first Monday of every month for 5 years
Event
Since an event is an outcome of an experiment, the experiment defines the possibilities of
the event. If the experiment is to sample five bottles coming off a production line, an event
could be to get one defective and four good bottles. In an experiment to roll a die, one event
could be to roll an even number and another event could be to roll a number greater than two.
NB: The probability of two mutually exclusive events taking place at the same time is zero.
Concept 4: Collectively Exhaustive Events
Events are collectively exhaustive when the union of all possible events is equal to the sample
space.
i.e. at least one of the events is certain to occur for a randomly drawn object from the sample
space (Wegner, 2016).
Concept 5: Statistically Independent Events
Two events A and B are statistically independent if the happening of event A has no effect on
the outcome of event B and vice-versa. Two or more events are independent events if the
occurrence or nonoccurrence of one of the events does not affect the occurrence or
nonoccurrence of the other event(s). Certain experiments, such as rolling dice, yield
independent events; each die is independent of the other. Whether a 6 is rolled on the first
die has no effect on whether a 6 is rolled on the second die. Coin tosses are always
independent of each other. The event of getting a head on the first toss of a coin is
independent of getting a head on the second toss. It is generally believed that certain human
characteristics are independent of other events (Wegner, 2016).
5.4.4 Calculating Objective Probabilities
Joint Probability
A joint probability is the probability of both event A and event B occurring simultaneously on a
given trial of a random experiment (Wegner, 2016).
Conditional Probability
A conditional probability is the probability of one event, A, occurring given information about
the occurrence of a prior event, B; it is written P(A|B) (Wegner, 2016).
• Addition Rule
• Multiplication Rule

Addition Rule
For non-mutually exclusive events:
P(A or B) = P(A) + P(B) − P(A and B)
For mutually exclusive events:
P(A or B) = P(A) + P(B)

Multiplication Rule
For statistically dependent events:
P(A and B) = P(A|B) × P(B)
For statistically independent events:
P(A and B) = P(A) × P(B)
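These rules can be illustrated with the die-rolling events mentioned earlier; the particular events A (an even number) and B (a number greater than four) are chosen here purely for illustration:

```python
from fractions import Fraction
from itertools import product

# Single roll of a fair die: A = "even number", B = "number greater than 4".
A = {2, 4, 6}
B = {5, 6}

def prob(event):
    # Each of the 6 faces is equally likely.
    return Fraction(len(event), 6)

# Addition rule (non-mutually exclusive): P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)
assert p_a_or_b == prob(A | B)  # agrees with direct enumeration

# Multiplication rule (independent events, two dice):
# P(6 on first die and 6 on second) = P(6) * P(6)
p_two_sixes = Fraction(1, 6) * Fraction(1, 6)
pairs = [(i, j) for i, j in product(range(1, 7), repeat=2) if i == 6 and j == 6]
assert Fraction(len(pairs), 36) == p_two_sixes

print(p_a_or_b)     # 2/3
print(p_two_sixes)  # 1/36
```

The joint event (A and B) must be subtracted in the addition rule because the outcome 6 would otherwise be counted twice.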
It is used to find the total number of different ways in which n objects of a single event can be
arranged (ordered).
n! = n factorial = n(n − 1)(n − 2)(n − 3) … (3)(2)(1)
Permutations
A permutation is the number of distinct ways of arranging a subset of r objects selected
from a group of n objects where the order is important. Each possible arrangement
(ordering) is called a permutation.
Formula: nPr = n!/(n − r)!

Combinations
A combination is the number of distinct ways of grouping a subset of r objects selected
from a group of n objects where the order is not important. Each separate grouping
is called a combination.
Formula: nCr = n!/[r!(n − r)!]
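The counting rules can be checked with a short Python sketch; the helper names n_p_r and n_c_r are our own:

```python
import math

# Factorial: number of ways to arrange n distinct objects.
assert math.factorial(4) == 24  # 4*3*2*1

# Permutations nPr = n!/(n - r)!  (order matters)
def n_p_r(n, r):
    return math.factorial(n) // math.factorial(n - r)

# Combinations nCr = n!/[r!(n - r)!]  (order does not matter)
def n_c_r(n, r):
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

# e.g. selecting 3 objects from 10:
print(n_p_r(10, 3))  # 720 ordered arrangements
print(n_c_r(10, 3))  # 120 unordered groupings

# Python 3.8+ provides these directly:
assert n_p_r(10, 3) == math.perm(10, 3)
assert n_c_r(10, 3) == math.comb(10, 3)
```

Note that nPr is always at least as large as nCr, since each unordered group of r objects can be arranged in r! different orders.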
Reading
For extensive coverage of basic probability concepts, read the
prescribed textbook: T. Wegner, Chapter 4, 4th edition,
pp. 106–123.
5.4.8 Summary
This unit presented the concept of probability as a basis for inferential statistics. The term
probability, probability concepts and types of probability were expounded. The calculations of
objective probability were carried out. The probability rules and counting rules were explained
and applied to everyday real-life situations.
Think point
What could be the reason that two-thirds of all dry-cleaning
customers are female? What are some of the factors to consider?
What could dry cleaners do to increase the number of male
customers? Sixty-five percent of customers at dry-cleaning
establishments are married. Can you consider reasons why a
smaller percentage of dry-cleaning customers are single? What
could dry cleaners do to grow the number of single customers?
5.4.9 Revision
Self-check activity
1. If an event has a probability equal to 0.2, what does this
mean?
2. What term is used to describe two events that cannot
occur simultaneously in a single trial of a random experiment?
3. What is meant when two events are said to be ‘statistically
independent’?
4. If P (A) = 0.26, P (B) = 0.35 and P(A and B) = 0.14, what
is the value of P(A or B)?
5. If P(X) = 0.54, P(Y) = 0.36 and P(X and Y) = 0.27, what
is the value of P(X/Y)? Is it the same as P(Y/X)?
6. Economic sectors
In a survey of companies, it was found that 45 were in the
mining sector, 72 were in the financial sector, 32 were in the IT
sector and 101 were in the production sector.
(a) Show the data as a percentage frequency table.
(b) What is the probability that a randomly selected company
is in the financial sector?
(c) If a company is selected at random, what is the probability
that this company is not in the production sector?
(d) What is the likelihood that a randomly selected company
is either a mining company or an IT company?
(e) Name the probability types or rules used in questions b,
c and d.
5.5 Study Unit 5: Probability Distributions
Purpose The purpose of this unit is to introduce a few important probability
distributions that occur most often in management situations and also
describe patterns of outcomes for both discrete as well as continuous
events.
Learning By the end of this unit, you will be able to:
Outcomes • Understand the concept of a probability distribution.
• Describe three common probability distributions used in
management practice.
• Identify applications of each probability distribution in
management
• Calculate and interpret probabilities associated with each of
these distributions.
Time It will take you 12 hours to make your way through this unit.
two-thirds of cell phone owners in the 18 to 29 age bracket sent text messages using
their cell phones, 55% take pictures with their
phones, 47% play games on the phones, and 28% use the Internet through their cell
phones.
Source: Black (2013)
Think point
The study reports that nearly 34% of cell phone owners in the US
use only cell phones. If you randomly choose 20 Americans,
what is the probability that more than 10 of the sample use only
cell phones? The research also reports that 9 out of 10 cell users
come across others using their phones in an annoying way.
Based on this, if you were to randomly choose 25 cell phone
users, what is the probability that fewer than 20 report that they
encounter others using their phones in an annoying way? The
source, Dead Zones, states that the average person makes or
receives eight mobile phone calls per day. This averages to about
two mobile phone calls every six hours. If this is true,
what is the probability that a mobile phone user receives no calls
in a six-hour period? What is the probability that a cell phone user
receives five or more calls in a six-hour period?
5.5.1 Introduction
This unit introduces probability distributions. Probabilities can be derived using
mathematical functions known as probability distributions, which quantify the uncertain
behaviour of many random variables in business practice. Probability distributions can
define patterns of outcomes for both discrete and continuous events.
A random variable is a discrete random variable if the set of all possible values is at most a
finite or a countably infinite number of possible values, e.g. 0, 1, 2, 3, 4, etc. In most statistical
situations, discrete random variables produce values that are nonnegative whole numbers.
The two common discrete probability distributions are:
• The Binomial Probability Distribution
• The Poisson Probability Distribution
The binomial distribution applies when:
• There are only two mutually exclusive and collectively exhaustive outcomes of the
random variable: success and failure.
• Each outcome has an associated probability: probability of success = p and probability
of failure = q, where p + q = 1 (always).
• The random variable is observed n times (trials). Each trial generates either a success
or a failure.
• The trials are independent of each other, i.e. p and q are constant.
The binomial question: what is the probability that r successes will occur in n trials of the
process under study?
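The binomial question can be sketched in Python; the die-rolling example below is our own illustration, not from the text:

```python
import math

def binomial_pmf(r, n, p):
    """P(r successes in n independent trials, each with success probability p)."""
    q = 1 - p
    return math.comb(n, r) * p**r * q**(n - r)

# e.g. probability of exactly 2 sixes in 5 rolls of a fair die
p2 = binomial_pmf(2, 5, 1/6)
print(round(p2, 4))  # 0.1608

# The probabilities over all possible outcomes (r = 0..n) sum to 1.
assert abs(sum(binomial_pmf(r, 5, 1/6) for r in range(6)) - 1) < 1e-12
```

The formula multiplies the number of orderings of r successes among n trials (nCr) by the probability of any one such ordering.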
Mean = np and standard deviation = √(npq)
Example
If a worker is assembling a product component, the time it takes to accomplish that feat could
be any value within a reasonable range such as 3 minutes 36.4218 seconds or 5 minutes
17.5169 seconds.
A list of measures for which continuous random variables might be generated would include
time, height, weight, and volume.
The following are examples of experiments that could produce continuous random variables:
1. Sampling the volume of liquid nitrogen in a storage tank
2. Determining the time between customer arrivals at a retail outlet
3. Determining the lengths of newly designed automobiles
4. Determining the weight of grain in a grain elevator at different points of time
NB: The main continuous probability distribution is the normal distribution.
that are normally distributed. Many variables in business and industry also are normally
distributed.
Examples
1. The annual cost of household insurance.
2. The cost per square foot of renting warehouse space.
3. Managers' satisfaction with support from ownership on a five-point scale.
In addition, most items produced or filled by machines are normally distributed.
Characteristics:
• It is a smooth bell-shaped curve
• It is symmetrical about the central mean value.
• The tails of the curve are asymptotic.
• The distribution is always described by two parameters, mean & standard deviation
• The total area under the curve will always equal one.
• The probability associated with a particular range of x-values is described by the area
under the curve between the limits of the given x range (x₁ ≤ x ≤ x₂).
Finding Probabilities using the normal distribution
Special statistical tables are used to obtain probabilities for a range of values of x.
A z score is the number of standard deviations that a value, x, is above or below the mean. If
the value of x is less than the mean, the z score is negative; if the value of x is more than the
mean, the z score is positive; and if the value of x equals the mean, the associated z score is
zero:
z = (x − µ)/σ
This formula allows transformation of the distance of any x value from its mean into
standard deviation units. A standard z score table can be used to find probabilities for any
normal curve problem that has been converted to z scores. The z distribution is a normal
distribution with a mean of 0 and a standard deviation of 1. Any value of x at the mean of a
normal curve is zero standard deviations from the mean. Any value of x that is one standard
deviation above the mean has a z value of 1.
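The z transformation can be sketched with the Python standard library; the values µ = 100, σ = 15 and x = 130 are hypothetical:

```python
from statistics import NormalDist

# z score: number of standard deviations x lies from the mean.
mu, sigma = 100, 15
x = 130
z = (x - mu) / sigma
print(z)  # 2.0

# P(X <= x) via the standard normal (z) distribution, mean 0, std dev 1
standard = NormalDist(mu=0, sigma=1)
p = standard.cdf(z)
print(round(p, 4))  # 0.9772

# Equivalently, work with the original distribution directly.
assert abs(NormalDist(mu, sigma).cdf(x) - p) < 1e-12
```

This mirrors what the printed z table does: the table lists areas under the standard normal curve for given z values.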
Reading
For an in-depth understanding of probability distributions, study the
prescribed textbook: T. Wegner, Chapter 5, 4th edition, pp. 132–153.
5.5.7 Summary
In this unit, we have explained the three theoretical probability distributions. The binomial and
the Poisson probability distributions are used to find probabilities for discrete numeric random
variables. The normal probability distribution calculates probabilities for continuous numeric
random variables.
The characteristics of each probability distribution were described.
Case study The Cost of Human Resources
What is the human resource cost of hiring and maintaining
employees in a company? Studies conducted by Damelin
Institute, PricewaterhouseCoopers Human Resource Services,
determined that the average cost of hiring an employee is R3,270,
and the average annual human resource expenditure per
employee is R1,554. The average health benefit payment per
employee is R6,393, and the average employer cost per
participant is R2,258. According to a survey conducted by the
South African Society for Training and Development, companies
annually spend an average of R955 per employee on training,
and, on average, an employee receives 32 hours of training
annually. Business researchers have attempted to measure the
cost of employee absenteeism to an organization. A survey
conducted by TTC, Inc., showed that the average annual cost of
unscheduled absenteeism per employee is R660. According to
this survey, 35% of all unscheduled absenteeism is caused by
personal illness.
Think point
The survey conducted by the South African Society for Training
and Development reported that, on average, an employee
receives 32 hours of training per year. Suppose that number of
hours of training is uniformly distributed across all employees
varying from 0 hours to 64 hours. What percentage of employees
receive between 20 and 40 hours of training? What percentage
of employees receive 50 hours or more of training? Studies
conducted by Damelin Institute, PricewaterhouseCoopers
Human Resource Services estimated that, on average, it costs
R3,270 to hire an employee. Suppose such costs are normally
distributed with a standard deviation of R400. Based on these
figures, what is the probability that a randomly selected hire costs
more than R4,000? What percentage of employees are hired for
less than R3,000?
5.5.8 Revision
Self-check activity
1. Name two regularly used discrete probability
distributions.
2. Specify whether each of the subsequent random
variables is discrete or continuous:
(a) The mass of cans coming off a production line
(b) The number of employees in a company
(c) The number of households in Gauteng that have solar
heating panels
(d) The distance traveled daily by a courier service truck.
Case study
What Is the Attitude of Maquiladora Workers?
In 1965 Mexico initiated its widely known maquiladora program
that permits corporations from the United States and other
countries to build manufacturing facilities inside the Mexican
border, where the company can import supplies and materials
from outside of Mexico free of duty, assemble or produce
products, and then export the finished items back to the country
of origin. Mexico's establishment of the maquiladora program was
to promote foreign investment and jobs in the poverty-stricken
country and, at the same time, provide a cheaper labor pool to
the participating companies, thereby reducing labor costs so that
companies could more effectively compete on the world market.
After 2006, the Mexican government renamed the Maquila
program the INMEX program.
The maquiladora effort has been quite successful, with more than
3,500 registered companies participating and more than 1.1
million maquiladora workers employed in the program. It has
been estimated that $50 billion has been spent by maquiladora
companies with suppliers. Recently, industry exports were
approaching $65 billion. About 1,600 of the maquiladora plants
are located in the U.S.-Mexico border area, where about 40%
manufacture electronic equipment, materials, and supplies. In
recent years, the maquiladora program has spread to the interior
of Mexico, where maquiladora employment growth has been
nearly 30%. Maquiladora companies also manufacture and
assemble products from the petroleum, metal, transportation, and
medical industries, among others. Whereas most maquiladora
companies in the early years utilized low-skilled assembly
operations, in more recent years maquiladoras have been
moving toward sophisticated manufacturing centers. The
maquiladora program now encompasses companies from all over
the world, including Japan, Korea, China, Canada, and European
countries.
What are the Mexican maquiladora workers like? What are their
attitudes toward their jobs and their companies? Are there cultural
gaps between the company and the worker that must be bridged
in order to utilize the human resources more effectively? What
culture- based attitudes and expectations do the maquiladora
laborers bring to the work situation? How does a business
researcher go about surveying workers? Source:
Black (2013)
Think point
Suppose analysts decide to survey maquiladora workers to
ascertain the workers' attitudes toward, and expectations of,
the work environment and the company. Should the analysts
take a census of all maquiladora workers or just a sample?
What could be the reasons for each?
5.6.1 Introduction
This unit focuses on the different sampling methods and concepts of the sampling distribution.
Both the methods and the concepts are essential in inferential statistics. Sampling greatly
affects the validity of inferential findings, while the sampling distribution determines the
confidence a manager can have in any inferences made.
Why do we Sample?
There are several advantages to taking a sample instead of conducting a census:
1. A sample saves money.
2. A sample saves time.
3. A sample can broaden the scope of the study for given resources.
4. A sample can save the product in cases where the research process is destructive.
5. A sample is the only option when accessing the whole population is impossible.
For a given number of questions from a survey, or a given set of measurements obtained in a
study, taking a sample rather than a census can save both money and time. Sampling can also
produce results more quickly. Consider, for example, an 8-minute telephone interview
conducted as part of a survey: conducting such interviews with a sample of 100 customers is
far less expensive and time-consuming than taking a census of 100,000 customers. If obtaining
the outcomes of a study is a matter of urgency, sampling has an advantage over a census in
terms of turnaround time. If the resources allocated to a research project are fixed, more
detailed information can be collected by taking a sample than by conducting a census.
1. to eliminate the possibility that by chance a randomly selected sample may not be
representative of the population
2. for the safety of the consumer.
A sample should be representative of all the members of the target population for the results
to be valid.
Convenience Sampling
With convenience sampling, members of the sample are chosen on the basis of the
convenience of the analyst. The analyst typically selects elements that are readily available,
nearby or willing to participate. The sample tends to be less variable than the population
because, in many instances, the extreme elements of the population are not readily available,
so the analyst chooses more members from the middle of the population. For instance, a
convenience sample of homes for door-to-door interviews might include only houses where
people are at home, houses with no dogs, houses near the street, first-floor apartments, or
houses with friendly occupants. A random sample, in contrast, would require the analyst to
collect data only from houses and apartments that have been selected randomly, regardless
of how inconvenient or unfriendly the house or apartment may be (Black, 2013).
Judgment Sampling
Judgment sampling occurs when the members chosen for the sample are selected by the
judgment of the analyst. Analysts often believe they can obtain a representative sample by
using sound judgment, which can save time and money. A professional analyst might believe
he or she can choose a more representative sample than the random process would provide.
That may be right in some cases, but it is not always so (Black, 2013).
Quota Sampling
Quota sampling involves setting quotas of sampling units to interview from certain subgroups
of the population. When the quota for any subgroup is met, no more sampling units are
selected from that subgroup for interview. This potentially introduces selection bias into the
sampling process. For instance, an analyst may set a quota to interview 40 males and 70
females from the 25- to 40-year age group on savings practices. When the quota of interviews
for any one subgroup is reached (either males or females), no more eligible sampling units
from that subgroup are chosen for interview purposes. The main characteristic of quota
sampling is the non-random selection of sampling units to fulfil the quota limits (Black, 2013).
Snowball Sampling
It is a non-random sampling method in which survey elements are chosen based on the
referral from other survey respondents. The analyst would identify a person who fits the profile
of subjects wanted for the study underway. The analyst would ask this person for the names
and whereabouts of others who would also fit the profile of elements wanted for the study.
Via these referrals, survey elements can be identified efficiently and cheaply, which is
particularly useful when survey elements are difficult to locate. This is the main advantage of
snowball sampling; its main disadvantage is that it is non-random. Snowball sampling is
used to reach target populations whose sampling units are difficult to identify.
Each identified subject is asked to identify other sampling units who belong to the same target
population (Black, 2013).
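The contrast between random and convenience selection can be sketched in Python; the numbered-household population below is hypothetical:

```python
import random

# Hypothetical population of 1,000 numbered households.
population = list(range(1, 1001))

# Simple random sampling: every household has an equal chance
# of selection, regardless of how convenient it is to reach.
random.seed(42)  # fixed seed so the illustration is repeatable
random_sample = random.sample(population, 50)

# Convenience sampling (illustration only): take the first 50
# households encountered, e.g. those nearest the analyst.
convenience_sample = population[:50]

print(len(random_sample), len(convenience_sample))  # 50 50
```

The convenience sample systematically over-represents one corner of the population, which is exactly the selection bias the probability methods are designed to avoid.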
For measures such as income, distance and weight, the sample mean is often the statistic of
choice. While the mean is computed by averaging a set of values, the sample proportion is
computed by dividing the frequency with which a given characteristic occurs in a sample by
the number of items in the sample. For instance, in a sample of 100 factory workers, 30
workers might belong to Cosatu. The sample proportion for this characteristic, Cosatu
membership, is 30/100 = 0.30. In a sample of 500 businesses in Eastrand malls, if 10 are
shoe stores, then the sample proportion of shoe stores is 10/500 = 0.02. The standard
deviation of the sample proportion is a measure of sampling error called the standard error of
the proportion and is calculated by the formula:
σ_p = √(p(1 − p)/n)
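Applying the formula to the Cosatu example from the text:

```python
import math

# Sample of 100 factory workers, of whom 30 belong to Cosatu.
n, successes = 100, 30
p_hat = successes / n  # sample proportion = 0.30

# Standard error of the sample proportion: SE = sqrt(p(1 - p)/n)
se = math.sqrt(p_hat * (1 - p_hat) / n)

print(p_hat)         # 0.3
print(round(se, 4))  # 0.0458
```

As n grows, the standard error shrinks in proportion to √n, which is why larger samples give more reliable proportion estimates.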
Reading
For in-depth coverage of sampling and sampling distributions,
study the prescribed textbook: T. Wegner, Chapter 6, 4th edition,
pp. 160–174.
5.6.8 Summary
In this unit, we have discussed the building blocks of inferential statistics. Inferential statistics
is important because most statistical data is gathered from a sample instead of from the
population as a whole. The different types of sampling methods, such as non-probability and
probability sampling methods, were reviewed. The concept of the sampling distribution was
briefly discussed.
5.6.9 Revision
Self-check activity
1. Explain the difference between the following terms:
(i) A population and a sample
(ii) A survey and a census
(iii) Probability sampling methods and non-probability
sampling
methods
(iv) Convenience sampling and snowball sampling
(v) List two advantages and two disadvantages of probability
sampling methods
2. What is the purpose of inferential statistics?
3. Describe advantages of stratified sampling
4. Discuss two disadvantages of non-probability sampling
5. Explain the term ‘sampling error’
Time It will take you 12 hours to make your way through this unit.
5.7.1 Introduction
Inferential statistics' role is to use sample evidence to estimate population parameters. An
important and reliable procedure for estimating a population measure is to use the sample
statistic as a reference point and to create an interval of values around it that is likely to cover
the true population parameter with a stated level of confidence. This procedure is called
confidence interval estimation.
Because a sample mean can be greater than or less than the population mean, z can be
positive or negative. Thus the preceding expression takes the form x̄ ± z·σ/√n.
Rewriting this expression yields the confidence interval formula for estimating
μ with large sample sizes when the population standard deviation is known.
100(1 − α)% confidence interval to estimate µ, when σ is known:
x̄ ± z_(α/2) · σ/√n
Alpha (α) would be the area under the normal curve in the tails of the distribution outside the
area defined by the confidence interval. Here we use α to locate the z value in constructing the
confidence interval. For instance, if we would want to build a 95% confidence interval, the level
of confidence is 95%, or .95. If 100 such intervals are constructed by taking random samples
from the population, it is likely that 95 of the intervals would include the population mean and
5 would not. As the level of confidence is increased, the interval gets wider, provided the
sample size and standard deviation remain constant.
For 95% confidence, α = .05 and α/2 = .025. The value of z_(α/2), or z.025, is found by looking
in the standard normal table for the area .5000 − .0250 = .4750. This area in the table is associated with a
z value of 1.96. Another way can be used to locate the table z value. Because the distribution
is symmetric and the intervals are equal on each side of the population mean, ½(95%), or
.4750, of the area is on each side of the mean. This yields a z value of 1.96 for this portion
of the normal curve. Thus the z value for a 95% confidence interval is always 1.96. In other
words, of all the possible values along the horizontal axis of the diagram, 95% of them should
be within a z score of 1.96 from the population mean.
Figure 7.1: z scores for confidence intervals in relation to α
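The table z values quoted in this unit can be reproduced with the Python standard library (note that the 99% value is 2.576 to three decimals; the text rounds it to 2.575):

```python
from statistics import NormalDist

def z_for_confidence(level):
    """Two-sided critical z value: leaves area alpha/2 in each tail."""
    alpha = 1 - level
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_for_confidence(0.95), 2))   # 1.96
print(round(z_for_confidence(0.90), 3))   # 1.645
print(round(z_for_confidence(0.99), 3))   # 2.576
```

inv_cdf is the inverse of the cumulative normal curve, so it plays the role of reading the z table backwards: from an area to a z value.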
Study group
A survey was taken of South African companies that do business
with firms in Nigeria. One of the questions on the survey was:
approximately how many years has your company been trading
with firms in Nigeria? A random sample of 44 responses to this
question yielded a mean of 10.455 years. Suppose the population
standard deviation for this question is 7.7 years. Using this
information, construct a 90% confidence interval for the mean
number of years trading with firms in Nigeria for the population
of South African companies that do business with firms in
Nigeria.
(n ≥ 30). In the business world, however, sample sizes may be small. While the central
limit theorem applies only when the sample size is large, the distribution of sample means is
approximately normal even for small samples if the population is normally distributed. Thus, if it
is known that the population from which the sample is being drawn is normally distributed, and
if σ is known, the z formulas presented previously can still be used to estimate a
population mean even if the sample size is small (n < 30).
Example
Suppose a South African car rental firm wants to estimate the
average number of kms traveled per day by each of its cars
rented in Cape Town. A random sample of 20 cars rented in
Cape Town reveals that the sample mean travel distance per day
is 85.5 km, with a population standard deviation of 19.3 km.
Compute a 99% confidence interval to estimate μ.
Here, n = 20, x̄ = 85.5, and σ = 19.3. For a 99% level of confidence, a z value of 2.575 is
used. Assume that the number of kms traveled per day is normally distributed in the population.
The confidence interval is
85.5 ± 2.575 × (19.3/√20) = 85.5 ± 11.1, i.e. 74.4 ≤ µ ≤ 96.6
The point estimate indicates that the average number of kms traveled per day by a rental car
in Cape Town is 85.5, with a margin of error of 11.1 kms. With 99% confidence, we estimate
that the population mean is somewhere between 74.4 and 96.6 kms per day.
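The arithmetic of this interval can be checked with a short Python sketch:

```python
import math

# Car rental example: n = 20, sample mean = 85.5 km,
# known sigma = 19.3 km, 99% confidence -> z = 2.575.
n, x_bar, sigma, z = 20, 85.5, 19.3, 2.575

margin = z * sigma / math.sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(round(margin, 1))                  # 11.1
print(round(lower, 1), round(upper, 1))  # 74.4 96.6
```

Widening the confidence level (say to 99.9%) would increase z and therefore the margin of error, as the text notes.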
5.7.7 Confidence Interval for a Single Population Mean (µ) when the Population
Standard Deviation (σ) is unknown
On many occasions estimating the population mean is useful in business research.
Example
1. The manager of human resources in a corporation
may want to estimate the average number of days of work an
employee misses per year because of illness. If the
corporation has thousands of employees, direct calculation of
a population mean such as this may be practically impossible.
Instead, a random sample of employees can be taken, and the
sample mean number of sick days can be used to estimate
the population mean.
2. Suppose another company developed a new process
for elongating the shelf life of a loaf of bread. The company
wants to be able to date each loaf for freshness, but company
officials do not know exactly how long the bread will stay fresh.
By taking a random sample and determining the sample mean
shelf life, they can estimate the average shelf life for the
population of bread.
This formula is essentially the same as the z formula, but the distribution table values are
different.
Confidence interval to estimate µ when the population standard deviation is unknown and the
population is normally distributed:
x̄ ± t_(α/2, n−1) · s/√n
Example
In the aerospace industry, some companies allow their employees to accumulate extra
working hours beyond their 40-hour week. These extra hours sometimes are referred to as
green time or comp time. Many managers work longer than the eight-hour workday
preparing proposals, overseeing crucial tasks, and taking care of paperwork. Recognition of
such overtime is important. Most managers are usually not paid extra for this work, but a
record is kept of this time and occasionally the manager is allowed to use some of this comp
time as extra leave or vacation time. Suppose a researcher wants to estimate the average
amount of comp time accumulated per week for managers in the aerospace industry. He
randomly samples 18 managers and measures the amount of extra time they work during a
specific week and obtains the results shown (in hours).
He constructs a 90% confidence interval to estimate the average amount of extra time
per week worked by a manager in the aerospace industry, assuming that comp time is
normally distributed in the population.
The sample size is 18, so df = 17. A 90% level of confidence results in α/2 = .05 area in each
tail. The table t value is
t(.05, 17) = 1.740
The subscripts in the t value denote to other researchers the area in the right tail of the t
distribution (for confidence intervals α/2) and the number of degrees of freedom. The sample
mean is 13.56 hours, and the sample standard deviation is 7.80 hours. The confidence
interval is computed from this information as 13.56 ± 1.740 × (7.80/√18) = 13.56 ± 3.20, which gives 10.36 ≤ μ ≤ 16.76.
The point estimate for this problem is 13.56 hours, with a margin of error of ±3.20 hours. The
researcher is 90% confident that the average amount of comp time accumulated by a manager
per week in this industry is between 10.36 and 16.76 hours. From these figures, aerospace
managers could attempt to build a reward system for such extra work or evaluate the regular
40-hour week to determine how to use the normal work hours more effectively and thus reduce
comp time.
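The interval in this example can be reproduced in a few lines. A minimal Python sketch using the figures from the example (n = 18, x̄ = 13.56, s = 7.80, table value t(.05, 17) = 1.740):

```python
import math

# 90% t-interval for the mean when sigma is unknown, using the
# comp-time example: n = 18, sample mean 13.56 h, sample std dev 7.80 h.
n, xbar, s = 18, 13.56, 7.80
t_crit = 1.740                      # t(.05, 17) from the t table

margin = t_crit * s / math.sqrt(n)  # margin of error, about 3.20 hours
lower, upper = xbar - margin, xbar + margin
print(round(lower, 2), round(upper, 2))  # 10.36 16.76
```

The printed limits match the 10.36 and 16.76 hours quoted in the text.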
Example
What proportion of the market does our company control (market
share)? What proportion of our products is defective? What
proportion of customers will call customer service with
complaints? What proportion of our customers is in the 20-to-30
age group? What proportion of our workers speaks Xhosa as a
first language? Techniques similar to those previously discussed
can be used to estimate the population proportion.
The central limit theorem for sample proportions leads to the following formula for the standard error of the sample proportion:

σ(p̂) ≈ √( p(1 − p) / n )

Confidence interval to estimate p:

p̂ ± z(α/2) × √( p̂(1 − p̂) / n )
Think point
What may be some of the reasons a lower percentage of
young adults drink coffee than do older adults? Of the young
adults who do drink coffee, consumption is higher than for
other groups. Why might this be? The percentages of consumers who drink coffee
black vary by region in South Africa. How might that affect the marketing of various
coffee cream products?
Example
A study of 87 randomly selected companies with a telemarketing
operation revealed that 39% of the sampled companies used
telemarketing to assist them in order processing. Using this
information, how could an analyst estimate the population
proportion of telemarketing companies that use their
telemarketing operation to assist them in order processing?
For n = 87 and p = .39, a 95% confidence interval can be
computed to determine the interval estimation of p. The z value
for 95% confidence is 1.96, giving .39 ± 1.96 × √(.39 × .61/87) = .39 ± .10.
This interval suggests that the population proportion of telemarketing firms that use their
operation to assist order processing is somewhere between .29 and .49, based on the point
estimate of .39 with a margin of error of ±.10. This result has a 95% level of confidence.
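The example's interval can be verified in code. A minimal Python sketch using the figures above (n = 87, p̂ = .39, z = 1.96):

```python
import math

# 95% confidence interval for a population proportion:
# n = 87 sampled companies, sample proportion 0.39, z(.025) = 1.96.
n, p_hat, z = 87, 0.39, 1.96

margin = z * math.sqrt(p_hat * (1 - p_hat) / n)   # about 0.10
lower, upper = p_hat - margin, p_hat + margin
print(round(lower, 2), round(upper, 2))  # 0.29 0.49
```

The limits agree with the .29 to .49 interval reported in the example.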
Reading
An extensive study on confidence intervals can be done from the
prescribed textbook: T. Wegner, Chapter 7, pp. 176–190.
5.7.9 Summary
In this unit, we have clarified the estimation of population parameters using confidence
intervals.
Two population parameters – the population mean and the population proportion – were
estimated. In cases where the population standard deviation is known (sample size is large),
the z-distribution is used. In cases where the population standard deviation is unknown
(sample size is small), the t-distribution is used.
Case study
Batteries: How Long Do They Last?
What is the average life of a battery? As you can imagine,
the answer depends on the type of battery, what it is used
for, and how it is used. Car batteries are different from iPhone
batteries, which are different from flashlight batteries.
Nevertheless, a few statistics are available on the life of
batteries. For example, according to Apple, an iPhone 4 has
up to 300 hours of life if kept on standby, without making
any calls or running any applications. In a PC World study,
several smartphones were tested to determine battery
life. In the tests, a video file was played at maximum
brightness looped over and over with the audio set played
very loud. As a result, the average battery life without recharge
for the brand with the top battery life was 7.37 hours, with
other brands varying from 4.23 hours to 6.55 hours.
According to another study, the average battery life of a tablet
computer is 8.78 hours. Overall, a laptop computer battery
should last through around 1,000 charges. Source: Black
(2013)
Think point
In the study mentioned above, the average battery life of a
tablet computer is given as 8.78 hours. The figure came from
testing a sample of tablet computers. If the 8.78-hour average
was found by sampling 60 tablet computers, and a second
sample of 60 tablet computers were chosen, would the average
battery life also be 8.78 hours? How can this number be used
to approximate the average battery life of all tablet computers?
5.7.10 Revision
Self-check activity
Time It will take you 12 hours to make your way through this unit.
p-value approach • is another way to reach a statistical conclusion in hypothesis testing problems
5.8.1 Introduction
In this unit, we focus on another area of inferential statistics, where a claim made about
the true value of a population parameter is assessed for validity. Hypothesis testing is a
statistical process to test the validity of such claims using sample evidence.
Step 4: Compare the sample test statistic to the rejection criteria.
Step 5: Make statistical and management conclusions.
Most of the time, when a business analyst is gathering data to test hypotheses about a single
population mean, the value of the population standard deviation is unknown and the analyst
must use the sample standard deviation as an estimate of it. In such cases, the z test cannot
be used. In the previous study unit, the t distribution was presented which can be used to
analyse hypotheses about a single population mean when σ is unknown if the population is
normally distributed for the measurement being studied. In this part of the unit, the t-test is
discussed for a single population mean. More often, the t-test is applicable whenever the
analyst is drawing a single random sample to test the value of a population mean (μ), the
population standard deviation is unknown, and the population is normally distributed for the
measurement of interest.
Example
1. Suppose a company held a 36%, or .36, share of the
market for several years. Resulting from a massive marketing
effort and improved product quality, company officials believe
that the market share increased, and they want to prove it.
2. A market researcher analyst wishes to test to determine
whether the proportion of old car purchasers who are female
has increased.
3. A financial analyst wants to test to determine whether
the proportion of companies that were profitable last year in
the average investment officer's portfolio is 0.50.
4. A quality assurance manager for a large manufacturing
firm wishes to test to determine whether the proportion of
defective items in a batch is less than 0.04.
The formula below makes possible the testing of hypotheses about the population proportion
in a manner similar to that of the formula used to test sample means:

z = (p̂ − p) / √( p(1 − p) / n )
Example
Suppose the p-value of a test is 0.038. The null hypothesis cannot
be rejected at α = 0.01, because 0.038 is the smallest value of
alpha for which the null hypothesis can be rejected, and it is larger
than 0.01. However, the null hypothesis can be rejected for α =
0.05, because the p-value = 0.038 is smaller than α = 0.05.
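The p-value logic, combined with the proportion z formula above, can be sketched in Python. The counts below are hypothetical, chosen only for illustration; the normal CDF is built from `math.erf`, so no external library is needed:

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# One-sample z-test for a proportion (hypothetical figures):
# H0: p = 0.36 vs H1: p > 0.36, with 145 successes in n = 350.
p0, n, successes = 0.36, 350, 145
p_hat = successes / n
se = math.sqrt(p0 * (1 - p0) / n)     # standard error under H0
z_stat = (p_hat - p0) / se
p_value = 1 - normal_cdf(z_stat)      # upper-tailed test

alpha = 0.05
print(round(z_stat, 2), p_value < alpha)  # 2.12 True
```

Because the p-value (about 0.017) is below α = 0.05, the null hypothesis is rejected at the 5% level; at α = 0.01 it would not be.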
Reading
For a comprehensive study on the topic of hypothesis testing of
single populations (means and proportions), read the prescribed
textbook T. Wegner, Chapter 8, pp. 199–225.
5.8.7 Summary
In this unit, we have explained the concept of hypothesis testing as a process by which claims
or assertions that are made about population parameter central location values can be tested
statistically. The five steps of hypothesis testing were identified and explained. The process
of identifying the correct nature of the hypothesis test – whether the test is two-tailed, one-sided
lower-tailed or one-sided upper-tailed – was emphasized.
Case study
Think point
The statistics presented here show that 80% of South African
commuters drive alone to work. Why is this so? What are some
reasons why South Africans drive to work alone? Is it a good
thing that South Africans drive to work alone? What are some
reasons why South Africans might not want to drive to work
alone? The mean commute time for a private vehicle in South
Africa is about 20 minutes. Can you think of some reasons why
South Africans might want to reduce this figure? What are some
ways that this figure might be reduced?
Think point
The defining attribute of Chevron Company brings to mind the
fact that there are many possible research hypotheses that may
be generated in the oil-refining business. For instance, suppose
it was determined that oil refineries have been running below
capacity for the last 2 months. What are at least four research hypotheses
that could explain this? According to the National Association of
Convenience Stores, 62% of all convenience stores in South
Africa are owned and operated by someone who has only one
store. Suppose you live in a city that seems to have fewer
entrepreneurs and more corporate influence, and you believe that
in your city significantly fewer than 62% of all convenience stores
are owned and operated by someone who has only one store.
You are determined to "prove" your theory by taking a
random sample of 328 convenience stores; it turns out that 187
of them are owned and operated by someone who has only one
store. Would this be enough evidence to conclude that in your
city a significantly lower percentage of these convenience stores are
owned and operated by someone who has only one store?
5.8.8 Revision
Self-check activity
1. What is meant by the term ‘hypothesis testing’?
2. What determines whether a claim about a population
parameter value is accepted as probably true or rejected as
probably false?
3. Name the five steps of hypothesis testing.
4. What information is required to determine the critical
limits for the region of acceptance of a null hypothesis?
5. If −1.96 ≤ z ≤ 1.96 defines the limits for the region of
acceptance of a two-tailed hypothesis test and z-stat = 2.44, what
statistical conclusion can be drawn from these findings?
Important
terms and
definitions
Goodness
of Fit Test
5.9.1 Introduction
In this unit, we discuss the Chi-Squared statistic (χ²), which is a statistical measure used to
test hypotheses on patterns of outcomes of a random variable in a population. The patterns
of outcomes are based on frequency counts of categorical random variables. It can be used
in the following three scenarios in inferential statistics:
Test for independence of association
Test for equality of proportions in 2 or more populations
Goodness of fit tests
Step 1
Define the Null and alternative hypotheses
i.e. There is no association between the two variables
There is association between the two variables
Step 2
Computing the sample statistic
Where = observed frequency = expected frequency
Step 3
Decision rule (critical value from the chi-squared table).
E.g. if number of rows = 2, number of columns = 3 and level of significance = 10%,
then degrees of freedom = (2−1)(3−1) = 1×2 = 2.
Step 4
Comparison
If χ²-stat > χ²-crit: Reject Null Hypothesis
If χ²-stat < χ²-crit: Accept Null Hypothesis
Step 5
Conclusion
The conclusion is based on results of step 4.
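The five steps above can be sketched in code. A minimal Python illustration for a 2×3 cross-tabulation; the observed counts are invented purely to demonstrate the calculation:

```python
# Chi-squared test for independence of association on a 2x3 table.
# Observed counts are hypothetical.
observed = [[30, 20, 10],
            [20, 30, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency per cell: (row total x column total) / grand total
chi_sq = 0.0
for i, row in enumerate(observed):
    for j, f_o in enumerate(row):
        f_e = row_totals[i] * col_totals[j] / grand
        chi_sq += (f_o - f_e) ** 2 / f_e

df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2-1)(3-1) = 2
chi_crit = 4.605   # chi-squared critical value, df = 2, alpha = 0.10

print(round(chi_sq, 3), chi_sq > chi_crit)  # 4.0 False
```

Here χ²-stat = 4.0 does not exceed χ²-crit = 4.605, so the null hypothesis of no association is not rejected at the 10% level.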
5.9.2 Hypothesis Test for Equality of Several Proportions
Step 4
Comparison
If χ²-stat > χ²-crit: Reject Null Hypothesis
If χ²-stat < χ²-crit: Accept Null Hypothesis
Step 5
Conclusion
Step 2
Compute the sample statistic (χ²):
χ² = Σ (fo − fe)² / fe
where fo = observed frequency, fe = expected frequency
Step 3
Decision Rule (Value from table)
Degrees of freedom = k − m − 1, where k = number of classes and m = number of population
parameters estimated from the sample data. E.g. if level of significance = 5%, k = 4 and
m = 1, then d.f. = 4 − 1 − 1 = 2.
Step 4
Comparison
If χ²-stat > χ²-crit: Reject Null Hypothesis
If χ²-stat < χ²-crit: Accept Null Hypothesis
Step 5
Conclusion
The conclusion is based on results of step 4.
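The goodness-of-fit steps can be sketched the same way. A minimal Python illustration testing invented counts against a hypothesized uniform pattern (no parameters estimated from the sample, so m = 0):

```python
# Chi-squared goodness-of-fit test: do observed counts match a
# claimed uniform pattern? Counts are hypothetical.
observed = [18, 25, 22, 15]                      # k = 4 classes
n = sum(observed)                                # 80
expected = [n / len(observed)] * len(observed)   # 20 each under H0

chi_sq = sum((f_o - f_e) ** 2 / f_e
             for f_o, f_e in zip(observed, expected))

# df = k - m - 1 = 4 - 0 - 1 = 3; critical value at alpha = 5% is 7.815.
chi_crit = 7.815
print(round(chi_sq, 2), chi_sq > chi_crit)  # 2.9 False
```

Since χ²-stat = 2.9 is below χ²-crit = 7.815, the hypothesized uniform pattern is not rejected at the 5% level.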
Reading
For in-depth coverage of the topic on Chi-squared and
hypothesis tests, refer to the prescribed textbook T. Wegner,
Chapter 10, 4th edition, pp. 272–273.
Case study
Students at Damelin Engineering Department studied which
vehicles come to a complete stop at an intersection with four-way
stop signs, selecting at random the cars to observe. They looked
at several factors to see which (if any) were associated with
coming to a complete stop. (They defined a complete stop as “the
speed of the vehicle will become zero at least for an [instant]”).
Some of these variables included the age of the driver, how many
passengers were in the vehicle, and type of vehicle. The variable
Think point
Could there be an association between the arrival position of the vehicle
and whether or not it comes to a complete stop? Is there a dependency
between the arrival position of the vehicle and whether or not it comes
to a complete stop?
5.9.4 Summary
In this unit, we have explained the chi-squared test statistic. It is used to test whether the
observed pattern of outcomes of a random variable follows a specific hypothesized pattern.
The chi-squared tests for equality of proportions, for goodness of fit and for independence
of association were all explained.
5.9.5 Revision
Self-check activity
1. What is the purpose of a chi-squared test for
independence of association?
2. What type of data (categorical or numerical) is
appropriate for a chi-squared test for independence of
association?
3. What does the null hypothesis say in a test for
independence of association?
4. What is the role of the expected frequencies in chi-
squared tests?
5. In a three-row by four-column cross-tabulation table of a
chi-squared test for independence of association, what is the
critical chi-squared value (χ²-crit) in a hypothesis test conducted
(a) At the 5% level of significance?
(b) At the 10% level of significance?
5.10 Study Unit 10: Simple Linear Regression and Correlation Analysis
Purpose The purpose of this unit is to explain the technique used to quantify
the relationship between variables, to identify the strength of
that relationship, and to point out the significant variables in the
prediction.
Learning By the end of this unit, you will be able to:
Outcomes
• Explain the meaning of regression analysis
• Identify practical examples where regression analysis can be used
• Construct a simple linear regression model
• Use the regression line for prediction purposes
• Calculate and interpret the correlation coefficient
• Calculate and interpret the coefficient of determination
• Conduct a hypothesis test on the regression model to test for
significance.
Time It will take you 9 hours to make your way through this unit.
5.10.1 Introduction
Business decisions are most often made by predicting the unknown values of numeric
variables using other numeric variables that may be related to them and whose values
are known. A statistical method that quantifies the relationship between a single response
variable and one or more predictor variables is called regression analysis. This relationship,
which is referred to as a statistical model, is used for prediction purposes. Correlation analysis,
on the other hand, determines the strength of the relationships and determines which variables
are useful in predicting the response variable.
b0 = ( Σy − b1 Σx ) / n
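The least-squares coefficients can be computed directly. A short Python sketch with hypothetical (x, y) pairs, using the standard least-squares slope b1 and the intercept b0 = (Σy − b1 Σx)/n:

```python
# Least-squares coefficients for a simple linear model y = b0 + b1*x.
# The (x, y) pairs are hypothetical.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
b0 = (sum_y - b1 * sum_x) / n                                  # intercept

print(round(b1, 2), round(b0, 2))  # 0.6 2.2
```

The fitted line ŷ = 2.2 + 0.6x can then be used for prediction by substituting any x-value, e.g. ŷ = 5.8 at x = 6.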
Example
1. Do the stocks of two airlines rise and fall in any related
manner? For a sample of pairs of data, correlation analysis can
yield a numerical value that represents the degree of
relatedness of the two stock prices over time.
2. In the transportation industry, is a correlation evident
between the price of transportation and the weight of the
object being shipped? If so, how strong is the correlation?
3. In economics, how strong is the correlation between the
producer price index and the unemployment rate?
4. In retail sales, are sales related to population density,
number of competitors, size of the store, amount of
advertising, or other variables?
Several measures of correlation are available, the selection of which depends mostly on the
level of data being analyzed. Ideally, analysts would like to solve for ρ, the population
coefficient of correlation. However, because analysts virtually always deal with sample data,
this unit introduces a widely used sample coefficient of correlation, r
The term r is a measure of the linear correlation of two variables. It ranges between −1 and
+1, representing the strength of the relationship between the variables. An r-value of +1
denotes a perfect positive relationship between two variables. An r-value of −1 denotes a
perfect negative correlation, indicating an inverse relationship between two variables: as one
variable increases, the other decreases. An r-value of 0 means no linear relationship is
present between the two variables. Pearson's correlation coefficient (r) measures the
strength of the linear association between two ratio-scaled random variables, X and Y.
Formula
r = Σ(x − x̄)(y − ȳ) / √( Σ(x − x̄)² × Σ(y − ȳ)² )
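Pearson's r can be computed straight from this deviation form. A small Python sketch with hypothetical paired data; squaring the result also gives the coefficient of determination discussed next:

```python
import math

# Pearson's correlation coefficient r from deviations about the means.
# The (x, y) pairs are hypothetical.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mean_x = sum(x) / len(x)
mean_y = sum(y) / len(y)

num = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
den = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) *
                sum((yi - mean_y) ** 2 for yi in y))
r = num / den

print(round(r, 3), round(r ** 2, 2))  # 0.775 0.6
```

Here r ≈ 0.775 indicates a fairly strong positive linear association, and r² = 0.6 says 60% of the variation in y is explained by x.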
5.10.4 The Coefficient of Determination (r2)
When the sample correlation coefficient, r, is squared (r²), the resulting measure is called
the coefficient of determination. The coefficient of determination measures the proportion (or
percentage) of variation in the dependent variable, y, that is explained by the independent
variable, x. The coefficient of determination ranges between 0 and 1 (or 0% and 100%):
0 ≤ r² ≤ 1.
Is the statistical relationship between the variables x and y, as given by the regression
model, a true relationship, or is it purely due to chance? Since the regression equation is
obtained from sample data, it is possible that there is no genuine relationship between
the x and y variables in the population and that any observed relationship from the sample
arose purely by chance. A hypothesis test can be carried out to find out whether the
sample-based regression relationship is genuine or not. To test the regression model for
significance, the sample correlation coefficient, r, is tested against its population correlation
coefficient, ρ, which is hypothesised to be 0. In the test, the null hypothesis states that there
is no relationship between x and y in the population. See prescribed textbook pages 340 to
342 for a comprehensive example testing for model significance.
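One standard way to test r against ρ = 0 uses the statistic t = r√(n−2)/√(1−r²), which follows a t distribution with n−2 degrees of freedom; this specific formula is a textbook result assumed here, not spelled out in the paragraph above. A Python sketch with hypothetical values:

```python
import math

# Test of model significance: H0: rho = 0 vs H1: rho != 0,
# using t = r*sqrt(n-2)/sqrt(1-r^2) with n-2 degrees of freedom.
# Hypothetical values: r = 0.775 from n = 12 data pairs.
r, n = 0.775, 12
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

t_crit = 2.228   # two-tailed t critical value, df = 10, alpha = 0.05
print(round(t_stat, 2), abs(t_stat) > t_crit)  # 3.88 True
```

Because t-stat exceeds t-crit, H0 is rejected: the sample evidence suggests a genuine relationship between x and y in the population.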
5.10.6 Summary
In this unit, we have described the simple linear regression as a procedure that builds a
straight-line relationship between a single independent variable, x, and a dependent variable,
y. The purpose of the regression equation is to estimate y-values from known, or assumed, x-values
by substituting the x-value into the regression equation. The method of least squares is
used to find the best-fit equation. The coefficients of the regression equations, b0, and b1, are
weights that measure the significance of each of the independent variables in estimating the
y-variable. Two descriptive statistical measures were identified to indicate the usefulness of
the regression equation to estimate y-values. They are the coefficient of correlation, r, and the
coefficient of determination, r2.
Reading
For comprehensive coverage of this topic, read prescribed
textbook pages 328–345 from Wegner T. (2016), 4th Edition.
5.10.7 Revision
Self-check activity
1. What is regression analysis? What is correlation
analysis?
2. What name is given to the variable that is being estimated
in a regression equation?
3. What is the purpose of an independent variable in
regression analysis?
4. What is the name of the graph that is used to display the
relationship between the dependent variable and the independent
variable?
5. What is the name given to the method used to find the
regression coefficients?
6. Explain the strength and direction of the association
between two variables, x, and y that have a correlation coefficient
of −0.78.
5.11 Study Unit 11: Time Series Analysis: A Forecasting Tool
Purpose The purpose of this unit is to explain how to treat time series data and
how to prepare forecasts of future levels of activities
Time It will take you 3 hours to make your way through this unit.
5.11.1 Introduction
Data collected on a given phenomenon over a period of time at systematic intervals is known
as time-series data. Time-series forecasting methods endeavor to account for changes over
time by studying patterns, trends, or cycles, or by making use of information about previous
time periods to predict the outcome for a future time period. Time-series methods include
naïve methods, averaging, smoothing, regression trend analysis, and the decomposition of
the possible time-series factors. Most data used in statistical analysis is known as cross-sectional
data, meaning that it is gathered from sample surveys at one point in time.
Conversely, data can also be collected over time. For instance, when a business records its
daily, weekly or monthly gross revenue, or when a household records its daily or monthly
electricity usage, they are gathering a time series of data.
The general conviction is that time-series data is comprised of four components: trend, cycles,
seasonal effects, and irregular fluctuations. Not all time-series data have all these features.
T = trend
C = cycles
S = seasonal effects
I = irregular fluctuations
In this section of the unit, we examine statistical approaches to quantify trend and seasonal
variations only. These two components account for the most significant proportion of an
actual value in a time series. By isolating them, most of an actual time series value will be
explained.
The moving average technique is an average that is updated or recomputed for every new
time period being considered. The most recent information is utilized in each new moving
average. This advantage is offset by the disadvantages that:
1. it is difficult to choose the optimal length of time for which to compute the moving
average, and
2. moving averages do not usually adjust for such time-series effects as trend, cycles, or
seasonality.
To determine the optimal length for which to compute the moving averages, we would
need to forecast with several different average lengths and compare the errors produced by
them.
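The moving-average technique described above can be sketched as follows; the series values are hypothetical, and a 3-period length is chosen only for illustration:

```python
# Simple (uncentred) 3-period moving average: each value is the mean
# of the current period and the previous two. Series is hypothetical.
series = [10, 12, 11, 15, 14, 16, 18]
k = 3   # moving-average length

moving_avg = [round(sum(series[i - k + 1:i + 1]) / k, 2)
              for i in range(k - 1, len(series))]
print(moving_avg)  # [11.0, 12.67, 13.33, 15.0, 16.0]
```

Rerunning the same calculation with different values of k, and comparing forecast errors, is how the more suitable average length would be chosen.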
5.11.4 Seasonal Analysis
Seasonal effects are patterns in the data that occur within periods of less than one year.
How can we separate out seasonal effects? The ratio-to-moving-average method is used to
measure and quantify these seasonal effects. This method expresses the seasonal influence
as an index number. It measures the percentage deviation of the actual values of the time
series, y, from a base value that excludes the short-term seasonal effects. These base
values of a time series represent the trend/cyclical impacts only.
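The ratio-to-moving-average idea can be sketched for quarterly data. Everything here is illustrative: the values are invented, and only the first two steps (the centred moving average as the base value, and the actual-to-base ratios that isolate the seasonal influence) are shown:

```python
# Ratio-to-moving-average sketch for two years of quarterly data
# (hypothetical values).
y = [20, 35, 28, 18, 22, 38, 30, 20]

# Step 1: centred 4-quarter moving average (average of two adjacent
# 4-term means); this base value excludes short-term seasonal effects.
cma = [((sum(y[i - 2:i + 2]) / 4) + (sum(y[i - 1:i + 3]) / 4)) / 2
       for i in range(2, len(y) - 2)]

# Step 2: ratios of actual values to their centred moving average;
# cma[i] is centred on original period i + 2.
ratios = [y[i + 2] / cma[i] for i in range(len(cma))]

print(cma)                              # [25.5, 26.125, 26.75, 27.25]
print([round(s, 3) for s in ratios])    # [1.098, 0.689, 0.822, 1.394]
```

Averaging these ratios per quarter (and scaling so they average to 1) would give the seasonal indexes; a ratio above 1 marks a quarter running above its trend/cyclical base.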
Months Shipments
January 1056
February 1314
March 1381
April 1191
May 1259
June 1361
July 1110
August 1334
September 1416
October 1282
November 1341
December 1382
Source: Black (2011)

Reading
For comprehensive coverage of this topic, read prescribed textbook Chapter 15, pages 410–429, from Wegner T. (2016), 4th Edition.

5.11.6 Summary
This unit discussed the approach to investigating time series data as opposed to cross-sectional data. The exploration and harnessing of time series data are used for short- to medium-term forecasting. This unit identified and defined the nature of each of four possible effects on the values of a time series, y. These forces were identified as trend, cyclical, seasonal and irregular influences. Time series analysis is used to decompose a time series into these four constituent components, using a multiplicative model. Trend analysis can be performed using either the method of moving averages or by fitting a straight line using the method of least squares from regression analysis. The seasonal component is described by finding seasonal indexes using the ratio-to-moving-average approach. The interpretation of these seasonal indexes was also given in the unit.
5.11.7 Revision
Think point
From the data given, could it be possible to forecast the
emissions of nitrogen oxides or carbon monoxide for the years
2013, 2017, or even 2025? What techniques would be best
suited to forecast the emissions of nitrogen oxides or carbon
monoxide for future years from the data given?
111
Quantitative Techniques Damelin ©
6. REFERENCES
Bergquist, T., Jones, S. and Freed, N. (2013) Understanding Business Statistics. John Wiley & Sons.
Black, K. (2013) Business Statistics: For Contemporary Decision Making, 7th Edition.
Wegner, T. (2016) Applied Business Statistics: Methods and Excel-based Applications, 4th ed. Juta: Cape Town, South Africa.