Business Statistics Bsa Day Program All Chapters

Chapter One: An Introduction to Business
Statistics
• Definition of business statistics
• Basic Vocabulary Terms
• Limitation of Statistics
• Uses of Statistics
• Type of data
• Populations and Samples
* Prepared by Silas BAHIZI, PhD Candidate 2

Definition of Statistics
Statistics is the science which deals with
the collection, organization,
presentation, analysis and interpretation
of numerical data, as well as drawing
valid conclusions and making reasonable
decisions on the basis of such analysis.

Uses of Statistics
• To present the data in a concise and definite
form:
Statistics helps in classifying and tabulating raw
data for processing and further tabulation for
end users.
• To make it easy to understand complex and
large data:
This is done by presenting the data in the form
of tables, graphs, diagrams etc., or by
condensing the data with the help of means,
dispersion etc.
• For comparison:
• In forming policies:
It helps in forming policies like a production
schedule, based on the relevant sales figures. In
this case, it can be used in forecasting future
demands.
• Enlarging individual experiences:
Complex problems can be well understood by
statistics, as the conclusions drawn by an
individual are more definite and precise than
simple statements on facts.
• In measuring the magnitude of a phenomenon:
Statistics has made it possible to count the
Limitations of Statistics
• Statistics does not deal with individual
measurements.
Since statistics deals with aggregates of facts, it cannot
be used to study the changes that have taken place
in individual cases.
For example, the wage earned by a single industry
worker at any time, taken by itself, is not a statistical
datum. But the wages of all workers of that industry can
be used statistically. Similarly, the marks obtained by
Kalisa of your class or the height of Frank (also of
your class) are not the subject matter of statistical
study. But the average marks or the average height of
your class has statistical relevance.
• Statistics cannot be used to study qualitative
phenomena like morality, intelligence, beauty etc.
as these cannot be quantified. However, it may be
possible to analyze such problems statistically by
expressing them numerically. For example we may
study the intelligence of boys on the basis of the
marks obtained by them in an examination.
• Statistical results are true only on an average.
The conclusions obtained statistically are not universal
truths. They are true only under certain conditions.
This is because statistics as a science is less exact
than the natural sciences.

• Statistical data, being approximations,
are not mathematically exact. Therefore,
they can be used only if mathematical
accuracy is not needed.
• Statistics, being dependent on figures, can
be manipulated and therefore can be
used only when the authenticity of the
figures has been proved beyond doubt.
Types of Statistics
Statistics may be divided into two categories, i.e.
Descriptive and Inferential statistics.
When analyzing data, for example, the marks
achieved by 100 students for a piece of
coursework, it is possible to use both
descriptive and inferential statistics in your
analysis of their marks. Typically, in most
research conducted on groups of people, you
will use both descriptive and inferential
statistics to analyze your results and draw
conclusions.
Descriptive Statistics
Descriptive statistics is the term given to the
analysis of data that helps describe, show or
summarize data in a meaningful way such
that, for example, patterns might emerge from
the data. Descriptive statistics do not,
however, allow us to make conclusions
beyond the data we have analyzed or reach
conclusions regarding any hypotheses we
might have made.

Inferential Statistics
While descriptive statistics examine our
immediate group of data (for example, the
100 students' marks), inferential statistics aim
to make inferences from this data in order to
make conclusions that go beyond this data. In
other words, inferential statistics are used to
make inferences about a population from a
sample in order to generalize (make
assumptions about this wider population)
and / or make predictions about the future.

For example, a Board of Examiners may want to
compare the performance of 1000 students
that completed an examination. Of these, 500
students are girls and 500 students are boys.
The 1000 students represent our
"population". Whilst we are interested in the
performance of all 1000 students, girls and
boys, it may be impractical to examine the
marks of all of these students because of the
time and cost required to collect all of their
marks.

Instead, we can choose to examine a "sample"
of these students and then use the results to
make generalizations about the performance
of all 1000 students. For the purpose of our
example, we may choose a sample size of 200
students. Since we are looking to compare
boys and girls, we may randomly select 100
girls and 100 boys in our sample. We could
then use this, for example, to see if there are
any statistically significant differences in the
mean mark between boys and girls, even
though we have not measured all 1000
students.
Quantitative and Qualitative Data
• A qualitative variable is a variable with qualitative data.
• A quantitative variable is a variable with quantitative data.

Types of Quantitative Data
• Discrete or ungrouped data

• Continuous or grouped data

• The elements are the entities on
which data are collected.
• Data: Any recorded event (e.g. times
to assemble a product)
• The set of measurements collected for
a particular element is called an
observation.
• A variable is a characteristic of interest
for the elements.

Short Exercise
In the following example, determine which
variables are qualitative and which are
quantitative:
1. Number of Districts of Rwanda
2. A man
3. College of Education
4. A car
5. The number of OAM Level three students

Census, Populations and Samples
• The population is the set of all elements of interest in a
particular study.
• A sample is a subset of the population.

Census
A census is the procedure of systematically
acquiring and recording information about the
members of a given population. It is a
regularly occurring and official count of a
particular population. The term is used mostly
in connection with national population and
housing censuses; other common censuses
include agriculture, business, etc.

Populations and Samples
[Figure: a population and a sample drawn from it]

Descriptive Statistics and Statistical
Inference
Descriptive statistics are the tabular, graphical, and
numerical methods used to summarize data.

Example
Descriptive Statistics
Graphical Summary (Histogram)
[Figure: histogram of Frequency (axis from 2 to 18) against Parts Cost ($) (axis from 50 to 110)]
Normal Distribution and Standard
Deviation

Frequency distribution for quantitative discrete
(ungrouped) data
Example: The data below are the marks
obtained by 24 students out of 100:
45, 46, 50, 54, 75, 50, 75, 54, 46, 75, 50, 54, 50, 54, 54,
50, 54, 54, 54, 54, 54, 20, 30, 40
Score (X)       20  30  40  45  46  50  54  75
Frequency (f)    1   1   1   1   2   5  10   3
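The tally above can be checked with a short script. This is a minimal sketch in Python, assuming the 24 marks are typed in exactly as listed; the variable names are illustrative only.

```python
from collections import Counter

# The 24 marks from the example, in the order given.
marks = [45, 46, 50, 54, 75, 50, 75, 54, 46, 75, 50, 54,
         50, 54, 54, 50, 54, 54, 54, 54, 54, 20, 30, 40]

# Count how often each distinct score occurs, then order by score.
frequency = dict(sorted(Counter(marks).items()))

for score, f in frequency.items():
    print(f"Score {score:>3}: frequency {f}")

print("Total observations:", sum(frequency.values()))
```

The frequencies printed should add up to the number of students, which is a quick way to detect a miscount.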

Frequency distribution for quantitative continuous
(grouped) data
Example: The data below show the expenditure on
examiners (in thousands of Rwandan Francs) during
the 2003 Ordinary Level marking exercise:
10, 11, 10, 12, 14, 16, 20, 25, 21, 22, 13, 17, 18, 24, 30, 32, 27, 35,
40, 44, 39, 50, 54, 53, 44, 37, 36, 39, 52, 51, 57, 15, 16, 19, 34,
34, 43, 26, 38, 53, 40.
The frequency distribution uses the classes
10-14; 15-19; 20-24; ...; 55-59.

The classes and their corresponding frequencies are
shown in the table below:
Class          10-14  15-19  20-24  25-29  30-34  35-39  40-44  45-49  50-54  55-59
Frequency (f)      6      6      4      3      4      6      5      0      6      1
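The grouping into classes can be reproduced as follows. This is an illustrative Python sketch, assuming ten classes of width 5 starting at the lower class limit 10, as in the table; the function name `class_frequencies` is hypothetical.

```python
# Expenditure values (thousands of Rwandan Francs) from the example.
expenditure = [10, 11, 10, 12, 14, 16, 20, 25, 21, 22, 13, 17, 18, 24,
               30, 32, 27, 35, 40, 44, 39, 50, 54, 53, 44, 37, 36, 39,
               52, 51, 57, 15, 16, 19, 34, 34, 43, 26, 38, 53, 40]

WIDTH = 5          # each class holds five values, e.g. 10, 11, 12, 13, 14
LOWEST_LIMIT = 10  # lower class limit of the first class
N_CLASSES = 10     # classes 10-14 up to 55-59


def class_frequencies(values, lowest_limit, width, n_classes):
    """Count how many values fall into each class."""
    counts = [0] * n_classes
    for v in values:
        index = (v - lowest_limit) // width  # which class the value belongs to
        counts[index] += 1
    return counts


counts = class_frequencies(expenditure, LOWEST_LIMIT, WIDTH, N_CLASSES)
for k, f in enumerate(counts):
    lower = LOWEST_LIMIT + k * WIDTH
    print(f"{lower}-{lower + WIDTH - 1}: {f}")
```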

Note:
❖ The class 55-59, for instance, means 54.5 ≤ value ≤ 59.5,
where 54.5 is the lower class boundary (l.c.b)
and 59.5 is the upper class boundary (u.c.b).
❖ For the class 55-59,
55 is the lower class limit (l.c.l) while 59 is the upper
class limit (u.c.l). Hence, for every class: l.c.b = l.c.l – 0.5
and u.c.b = u.c.l + 0.5.
❖ The class size/length/width (c) is given by:
c = u.c.b – l.c.b or c = (u.c.l – l.c.l) + 1
Therefore, for the above example, the class width is
c = 59.5 – 54.5 = 5.
How to calculate the measures of central
tendency for quantitative data?
As discussed above, quantitative data
are classified in two categories:
❖ Quantitative discrete data or
ungrouped data
❖ Quantitative continuous data or
grouped data.

Mean (Arithmetic mean)
The mean is the most important measure of location.
It is at times referred to as the average value of the
variable.
It is obtained by adding all the data values and dividing
the sum by the number of items.
Example: find out the mean of the data given below:
10,2,5,7,12,9,6,3,5,4,5,0,2,1,0,3,2,4,7,5,3,6,6,10
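The calculation for this example can be sketched in Python as follows. The variable names are illustrative; the method is simply the sum of the values divided by their count, as described above.

```python
# The 24 data values from the example.
values = [10, 2, 5, 7, 12, 9, 6, 3, 5, 4, 5, 0,
          2, 1, 0, 3, 2, 4, 7, 5, 3, 6, 6, 10]

total = sum(values)   # sum of all data values
n = len(values)       # number of items
mean = total / n      # arithmetic mean

print(f"Sum = {total}, n = {n}, mean = {mean}")
```

Here the sum is 117 and there are 24 items, so the mean is 117 ÷ 24 = 4.875.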

The Median
The median is the middle value in your list.
When the number of values in the list is odd, the
median is the middle entry in the list after
sorting the list into increasing order. When the
number of values is even, the median is equal
to the sum of the two middle numbers (after sorting the
list into increasing order) divided by
two. Thus, remember to line up your values;
the middle number is the median. Also be
sure to remember the odd and even rule.
Examples:
1. Find the median of:
9, 3, 44, 17, 15
(Odd amount of numbers)
Line up your numbers: 3, 9, 15, 17, 44 (smallest to
largest)
The median is: 15 (the number in the middle)
2. Find the median of:
8, 3, 44, 17, 12, 6
(Even amount of numbers)
Line up your numbers: 3, 6, 8, 12, 17, 44
Add the 2 middle numbers and divide by 2: 8 + 12 = 20; 20 ÷ 2 = 10
The Mode
The mode of a list of numbers is the
number that occurs most frequently. A trick to
remember this one is to remember that mode
starts with the same first two letters that most
does. Most frequently - Mode.
Examples:
Find the mode of:
9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8
Put the numbers in order for ease:
3, 3, 8, 9, 15, 15, 15, 17, 17, 27, 40, 44, 44
The mode is 15 (15 occurs the most, at 3 times)
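The median and mode examples above can be verified with Python's standard `statistics` module. This is a minimal sketch; the list names are illustrative, and `multimode` is used so that ties, if any, would all be reported.

```python
import statistics

odd_list = [9, 3, 44, 17, 15]          # odd amount of numbers
even_list = [8, 3, 44, 17, 12, 6]      # even amount of numbers
mode_list = [9, 3, 3, 44, 17, 17, 44, 15, 15, 15, 27, 40, 8]

# statistics.median sorts the data internally and applies the odd/even rule.
median_odd = statistics.median(odd_list)
median_even = statistics.median(even_list)

# multimode returns every value tied for the highest frequency.
modes = statistics.multimode(mode_list)

print("Median (odd list):", median_odd)
print("Median (even list):", median_even)
print("Mode(s):", modes)
```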
Mean Absolute Deviation (MAD)
In statistics, the mean absolute deviation (MAD) is a
measure of the variability of a univariate
sample of quantitative data. It can also refer to the
population parameter that is estimated by the MAD
calculated from a sample.
For a univariate data set X1, X2, ..., Xn, where each value
X occurs with frequency f, the MAD is
defined as the mean of the absolute deviations
from the data's mean:
$$\mathrm{MAD} = \frac{\sum f\,|X - \bar{X}|}{\sum f}$$

Numerical Descriptive Statistics
The most common numerical descriptive statistic is the average (or mean). For example,
find the mean of the following scores:
a. 1, 2, 3, 4, 5, 6, 7, 8, 9
b. 2, 4, 6, 8, 7, 9

Organisation of Data
Before data are interpreted, they must be organised so
that they become manageable. There are many ways
of organising data, but the 3 basic ones are: tabular, graphic and
pictorial.
Example: Suppose a statistics teacher asks each of his
students to toss a coin 4 times and record the number of
heads. The experiment may give these results:
2;2;0;1;1;1;2;3;1;2;2;3;3;4;1;3;2;3;3;1;2;2;1;2;2
Even with only 25 observations, it becomes clear that the
data need a more comprehensive display.

One quick way would be to arrange the data in
ascending order:
0;1;1;1;1;1;1;1;2;2;2;2;2;2;2;2;2;2;3;3;3;3;3;3;4
The array organises the data but does not reduce the
quantity of data. Therefore an array is most useful as a
quick way of organising a few observations but becomes
less useful when dealing with a great number of
observations.
The above data, however, can be condensed further, since
the experiment produces only 5 different numerical
values: 0;1;2;3;4
Frequency Distribution
Frequency distribution showing the number of heads
obtained in four tosses of a coin in an
experiment done 25 times:
Observation (X)   0   1   2   3   4
Frequencies (f)   1   7  10   6   1

The above table is named a frequency
distribution because it shows how frequencies
are distributed among the outcomes (results)
of a given experiment.
Cumulative frequency: the running total of the
frequencies up to and including a given observation.
From the previous example, the cumulative
frequency can be presented as follows:

Cumulative Frequency
Observation (X)        0   1   2   3   4
Frequencies (f)        1   7  10   6   1
Cumulative frequency   1   8  18  24  25

Relative frequency
If we take the same example, N is equal to 25; that
is to say, 25 is the number of all observations.
The relative frequency of an observation is its
frequency divided by N, so the relative frequencies
of all observations add up to 1 (25/25 = 1).

Relative Frequency
Observation (X)      0             1             2             3             4
Frequency (f)        1             7             10            6             1
Relative frequency   1/25 = 0.04   7/25 = 0.28   10/25 = 0.4   6/25 = 0.24   1/25 = 0.04

Cumulative Relative Frequency
Observation (X)                 0     1     2     3     4
Relative frequency              0.04  0.28  0.4   0.24  0.04
Cumulative relative frequency   0.04  0.32  0.72  0.96  1
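The four tables for the coin-tossing experiment can be generated together. The following Python sketch assumes the 25 recorded results are entered as listed; all variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Number of heads recorded in each of the 25 repetitions.
heads = [2, 2, 0, 1, 1, 1, 2, 3, 1, 2, 2, 3, 3,
         4, 1, 3, 2, 3, 3, 1, 2, 2, 1, 2, 2]

n = len(heads)
counts = Counter(heads)
outcomes = sorted(counts)                   # distinct observations: 0..4

freq = [counts[x] for x in outcomes]        # frequency of each outcome
cum_freq = list(accumulate(freq))           # running total of frequencies
rel_freq = [f / n for f in freq]            # frequency divided by N
cum_rel_freq = [c / n for c in cum_freq]    # running total of relative frequencies

print("X   f   cum f   rel f   cum rel f")
for x, f, cf, rf, crf in zip(outcomes, freq, cum_freq, rel_freq, cum_rel_freq):
    print(f"{x}  {f:>2}  {cf:>5}  {rf:>6.2f}  {crf:>9.2f}")
```

The last cumulative frequency equals N and the last cumulative relative frequency equals 1, which serves as a check on the table.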

Variance of scores
• The variance of scores in a data set is
conceptualized as the arithmetic mean or
average of the squared deviations of the scores in
the data set. That is, suppose the
numerical variable of interest is X and that a
sample of n scores on this variable has been
taken.
• These n scores can algebraically be denoted as
X1, X2, X3, ..., Xn.
For these n scores, the arithmetic mean or
average, $\bar{X}$, can be computed as the sum
of the scores divided by the number of scores.
The difference between any given score, Xi, and
the arithmetic mean, that is Xi – $\bar{X}$, is termed
the deviation of that observation.
Therefore, following the conceptualization above, the
variance of the n scores, denoted by $S^2$, is given by the
formula:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n}$$
Standard deviation (s) of scores
The standard deviation is conceptualized as the positive square
root of the variance of scores in the data
set of interest. The standard deviation is
thus used when the data are numerical
variables, and it is one of the most frequently
used measures of dispersion.

Exercises
Given the following data:
50, 84, 65, 32, 69, 87, 60, 43, 45, 50, 75, 80, 87, 43, 50, 55, 65,
75, 39, 20, 80, 20, 20, 43, 50, 84, 60, 50, 65, 75, 80, 50,
32, 69, 87, 40, 90, 65, 87, 50.
1. Construct:
⮚ Cumulative frequency table
⮚ Relative frequency
⮚ Cumulative relative frequency
2. Find out:
⮚ The number of observations
⮚ The arithmetic mean
⮚ The variance
⮚ The standard deviation

MEASURES OF DISPERSION OR VARIATION
The degree to which numerical data tend to
spread about an average value is called the
dispersion or variation of the data.
Various measures of this dispersion (or
variation) are available, the most common being
the range, mean deviation, semi-interquartile
range, 10-90 percentile range and standard
deviation.

The range
The range is the difference between the largest and
smallest numbers in a set of numbers.
Example 1:
Given the set of numbers
2, 3, 3, 5, 5, 8, 10, 12, the range of this set is the
highest number – lowest number:
12 - 2 = 10

The Mean Deviation
The mean deviation is the average of the absolute differences
(differences expressed without plus or
minus sign) between each value in a set
of values and the average of all values of
that set. For example, the average
(arithmetic mean or mean) of the set of
values 1, 2, 3, 4, and 5 is (15 ÷ 5) or 3.
The differences between this average (3) and the
values in the set are 2, 1, 0, -1, and -2; the
absolute differences being 2, 1, 0, 1, and 2. The
average of these numbers (6 ÷ 5) is 1.2, which
is the mean deviation. Also called the mean
absolute deviation, it is used as a measure of
dispersion where the number of values or
quantities is small; otherwise the standard
deviation is used.
Mean deviation
The mean deviation or average deviation of a
set of N numbers X1, X2, ..., XN is abbreviated
MD and is defined by:
$$\mathrm{MD} = \frac{\sum_{j=1}^{N} |X_j - \bar{X}|}{N}$$
where $\bar{X}$ is the arithmetic mean of the
numbers and $|X_j - \bar{X}|$ is the absolute value of the
deviation of $X_j$ from the mean.
The absolute value of a number is the number
without the associated sign and is indicated by
two vertical bars placed around the number.
Mean Deviation
Example 2: Find the mean deviation of the set
2, 3, 6, 8, 11
Arithmetic mean = (2 + 3 + 6 + 8 + 11)/5 = 30/5 = 6
MD = (|2-6| + |3-6| + |6-6| + |8-6| + |11-6|)/5
= (|-4| + |-3| + |0| + |2| + |5|)/5
= (4 + 3 + 0 + 2 + 5)/5 = 14/5 = 2.8
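The steps of Example 2 can be written as a small Python function. This is a sketch only; the function name `mean_deviation` is hypothetical and it measures deviations from the arithmetic mean, as in the example.

```python
def mean_deviation(values):
    """Average of the absolute deviations of the values from their mean."""
    mean = sum(values) / len(values)
    absolute_deviations = [abs(x - mean) for x in values]
    return sum(absolute_deviations) / len(values)


md_example_2 = mean_deviation([2, 3, 6, 8, 11])
md_one_to_five = mean_deviation([1, 2, 3, 4, 5])

print("MD of 2, 3, 6, 8, 11:", md_example_2)
print("MD of 1, 2, 3, 4, 5:", md_one_to_five)
```

The second call reuses the set 1, 2, 3, 4, 5 from the definition of the mean deviation, for which the text gives 1.2.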
If X1, X2, X3, ..., Xn occur with frequencies f1, f2, f3, ..., fn
respectively, the mean deviation is found by weighting each
absolute deviation by its frequency:
$$\mathrm{MD} = \frac{\sum f\,|X - \bar{X}|}{\sum f}$$
This form is useful for grouped data, where
the Xs represent class marks and the fs are
the corresponding class frequencies.
Occasionally the mean deviation is defined
in terms of absolute deviations from the
median or other average instead of from
the mean.

The Standard Deviation
The standard deviation of a set of N numbers
X1, X2, ..., XN is denoted by s and is defined by:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
s is the root mean square of the deviations
from the mean or, as it is sometimes called, the
root mean square deviation.
Standard Deviation
Example 3: Consider a population consisting of the
following eight values:
2, 4, 4, 4, 5, 5, 7, 9
These eight data points have a mean of 5 and a
standard deviation of 2.
You are required to show all calculations used to
obtain 5 as the mean and 2 as the standard deviation.
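One way to lay out the calculation for Example 3 is sketched below in Python. It follows the population formula (division by N); the variable names are illustrative.

```python
import math

population = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(population)
mean = sum(population) / n                               # 40 / 8

# Squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in population]

variance = sum(squared_deviations) / n                   # divide by N (population)
std_dev = math.sqrt(variance)                            # positive square root

print("Mean:", mean)
print("Squared deviations:", squared_deviations)
print("Variance:", variance)
print("Standard deviation:", std_dev)
```

The squared deviations are 9, 1, 1, 1, 0, 0, 4 and 16, which sum to 32; dividing by 8 gives a variance of 4 and hence a standard deviation of 2.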

Formulas of Standard Deviation
Standard deviation of scores:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$$
The population standard deviation:
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$
The sample standard deviation:
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$$
The Variance
The variance of a set of data is defined as the square of the
standard deviation and is thus given by S².
When it is necessary to distinguish the standard deviation of a
population from the standard deviation of a sample drawn
from the population, the following symbols are used:
             Standard Deviation   Variance
Population   σ                    σ²
Sample       S                    S²

Variance formula
Variance of scores:
$$S^2 = \frac{\sum (X - \bar{X})^2}{N}$$
The population variance:
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
The sample variance:
$$S^2 = \frac{\sum (X - \bar{X})^2}{N - 1}$$
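The difference between dividing by N and by N - 1 can be seen numerically. The sketch below uses Python's standard `statistics` module on the eight values 2, 4, 4, 4, 5, 5, 7, 9; treating the same values first as a population and then as a sample is an assumption made purely for illustration.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population formulas: divide the sum of squared deviations by N.
population_variance = statistics.pvariance(data)
population_std_dev = statistics.pstdev(data)

# Sample formulas: divide the sum of squared deviations by N - 1.
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

print("Population variance:", population_variance)
print("Population standard deviation:", population_std_dev)
print("Sample variance:", sample_variance)
print("Sample standard deviation:", sample_std_dev)
```

Because the sample formulas divide by 7 instead of 8, they give slightly larger values (32/7 for the variance instead of 32/8).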
Variance of Scores
The variance of scores in a data set is
conceptualized as the arithmetic mean or
average of the squared deviations of the scores in the
data set. That is, suppose the
numerical variable of interest is x, and that a
sample of n scores on this variable has been
taken. These n scores can algebraically be
denoted as x1, x2, ..., xn.

For these n scores, the arithmetic mean or average, $\bar{x}$, can be
computed as the sum of the scores divided by the number of
scores. The difference between any given score, xi, and the
arithmetic mean, that is, xi – $\bar{x}$, is termed the deviation of
that observation.
Therefore, following the conceptualization above, the variance of the n
scores, denoted by S², is given by:
S² = Sum of squared deviations / Number of scores
Thus, on substitution, we have
$$S^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + \dots + (x_n - \bar{x})^2}{n}$$

Percentile
A percentile is a measure that locates
values in the data set that are not
necessarily central locations. A
percentile provides information
about how the data items are spread
over the interval from the smallest
value to the largest value.
1. Percentile for Ungrouped data
For data that do not have numerous repeated
values, the ith percentile divides the data into
two parts:
❖ Approximately i percent of the items have
values less than the ith percentile.
❖ Approximately (100 - i) percent of the items
have values greater than the ith percentile.
Step 1: Arrange the data in ascending order from
smallest value to largest value.
Step 2: Compute the position of the percentile
in the data set:
Position = (i/100)n, where i is the percentile of
interest and n is the number of items.
Step 3:
❖ If the position is not an integer, round up. The
next integer value greater than this position
denotes the position of the ith percentile.
❖ If the position is an integer, the ith percentile is
the average of the data values in that position and
the next position.
Working Example
The table below shows the monthly starting
salaries for a sample of 12 business school
graduates:
Graduate   Monthly salary
1          2350
2          2450
3          2550
4          2260
5          2255
6          2210
7          2390
8          2630
9          2440
10         2825
11         2420
12         2280
Working Example
Find the
(i) Median
(ii) 50th percentile and 85th percentile
(iii) First, second and third quartiles
Solution
Step 1: Arranging the data in ascending order:
2210, 2255, 2260, 2280, 2350, 2390, 2420, 2440, 2450,
2550, 2630, 2825.
Note: n = 12. When n is an even number, the median
is the average of the two middle values.
Step 2:
Position of the ith percentile = (i/100)n
Hence, position of the 50th percentile = (50/100)12 = 6.
Position of the 85th percentile = (85/100)12 = 10.2
Step 3:
Since the position of the 50th percentile is an
integer (6), the 50th percentile is the average of the 6th and
7th data values,
i.e. 50th percentile, P50 = (2390 + 2420)/2 = 2405 = Median.
For the 85th percentile, since the position (10.2) is
not an integer, we round up. Hence the
position of the 85th percentile is the next
integer greater than 10.2, the 11th position.
Therefore, the 85th percentile corresponds to
the 11th data value.
Thus, the 85th percentile, P85 = 2630.
(iii) The position of the ith quartile (Qi) is (25i/100)n,
which gives positions 3, 6 and 9 for Q1, Q2 and Q3.
Note: All positions are integers. Therefore:
Q1 = (2260 + 2280)/2 = 2270
Q2 = (2390 + 2420)/2 = 2405
Q3 = (2450 + 2550)/2 = 2500
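The three-step rule can be expressed as a short Python function. This is a sketch of the rule as stated in this section (round up when the position is not an integer, average with the next value when it is); the function name `percentile` is hypothetical, and the data are the sorted salaries from Step 1 of the solution.

```python
def percentile(sorted_values, i):
    """ith percentile (0 < i < 100, i an integer) of data already sorted ascending."""
    if not 0 < i < 100:
        raise ValueError("i must be strictly between 0 and 100")
    n = len(sorted_values)
    # Position = (i/100) * n, kept as whole part and remainder to avoid rounding issues.
    whole, remainder = divmod(i * n, 100)
    if remainder == 0:
        # Integer position: average that data value and the next one.
        return (sorted_values[whole - 1] + sorted_values[whole]) / 2
    # Non-integer position: round up to the next position (1-based).
    return sorted_values[whole]


salaries = [2210, 2255, 2260, 2280, 2350, 2390,
            2420, 2440, 2450, 2550, 2630, 2825]

p50 = percentile(salaries, 50)
p85 = percentile(salaries, 85)
q1 = percentile(salaries, 25)
q3 = percentile(salaries, 75)

print("P50 (median):", p50)
print("P85:", p85)
print("Q1:", q1, "Q3:", q3)
```

Other textbooks and software packages use different interpolation rules, so their percentiles may differ slightly from the values obtained with this rule.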

2. Percentile for Grouped Data
The steps are the same as those for finding the
median.
For grouped data, the ith percentile is
computed from the following expression:
$$P_i = L_o + C \cdot \frac{\frac{i(\sum f + 1)}{100} - \mathrm{C.F}_b}{f_{P_i}}$$
Where:
Lo = lower class boundary of the percentile class
C = class width
∑f = total frequency = n
C.Fb = cumulative frequency preceding
(before) that of the percentile class
fPi = frequency of the percentile class
Note: i(∑f + 1)/100 = position of the ith percentile

Quartiles
It is often desired to divide data into four parts,
with each part containing approximately one-fourth
(1/4) of the observations.
For grouped data, the ith quartile is computed
from the following expression:
$$Q_i = L_o + C \cdot \frac{\frac{i(\sum f + 1)}{4} - \mathrm{C.F}_b}{f_{Q_i}}$$
For ungrouped data, the position of the ith quartile is
Position = (25i/100)n, so for the first quartile
Position = (25/100)n = (1/4)n
The semi-interquartile range
The semi-interquartile range or quartile deviation
of a set of data is denoted by Q and is defined by:
$$Q = \frac{Q_3 - Q_1}{2}$$
where Q1 and Q3 are the first and the third
quartiles for the data. The interquartile range
Q3 – Q1 is sometimes used, but the semi-interquartile
range is more common as a
measure of dispersion.
The 10-90 Percentile range
The 10-90 percentile range of a set of data is defined by:
10-90 percentile range = P90 – P10
where P10 and P90 are the 10th and 90th percentiles for
the data. The semi 10-90 percentile range, ½(P90 – P10),
can also be used but is not commonly
employed.

Example 4
The following table shows the weights of 100
students measured to the nearest kg:
Weight (kg)   Number of Students
10-14         5
15-19         9
20-24         12
25-29         18
30-34         25
35-39         15
40-44         10
45-49         6
Example 4
• Find the 30th percentile, first and third quartiles,
mode, median and 10th percentile.
The computations are arranged in the following table:
Weight (kg)   Class boundaries   Midpoint (x)   Frequency   Cumulative frequency
10-14         9.5-14.5           12             5           5
15-19         14.5-19.5          17             9           14
20-24         19.5-24.5          22             12          26
25-29         24.5-29.5          27             18          44
30-34         29.5-34.5          32             25          69
35-39         34.5-39.5          37             15          84
40-44         39.5-44.5          42             10          94
45-49         44.5-49.5          47             6           100

Weight (kg)   Class boundaries   Midpoint (x)   Frequency   Cumulative frequency   RF     CRF
10-14         9.5-14.5           12             5           5                      0.05   0.05
15-19         14.5-19.5          17             9           14                     0.09   0.14
20-24         19.5-24.5          22             12          26                     0.12   0.26
25-29         24.5-29.5          27             18          44                     0.18   0.44
30-34         29.5-34.5          32             25          69                     0.25   0.69
35-39         34.5-39.5          37             15          84                     0.15   0.84
40-44         39.5-44.5          42             10          94                     0.1    0.94
45-49         44.5-49.5          47             6           100                    0.06   1

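Example 4
The position of the 65th percentile is
i(∑f + 1)/100 = 65(100 + 1)/100 = 65.65
so the percentile class is 30-34, with lower class
boundary Lo = 29.5, class width C = 5, C.Fb = 44 and fP65 = 25.
Then P65 will be:
$$P_{65} = L_o + C \cdot \frac{\frac{65(\sum f + 1)}{100} - \mathrm{C.F}_b}{f_{P_{65}}} = 29.5 + \frac{5(65.65 - 44)}{25} = 33.83$$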
Example 4
The semi-interquartile range:
Position of the first quartile, Q1 = i(∑f + 1)/4 = (100 + 1)/4 = 25.25
Hence the first quartile class is 20-24.
Now the first quartile is
$$Q_1 = L_o + C \cdot \frac{\frac{\sum f + 1}{4} - \mathrm{C.F}_b}{f_{Q_1}}$$
Where:
Lo = lower class boundary of the quartile class = 19.5
C = class width = 5
∑f = total frequency = n = 100
C.Fb = cumulative frequency preceding (before) that of the quartile class = 14
fQ1 = frequency of the quartile class = 12
Hence Q1 = 19.5 + 5(25.25 - 14)/12 = 24.19
Example 4
Position of the third quartile, Q3 = i(∑f + 1)/4 = 3(100 + 1)/4 = 75.75
Hence the third quartile class is 35-39.
Now the third quartile is
$$Q_3 = L_o + C \cdot \frac{\frac{3(\sum f + 1)}{4} - \mathrm{C.F}_b}{f_{Q_3}}$$
Where:
Lo = lower class boundary of the quartile class = 34.5
C = class width = 5
∑f = total frequency = n = 100
C.Fb = cumulative frequency preceding (before) that of the quartile class = 69
fQ3 = frequency of the quartile class = 15
Hence Q3 = 34.5 + 5(75.75 - 69)/15 = 36.75
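The grouped-data computations in Example 4 can be checked with a short Python sketch. The function name `grouped_quantile` is hypothetical; it applies the expression Lo + C(position - C.Fb)/f with position = fraction × (∑f + 1), as used in this section, and the semi-interquartile range is then taken as (Q3 - Q1)/2.

```python
def grouped_quantile(lower_boundaries, frequencies, width, fraction):
    """Value at the given fraction (e.g. 0.25 for Q1) of grouped data."""
    total = sum(frequencies)
    position = fraction * (total + 1)       # e.g. 65(100 + 1)/100 = 65.65
    cumulative_before = 0                   # C.Fb for the class being examined
    for lower, f in zip(lower_boundaries, frequencies):
        if f > 0 and cumulative_before + f >= position:
            # This class contains the position: interpolate within it.
            return lower + width * (position - cumulative_before) / f
        cumulative_before += f
    raise ValueError("position lies beyond the last class")


# Lower class boundaries and frequencies from the weights table.
lower_boundaries = [9.5, 14.5, 19.5, 24.5, 29.5, 34.5, 39.5, 44.5]
frequencies = [5, 9, 12, 18, 25, 15, 10, 6]
CLASS_WIDTH = 5

p65 = grouped_quantile(lower_boundaries, frequencies, CLASS_WIDTH, 0.65)
q1 = grouped_quantile(lower_boundaries, frequencies, CLASS_WIDTH, 0.25)
q3 = grouped_quantile(lower_boundaries, frequencies, CLASS_WIDTH, 0.75)
semi_interquartile_range = (q3 - q1) / 2

print(f"P65 = {p65:.2f}, Q1 = {q1:.2f}, Q3 = {q3:.2f}")
print(f"Semi-interquartile range = {semi_interquartile_range:.2f}")
```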
Group Work
Form groups of six students and
answer the following questions:
1. Define business statistics.
2. Explain how the knowledge of statistics may
be applied in business situations.
3. Distinguish between descriptive and
inferential statistics.
4. Distinguish between primary and secondary
data.
5. Discuss the various methods of data
6. Given the following data:
Goals (x)       0  1  2  3  4  5  6
Frequency (f)   1  2  5  8  6  3  5
a. Find the mode
b. Calculate the mean
c. Calculate the Mean Absolute Deviation (MAD)
d. Calculate the sample variance and sample standard
deviation.
7. The following table shows the marks obtained
by the students of Level III OAM in Business
Statistics last academic year:
Marks   Number of Students
0-9     3
10-19   8
20-29   11
30-39   22
40-49   6
50-59   24
60-69   17
70-79   12
80-89   10
90-99   7
i. Find the total number of Level
III OAM students in the last
academic year.
ii. Find:
a. The 78th percentile
b. The first and third quartiles
Chapter Four:
Correlation and Simple
regression

Correlation and Simple regression
Correlation refers to any of a broad class of
statistical relationships involving
dependence.
Dependence refers to any statistical
relationship between two sets of data. In
other words, correlation is the relationship
between two variables.
In statistics, we have two types of variables:
1. Independent variable
2. Dependent variable
When dealing with two variables, it is important
to know which is the independent variable and
which is the dependent variable.

For instance, if a car owner were to record
daily the petrol he or she used and the
kilometres he or she covered, he or she
would find a very close relationship between
the two sets of figures.
The relationship between kilometres
driven and petrol used is an obvious one.

Difference between independent and
dependent variables
The independent variable is the
variable which is not affected by
changes in the other.
The dependent variable is the variable
which is affected by changes in the
other.

Working Examples
The sales and advertising expenditure for
Bralirwa Ltd from 2006 to 2010 are
presented as follows:
Year   Advertising in '000 Rwf (x)   Sales in '000 Rwf (y)
2006   2                             60
2007   5                             100
2008   4                             70
2009   6                             90
2010   3                             80

From the previous example,
changes in advertising
(independent) for the year can be
expected to affect the sales.
However, a change in the
sales (dependent) will not directly
affect advertising expenditure.

Formula for correlation (r)
The formula for calculating r can be
expressed in a number of ways,
but the most practical is:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$

From the previous example, find r of
advertising expenditure and the sales of
Bralirwa Ltd from 2006 to 2010.
Procedures
As presented in the formula above, first we
find the various ∑ values.
Year   x   y     x²   y²      xy
2006   2   60    4    3600    120
2007   5   100   25   10000   500
2008   4   70    16   4900    280
2009   6   90    36   8100    540
2010   3   80    9    6400    240

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left(n\sum x^2 - (\sum x)^2\right)\left(n\sum y^2 - (\sum y)^2\right)}}$$
n = 5
∑xy = 1680
∑x = 20
∑y = 400
∑x² = 90
(∑x)² = 400
∑y² = 33000
(∑y)² = 160000

Then we apply the formula:
$$r = \frac{5 \times 1680 - 20 \times 400}{\sqrt{(5 \times 90 - 20^2)(5 \times 33000 - 400^2)}} = \frac{8400 - 8000}{\sqrt{50 \times 5000}} = \frac{400}{500} = 0.80$$
Note that r is not expressed in
any units.
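The same computation can be sketched in Python, using the sums that appear in the formula. The function name `pearson_r` is hypothetical and the data are the Bralirwa Ltd figures from the working example.

```python
import math


def pearson_r(x, y):
    """Correlation coefficient from the raw sums, as in the formula for r."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


advertising = [2, 5, 4, 6, 3]      # x, in '000 Rwf
sales = [60, 100, 70, 90, 80]      # y, in '000 Rwf

r = pearson_r(advertising, sales)
print("r =", round(r, 2))
```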
Types of correlations
a. Perfect positive correlation
b. Perfect negative correlation
c. High /weak/Lower positive
correlation
d. Lower /weak/ High negative
correlation
e. Zero correlation ie variables
uncorrelated.
Exercise
Determine the coefficient of correlation for the
following data:
Year Independent Variable Dependent Variable
2000 5 20
2001 8 15
2002 2 6
2003 6 8
2004 4 12
2005 7 40
2006 3 30

Scatter Plots and Correlation
• A scatter plot (or scatter diagram) is used to

show the relationship between two variables
• Correlation analysis is used to measure
strength of the association (linear
relationship) between two variables
– Only concerned with strength of the
relationship
– No causal effect is implied
Scatter Plot Examples
[Figure: example scatter plots of y against x showing linear relationships, curvilinear relationships, strong relationships, weak relationships, and no relationship]
Correlation Coefficient
• The population correlation coefficient ρ
(rho) measures the strength of the
association between the variables
• The sample correlation coefficient r is an
estimate of ρ and is used to measure the
strength of the linear relationship in the
sample observations

Features of ρ and r
• Unit free
• Range between -1 and 1
• The closer to -1, the stronger the negative
linear relationship
• The closer to 1, the stronger the positive
linear relationship
• The closer to 0, the weaker the linear
relationship
Examples of Approximate
r Values
[Figure: scatter plots of y against x with r = -1, r = -.6, r = 0, r = +.3 and r = +1]
R² Values
• R² = 1: Perfect linear relationship between x and y.
100% of the variation in y is explained by
variation in x.
• 0 < R² < 1: Weaker linear relationship between x and y.
Some but not all of the variation in y is
explained by variation in x.
• R² = 0: No linear relationship between x and y.
The value of y does not depend on x.
(None of the variation in y is explained by
variation in x.)

Linear Regression Analysis
Regression analysis is used to predict the value of one
variable (the dependent variable) on the basis of other
variables (the independent variables).
Dependent variable: denoted Y
Independent variables: denoted X1, X2, …, Xk
If we only have ONE independent variable, the model is
$$Y = \beta_0 + \beta_1 X + \varepsilon$$
which is referred to as simple linear regression. We would
be interested in estimating β0 and β1 from the data we
collect.

Linear Regression Analysis
Variables:
X = Independent Variable
Y = Dependent Variable
Parameters:
β0 = Y-Intercept
β1 = Slope
ε = Error
Simple Linear Regression Model
[Figure: regression line showing the slope β1 (= rise/run) and the y-intercept β0]

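The Sum of Squares for Error (SSE):
$$\mathrm{SSE} = \sum (Y - \hat{Y})^2$$
Explained Sum of Squares (ESS):
$$\mathrm{ESS} = \sum (\hat{Y} - \bar{Y})^2$$
Total Sum of Squares (TSS):
$$\mathrm{TSS} = \sum (Y - \bar{Y})^2$$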
Y = A0 + A1X
Estimated Y: $\hat{Y} = A_0 + A_1 X$
$$A_1 = \frac{\sum (Y - \bar{Y})(X - \bar{X})}{\sum (X - \bar{X})^2}$$
$$A_0 = \bar{Y} - A_1 \bar{X}$$
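As an illustration of these formulas, the following Python sketch estimates A0 and A1 for the Bralirwa Ltd data from the correlation example (advertising as X, sales as Y) and then computes the sums of squares. The definitions used for TSS, ESS and SSE are the standard ones and are assumptions here, as are all variable names.

```python
x = [2, 5, 4, 6, 3]            # advertising, '000 Rwf (independent variable)
y = [60, 100, 70, 90, 80]      # sales, '000 Rwf (dependent variable)

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Slope: sum of cross-deviations divided by sum of squared X deviations.
a1 = (sum((yi - mean_y) * (xi - mean_x) for xi, yi in zip(x, y))
      / sum((xi - mean_x) ** 2 for xi in x))
# Intercept: mean of Y minus A1 times the mean of X.
a0 = mean_y - a1 * mean_x

fitted = [a0 + a1 * xi for xi in x]                        # estimated Y values

tss = sum((yi - mean_y) ** 2 for yi in y)                  # total variation in Y
ess = sum((fi - mean_y) ** 2 for fi in fitted)             # variation explained by X
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))     # unexplained variation

r_squared = ess / tss              # first way
r_squared_alt = 1 - sse / tss      # second way

print(f"Estimated model: Y = {a0} + {a1} X")
print(f"TSS = {tss}, ESS = {ess}, SSE = {sse}")
print(f"R squared = {r_squared:.2f} (or {r_squared_alt:.2f})")
```

For these data R squared is 0.64, the square of the correlation coefficient r = 0.80 found earlier, so about 64% of the variation in sales is explained by variation in advertising.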

The model to be estimated

Estimated model

Exercise
The sales and advertising
expenditure for SULFO Rwanda from
2008 to 2014 are presented as
follows in USD:

Year   Sales   Advertisements
2008   800     70
2009   1000    65
2010   1200    90
2011   1400    95
2012   1600    110
2013   1800    115
2014   2000    120

You are required to compute:
1. TSS
2. ESS
3. R squared (use two ways)
What does R squared tell you?

Chapter Five: Introduction
to Probability Theory and
Distribution

Probability is a measure of the
possibility of a random
phenomenon or chance behavior.
Probability describes the long-term
proportion/ratio with which a
certain outcome will occur in
situations with short-term
uncertainty/doubt.

Probability deals with experiments
that yield random short-term results
or outcomes, yet reveal long-term
predictability.
The long-term proportion with
which a certain outcome is
observed is the probability of that
outcome.

THE PROBABILITY SCALE
0 ------------------------ 0.5 ------------------------ 1
0:   the event never will occur
0.5: the event and "not event" are equally likely to occur
1:   the event always will occur

A probability is a number from 0 to 1.
If we assign a probability of 0 to an
event, this indicates that this event
never will occur. A probability of 1
attached to a particular event
indicates that this event always will
occur. What if we assign a probability
of 0.5? This means that it is just as
likely for the event to occur as for the
event to not occur.
Note:
❖ In mathematical usage the meaning of the word
probability is established by definition and is not connected
with beliefs or wishful thinking.
❖ Statistics and probability are so fundamentally interrelated
that it is impossible to discuss statistics without an
understanding of the meaning of probability.
❖ A knowledge of probability theory makes it possible to
interpret statistical results, since many statistical
procedures involve conclusions based on sample which are
always affected by random variation, and it is by means of
probability theory that we can express numerically the
inevitable uncertainties.
Basic Set Concepts
A set: a set is a well-defined collection of
objects. Elements of a set are enclosed in
braces {}, for instance A = {Students of CBE}
is a set.
If every element of a set A is also contained in
a set B, then A is called a subset of B, that is
to say A ⊂ B.
A Universal set: this is a set containing
everything under the field of study. It is
denoted by U.
Note that:
1. All sets under study are subsets of the Universal set
2. If x is an element or member of the set A, we
write x ∈ A
An empty set: this is a set with no elements. It is
denoted by ∅ (written ф in the examples that follow).
The intersection of two sets A and B is denoted
by A ∩ B, and is the set of all elements that are in
both A and B. That is, A ∩ B = {x : x ∈ A and x ∈ B}
If A ∩ B = ∅, then the sets A and B are
said to be disjoint, that is to say that A and B
do not have any element(s) in common.
The union of two sets A and B is denoted by A U
B and is the set of all elements in either A or B or
both, that is to say A U B = {x : x ∈ A or x ∈ B}
If A is a subset of the Universal set U, then the
complement of the set A with respect to U is
denoted by A’ and is the set of all elements in
U but not in A.
That is to say A’ = {x : x ∈ U and x ∉ A}
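
The set operations above can be tried out with a small sketch. The sets U, A, B and C below are made-up examples, not taken from the slides.

```python
# Hypothetical universal set and subsets, chosen only for illustration.
U = set(range(1, 11))          # universal set {1, ..., 10}
A = {2, 4, 6, 8, 10}
B = {1, 2, 3, 4}
C = {2, 4}

intersection = A & B           # elements in both A and B
union = A | B                  # elements in A or B or both
complement_A = U - A           # elements of U that are not in A
c_is_subset_of_a = C <= A      # True when every element of C is in A
a_and_complement_disjoint = A.isdisjoint(complement_A)

print(intersection, union, complement_A)
print(c_is_subset_of_a, a_and_complement_disjoint)
```

A set and its complement are always disjoint, and together they make up the universal set.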

Sample space, Sample Point, Event and
Probability of an Event
A statistical experiment is a process that leads
to one of several possible outcomes. It can be
described as a process that generates raw
data.
For instance:
Tossing a die
Flipping a coin
An outcome of an experiment is some
observation or measurement.
Sample space
The sample space is the universal

set S pertinent to a given
experiment. It is the set of all
possible outcomes of an
experiment.

So each outcome is visualized as a sample point in the sample space. For instance:
Experiment Sample Space
Drawing a Card {all 52 cards in the deck}

EVENT
An event, in probability theory, constitutes
one or more possible outcomes of an
experiment.
An event is a subset of a sample space. It
is a set of basic outcomes. We say that
the event occurs if the experiment gives
rise to a basic outcome belonging to the
event.

For example, tossing a die,
S= {1,2,3,4,5,6}
Flipping a coin, S= {H,T} where H=
Head, and T= Tail
N.B: Any element of a sample space is
called a Sample Point

Exercise: Identifying Events and the Sample
Space of a Probability Experiment
Consider the probability experiment of having

two children.
(a) Identify the Sample Points of the Probability
Experiment.
(b) Determine the sample space.
(c) Define the event E = “Have one Boy”.

Note :
✔ An event consists of one or more
sample points.
✔ The probability of an event A is the
sum of the weights assigned to all
sample points in the event A, and it is
denoted by P(A) or Pr(A) or Prob(A).

✔ The probability of an event A, P(A), lies
between 0 and 1. That is, it can neither
be negative nor greater than one, i.e.
0 ≤ P(A) ≤ 1
✔ P(empty set) = 0 and P(S) = 1, that is, the
probability of an empty set is 0 and that
of a sample space is 1.
✔ If P(A) = 0, the event A cannot occur, and
if P(A) = 1, then the event will certainly
occur.

✔Weights are assigned to sample
points in the sample space, S.
✔ If A and B are events,
then Probability P(A U B)=P(A) +
P(B) - P ( A ∩ B ) ,
✔ If A is an event and A’ is its
complement, then
P(A) + P(A’) = 1

Working Example 1
A coin is tossed twice. What is the probability that at least one head
will occur?
Note:
▪ Tossing a coin twice is the same as tossing two coins once,
▪ If the coin is balanced, then all the sample points are equally likely,
i.e. the sample points should be equally weighted,
Therefore, the sample space, S = {HH,HT,TH,TT} ,
Let w be the weight of each sample point.
Hence w+w+w+w = P(S) = 1. That is to say, 4w = 1 and w = ¼, so the
probability of each sample point is ¼.

Assume that A is the event that at least one
head will occur, thus A = {HH, HT, TH}, with
weights w, w, w.
Hence, P(A) = w+w+w = 1/4+1/4+1/4 = ¾
Note: If an experiment can result in any of the N
different equally likely outcomes and if exactly
n of these correspond to event A, then the
probability of event A is : P(A)= n/N
From the above example, the process of tossing
a coin twice results in four outcomes (N=4);
exactly three (n=3) of these correspond to
event A, thus P(A) = 3/4.
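
The counting rule P(A) = n/N for equally likely outcomes can be checked by listing the sample space. The sketch below assumes a balanced coin; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing a balanced coin twice: HH, HT, TH, TT.
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]

# Equally likely sample points, so each has weight 1/N.
weight = Fraction(1, len(sample_space))

# Event A: at least one head occurs.
event_a = [point for point in sample_space if "H" in point]
p_a = weight * len(event_a)

print(sample_space, event_a, p_a)
```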
Working example two
An even number is twice as likely to occur as an odd
number when a die is tossed. What is the
probability that a number less than 4 occurs?
Solution:
The sample space (S) = {1,2,3,4,5,6}
Let w be the weight of an odd number. Then the
weight of an even number is 2w, since an even
number is twice as likely to occur as an odd
number.
Hence the weights are {w,2w,w,2w,w,2w}. Since they must add up to P(S) = 1,
w+2w+w+2w+w+2w = 9w = 1 and w = 1/9
Thus,
Outcome:      1    2    3    4    5    6
Probability:  1/9  2/9  1/9  2/9  1/9  2/9
Now let A be the event that a number less than
4 occurs, then A = {1,2,3}
Hence P(A) = 1/9+2/9+1/9 = 4/9
Note: The definition, P(A) = n/N, is not
applicable here, since the outcomes are not
“equally likely”
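
When outcomes are not equally likely, the probability of an event is the sum of the weights of its sample points. A minimal sketch for the loaded die above, with invented names such as p_event:

```python
from fractions import Fraction

# Relative weights: an even face is twice as likely as an odd face.
relative = {face: (2 if face % 2 == 0 else 1) for face in range(1, 7)}
total = sum(relative.values())  # w + 2w + w + 2w + w + 2w = 9w

# Dividing by the total makes the weights add up to P(S) = 1.
prob = {face: Fraction(w, total) for face, w in relative.items()}


def p_event(event):
    """Sum the weights of the sample points that belong to the event."""
    return sum((prob[face] for face in event), Fraction(0))


p_less_than_4 = p_event({1, 2, 3})
print(prob, p_less_than_4)
```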

TYPES OF EVENTS
1. MUTUALLY EXCLUSIVE EVENTS
Two events A and B are said to be mutually
exclusive events if A ∩ B = ф, therefore,
P(A ∩ B )=0.
Thus, two events are mutually exclusive, if the
occurrence of either event excludes the
possibility of the occurrence of the other
event i.e either one or the other event, but
not both , can occur.

Note:
If two events A and B are mutually exclusive,
then P(A U B) = P(A) + P(B)
Working Example 3
(a)If a die is tossed and A is the event
of obtaining an even number ie A =
{2,4,6} and B is the event of
obtaining an odd number, i.e B=
{1,3,5} then A ∩ B = ф, therefore,
P(A ∩ B)= 0.Therefore A and B are
mutually exclusive events.

(b) The probability that a student
passes Statistics is 2/3, and the
probability that she or he passes
Principles of Management is 4/9. If
the probability of passing at least
one course is 4/5, what is the
probability of passing both courses?

Let A = Event of passing Statistics, P(A) = 2/3
B = Event of passing Principles of
Management, P(B) = 4/9
Let AUB = Event of passing at least one of them,
P(AUB) = 4/5
Let A ∩ B = Event of passing both, P(A ∩ B) = ?

Note: these are not mutually
exclusive events, since both A and B can
occur.
Hence, Using P(A U B )= P(A)+
P(B) - P(A ∩ B ),
Therefore we have 4/5=2/3+4/9-
P(A ∩ B )
P(A ∩ B ) = 14/45
(c) What is the probability of getting a total of 7 or
11 when a pair of dice is tossed?
Solution: The sample points are shown in the
following table. Take the first die on the
horizontal line and the second die on the vertical line;
each cell is the total of the two dice:
+   1   2   3   4   5   6
1   2   3   4   5   6   7
2   3   4   5   6   7   8
3   4   5   6   7   8   9
4   5   6   7   8   9   10
5   6   7   8   9   10  11
6   7   8   9   10  11  12

Total number of sample points, in sample space,
S is 36. Thus N(S) = 36.
Let A = Event of getting a total of 7 , n(A) = 6,
Hence , P(A)= n(A)/N(S) = 6/36
Let B = Event of getting a total of 11 , n(B) = 2,
Hence , P(B)= n(B)/N(S) = 2/36
Note: A ∩ B =Ф, and thus P (A ∩ B) =0. Therefore
A and B are mutually exclusive.
Hence, P (A U B) = P(A) + P(B) = 6/36 + 2/36 =
8/36 =2/9
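
The same result can be obtained by counting the sample points directly. This sketch assumes two fair dice; p_total is a helper name made up for the illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely sample points (first die, second die).
rolls = list(product(range(1, 7), repeat=2))


def p_total(target):
    """Probability that the two dice add up to the target total."""
    favourable = sum(1 for first, second in rolls if first + second == target)
    return Fraction(favourable, len(rolls))


p_7 = p_total(7)
p_11 = p_total(11)

# A total cannot be 7 and 11 at the same time, so the events are
# mutually exclusive and the probabilities simply add.
p_7_or_11 = p_7 + p_11
print(p_7, p_11, p_7_or_11)
```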

Independent Events
Two events are said to be independent
if P(A ∩ B) = P(A) * P(B).
In other words, two events are
independent if the occurrence or
non-occurrence of one event has no
influence on the occurrence or
non-occurrence of the other.

Contingency Table
This table is quite useful in
obtaining probability
especially if events are
independent. It is illustrated
by way of example below.

Working example 4
(a) If a coin (with sides labeled H
and T) and a die (with sides
numbered 1,2,3,4,5,6) are
thrown, the way in which the die
lands in no way affects the
possible ways in which the coin
lands, and vice versa.
Therefore, throwing a die with a
six, for example, and throwing a
coin with a head are independent
events. If both are fair/unbiased,
then
P(Head and Six) = P(Head) * P(Six)
= 1/2 * 1/6 = 1/12
(b) In an examination only two papers, namely
Microeconomics I and Microeconomics II, were
done. The failure rates were 45% and 40%
respectively. The number of candidates who
sat the examination was 2000. You are
required to find the probability that a
candidate selected at random,
i. Passed both Microeconomics I and II.
ii. Failed both Microeconomics I and II.
iii.Passed Microeconomics I and failed
Microeconomics II.
iv.Failed Microeconomics I and Passed II.

Solution
Let A = Event of passing Microeconomics I, then
A’ is the event of failing it,
Thus, P(A’) = 45% = 0.45, P(A) = 1-0.45 = 0.55
Let B = Event of passing Microeconomics II,
then B’ is the event of failing it,
Thus, P(B’) = 40% = 0.40, P(B) = 1-0.40 = 0.60
Note: Passing/failing Microeconomics I in no
way affects passing/failing Microeconomics II
and vice versa.
Hence the two events are independent.
(i) Thus, P(A and B) = P(A ∩ B) =
P(A)*P(B) = 0.55*0.60 = 0.33
(ii) Note: It can be shown that if A and B
are independent, so are A’ and B’.
Hence, P(A’ and B’) = P(A’ ∩ B’) =
P(A’)*P(B’) = 0.45*0.40 = 0.18
(iii) P(A and B’) = P(A ∩ B’) =
P(A)*P(B’) = 0.55*0.40 = 0.22
(iv) P(A’ and B) = P(A’ ∩ B) =
P(A’)*P(B) = 0.45*0.60 = 0.27
Contingency table
The above results may be obtained from
the following contingency table:
        A’                  A                  Total
B’      P(A’ ∩ B’) = 0.18   P(A ∩ B’) = 0.22   P(B’) = 0.40
B       P(A’ ∩ B) = 0.27    P(A ∩ B) = 0.33    P(B) = 0.60
Total   P(A’) = 0.45        P(A) = 0.55        1

Note: The table above is filled by using the
following equations:
(a)Adding row - wise we have:
i. P ( A ∩ B ) + P ( A’ ∩ B ) = P(B)
ii.P ( A ∩ B’ ) + P ( A’ ∩ B’ ) = P(B’)
iii.P(A)+P(A’)= 1
(b) Adding column- wise we have:
iv.P ( A ∩ B ) + P ( A ∩ B’ ) = P(A)
v.P ( A’ ∩ B)+ P ( A’ ∩ B’ ) = P(A’)
vi.P(B)+P(B’)= 1
Therefore, P ( A ∩ B’ ) =0.22
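
A contingency table for two independent events can be filled in by multiplying the marginal probabilities. The sketch below uses the pass rates of the Microeconomics example and assumes, as the example does, that the two papers are independent. The dictionary layout is just one possible way of holding the table.

```python
from fractions import Fraction

# Marginal probabilities: A = pass Microeconomics I, B = pass Microeconomics II.
p_a = Fraction(55, 100)
p_b = Fraction(60, 100)
marginal_a = {"A": p_a, "A'": 1 - p_a}
marginal_b = {"B": p_b, "B'": 1 - p_b}

# Under independence every cell is the product of its row and column totals.
table = {
    (a_label, b_label): pa * pb
    for a_label, pa in marginal_a.items()
    for b_label, pb in marginal_b.items()
}

for cell, value in table.items():
    print(cell, float(value))

# Adding the B and B' cells for A gives back P(A); all cells add up to 1.
check_a = table[("A", "B")] + table[("A", "B'")]
check_total = sum(table.values())
```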
Conditional Probability
The conditional probability of B given A,
denoted by: P(B/A), is the probability that B
occurs given that A has already occurred, and
it is defined by:
P(B/A)=P ( A ∩ B )/P(A), provided P(A) ≠ 0
Note:
(i)P ( B ∩ A ) = P ( A ∩ B ) Using commutative
law.
(ii) Similarly, P(A/B)= P ( A ∩ B )/P(B), provided
P(B) ≠ 0

Working Example Five
A loaded (biased)die with an even
number twice as likely as an odd
number is tossed,
(i)What is a probability of getting a
four?
(ii)What is the probability of getting a
four given that a number greater
than three occurred?
Solution
Using the results of Example 2, we have:
S ={1, 2, 3, 4, 5, 6}
1/9, 2/9, 1/9, 2/9, 1/9, 2/9
Let A: Event of obtaining a Four,
(i) A={4} , hence P(A) = 2/9
(ii) Let B: Event of obtaining a number greater than 3,
then B = {4,5,6}, hence P(B) = 2/9+1/9+2/9 = 5/9
Note: A ∩ B = {4}, hence, P(A ∩ B) = 2/9
Therefore,
P(A/B) = P(A ∩ B)/P(B) = (2/9)/(5/9) = ?
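
The definition P(A/B) = P(A ∩ B)/P(B) translates directly into a small helper. The sketch below reuses the loaded-die weights of Example 2; the function names p and conditional are invented for this illustration.

```python
from fractions import Fraction

# Loaded die: even faces have weight 2/9, odd faces 1/9.
prob = {
    1: Fraction(1, 9), 2: Fraction(2, 9), 3: Fraction(1, 9),
    4: Fraction(2, 9), 5: Fraction(1, 9), 6: Fraction(2, 9),
}


def p(event):
    """Probability of an event given as a set of faces."""
    return sum((prob[face] for face in event), Fraction(0))


def conditional(a, b):
    """P(A/B) = P(A ∩ B) / P(B), defined only when P(B) is not zero."""
    if p(b) == 0:
        raise ValueError("P(B) must not be zero")
    return p(a & b) / p(b)


A = {4}          # a four is obtained
B = {4, 5, 6}    # a number greater than three is obtained
p_a_given_b = conditional(A, B)
print(p(A), p(B), p_a_given_b)
```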
Properties of Probabilities
1. The probability of any event E, P(E), must be between 0 and 1 inclusive.

That is,
0 ≤ P(E) ≤ 1.

2. If an event is impossible, the probability of the event is 0.
3. If an event is a certainty, the probability of the event is 1.
4. If S = {e1, e2, …, en}, then
P(e1) + P(e2) + … + P(en) = 1.

Bayes’ Theorem
This is a useful theorem used to simplify and
obtain conditional probabilities. It is illustrated
by way of example below:
Working example Six
The following table shows the
number of females and males who
are employed and those who are not
employed.
Employed Unemployed
Males 140 260
Females 460 40

(i) Determine the probability that a
female is selected given that she is
unemployed.
(ii) If the following additional
information is available:
36 employed and 12 unemployed
belong to a club, determine the
probability that the selected person
belongs to a club.
Solution
Let F: Event a female is selected i.e a
man is not selected = M’. Then, F’ is
event female is not selected i.e a man
is selected = M
Let E: Event a person is employed .
Then E’ : Event a person is
unemployed.
(i) Required to find P(F/E’)

          Employed  Unemployed  Total
Males     140       260         400
Females   460       40          500
Total     600       300         900

Let S = Sample space, N(S) = 900, also
n(E) = 600, n(E’) = 300, n(M) = 400, n(F) =
500
Now, P(F/E’) = P(F∩E’)/P(E’), provided
P(E’) is not zero.
But n(F∩E’) = 40, thus, P(F∩E’) =
n(F∩E’)/N(S) = 40/900 = 4/90
Also P(E’) = n(E’)/N(S) = 300/900 = 1/3
Hence, P(F/E’) = (4/90)/(1/3) = 12/90 = 2/15
(ii) Required to find the probability that the
selected person belongs to a club. This may
be rephrased as: “the selected person is
employed given that s/he belongs to a club”.
Now, let C be the event that the selected
person belongs to a club. Thus, required to find
P(E/C)
Therefore, P(E/C) = P(E∩C)/P(C), and P(C) is the
probability that “a person belongs to a club and is
employed” or “a person belongs to a club and s/he
is unemployed”, i.e.:
P(C) = P[(E and C) or (E’ and C)] = P[(E ∩ C) U (E’ ∩ C)]
Note:
(i) The events (E and C) and (E’ and C) are mutually
exclusive.
Hence, P[(E ∩ C) U (E’ ∩ C)] = P(E ∩ C) + P(E’ ∩ C)
Therefore, P(C) = P(C ∩ E) + P(C ∩ E’)
Hence, P(E/C) =
P(E ∩ C)/P(C) =
P(E ∩ C) / [P(E ∩ C) + P(E’ ∩ C)] …………………(1)
(ii) The result in equation (1) is known as “Bayes’
Theorem”, which is a useful tool in evaluating
conditional probabilities.
(iii) The following multiplicative law is useful in
simplifying Bayes’ Theorem:
P(E ∩ C) = P(E)*P(C/E)
Similarly, P(E’ ∩ C) = P(E’)*P(C/E’)
Hence, equation (1) may as well be written as:
P(E/C) = P(E ∩ C) / [P(E ∩ C) + P(E’ ∩ C)]
= P(E)*P(C/E) / [P(E)*P(C/E) + P(E’)*P(C/E’)] ……..(2)
Now, P(E ∩ C) = n(E ∩ C)/N(S) = 36/900
And P(E’ ∩ C) = n(E’ ∩ C)/N(S) = 12/900
Hence, using equation (1), we have: P(E/C) =
(36/900)/[36/900 + 12/900] = 3/4
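
Equations (1) and (2) can be checked against each other with the counts of this example. The sketch below is only an illustration; the variable names are invented, and the counts are those given in the table and in the additional club information.

```python
from fractions import Fraction

n_total = 900            # N(S)
n_employed = 600         # n(E)
n_unemployed = 300       # n(E')
n_employed_club = 36     # n(E ∩ C)
n_unemployed_club = 12   # n(E' ∩ C)

# Equation (1): work with the joint probabilities.
p_e_and_c = Fraction(n_employed_club, n_total)
p_not_e_and_c = Fraction(n_unemployed_club, n_total)
p_c = p_e_and_c + p_not_e_and_c          # the two events are mutually exclusive
p_e_given_c_eq1 = p_e_and_c / p_c

# Equation (2): work with P(E), P(E') and the conditional probabilities.
p_e = Fraction(n_employed, n_total)
p_not_e = Fraction(n_unemployed, n_total)
p_c_given_e = Fraction(n_employed_club, n_employed)
p_c_given_not_e = Fraction(n_unemployed_club, n_unemployed)
p_e_given_c_eq2 = (p_e * p_c_given_e) / (
    p_e * p_c_given_e + p_not_e * p_c_given_not_e
)

print(p_c, p_e_given_c_eq1, p_e_given_c_eq2)
```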
Working Example Seven
Tom is to travel from Muhanga
District to Kicukiro District for an
interview. The probability that he
will be on time for the interview
given that he travels by Volcano Bus
and Taxi are respectively 0.1 and 0.2.
The probability that he will travel by
bus and Taxi are respectively 0.6
and 0.4 .
i. Find the probability that he will be on
time.
ii. Find the probability that he travelled
by taxi given that he is on time.
iii. Find the probability that he travelled
by Volcano Bus given that he is not on
time.
iv. Find the probability that he travelled
by Taxi given that he is not on time.
Solution:
Let A be the event that he travels by
Volcano Bus, P(A) = 0.6
And B = event that he travels by taxi,
P(B) = 0.4
Let T = Event that he is on time
Given: P(T/A) = 0.1
and P(T/B) = 0.2
(i) Required to find P(T). Now T = Event that:
“He is on time and he has travelled
by Volcano Bus” or “He is on time
and he has travelled by Taxi”, i.e.
T = (T and A) or (T and B)
= (T∩A) U (T∩B)
Hence, P(T) = P[(T∩A) U (T∩B)]
But the events (T∩A) and (T∩B) are
mutually exclusive events.
Therefore,
P(T) = P(T∩A) + P(T∩B)
= P(A)*P(T/A) + P(B)*P(T/B)
= (0.6*0.1) + (0.4*0.2) = 0.14
(ii) Required to find the probability that he travelled by
Taxi given that he is on time i.e required to find
P(B/T).
Solution:
Now P(B/T)= P(B∩T)/P(T)
= P (B∩T) / [P (B∩T) + P (A∩T)]
= P(B)*P(T/B)/[P(B)*P(T/B) + P(A)*P(T/A) ]
= (0.4*0.2)/[(0.4*0.2)+(0.6*0.1)] = 8/14
(iii) Required to find the probability that he
travelled by Volcano Bus given that he is not on
time i.e required to find P(A/T’).
Solution:
Now P(A/T’) = P(A∩T’)/P(T’)
= [P(A)*P(T’/A)]/P(T’), by the
multiplicative law
But P(T) + P(T’) = 1. Hence, P(T’) = 1 - P(T)
= 1 - 0.14 = 0.86
And P(T/A) + P(T’/A) = 1. Hence, P(T’/A) = 1 - P(T/A)
= 1 - 0.1 = 0.9
Therefore, P(A/T’) = (0.6*0.9)/0.86 = 54/86

iv. Find the probability that he
travelled by Taxi given that he is not
on time?
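
The steps of Working Example Seven follow one pattern: first the total probability of the condition, then Bayes’ Theorem for each means of transport. A minimal sketch of that pattern, with helper names (total_probability, posterior) made up for the illustration:

```python
# Probabilities of each means of transport and of being on time given each.
prior = {"bus": 0.6, "taxi": 0.4}
p_on_time_given = {"bus": 0.1, "taxi": 0.2}


def total_probability(prior, likelihood):
    """P(T) = sum over the means of transport of P(means) * P(T/means)."""
    return sum(prior[k] * likelihood[k] for k in prior)


def posterior(prior, likelihood):
    """Bayes' Theorem: P(means/T) for every means of transport."""
    total = total_probability(prior, likelihood)
    return {k: prior[k] * likelihood[k] / total for k in prior}


p_on_time = total_probability(prior, p_on_time_given)
given_on_time = posterior(prior, p_on_time_given)

# "Not on time" uses the complements P(T'/A) = 1 - P(T/A), P(T'/B) = 1 - P(T/B).
p_late_given = {k: 1 - v for k, v in p_on_time_given.items()}
given_late = posterior(prior, p_late_given)

print(p_on_time, given_on_time, given_late)
```

The posterior probabilities for bus and taxi always add up to 1, which gives a quick check on parts (iii) and (iv).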

Probability Tree Diagrams
A probability tree is a diagram used to obtain
probabilities of events. For Example:
A coin is tossed three times:
1.Determine all possible outcomes;
2.What is the probability of getting at least one
Tail?
3.What is the probability of getting at least Three
Heads ?
4.What is the probability of getting at least two
Tails?
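
Every path through the tree for three tosses is one outcome, so the tree can also be listed by a short program. This sketch assumes a fair coin; the helper name probability is invented for the illustration.

```python
from fractions import Fraction
from itertools import product

# Each branch of the tree is a sequence of three tosses, e.g. "HTH".
outcomes = ["".join(path) for path in product("HT", repeat=3)]


def probability(condition):
    """Share of the equally likely outcomes that satisfy the condition."""
    favourable = sum(1 for outcome in outcomes if condition(outcome))
    return Fraction(favourable, len(outcomes))


p_at_least_one_tail = probability(lambda o: o.count("T") >= 1)
p_at_least_three_heads = probability(lambda o: o.count("H") >= 3)
p_at_least_two_tails = probability(lambda o: o.count("T") >= 2)

print(outcomes)
print(p_at_least_one_tail, p_at_least_three_heads, p_at_least_two_tails)
```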
Random Variable

• After you have studied this
chapter you will learn the
application of discrete random
variables, and
• How to use the binomial and the
Poisson distributions. These
subjects are treated as follows:
✔ Distribution for discrete random variables •
Characteristics of a random variable • Expected
value of rolling two dice • Application of the random
variable• Covariance of random variables •
Covariance and portfolio risk • Expected value and
the law of averages
✔ Binomial distribution • Conditions for a binomial
distribution to be valid
✔ • Mathematical expression of the binomial function
• Application of the binomial distribution• Deviations
from the binomial validity

✔Poisson distribution Mathematical
expression for the Poisson
distribution • Application of the
Poisson distribution.
• Poisson approximated by the
binomial relationship •Application of
the Poisson–binomial relationship.

Introduction
Discrete data are statistical information
composed of integer values, or whole
numbers. They originate from the
counting process.
For example, we could say that 9 machines
are shutdown, 29 bottles have been sold,
8 units are defective, 5 hotel rooms are
vacant, or 3 students are absent.
It makes little sense to say 9.5
machines are shutdown, 29.75
bottles have been sold, 8.5 units are
defective, 5.5 hotel rooms are empty,
or 3.25 students are absent. With
discrete data there is a clear
segregation and the data does not
progress from one class to another. It
is information that is unconnected.
Distribution for Discrete Random Variables
If the values of discrete data occur in no

special order, and there is no explanation
of their configuration or distribution,
then they are considered discrete
random variables. This means that,
within the range of the possible values of
the data, every value has an equal
chance of occurring.
Characteristics of a random variable
Random variables have a mean value
and a standard deviation. The mean
value of random data is the
weighted average of all the possible
outcomes of the random variable
and is given by the expression:

Mean value, μ = Σx * P(x) = E(x)
Here x is the value of the discrete random
variable, and P(x) is the probability, or
the chance of obtaining that value x. If
we assume that this particular pattern of
randomness might be repeated we call
this mean also the expected value of the
random variable, or E(x).

The variance of a distribution of a discrete random
variable
The variance of a distribution of a discrete random
variable is given by the expression:
Variance, σ^2 = Σ(x - μ)^2 * P(x)

The standard deviation of a
random variable
The standard deviation of a random variable
is the square root of the variance, or
Standard deviation, σ = square root of [Σ(x - μ)^2 * P(x)]

Working Example One
What are all Possible outcomes when
two dice are thrown?
What are Expected value of the
outcome of this experiment?
Compute:
Variance and Standard Deviation.

Working Example Two
Assume that a distributor sells
wine by the case and that each
case generates €6.00 in profit.
The sale of wine is considered
random as it is shown in the table
below:

Cases of wine sold over the last 200 days:
Cases of wine sold per day               10   11   12   13   Total
Days this amount of wine is sold         30   40   80   50   200
Probability of selling this amount (%)   15   20   40   25   100

If we consider that this data is representative
of future sales, then the frequency of
occurrence of sales can be used to estimate
the expected or average value, of future
profits.
Here, the values, “days this amount of wine is
sold” are used to calculate the probability of
future sales using the relationship:
(Days this amount of wine is sold * 100) / total days considered in analysis

What is the Expected Value?
To calculate the mean value, we have,
μ = 10 * 15% + 11 * 20% + 12 * 40% + 13 * 25% = 11.75 cases
From this, an estimate of future profits is
€6.00 * 11.75 = €70.50/day.
What is the variance?
To calculate the variance,
σ^2 = (10 - 11.75)^2 * 15% + (11 - 11.75)^2 * 20%
+ (12 - 11.75)^2 * 40% + (13 - 11.75)^2 * 25% = 0.9875

What is Standard Deviation?
Standard Deviation is the square root of 0.9875 = 0.9937
For more details, refer to the document

in Excel
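
As an alternative to a spreadsheet, the same calculation can be sketched in a few lines of code. The variable names are illustrative; the data are the days and cases from the table of this example.

```python
import math

# Number of days on which each quantity of cases was sold.
days = {10: 30, 11: 40, 12: 80, 13: 50}
total_days = sum(days.values())

# Relative frequency used as the probability of each sales level.
prob = {cases: count / total_days for cases, count in days.items()}

mean = sum(x * p for x, p in prob.items())                    # E(x)
variance = sum((x - mean) ** 2 * p for x, p in prob.items())  # sigma squared
std_dev = math.sqrt(variance)

profit_per_case = 6.00   # euros per case
expected_profit = profit_per_case * mean

print(mean, variance, std_dev, expected_profit)
```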

Covariance of random variables
Covariance is an application of the
distribution of random variables
and is useful to analyze the risk
associated with financial
investments. If we consider two
datasets then the covariance
denoted by σxy,
between two discrete random
variables x and y in each of the
datasets is,
σxy= Σ(x - μx)(y - μy)*P(xy)

Here x is a discrete random variable in
the first dataset and y is a discrete
random variable in the second
dataset.
The terms μx and μy are the mean or expected
values of the corresponding datasets and P(xy)
is the probability of each occurrence.
The expected value of the sum of two random
variables is,
E(x + y) = E(x) + E(y)= μx +μy

The variance of the sum of two random
variables
Variance (x + y) = σ^2(x+y) = σ^2x + σ^2y + 2σxy

The standard deviation
The standard deviation is the
square root of the variance, or
Standard deviation (x + y) = square
root of σ^2(x+y)

Working Example Three
Assume that you are considering
investing in two types of
investments:
One is a high growth fund, X. and
the other is essentially a bond
fund, Y.
An estimate of future returns, per $1,000
invested, according to expectations of the
future outlook of the macro economy, is given
in the table below:
Probability of economic change (%)   20      35     45
High growth fund (X)                 -$100   $125   $300
Bond fund (Y)                        $250    $100   $10

You are required to calculate:
a. The mean or expected values,
b. The variance,
c. The covariance
d. The expected value of the sum of the two
investments
e. The variance of the sum of the two
investments
f. The standard deviation of the sum of the two
investments
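
The formulas for the mean, variance, covariance and the variance of a sum can be applied to this table as sketched below. The sketch assumes that the first return of the high growth fund is a loss of $100 (that is, -100); the helper name expected is invented for the illustration.

```python
import math

prob = [0.20, 0.35, 0.45]   # probability of each economic outlook
x = [-100, 125, 300]        # high growth fund returns (assumed: -100 is a loss)
y = [250, 100, 10]          # bond fund returns


def expected(values, p):
    """Weighted average of the values using the probabilities p."""
    return sum(v * pi for v, pi in zip(values, p))


mean_x = expected(x, prob)
mean_y = expected(y, prob)
var_x = expected([(xi - mean_x) ** 2 for xi in x], prob)
var_y = expected([(yi - mean_y) ** 2 for yi in y], prob)
cov_xy = expected([(xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)], prob)

mean_sum = mean_x + mean_y               # E(x + y) = E(x) + E(y)
var_sum = var_x + var_y + 2 * cov_xy     # variance of the sum
sd_sum = math.sqrt(var_sum)

print(mean_x, mean_y, var_x, var_y, cov_xy, mean_sum, var_sum, sd_sum)
```

A negative covariance means the two funds tend to move in opposite directions, which lowers the variance of the combined investment.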
Binomial Distribution
In statistics, binomial means
there are only two possible
outcomes from each trial of an
experiment. The tossing of a
coin is binomial since the only
possible outcomes are heads or
tails.
This is a binomial condition. If in a market survey a
respondent is asked if she likes a product, then the
alternative response must be that she does not.
Again, this is binomial. If we know beforehand
that a situation exhibits a binomial pattern, then
we can use the knowledge of statistics to better
understand probabilities of occurrence and make
suitable decisions. We first develop a binomial
distribution, which is a table or a graph showing
all the possible outcomes of performing the
binomial-type experiment many times. The
binomial distribution is discrete.
Conditions for a binomial distribution
to be valid
In order for the binomial distribution to be valid,
we consider that each observation is selected
from an infinite population, or one of a very
large size usually without replacement.
if we say that the probability of obtaining one
outcome, or “success” is p, then the probability of
obtaining the other, or “failure,” is q. The value
of q must be equal to (1 - p).

The idea of failure here simply
means the opposite of what you
are testing or expecting. As it is
shown in a table below, you can
give some various qualitative
outcomes using p and q.

Qualitative outcomes for a binomial
occurrence
Prob. p:            Success  Win   Work       Good  Pass  Open  Odd   Yes  Present
Prob. q = (1 - p):  Failure  Lose  Defective  Bad   Fail  Shut  Even  No   Absent

Other criteria for the binomial
distribution are that the probability,
p, of obtaining an outcome must be
fixed over time and that the outcome
of any trial must be independent of
a previous trial. For example, in the
tossing of a coin, the probability of
obtaining heads or tails always remains
at 50%, and obtaining a
head on one toss has no effect
on what face is obtained on
subsequent tosses.
Here the value of Prob.(p) = 0.5 and Prob.(q) =
1 - p = 0.5
Mathematical expression of the
binomial function:
Probability of x successes in n trials:
P(x) = [n! / (x! * (n - x)!)] * p^x * q^(n - x)

Where,
⮚p is the characteristic probability, or
the probability of success,
⮚q = (1 - p) or the probability of
failure,
⮚x = number of successes desired,
⮚ n = number of trials undertaken, or
the sample size.
The binomial random variable x can
have any integer value ranging from
0 to n, the number of trials
undertaken. Again, if p = 50%, then q
is 50% and the resulting binomial
distribution is symmetrical regardless
of the sample size, n.

In the binomial function, the expression p^x * q^(n - x) is the probability of obtaining exactly x successes
out of n observations in a particular sequence.

Then the relationship n! / (x! * (n - x)!) is how
many combinations of the x successes
out of n observations are possible.

The expected value of the binomial
Distribution
The expected value of the binomial
distribution E(x) or the mean value,
μx, is the product of the number of
trials and the characteristic
probability.
μx =E(x) = n * p
For example, if we toss a coin
40 times, then the mean or
expected value would be,
40 * 0.5 = 20

The variance of the binomial
distribution
The variance of the binomial
distribution is the product of the
number of trials, the
characteristic probability of
success, and the characteristic
probability of failure. Then
variance= n*p*q
The standard deviation of the
binomial distribution
The standard deviation of
the binomial distribution is
the square root of the
variance

Working Example
Assume that Brad and Delphine are newly
married and wish to have seven children. In
the genetic makeup of both Brad and Delphine
the chance of having a boy and a girl is equally
possible and in their family history there is no
incidence of twins or other multiple births.

Required:
1. What is the probability of Delphine

giving birth to exactly two boys?
2. Develop a complete binomial
distribution for this situation and
interpret its meaning.

Answers
1. p = q = 50%
x, the random variable, can take on the values
0, 1, 2, 3, 4, 5, 6, and 7.
n, the sample size, is 7.
Then for this particular question, x = 2

P(x = 2) = [7! / (2! * 5!)] * 0.5^2 * 0.5^5
= 21 * 0.25 * 0.03125 = 16.41%
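
The complete binomial distribution asked for in question 2 can be tabulated with a short sketch. It assumes p = 0.5 for a boy and n = 7 independent births, as stated in the example; binomial_pmf is a name chosen for the illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)


n, p = 7, 0.5

# Probability of 0, 1, ..., 7 boys out of seven children.
distribution = {x: binomial_pmf(x, n, p) for x in range(n + 1)}
p_two_boys = distribution[2]

mean = n * p                  # expected number of boys
variance = n * p * (1 - p)    # n * p * q

for boys, probability in distribution.items():
    print(boys, round(probability * 100, 2), "%")
```

Because p = q = 50%, the resulting distribution is symmetrical: the probability of x boys equals the probability of 7 - x boys.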

Poisson Distribution
The Poisson distribution, named after the
Frenchman, Denis Poisson (1781–1840), is
another discrete probability distribution to
describe events that occur usually during a
given time interval.
• Illustrations might be the number
of patients arriving at the emergency centre of
a hospital in one day, or the number of
airplanes waiting in a holding pattern to land at
a major airport in a given 4-hour period.
The Poisson Probability Distribution
The Poisson probability

distribution provides a good
model for the probability
distribution of the number of
“rare events” that occur randomly
in time, distance, or space
Poisson Probability Distribution
Assume that an interval is divided into a very large
number of subintervals so that the probability of the
occurrence of an event in any subinterval is very
small. Assumptions of a Poisson probability
distribution:
⚫The probability of an occurrence of an event is
constant for all subintervals: independent events;
⚫You are counting the number times a particular
event occurs in a unit; and
⚫As the unit gets smaller, the probability that two or
more events will occur in that unit approaches zero.
The Poisson distribution can be regarded as a limiting case of
the binomial distribution. As with the binomial
distribution, the Poisson distribution can be used
where there are only two possible outcomes, “success”
p or “failure” q, and these events are independent. The
Poisson distribution is usually used where n is very
large but p is very small (p < 0.1 and often much less), and
where np is constant and typically < 5. Because p is so small, the
chance of the event occurring in any one trial is extremely low.
Therefore the Poisson distribution is typically used for
unlikely events such as accidents, strikes, etc. The
Poisson distribution is also used to solve problems
where events tend to occur at random, such as incoming
phone calls, passenger arrivals at a terminal, etc.

Whereas the formula for solving
binomial problems, uses the
probabilities for both “ success” p
and “failure”q, the formula for
solving poisson problems, only uses
the probability for “success”p
Hence:

Poisson Probability Distribution formula
P(x) = (λ^x * e^(-λ)) / x!
⚫ where
• P(x) = the probability of x successes over a given
period of time or space, given λ
• λ = μ or the expected number of successes
per time or space unit; λ > 0
• e = 2.71828 (the base for natural logarithms)
• The mean and variance of the Poisson probability
distribution are: mean μ = λ and variance σ^2 = λ

Where:
λ: (lambda, the Greek letter l) is the mean
number of occurrences or mean number
of successes = np
e: is the base of the natural logarithm, or
exponential constant, which equals
2.7183;
x: is the Poisson random variable;
P(x): is the probability of exactly x
occurrences.
If we substitute x=0,1,2,3,4,5,…n in the
formula, we obtain the following expressions:

Working Example 1
In a large Company with many
employees, accidents occur at
random at the mean rate of 3
per day. Calculate the
probability that four accidents
occur on single day.

Solution
Given:
The mean number (μ)= 3
Number of success (x) =4
As we know, e=2.7183
λ =μ =3
Then, the probability that four
accidents occur on a single day (x = 4) is
P(4) = (3^4 * e^(-3)) / 4! = (81 * 0.049787) / 24 = 0.1680
In other words, there is a
0.1680, that is 16.80%,
probability of four
accidents occurring on a
single day.
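
The Poisson calculation can be reproduced with a short sketch. The function name poisson_pmf is chosen for the illustration; it evaluates the Poisson probability of exactly x occurrences for a mean of λ.

```python
from math import exp, factorial, sqrt


def poisson_pmf(x, lam):
    """Probability of exactly x occurrences when the mean number is lam."""
    return lam ** x * exp(-lam) / factorial(x)


lam = 3                      # mean number of accidents per day
p_four = poisson_pmf(4, lam)
std_dev = sqrt(lam)          # standard deviation of the Poisson distribution

print(round(p_four, 4), round(std_dev, 4))
```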
The standard deviation of the Poisson distribution
The standard deviation of the Poisson
distribution is given by the square root
of the mean number of occurrences, or
σ = √λ

Working example 2
• A small coffee shop on a certain
stretch of highway knows that on
average nine people per hour come
in for service. Sometimes the only
waitress in the shop is very busy,
and sometimes there are only a few
customers.
The owner has decided that if there is greater
than a 10% chance that there will be at least
13 clients coming into the coffee shop in a
given hour, the manager will hire another
waitress.
Required: Develop the information to help the

manager make a decision.

To determine the probability of there being
exactly 13 customers coming into the coffee
shop in a given hour, we can use the Poisson
formula with x = 13 and λ = 9:

P(13) = (9^13 * e^(-9)) / 13!
= (2,541,865,828,329 * 0.0001234) / 6,227,020,800
= 5.04%
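
The owner's rule refers to at least 13 clients, which is the sum of the probabilities for 13, 14, 15, and so on, not only the probability of exactly 13. One way to sketch this is to subtract the probabilities of 0 to 12 clients from 1. The names below are illustrative, and the 10% threshold is the one stated in the example.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """Probability of exactly x arrivals when the mean number is lam."""
    return lam ** x * exp(-lam) / factorial(x)


lam = 9            # average number of customers per hour
threshold = 0.10   # the owner's 10% rule

p_exactly_13 = poisson_pmf(13, lam)

# P(at least 13) = 1 - P(0) - P(1) - ... - P(12)
p_at_least_13 = 1 - sum(poisson_pmf(k, lam) for k in range(13))

hire_another_waitress = p_at_least_13 > threshold
print(round(p_exactly_13, 4), round(p_at_least_13, 4), hire_another_waitress)
```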

Chapter 6: Hypothesis Testing
Concept of Hypothesis Testing:
A hypothesis is a judgment
about a situation, outcome, or
population parameter based
simply on an assumption or
intuition with no concrete
backup information or analysis.
Hypothesis testing is to take sample data and
make an objective decision based on the
results of the test within an appropriate
significance level.
1. Significance level
When we make quantitative judgments, or
hypotheses, about situations, we are either
right, or wrong. However, if we are wrong
we may not be far from the real figure or
that is our judgment is not significantly
different. Thus our hypothesis may be
acceptable.
Examples
• 1. A lecturer says that it will take 30
minutes to finish a Business Statistics CAT. The
CAT is finished in 35 minutes. The
completion time is not 30 minutes;
however, it is not significantly different
from the estimated time of 30 minutes.

2.A financial advisor estimates that a
client will make $15,000 on a certain
investment. The client makes
$14,900. The number $14,900 is not
$15,000 but it is not significantly
different from $15,000 and the
client really does not have a strong
reason to complain.
However, if the client made only $8,500 he
would probably say that this is significantly
different from the estimated $15,000 and
has a justified reason to say that he was
given bad advice.
Thus in hypothesis testing, we need to decide
what we consider is the significance level or
the level of importance in our evaluation.

This significance level is given as a ceiling level, usually in
terms of percentages such as 1%, 5%, 10%, etc. To a
certain extent this is the subjective part of
hypothesis testing, since one person might have a
different criterion than another individual on what
is considered significant.
However in accepting, or rejecting a hypothesis in

decision-making, we have to agree on the level of
significance. This significance value, which is
denoted as alpha, α, then gives us the critical value
for testing.
Null and alternative hypothesis
In hypothesis testing there are two defining statements,
premised on the binomial (two-outcome) concept.
One is the null hypothesis, which states that the hypothesized
value is considered correct at the given level of
significance.
The other is the alternative hypothesis, which states that
the hypothesized value is not correct at the given
level of significance. The alternative hypothesis
is also known as the research hypothesis, since
it is a value that has been obtained from a sampling
experiment.
Example
For example, the hypothesis is that the
average age of the population in a
certain country is 35. This value is the
null hypothesis.
The alternative to the null hypothesis is
that the average age of the population is
not 35 but is some other value.
In hypothesis testing there are three possibilities:
1. The first is that there is evidence that the
value is significantly different from the
hypothesized value.
2. The second is that there is evidence that the
value is significantly greater than the
hypothesized value.
3. The third is that there is evidence that the
value is significantly less than the
hypothesized value.
Null and Alternative hypothesis
• A null hypothesis (H0) is any hypothesis which
is to be tested for possible rejection or
nullification under the assumption that it is true.
• An alternative hypothesis (H1/HA) is any other
hypothesis which we are willing to accept
when the null hypothesis H0 is rejected.
• For example, if our null hypothesis is
H0: μ = 150 cm, then our alternative
hypothesis may be:
H1: μ ≠ 150 cm or H1: μ > 150 cm or H1: μ < 150 cm
Type I and type II errors
• While testing a hypothesis (H0) and deciding to either
accept or reject it, there are four
possible occurrences.
• Acceptance of a true hypothesis (correct decision):
we accept the null hypothesis and this happens to be
the correct decision. Note that statistics does not
give absolute information; its conclusion could
be wrong, only the probability of it being right
is high.
• Rejection of a false hypothesis (correct decision).
• Rejection of a true hypothesis (incorrect decision):
this is called a type I error, with probability α.
• Acceptance of a false hypothesis (incorrect decision):
this is called a type II error, with probability β.
Hypothesis Testing for the
Mean Value
In hypothesis testing for the mean, an assumption is made
about the mean or average value of the population.
Then we take a sample from this population, determine
the sample mean value, and measure the difference
between this sample mean and the hypothesized
population value.

Note:
The smaller the difference between the sample mean and
the hypothesized population mean,
the higher the probability that our
hypothesized population mean value is
correct.
The larger the difference, the smaller the
probability that our hypothesized value is
correct.
One-sample test of hypothesis
Introduction
In statistics, there are some important basic
concepts of hypothesis testing. These basic
concepts are summarized as follows:

⚫Null hypothesis (H0): this is an assertion that a
parameter in a statistical model takes a particular
value; it is assumed true until experimental
evidence suggests otherwise.
⚫Alternative hypothesis (H1): this is the hypothesis
accepted when a researcher rejects the null hypothesis.
⚫Type 1 error: this is rejecting the null hypothesis
when it is in fact true.
⚫Type 2 error: this is accepting the null hypothesis
when it is in fact false.
⚫Significance level: this is the size of the critical
region, that is, the probability of a type 1 error.
⚫Critical region: the null hypothesis is rejected
when the calculated value of the test statistic lies
within this region.
The formula for a one-sample z-test is as follows:

$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

where $\bar{x}$ equals the sample mean, μ0 equals the hypothesized
population mean, σ the population standard deviation, and n
the sample size.
Statistical significance and effect size are
the two primary outputs of the t-test. Statistical
significance indicates whether the difference
between sample averages is likely to represent
an actual difference between populations, and
the effect size indicates whether that difference
is large enough to be practically meaningful. A
t-test asks whether a difference between two
groups’ averages is unlikely to have occurred
because of random chance in sample selection.
Working example 1. Normal population mean where n is
small and the standard deviation is unknown
Example: a consumer group concerned
about the mean fat content of a certain
grade of steakburger submits to an
independent laboratory a random
sample of twelve steakburgers for
analysis. The percentage of fat in each of
the steakburgers is as follows:

21 18 19 16 18 21
17 19 24 14 18 15

A manufacturer claims that the mean fat content of this
grade of steakburger is less than 20%. Assume
that the percentage fat content is normally
distributed with unknown standard deviation.
Carry out an appropriate hypothesis test in
order to advise the consumer group as to the
validity of the manufacturer’s claim.

Solution:
When the standard deviation σ is known, the test statistic is:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

When the standard deviation is
unknown, z is replaced by t and σ by the sample standard deviation s:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

In this case the squared standard deviation
equals:

$$s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{1}{n-1}\left(\sum x^2 - \frac{(\sum x)^2}{n}\right)$$

and s is the square root of this quantity.

• H0: µ = 20%
• H1: µ < 20% (one-tailed)
• Significance level: α = 0.05 = 5%
• Degrees of freedom: v = n − 1 = 11
• Critical region: t < −1.796
• Under H0, the sample mean is distributed N(20, σ²/12)
• From the data, Σx = 220 and Σx² = 4118, so the sample mean is 220/12 = 18.33
• The squared standard deviation is
s² = (1/11)(4118 − 220²/12) = 7.697, so s = 2.774
• The test statistic is t = (18.33 − 20)/(2.774/√12) = −2.08
This value lies in the critical
region (−2.08 < −1.796). Therefore we reject H0:
at the 5% level there is evidence to support the
manufacturer’s claim that the mean fat content is less than 20%.
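
As a check on the working, the following Python sketch recomputes the test statistic directly from the twelve listed fat percentages, so that the sums, the standard deviation and t can be verified independently. The variable names are illustrative assumptions; the critical value −1.796 is the one quoted in the example.

```python
import math
import statistics

fat = [21, 18, 19, 16, 18, 21, 17, 19, 24, 14, 18, 15]
mu0 = 20.0        # hypothesized mean under H0
t_crit = -1.796   # lower-tail critical value for 11 df at the 5% level (from the example)

n = len(fat)
mean = statistics.mean(fat)
s = statistics.stdev(fat)              # sample standard deviation, divisor n - 1
t = (mean - mu0) / (s / math.sqrt(n))  # one-sample t statistic

print(f"n = {n}, sum = {sum(fat)}, sum of squares = {sum(x * x for x in fat)}")
print(f"mean = {mean:.2f}, s = {s:.3f}, t = {t:.2f}")
print("reject H0" if t < t_crit else "do not reject H0")
```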

Working example2
• The scores obtained from the 6 students were as
follows:
X
Person 1: 110
Person 2: 118
Person 3: 110
Person 4: 122
Person 5: 110
Person 6: 150

On average, does the population of
undergraduates at CBE have higher than
average intelligence scores (IQ > 100)?

• First, we must compute the mean (or average) of this
sample:

$$\bar{X} = \frac{\sum X}{n} = \frac{110 + 118 + 110 + 122 + 110 + 150}{6} = \frac{720}{6} = 120$$

• In the above equation, there is some new mathematical
notation.
• First, $\bar{X}$ is a symbol that denotes the mean of all Xs,
or intelligence scores.
• The second part of the equation shows how
this quantity is computed.
• The sigma symbol ( ∑ ) tells us to sum all the
individual Xs.
• Lastly, we must divide by ‘n’,
that is, the number of observations.
• Notice that these 6 people have higher than average
intelligence scores (IQ > 100).
• However, is this finding likely
to hold true in repeated samples?
• What if we drew 6 different people from CBE?
• A one-sample t-test will help answer this question.
• It will tell us if our findings are ‘significant’, or in
other words, likely to be repeated if we took another
sample.

Computing the sample variance

X     Mean   X − Mean   (X − Mean)²
110   120    −10        100
118   120    −2         4
110   120    −10        100
122   120    2          4
110   120    −10        100
150   120    30         900
                 Sum:   1208

• To get the third column we take each
individual ‘X’ and subtract the mean
(120) from it.
• We square each result to get the fourth
column.
• Next, we simply add up the entire fourth
column (1208) and divide by our original sample size
(n = 6).
• The resulting figure, 1208/6 = 201.3, is the sample
variance.
Important:
All sample variances
are computed this way!
We always take the mean;
subtract the mean from each score;
square the result;
sum the squares;
and divide by the sample size
(how many numbers, or rows we have).
Computing the sample
variance
• Now that we have the mean (X̄ = 120) and the
variance (S² = 201.3) of our sample, we have
everything needed to compute whether the sample
mean is ‘significantly’ above the average intelligence.
• In the formula that follows, we use a new symbol, mu
( μ ), to indicate the population standard value ( μ =
100 ) against which we compare our obtained score
(X̄ = 120).

$$t = \frac{\bar{X} - \mu}{\sqrt{S^2/(n-1)}} = \frac{120 - 100}{\sqrt{201.3/5}} = 3.152$$

Our sample has ‘n = 6’ people, so the degrees of freedom for this t-test are:
df = n – 1 = 5
This degrees of freedom figure will be used in our test of significance.

The ‘t’ statistic
• Most often our computed ‘t’
should be around 0. Why?
Because the numerator, or top part of the
formula for t, is X̄ − μ.
• If our first sample of 6 people is truly
representative of the population, then our
sample mean should also be 100, and therefore
our computed t should be (100 − 100) divided by the standard error, which is 0.
On average, does the population of
undergraduates at CBE have higher than
average intelligence scores (IQ > 100)?

• This is a 1-tailed test, because we are asking if
the population mean is ‘greater’ than 100.
• If we had only asked whether the intelligence of
students was ‘different’ from average (either
higher or lower), then the test would be 2-tailed.
• In the appendixes of your textbook, look at the
table titled ‘critical values of the t-distribution’.
• Under a 1-tailed test with an alpha level of .05 and
degrees of freedom df = 5, you should find a
critical value (C.V.) of t = 2.02.

Is our computed t = 3.152
greater than the C.V. = 2.02?
Yes!
Thus we reject the null hypothesis
and live happily ever after.
Right?
Not so fast.
What does this really mean?
• We assume the null hypothesis when
making this test.
• We assume that the population
mean is 100, and therefore we will
most often compute a t = 0.
• Sometimes the computed ‘t’ might
be a bit higher and sometimes a bit
lower.
What does the ‘critical value’ tell us?
• Based on knowledge of the distribution table we

know that 95% of the time, in repeated samples, the
computed ‘t’ statistics should be less than 2.02.
• That’s what the critical value tells us.
• It says that when we are sampling 6 persons from a

population with mean intelligence scores of 100, we
should rarely compute a ‘t’ higher than 2.02.

What happens if
we do calculate a ‘t’ greater than 2.02?
• Well, we can be pretty confident that our sample
does not come from a population with a mean of
100!
• In fact, we conclude that the population mean
intelligence is higher than 100.
• How often will we be wrong in this conclusion?
• If we do these t-tests a lot when the null hypothesis is true, we’ll be wrong 5% of the
time. That’s the alpha level (5%).

Conclusions
How would we state our
conclusions ?
The mean intelligence score of
undergraduates at CBE (M = 120) was
significantly higher than the standard
intelligence score (M = 100), t(5) =
3.15, p < .05 (one-tailed).
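
The whole IQ example can be retraced with a short Python sketch. It assumes the t formula t = (X̄ − μ)/√(S²/(n − 1)), with S² computed using divisor n as in the variance table, because that combination reproduces the quoted t = 3.152; the variable names are illustrative.

```python
import math

scores = [110, 118, 110, 122, 110, 150]
mu = 100        # population standard value
t_crit = 2.02   # one-tailed 5% critical value for df = 5, as quoted in the example

n = len(scores)
mean = sum(scores) / n
variance = sum((x - mean) ** 2 for x in scores) / n  # divisor n, as in the example
t = (mean - mu) / math.sqrt(variance / (n - 1))      # assumed form of the t formula
df = n - 1

print(f"mean = {mean:.0f}, variance = {variance:.1f}, t({df}) = {t:.3f}")
print("reject H0" if t > t_crit else "do not reject H0")
```

Dividing the sum of squares by n − 1 first and then using s/√n gives the same t value, since both routes divide the sum of squares by n(n − 1).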

Two sample testing of Hypothesis:
Introduction
⚫The two sample test is similar to the one sample test,
except that we are now testing for differences between two
populations rather than a sample and a population. There
are three types of two sample tests:
⮚Hypothesis Testing with Sample Means
(Large Samples)
⮚Hypothesis Testing with Sample Means
(Small Samples)
⮚Hypothesis Testing with Sample
Proportions (Large Samples)
Null Hypothesis:
• The H0 is that the populations are the same.
H0: μ1 = μ2
• If the difference between the sample statistics is

large enough, or, if a difference of this size is unlikely,
assuming that the H0 is true, we will reject the H0 and
conclude there is a difference between the
populations.
Alternate Hypothesis:
• The alternate hypothesis is the research
hypothesis.
• If the null hypothesis is rejected, then we will
have found evidence to support the research
hypothesis.
H1: μ1 ≠ μ2

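Formula for Hypothesis Testing with Sample Means (Large Samples)

$$Z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\sigma_{\bar{x}_1 - \bar{x}_2}}$$

Explanation of formula:
• The numerator is the difference in
sample means, less the difference hypothesized under H0.
• The denominator is the “joint estimate”
of the standard error for both samples.
• The pooled estimate is calculated by using the
sample information in the following formula:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{s_1^2}{n_1 - 1} + \frac{s_2^2}{n_2 - 1}}$$

• (x̄1 − x̄2) is the difference between
the sample means taken from the
populations, and (μ1 − μ2)H0 is the
difference of the hypothesized means
of the populations (zero when H0: μ1 = μ2).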

The Five Step Model
1. Make assumptions and meet test
requirements.
2. State the H0 and H1.
3. Select the Sampling Distribution and
Determine the Critical Region.
4. Calculate the test statistic.
5. Make a Decision and Interpret Results.

Example: Hypothesis Testing in the Two
Sample Case
– Middle class families average 8.7 email

messages and Working class families
average 5.7 messages.
– The middle class families seem to use
email more but is the difference
significant?

Problem Information:
E-Mail Messages
Sample 1 (M. Class)   Sample 2 (W. Class)
x̄1 = 8.7              x̄2 = 5.7
s1 = 0.3              s2 = 1.1
n1 = 89               n2 = 55

Step 1 Make Assumptions and Meet Test
Requirements
• We have:
• Independent Random Samples

• Level of Measurement is Interval Ratio
• Sampling Distribution is normal in shape
because we have a large sample:
n1 + n2 ≥ 100 (in this case, n1 + n2 = 144)

Step 2 State the Null Hypothesis
• H0: μ1 = μ2
– The null asserts there is no significant
difference between the populations.
• H1: µ1 ≠ µ2
– The research hypothesis contradicts the H0
and declares there is a significant
difference between the populations.
Step 3 Select the Sampling Distribution and
Establish the Critical Region
• Sampling Distribution = Z distribution
• Alpha (α) = 0.05
• Z (critical) = ± 1.96

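Step 4 Calculate the Test Statistic
Using the formula:
• Compute the joint estimate:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{0.3^2}{89 - 1} + \frac{1.1^2}{55 - 1}} = \sqrt{0.001 + 0.022} = \sqrt{0.023} \approx 0.152$$

• Solve for Z:

$$Z = \frac{8.7 - 5.7}{0.152} = 19.74$$

(Intermediate values are rounded to three decimals here; carrying full precision gives Z ≈ 19.60, which leads to the same decision.)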

Step 5 Make a Decision
⚫The obtained test statistic (Z = 19.74) falls in the
Critical Region so reject the null hypothesis.
⚫The difference between the sample means is so
large that we can conclude (at α = 0.05) that a
difference exists between the populations
represented by the samples.
⚫The difference between the email usage of
middle class and working class families is
significant (Z=19.74, α=.05)
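
The two-sample calculation can be sketched in Python as follows. The function name `two_sample_z` is illustrative, and the standard error is assumed to take the form √(s1²/(n1 − 1) + s2²/(n2 − 1)). The code carries full precision, so it gives a Z slightly below the hand-rounded 19.74, with the same decision.

```python
import math

def two_sample_z(mean1, s1, n1, mean2, s2, n2, hypothesized_diff=0.0):
    """Z statistic for the difference between two large-sample means."""
    se = math.sqrt(s1 ** 2 / (n1 - 1) + s2 ** 2 / (n2 - 1))  # joint standard error
    return ((mean1 - mean2) - hypothesized_diff) / se

# Middle class vs working class e-mail messages
z = two_sample_z(8.7, 0.3, 89, 5.7, 1.1, 55)
z_critical = 1.96  # two-tailed critical value at alpha = 0.05

print(f"Z = {z:.2f}")
print("reject H0" if abs(z) > z_critical else "do not reject H0")
```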

Two-tailed Hypothesis Test:
When α = .05, then .025 of the area is distributed on either side of the
curve in area (C )
The .95 in the middle section represents no significant difference
between the two populations.
The cut-off between the middle section and +/- .025 is represented by a
Z-value of +/- 1.96.
Chapter Seven: Index Numbers
Definition:
An index number is an indicator of the average percentage change in
a series of figures, where one figure (called
the base) is assigned an arbitrary value of 100 and
the other figures are adjusted in proportion to
the base.

Base 100: the percentage relative
index
In many ordinary day-to-day comparisons, 100 is
used as the base. For example, if in the academic year
2012 the number of level II students was 150
and in this academic year it is 300 students, we
compare this year with the last one by computing
(300/150)*100 = 200.
Note that an index number computed in this
way is called a percentage relative index.

Base Year
When comparing a series of annual figures,
it is necessary first to select one of the
years as a base and choose the figure
relating to that year as 100. Then all the
other figures are expressed in terms of
this selected year. A base year can,
therefore, be defined as “the year against
which all other years are compared”.
Index number symbols
All the different methods of calculating index numbers can be
expressed concisely and unambiguously as formulas. The
symbols used in these formulas are the following:
P: Price of individual items;
P0: Price of individual items in base year;
P1, P2, P3, …: Prices of individual items in subsequent years;
Q: Quantity of individual items;
Q0: Quantity of individual items in base year;
Q1, Q2, Q3, …: Quantities of individual items in subsequent
years;
W: Weight
Note that Σpq means that the price and
quantity of each item in turn are first
multiplied together and the product then
added.
If the current year only is being compared
with base year, then suffix 1 indicates the
current year.
One item index numbers
Where only one item is involved in a comparison
between different periods, the calculation of
index numbers is very simple. One year is
chosen as the base and the values for the
other years are stated in proportion to the
value of the base year, that is to say:
Quantity index = (q1/q0)*100
and the price index will be (p1/p0)*100

Learning Objectives
• Define an index number and explain its use
• Perform calculations involving simple,
composite and weighted index numbers
• Understand the basic structure of the
consumer price index (CPI) and perform
calculations involving its use
• Understand other indexes used in the
Australian business sector

Introduction
• An index number is a statistical value that measures
the change in a variable with respect to time
• Two variables that are often considered in this
analysis are price and quantity
• With the aid of index numbers, the average price of
several articles in one year may be compared with
the average price of the same quantity of the same
articles in a number of different years
• There are several sources of ‘official’ statistics that
contain index numbers for quantities such as food
prices, clothing prices, housing, wages and so on.
Simple index numbers
• We will examine index numbers that are
constructed from a single item only
• Such indexes are called simple index numbers
• Current period = the period for which you
wish to find the index number
• Base period = the period with which you wish
to compare prices in the current period
• The choice of the base period should be
considered very carefully
• The choice itself often depends on economic
factors
1. It should be a ‘normal’ period with respect to the
• The notation we shall use is:
– pn = the price of an item in the current period
– po = the price of an item in the base period
• Price relative
– The price relative of an item is the ratio of
the price of the item in the current period to
the price of the same item in the base
period
– The formal definition is:

$$\text{price relative} = \frac{p_n}{p_o}$$

• Simple price index
– The price relative provides a ratio that
indicates the change in price of an item
from one period to another
– A more common method of expressing this
change is to use a simple price index:

$$\text{simple price index} = \frac{p_n}{p_o} \times 100$$

• The simple price index gives the percentage change in the price of an item
from one period to another
– If the simple price index is more than 100,
subtract 100 from the simple price index. The
result is the percentage increase in price from the
base period to the current period
– If the simple price index is less than 100, subtract
the simple price index from 100. The result is the
percentage by which the item costs less in the current
period than it did in the base period
Composite index numbers
• A composite index number is constructed from
changes in a number of different items
• Simple aggregate index
– The simple aggregate index has appeal
because it is simple in nature and easy
to find:

$$\text{simple aggregate index} = \frac{\sum p_n}{\sum p_o} \times 100$$

where
Σpn = the sum of the prices in the current
period
Σpo = the sum of the prices in the base period
• Simple aggregate index
– Even though the simple aggregate index is
easy to calculate, it has serious disadvantages:
1. An item with a relatively large price can dominate the index
2. If prices are quoted for different quantities, the simple
aggregate index will yield a different answer
3. It does not take into account the quantity of each item sold
– Disadvantage 2 is perhaps the worst feature of
this index, since it makes it possible, to a
certain extent, to manipulate the value of the
index

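• Averages of relative prices
– This index also does not take into account
the quantity of each item sold, but it is still a
vast improvement on the simple aggregate
index:

$$\text{average of relative prices} = \frac{1}{k} \sum \frac{p_n}{p_o} \times 100$$

where
k = the number of items
pn = the price of an item in the current period
po = the price of an item in the base period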
Weighted index numbers
• The use of a weighted index number or weighted index allows greater
importance to be attached to some items
• Information other than simply the change in price over time can then be
used, and can include such factors as quantity sold or quantity consumed
for each item
• Laspeyres index
– The Laspeyres index is also known as the average
of weighted relative prices
– In this case, the weights used are the quantities of
each item bought in the base period

– The formula is:

$$\text{Laspeyres index} = \frac{\sum p_n q_o}{\sum p_o q_o} \times 100$$

where:
qo = the quantity bought (or sold) in the base period
pn = price in current period
po = price in base period
– Thus, the Laspeyres index measures the
relative change in the cost of purchasing
these items in the quantities specified in the
base period
• Paasche index
– The Paasche index uses the consumption in the
current period
– It measures the change in the cost of purchasing
items, in terms of quantities relating to the
current period
– The formal definition of the Paasche index is:

$$\text{Paasche index} = \frac{\sum p_n q_n}{\sum p_o q_n} \times 100$$

where:
qn = the quantity bought (or sold) in the current period
pn = the price in the current period
po = the price in the base period
• Comparison of the Laspeyres and Paasche indexes
– The Laspeyres index measures the ratio of
expenditures on base year quantities in the
current year to expenditures on those quantities
in the base year
– The Paasche index measures the ratio of
expenditures on current year quantities in the
current year to expenditures on those quantities
in the base year
– Since the Laspeyres index uses base period
weights, it may overestimate the rise in the cost of
living (because people may have reduced their
consumption of items that have become
relatively more expensive)
– Since the Paasche index uses current period
weights, it may underestimate the rise in the cost
of living
– The Laspeyres index is usually larger than the
Paasche index
– With the Paasche index it is difficult to make year-
to-year comparisons, since every year a new set of
weights is used
– The Paasche index requires that a new set of
weights be obtained each year, and this
information can be expensive to obtain
– Because of the last 2 points above, the Laspeyres
• Fisher’s ideal index
– Fisher’s ideal index is the geometric mean of
the Laspeyres and Paasche indexes
– Although it has little use in practice, it does
demonstrate the many different types of
index that can be used

The Consumer Price Index
• The measure most commonly used in Australia as a general indicator

of the rate of price change for consumer goods and services is the
consumer price index
• The Australian CPI assumes the purchase of a constant ‘basket’ of
goods and services and measures price changes in that basket alone
• The description of the CPI commonly adopted by users is in terms of
its perceived uses; hence there are frequent references to the CPI as
– a measure of inflation
– a measure of changes in purchasing power,
or
– a measure of changes in the cost of living
• The CPI has been designed as a general measure of price inflation for
the household sector.
• The CPI is simply a measure of the changes in the cost of a basket, as
the prices of items in it change
• From the September quarter 2005 onwards, the total basket has been
divided into the following 11 major commodity groups:
– food
– alcohol and tobacco
– clothing and footwear
– housing
– household contents and services
– health
– transportation

• Historical details of the CPI
– Retail prices of food, other groceries and average
rentals of houses have been collected by the ABS
for the years extending back to 1901
– From its inception in 1960, the CPI covered the six
state capital cities. In 1964 the geographical
coverage of the CPI was extended to include
Canberra. From the June quarter in 1982
geographic coverage was further extended to
include Darwin
– Index numbers at the ‘Group’ and ‘All groups’
levels are published for each capital city and for
• The conceptual basis for measuring price changes
– The CPI is a quarterly measure of the change in

average retail price levels
– In measuring price changes, the CPI aims to
measure only pure price changes
– The CPI is a measure of changes in transaction
prices, the prices actually paid by consumers for
the goods and services they buy
– It is not concerned with nominal, recommended
or list prices
– The CPI measures price change over time and
does not provide comparisons between relative
• The index population
– Because the spending patterns of various groups
in the population differ somewhat, the pattern of
one large group, fairly homogeneous in its
spending habits, is chosen for the purpose of
calculating the CPI
– The CPI population group is, in concept,
metropolitan employee households
– For this purpose, employee households are
defined as those households that obtain the major
part of their household income from wages and
salaries
• Details of the 15th series CPI
– Since 1960, when the CPI was first compiled, the
ABS has maintained a program of periodic reviews
of the CPI
– The main objective of these reviews is to update
item weights, but they also provide an
opportunity to reassess the scope and coverage of
the index
– The latest (15th series) review has resulted in
three main outcomes:
1. updating the weighting pattern for the CPI
2. incorporating a price index for financial services into the CPI
The Consumer Price Index (CPI)
• Collecting prices
– This involves collecting prices from many sources,
including supermarkets, department stores,
footwear stores, restaurants, motor vehicle
dealers and service stations, dental surgeries, etc.
– In total, around 100 000 separate price quotations
are collected each quarter
– Prices of the goods and services included in the
CPI are generally collected quarterly
– The prices used in the CPI are those that any
member of the public would have to pay on the
pricing day to purchase the specified good or
service
• Periodic revision of the CPI
– The CPI is periodically revised in order to
ensure it continues to reflect current
conditions
– CPI revisions have usually been carried out at
approximately 5-yearly intervals
• Changes in quality
– it is necessary to ensure that identical or
equivalent items are priced in successive time
periods
– This involves evaluating changes in the quality
of goods and services
• Long-term linked series
– A single series of index numbers has been
constructed by linking together selected retail
price index series
– The index numbers are expressed on a reference
base 1945 = 100
• International CPIs
– A comparison of the CPIs for a number of
countries, including Australia, as measured in
September quarters
– The base year is 2000
– During the 8-year period up to 2008–09, of the
Using the CPI
• There are a number of situations where the CPI is
used to make adjustments to prices charged and
payments made
• Examples include changes in rent, pension
payments and child support payments
• Such adjustments are often made by the relevant
government agency, but if members of the
community wish to use the CPI for such purposes,
it is their responsibility to ensure that this index is
suitable

Index numbers as a measure of deflation
• One of the uses for price indexes is to measure the changes in the
purchasing power of the dollar
• This is known as deflation
• In order to eliminate the effect of inflation and obtain a clear
picture of the ‘real’ change, the values must be deflated
• For example, to deflate an actual salary and express it in terms of
‘real’ salary (of the base year), use:

$$\text{real salary} = \frac{\text{actual salary}}{\text{CPI}} \times 100$$

Chaining indices
• Example: 4 time periods
• Calculate indices between adjacent years:
(I12, I23, I34) = (1.03, 1.05, 0.98)
• Then form the chained index:
C1 = 1.00
C2 = C1 × I12 = 1.00 × 1.03 = 1.03
C3 = C2 × I23 = 1.03 × 1.05 = 1.08
C4 = C3 × I34 = 1.08 × 0.98 = 1.06
• The advantage is that the weights change
regularly
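
The chaining steps can be sketched in Python as a running product. The link values used are the ones that appear in the multiplication steps (1.03, 1.05, 0.98); the variable names are illustrative.

```python
from itertools import accumulate
import operator

links = [1.03, 1.05, 0.98]  # I12, I23, I34: indices between adjacent periods

# C1 = 1.00, and each later value is the previous chained value times the next link.
chained = list(accumulate(links, operator.mul, initial=1.00))

for period, value in enumerate(chained, start=1):
    print(f"C{period} = {value:.2f}")
```

Adding a further period only requires appending one more link to the list.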
Summary
• We have interpreted and used a range of index
numbers commonly used in the Australian business
sector
• We defined an index number and explained its use
• We performed calculations involving simple, composite
and weighted index numbers
• We understood the basic structure of the consumer
price index (CPI) and performed calculations involving
its use
• We understood other indexes used in the Australian
business sector

Working Example
With 2005 as the base year, calculate quantity and
price indices for the years 2003 to 2009:

Year   Price of TV sets (USD)   Quantity sold   Price index          Quantity index
2003   450                      12912           (450/500)*100 = 90   (12912/21200)*100 = 61
2004   480                      18671           ?                    ?
2005   500                      21200           ?                    ?
2006   530                      28633           ?                    ?
2007   530                      35028           ?                    ?
2008   550                      40650           ?                    ?
2009   600                      44531           ?                    ?
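
The remaining rows can be filled in by applying the one-item formulas (p1/p0)*100 and (q1/q0)*100 to every year. The Python sketch below does this with 2005 as the base; the helper name `simple_index` is an illustrative assumption.

```python
years = [2003, 2004, 2005, 2006, 2007, 2008, 2009]
prices = [450, 480, 500, 530, 530, 550, 600]  # price of TV sets in USD
quantities = [12912, 18671, 21200, 28633, 35028, 40650, 44531]
base_year = 2005

def simple_index(values, base_value):
    """Express each value as a percentage of the base-period value."""
    return [value / base_value * 100 for value in values]

base = years.index(base_year)
price_index = simple_index(prices, prices[base])
quantity_index = simple_index(quantities, quantities[base])

for year, p, q in zip(years, price_index, quantity_index):
    print(f"{year}: price index = {p:.0f}, quantity index = {q:.0f}")
```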

Price Index
A price index (plural: “price indices” or
“price indexes”) is a normalized average
(typically a weighted average) of prices
for a given class of goods or services in a
given region, during a given interval of
time. It is a statistic designed to help to
compare how these prices, taken as a
whole, differ between time periods or
geographical locations.
Importance of Price index
Price indexes have several potential uses.
For particularly broad indices, the index
can be said to measure the economy's
price Level or a cost of living. More
narrow price indices can help producers
with business plans and pricing.
Sometimes, they can be useful in helping
to guide investment.
Types of Price Indices
Consumer price index

Producer price index

Consumer Price Index
A consumer price index (CPI)

measures changes in the price
level of consumer goods and
services purchased by
households.

Formula of Consumer Price Index
Calculation of CPI:
CPI = Price2/Price1
where 1 is usually the comparison year.
Alternatively, the CPI can be computed as
CPI = (Updated cost/Base period cost)*100
The "updated cost" (i.e. the price of an item in a given
year, e.g. the price of bread in 2010) is divided by
the cost in the initial year (the price of bread in 1970), then
multiplied by one hundred.

Producer Price Index
Definition
A Producer Price Index (PPI)
measures average changes in
prices received by domestic
producers for their output. It is
one of several price indices.
Paasche and Laspeyres price indices
The two most basic formulas
used to calculate price indices
are the Paasche and the
Laspeyres index

Laspeyres price and quantity index
1. Price index: (Σp1q0/Σp0q0)*100
2. Quantity index: (Σq1p0/Σq0p0)*100
Paasche price and quantity index
1. Price index: (Σp1q1/Σp0q1)*100
2. Quantity index: (Σq1p1/Σq0p1)*100
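
As an illustration of the price index formulas, the Python sketch below evaluates the Laspeyres and Paasche price indices, and Fisher's ideal index as their geometric mean. The two-item data set is hypothetical, chosen only to show the mechanics, and the function names are illustrative assumptions.

```python
import math

def weighted_sum(prices, quantities):
    """Σpq: multiply price by quantity item by item, then add the products."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres_price(p0, q0, p1):
    return weighted_sum(p1, q0) / weighted_sum(p0, q0) * 100  # base-year weights

def paasche_price(p0, p1, q1):
    return weighted_sum(p1, q1) / weighted_sum(p0, q1) * 100  # current-year weights

# Hypothetical data for two items (not taken from the exercises)
p0, q0 = [10, 20], [5, 2]   # base-year prices and quantities
p1, q1 = [12, 25], [5, 1]   # current-year prices and quantities

L = laspeyres_price(p0, q0, p1)
P = paasche_price(p0, p1, q1)
F = math.sqrt(L * P)        # Fisher's ideal index

print(f"Laspeyres = {L:.2f}, Paasche = {P:.2f}, Fisher = {F:.2f}")
```

The quantity indices follow the same pattern with the roles of prices and quantities exchanged.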
Exercise
Compute the Laspeyres and Paasche price indices for
the following data, with 2000 as the base year:

Item     Price 2000        Quantity 2000   Price 2005   Quantity 2005
Bread    45 Rwf per loaf   80000 loaves    50           100000 loaves
Cheese   500 Rwf           10000 kgs       1000         15000 kgs
Fish     900 Rwf           1000 kgs        750          3000 kgs

Laspeyres and Paasche indices contrasted
1. Paasche indices require actual
quantities to be ascertained for each
year of the series. In contrast, a Laspeyres
index requires quantities to be found for
the base year only.
2. With Paasche indices the denominator
of the formula, Σp0q1, needs
recomputing every year, as q1 changes
yearly.
In the case of Laspeyres index numbers, however, the
denominator, Σp0q0, always remains the
same. A consequence of this is that
the different years in a Laspeyres index can be
directly compared with each other, whereas
in a Paasche series the changing denominator
means that different years can be compared
only with the base year and not with each
other.
Group Work
Compute the Laspeyres and Paasche price indices for
the following data, with 2010 as the base year:

Item     Price 2010        Quantity 2010   Price 2012   Quantity 2012
Bread    45 Rwf per loaf   80000 loaves    60           100000 loaves
Cheese   500 Rwf           10000 kgs       1200         15000 kgs
Fish     900 Rwf           1000 kgs        750          3000 kgs

Chapter Eight
Linear Programming
• Graphing Systems of Linear Inequalities in

Two Variables
• Linear Programming Problems
• Graphical Solutions of Linear
Programming Problems

Graphing Systems of Linear
Inequalities in Two Variables

Graphing Linear Inequalities
• We’ve seen that a linear equation in two variables x and y has a solution set that may be exhibited graphically as points on a straight line in the xy-plane.
• There is also a simple graphical representation for linear inequalities in two variables:

Procedure for Graphing Linear Inequalities
1. Draw the graph of the equation obtained for the given inequality by replacing the inequality sign with an equal sign.
◼ Use a dashed or dotted line if the problem involves a
strict inequality, < or >.
◼ Otherwise, use a solid line to indicate that the line itself
constitutes part of the solution.
2. Pick a test point lying in one of the half-planes determined
by the line sketched in step 1 and substitute the values of
x and y into the given inequality.
◼ Use the origin whenever possible.
3. If the inequality is satisfied, the graph of the inequality
includes the half-plane containing the test point.
◼ Otherwise, the solution includes the half-plane not
containing the test point.
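The test-point procedure can be sketched in a few lines of Python. This is only an illustration: the function names and the way an inequality is passed in (coefficients a, b, a sign string, and a right-hand side c for a·x + b·y sign c) are assumptions made for the example, and the two inequalities used are illustrative inputs.

```python
import operator

# Map each inequality sign to the comparison it performs.
COMPARE = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def satisfies(a, b, sign, c, point):
    """Return True if a*x + b*y (sign) c holds at the point (x, y)."""
    x, y = point
    return COMPARE[sign](a * x + b * y, c)

def describe_half_plane(a, b, sign, c, test_point=(0, 0)):
    """Follow steps 1-3: choose the line style, then test one point."""
    if a * test_point[0] + b * test_point[1] == c:
        raise ValueError("test point lies on the boundary line; pick another")
    # Step 1: strict inequalities get a dashed line, the others a solid line.
    line_style = "dashed" if sign in ("<", ">") else "solid"
    # Steps 2-3: the test point decides which half-plane is shaded.
    side = "containing" if satisfies(a, b, sign, c, test_point) else "not containing"
    return f"{line_style} line; shade the half-plane {side} {test_point}"

# 4x + 3y >= 12: the origin fails the test, so shade the other side.
print(describe_half_plane(4, 3, ">=", 12))
# x - y <= 0: the origin lies on the line, so use (0, 1) instead.
print(describe_half_plane(1, -1, "<=", 0, test_point=(0, 1)))
```

The origin cannot serve as the test point when the boundary line passes through it, which is why the second call supplies a different point.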
Graphing Systems of Linear Inequalities
• The solution set of a system of linear inequalities in two variables x and y is the set of all points (x, y) that satisfy each inequality of the system.
• The graphical solution of such a system may
be obtained by graphing the solution set for
each inequality independently and then
determining the region in common with each
solution set.
Example
• Determine the solution set for the system
    4x + 3y ≥ 12
    x – y ≤ 0
Solution
• The intersection of the solution regions of the two inequalities represents the solution to the system.
[Figure: the half-plane 4x + 3y ≥ 12, bounded by the line 4x + 3y = 12]
[Figure: the half-plane x – y ≤ 0, bounded by the line x – y = 0]
[Figure: the region common to both half-planes, bounded by the lines 4x + 3y = 12 and x – y = 0]
Bounded and Unbounded Sets
▣ The solution set of a system of linear

inequalities is bounded if it can be
enclosed by a circle.
▣ Otherwise, it is unbounded.

Example
• The solution to the problem we just discussed is unbounded, since the solution set cannot be enclosed in a circle.
[Figure: the unbounded region bounded by the lines 4x + 3y = 12 and x – y = 0]

Linear Programming Problems

Linear Programming Problem
▣ A linear programming problem consists of a linear objective function to be maximized or minimized subject to certain constraints in the form of linear equations or inequalities.

Applied Example 1: A Production Problem
▣ Ace Novelty wishes to produce two types of
souvenirs: type-A will result in a profit of $1.00,
and type-B in a profit of $1.20.
▣ To manufacture a type-A souvenir requires 2
minutes on machine I and 1 minute on machine
II.
▣ A type-B souvenir requires 1 minute on machine
I and 3 minutes on machine II.
▣ There are 3 hours available on machine I and 5
hours available on machine II.
▣ How many souvenirs of each type should Ace
make in order to maximize its profit?

Solution
• Let’s first tabulate the given information:
              Type-A   Type-B   Time Available
Profit/Unit   $1.00    $1.20
Machine I     2 min    1 min    180 min
Machine II    1 min    3 min    300 min
• Let x be the number of type-A souvenirs and y the number of type-B souvenirs to be made.

Solution
• Then, the total profit (in dollars) is given by
    P = x + 1.2y
  which is the objective function to be maximized.
▣ The total amount of time that machine I is used is
    2x + y
  and must not exceed 180 minutes.
▣ Thus, we have the inequality
    2x + y ≤ 180
▣ The total amount of time that machine II is used is
    x + 3y
  and must not exceed 300 minutes.
▣ Thus, we have the inequality
    x + 3y ≤ 300
• Finally, neither x nor y can be negative, so
    x ≥ 0, y ≥ 0
▣ In short, we want to maximize the objective function
    P = x + 1.2y
  subject to the system of inequalities
    2x + y ≤ 180
    x + 3y ≤ 300
    x ≥ 0, y ≥ 0
▣ We will discuss the solution to this problem in the section on graphical solutions of linear programming problems.
Applied Example 2: A Nutrition Problem
▣ A nutritionist advises an individual who is suffering
from iron and vitamin B deficiency to take at least
2400 milligrams (mg) of iron, 2100 mg of vitamin B1,
and 1500 mg of vitamin B2 over a period of time.
▣ Two vitamin pills are suitable, brand-A and brand-B.
▣ Each brand-A pill costs 6 cents and contains 40 mg of
iron, 10 mg of vitamin B1, and 5 mg of vitamin B2.
▣ Each brand-B pill costs 8 cents and contains 10 mg of
iron and 15 mg each of vitamins B1 and B2.
▣ What combination of pills should the individual
purchase in order to meet the minimum iron and
vitamin requirements at the lowest cost?
Solution
• Let’s first tabulate the given information:
              Brand-A   Brand-B   Minimum Requirement
Cost/Pill     6¢        8¢
Iron          40 mg     10 mg     2400 mg
Vitamin B1    10 mg     15 mg     2100 mg
Vitamin B2    5 mg      15 mg     1500 mg
• Let x be the number of brand-A pills and y the number of brand-B pills to be purchased.
• The cost C (in cents) is given by
    C = 6x + 8y
  which is the objective function to be minimized.
▣ The amount of iron contained in x brand-A pills and y brand-B pills is given by 40x + 10y mg, and this must be greater than or equal to 2400 mg.
▣ This translates into the inequality
    40x + 10y ≥ 2400
▣ The amount of vitamin B1 contained in x brand-A pills and y brand-B pills is given by 10x + 15y mg, and this must be greater than or equal to 2100 mg:
    10x + 15y ≥ 2100
▣ The amount of vitamin B2 contained in x brand-A pills and y brand-B pills is given by 5x + 15y mg, and this must be greater than or equal to 1500 mg:
    5x + 15y ≥ 1500
▣ In short, we want to minimize the objective function
    C = 6x + 8y
  subject to the system of inequalities
    40x + 10y ≥ 2400
    10x + 15y ≥ 2100
    5x + 15y ≥ 1500
    x ≥ 0, y ≥ 0
▣ We will discuss the solution to this problem in the section on graphical solutions of linear programming problems.
Graphical Solutions
of Linear Programming Problems

Feasible Solution Set and Optimal
Solution
▣ The constraints in a linear programming problem
form a system of linear inequalities, which have a
solution set S.
▣ Each point in S is a candidate for the solution of the
linear programming problem and is referred to as a
feasible solution.
▣ The set S itself is referred to as a feasible set.
▣ Among all the points in the set S, the point or points that optimize the objective function of the linear programming problem are called optimal solutions.

Theorem 1
Linear Programming
▣ If a linear programming problem has a
solution, then it must occur at a vertex, or
corner point, of the feasible set S associated
with the problem.
▣ If the objective function P is optimized at two
adjacent vertices of S, then it is optimized at
every point on the line segment joining these
vertices, in which case there are infinitely
many solutions to the problem.

Theorem 2
Existence of a Solution
• Suppose we are given a linear
programming problem with a feasible
set S and an objective function P = ax +
by.
a. If S is bounded, then P has both a
maximum and a minimum value on S.
b. If S is unbounded and both a and b are
nonnegative, then P has a minimum value
on S provided that the constraints
defining S include the inequalities x ≥ 0
and y ≥ 0.
c. If S is the empty set, then the linear programming problem has no solution; that is, P has neither a maximum nor a minimum value.
The Method of Corners
1. Graph the feasible set.
2. Find the coordinates of all corner points
(vertices) of the feasible set.
3. Evaluate the objective function at each
corner point.
4. Find the vertex that renders the objective
function a maximum or a minimum.
◼ If there is only one such vertex, it
constitutes a unique solution to the
problem.
◼ If there are two such adjacent vertices, there are infinitely many optimal solutions given by the points on the line segment determined by these vertices.
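The method of corners can be sketched in Python for a small problem. The sketch below uses the Ace Novelty data from Applied Example 1 (maximize P = x + 1.2y subject to 2x + y ≤ 180, x + 3y ≤ 300, x ≥ 0, y ≥ 0) and assumes a bounded feasible set. Instead of drawing the graph, it finds corner points by intersecting every pair of boundary lines and keeping the feasible intersections; the function names and the (a, b, c) encoding of a constraint a·x + b·y ≤ c are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations

# Each constraint is (a, b, c), meaning a*x + b*y <= c.
# x >= 0 and y >= 0 are rewritten as -x <= 0 and -y <= 0.
constraints = [(2, 1, 180), (1, 3, 300), (-1, 0, 0), (0, -1, 0)]

def objective(x, y):
    """Profit P = x + 1.2y, kept exact with Fraction."""
    return x + Fraction(6, 5) * y

def intersection(c1, c2):
    """Intersect the boundary lines of two constraints; None if parallel."""
    (a1, b1, r1), (a2, b2, r2) = c1, c2
    det = a1 * b2 - a2 * b1
    if det == 0:
        return None
    x = Fraction(r1 * b2 - r2 * b1, det)
    y = Fraction(a1 * r2 - a2 * r1, det)
    return x, y

def is_feasible(point):
    """Check the point against every constraint."""
    x, y = point
    return all(a * x + b * y <= c for a, b, c in constraints)

# Steps 1-2: corner points are the feasible intersections of boundary lines.
corners = set()
for c1, c2 in combinations(constraints, 2):
    point = intersection(c1, c2)
    if point is not None and is_feasible(point):
        corners.add(point)

# Steps 3-4: evaluate the objective at each corner and keep the largest value.
best = max(corners, key=lambda pt: objective(*pt))
for x, y in sorted(corners):
    print(f"({x}, {y}): P = {float(objective(x, y))}")
print("maximum at", tuple(int(v) for v in best), "with P =", float(objective(*best)))
```

For a minimization problem the same loop applies with min in place of max, provided Theorem 2 guarantees that a minimum exists.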
▣ Recall Applied Example 1 from the previous section, which required us to find the optimal quantities of type-A and type-B souvenirs to produce in order to maximize profit.
▣ We restated the problem as a linear programming problem in which we wanted to maximize the objective function
    P = x + 1.2y
  subject to
    2x + y ≤ 180
    x + 3y ≤ 300
    x ≥ 0, y ≥ 0
▣ We can now solve the problem graphically.

• We first graph the feasible set S for the problem.
– Graph the solution for the inequality 2x + y ≤ 180, considering only nonnegative values for x and y. The boundary line passes through (0, 180) and (90, 0).
– Graph the solution for the inequality x + 3y ≤ 300, considering only nonnegative values for x and y. The boundary line passes through (0, 100) and (300, 0).
– Graph the intersection of the solutions to the inequalities, yielding the feasible set S. (Note that the feasible set S is bounded.)
[Figure: the feasible set S in the first quadrant of the xy-plane]

• Next, find the vertices of the feasible set S.
– The vertices are A(0, 0), B(90, 0), C(48, 84), and D(0, 100).
[Figure: the feasible set S with vertices A, B, C, and D labeled]

• Now, find the values of P at the vertices and tabulate them:
Vertex       P = x + 1.2y
A(0, 0)      0
B(90, 0)     90
C(48, 84)    148.8
D(0, 100)    120

• Finally, identify the vertex with the highest value for P:
– We can see that P is maximized at the vertex C(48, 84) and has a value of 148.8.
– Recalling what the symbols x, y, and P represent, we conclude that Ace Novelty would maximize its profit at $148.80 by producing 48 type-A souvenirs and 84 type-B souvenirs.

▣ Recall Applied Example 2 from the previous section, which asked us to determine the optimal combination of pills to be purchased in order to meet the minimum iron and vitamin requirements at the lowest cost.
▣ We restated the problem as a linear programming problem in which we wanted to minimize the objective function
    C = 6x + 8y
  subject to
    40x + 10y ≥ 2400
    10x + 15y ≥ 2100
    5x + 15y ≥ 1500
    x ≥ 0, y ≥ 0
▣ We can now solve the problem graphically.

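• We first graph the feasible set S for the problem.
– Graph the solution for the inequality 40x + 10y ≥ 2400, considering only nonnegative values for x and y. The boundary line passes through (0, 240) and (60, 0).
– Graph the solution for the inequality 10x + 15y ≥ 2100, considering only nonnegative values for x and y. The boundary line passes through (0, 140) and (210, 0).
– Graph the solution for the inequality 5x + 15y ≥ 1500, considering only nonnegative values for x and y. The boundary line passes through (0, 100) and (300, 0).
– Graph the intersection of the solutions to the inequalities, yielding the feasible set S. (Note that the feasible set S is unbounded.)
[Figure: the feasible set S in the first quadrant of the xy-plane]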
• Next, find the vertices of the feasible set S.
– The vertices are A(0, 240), B(30, 120), C(120, 60), and D(300, 0).
[Figure: the feasible set S with vertices A, B, C, and D labeled]
• Now, find the values of C at the vertices and tabulate them:
Vertex        C = 6x + 8y
A(0, 240)     1920
B(30, 120)    1140
C(120, 60)    1200
D(300, 0)     1800
• Finally, identify the vertex with the lowest value for C:
– We can see that C is minimized at the vertex B(30, 120) and has a value of 1140.
– Recalling what the symbols x, y, and C represent, we conclude that the individual should purchase 30 brand-A pills and 120 brand-B pills at a minimum cost of $11.40.
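The vertex table for the nutrition problem can be checked with a short Python sketch. It takes the four vertices as given in the worked solution, confirms that each one meets the iron and vitamin requirements, and then picks the cheapest. The dictionary layout and function names are illustrative assumptions.

```python
# Requirements in the form a*x + b*y >= minimum, taken from the nutrition table.
requirements = {
    "iron": (40, 10, 2400),
    "vitamin B1": (10, 15, 2100),
    "vitamin B2": (5, 15, 1500),
}

# Corner points of the feasible set S, as listed in the worked solution.
vertices = {"A": (0, 240), "B": (30, 120), "C": (120, 60), "D": (300, 0)}

def cost(x, y):
    """Cost in cents: C = 6x + 8y."""
    return 6 * x + 8 * y

def meets_requirements(x, y):
    """True if (x, y) is nonnegative and satisfies every minimum requirement."""
    return x >= 0 and y >= 0 and all(
        a * x + b * y >= minimum for a, b, minimum in requirements.values()
    )

for name, (x, y) in vertices.items():
    assert meets_requirements(x, y), f"vertex {name} is not feasible"
    print(f"{name}({x}, {y}): C = {cost(x, y)}")

# Pick the vertex with the lowest cost and report it in dollars.
best = min(vertices, key=lambda name: cost(*vertices[name]))
print(f"cheapest vertex: {best}{vertices[best]}, {cost(*vertices[best]) / 100:.2f} dollars")
```

Because S is unbounded here, comparing vertices is justified only by Theorem 2(b): both cost coefficients are nonnegative and the constraints include x ≥ 0 and y ≥ 0, so a minimum exists.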

Chapter Nine: Time Series
“The Art of Forecasting”

Learning Objectives
⚫Describe what forecasting is
⚫Explain time series & its components
⚫Smooth a data series
⚫Moving average
⚫Exponential smoothing
⚫Forecast using trend models

What Is Forecasting?
• Process of predicting a
future event
• Underlying basis of
all business decisions
– Production
– Inventory
– Personnel
– Facilities

Forecasting Approaches
Qualitative Methods
• Used when situation is vague & little data exist
– New products
– New technology
• Involve intuition, experience
• e.g., forecasting sales on Internet
Quantitative Methods
• Used when situation is ‘stable’ & historical data exist
– Existing products
– Current technology
• Involve mathematical techniques
• e.g., forecasting sales of color televisions
Quantitative Forecasting
• Select several forecasting methods
• ‘Forecast’ the past
• Evaluate forecasts
• Select best method
• Forecast the future
• Continuously monitor forecast accuracy

Quantitative Forecasting Methods
• Time series models
– Moving average
– Exponential smoothing
– Trend models
• Causal models
– Regression

What is a Time Series?
• Set of evenly spaced numerical data
– Obtained by observing response variable at
regular time periods
• Forecast based only on past values
– Assumes that factors influencing past, present, &
future will continue
• Example
– Year: 1995 1996 1997 1998 1999
– Sales: 78.7 63.5 89.7 93.2 92.1

Time Series vs. Cross Sectional Data
• Time series data is a sequence of observations collected from a process with equally spaced periods of time.
• Contrary to restrictions placed on cross-sectional data, the major purpose of forecasting with time series is to extrapolate beyond the range of the explanatory variables.
• A time series is dynamic: it changes over time.
• When working with time series data, it is paramount that the data be plotted so the researcher can view it.

Time Series Components
• Trend
• Cyclical
• Seasonal
• Irregular

Trend Component
• Overall upward or downward pattern
• Due to population, technology etc.
• Several years duration; data taken over a period of years
[Figure: sales against time (months, quarters, or years), showing an upward trend]
Cyclical Component
• Repeating upward or downward swings
• Due to interactions of factors influencing the economy
• May vary in length; usually 2-10 years duration
[Figure: sales against time (months, quarters, or years), with one cycle marked]
Seasonal Component
• Regular pattern of upward or downward fluctuations
• Due to weather, customs etc.
• Occurs within one year
[Figure: sales against time (monthly or quarterly), with summer and winter marked]

Random or Irregular Component
• Erratic, unsystematic, random, ‘residual’ fluctuations
• Due to random variation or unforeseen events
– Union strike
– War
– Nature
– Accidents
• Short duration & nonrepeating

Time Series Forecasting
• Does the time series show a trend?
– No: use smoothing methods (moving average, exponential smoothing)
– Yes: use trend models


Plotting Time Series Data

Moving Average Method
• Series of arithmetic means
• Used only for smoothing
– Provides overall impression of data over time
• Used for elementary forecasting

Moving Average Graph
[Figure: actual sales and moving average plotted by year]
Moving Average
[An Example]
You work for “KIGORI” rice. You want to smooth random fluctuations using a 3-period moving average.
Year   Sales
1995   20,000
1996   24,000
1997   22,000
1998   26,000
1999   25,000
Moving Average
[Solution]
Year   Sales    MA(3)
1995   20,000   NA
1996   24,000   (20,000 + 24,000 + 22,000)/3 = 22,000
1997   22,000   (24,000 + 22,000 + 26,000)/3 = 24,000
1998   26,000   (22,000 + 26,000 + 25,000)/3 ≈ 24,333
1999   25,000   NA
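A 3-period moving average like the one in this solution can be computed with a short Python sketch. Each average is centered on the middle year of its window, so the first and last years have no value. The function name and the restriction to odd window lengths are choices made for this illustration. Note that the 1998 value is the mean of 22,000, 26,000 and 25,000, which is about 24,333.

```python
def centered_moving_average(values, window=3):
    """Return centered moving averages; None where the window does not fit."""
    if window % 2 == 0:
        raise ValueError("this sketch handles odd window lengths only")
    half = window // 2
    result = []
    for i in range(len(values)):
        if i < half or i + half >= len(values):
            result.append(None)  # not enough neighbours on one side
        else:
            result.append(sum(values[i - half : i + half + 1]) / window)
    return result

years = [1995, 1996, 1997, 1998, 1999]
sales = [20_000, 24_000, 22_000, 26_000, 25_000]

for year, value, ma in zip(years, sales, centered_moving_average(sales)):
    shown = "NA" if ma is None else f"{ma:,.2f}"
    print(year, f"{value:,}", shown)
```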

Moving Average
Year   Response   Moving Ave
1994   2          NA
1995   5          3
1996   2          3
1997   2          3.67
1998   7          5
1999   6          NA
[Figure: sales plotted by year, 94 to 99]

Exponential Smoothing Method
• Form of weighted moving average
– Weights decline exponentially
– Most recent data weighted most
• Requires smoothing constant (W)
– Ranges from 0 to 1
– Subjectively chosen
• Involves little record keeping of past data

Exponential Smoothing
[An Example]
You’re organizing a Kwanza meeting. You want to forecast attendance for 2000 using exponential smoothing (W = .20). Past attendance (in hundreds) is:
Year   Attendance
1995   4
1996   6
1997   5
1998   3
1999   7
Ei = W·Yi + (1 − W)·Ei−1
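The recursion Ei = W·Yi + (1 − W)·Ei−1 can be sketched in Python as follows. The slides do not say how the first smoothed value is chosen; the sketch assumes E1 = Y1, a common convention, and takes the last smoothed value as the forecast for the period after the data ends. The function name is illustrative.

```python
def exponential_smoothing(values, w):
    """Apply E_i = W*Y_i + (1 - W)*E_{i-1}, starting from E_1 = Y_1 (an assumption)."""
    if not 0 <= w <= 1:
        raise ValueError("smoothing constant W must lie between 0 and 1")
    smoothed = [values[0]]
    for y in values[1:]:
        smoothed.append(w * y + (1 - w) * smoothed[-1])
    return smoothed

years = [1995, 1996, 1997, 1998, 1999]
attendance = [4, 6, 5, 3, 7]  # in hundreds

smoothed = exponential_smoothing(attendance, w=0.20)
for year, y, e in zip(years, attendance, smoothed):
    print(year, y, round(e, 4))

# The last smoothed value serves as the forecast for the next period.
print("forecast for the following year:", round(smoothed[-1], 4))
```

With W = 0.20 the smoothed series reacts slowly to new observations; a value of W closer to 1 would track the most recent data more closely.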
Exponential Smoothing [Graph]
[Figure: actual attendance and exponentially smoothed forecast plotted by year]

Forecast Effect of Smoothing Coefficient (W)
Ŷi+1 = W·Yi + W·(1 − W)·Yi−1 + W·(1 − W)²·Yi−2 + ...

Linear Time-Series Forecasting Model
• Used for forecasting trend

• Relationship between response variable Y &
time X is a linear function
• Coded X values used often
– Year X: 1995 1996 1997 1998 1999
– Coded year: 0 1 2 3 4
– Sales Y: 78.7 63.5 89.7 93.2 92.1

Linear Time-Series Model
[Figure: linear trend lines, sloping upward for b1 > 0 and downward for b1 < 0]

Linear Time-Series Model
[An Example]
You’re a marketing analyst for Inyange Milk. Using coded years, you find Ŷi = .6 + .7Xi.
Year   Coded Year   Sales (Units)
2012   0            1
2013   1            1
2014   2            2
2015   3            2
2016   4            4
2017   5            ?
Forecast 2017 sales.

Linear Time-Series [Example]
2017 forecast sales: Ŷi = .6 + .7·(5) = 4.1
The equation would be different if ‘Year’ were used instead of the coded year.
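The trend equation in this example can be reproduced with an ordinary least-squares fit on the coded years. The sketch below is a plain-Python illustration (the function name is an assumption); it recovers the intercept .6 and slope .7 from the five sales values and then forecasts the next period, coded year 5.

```python
def fit_linear_trend(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

coded_years = [0, 1, 2, 3, 4]
sales = [1, 1, 2, 2, 4]

b0, b1 = fit_linear_trend(coded_years, sales)
forecast = b0 + b1 * 5  # coded year 5 is the next period
print(f"b0 = {b0:.1f}, b1 = {b1:.1f}, forecast = {forecast:.1f}")
```

Using calendar years instead of coded years would leave the slope unchanged but shift the intercept, which is why the fitted equation depends on the coding.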

Trend Analysis

Quadratic Time-Series Forecasting Model
• Used for forecasting trend
• Relationship between response variable Y & time X is a quadratic function
• Coded years used
• Quadratic model: Ŷi = b0 + b1·Xi + b11·Xi²

Quadratic Time-Series Model Relationships
[Figure: quadratic trend curves for b11 > 0 and for b11 < 0]

Quadratic Trend Model
Year   Coded   Sales
94     0       2
95     1       5
96     2       2
97     3       2
98     4       7
99     5       6
[Excel output not reproduced]
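As a rough stand-in for the spreadsheet regression, a quadratic trend Ŷ = b0 + b1·X + b11·X² can be fitted to the table above by solving the least-squares normal equations. The sketch below does this in plain Python with exact fractions; the helper names are illustrative, and since the original Excel output is not available here, the printed coefficients are simply what this calculation yields.

```python
from fractions import Fraction

def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with exact fractions."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a nonzero entry in this column and move it into place.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

def fit_quadratic_trend(x, y):
    """Least-squares b0, b1, b11 for y = b0 + b1*x + b11*x**2 via the normal equations."""
    s = [sum(xi ** k for xi in x) for k in range(5)]                    # sums of x^k, k = 0..4
    t = [sum(yi * xi ** k for xi, yi in zip(x, y)) for k in range(3)]   # sums of y*x^k
    normal_matrix = [[s[i + j] for j in range(3)] for i in range(3)]
    return solve_linear_system(normal_matrix, t)

coded_years = [0, 1, 2, 3, 4, 5]
sales = [2, 5, 2, 2, 7, 6]

b0, b1, b11 = fit_quadratic_trend(coded_years, sales)
print(f"b0 = {float(b0):.4f}, b1 = {float(b1):.4f}, b11 = {float(b11):.4f}")
```

A positive b11 indicates a curve that bends upward over the coded years.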

Exponential Time-Series Forecasting Model
▣ Used for forecasting trend
▣ Relationship is an exponential function
▣ Series increases (decreases) at increasing (decreasing) rate

Exponential Time-Series Model Relationships
[Figure: exponential trend curves for b1 > 1 and for 0 < b1 < 1]

Exponential Weight [Example Graph]
[Figure: sales data and smoothed series plotted by year, 94 to 99]

Exponential Trend Model
Ŷi = b0·b1^Xi  or  log Ŷi = log b0 + Xi·log b1
Year   Coded   Sales
94     0       2
95     1       5
96     2       2
97     3       2
98     4       7
99     5       6
[Excel output of values in logs not reproduced]

Autoregressive Modeling
▣ Used for forecasting trend
▣ Like regression model
◼Independent variables are lagged response
variables Yi-1, Yi-2, Yi-3 etc.
▣ Assumes data are correlated with past data
values
◼1st Order: Correlated with prior period
▣ Estimate with ordinary least squares

Time Series Data Plot

Auto-correlation Plot

Autoregressive Model [An Example]
The Office Concept Corp. has acquired a number of office units (in thousands of square feet) over the last 8 years. Develop the 2nd-order autoregressive model.
Year Units
92 4
93 3
94 2
95 3
96 2
97 2
98 4
99 6

Autoregressive Model [Example Solution]
• Develop the 2nd order table
• Use Excel to run a regression model
Year   Yi   Yi-1   Yi-2
92     4    ---    ---
93     3    4      ---
94     2    3      4
95     3    2      3
96     2    3      2
97     2    2      3
98     4    2      2
99     6    4      2
[Excel output not reproduced]
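The lagged table is the only part of this solution that needs no regression software, and it can be built with a few lines of Python. The sketch below (function name illustrative) keeps only the years for which both lags exist, which are the rows a 2nd-order regression of Yi on Yi-1 and Yi-2 would actually use.

```python
def lagged_table(years, values, order):
    """Return rows (year, Y_i, Y_{i-1}, ..., Y_{i-order}) for every usable period."""
    rows = []
    for i in range(order, len(values)):
        lags = [values[i - k] for k in range(1, order + 1)]
        rows.append((years[i], values[i], *lags))
    return rows

years = [92, 93, 94, 95, 96, 97, 98, 99]
units = [4, 3, 2, 3, 2, 2, 4, 6]

print("Year Yi Yi-1 Yi-2")
for row in lagged_table(years, units, order=2):
    print(*row)
```

Two of the eight observations are lost to the lags, so the regression is estimated from six rows.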

Thanks for your participation !!!!!!!
