Statistics
• Definition of business statistics
• Basic Vocabulary Terms
• Limitation of Statistics
• Uses of Statistics
• Type of data
• Populations and Samples
*
Prepared by Silas BAHIZI, PhD Candidate 8
Types of Statistics
Statistics may be divided into two categories, i.e.
Descriptive and Inferential statistics.
When analyzing data, for example, the marks
achieved by 100 students for a piece of
coursework, it is possible to use both
descriptive and inferential statistics in your
analysis of their marks. Typically, in most
research conducted on groups of people, you
will use both descriptive and inferential
statistics to analyze your results and draw
conclusions.
Descriptive Statistics
Descriptive statistics is the term given to the
analysis of data that helps describe, show or
summarize data in a meaningful way such
that, for example, patterns might emerge from
the data. Descriptive statistics do not,
however, allow us to make conclusions
beyond the data we have analyzed or reach
conclusions regarding any hypotheses we
might have made.
[Figure: population and sample diagram]
[Figure: plot of y (Parts) against Cost ($), with the x-axis running from 50 to 110]
Normal Distribution and Standard
Deviation
Frequency (f) 1 1 1 1 2 5 10 3
The most common numerical descriptive statistic is the average (or mean). For example
a.1,2,3,4,5,6,7,8,9
b.2,4,6,8,7,9
The array organises the data but does not reduce the quantity of data. Therefore, an array is most useful as a quick way of organising a few observations, but becomes less useful when dealing with a great number of observations.
The above data, however, can still be arranged in such a way that the experiment produces 5 different numerical values: 0; 1; 2; 3; 4
Frequency Distribution
Frequency distribution showing the number of heads obtained in four tosses of a coin, in an experiment done 25 times:
Observation (X)        0            1            2            3            4
Frequencies (f)        1            7            10           6            1
Cumulative frequency
Observation (X)        0            1            2            3            4
Frequencies (f)        1            7            10           6            1
Cumulative Frequency   1            8            18           24           25
Relative Frequency     1/25 = 0.04  7/25 = 0.28  10/25 = 0.4  6/25 = 0.24  1/25 = 0.04
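The cumulative and relative frequencies for the coin-toss experiment can be reproduced with a few lines of Python. This is only an illustrative sketch: the variable names are assumptions introduced here, and the rounding to two decimals is chosen to match the table.

```python
from itertools import accumulate

observations = [0, 1, 2, 3, 4]   # number of heads in four tosses
frequencies = [1, 7, 10, 6, 1]   # how often each value was observed

total = sum(frequencies)                                # 25 repetitions
cumulative = list(accumulate(frequencies))              # running totals
relative = [round(f / total, 2) for f in frequencies]   # share of each value

for x, f, c, r in zip(observations, frequencies, cumulative, relative):
    print(f"X={x}  f={f:2d}  cumulative={c:2d}  relative={r:.2f}")
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the table.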
s = √( Σ(X − X̄)² / N ), where X̄ is the mean of X
S is the root mean square of the deviations
from the mean or as it is sometimes called the
Root mean square deviation.
Standard Deviation
Example 3: Consider a Population consisting of the
following eight values:
2,4,4,4,5,5,7,9
These eight data points have the mean of 5 and the
Standard deviation will be 2.
              Standard deviation   Variance
Population    σ                    σ²
Sample        S                    S²
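The population result for the eight values in Example 3 can be checked directly from the root-mean-square definition. The sketch below is an illustration, not part of the original slides. It also calls `statistics.pstdev` (population form, divides by N) and `statistics.stdev` (sample form, divides by n − 1) from the Python standard library. The n − 1 divisor is the library's convention and is not stated in the slides.

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]

# Root mean square of the deviations from the mean (population form).
mean = sum(values) / len(values)
squared_deviations = [(x - mean) ** 2 for x in values]
population_sd = math.sqrt(sum(squared_deviations) / len(values))

# Standard-library equivalents.
library_sd = statistics.pstdev(values)   # divides by N
sample_sd = statistics.stdev(values)     # divides by n - 1

print(mean, population_sd, library_sd, round(sample_sd, 3))
```

The sample standard deviation is slightly larger than the population value because it divides the same sum of squares by 7 instead of 8.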
Solution
Step 1: Arranging the data in ascending order:
2210, 2255, 2260, 2280, 2350, 2390, 2420, 2440, 2450, 2550, 2630, 2825.
Note: n= 12 When n is even number the median
Solution
Step 2:
Position of the ith percentile = (i/100)n
Hence, position of the 50th percentile = (50/100)12 = 6.
Position of the 85th percentile = (85/100)12 = 10.2
Step 3:
Since the position of the 50th percentile is an integer (6), the 50th percentile is the average of the 6th and 7th data values.
Solution
i.e. 50th percentile, P50 = (2390 + 2420)/2 = 2405 = Median
For the 85th percentile, since the position (10.2) is not an integer, we round up. Hence the position of the 85th percentile is the next integer greater than 10.2, the 11th position. Therefore, the 85th percentile corresponds to the 11th data value.
Thus, the 85th percentile, P85 = 2630
(b) The position of the ith quartile (Qi) is (25i/100)n
Solution
Note: All positions are integers (3, 6 and 9 for Q1, Q2 and Q3). Therefore:
Q1= (2260+2280)/2= 2270
Q2= (2390+2420)/2 = 2405
Q3= (2450+2550)/2 =2500
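The position rule used in this solution can be written as a small function. This is a minimal sketch: the function name `percentile` and the integer arithmetic (working with i × n to avoid floating-point positions) are choices made for this illustration. Other textbooks and libraries use different interpolation rules, so results may differ from, for example, spreadsheet functions.

```python
def percentile(sorted_values, i):
    """Return the i-th percentile (0 < i < 100) using the position rule (i/100)n."""
    if not 0 < i < 100:
        raise ValueError("i must be between 1 and 99")
    n = len(sorted_values)
    scaled = i * n                      # position times 100, kept as an integer
    if scaled % 100 == 0:               # position is a whole number k
        k = scaled // 100
        # Average the k-th and (k+1)-th values (1-based positions).
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    k = scaled // 100 + 1               # otherwise round the position up
    return sorted_values[k - 1]


data = [2210, 2255, 2260, 2280, 2350, 2390,
        2420, 2440, 2450, 2550, 2630, 2825]

p50 = percentile(data, 50)
p85 = percentile(data, 85)
q1, q3 = percentile(data, 25), percentile(data, 75)
print(p50, p85, q1, q3)
```

Because the quartiles are the 25th, 50th and 75th percentiles, the same function reproduces Q1, Q2 and Q3.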
10-14 5
15-19 9
20-24 12
25-29 18
30-34 25
35-39 15
40-44 10
45-49 6
Class    Class boundaries   Midpoint   Frequency   Cumulative frequency   Relative frequency   Cumulative relative frequency
10-14    9.5-14.5           12         5           5                      0.05                 0.05
15-19    14.5-19.5          17         9           14                     0.09                 0.14
20-24    19.5-24.5          22         12          26                     0.12                 0.26
25-29    24.5-29.5          27         18          44                     0.18                 0.44
30-34    29.5-34.5          32         25          69                     0.25                 0.69
35-39    34.5-39.5          37         15          84                     0.15                 0.84
40-44    39.5-44.5          42         10          94                     0.1                  0.94
45-49    44.5-49.5          47         6           100                    0.06                 1
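The derived columns of the grouped table (class boundaries, midpoints and the cumulative columns) follow mechanically from the class limits and frequencies. The sketch below assumes whole-number class limits, so each boundary lies 0.5 beyond the limit. The variable names are illustrative only.

```python
from itertools import accumulate

classes = [(10, 14), (15, 19), (20, 24), (25, 29),
           (30, 34), (35, 39), (40, 44), (45, 49)]
frequencies = [5, 9, 12, 18, 25, 15, 10, 6]

total = sum(frequencies)

# Assumption: limits are whole numbers, so boundaries sit 0.5 outside them.
boundaries = [(low - 0.5, high + 0.5) for low, high in classes]
midpoints = [(low + high) / 2 for low, high in classes]

cumulative = list(accumulate(frequencies))
relative = [f / total for f in frequencies]
cumulative_relative = [c / total for c in cumulative]

for row in zip(classes, boundaries, midpoints, frequencies,
               cumulative, relative, cumulative_relative):
    print(*row)
```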
Frequency (f) 1 2 5 8 6 3 5
Scatter Plot Examples (continued)
[Figure: scatter plots contrasting strong relationships with weak relationships]
Scatter Plot Examples (continued)
[Figure: scatter plot showing no relationship]
Correlation Coefficient (continued)
• The population correlation coefficient ρ
(rho) measures the strength of the
association between the variables
• The sample correlation coefficient r is an
estimate of ρ and is used to measure the
strength of the linear relationship in the
sample observations
[Figure: scatter plots illustrating r = -1, r = -.6, r = 0, r = +.3 and r = +1]
Examples of Approximate R2 Values
[Figure: scatter plots illustrating R2 = 1, 0 < R2 < 1 and R2 = 0]
Variables:
X = Independent Variable
Y = Dependent Variable
Parameters:
β0 = Y-Intercept
β1 = Slope
ε = Error
Simple Linear Regression Model
[Figure: fitted line with the slope (= rise/run) and the y-intercept marked]
Y estimate = Ao + A1 X
Probability scale:
0 = event never will occur
.5 = event and "not event" are equally likely to occur
1 = event always will occur
Females 460 40
Standard deviation:
σ = √( Σ(x − μ)² P(x) )
Prob. p:            Success  win   Work       Good  Pass  Open  Odd   Yes  Present
Prob. q = (1 − p):  Failure  lose  Defective  Bad   Fail  Shut  Even  No   Absent
μx =E(x) = n * p
For example, if we toss a coin 40 times, then the mean or expected value would be 40 * 0.5 = 20
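The shortcut μ = n * p can be compared with the general definitions μ = Σ x P(x) and σ = √( Σ(x − μ)² P(x) ). The sketch below assumes the standard binomial probability formula P(x) = C(n, x) pˣ qⁿ⁻ˣ, which does not appear in this part of the slides. The helper name `binomial_pmf` is hypothetical.

```python
import math

n, p = 40, 0.5      # 40 tosses of a fair coin
q = 1 - p


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n trials (standard binomial formula)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


mean_shortcut = n * p
mean_from_definition = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
sd_from_definition = math.sqrt(
    sum((x - mean_from_definition) ** 2 * binomial_pmf(x, n, p)
        for x in range(n + 1))
)

print(mean_shortcut, mean_from_definition, sd_from_definition)
```

Under this assumption the standard deviation from the definition agrees with √(n * p * q), the usual binomial shortcut.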
x is 13 & λ is 9.
Where,
=
• In the above example, there is some new mathematical
notation.
• However, is this finding likely
to hold true in repeated samples?
• What if we drew 6 different people from CBE?
• A one-sample t-test will help answer this question.
• It will tell us if our findings are ‘significant’, or in
other words, likely to be repeated if we took another
sample.
Our sample has ‘n = 6’ people, so the degrees of freedom for this t-test are:
df = n – 1 = 5
This degrees of freedom figure will be used in our test of significance.
Not so fast.
What does this really mean?
• We assume the null hypothesis when
making this test.
• We assume that the population
mean is 100, and therefore we will
most often compute a t = 0.
• Sometimes the computed ‘t’ might
be a bit higher and sometimes a bit
lower.
What does the ‘critical value’ tell us?
H0: μ1 = μ2
H1: μ1 ≠ μ2
X̄1 = 8.7 X̄2 = 5.7
S1 = 0.3 S2 = 1.1
n1 = 89 n2 = 55
• H1: µ1 ≠ µ2
– The research hypothesis contradicts the H0 and declares there is a significant difference between the populations.
Step 3 Select the Sampling Distribution and
Establish the Critical Region
• Z (critical) = ± 1.96
• Solve for Z:
When α = .05, then .025 of the area is distributed on either side of the
curve in area (C )
The .95 in the middle section represents no significant difference
between the two populations.
The cut-off between the middle section and +/- .025 is represented by a
Z-value of +/- 1.96.
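The comparison against the critical region can be sketched in code using the sample figures given earlier (8.7 and 5.7, S1 = 0.3, S2 = 1.1, n1 = 89, n2 = 55). The slide's own formula for Z is not preserved, so this sketch assumes that 8.7 and 5.7 are the two sample means and uses the common large-sample standard error √(S1²/n1 + S2²/n2). Some textbooks divide by n − 1 instead, which changes the value slightly but not the decision here.

```python
import math

mean_1, mean_2 = 8.7, 5.7    # assumed to be the two sample means
s_1, s_2 = 0.3, 1.1          # sample standard deviations
n_1, n_2 = 89, 55            # sample sizes
z_critical = 1.96            # two-tailed test at alpha = .05

# Assumption: large-sample standard error of the difference in means.
standard_error = math.sqrt(s_1 ** 2 / n_1 + s_2 ** 2 / n_2)
z_obtained = (mean_1 - mean_2) / standard_error

# Reject H0 when the obtained Z falls in either tail beyond +/- 1.96.
reject_null = abs(z_obtained) > z_critical
print(round(z_obtained, 2), reject_null)
```

Under these assumptions the obtained Z is far beyond ±1.96, so it falls in the critical region and H0 would be rejected.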
Chapter Seven: Index Numbers
Definition:
An index number is an indicator of the average percentage change in a series of figures, where one figure (called the base) is assigned an arbitrary value of 100 and the other figures are adjusted in proportion to the base.
• Price relative
– The price relative of an item is the ratio of
the price of the item in the current period to
the price of the same item in the base
period
– The formal definition is:
Where
Σpn = the sum of the prices in the current period
Composite index numbers
• Simple aggregate index
– Even though the simple aggregate index is
easy to calculate, it has serious disadvantages:
1. An item with a relatively large price can dominate the index
2. If prices are quoted for different quantities, the simple
aggregate index will yield a different answer
3. It does not take into account the quantity of each item sold
– Disadvantage 2 is perhaps the worst feature of
this index, since it makes it possible, to a
certain extent, to manipulate the value of the
index
where
k = the number of items
pn = the price of an item in the current period
Weighted index numbers
• The use of a weighted index number or weighted index allows greater
importance to be attached to some items
• Information other than simply the change in price over time can then be
used, and can include such factors as quantity sold or quantity consumed
for each item
• Laspeyres index
– The Laspeyres index is also known as the average
of weighted relative prices
– In this case, the weights used are the quantities of
each item bought in the base period
Laspeyres index = (Σ pn qo / Σ po qo) × 100
Where:
qo = the quantity bought (or sold) in the base period
pn = price in current period
po = price in base period
– Thus, the Laspeyres index measures the relative change in the cost of purchasing these items in the quantities specified in the base period
Weighted index numbers
• Paasche index
– The Paasche index uses the consumption in the
current period
– It measures the change in the cost of purchasing
items, in terms of quantities relating to the
current period
– The formal definition of the Paasche index is:
Paasche index = (Σ pn qn / Σ po qn) × 100
Where:
pn = the price in the current period
po = the price in the base period
qn = the quantity bought (or sold) in the current period
Weighted index numbers
• Comparison of the Laspeyres and Paasche indexes
– The Laspeyres index measures the ratio of
expenditures on base year quantities in the
current year to expenditures on those quantities
in the base year
– The Paasche index measures the ratio of
expenditures on current year quantities in the
current year to expenditures on those quantities
in the base year
– Since the Laspeyres index uses base period weights, it may overestimate the rise in the cost of living (because people may have reduced their consumption of items that have become more expensive)
Weighted index numbers
• Comparison of the Laspeyres and Paasche indexes (cont…)
– Since the Paasche index uses current period
weights, it may underestimate the rise in the cost
of living
– The Laspeyres index is usually larger than the
Paasche index
– With the Paasche index it is difficult to make year-
to-year comparisons, since every year a new set of
weights is used
– The Paasche index requires that a new set of
weights be obtained each year, and this
information can be expensive to obtain
– Because of the last 2 points above, the Laspeyres
Weighted index numbers
• Fisher’s ideal index
– Fisher’s ideal index is the geometric mean of
the Laspeyres and Paasche indexes
– Although it has little use in practice, it does
demonstrate the many different types of
index that can be used
– The formal definition is:
Fisher’s ideal index = √(Laspeyres index × Paasche index)
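The three weighted indexes can be compared side by side on a small basket. The sketch below uses hypothetical prices and quantities invented for illustration (they are not from the slides). It computes the Laspeyres index with base-period quantities, the Paasche index with current-period quantities, and Fisher's ideal index as their geometric mean.

```python
import math

# Hypothetical two-item basket: (base price, current price,
#                                base quantity, current quantity)
basket = {
    "item A": (2.0, 3.0, 10, 8),
    "item B": (5.0, 6.0, 4, 5),
}


def laspeyres(items):
    """Cost of base-period quantities at current prices vs base prices."""
    current = sum(pn * q0 for p0, pn, q0, qn in items.values())
    base = sum(p0 * q0 for p0, pn, q0, qn in items.values())
    return current / base * 100


def paasche(items):
    """Cost of current-period quantities at current prices vs base prices."""
    current = sum(pn * qn for p0, pn, q0, qn in items.values())
    base = sum(p0 * qn for p0, pn, q0, qn in items.values())
    return current / base * 100


def fisher(items):
    """Geometric mean of the Laspeyres and Paasche indexes."""
    return math.sqrt(laspeyres(items) * paasche(items))


print(round(laspeyres(basket), 1), round(paasche(basket), 1),
      round(fisher(basket), 1))
```

In this made-up basket, buyers shift away from the item whose price rose proportionally more, so the Laspeyres value comes out above the Paasche value and Fisher's index lies between them.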
• One of the uses for price indexes is to measure the changes in the
purchasing power of the dollar
• This is known as deflation
• In order to eliminate the effect of inflation and obtain a clear
picture of the ‘real’ change, the values must be deflated
• For example, to deflate an actual salary and express it in terms of
‘real’ salary (of the base year), use:
1. Price Index: (Σp1q0 / Σp0q0) * 100
2. Quantity Index: (Σq1p0 / Σq0p0) * 100
Paasche price and quantity Index
Example
• Determine the solution set for the system
4x + 3y ≥ 12
x – y ≤ 0
Solution
• The intersection of the solution regions of the two inequalities represents the solution to the system:
[Figure: solution region of 4x + 3y ≥ 12, bounded by the line 4x + 3y = 12]
[Figure: solution region of x – y ≤ 0, bounded by the line x – y = 0]
[Figure: intersection of the two regions, bounded by the lines 4x + 3y = 12 and x – y = 0]
Bounded and Unbounded Sets
[Figure: region bounded by the lines 4x + 3y = 12 and x – y = 0]
Maximize
Subject to
[Figure: solution of one inequality, considering only positive values for x and y; boundary line through (0, 180) and (90, 0)]
[Figure: solution of another inequality, considering only positive values for x and y; boundary line through (0, 100) and (300, 0)]
[Figure: feasible set S with vertices A(0, 0), B(90, 0), C(48, 84) and D(0, 100)]
Vertex        P = x + 1.2 y
A(0, 0)       0
B(90, 0)      90
C(48, 84)     148.8
D(0, 100)     120
[Figure: solution of one inequality, considering only positive values for x and y; boundary line through (0, 240) and (60, 0)]
Applied Example 2: A Nutrition Problem
• We first graph the feasible set S for the problem.
– Graph the solution for each inequality, considering only positive values for x and y:
[Figure: boundary line through (0, 140) and (210, 0)]
[Figure: boundary line through (0, 100) and (300, 0)]
– Graph the intersection of the solutions to the inequalities, yielding the feasible set S. (Note that the feasible set S is unbounded.)
[Figure: unbounded feasible set S]
Applied Example 2: A Nutrition Problem
• Next, find the vertices of the feasible set S.
– The vertices are A(0, 240), B(30, 120), C(120, 60), and D(300, 0).
[Figure: feasible set S with vertices A, B, C and D marked]
Applied Example 2: A Nutrition Problem
• Now, find the values of C at the vertices and tabulate them:
Vertex        C = 6x + 8y
A(0, 240)     1920
B(30, 120)    1140
C(120, 60)    1200
D(300, 0)     1800
[Figure: feasible set S with vertices A, B, C and D marked]
Applied Example 2: A Nutrition Problem
• Finally, identify the vertex with the lowest
value for C:
– We can see that C is minimized at the vertex B(30,
120) and has a value of 1140.
– Recalling what the symbols x, y, and C represent,
we conclude that the individual should purchase
30 brand-A pills and 120 brand-B pills at a
minimum cost of $11.40.
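The vertex evaluation for the nutrition problem is easy to check in code. The sketch below takes the four vertices and the cost function C = 6x + 8y from the slides. It treats C as a cost in cents, which is inferred from the $11.40 result. The names `vertices` and `cost` are illustrative.

```python
# Vertices of the feasible set S, as listed in the example.
vertices = {"A": (0, 240), "B": (30, 120), "C": (120, 60), "D": (300, 0)}


def cost(x, y):
    """Objective function C = 6x + 8y (assumed to be in cents)."""
    return 6 * x + 8 * y


# Evaluate the objective at every vertex and pick the smallest value.
costs = {name: cost(x, y) for name, (x, y) in vertices.items()}
best = min(costs, key=costs.get)

print(costs)
print(best, vertices[best], f"${costs[best] / 100:.2f}")
```

Checking only the vertices works here because the cost coefficients are positive and x and y are non-negative, so the cost cannot decrease along the unbounded directions of S.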
[Diagram: Quantitative Forecasting, with Time Series Models as one branch]
A time series is dynamic: it changes over time.
[Diagram: time series components: Trend, Cyclical, Seasonal, Irregular]
Cyclical Component
• Repeating up & down movements
• Due to interactions of factors influencing
economy
• Usually 2-10 years duration
[Figure: Response against Time, with one cycle marked]
Seasonal Component
• Regular pattern of up & down fluctuations
• Due to weather, customs etc.
• Occurs within one year
[Figure: Response by month or quarter, with Summer marked]
Seasonal Component
• Upward or Downward Swings
• Regular Patterns
• Observed Within One Year
[Figure: Sales over time, with Winter marked]
[Diagram: decision tree. Trend? No: Smoothing Methods (Moving Average, Exponential Smoothing). Yes: Trend Models.]
Moving Average
[An Example]
[Figure: moving average example plotted by Year, 94 to 99]
Forecast Effect of Smoothing Coefficient
(W)
[Figure: smoothed series plotted by Year, 94 to 99; panel labels b1 > 0, b1 < 0, b1 > 1 and 0 < b1 < 1]
The Office Concept Corp. has acquired a number of office units (in thousands of square feet) over the last 8 years. Develop the 2nd-order autoregressive model.
Year Units
92 4
93 3
94 2
95 3
96 2
97 2
98 4
99 6
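One way to approach this exercise is sketched below. It assumes the usual form of a 2nd-order autoregressive model, Y[i] = a0 + a1*Y[i-1] + a2*Y[i-2], fitted by ordinary least squares. Neither the model form nor the fitting method is spelled out in the slides. The function name `fit_ar2` is hypothetical, and the normal equations are solved exactly with `fractions.Fraction`, so no external library is needed.

```python
from fractions import Fraction

units = [4, 3, 2, 3, 2, 2, 4, 6]   # years 92 to 99, from the table above


def fit_ar2(series):
    """Least-squares fit of Y[i] = a0 + a1*Y[i-1] + a2*Y[i-2]."""
    # The first two years have no complete lags, so 6 rows remain.
    rows = [(1, series[t - 1], series[t - 2]) for t in range(2, len(series))]
    targets = [series[t] for t in range(2, len(series))]

    # Augmented normal equations (X'X | X'y), kept exact with Fraction.
    system = [
        [Fraction(sum(r[i] * r[j] for r in rows)) for j in range(3)]
        + [Fraction(sum(r[i] * y for r, y in zip(rows, targets)))]
        for i in range(3)
    ]

    # Gauss-Jordan elimination on the 3x4 augmented matrix.
    for col in range(3):
        pivot = next(r for r in range(col, 3) if system[r][col] != 0)
        system[col], system[pivot] = system[pivot], system[col]
        system[col] = [v / system[col][col] for v in system[col]]
        for r in range(3):
            if r != col:
                factor = system[r][col]
                system[r] = [v - factor * p
                             for v, p in zip(system[r], system[col])]
    return [float(system[i][3]) for i in range(3)]


a0, a1, a2 = fit_ar2(units)
# One-step-ahead forecast from the last two observed years (98 and 99).
forecast_next = a0 + a1 * units[-1] + a2 * units[-2]
print(a0, a1, a2, forecast_next)
```

Under these assumptions the fitted model is Y[i] = 3.5 + 0.8125*Y[i-1] − 0.9375*Y[i-2], giving a one-step-ahead forecast of 4.625 thousand square feet. With only six usable observations, the coefficients should be read as an illustration of the method rather than a reliable forecast.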