
Index Numbers

Introduction
• Index numbers are commonly used in economics, business, policy, etc
• An Index number is a summary measure that states a relative comparison between groups of
related items.
• In its simplest form, an index number is nothing more than a price relative: a percentage figure that
expresses the relationship between two numbers, with one of the numbers used as the base
• Example:
• Consumer Price Index
• Index of Industrial Production
• Stock price index
• Cost of living index across cities
• Index numbers are generally constructed by government agencies
• Some private agencies also publish index numbers
Why Index numbers?
• To study the relative changes of a group of related items in an easily
comprehensible manner
• Example: Prices of Food items, production of food items etc
• To study the average of changes among a set of variables with respect
to a benchmark
Introduction
• For instance, consider the CPI (2012 = 100) and the IIP (2011-12 = 100)
• CPI for 2015-16 = 124.7, which means that, compared with the base period (2012), the CPI has
increased 24.7%
• CPI for 2019-20 = 146.3. Compared with the base period, the CPI increase is 46.3%
• IIP for May 2021 = 116.0
• IIP for June 2021 = 122.6

Year       CPI (2012=100)
2011-12     93.3
2012-13    102.5
2013-14    112.2
2014-15    118.9
2015-16    124.7
2016-17    130.3
2017-18    135.0
2018-19    139.6
2019-20    146.3

IIP (2011-12=100)
Month       2020-21   2021-22
April         54.0     126.7
May           90.2     116.0
June         107.9     122.6
July         117.9
August       117.2
September    124.1
October      129.6
November     126.7
December     137.4
January      136.6
February     129.9
March        145.6
Relative Prices
• The simplest form of price index is the relative price

\[ \text{Relative price in period } t = \frac{\text{price in period } t}{\text{price in base period}} \times 100 \]

\[ RP_t = \frac{P_t}{P_0} \times 100 \]

Year   Price per gallon ($)   Relative price (1990=100)
1990   1.30   100
1991   1.10   84.6
1992   1.09   83.8
1993   1.07   82.3
1994   1.08   83.1
1995   1.11   85.4
1996   1.22   93.8
1997   1.20   92.3
1998   1.03   79.2
1999   1.14   87.7
2000   1.48   113.8
2001   1.42   109.2
2002   1.34   103.1
2003   1.56   120.0
2004   1.85   142.3
2005   2.27   174.6
2006   2.57   197.7
2007   2.80   215.4
2008   3.25   250.0
2009   2.35   180.8
2010   2.78   213.8
2011   3.52   270.8

• Thus, the price of gasoline per gallon in 2000 was 13.8% higher than it was in 1990, and in 2011
it was 170.8% higher
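The relative price calculation can be sketched in a few lines of Python. This is only an illustration: the function name `relative_price` is an assumption made here, and the three prices are taken from the gasoline table.

```python
def relative_price(price_t: float, price_base: float) -> float:
    """Relative price of period t, with the base period set to 100."""
    return price_t / price_base * 100


# Gasoline prices per gallon from the table (base year 1990).
prices = {1990: 1.30, 2000: 1.48, 2011: 3.52}
base_price = prices[1990]

relatives = {year: round(relative_price(p, base_price), 1) for year, p in prices.items()}
print(relatives)
```

Subtracting 100 from a relative price gives the percentage change since the base year.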
Example of GDP Deflator and Index

GDP Deflator and Index


Year      Current Prices   Constant Prices   Deflator   Current Price Index   Constant Price Index
          (Crs)            (Crs, 2011-12 BY)            (2011-12 BY)          (2011-12 BY)
2009-10   63,66,407        76,51,078         83.2%      72.9%                 87.6%
2010-11   76,34,472        83,01,235         92.0%      87.4%                 95.0%
2011-12   87,36,329        87,36,329         100.0%     100.0%                100.0%
2012-13   99,44,013        92,13,017         107.9%     113.8%                105.5%
2013-14   1,12,33,522      98,01,370         114.6%     128.6%                112.2%
2014-15   1,24,67,959      1,05,27,674       118.4%     142.7%                120.5%
2015-16   1,37,71,874      1,13,69,493       121.1%     157.6%                130.1%
2016-17   1,53,91,669      1,23,08,193       125.1%     176.2%                140.9%
2017-18   1,70,90,042      1,31,44,582       130.0%     195.6%                150.5%
2018-19   1,88,86,957      1,40,03,316       134.9%     216.2%                160.3%
2019-20   2,03,51,013      1,45,69,268       139.7%     232.9%                166.8%
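The figures in the table appear to follow two simple ratios: the deflator is GDP at current prices divided by GDP at constant prices, and each index is the year's value divided by the 2011-12 value. The sketch below checks this reading for two rows. The function names are assumptions introduced for illustration.

```python
def deflator(current: float, constant: float) -> float:
    """GDP deflator in percent: current-price GDP over constant-price GDP."""
    return current / constant * 100


def index_on_base(value: float, base_value: float) -> float:
    """Index of a value relative to the base-year value (base = 100)."""
    return value / base_value * 100


# (current prices, constant prices) in Rs. crore, copied from the table.
gdp = {
    "2009-10": (6366407, 7651078),
    "2011-12": (8736329, 8736329),
    "2019-20": (20351013, 14569268),
}
base_current, base_constant = gdp["2011-12"]

for year, (current, constant) in gdp.items():
    print(
        year,
        round(deflator(current, constant), 1),
        round(index_on_base(current, base_current), 1),
        round(index_on_base(constant, base_constant), 1),
    )
```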


Aggregate Price Index
• Relative prices are useful for understanding the price changes of
individual items, but they cannot measure the general price change for a
group of items
• For instance, index numbers are needed to measure changes in the overall cost
of living, which may include food, housing, clothing, transportation,
medical care, etc.
• In such situations an aggregate price index is used
Aggregate Price Index
• For example, consider four items of expenditure in automotive operation and the
construction of an index between 1990 and 2011

Automotive Operating Expenses
Item                  1990      2011
Gallon of gasoline    1.30      3.52
Quart of oil          2.10      6.25
Tire                130.00    145.00
Insurance policy    820.00   1040.00
Total               953.40   1194.77

• Unweighted Index
\[ I_t = \frac{\sum P_t}{\sum P_0} \times 100 \]
• It = 1194.77 / 953.40 × 100 = 125.32
• Limitations of the unweighted index
• Influence of the size of the per-unit price
Aggregate Price Index – Relative Price
• Unweighted Average of Relative Price Index
\[ I_t = \frac{\sum \frac{P_t}{P_0}}{n} \times 100 \]

Automotive Operating Expenses
Item                  1990      2011     RP
Gallon of gasoline    1.30      3.52    271%
Quart of oil          2.10      6.25    298%
Tire                130.00    145.00    112%
Insurance policy    820.00   1040.00    127%
Average                                 202%

• It = 202%
• Limitations of the unweighted average of relative prices index
• Percentage changes in opposite directions net off
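Both unweighted indices for the automotive operating expenses can be reproduced with a short sketch. The variable names are assumptions chosen for illustration, and the prices are those in the tables.

```python
# 1990 (base) and 2011 prices: gasoline, oil, tire, insurance policy.
p0 = [1.30, 2.10, 130.00, 820.00]
pt = [3.52, 6.25, 145.00, 1040.00]

# Unweighted aggregate index: ratio of the price totals.
aggregate_index = sum(pt) / sum(p0) * 100

# Unweighted average of relative prices: mean of the item-wise price ratios.
relatives = [t / b for t, b in zip(pt, p0)]
mean_relative_index = sum(relatives) / len(relatives) * 100

print(round(aggregate_index, 2))
print(round(mean_relative_index))
```

The gap between the two results shows the first limitation: the aggregate index is dominated by the items with the largest per-unit price (the insurance policy and the tire), while the average of relatives treats every item equally.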
Weighted Index
• Each item in the group should be weighted according to its importance
• Quantity of usage is generally used as the weight
\[ I_t = \frac{\sum P_t Q}{\sum P_0 Q} \times 100 \]

Automotive Operating Expenses
Item                 Typical usage per year    1990      2011
Gallon of gasoline   1000                      1.30      3.52
Quart of oil         15                        2.10      6.25
Tire                 2                       130.00    145.00
Insurance policy     1                       820.00   1040.00
Total (price × usage)                        2411.5   4943.75

• Weighted Index = 4943.75 / 2411.5 × 100 = 205.0
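As a quick check of the weighted index, the sketch below multiplies each price by its typical usage before summing. The variable names are illustrative assumptions.

```python
# Typical usage per year and prices: gasoline, oil, tire, insurance policy.
usage = [1000, 15, 2, 1]
p0 = [1.30, 2.10, 130.00, 820.00]   # 1990 prices
pt = [3.52, 6.25, 145.00, 1040.00]  # 2011 prices

base_cost = sum(p * q for p, q in zip(p0, usage))     # sum of P0 * Q
current_cost = sum(p * q for p, q in zip(pt, usage))  # sum of Pt * Q

weighted_index = current_cost / base_cost * 100
print(round(base_cost, 2), round(current_cost, 2), round(weighted_index, 1))
```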
Laspeyres Index
• Weights are fixed at the levels believed to be representative of typical
usage
• Once established, they would remain constant for all periods of
time the index is in use
• Generally, usages in the base year are considered as weights
\[ Q = Q_0 \]
• Such an index is called the Laspeyres Index
\[ I_t = \frac{\sum P_t Q_0}{\sum P_0 Q_0} \times 100 \]
Paasche Index
• Instead of base-year weights, if the index is calculated using current-year
weights, then the index is called the Paasche Index
\[ I_t = \frac{\sum P_t Q_t}{\sum P_0 Q_t} \times 100 \]
• The Paasche Index has the advantage of using current usage, but it has
some problems as well
• Weights have to be obtained every new year, which is more expensive
• Every time new usage data comes in, the whole series of index numbers (for the
past years) has to be recalculated
Aggregate Index from Relative Prices
• Many times, individual relative prices are available
• One can construct an aggregate index from relative prices
\[ I_t = \frac{\sum \frac{P_t}{P_0} w}{\sum w} \times 100 \]
• Where w is the weight of each item in the index
• The weight is usually the value of the item in the base year
\[ w = P_0 Q_0 \]

Automotive Operating Expenses Index - Relative Price Method
Item                  1990      2011     Relative price   Typical usage per year   Weight   Relative price × weight
Gallon of gasoline    1.30      3.52     2.71             1000                     1300     3520
Quart of oil          2.10      6.25     2.98             15                       31.5     93.75
Tire                130.00    145.00     1.12             2                        260      290
Insurance policy    820.00   1040.00     1.27             1                        820      1040
Total                                                                              2411.5   4943.75
Quantity Index Numbers
• Just as price index numbers, quantity index numbers also can be
constructed
• The only difference is that the roles of P and Q are interchanged.
• Q becomes the variable of interest, whereas P would be the weight
• Laspeyres Quantity Index
\[ I_t = \frac{\sum Q_t P_0}{\sum Q_0 P_0} \times 100 \]
• Paasche Quantity Index
\[ I_t = \frac{\sum Q_t P_t}{\sum Q_0 P_t} \times 100 \]
Quantity Index
Item   Q (2010)   Q (2011)   P (2010)   P (2011)
A      10         20         5          5
B      45         25         4          8
C      40         60         2          6
D      25         40         3          4
Fisher ‘Ideal’ Index Number
• The Fisher ideal index number is defined as the geometric mean of the Laspeyres
index number and the Paasche index number
\[ I_t = \sqrt{\frac{\sum P_t Q_0}{\sum P_0 Q_0} \times \frac{\sum P_t Q_t}{\sum P_0 Q_t}} \times 100 \]
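The three weighted price indices can be compared side by side. The sketch below is only an illustration under stated assumptions: it reuses the four-item 2010/2011 data from the Quantity Index table, treats 2010 as the base year, and uses function names invented here.

```python
import math

# Items A-D: base-year (2010) and current-year (2011) quantities and prices.
q0 = [10, 45, 40, 25]
qt = [20, 25, 60, 40]
p0 = [5, 4, 2, 3]
pt = [5, 8, 6, 4]


def value(prices, quantities):
    """Sum of price * quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))


def laspeyres_price_index():
    return value(pt, q0) / value(p0, q0) * 100  # base-year weights


def paasche_price_index():
    return value(pt, qt) / value(p0, qt) * 100  # current-year weights


def fisher_price_index():
    # Geometric mean of the Laspeyres and Paasche indices.
    return math.sqrt(laspeyres_price_index() * paasche_price_index())


print(round(laspeyres_price_index(), 2))
print(round(paasche_price_index(), 2))
print(round(fisher_price_index(), 2))
```

The Fisher index always lies between the other two, since it is their geometric mean.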
Exercise
A security analyst wishes to construct an index to reflect the changes in the prices of
five stocks in a certain portfolio. Average prices and quantities of his purchases
during the year are given in the table.

         2019         2020         2021
Stock    P0    Q0     P1    Q1     P2    Q2
A        20    100    30    200    25    200
B        40    200    35    150    40    100
C        50    150    70    250    60    100
D        35    300    40    250    45    200
E        20    200    15    400    25    300

• Construct an unweighted aggregates index using 2019 as the base year. What are the
disadvantages of this type of index?
• Construct an unweighted relative price aggregate index using 2019 as the base year. Discuss
its disadvantages.
• Construct a Laspeyres index using 2019 as the base year. What are the disadvantages of this
type of index?
• Construct a Paasche index using 2019 as the base year. What are the disadvantages of this type
of index?
• Construct a Fisher Ideal Index using 2019 as the base year
Exercise
• A small electric fan company produces household fans.
The average unit selling price and number of units sold in
2019 and 2020 are given.
• Calculate the price index of fans for 2020 with 2019 as the base year.
Use the weighted arithmetic mean of relative prices
method with base-period weights

            2019                  2020
Model       Price   Units sold    Price   Units sold
High-end    3560    20            3750    30
Economy     2500    45            2250    42
Budget      1000    60            1090    55


Deflating a Series by Price Index
• Usually, variables such as wage rate, salary, inventory, etc. are given in nominal currency
values
• The actual (real) value of such variables can be arrived at only after deflating the numbers with an
appropriate price index
• Example of hourly wages of electricians and the consumer price index is given

Year   Hourly wages ($)   CPI     Deflated hourly wages
2007   23.12              207.3   11.15
2008   23.98              215.3   11.14
2009   24.45              214.5   11.40
2010   24.91              218.1   11.42
2011   25.44              224.9   11.31

[Figure: hourly wage rate (left axis) and deflated hourly wage rate (right axis), 2007 to 2011]
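The deflated column in the electricians' wage table is consistent with dividing the nominal wage by the CPI and multiplying by 100. A minimal sketch of that reading follows. The function name `deflate` is an assumption.

```python
def deflate(nominal: float, price_index: float) -> float:
    """Express a nominal value in base-period prices (price index base = 100)."""
    return nominal / price_index * 100


# (hourly wage in $, CPI) per year, copied from the table.
wages = {
    2007: (23.12, 207.3),
    2008: (23.98, 215.3),
    2009: (24.45, 214.5),
    2010: (24.91, 218.1),
    2011: (25.44, 224.9),
}

real_wages = {year: round(deflate(w, cpi), 2) for year, (w, cpi) in wages.items()}
print(real_wages)
```

Although the nominal wage rises every year, the deflated wage falls in 2008 and 2011, which is the point of deflating the series.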
Deflating a Series by Price Index
• Example of daily wage rate of rural labour in India

Year      Daily wage rate of rural labour (Rs)   CPI   Deflated wages
2014-15   224.6                                  119   188
2015-16   236.9                                  126   188
2016-17   252.6                                  132   191
2017-18   267.1                                  137   195
2018-19   277.4                                  141   196
2019-20   286.6                                  147   195

[Figure: daily wage rate (left axis) and deflated daily wage rate (right axis), 2014-15 to 2019-20]
Shifting base and Splicing
• Sometimes, it becomes necessary to shift the base of index numbers from one
period to another, without returning to the original raw data and recomputing the
entire series. This change of reference base period is called shifting the base

Year      CPI (Base: 2001 = 100 for IW)   Reworked series with base 2011-12 = 100
2005-06   117                             60
2006-07   125                             64
2007-08   133                             68
2008-09   145                             74
2009-10   163                             84
2010-11   180                             92
2011-12   195                             100
2012-13   215                             110
2013-14   236                             121
2014-15   251                             129
2015-16   265                             136
2016-17   276                             142
2017-18   284                             146
2018-19   300                             154
2019-20   323                             166
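Shifting the base amounts to dividing every value of the series by the value in the new base period and multiplying by 100. The sketch below applies this to a few rows of the CPI table. The function name `shift_base` is an assumption made for illustration.

```python
def shift_base(series: dict, new_base_period: str) -> dict:
    """Rescale an index series so that new_base_period equals 100."""
    base_value = series[new_base_period]
    return {period: value / base_value * 100 for period, value in series.items()}


# CPI for industrial workers (2001 = 100), selected years from the table.
cpi_2001_base = {"2005-06": 117, "2010-11": 180, "2011-12": 195, "2019-20": 323}

reworked = shift_base(cpi_2001_base, "2011-12")
print({period: round(value) for period, value in reworked.items()})
```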
Shifting base and Splicing
• Sometimes, it becomes necessary to stitch together series of different base
periods into a single series. This is called splicing

Year      CPI-IW (1982 = 100)   CPI-IW (2001 = 100)
1993-94   258
1994-95   284
1995-96   313
1996-97   342
1997-98   366
1998-99   414
1999-00   428
2000-01   444                   100
2001-02   463                   104
2002-03   482                   109
2003-04   500                   113
2004-05   520                   117
2005-06   519                   117
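One common way to splice is to use the overlapping year (2000-01 here, where the old series reads 444 and the new series reads 100) as a link factor and rescale the older values onto the new base. The sketch below is a hedged illustration of that idea. The variable names are assumptions, and the backward-extended values are computed here rather than taken from the slide.

```python
# Old-base series up to the overlap year, and new-base series from it onward.
old_series = {
    "1993-94": 258, "1994-95": 284, "1995-96": 313, "1996-97": 342,
    "1997-98": 366, "1998-99": 414, "1999-00": 428, "2000-01": 444,
}
new_series = {
    "2000-01": 100, "2001-02": 104, "2002-03": 109,
    "2003-04": 113, "2004-05": 117, "2005-06": 117,
}

overlap = "2000-01"
link = new_series[overlap] / old_series[overlap]  # 100 / 444

# Rescale the old values to the new base, then append the new series.
spliced = {year: round(value * link, 1) for year, value in old_series.items()}
spliced.update(new_series)

for year, value in spliced.items():
    print(year, value)
```

The same factor also explains the second column of the table: 463 × 100 / 444 is about 104, matching the 2001-02 entry.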
Thank you
G Nagaraju
nagaraju@nibmindia.org
M: 9665875253
Statistics
 Statistics is a combination of data and analytical methods, including theory and
techniques
 Understanding given data
 Summarization
 Taking a business decision
 Research
 Data collection (secondary or primary)
 Analysis
 Inference
Inference
 Business decision making depends upon the perception of underlying population
 What is the average daily productivity of a unit?
 Is the supply of acceptable quality?
 Are the two populations significantly different?
 Inference has wide applications in Business, Government, Academics, Medical
field and many other fields
Inference Methods
 Inferences are drawn on population based on analysis of samples
 Broad contents of process of Inference:
 Theory of Sampling and sampling errors
 Theory of probability for studying issues of uncertainty
 Probability based Estimation of population parameters
 Test of Hypothesis, Interval estimation
Conceptual Approaches to Probability
 There are three different types of conceptual approaches to probability
1. Classical Probability
2. Relative Frequency of Occurrence
3. Subjective Probability
Classical Probability
 Classical Probability
 If there are a possible outcomes favourable to the occurrence of an event A and b
possible outcomes unfavourable to A, and if all outcomes are equally likely and
mutually exclusive, then the probability that A will occur, P(A), is
\[ P(A) = \frac{a}{a+b} \]
\[ P(A) = \frac{\text{number of outcomes favourable to the occurrence of } A}{\text{total number of possible outcomes}} \]
 All the outcomes are Mutually Exclusive and Exhaustive (MEE)
 The probabilities are derived purely on logic; no experiments are conducted and no data are collected to arrive at them
Relative Frequency of Occurrence
 In problems where it is difficult to establish all possible outcomes, classical probability
approach fails
 Example:
 Probability that a customer walking into the shop actually makes a purchase
 Probability of a train arriving with a delay of less than 30 minutes

 Relative Frequency Occurrence


 Proportion of times an event occurs in the long run under uniform stable conditions

 Probability in such cases is calculated based on collected data


Subjective Probability
 Subjective Probability
 The degree of belief or confidence placed in the occurrence of an event by a particular
individual based on the evidence available

 Example:
 Financial Analysts assigning probability for a rate cut
 Higher the likelihood of occurrence of event, assigned probability will be closer to 1
Basic Concepts
 Probability
 Numbers assigned to elements in the sample space such that
 They are greater than or equal to 0
 They must add up to 1
S = {A, B, C}

P(A) ≥ 0
P(B) ≥ 0
P(C) ≥ 0

P(A) + P(B) + P(C) = 1

P(A) + P(Ā) = 1
Counting Principles: Permutations
Consider a group of n objects. In how many ways can x objects (x < n) be selected and arranged?

\[ {}^nP_x = n \times (n-1) \times (n-2) \times \cdots \times (n-x+1) \]
\[ {}^nP_x = \frac{n!}{(n-x)!} \]

Example:
If there are 5 people competing for three positions, viz., GM, GDM and AGM, in how many ways is the
selection possible?
n = 5
x = 3
\[ {}^nP_x = \frac{n!}{(n-x)!} = \frac{5!}{2!} = 60 \]
Counting principles: Combinations
 In permutations, the order of elements is important. If the order is not important, then we
have the situation of ‘combinations’
\[ \binom{n}{x} = {}^nC_x = \frac{{}^nP_x}{x!} = \frac{n!}{x! \times (n-x)!} \]

Example: Suppose the three positions in the earlier example are the
same (say AGM). In how many ways can the five competitors get selected?
\[ \binom{n}{x} = \frac{5!}{3! \times 2!} = 10 \]

Assuming A, B, C, D and E are the candidates:
Combination   Candidates
1             ABC
2             ABD
3             ABE
4             ACD
5             ACE
6             ADE
7             BCD
8             BCE
9             BDE
10            CDE
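The two counting results (60 ordered selections, 10 unordered ones) can be checked with Python's standard library. `math.perm` and `math.comb` need Python 3.8 or later, and the candidate labels A to E follow the example.

```python
import math
from itertools import combinations

candidates = "ABCDE"

# Ordered selection of 3 out of 5 (distinct positions): 5! / 2!
n_permutations = math.perm(5, 3)

# Unordered selection of 3 out of 5 (identical positions): 5! / (3! * 2!)
n_combinations = math.comb(5, 3)

# Enumerate the combinations explicitly, in the same order as the table.
groups = ["".join(group) for group in combinations(candidates, 3)]

print(n_permutations, n_combinations)
print(groups)
```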
Exercises
 A panel of consumers is given 4 different package designs to rank them in preference order.
How many different rankings could the panel have given?
 A salesperson has shown sarees of 12 different colours to her customer, who wants to choose 4
sarees to purchase. If one of the selected colours is azure, how many combinations of choice are
there for the customer?
 A committee consists of eight union members and six non-union members. If a sub committee
of six members is to be formed with 3 from union and 3 from non-union members, how many
combinations are possible?
 Two dice are tossed. In how many outcomes does at least one of them show ‘3’?
 5 coins are tossed. In how many ways can exactly 5 heads occur?
 5 coins are tossed. In how many ways can exactly 3 heads occur?
Random Variable
 A random variable (RV) is any quantity that takes different (numerical) values due to
chance
 For instance, number of heads in an experiment of tossing 10 coins
 Number of accidents that can occur on a road during one year
 Number of marks that a student may obtain

 Discrete random variable


 Random variable that takes on only finite or countable number of distinct values

 Continuous random variable


 Random variable that takes on continuous values in a given range
Probability Distribution
• Probability of Random Variable assuming certain Value
• What is the probability of two heads outcome in tossing 5 coins?
• What is the probability of a student obtaining marks in the range 60%-70%?
• What is the probability of number of accidents on a given road less than 5 in a year?
• Probability Distribution gives the relationship between RV and probability
• What is the probability that a random variable X takes value x?

P(X = x) = ?
Probability Distribution/ Probability Function
 Notations:
 X is used for random variables (X assumes random values)
 f(x) stands for the probability P(X = x)
 F(x) stands for the cumulative probability P(X ≤ x)

Properties of a probability distribution
1. f(x) ≥ 0
2. Σ f(x) = 1

Probability Distribution:
“Probability Mass Function” when the RV is discrete
“Probability Density Function” when the RV is continuous

Daily Wage of 100 workers
Class interval (Rs.)   Probability f(x)   Cumulative probability F(x)
160 - 170              0.04               0.04
170 - 180              0.14               0.18
180 - 190              0.18               0.36
190 - 200              0.28               0.64
200 - 210              0.20               0.84
210 - 220              0.12               0.96
220 - 230              0.04               1.00
Probability Function
 Often, table form of probability distribution can be represented in a general ‘algebraic
function’ form. For instance -

X = x   f(x)
-1      1/10
0       2/10
1       3/10
2       4/10

Algebraic expression for f(x):
\[ f(x) = \frac{x+2}{10}, \quad x = -1, 0, 1, 2 \]
Probability Function
 For a RV that takes values x = 1, 2, 3, 4, 5, the cumulative distribution function is given
as:
\[ F(x) = \frac{x^2}{25} \]
What is f(x)?

X = x   F(x) = x²/25   f(x) (successive differences of F)
1       1/25           1/25
2       4/25           3/25
3       9/25           5/25
4       16/25          7/25
5       25/25          9/25
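The step from F(x) to f(x) is a successive difference, f(x) = F(x) − F(x − 1), with F(0) taken as 0. A small sketch using exact fractions reproduces the last column of the table. The helper names are illustrative assumptions.

```python
from fractions import Fraction


def cdf(x: int) -> Fraction:
    """Cumulative distribution F(x) = x^2 / 25 for x = 1..5, and 0 below 1."""
    return Fraction(x * x, 25) if x >= 1 else Fraction(0)


# Probability mass at each x is the jump of F between x - 1 and x.
pmf = {x: cdf(x) - cdf(x - 1) for x in range(1, 6)}

for x, prob in pmf.items():
    # Printed over the common denominator 25 to match the table.
    print(x, f"{prob * 25}/25")
print(sum(pmf.values()))
```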
Probability Distribution
 X stands for the number of minutes it takes to drain a water tank. The
probability distribution is
\[ f(x) = \frac{x}{15}, \quad x = 1, 2, 3, 4, 5 \]
1. Prove that 𝑓 𝑥 is a probability distribution function
2. What is the probability that it will take exactly three minutes to drain
the tank?
3. What is the probability that it will take at least two minutes but not
more than four minutes?
4. Calculate and furnish cumulative distribution in tabular form
5. What is the probability that the draining process will take at most
three minutes?
6. What is the probability that it will take more than two minutes?
Bernoulli Processes / Binomial Distribution
 Bernoulli process consists of trials wherein the following
characteristics apply
 Each trial results in either of two mutually exclusive outcomes
referred as “success” and “failure”
 Probability of success, denoted by ‘p’, remains constant from trial
to trial. The probability of failure ‘q’ is equal to 1-p
 The trials are independent
 Probability of x successes in n trials of a Bernoulli process is
called Binomial Distribution
Binomial Distribution - Example
Question
What is the probability of getting 2 heads in an experiment of tossing 5 coins?

Tossing a coin is a Bernoulli trial
1. Two distinct outcomes, one of which is Head ‘success’ and the other is Tail ‘failure’
2. Probability of ‘success’ p is 0.5, constant for all trials (probability of failure q = 1 − 0.5 = 0.5)
3. Each trial is independent

Solution
The said event of 2 heads can come up in any order in the 5 trials.
Supposing it comes in the order (H, T, H, T, T), the probability of the event is
\[ P(H, T, H, T, T) = p \times q \times p \times q \times q = p^2 q^3 \]
The event includes all possible combinations, i.e., 5C2 = 10
Probability of 2 heads in 5 trials is
\[ P(X = 2) = {}^5C_2 \times p^2 q^3 = 10 \times 0.5^5 = 0.3125 \]
Binomial Distribution - Example
General expression of Binomial Distribution

For a Bernoulli process, the probability of achieving x successes in n trials, with
probability of success p in each trial, is
\[ f(x) = \binom{n}{x} p^x q^{n-x} \]
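The general binomial expression translates directly into code. The sketch below (the function name `binomial_pmf` is an assumption) reproduces the coin-tossing result of 0.3125.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n Bernoulli trials."""
    q = 1 - p
    return math.comb(n, x) * p**x * q ** (n - x)


# Two heads in five tosses of a fair coin.
print(binomial_pmf(2, 5, 0.5))

# The probabilities over x = 0..n should add up to 1.
print(sum(binomial_pmf(x, 5, 0.5) for x in range(6)))
```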
Binomial Distribution - Example
Problem
In an experiment of throwing two dice
1. What is the probability of getting one 6?
2. What is the probability of getting at least one 6?

Solution
The experiment is a Bernoulli process.
The random variable X is the number of ‘6’s
Number of trials = 2
Defining X as the number of 6s in the experiment, X can take values 0, 1, 2.
The probability of getting a ‘6’ on a die is p = 1/6 and q = 5/6
\[ f(x) = \binom{n}{x} p^x q^{n-x} \]

X   f(x)
0   \( \binom{2}{0} (1/6)^0 (5/6)^2 = 0.694 \)
1   \( \binom{2}{1} (1/6)^1 (5/6)^1 = 0.278 \)
2   \( \binom{2}{2} (1/6)^2 (5/6)^0 = 0.028 \)

Probability of getting one 6 = f(1) = 0.278
Probability of at least one six = f(1) + f(2) = 0.306
Exercise 1
 Five fair dice are tossed. Would you agree that
the probability of obtaining no 1s is the same
as that of obtaining five 1s? If not what are
the exact probabilities of these two events?
Exercise 2
 Political analysts estimate that 40% of a particular candidate’s supporters are
inclined to vote for other candidate in a municipal election. What is the
probability that from a random sample of two voters selected from among the
first candidate’s supporters,
 a) two vote for other candidate;

 b) one votes for other candidate and

 c) two vote for the candidate


Exercise 2
 A project manager has determined that a certain subcontractor fails to deliver
certain standard orders on schedule about 25% of the time. Out of 10 orders,
what are probabilities for the following events?
1. The subcontractor will deliver seven or more orders on time
2. Will deliver at most eight orders on time
3. Fails to deliver two or fewer times on time
4. Will deliver between three and five orders (inclusive) on time
5. Will fail to deliver exactly two orders on time
6. Will fail to deliver seven or more orders on time
Exercise 3
 Using the Cumulative Binomial Probabilities Table determine the following
 P(X ≤ 2) for n = 5, p = 0.20
 P(X > 8) for n = 10, p = 0.4
 P(X = 4) for n = 5, p = 0.20
 P(4 ≤ 𝑋 ≤ 10) for n = 20, p = 0.50
 P(X ≥ 4) for n = 15, p = 0.10
 P(X = 3) for n =12, p = 0.7
 P(X ≤ 4) for n = 5, p = 0.8
Exercise 4
 Suppose, in the case of population containing 950 adults and 50
children, a sample of 5 are drawn with replacement. What is the
probability that the sample drawn has no children?
Expected Value and Variance
 The expected value of a random variable X is given by
\[ E(X) = \sum x f(x) = \mu \]
 Variance is given by
\[ E[(X - E(X))^2] = \sigma^2 \]
 Variance can also be written as
\[ \sigma^2 = E(X^2) - [E(X)]^2 \]
where \( E(X^2) = \sum x^2 f(x) \)
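As an illustration of these formulas, the sketch below computes the mean and variance of the discrete distribution f(x) = (x + 2)/10, x = −1, 0, 1, 2, introduced earlier, using exact fractions. The function names are assumptions.

```python
from fractions import Fraction

# f(x) = (x + 2) / 10 for x = -1, 0, 1, 2
pmf = {x: Fraction(x + 2, 10) for x in (-1, 0, 1, 2)}


def expected_value(pmf: dict) -> Fraction:
    """E(X) = sum of x * f(x)."""
    return sum(x * prob for x, prob in pmf.items())


def variance(pmf: dict) -> Fraction:
    """Var(X) = E(X^2) - [E(X)]^2."""
    second_moment = sum(x * x * prob for x, prob in pmf.items())
    return second_moment - expected_value(pmf) ** 2


print(expected_value(pmf), variance(pmf))
```

Both formulas for the variance give the same value; the second form is usually easier to compute by hand from a table of x f(x) and x² f(x).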
Poisson Distribution
 Poisson Distribution is widely used in business applications
 Poisson Distribution is applied when the average rate of occurrence of an event is known
 It approximates the binomial distribution when a large number of trials is involved and the probability of
success ‘p’ is small
 If X is a RV representing the number of times an event occurs, X can take values x = 0, 1, 2, 3,
… (ad infinitum)
\[ f(x) = \frac{\mu^x e^{-\mu}}{x!} \]
\[ F(c) = P(X \le c) = \sum_{x=0}^{c} \frac{\mu^x e^{-\mu}}{x!} \]
µ is the average number of occurrences of the event


Example
 Number of telephone calls per minute coming through a certain
switchboard between 10.00 am and 11.00 am on business days is
distributed according to Poisson probability function with an average µ
of 0.4. What is the probability distribution of the number of telephone
calls per minute during the specified time period?
\[ f(x) = \frac{\mu^x e^{-\mu}}{x!}, \quad \mu = 0.4 \]

X = x   f(x)
0       0.670
1       0.268
2       0.054
3       0.007
4       0.001
5       0.000
Total   0.999996
Example 2
 A departmental store has determined in connection with its inventory control system that
the demand for a certain brand of TV set was Poisson-distributed with the parameter µ = 4 per
day
a) Determine the probability distribution of the daily demand for this item
b) If the store stocks 5 of these items on a particular day, what is the probability that
the demand will be greater than the supply?

X = x   f(x)
0       0.018
1       0.073
2       0.147
3       0.195
4       0.195
5       0.156
6       0.104
7       0.060
8       0.030
..      …

P(X > 5) = 1 − P(X ≤ 5) = 1 − F(5)
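The Poisson tables in both examples, and the tail probability asked for in part (b), can be reproduced with a short sketch. The function names `poisson_pmf` and `poisson_cdf` are assumptions introduced for illustration.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) for a Poisson variable with mean mu."""
    return mu**x * math.exp(-mu) / math.factorial(x)


def poisson_cdf(c: int, mu: float) -> float:
    """P(X <= c): sum of the pmf from 0 to c."""
    return sum(poisson_pmf(x, mu) for x in range(c + 1))


# Telephone calls example, mu = 0.4.
print([round(poisson_pmf(x, 0.4), 3) for x in range(6)])

# TV demand example, mu = 4, with 5 sets in stock.
print([round(poisson_pmf(x, 4), 3) for x in range(9)])
p_demand_exceeds_stock = 1 - poisson_cdf(5, 4)
print(round(p_demand_exceeds_stock, 4))
```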


Example 3
 It was observed that on the Mumbai-Pune Expressway 5 accidents occur on average
per day. If IRB Infra, which maintains the expressway, has 10 emergency service
units to clear the accident sites and restore traffic, what is the probability
that it runs short of an emergency unit in a day?
Exercises
 The average number of fatal accidents on a
particular road in a month is 0.35. What is the
probability that no fatal accidents occur in a
row of 5 months?
 A device has three components, each of which has a 0.25 probability of failure.
What is the probability that all three components are functional? What is the
probability of the device working with at least one component, with the
other two being backups?
Exercises
 What is the probability of
 Getting at least one six in trial of 6 dice
 Getting at least two sixes in a trial of 12 dice
 Getting at least three sixes in a trial of 18 dice
 A departmental store gets about 200 calls in a day, but only 1% of them result in sales.
What is the probability that there will be exactly 15 sales in a day? What is the
probability that there will be more than 15 sales?
Comparison with Binomial Distribution
 In Binomial Distribution the number of trials is fixed, whereas in Poisson distribution
the number of trials is infinite
 Binomial Distribution has two parameters: n and p
 Poisson Distribution has only one parameter µ
 The mean µ can be thought of as np
 For larger n and smaller p values, the binomial distribution is well approximated
by the Poisson Distribution
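To see how close the two distributions get for large n and small p, the sketch below compares them for n = 200 and p = 0.01 (the figures used in the calls-and-sales exercise), so that µ = np = 2. The function names are assumptions, and only a few small values of x are printed.

```python
import math


def binomial_pmf(x: int, n: int, p: float) -> float:
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x: int, mu: float) -> float:
    return mu**x * math.exp(-mu) / math.factorial(x)


n, p = 200, 0.01
mu = n * p  # the Poisson mean is taken as np

for x in range(5):
    print(x, round(binomial_pmf(x, n, p), 4), round(poisson_pmf(x, mu), 4))
```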
Exercise 1
X = x   f(x)   x f(x)   x² f(x)
-1      1/10   -1/10    1/10
0       2/10   0/10     0/10
1       3/10   3/10     3/10
2       4/10   8/10     16/10
Total          1        2

Mean = µ = E(X) = 1

Variance = σ² = E(X²) − [E(X)]² = 2 − 1 = 1
Mean and Variance of Binomial Distribution
 Assume that a restaurant has determined that there is a probability of
20% that a customer will order Blitz beer. If at a particular time, there
are 5 customers in the restaurant, what is the expected value, variance
and standard deviation of the number of customers who will order
Blitz beer?
 Random Variable: X = number of people ordering beer

Mean = E(X) = np

Variance = σ² = npq
Mean and Variance of Proportions
 Assume that a restaurant has determined that
there is a probability of 20% that a customer
will order Blitz beer. If at a particular time,
there are 5 customers in the restaurant, what
is the expected value, variance and standard
deviation of the proportion of customers who
will order Blitz beer?
 Random Variable: proportion of people ordering beer = p̄

Mean = E(p̄) = p

Variance(p̄) = pq / n
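For the restaurant example (n = 5 customers, p = 0.2), the formulas for the count X and for the proportion p̄ can be evaluated as follows. This is a sketch with illustrative variable names, not a prescribed solution.

```python
import math

n, p = 5, 0.2
q = 1 - p

# Number of customers ordering the beer: X ~ Binomial(n, p).
mean_count = n * p
var_count = n * p * q
sd_count = math.sqrt(var_count)

# Proportion of customers ordering the beer: X / n.
mean_prop = p
var_prop = p * q / n
sd_prop = math.sqrt(var_prop)

print(round(mean_count, 3), round(var_count, 3), round(sd_count, 3))
print(round(mean_prop, 3), round(var_prop, 3), round(sd_prop, 3))
```

The proportion is the count divided by n, so its mean is divided by n and its variance by n².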
Mean and Variance of Poisson Distribution
 It has been determined in a certain factory that the
number of accidents per month is distributed
according to the Poisson distribution with an
average of µ = 4 accidents per month. What is the
expected value, variance and standard deviation of
the number of accidents per month in the factory?
 Random Variable: X = number of accidents
Mean = E(X) = µ

Variance = σ² = µ
Recap
 Probability definition and types of measurement
 Random Variable (Discrete RV and Continuous RV)
 Bernoulli Process and Binomial Distribution
 Mean of RV following BD: E(X) = np
 Variance σ² = npq
 Poisson Distribution
 Mean E(X) = µ
 Variance σ² = µ
Exercise
 Manufactured items that do not pass inspection are often sold as
“seconds” in an outlet at reduced price. Often these seconds may have
only minor flaws and do not affect their performance. Past testing of a
manufacturer’s coffee maker has shown that 90% of all ‘seconds’
perform as well as ‘firsts’. A random sample of 25 ‘seconds’ from this
particular manufacturer is to be selected. We record X as the number of
items in the sample that perform as well as ‘firsts’
 Find P(X ≥ 20)
 Find P(18≤ X < 23)
 Compute µ and σ of X
Continuous Probability Distributions
 Unlike discrete variables, which take distinct values, continuous variables
are always measured within a range between two numbers.
 For instance, the weight of a soap piece in a manufacturing unit will lie
between 19 and 21 grams.
 The probability function of a continuous variable, therefore, also refers to the
probability of the variable taking a value between two values
 Probability of a car travelling between 5 and 20 kilometers in a day
Continuous Probability Distributions
 Probability of continuous RV taking values between x1 and x2 is given
by the area covered under the curve between the two values
 Probability of continuous RV taking a specific value x is zero
 Therefore,
P(a ≤ x ≤ b) is the same as P(a < x < b)
Normal Probability Distribution
 The Normal Distribution is one of the most commonly applied distributions in many
real-life problems involving a continuous RV
 Normal Distribution is also widely used in tests of hypothesis
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} \]
µ and σ are the parameters of the distribution
µ = mean of random variable X
σ = standard deviation of X
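The density formula can be written out directly. The sketch below (the function name `normal_pdf` is an assumption) evaluates it at a few points to show the symmetry about µ and the effect of a larger σ.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


# Peak of the standard normal curve (mu = 0, sigma = 1).
print(round(normal_pdf(0, 0, 1), 4))

# Symmetry about the mean: equal heights one unit either side of mu = 50.
print(normal_pdf(49, 50, 1) == normal_pdf(51, 50, 1))

# A larger sigma lowers the peak and spreads the curve.
print(round(normal_pdf(50, 50, 1), 4), round(normal_pdf(50, 50, 1.5), 4))
```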
Properties of Normal Probability Distribution
1. The normal distribution depends upon two parameters: µ and σ
2. The probability curve is symmetric about the mean µ (skewness = 0)
3. The tails extend to infinity on both sides
4. Given the same mean µ, a variable with higher standard deviation will have
more spread
5. Area under the curve represents the probability
6. The total area under the curve is the sum of all probabilities, which is 1
7. Area on either side of the mean µ is 0.5

[Figure: normal probability curve centred at µ; vertical axis: probability]
Properties of Normal Probability Distribution
The location of normal curves changes with the mean value µ
A normal curve with the same mean (µ) but larger σ will have longer tails

[Figure: normal curves for mean = 50, SD = 1; mean = 49, SD = 1; and mean = 50, SD = 1.5; horizontal axis: random variable; vertical axis: probability]


Standardized Normal Variable
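 A normal variable X is transformed into Z, where
\[ Z = \frac{x-\mu}{\sigma} \]
Z is also a normal variable with parameters µ = 0 and σ = 1

If X ~ N(µx, σx) and Y ~ N(µy, σy) are two normal variables, then both standardized variables
follow the same standard normal distribution:
\[ \frac{x-\mu_x}{\sigma_x} \sim Z \quad \text{and} \quad \frac{y-\mu_y}{\sigma_y} \sim Z, \quad \text{where } Z \sim N(0,1) \]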
Probability Areas under normal curve

z range Probability x range

-1 < z < 1 68.3% µ-σ < x < µ+σ

-2 < z < 2 95.4% µ-2σ < x < µ+2σ

-3 < z < 3 99.7% µ-3σ < x < µ+3σ
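The three areas in the table can be verified numerically. The sketch below builds the standard normal cumulative probability from the error function in Python's `math` module. The helper names `phi` and `area_between` are assumptions.

```python
import math


def phi(z: float) -> float:
    """Cumulative probability P(Z <= z) for the standard normal variable."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def area_between(a: float, b: float) -> float:
    """P(a < Z < b) as a difference of cumulative probabilities."""
    return phi(b) - phi(a)


for k in (1, 2, 3):
    print(f"-{k} < z < {k}: {area_between(-k, k):.1%}")
```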


Normal Distribution Tables
 The areas under the normal curve for the standard normal variable are computed and presented in the
Standard Normal Table
 Tables are generally given for cumulative probabilities P(−∞ < z ≤ c)
The Excel function =NORM.DIST(x, µ, σ, TRUE) returns the cumulative probability
Example: P(−∞ < z ≤ 1) = 0.84

[Figure: standard normal curve for z from -3.00 to 3.00, with z = 1 marked]


Exercise
 A tire company estimated the mean and standard deviation of the life of its tires as
µ = 36,500 km and σ = 5,000 km. The company wants to know the probability that
a tire from its production lasts for more than 40,000 km

X is normally distributed with mean µ = 36,500 km and σ = 5,000 km

P(X > 40,000) = ?
\[ z = \frac{x-\mu}{\sigma} = \frac{40{,}000 - 36{,}500}{5{,}000} = 0.7 \]
\[ P(z > 0.7) = 0.5 - P(0 \le z \le 0.7) = 0.5 - 0.2580 = 0.2420 \]
Conversion of Z to X
 Given the value of standard normal variable Z, one can find the value
of X using the following

X=Zσ+µ

Given µ = 18,000, σ = 1000, what is X for Z = 1?


Given µ = 36,500, σ = 5000, what is X for Z = - 0.7?
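The tire-life probability and the Z-to-X conversion can both be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The helper `x_from_z` is a name assumed here for the formula X = Zσ + µ.

```python
from statistics import NormalDist

# Tire life: mean 36,500 km, standard deviation 5,000 km.
tire_life = NormalDist(mu=36_500, sigma=5_000)

z = (40_000 - tire_life.mean) / tire_life.stdev
p_more_than_40k = 1 - tire_life.cdf(40_000)
print(round(z, 2), round(p_more_than_40k, 4))


def x_from_z(z: float, mu: float, sigma: float) -> float:
    """Convert a standard normal value back to the original scale."""
    return z * sigma + mu


print(round(x_from_z(1, 18_000, 1_000)))
print(round(x_from_z(-0.7, 36_500, 5_000)))
```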
Exercise 1
 Given that z is a standard normal random variable, compute the
following probabilities
1. P(0 ≤ z ≤ 0.83)
2. P(-1.57 ≤ z ≤ 0)
3. P(z > 0.44)
4. P(z ≥ -0.23)
5. P(z < 1.2)
6. P(z ≤ -0.71)
Exercise 2
Given that z is a standard normal random variable, find z for each
situation

1. The area to the left of z is 0.9750


2. The area between 0 and z is 0.4750
3. The area to the left of z is 0.7291
4. The area to the right of z is 0.1314
5. The area to the left of z is 0.67
6. The area to the right of z is 0.33
Exercise 3
 Given that z is a standard normal random variable, find z for each
situation
1. The area to the right of z is 0.01
2. The area to the right of z is 0.025
3. The area to the right of z is 0.05
4. The area to the right of z is 0.10
Exercise 4
 The time needed to complete a final examination in a particular college
course is normally distributed with a mean of 80 minutes and a
standard deviation of 10 minutes. Answer the following
1. What is the probability of completing the exam in 90 minutes or less?
2. What is the probability that a student will complete the exam in more
than 70 minutes but less than 80 minutes
3. Assume that the class has 60 students and the examination period is 90
minutes in length. How many students do you expect will be unable to
complete the exam in the allotted time?
Exercise 5
 Let X represent the strength of a certain type of hemlock beam produced by
Outland Lumber Co. Assume that X is normally distributed with a mean
2,000 pounds per square inch (psi) and a standard deviation of 100 psi. A
beam is drawn at random and its strength is tested. Change the following
probability statements made in terms of X into Statements about the
standardized variable Z. What is the probability that the tested strength will
be
a) At least 2,150 psi?
b) Less than 1,825 psi?
c) Between 1,875 and 2,115 psi?
d) Between 1,795 and 1,905 psi?
e) Either more than 2,250 or less than 1,800 psi?
f) At most 2,100 psi?
Exercises 6
 A study found out that an MBA graduate’s average starting salary is Rs.
18,000 per month with s.d. of Rs. 1,000. Find out whether the following
statements are true or false.
 It is equally likely that a graduate may receive lower than Rs. 16,000 or
higher than Rs. 20,000
 It is equally likely that starting salary will be between Rs. 18,000 and Rs
19,000 or between Rs. 16,000 and Rs 18,000
 It is equally likely that starting salary will be either higher than Rs 18,000
or lower than Rs. 18,000
 95.5% of the time, starting salary will be less than Rs. 20,000
Sample and Population
• Statistic:
  • Any measure calculated on sample data of a random variable
  • For example: sample mean and sample variance
  • Sample mean and s.d. are generally represented by x̄ and s
• Parameter:
  • A measure describing the population data of the random variable X
  • Population mean and s.d. are generally represented by µ and σ
Sampling and Sampling
Distribution
Sample and Population
• Parameter
  • A measure describing the population data of the random variable X
  • For example: population mean $E(X) = \mu$
  • Standard deviation of X: $\sqrt{E[(X - \mu)^2]} = \sigma$
• Statistic or point estimator
  • Any measure calculated on sample data of a random variable X
  • For example: the arithmetic mean calculated over a sample (sample mean) $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
  • Similarly, the s.d. calculated over a sample (sample s.d.) $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$
Population
• Business decision making is always with respect to the population
• As studying the population in totality (including every element of the population) is not possible or feasible, researchers resort to sampling
• Two types of populations
  a) Population that is finite. Ex: number of employees of a company
  b) Population that is infinite. Ex: products coming out of a product line of a factory (a tire factory, for instance)
Sample properties
 Sample may not correctly represent population
due to sampling errors.
 Due to ‘sampling errors’, no two samples give
same estimate of population parameter
 Therefore, a sample statistic is specific to the
sample
 A true picture of population comes only on
repeated sampling
Sampling Distribution
• A random sample of size ‘n’ results in one ‘estimate’ of the population parameter
• Repeated samples of the same size n result in several estimates, which may differ from one another only due to sampling errors
• Taking the estimate as a random variable, a probability distribution can be constructed over the estimates
• The probability distribution of sample estimates is called the ‘Sampling Distribution’
• The study of ‘Sampling Distributions’ is the most important element in statistical inference
Example
• A major software company has the task of developing profiles of its 2,500 managers. It specifically wants to know about the salaries they are receiving and whether they have completed company training
• The company uses ‘simple random sampling’ to select 30 employees to estimate the population parameters

Salary  Training    Salary  Training    Salary  Training
49094   yes         45923   yes         45121   yes
53264   yes         57268   no          51753   yes
49644   yes         55689   yes         54392   no
49895   yes         51565   no          50164   no
47622   no          56188   no          52974   no
55924   yes         51766   yes         50241   no
49092   yes         52541   no          52794   no
51404   yes         44980   yes         50979   yes
50958   yes         51933   yes         55861   yes
55110   yes         52973   yes         57309   no

Example (cont..)
• Sample size n = 30
• Mean salary x̄ = 51,814
• SD of salary s = 3,348
• Proportion of training received p̄ = 0.63
• These are point estimates of the population mean µ, standard deviation σ and population proportion p
• Another sample of the same size would result in another set of estimates
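As a rough check, the three point estimates can be recomputed from the 30 sampled records. The sketch below is only illustrative: the variable names are assumptions, the values are transcribed from the sample table, and `statistics.stdev` uses the (n − 1) divisor.

```python
from statistics import mean, stdev

# (salary, completed training?) pairs transcribed from the 30-record sample table
sample = [
    (49094, "yes"), (53264, "yes"), (49644, "yes"), (49895, "yes"), (47622, "no"),
    (55924, "yes"), (49092, "yes"), (51404, "yes"), (50958, "yes"), (55110, "yes"),
    (45923, "yes"), (57268, "no"), (55689, "yes"), (51565, "no"), (56188, "no"),
    (51766, "yes"), (52541, "no"), (44980, "yes"), (51933, "yes"), (52973, "yes"),
    (45121, "yes"), (51753, "yes"), (54392, "no"), (50164, "no"), (52974, "no"),
    (50241, "no"), (52794, "no"), (50979, "yes"), (55861, "yes"), (57309, "no"),
]

salaries = [salary for salary, _ in sample]
n = len(sample)

x_bar = mean(salaries)                                # point estimate of mu
s = stdev(salaries)                                   # point estimate of sigma (n - 1 divisor)
p_bar = sum(flag == "yes" for _, flag in sample) / n  # point estimate of p

print(f"n = {n}, x_bar = {x_bar:.0f}, s = {s:.0f}, p_bar = {p_bar:.2f}")
```

The printed values should agree with the slide's estimates after rounding.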


Sampling Distribution
• From the population of 2,500, there are $\binom{2500}{30}$ such samples possible
• Each such sample gives an estimate for x̄, s and p̄
• Considering x̄ as a random variable, the probability distribution of the sample estimates is called its sampling distribution

Sampling Distributions for:
1. Random Variable: x̄
2. Random Variable: s
3. Random Variable: p̄
Sampling Distribution of Normal Variable
• An important utility of the sampling distribution lies in its relationship with the probability distribution of the underlying random variable X
• Theorem:
  • If a random variable X is normally distributed with mean µ and s.d. σ, then the sample mean x̄ for a given sample size n is also normally distributed, with mean and standard deviation:

$E(\bar{x}) = \mu$

$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$
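The theorem can be illustrated with a small simulation. This is only a sketch: the population values (µ = 1000, σ = 10), the sample size and the number of repetitions are assumptions chosen for illustration, not figures from the slides.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)  # fixed seed so the run is repeatable

mu, sigma = 1000, 10     # assumed population parameters
n, repetitions = 25, 4000

# Draw many samples of size n and keep each sample mean.
sample_means = [
    mean([random.gauss(mu, sigma) for _ in range(n)])
    for _ in range(repetitions)
]

observed_mean = mean(sample_means)  # should be close to mu
observed_sd = stdev(sample_means)   # should be close to sigma / sqrt(n)
theoretical_sd = sigma / sqrt(n)

print(f"mean of sample means: {observed_mean:.2f}")
print(f"s.d. of sample means: {observed_sd:.2f} (theory: {theoretical_sd:.2f})")
```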
The effect of sample size n
[Figure: Sampling distribution with varying sample size, showing the population distribution and the sampling distributions for n = 10 and n = 50]
• For a larger sample size, the sampling distribution has less dispersion ($\sigma / \sqrt{n}$)
• The mean of the sampling distribution is the same (µ) for all sample sizes
Standard Error
• A sample mean x̄ is an estimate of µ
• For a sample, (x̄ − µ) is considered the sampling error
• A measure of average sampling error is given by $\sigma_{\bar{x}} = \sigma / \sqrt{n}$, which is also called the Standard Error
• $\sigma_{\bar{x}}$ also gives a measure of the precision with which µ is estimated
• Important property of standard error:
  • The larger the sample size (n), the smaller $\sigma_{\bar{x}}$ will be
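The property can be seen with a few lines of arithmetic. In this minimal sketch the function name `standard_error` and the value σ = 20 are assumptions used only for illustration.

```python
from math import sqrt

def standard_error(sigma, n):
    """Standard error of the sample mean for population s.d. sigma and sample size n."""
    return sigma / sqrt(n)

sigma = 20  # assumed population standard deviation
errors = {n: standard_error(sigma, n) for n in (25, 100, 400)}

for n, se in errors.items():
    print(f"n = {n:3d}: standard error = {se:.1f}")
```

Quadrupling the sample size halves the standard error, because n enters through its square root.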
Sampling Distribution of Non-Normal Variables -
Central Limit Theorem
• Sampling distributions of samples from non-normal probability distributions also tend to be normally distributed, due to the Central Limit Theorem

Central Limit Theorem

If a random variable X, either discrete or continuous, has a mean µ and a finite standard deviation σ, then the probability distribution of $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$ approaches the standard normal distribution as the sample size n becomes large
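A simulation gives a feel for the theorem. The sketch below assumes a strongly skewed population (exponential with mean 1 and s.d. 1); the sample size, repetition count and variable names are illustrative choices, not part of the slides. If z is close to standard normal, roughly 95% of the standardized sample means should fall within ±1.96.

```python
import random
from math import sqrt
from statistics import mean

random.seed(7)  # fixed seed so the run is repeatable

mu, sigma = 1.0, 1.0     # mean and s.d. of an exponential(1) population
n, repetitions = 50, 4000
std_error = sigma / sqrt(n)

# Standardize each sample mean: z = (x_bar - mu) / (sigma / sqrt(n))
z_values = [
    (mean([random.expovariate(1.0) for _ in range(n)]) - mu) / std_error
    for _ in range(repetitions)
]

share_within = sum(abs(z) < 1.96 for z in z_values) / repetitions
print(f"share of |z| < 1.96: {share_within:.3f}")
```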
Estimation
 Statistical Estimation
 Statistical Estimation procedures provide us with the means to estimate
population parameters with desired degree of precision

 In Inference, there are two types of problems


 Estimation
 Hypothesis Testing
Estimation
• Two types of Estimation of population parameters
1. Point Estimation
2. Interval Estimation
• Point Estimation:
  • Usually the sample mean x̄ is considered as the point estimate of µ
• Interval Estimation
  • In general, the point estimate x̄ would not be equal to the population parameter µ, as there will always be some sampling error
  • Interval estimation involves giving a range with Error Margin (EM): x̄ ± EM, that is, from $\bar{x}_L = \bar{x} - EM$ to $\bar{x}_H = \bar{x} + EM$
Sampling Distribution
• Consider a case of random samples of size n over a population random variable X whose mean is µ and s.d. is σ
• The sampling distribution of x̄ for random samples of size n has

sampling mean $E(\bar{x}) = \mu$

sampling s.d. $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$

Using the normal probability distribution properties, we can say that a random sample mean will take a value between $\bar{x}_L$ and $\bar{x}_H$ with 100(1−α)% confidence, where
$\bar{x}_L = \mu - z_{\alpha/2}\,\sigma_{\bar{x}}$
and
$\bar{x}_H = \mu + z_{\alpha/2}\,\sigma_{\bar{x}}$
Example of sampling distribution
• Lloyd’s Department store selects a simple random sample of 100 customers in order to learn about the amount spent per shopping trip. If the mean expenditure per trip is known to be $80 with standard deviation σ = 20, construct the sampling distribution.
• What are the upper and lower values within which the sample mean is likely to fall 95% of the time?
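One way to work this example is sketched below, using Python's standard-library `NormalDist` for the z value. The variable names are assumptions; the numbers come from the example.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 80, 20, 100   # population mean, population s.d., sample size
confidence = 0.95

std_error = sigma / sqrt(n)                           # sigma_xbar = 20 / 10 = 2
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)    # z_{alpha/2}, about 1.96

lower = mu - z * std_error
upper = mu + z * std_error
print(f"standard error = {std_error:.2f}")
print(f"95% of sample means fall between {lower:.2f} and {upper:.2f}")
```

With these inputs the range comes out at roughly $76.08 to $83.92.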
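Confidence Interval Estimation
• Conversely, when we have a sample mean x̄ we can derive the confidence interval for the unknown population mean µ with a confidence level of 1−α

$\bar{x}_L = \bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}}$
and
$\bar{x}_H = \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$

[Figure: the unknown population sampling distribution marked at $\mu - z_{\alpha/2}\sigma_{\bar{x}}$, µ and $\mu + z_{\alpha/2}\sigma_{\bar{x}}$, shown above the confidence interval running from $\bar{x}_L$ through x̄ to $\bar{x}_H$]

The unknown population mean is likely to lie in the interval ($\bar{x}_L$, $\bar{x}_H$) with (1−α) confidence
Error Margin is $z_{\alpha/2}\,\sigma_{\bar{x}}$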
Example
 Over the sample of 100 customers,
Lloyd’s Store found the mean of
expenditure per shopping trip as $
82. Assuming population σ as 20,
what are the upper and lower
limits within which population
mean is likely to be present 95% of
times?
 What is the error margin?
When σ is known
• A tire company wants to estimate the 90% confidence interval for the running life of tires produced by it. A sample of 100 tires is drawn and tested for life. The sample mean was obtained as 36,000 kilometers. Assuming the population s.d. is 5,000 kilometers, what is the confidence interval for the population running life?
• What is the error margin?

Sample mean: x̄ = 36,000 km
Sampling s.d.: $\sigma_{\bar{x}} = \sigma / \sqrt{n} = 5000 / \sqrt{100} = 500$
Confidence level (1−α) = 90%
$z_{0.05}$ = NORM.S.INV(0.95) = 1.645
Interval: $\bar{x} - z_{0.05}\,\sigma_{\bar{x}}$ to $\bar{x} + z_{0.05}\,\sigma_{\bar{x}}$, that is, 36000 − 1.645 × 500 to 36000 + 1.645 × 500

The population mean µ is likely to be between about 35,178 and 36,822 with a confidence of 90%
The EM is about 822
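The known-σ interval can be sketched as a small helper. The function name `z_interval` is hypothetical; the inputs are the tire example above and the earlier Lloyd's sample (x̄ = 82, σ = 20, n = 100, 95% confidence).

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper, error_margin) for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    error_margin = z * sigma / sqrt(n)
    return x_bar - error_margin, x_bar + error_margin, error_margin

tire_low, tire_high, tire_em = z_interval(36000, 5000, 100, 0.90)
lloyd_low, lloyd_high, lloyd_em = z_interval(82, 20, 100, 0.95)

print(f"Tires:   {tire_low:,.0f} to {tire_high:,.0f} km (EM = {tire_em:.0f})")
print(f"Lloyd's: {lloyd_low:.2f} to {lloyd_high:.2f} (EM = {lloyd_em:.2f})")
```

Using the exact z value rather than a rounded table value changes the limits only in the last digit.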
When σ is not given
• Usually the population standard deviation σ is known a priori
• In case it is not known, it needs to be estimated
• The sample standard deviation s is used as the point estimate of σ
• In such cases, EM and interval estimates are to be calculated based on the t distribution
• The appropriate t value from the t distribution tables with (n−1) degrees of freedom is to be used
• The interval ($\bar{x}_L$, $\bar{x}_H$) is calculated as:

$\bar{x}_L = \bar{x} - t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
and
$\bar{x}_H = \bar{x} + t_{\alpha/2}\,\frac{s}{\sqrt{n}}$

Error Margin: $t_{\alpha/2}\,\frac{s}{\sqrt{n}}$
When σ is unknown
• A group of students working on a summer project took a simple random sample of 30 families from the population of a “low income area” of a large city. The sample estimated the annual income of a family as x̄ = 6810 and s = 780. What would be the 95% confidence interval for the mean income of all the families in this area?
• What is the error margin?

Sample mean: x̄ = 6810
Sampling s.d.: $s / \sqrt{n} = 780 / \sqrt{30} = 142.41$
$t_{0.025}$ (29 df) = 2.045
Interval: 6810 − 2.045 × 142.41 = 6518.8 to 6810 + 2.045 × 142.41 = 7101.2
EM = 2.045 × 142.41 = 291.2
In Excel, $t_{\alpha/2}$ for a given df = T.INV.2T(α, df)
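The arithmetic of the t-based interval can be checked with a short sketch. Python's standard library has no t distribution, so the critical value 2.045 (29 degrees of freedom) is passed in as read from the t table; the function name `t_interval` is an assumption.

```python
from math import sqrt

def t_interval(x_bar, s, n, t_critical):
    """Return (lower, upper, error_margin) using a t value looked up for n - 1 df."""
    std_error = s / sqrt(n)
    error_margin = t_critical * std_error
    return x_bar - error_margin, x_bar + error_margin, error_margin

# t_{0.025} with 29 degrees of freedom, taken from the t table
lower, upper, em = t_interval(x_bar=6810, s=780, n=30, t_critical=2.045)

print(f"standard error = {780 / sqrt(30):.2f}")
print(f"95% interval: {lower:.1f} to {upper:.1f} (EM = {em:.1f})")
```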
Exercise
• In a town, out of 800 sampled automobile owners, 480 said they would like to see the size of automobiles reduced. What are the 95% confidence limits for the proportion of all automobile owners in the town who would like to see car size reduced?

Sample proportion: $\bar{p} = 480/800 = 0.60$
Result: In large samples, the sampling distribution of proportions follows the normal distribution with
sampling mean $E(\bar{p}) = p$ and
$\sigma_{\bar{p}} = \sqrt{\frac{\bar{p}\,\bar{q}}{n}}$, where $\bar{q} = 1 - \bar{p}$
Therefore, given the sample proportion $\bar{p} = 0.6$, the 95% confidence interval is

$\bar{p} \pm z_{0.025}\sqrt{\frac{\bar{p}\,\bar{q}}{n}}$

The standard $z_{0.025} = 1.96$, so the interval is

$0.6 \pm 1.96\sqrt{\frac{0.6 \times 0.4}{800}} \approx 0.6 \pm 0.034$
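The proportion interval can be evaluated numerically as below. This is a minimal sketch with assumed variable names; it uses the large-sample normal approximation stated in the exercise.

```python
from math import sqrt
from statistics import NormalDist

favourable, n = 480, 800
p_bar = favourable / n   # sample proportion
q_bar = 1 - p_bar

std_error = sqrt(p_bar * q_bar / n)   # sigma_pbar
z = NormalDist().inv_cdf(0.975)       # z_{0.025}, about 1.96
error_margin = z * std_error

lower, upper = p_bar - error_margin, p_bar + error_margin
print(f"p_bar = {p_bar:.2f}, EM = {error_margin:.3f}")
print(f"95% limits: {lower:.3f} to {upper:.3f}")
```

The limits come out at roughly 56.6% and 63.4%.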
Exercise
• A large nationalized bank was interested in analyzing the difference between two regions (North region and South region) in the average outstanding in NPA accounts of above Rs. 10 lakhs. The analysts wanted to establish 95% confidence intervals for this difference.

Random variables
Northern sample mean = $\bar{x}_1$
Southern sample mean = $\bar{x}_2$
Mean difference ($\bar{x}_1 - \bar{x}_2$)
Standard deviation of mean difference
$s_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$

              North       South
Mean          x̄1 = 76     x̄2 = 65
Sample sd     s1 = 25     s2 = 22
Sample size   n1 = 100    n2 = 120
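A possible working of this exercise is sketched below. It assumes the large-sample normal approximation (z rather than t), which seems reasonable for samples of 100 and 120; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x1, s1, n1 = 76, 25, 100   # North: sample mean, sample sd, sample size
x2, s2, n2 = 65, 22, 120   # South: sample mean, sample sd, sample size

difference = x1 - x2
std_error = sqrt(s1**2 / n1 + s2**2 / n2)   # s.d. of the mean difference
z = NormalDist().inv_cdf(0.975)             # z_{0.025} for 95% confidence
error_margin = z * std_error

lower, upper = difference - error_margin, difference + error_margin
print(f"difference = {difference}, standard error = {std_error:.3f}")
print(f"95% interval for the difference: {lower:.2f} to {upper:.2f}")
```

Because the interval excludes zero, the sample suggests a real difference between the two regions at this confidence level.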
Exercise
• In a random sample of 400 consumers in city A, an arithmetic mean expenditure on durable goods of Rs. 4 lakhs with a standard deviation of Rs. 12,000 was observed in 2007. In a random sample of 900 consumers in city B, an arithmetic mean of Rs. 4.1 lakhs with a standard deviation of Rs. 12,000 was observed for the same period.
1. Construct a 95% confidence interval for the mean expenditure on durable goods of all consumers in city A
2. Construct a 95% confidence interval for the mean expenditure on durable goods of all consumers in city B
3. Construct a 95% confidence interval for the difference in mean expenditures between the two cities
Testing of Hypothesis
• Testing of Hypothesis is another application of the sampling distribution.
 It is employed in various contexts of
decision making
 Testing whether a population parameter
is different from a pre-specified (given)
value
 Testing whether parameters of two
different populations are significantly
different
Example
 A particular company’s automobile attains on
an average fuel efficiency of 24 kmpl with a
S.D of 12 kmpl. A product research group has
developed a new fuel injection system
designed to increase the fuel efficiency. A
sample of size 36 automobiles on the new fuel
injection system have produced an average
efficiency of 30 kmpl. The company wants to
take a decision on launching production of the
new system. Would you recommend?
Hypotheses
The procedure of testing of hypothesis requires first to specify
Hypotheses to be tested and held

1. Null Hypothesis H0:


 Null hypothesis is the one that is held as true to begin with
(default hypothesis)
 It is also called Maintained Hypothesis
 Null hypothesis is the one which is tested whether it can be
maintained or not
2. Alternative hypothesis H1:
 Proposed hypothesis as against null hypothesis
 To be held as true when H0 can not be maintained as true
 It is also called ‘Research Hypothesis’
Example
 Market research at Morrel provides management
with up-to-date information on the company’s
various products and how the products compare
with competing brands of similar products. One
research question concerned whether the product of
Morrel was the preferred choice of more than 50% of
consumer population.

 Null Hypothesis H0:


 p ≤ 0.5
 Alternative Hypothesis: H1:
 p > 0.5
Specifying Hypotheses: Example
 Average family income of a certain city
was determined from a census to be Rs.
3,65,000 per year.
 Two years later, a researcher wants to test
whether this mean has changed or not
 Null Hypothesis H0:
The family income has not changed (µ =
3,65,000)
 Alternative Hypothesis: H1:
 The family income has changed (µ ≠
3,65,000)
Specifying Hypotheses: Example
 A car manufacturer claims fuel efficiency
of car above 25 kmpl.
 Research objective: Verify whether the car
manufacturer’s claim is valid through a
sample study

Null Hypothesis: H0: µ ≥ 25


Alternative Hypo: H1: µ < 25

 If the evidence through sample justifies


rejection of null hypothesis, the
alternative hypothesis gets accepted
Specifying Hypotheses: Example
• Effectiveness of Advertisement:
• A manufacturer was selling on an average Rs. 5 crs per year. Recently he hired an advertisement agency and incurred substantial cost in advertisements, after which the sales increased to Rs. 7 crs pa.
• He wants to test whether the post-advertisement sales are due to advertisements or just due to chance

Let µ represent average annual sales.

Null Hypothesis:
H0: Advertisement has not made any difference in incremental sales (µ ≤ 5)

Alternative Hypothesis:
H1: Advertisement has made a positive difference for sales (µ > 5)
Specifying Hypothesis: Exercise
 The manager of an automobile dealer is
considering a new bonus plan designed to
increase sales volume. Currently, the mean sales
volume is 14 per month. He wants to conduct a
study on sample basis by allowing some sales
managers to work under the new scheme for a
month
1. Develop the null and alternative hypotheses
for this study
2. Comment on the conclusion when H0 can be
rejected
3. Comment on the conclusion when H0 can
not be rejected
Outcomes of Testing: Types of Errors
• There are four possible outcomes for any testing of hypothesis problem
• When the Null Hypothesis (H0) is true, the testing may result in:
  • The null hypothesis is rightly maintained
  • The null hypothesis is wrongly rejected
• When the Alternative Hypothesis (H1) is true, the test may result in:
  • The null hypothesis is wrongly maintained
  • The null hypothesis is rightly rejected

                  Null (H0) is true    Null (H0) is not true
Not Reject H0     Correct              Type II Error
Reject H0         Type I Error         Correct
Errors in Hypothesis Testing
Example of new fuel injection system claiming higher fuel efficiency

Null Hypothesis: H0: µ ≤ 64
Alternative Hypo: H1: µ > 64

When the test is done, the Type I and Type II errors that may be committed are as below:

Type I Error: H0 is true, but it was rejected
Deciding that the fuel efficiency is more than 64 kmpl when actually it is not

Type II Error: H0 is not true, but it was not rejected
Deciding that the fuel efficiency is not more than 64 kmpl when actually it is above
Procedure of Test of Hypothesis
 The Test of Hypothesis procedure is simply a decision
rule that specifies whether the null hypothesis H0
should be rejected or retained for every possible value
of a statistic observable in a simple random sample of
size n
 Important Concepts
 Level of Significance
 Test statistic z (Calculated z)
 ‘p’ value
 Critical value zc
Level of Significance
 The level of significance refers to the tolerance
probability level that the researcher is willing
to accept of committing Type I error (that is
rejecting null hypothesis when it is actually
true)
 Level of significance is always specified by the
researcher himself
 Level of significance is generally denoted by α
and is commonly chosen as 0.05 or 0.01
Test Statistic
• If µ is the population mean, σ is the population standard deviation, and n is the sample size, the test statistic for sample mean x̄ is given as:

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

• The test statistic is also called ‘z calculated’
‘p’- value
 p-value is the probability of type I error if H0
is rejected at the z calculated

 When p-value is higher than the type I error


(α), then null hypothesis will not be rejected at
α level of significance

 Similarly, when p-value is lesser than type I


error (α), then null hypothesis will be rejected
at α level of significance
z- critical value
 ‘Z-critical value’ refers to the value of z statistic at
the level of α chosen by the researcher

 Z-critical is collected from the standard normal


tables
Procedure of Test of Hypothesis
• The procedure itself consists of the following steps
1. Identify the underlying variable X and population parameters µ and σ
2. Specify H0 and H1
3. Under the null hypothesis, consider the sampling distribution for sample size n; it will have sampling mean $\mu_{\bar{x}} = \mu$ and $\sigma_{\bar{x}} = \sigma / \sqrt{n}$
4. Specify level of significance α (probability of Type I error)
5. Mark out the rejection regions of H0
6. Consider the given sample mean x̄ and calculate the test statistic (z-calculated) for the specific sample mean obtained for the sample: $z = \frac{\bar{x} - \mu}{\sigma_{\bar{x}}}$
7. Calculate p value to decide on H0
8. Alternatively, compare the test statistic with the critical value from the table to decide on H0

|z-statistic| > z critical: Reject H0
|z-statistic| < z critical: Do not reject H0
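The steps above can be sketched as one small function. The name `z_test` and its `tail` argument are hypothetical; the usage line applies it to the earlier fuel-injection example (µ = 24, σ = 12, n = 36, x̄ = 30) as a right-tailed test, with α = 0.05 assumed because that example does not state a significance level.

```python
from math import sqrt
from statistics import NormalDist

def z_test(x_bar, mu0, sigma, n, alpha, tail="two"):
    """Return (z, p_value, reject_h0) for a test on the mean with known sigma."""
    std_normal = NormalDist()
    z = (x_bar - mu0) / (sigma / sqrt(n))    # test statistic (z calculated)
    if tail == "right":                      # H1: mu > mu0
        p_value = 1 - std_normal.cdf(z)
    elif tail == "left":                     # H1: mu < mu0
        p_value = std_normal.cdf(z)
    else:                                    # H1: mu != mu0
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    return z, p_value, p_value < alpha

# Fuel-injection example; alpha = 0.05 is an assumption, not given in the slides.
z, p_value, reject_h0 = z_test(x_bar=30, mu0=24, sigma=12, n=36, alpha=0.05, tail="right")
print(f"z = {z:.2f}, p value = {p_value:.4f}, reject H0: {reject_h0}")
```

Deciding by p value and deciding by critical value always agree, since p < α exactly when the statistic lies in the rejection region.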
Exercise 1 (When σ is known)
Two years back average income of family in a
town was estimated to be Rs 3,65,000. From past
experience the standard deviation of family
income was observed to be σ = 50,000.

Currently, by taking a sample of 100 families, a


researcher observed the average family income
to be Rs. 3,75,000. Test whether the increase in
income is statistically significant at 1% level.
Exercise 1(cont..)
• Underlying Variable and Population:
  • Random variable: Average family income X
  • Population parameters: µ = 3,65,000; σ = 50,000
• Hypotheses
  • H0: µ ≤ 3,65,000
  • H1: µ > 3,65,000
• Sampling Distribution
  • Mean of sampling $\mu_{\bar{x}}$ = 3,65,000; $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{50000}{\sqrt{100}} = 5000$
• Level of Significance = given as 1%
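Marking Rejection Region: One Tail Test
[Figure: sampling distribution of x̄ under H0, centred at µ = 3,65,000, with the "Reject H0" region to the right of $\mu + 2.33\,\sigma_{\bar{x}}$; below it, the equivalent standard normal curve centred at z = 0 with the "Reject H0" region to the right of z = 2.33. Vertical axis: probability]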
Exercise 1(cont..)
• Given Sample Mean
  • x̄ = 3,75,000
• Test Statistic $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{375000 - 365000}{5000} = 2$
• The calculated z statistic is lower than Z(critical) at α (0.01) = 2.33. As it is not in the rejection region, H0 can not be rejected
• Alternatively, p-value = P(z > +2) = 0.0228
• As p value is higher than 1%, H0 can not be rejected
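The numbers in this exercise can be reproduced with the standard library. This is a sketch with assumed variable names, showing both the critical-value and the p-value route for the right-tailed test.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 365000, 50000, 100
x_bar, alpha = 375000, 0.01

std_normal = NormalDist()
z = (x_bar - mu0) / (sigma / sqrt(n))      # test statistic
z_critical = std_normal.inv_cdf(1 - alpha) # right-tail critical value, about 2.33
p_value = 1 - std_normal.cdf(z)            # P(Z > z)

reject_h0 = z > z_critical
print(f"z = {z:.2f}, z critical = {z_critical:.2f}, p value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```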
Exercise 2
Two years back the average income of a family in a town was estimated to be Rs 3,65,000. From past experience the standard deviation of family income was observed to be σ = 50,000.

Currently, by taking a sample of 100 families, a researcher observed the average family income to be Rs. 3,70,000. Test whether the income has changed significantly at the 5% level of significance.
Exercise 2 (cont..)
• Population parameters: µ = 3,65,000; σ = 50,000
• Hypotheses
  • H0: µ = 3,65,000
  • H1: µ ≠ 3,65,000
• Sampling Distribution
  • Mean of sampling $\mu_{\bar{x}}$ = 3,65,000
  • $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{50000}{\sqrt{100}} = 5000$
• Level of Significance = given as 5%
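Marking the Rejection Region - Two Tail Test
[Figure: sampling distribution of x̄ under H0, centred at µ = 3,65,000, with "Reject H0" regions to the left of $\mu - 1.96\,\sigma_{\bar{x}}$ and to the right of $\mu + 1.96\,\sigma_{\bar{x}}$; below it, the equivalent standard normal curve centred at z = 0 with "Reject H0" regions beyond z = −1.96 and z = 1.96. Vertical axis: probability]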
Exercise 2 (cont..)
• Given Sample Mean
  • x̄ = 3,70,000
• Test Statistic $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{370000 - 365000}{5000} = 1$
• Z(critical) at α/2 (0.025) = 1.96
• As the test statistic z is lower than z(critical), it is not in the rejection region. Therefore, H0 can not be rejected
• p value: P(z > +1) = 0.1587 and P(z < −1) = 0.1587, so the two-tailed p value is 0.317
• As p value is higher than 5%, H0 can not be rejected
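For the two-tailed case, the p value adds the probability in both tails. The sketch below, with assumed variable names, reproduces the figures of this exercise.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n = 365000, 50000, 100
x_bar, alpha = 370000, 0.05

std_normal = NormalDist()
z = (x_bar - mu0) / (sigma / sqrt(n))           # test statistic
z_critical = std_normal.inv_cdf(1 - alpha / 2)  # two-tail critical value, about 1.96

one_tail = 1 - std_normal.cdf(abs(z))           # P(Z > |z|)
p_value = 2 * one_tail                          # both tails together

reject_h0 = abs(z) > z_critical
print(f"z = {z:.2f}, z critical = {z_critical:.2f}")
print(f"one-tail area = {one_tail:.4f}, two-tailed p value = {p_value:.4f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```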
Exercise 3
• Hilltop Coffee states that a can contains 3 pounds of coffee with σ = 0.18. A sample of 36 cans was taken to test whether the contents are at least 3 pounds, with a 1% risk of making an error. The sample mean was found to be 2.92 pounds
Exercise 3 (cont..)
• Random Variable: X – measure of coffee in a can
• Population mean µ = 3 pounds and standard deviation σ = 0.18 pounds
• The null hypothesis H0: µ ≥ 3
• The alternative hypothesis H1: µ < 3
• Level of significance α = 1%
• Sample Size n = 36 cans
• The sample mean x̄ = 2.92 pounds

The test statistic $z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{2.92 - 3}{0.18 / \sqrt{36}} = -2.67$
The z critical value for 0.01 significance from the standard normal table = 2.33 (that is, −2.33 for this left-tail test)
As the z calculated falls in the rejection region, H0 is rejected
The p value at z = −2.67 is 0.0038
Exercise 4
• An advertisement agency had developed a theme for commercials on a certain TV show based on the assumption that 50% of the show’s viewers were over 30 years of age. The agency now wants to know whether the percentage has changed, either upward or downward.
• Develop hypotheses
• List out the procedure to conduct a test of hypothesis for a risk level of under 5% for type I error
• A sample of 400 viewers was taken to test the hypothesis, out of which 210 turned out to be above 30 years
Exercise 5
 A two-wheeler manufacturer wants to test
a new fuel efficiency technology before
implementing it in the production line. He
has decided that unless the error margin is
less than 5%, he would not buy the new
technology. If fuel efficiency of the two-
wheelers he is manufacturing is 55 kmpl,
with sd of 30 kmpl, what should be the
minimum increase he is expecting in the
average fuel efficiency from a sample of 36
bikes under the test for the new
technology?
Two samples case where sample sizes are large
Exercise
• A consulting firm conducting research for a client was asked to test whether the wage levels of unskilled workers in a certain industry were the same in two different geographical areas A and B. The firm took simple random samples of unskilled workers in the two regions

Area   Mean           sd          Sample size
A      x̄1 = 300.01    s1 = 4.00   n1 = 100
B      x̄2 = 295.21    s2 = 4.50   n2 = 200

• The client wishes to run a risk of 0.02 of type I error
Exercise
• Random Variables
  • Wages at A = X1
  • Wages at B = X2
• Population Means
  • Population mean at A = µ1
  • Population mean at B = µ2
• Population sd
  • For region A σ1 = s1 = 4.00
  • For region B σ2 = s2 = 4.50
• Define a new random variable
  • Y = X1 – X2
• Mean and sd of Y
  • Mean µ = µ1 − µ2
Exercise
• The problem is to test whether the mean of Y is 0 (that is, µ = 0)
• Hypotheses
  • Null H0: µ = 0
  • Alt H1: µ ≠ 0
• Samples
  • Sample mean of A = x̄1 = 300.01
  • Sample mean of B = x̄2 = 295.21
  • Therefore, the sample mean of Y: $\bar{y} = \bar{x}_1 - \bar{x}_2 = 4.8$
• Sampling Distribution
  • Sampling mean $\mu_{\bar{y}} = \mu = 0$
  • Sd: $s_{\bar{y}} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{4^2}{100} + \frac{4.5^2}{200}} = 0.51$
Exercise
• Test Statistic at sample mean ȳ = 4.8

$z = \frac{\bar{y} - \mu}{s_{\bar{y}}} = \frac{4.8 - 0}{0.51} = 9.41$

• Z (critical)
  • Given that α = 0.02
  • As the test is a two tail test, z value at α/2 = 2.33
• As z calculated 9.41 is higher than z (critical) 2.33, H0 is to be rejected
• Conclusion: The average wage rates of A and B are significantly different
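The two-sample calculation can be checked as follows. This is a sketch with assumed variable names; it keeps the unrounded standard error, so z comes out near 9.39 rather than the 9.41 obtained with the rounded 0.51, and the conclusion is unchanged.

```python
from math import sqrt
from statistics import NormalDist

x1, s1, n1 = 300.01, 4.00, 100   # area A: sample mean, sample sd, sample size
x2, s2, n2 = 295.21, 4.50, 200   # area B: sample mean, sample sd, sample size
alpha = 0.02

y_bar = x1 - x2                              # sample mean of Y = X1 - X2
std_error = sqrt(s1**2 / n1 + s2**2 / n2)    # s.d. of y_bar
z = (y_bar - 0) / std_error                  # H0: mu = 0

z_critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tail, about 2.33
reject_h0 = abs(z) > z_critical

print(f"y_bar = {y_bar:.2f}, standard error = {std_error:.3f}")
print(f"z = {z:.2f}, z critical = {z_critical:.2f}, reject H0: {reject_h0}")
```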
Sampling Distribution in Small Sample
cases
• When the sample size is small (generally n < 30), the statistic $z = \frac{\bar{x} - \mu}{s_{\bar{x}}}$, where $s_{\bar{x}}$ is the estimated standard error, does not approximate to the normal distribution
• The statistic in such cases is found to follow the ‘t’ distribution
• The table values for t are available in the ‘t’ tables
• The test statistic $t = \frac{\bar{x} - \mu}{s_{\bar{x}}}$ would be associated with (n−1) degrees of freedom
Example
• The personnel department of a company developed an aptitude test for a certain type of semiskilled worker. The individual test scores were assumed to be normally distributed, with mean 100. It was agreed that this hypothesis would be subjected to a two-tailed test at the 5% level of significance. The aptitude test was given to a simple random sample of 16 semiskilled workers with the following results: x̄ = 94; s = 5; n = 16

Hypotheses
H0: µ = 100
H1: µ ≠ 100

Sampling Distribution
$s_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{5}{4} = 1.25$

Test statistic
$t = \frac{\bar{x} - \mu}{s_{\bar{x}}} = \frac{94 - 100}{1.25} = -4.8$

The t table value for 5% significance (α/2 = 2.5% in each tail) at 15 df is 2.131
As |t| = 4.8 exceeds 2.131, H0 can not be maintained
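The small-sample test can be sketched in a few lines. Since Python's standard library has no t distribution, the critical value 2.131 (15 degrees of freedom, 2.5% in each tail) is entered as read from the t table; the variable names are assumptions.

```python
from math import sqrt

mu0 = 100                # hypothesized mean under H0
x_bar, s, n = 94, 5, 16  # sample results
t_critical = 2.131       # t table value at 15 df for a two-tailed 5% test

std_error = s / sqrt(n)           # estimated standard error
t_stat = (x_bar - mu0) / std_error
degrees_of_freedom = n - 1

reject_h0 = abs(t_stat) > t_critical
print(f"standard error = {std_error:.2f}, t = {t_stat:.1f}, df = {degrees_of_freedom}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```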
