
QUANTITATIVE

METHODS FOR
MANAGEMENT
Session -10
RECAP
• Introduction – definition, types of statistics, levels of measurement
• Collection / compilation/ classification / tabulation
• Presentation – graphical and diagrammatic
• Measures of central tendencies
• Measures of dispersion
• Measures of skewness
• Exploratory data analysis
• Association between variables – covariance and correlation
• Probability – concepts, laws and Bayes’ theorem
• Random variable- discrete and continuous
• Theoretical distributions – Binomial, Poisson and Normal distributions
• Sampling techniques , sampling distribution, central limit theorem and
estimation theory- types ( point and interval estimate)
Learning objectives
• Hypothesis testing
• Errors in TOH
• Level of significance
• Types – parametric and non parametric
• Steps involved in TOH
• Chi square
• Independence
• Goodness of fit (uniform)
TESTING
OF
HYPOTHESIS
Hypothesis
• An assumption to be tested
• If the sample statistic differs from the population
parameter, a decision must be made as to
whether or not this difference is significant
• If it is, the hypothesis is rejected. If not, it is
accepted
Set up a Hypothesis
• H0:Null Hypothesis
• No significant difference between the sample statistic
and the population parameter
• Any difference found is accidental, arising out of
sampling fluctuations
• H1: Alternate Hypothesis
• A hypothesis that is different from the null hypothesis
• If sample information leads us to reject H0, then accept H1
• In hypothesis testing, we must state the assumed or
hypothesized value of the population parameter before we begin
sampling. This assumption is called the null hypothesis.
• The null hypothesis (H0) usually assumes there is no
difference between the observed and believed values.
• If our sample results fail to support the null hypothesis, then
the conclusion that we do accept is called the alternative
hypothesis, H1.
Set up a Hypothesis…
• If the difference is due to chance
• Accept H0
• If the difference has statistical significance
• Reject H0
• One-tailed test
• A significance test in which the null hypothesis can be
rejected by values well above or well below the hypothesized mean, but not both.
H1: µ < µ0 or H1: µ > µ0
• Two-tailed test
• A significance test that rejects the null hypothesis
if the sample mean is significantly higher or lower than the
hypothesized population mean (i.e. there are two rejection
regions).
H1: µ ≠ µ0
Terminologies
• Significance level
• Complementary to the confidence level (confidence level = 1 − α).
• Probability of committing a Type I error, namely rejecting
the null hypothesis when in reality it is true.
• There is no single standard or universal level of significance
for testing hypotheses.
• The higher the significance level, the higher the probability
of rejecting a null hypothesis when it is true.
Set up a Significance level
• The confidence with which an experimenter
rejects or retains H0 depends on the level of
significance involved
• α = 5%
• 5% chance that H0 is rejected when it should be
accepted
• 95% confident that we have made the right decision
• Willing to accept a 5% chance of being wrong when rejecting
H0
Suitable Test Statistic

Test statistic = (Sample statistic − Hypothesized parameter) / (Standard error of statistic)

• Calculation: use the appropriate formula and get the calculated value
• Inference: p-value approach or classical approach
P-value
• The p-value is a measure of the likelihood of the
sample results when the null hypothesis is
assumed to be true
• The smaller the p-value, the less likely it is that
the sample results came from a situation where
the null hypothesis is true
Errors

Decision      Condition: H0 true           Condition: H0 false
Accept H0     Correct decision             Type II error (β)
              Confidence level (1 − α)
Reject H0     Type I error (α)             Correct decision
                                           Power of test (1 − β)

• Type I error, α
• The error of rejecting a null hypothesis when it is true.
• Type II error, β
• The error of accepting a null hypothesis when it is actually
false.
• For any test of hypothesis or rule of decision
to be good, it must be designed so as to minimise
errors of decision. The only way to reduce both types
of error at the same time is to increase the sample size.
Steps involved in TOH
Step 1: set up the null and alternative hypothesis ( one tail v/s
two tail)

Step 2 : define the level of significance

Step 3 : Test statistic

Step 4 : calculation

Step 5 : inference
• Classical approach: if table value > calculated value, accept H0
• P-value approach: if p > α, accept H0
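The five steps can be sketched in Python for a two-tailed test of a mean. This is only an illustration under stated assumptions: the function name z_test and all the numbers (sample mean 52, hypothesized mean 50, known population standard deviation 6, sample size 36) are hypothetical, and a z statistic is assumed to be the suitable test statistic.

```python
import math

def z_test(sample_mean, hypothesized_mean, sigma, n, alpha=0.05):
    """Two-tailed one-sample z-test. Returns (z, p_value, reject_h0)."""
    # Step 3: test statistic = (sample statistic - hypothesized parameter) / standard error
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - hypothesized_mean) / standard_error
    # Step 5, p-value approach: P(|Z| >= |z|) under H0 for a standard normal Z
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value, p_value < alpha

# Step 1: H0: mean = 50 against H1: mean != 50 (hypothetical data)
# Step 2: level of significance alpha = 0.05
z, p_value, reject_h0 = z_test(sample_mean=52.0, hypothesized_mean=50.0, sigma=6.0, n=36)

# Step 5, classical approach: compare |z| with the two-tailed 5% table value 1.96
TABLE_VALUE = 1.96
reject_classical = abs(z) > TABLE_VALUE

print(f"z = {z:.2f}, p = {p_value:.4f}")
print("p-value approach:", "reject H0" if reject_h0 else "accept H0")
print("classical approach:", "reject H0" if reject_classical else "accept H0")
```

Both approaches lead to the same decision because the table value is simply the z value whose two-tailed p-value equals α.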
FLOW CHART FOR HYPOTHESIS TESTING

State H0 as well as H1

Specify the level of significance α

Decide the correct sampling distribution

Select a random sample (or samples) and work out an appropriate value from the sample data

Calculate the probability that the sample result would diverge as widely as it has from expectations, if H0 were true

Is this probability equal to or smaller than α in the case of a one-tailed test, or α/2 in the case of a two-tailed test?
YES: Reject H0
NO: Accept H0
Parametric
• based on the assumption, or presence, of the normal distribution
• concerned with the parameters of the distribution, e.g. mean, proportion
Non parametric
• used on occasions when the data are not normal, contain extreme values, or not enough is known to be able to make any assumption about the type of distribution (distribution free)
Advantages of non-parametric tests
• No assumptions need to be made about the
underlying distribution
• They can be used on data ranked in some order.
• Mathematical concepts are simpler than for parametric
tests
Disadvantages of non-parametric tests
• They are less discriminating than parametric tests,
i.e. they are more prone to error and less powerful
• Although simple, the arithmetic may take a long
time
Chi-square (χ²) Distribution
• Used when it is wished to compare an actual, observed
distribution with a hypothesized, or expected,
distribution.
• Often referred to as a ‘goodness of fit’ test and test for
independence

χ² = Σ (O − E)² / E

where O = the observed frequency of any value
E = the expected frequency of any value
• The value obtained from the formula is compared with
the value from the χ² table for a given significance level and
the number of degrees of freedom.
• Degrees of freedom (for a contingency table) = (Rows − 1)(Columns − 1)
• If the calculated χ² is greater than the χ² from the table, the null hypothesis is
rejected.
• Used broadly for
• Test of goodness of fit (for one-way classification or for one
variable only)
• Can also be used to determine how well empirical
distributions, i.e. those obtained from sample data, fit
theoretical distributions such as the Normal, Poisson and
Binomial
• Test of independence (for more than one row or column, in
the form of a contingency table covering several attributes)
• Note that:
• When an expected cell frequency is less than 5, the χ² test
becomes inaccurate. In such circumstances the cell whose expected
frequency is less than 5 is merged with an adjoining cell so that
the expected frequencies in all resulting cells are at least 5.
Degrees of freedom: the number of degrees of freedom is equal to
the number of classes minus the number of independent constraints.
If there are 10 frequency classes and there is one independent
constraint, then there are (10 − 1) = 9 degrees of freedom. Thus if ‘n’
is the number of groups and one constraint is placed by making the
totals of observed and expected frequencies equal, the degrees of
freedom would be equal to (n − 1).
In the case of an r × c contingency table, the degrees of freedom are worked
out as
d.f. = (r − 1)(c − 1)
c: No. of columns
r: No. of rows.
Conditions for the application of the χ² test:
The following conditions should be satisfied before the χ² test can
be applied:
(i) Observations recorded and used are collected on a random
basis,
(ii) All the items in the sample must be independent.
Chi-square Goodness-of-fit Test
• Does sample data conform to a hypothesized distribution?
• Examples:
• Are technical support calls equal across all days of
the week? (i.e., do calls follow a uniform
distribution?)
• Do measurements from a production process follow
a normal distribution?
Quick Review Question
• Example:
• Are technical support calls equal across all days of the week? (i.e., do calls follow a
uniform distribution?)
• Sample data for 10 days per day of week:

Day          Sum of calls for this day
Monday       290
Tuesday      250
Wednesday    238
Thursday     257
Friday       265
Saturday     230
Sunday       192
Σ            1722
Logic of Goodness-of-Fit Test
• If calls are uniformly distributed, the 1722 calls would be expected
to be equally divided across the 7 days:
1722 / 7 = 246 expected calls per day if uniform
• Chi-Square Goodness-of-Fit Test: test to see if the sample results
are consistent with the expected results
Observed & Expected Frequencies
Observed Expected
oi ei
Monday 290 246
Tuesday 250 246
Wednesday 238 246
Thursday 257 246
Friday 265 246
Saturday 230 246
Sunday 192 246

TOTAL 1722 1722


Chi-Square Test Statistic
H0: The distribution of calls is uniform over days of the week
HA: The distribution of calls is not uniform
• The test statistic is

χ² = Σ (oi − ei)² / ei   (where df = k − 1)

where:
k = number of categories
oi = observed cell frequency for category i
ei = expected cell frequency for category i
The Rejection Region
H0: The distribution of calls is uniform over days of the week
HA: The distribution of calls is not uniform

χ² = Σ (oi − ei)² / ei

• Reject H0 if χ² > χ²α (with k – 1 degrees of freedom)
[Figure: χ² distribution starting at 0, with "Do not reject H0" to the left of the critical value χ²α and "Reject H0" in the upper tail of area α]
Chi-Square Test Statistic
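As a hedged sketch, the statistic for the support-call data can be computed in a few lines of Python. The helper name chi_square_gof is hypothetical, and the critical value 12.59 is the 5% table value for 6 degrees of freedom quoted in Example 1 of these notes.

```python
def chi_square_gof(observed, expected):
    """Sum of (o - e)^2 / e over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

calls = {
    "Monday": 290, "Tuesday": 250, "Wednesday": 238, "Thursday": 257,
    "Friday": 265, "Saturday": 230, "Sunday": 192,
}
observed = list(calls.values())
k = len(observed)

# Under H0 (uniform distribution) every day has the same expected count.
expected_per_day = sum(observed) / k          # 1722 / 7 = 246
chi2 = chi_square_gof(observed, [expected_per_day] * k)

degrees_of_freedom = k - 1                     # 7 - 1 = 6
CRITICAL_VALUE = 12.59                         # table value, 5% level, 6 d.f.

print(f"chi-square = {chi2:.2f} with {degrees_of_freedom} d.f.")   # about 23.05
print("Reject H0" if chi2 > CRITICAL_VALUE else "Do not reject H0")
```

Since the calculated value exceeds the table value, the data do not support the hypothesis that calls are spread uniformly over the week.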
Contingency Tables
• Situations involving multiple population
proportions
• Used to classify sample observations according
to two or more characteristics
• Also called a cross-tabulation table.
Example:
• The following data concern industrial accidents and absentees,
classified according to the type of employee.
• Is there any evidence to suggest that the severity of accident is
associated with type of employee?
Logic of the test
H0: Severity of accident is independent of type of employee
HA: Severity of accident is not independent of type of employee
• If H0 is true, then the proportion of accidents at each level of
severity should be the same for every type of employee
Observed vs. expected Frequencies
The Chi-square contingency test statistic is:

χ² = Σ (O − E)² / E   with d.f. = (r − 1)(c − 1)

where:
O = observed frequency
E = expected frequency
r = number of rows
c = number of columns
Contingency Analysis
Example:
• Left-Handed vs. Gender
• Dominant Hand: Left vs. Right
• Gender: Male vs. Female
H0: Hand preference is independent of gender
HA: Hand preference is not independent of gender
Example:
Logic of the test
H0: Hand preference is independent of gender
HA: Hand preference is not independent of gender
• If H0 is true, then the proportion of left-handed females
should be the same as the proportion of left-handed
males
• The two proportions above should be the same as the
proportion of left-handed people overall
Observed vs. expected Frequencies
Contingency Analysis
Example 1: The following table gives the number of aircraft accidents that
occurred during the various days of the week. Find whether the accidents are
uniformly distributed over the week.

Days:              Sun.  Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.
No. of accidents:  14    16    8      12    11      9     14

Given: the table value of chi-square at the 5% level of significance for 6 d.f. is
12.59 (χ²0.05 = 12.59).
Solution: Under the null hypothesis that accidents are uniformly distributed over
the week, the expected frequency of accidents on each of the days would be 84/7 = 12:

Days:              Sun.  Mon.  Tues.  Wed.  Thurs.  Fri.  Sat.
No. of accidents:  12    12    12     12    12      12    12

Thus we have

Observed        Expected        (O−E)   (O−E)²   (O−E)²/E
frequency O     frequency E
14              12              2       4        4/12
16              12              4       16       16/12
8               12              −4      16       16/12
12              12              0       0        0
11              12              −1      1        1/12
9               12              −3      9        9/12
14              12              2       4        4/12
Total                                            50/12 = 4.17

No. of d.f. = 7 − 1 = 6
Table value: χ² (6 d.f., 0.05) = 12.59
χ²cal = 4.17
We see that χ²cal < χ²tab, so H0 is accepted.

Example 2: Two research workers classified some people into income
groups on the basis of sampling studies. Their results are as follows:

Investigator    Income groups                 Total
                Poor     Middle    Rich
A               160      30        10         200
B               140      120       40         300
Total           300      150       50         500

Show that the sampling technique of at least one research worker is
defective.
Solution: Let us take the hypothesis that the sampling techniques adopted
by the research workers are similar (i.e. there is no difference between
the techniques adopted by the research workers). This being so, the
expectation of investigator A classifying the people in
(i) Poor income group = (200 × 300) / 500 = 120
(ii) Middle income group = (200 × 150) / 500 = 60
(iii) Rich income group = (200 × 50) / 500 = 20
Similarly, the expectation of investigator B classifying the people in
(i) Poor income group = (300 × 300) / 500 = 180
(ii) Middle income group = (300 × 150) / 500 = 90
(iii) Rich income group = (300 × 50) / 500 = 30
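We can now calculate the value of χ² as follows:

Groups                          Observed        Expected        (O−E)   (O−E)²/E
                                frequency (O)   frequency (E)
Investigator A
Classified people as poor       160             120             40      1600/120 = 13.33
Classified people as middle     30              60              −30     900/60 = 15
Classified people as rich       10              20              −10     100/20 = 5
Investigator B
Classified people as poor       140             180             −40     1600/180 = 8.89
Classified people as middle     120             90              30      900/90 = 10
Classified people as rich       40              30              10      100/30 = 3.33
Total                                                                   55.56

With (2 − 1)(3 − 1) = 2 d.f., the table value of χ² at the 5% level of significance is 5.991.
Since the calculated value is far greater than the table value, the hypothesis is rejected,
i.e. the sampling technique of at least one research worker is defective.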
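The expected frequencies and the χ² value for an r × c table follow mechanically from the row and column totals, so a short Python sketch may help. The function names are hypothetical, and 5.991 is the standard 5% table value of χ² for 2 degrees of freedom. The unrounded total comes to about 55.56; small differences from a hand total are due to rounding the individual terms.

```python
def expected_frequencies(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for an r x c table."""
    expected = expected_frequencies(table)
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: investigators A and B; columns: poor, middle, rich
observed = [[160, 30, 10],
            [140, 120, 40]]

expected = expected_frequencies(observed)
chi2, df = chi_square_independence(observed)
TABLE_VALUE = 5.991   # chi-square table value, 5% level, 2 d.f.

print("Expected:", expected)
print(f"chi-square = {chi2:.2f} with {df} d.f.")
print("Reject H0" if chi2 > TABLE_VALUE else "Accept H0")
```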
Think: Suppose we find that a person’s gender affects his or
her attitude toward abortion. What are the two variables
involved in this explanatory finding? Which variable is the
independent variable? Which is the dependent variable?
A WORD OF CAUTION
The fact that two variables “go together” does not mean
that change in one variable causes changes in another
variable. A social researcher in his study shows that violent
crime rates (a dependent variable) are lower in metropolitan
areas where people tend to watch violent TV programs than
in areas where they don’t. Does this mean that watching
violent TV programs “causes” less violent crime? Probably
not.
Example: Two sample polls of votes for two candidates A and B for
a public office are taken, one from among the residents of rural areas and
the other from among the residents of urban areas.
The results are given in the table. Examine whether the nature of the
area is related to voting preference in this election.

Area      Votes for A     Votes for B     Total
Rural     620 (a)         380 (b)         1000 (a+b)
Urban     550 (c)         450 (d)         1000 (c+d)
Total     1170 (a+c)      830 (b+d)       2000
Under the null hypothesis that the nature of the area is independent of the
voting preference in the election, we get the expected frequencies as follows:
E(620) = (1170 × 1000) / 2000 = 585,
E(380) = (830 × 1000) / 2000 = 415,
E(550) = (1170 × 1000) / 2000 = 585,
E(450) = (830 × 1000) / 2000 = 415,

χ² = Σ (O − E)² / E
   = (620 − 585)²/585 + (380 − 415)²/415 + (550 − 585)²/585 + (450 − 415)²/415
   = (35)²/585 + (−35)²/415 + (−35)²/585 + (35)²/415
   = (35)² [1/585 + 1/415 + 1/585 + 1/415]
   = 1225 [0.001709 + 0.002409 + 0.001709 + 0.002409]
i.e. χ²cal = 10.0891
Tabulated χ² for (2 − 1)(2 − 1) = 1 d.f. at the 5% level of significance is 3.841, i.e.
χ²tab = 3.841.
Here we see that χ²cal > χ²tab (10.0891 > 3.841), so H0 is rejected, i.e. the difference is highly
significant at the 5% level of significance. Thus we conclude that the nature of the area is
related to voting preference in the election.
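The arithmetic for the 2 × 2 table can be checked with a short Python sketch. It computes χ² cell by cell and also with the closed form N(ad − bc)² / [(a+b)(a+c)(b+d)(c+d)], which is an equivalent formula for 2 × 2 tables. The function names are hypothetical. Without intermediate rounding both routes give about 10.0917, so the hand value 10.0891 differs only because the reciprocals were cut to six decimal places.

```python
def chi_square_cellwise(a, b, c, d):
    """Sum of (O - E)^2 / E over the four cells of a 2 x 2 table."""
    n = a + b + c + d
    col_totals = (a + c, b + d)
    total = 0.0
    for row in ((a, b), (c, d)):
        row_total = sum(row)
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            total += (observed - expected) ** 2 / expected
    return total

def chi_square_shortcut(a, b, c, d):
    """Closed-form chi-square for a 2 x 2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (a + c) * (b + d) * (c + d))

# Rural: 620 for A, 380 for B; Urban: 550 for A, 450 for B
cellwise = chi_square_cellwise(620, 380, 550, 450)
shortcut = chi_square_shortcut(620, 380, 550, 450)
TABLE_VALUE = 3.841   # chi-square table value, 5% level, 1 d.f.

print(f"cell by cell: {cellwise:.5f}, shortcut: {shortcut:.5f}")
print("Reject H0" if shortcut > TABLE_VALUE else "Accept H0")
```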
Alternative procedure: To calculate the value of χ² for a 2 × 2 table, we can use the following
formula:

                          Total
        a       b         a+b
        c       d         c+d
Total   a+c     b+d       N = a+b+c+d

N = 620 + 380 + 550 + 450 = 2000

χ² = N (ad − bc)² / [(a + b)(a + c)(b + d)(c + d)]
   = 2000 (620 × 450 − 380 × 550)² / (1000 × 1170 × 830 × 1000)
   = 10.09165

The small difference from 10.0891 arises from rounding the reciprocals in the first calculation.
What is a Hypothesis of a test?
• A hypothesis is an assumption about the population parameter.
• A parameter is a characteristic of the population, like its mean or variance.
• The parameter must be identified before analysis.
Example: “I assume the mean AGE of this class is 50!!! Am I correct? TEST IT!”



The Null Hypothesis, H0
• States the Assumption (numerical) to be
tested
e.g. Our class mean age is 50 (H0: µ=50)
• Begin with the assumption that the null
hypothesis is TRUE.
(Similar to the notion of innocent until proven
guilty)

The Null Hypothesis may or may not be
rejected, but our aim is to REJECT the null
hypothesis!
The Alternative Hypothesis, H1
• Is the opposite of the null hypothesis
e.g. The average age of our class is
different from 50 (H1: µ ≠ 50)
• Is generally the hypothesis that is
believed to be true by the researcher!
Identify the Problem

• Steps:
• State the Null Hypothesis
• State its opposite, the Alternative Hypothesis
• Hypotheses are mutually exclusive &
exhaustive
• Sometimes it is easier to form the
alternative hypothesis first.
Hypothesis Testing Process
[Figure: a sample is drawn from the population]
• Assume the population mean age is 50 (Null Hypothesis).
• The sample mean is 20.
• Is X̄ = 20 ≈ µ = 50? No, not likely!
• REJECT the Null Hypothesis.
Reason for Rejecting H0
[Figure: sampling distribution of the sample mean, centred on the hypothesized population mean µ = 50, with the observed sample mean 20 far out in the tail]
• Our sample mean (20) falls in the tails! It’s not likely if H0 is true.
• Therefore we reject the null hypothesis H0 that µ = 50.

Level of Significance, α
• Defines the Rejection region
• Typical value of α is 0.05. It provides the
Critical Value(s) of the Test
[Figure: sampling distribution with the rejection regions beyond the critical value; α is the “area” of the rejection region]
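Level of Significance, α and
the Rejection Region
• One tail (left) test: H0: µ ≥ 0, H1: µ < 0; rejection region of area α in the left tail
• One tail (right) test: H0: µ ≤ 0, H1: µ > 0; rejection region of area α in the right tail
• Two tails test: H0: µ = 0, H1: µ ≠ 0; rejection regions of area α/2 in each tail
The critical value(s) mark the boundary of the rejection region(s).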
Errors in Making Decisions
• Type I Error
• Reject Null Hypothesis when it is True (“False
Positive”)
• Has Serious Consequences
• Probability of Type I Error is α,
called the Level of Significance
• Type II Error
• Do Not Reject Null Hypothesis when it is
False (“False Negative”)
• Probability of Type II Error is β
(Power = 1 − β)
α and β Have an Inverse
Relationship
Reduce the probability of
one error and the
other one goes up.
One possibility for reducing both: Increase the sample
size!!!!
What is the p Value and how to use it
in a Test?
• The p-value is the Probability of Obtaining a Test Statistic (under
H0) at least as Extreme (≤ or ≥) as the observed Sample Value
[Figure: one tail test; p is the tail area beyond the observed sample value]
• Used to Make Rejection Decision
• If p value < α → Reject H0 → SUCCESS
• If p value ≥ α → Do Not Reject H0 → FAILURE

DECISION THEORY
Introduction
• Statistical Decision Theory provides an analytical
and systematic approach to the study of decision
making
• Use of statistical techniques to solve problems for
which information is incomplete, uncertain or
completely lacking
• Data concerning occurrence of different outcomes
are evaluated to enable the decision maker to
identify suitable decision alternatives or courses
of action
• Decide among alternatives by taking into account
the monetary repercussions of actions (payoff)
Decision theory
• Provides a formal analytic framework for decision making under conditions of
uncertainty

• Also called decision analysis

• Used to determine optimal strategies where a decision maker is faced with several
decision alternatives and an uncertain, or risky, pattern of future events.

• Decision – Definition

Defined as the selection by the decision maker of an act, considered to be the
best according to some predesigned standard, from among the available options
Decision making process
• Identification of various possible outcomes (States of nature or events Ei)
• Identification of all courses of action (Strategies)
• Determination of payoff
• Choosing from alternatives
Introduction…
• Objective is to maximize gains or minimize
losses
• Several courses of action – choice among
alternatives
• Calculate the measure of benefit of various
alternatives
• Events beyond the decision maker’s control
(acts of God)
• Uncertainty concerning which outcome will
actually happen
Types of Decision Making Environments
• Decision Making under Certainty – perfect
knowledge – only one possible future state of
nature exists
• Decision Making under Risk – less than
complete knowledge – several possible
consequences of every decision choice – with
associated probabilities
• Decision Making under Uncertainty – unable
to specify probabilities
Problem Formulation
• The first step in the decision analysis process is
problem formulation.
• We begin with a verbal statement of the problem.
• Then we identify:
• the decision alternatives
• the states of nature (uncertain future events)
• the payoffs (consequences) associated with each
specific combination of:
• decision alternative
• state of nature
Problem Formulation
• A decision problem is characterized by decision
alternatives, states of nature, and resulting payoffs.
• The decision alternatives are the different possible
strategies the decision maker can employ.
• The states of nature refer to future events, not
under the control of the decision maker, which
may occur.
• States of nature should be defined so that they are
mutually exclusive and collectively exhaustive.
Payoff Tables
• The consequence resulting from a specific
combination of a decision alternative and a state of
nature is a payoff.
• A table showing payoffs for all combinations of
decision alternatives and states of nature is a payoff
table.
• Payoffs can be expressed in terms of profit, cost, time,
distance or any other appropriate measure.
Preparing a Payoff table and opportunity
loss table
• A flower merchant purchases roses at Rs. 10 per dozen and sells them
for Rs. 30 per dozen. Unsold flowers are donated to a temple. Prepare a payoff
table and opportunity loss table. Consider the events (demand) and strategies
(quantity purchased) in multiples of 5 dozen: 5, 10, 15 and 20.
Solution - Payoff
                States of nature
Strategies      E1 (5)    E2 (10)   E3 (15)   E4 (20)
S1 (5)          100       100       100       100
S2 (10)         50        200       200       200
S3 (15)         0         150       300       300
S4 (20)         -50       100       250       400
Solution – opportunity loss
                States of nature
Strategies      E1 (5)    E2 (10)   E3 (15)   E4 (20)
S1 (5)          0         100       200       300
S2 (10)         50        0         100       200
S3 (15)         100       50        0         100
S4 (20)         150       100       50        0
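Both tables can be generated from the purchase cost and selling price. The sketch below assumes, as the payoff figures imply, that demand and stock are measured in dozens and that unsold roses bring in nothing; the names payoff, payoff_table and opportunity_loss are hypothetical.

```python
COST_PER_DOZEN = 10    # Rs., purchase price
PRICE_PER_DOZEN = 30   # Rs., selling price
LEVELS = [5, 10, 15, 20]   # dozens: used for both strategies (stock) and events (demand)

def payoff(stock, demand):
    """Profit when `stock` dozen are bought and `demand` dozen are asked for."""
    sold = min(stock, demand)          # cannot sell more than is stocked or demanded
    return PRICE_PER_DOZEN * sold - COST_PER_DOZEN * stock

# Rows are strategies S1..S4, columns are events E1..E4.
payoff_table = [[payoff(stock, demand) for demand in LEVELS] for stock in LEVELS]

# Opportunity loss = best payoff attainable for that event minus the actual payoff.
best_per_event = [max(column) for column in zip(*payoff_table)]
opportunity_loss = [
    [best - value for value, best in zip(row, best_per_event)]
    for row in payoff_table
]

for stock, pay_row, loss_row in zip(LEVELS, payoff_table, opportunity_loss):
    print(f"Stock {stock:>2}: payoff {pay_row}  opportunity loss {loss_row}")
```

The opportunity loss is zero on the diagonal because stocking exactly what is demanded is the best strategy for each event.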
Decision Making Under Certainty
Manager knows which event will occur:
pick the alternative with the best payoff

                  Possible Future Demand
Alternative       Low       High
Small facility    200       270
Large facility    160       800
Do nothing        0         0

What is the best choice if future demand will be low?


Decision Making Under Uncertainty
• Decision making without probability
(no probabilities of occurrence are assigned)
• Decision making with probabilities
(probabilities can be assigned)
Decision Making with Probabilities
Expected Value Approach
• If probabilistic information regarding the states of
nature is available, one may use the expected value
(EV) approach.
• Expected value is computed by multiplying each
decision outcome under each state of nature by the
probability of its occurrence and summing the products
• The decision yielding the best expected return is
chosen.
Decision Making with Probabilities
• Once we have defined the decision alternatives and
states of nature for the chance events, we focus on
determining probabilities for the states of nature.
• The classical method, relative frequency method, or
subjective method of assigning probabilities may be
used.
• Because one and only one of the N states of nature can
occur, the probabilities must satisfy two conditions:

P(sj) ≥ 0 for all states of nature

Σ (j = 1 to N) P(sj) = P(s1) + P(s2) + … + P(sN) = 1
Expected Value Approach
• The expected value of a decision alternative is the
sum of weighted payoffs for the decision alternative.
• The expected value (EV) of decision alternative di is
defined as

EV(di) = Σ (j = 1 to N) P(sj) Vij

where: N = the number of states of nature
P(sj) = the probability of state of nature sj
Vij = the payoff corresponding to decision
alternative di and state of nature sj
Problem
• Warren Bubby, a wealthy investor, is offered three major investments, viz.
conservative, speculative and countercyclical. The profits under three
scenarios, (i) improving economy, (ii) stable economy and (iii) worsening
economy, are given in the following payoff matrix (in dollars):

Investment pattern    Improving economy    Stable economy    Worsening economy
Conservative          30 m                 5 m               -10 m
Speculative           40 m                 10 m              -30 m
Counter cyclical      -10 m                0                 15 m

• If the prior probabilities for improving economy, stable economy and worsening
economy are 0.1, 0.5 and 0.4 respectively, which investment would Warren consider?
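Solution
E(conservative) = 0.1*30 + 0.5*5 - 0.4*10 = 1.5 m
E(speculative) = 0.1*40 + 0.5*10 - 0.4*30 = -3 m
E(counter cyclical) = -0.1*10 + 0.5*0 + 0.4*15 = 5 m
The counter cyclical investment has the largest expected value, so Warren would consider it.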
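The expected values can be evaluated with a small Python sketch of EV(di) = Σ P(sj) Vij. The payoffs are taken from the payoff matrix of the problem (so 15 m is used for the counter cyclical investment in a worsening economy), amounts are in millions of dollars, and the function name expected_value is hypothetical.

```python
# Prior probabilities: improving, stable, worsening economy
probabilities = [0.1, 0.5, 0.4]

# Payoffs in millions of dollars, one row per investment, same column order.
payoffs = {
    "Conservative":     [30, 5, -10],
    "Speculative":      [40, 10, -30],
    "Counter cyclical": [-10, 0, 15],
}

def expected_value(probs, values):
    """Probability-weighted sum of the payoffs of one decision alternative."""
    return sum(p * v for p, v in zip(probs, values))

expected = {name: expected_value(probabilities, row) for name, row in payoffs.items()}
best_choice = max(expected, key=expected.get)

for name, value in expected.items():
    print(f"E({name}) = {value:.1f} m")
print("Largest expected value:", best_choice)
```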
Decision Trees
• A decision tree is a chronological representation of
the decision problem. It is a graphical device that
forces the decision-maker to examine all possible
outcomes, including unfavorable ones.
• It makes the computation of the expected
values easier
• It makes the process of making the decision easy to understand
• Each decision tree has two types of nodes: round
nodes correspond to the states of nature while square
nodes correspond to the decision alternatives.
• The branches leaving each round node represent the
different states of nature while the branches leaving
each square node represent the different decision
alternatives.
• At the end of each limb of a tree are the payoffs
attained from the series of branches making up that
limb.
How to draw a decision Tree ?
Expected Value Approach
• Example: Burger Prince
Burger Prince Restaurant is considering opening a new
restaurant on Main Street. It has three different
restaurant layout models (A, B, and C), each with a
different seating capacity.
Burger Prince estimates that the average number
of customers served per hour will be 80, 100, or 120.
The payoff table for the three models is on the next
slide.
Expected Value Approach
• Payoff Table

            Average Number of Customers Per Hour
            s1 = 80     s2 = 100    s3 = 120
Model A     $10,000     $15,000     $14,000
Model B     $ 8,000     $18,000     $12,000
Model C     $ 6,000     $16,000     $21,000
Expected Value Approach
• Calculate the expected value for each decision.
• The decision tree on the next slide can assist in this
calculation.
• Here d1, d2, d3 represent the decision alternatives of
models A, B, and C.
• And s1, s2, s3 represent the states of nature of 80, 100,
and 120 customers per hour; their probabilities are 0.4, 0.2 and 0.4
respectively.
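Expected Value Approach
Decision Tree (node 1 is the decision node; nodes 2, 3 and 4 are chance nodes)

Branch from node 1    Chance node    s1 (.4)    s2 (.2)    s3 (.4)
d1                    2              10,000     15,000     14,000
d2                    3              8,000      18,000     12,000
d3                    4              6,000      16,000     21,000

The entries under s1, s2 and s3 are the payoffs at the ends of the branches.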
Expected Value Approach
Model A (d1, node 2): EMV = .4(10,000) + .2(15,000) + .4(14,000) = $12,600
Model B (d2, node 3): EMV = .4(8,000) + .2(18,000) + .4(12,000) = $11,600
Model C (d3, node 4): EMV = .4(6,000) + .2(16,000) + .4(21,000) = $14,000

Choose the model with largest EV, Model C
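The same result can be obtained by "rolling back" the decision tree: take the expected value at each round (chance) node and the maximum at the square (decision) node. The nested-dictionary representation and the helper names below are assumptions made for this sketch, not something prescribed by the slides.

```python
def payoff(value):
    """Leaf at the end of a limb."""
    return {"type": "payoff", "value": value}

def chance(probabilities, payoffs):
    """Round node: one branch per state of nature, each with a probability."""
    branches = [(p, payoff(v)) for p, v in zip(probabilities, payoffs)]
    return {"type": "chance", "branches": branches}

def rollback(node):
    """Expected monetary value of a node, computed from the leaves backwards."""
    if node["type"] == "payoff":
        return node["value"]
    if node["type"] == "chance":
        return sum(p * rollback(child) for p, child in node["branches"])
    if node["type"] == "decision":
        return max(rollback(child) for _label, child in node["branches"])
    raise ValueError(f"unknown node type: {node['type']}")

STATE_PROBABILITIES = (0.4, 0.2, 0.4)   # s1 = 80, s2 = 100, s3 = 120 customers per hour

# Node 1 is the square decision node; nodes 2, 3 and 4 are the round chance nodes.
tree = {
    "type": "decision",
    "branches": [
        ("Model A", chance(STATE_PROBABILITIES, (10_000, 15_000, 14_000))),
        ("Model B", chance(STATE_PROBABILITIES, (8_000, 18_000, 12_000))),
        ("Model C", chance(STATE_PROBABILITIES, (6_000, 16_000, 21_000))),
    ],
}

emv = {label: rollback(child) for label, child in tree["branches"]}
best_model = max(emv, key=emv.get)

for label, value in emv.items():
    print(f"{label}: EMV = ${value:,.0f}")
print("Choose", best_model, "with EMV", f"${rollback(tree):,.0f}")
```

Representing the tree explicitly is more than this one-stage problem needs, but the same rollback function also handles trees in which a chance branch leads to a further decision node.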


 END
