
Statistics for Decision Making

PGP 17-19

Dr. Rohit Joshi, IIM Shillong


Let us see some situations
Election Contestants
Operations Manager
Marketing Research
A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use
Effectiveness of a gym or a gym trainer
Performance evaluation and appraisal
Financial evaluation and forecasting

Why Study Statistics?

1. Data are everywhere


2. Statistical techniques are used to make
many decisions that affect our lives
3. No matter what your career, you will
make professional decisions that involve
data. An understanding of statistical
methods will help you make these
decisions effectively.
Decision Makers Use Statistics To:
• Present and describe business data and information properly
• Draw conclusions about large populations, using information collected from samples
• Make reliable forecasts about a business activity
• Improve business processes
Types of Statistics
Statistics
The branch of mathematics that transforms data into useful information for decision makers.

Descriptive Statistics: Collecting, summarizing, and describing data
Inferential Statistics: Drawing conclusions and/or making decisions concerning a population based only on sample data
Descriptive Statistics
• Collect data
  ex. Survey
• Present data
  ex. Tables and graphs
• Characterize data
  ex. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based on sample results.
• Estimation
  ex. Estimate the population mean weight using the sample average weight
• Hypothesis testing
  ex. Test the claim that the population mean weight is 65 kg
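
A minimal sketch of estimation and hypothesis testing in Python, assuming a small made-up sample of weights (the data and variable names are hypothetical, not from the course). The sample mean serves as the estimate of the population mean, and a t statistic measures how far that estimate lies from the claimed 65 kg.

```python
import math
import statistics

# Hypothetical sample of body weights in kg (illustrative data only).
weights = [62.0, 68.5, 64.0, 66.5, 63.0, 67.0, 65.5, 61.5]

# Estimation: the sample mean is a point estimate of the population mean.
sample_mean = statistics.mean(weights)

# Hypothesis testing: compare the estimate with the claimed mean of 65 kg.
claimed_mean = 65.0
sample_sd = statistics.stdev(weights)            # sample standard deviation (n - 1)
std_error = sample_sd / math.sqrt(len(weights))  # standard error of the mean
t_statistic = (sample_mean - claimed_mean) / std_error

print(f"sample mean = {sample_mean:.2f} kg")
print(f"t statistic = {t_statistic:.3f}")
```

A t statistic close to zero means the sample gives little evidence against the claim; turning it into a decision still requires a t table or statistical software.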
Basic Vocabulary of Statistics
VARIABLE
A variable is a characteristic of an item or individual.

DATA
Data are the different values associated with a variable.

POPULATION
A population consists of all the items or individuals about which you want to draw a
conclusion.

SAMPLE
A sample is the portion of a population selected for analysis.

PARAMETER
A parameter is a numerical measure that describes a characteristic of a population.

STATISTIC
A statistic is a numerical measure that describes a characteristic of a sample.
Population vs. Sample

Population: Measures used to describe the population are called parameters
Sample: Measures computed from sample data are called statistics
Sources of Data
 Primary Sources: The data collector is the one
using the data for analysis
 Data from a political survey
 Data collected from an experiment
 Observed data
 Secondary Sources: The person performing data
analysis is not the data collector
 Analyzing census data
 Examining data from print journals or data published on
the internet.
Types of Variables
• Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”
• Numerical (quantitative) variables have values that represent quantities.
Types of Data

Data
• Categorical (defined categories)
  Examples: Marital Status, Political Party, Eye Color
• Numerical
  - Discrete (counted items)
    Examples: Number of Children, Defects per hour
  - Continuous (measured characteristics)
    Examples: Weight, Voltage
Levels of Measurement

Nominal
Ordinal
Interval
Ratio
Nominal Scale
Lowest level of data measurement
Classifies data into distinct categories in which no
ranking is implied
Used to classify or categorize

 Personal Computer Ownership : Yes/ No

 Type of Stocks Owned: Growth/Disinvestment

 Your role in class: Instructor/ Participant/ Observer


Ordinal scale
An ordinal scale classifies data into distinct categories in which ranking is implied.
Examples: student grades, faculty rank, Standard & Poor’s bond ratings, product satisfaction

Likert Scale
How helpful is this statistics course to you?
1. Not at all
2. Somewhat
3. Moderately
4. Very much
5. Extremely
Interval scale
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
• Zero does not mean absence of the phenomenon
Examples: Temperature, Standardized Score (Z value, SAT)

Ratio Scale
A ratio scale is an ordered scale in which the
difference between the measurements is a
meaningful quantity and the measurements have a
true zero point.
Height, Weight, Age

Hierarchy of measurement levels, from highest to lowest: Ratio, Interval, Ordinal, Nominal
A survey in healthcare industry
Many changes continue to occur in the healthcare industry. Because of increased competition for patients among providers and the need to determine how providers can better serve their clientele, hospital administrators sometimes mail a quality satisfaction survey to their patients after the patient is released. The following types of questions are sometimes asked on such a survey.
1. How long ago were you released from the hospital?
2. Which type of unit were you in for most of your stay?
   1. Intensive care
   2. Maternity care
   3. Medical unit
   4. Pediatric/children’s unit
   5. Surgical unit
3. In choosing a hospital, how important was the hospital location? (Circle one)
   Very important / Somewhat important / Not very important / Not at all important
4. Rate the skill of the doctor:
   Excellent / Very good / Good / Fair / Poor
5. On the following scale from one to seven, rate the nursing care:
   Poor 1 2 3 4 5 6 7 Excellent
These questions will result in what level of data measurement?
Let us jump to SPSS
Probability
• Empirical classical probability
  - Based on historical data
  - Computed after performing the experiment
  - Number of times an event occurred divided by the number of trials
  - Objective: everyone correctly using the method assigns an identical probability
• Subjective probability
  - Different individuals may (correctly) assign different numeric probabilities to the same event
• Mutually exclusive events
• Collectively exhaustive events
• Equally likely events
Random Variable
• A random variable x takes on a defined set of values with different probabilities.
  - For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.
  - For example, if you poll people about their voting preferences, the percentage of the sample that responds “Yes on Proposition 100” is also a random variable (the percentage will be slightly different every time you poll).
• Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (“frequentist” view).
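
The frequentist idea can be illustrated with a short simulation. This is a sketch only, assuming a fair six-sided die simulated with Python's pseudo-random generator; the seed and the number of rolls are arbitrary choices.

```python
import random

random.seed(42)  # fixed seed so the run is repeatable (arbitrary choice)

n_rolls = 60_000
counts = {face: 0 for face in range(1, 7)}
for _ in range(n_rolls):
    counts[random.randint(1, 6)] += 1  # one roll of a fair die

# Relative frequency of each face: occurrences divided by number of trials.
rel_freq = {face: counts[face] / n_rolls for face in counts}
for face, freq in rel_freq.items():
    print(face, round(freq, 4))
```

With many repetitions each relative frequency settles near one-sixth (about 0.1667), which is the sense in which probability describes long-run frequency.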
Random variables can be discrete or continuous
Discrete random variables have a countable number of outcomes.
Examples: Dead/alive, dice, counts, etc.

Continuous random variables have an infinite continuum of possible values.
Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.
Probability functions
A probability function maps the possible values of
x against their respective probabilities of
occurrence, p(x)
p(x) is a number from 0 to 1.0.
The area under a probability function is always 1.
Discrete example: roll of a die
[Figure: bar chart of p(x) = 1/6 for each x = 1, 2, 3, 4, 5, 6]

$\sum_{\text{all } x} p(x) = 1$
Probability mass function (pmf)

x      p(x)
1      p(x=1) = 1/6
2      p(x=2) = 1/6
3      p(x=3) = 1/6
4      p(x=4) = 1/6
5      p(x=5) = 1/6
6      p(x=6) = 1/6
Total  1.0
Cumulative distribution function (CDF)
[Figure: step plot of cumulative probability P(x) against x = 1, ..., 6, rising in steps of 1/6 from 1/6 to 1.0]
Cumulative distribution function

x      P(x≤A)
1      P(x≤1) = 1/6
2      P(x≤2) = 2/6
3      P(x≤3) = 3/6
4      P(x≤4) = 4/6
5      P(x≤5) = 5/6
6      P(x≤6) = 6/6
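
The pmf and CDF tables for the die can be reproduced in a few lines. This is a sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

faces = [1, 2, 3, 4, 5, 6]

# pmf: each face of a fair die has probability 1/6.
pmf = {x: Fraction(1, 6) for x in faces}

# CDF: running total of the pmf, P(x <= A) for each A.
cdf = dict(zip(faces, accumulate(pmf[x] for x in faces)))

for x in faces:
    print(x, pmf[x], cdf[x])
```

Fractions print in lowest terms, so 2/6 appears as 1/3 and 3/6 as 1/2.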
Practice Problem:
 The number of patients seen in the ER in any given hour is
a random variable represented by x. The probability
distribution for x is:

x      10   11   12   13   14
P(x)   .4   .2   .2   .1   .1

Find the probability that in a given hour:
a. Exactly 14 patients arrive: p(x=14) = .1
b. At least 12 patients arrive: p(x≥12) = (.2 + .1 + .1) = .4
c. At most 11 patients arrive: p(x≤11) = (.4 + .2) = .6
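
The three answers can be checked by summing the relevant entries of the distribution. A small sketch, using exact fractions to avoid rounding noise:

```python
from fractions import Fraction

# Probability distribution of patients per hour, from the table in the problem.
pmf = {
    10: Fraction(4, 10),
    11: Fraction(2, 10),
    12: Fraction(2, 10),
    13: Fraction(1, 10),
    14: Fraction(1, 10),
}

p_exactly_14 = pmf[14]
p_at_least_12 = sum(p for x, p in pmf.items() if x >= 12)
p_at_most_11 = sum(p for x, p in pmf.items() if x <= 11)

print(float(p_exactly_14), float(p_at_least_12), float(p_at_most_11))
```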
Continuous case
• The probability function that accompanies a continuous random variable is a continuous mathematical function that integrates to 1.
• For example, recall the negative exponential function (in probability, this is called an “exponential distribution”):

$f(x) = e^{-x}$

• This function integrates to 1:

$\int_0^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_0^{\infty} = 0 + 1 = 1$
For example, the probability of x falling within 1 to 2:

Clinical example: Survival times after lung transplant may roughly follow an exponential function, $p(x) = e^{-x}$. Then, the probability that a patient will die in the second year after surgery (between years 1 and 2) is 23%.

[Figure: curve of p(x) = e^{-x} with the area between x = 1 and x = 2 shaded]

$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left[-e^{-x}\right]_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23$
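
The 23% figure can be checked two ways: from the antiderivative, and by a crude numerical integration. This is a sketch; the helper names and the number of integration steps are illustrative choices.

```python
import math

def exp_interval_probability(a, b):
    """P(a <= x <= b) for f(x) = e^(-x), using the antiderivative -e^(-x)."""
    return math.exp(-a) - math.exp(-b)

def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

p_exact = exp_interval_probability(1, 2)
p_numeric = midpoint_integral(lambda x: math.exp(-x), 1, 2)

print(round(p_exact, 3), round(p_numeric, 3))
```

Both values round to 0.233, matching the .23 on the slide.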
Expected Value and Variance
All probability distributions are
characterized by an expected value
(mean) and a variance (standard
deviation squared).
Expected value, formally
Discrete case:

$E(X) = \sum_{\text{all } x_i} x_i\, p(x_i)$

Continuous case:

$E(X) = \int_{\text{all } x} x\, p(x)\,dx$
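
As a worked instance of the discrete formula, the sketch below computes the expected value and variance of a single roll of a fair die (the die is chosen here only because it was used earlier as the discrete example).

```python
from fractions import Fraction

# pmf of a fair die: each face has probability 1/6.
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

# E(X) = sum of x * p(x) over all x.
expected_value = sum(x * p for x, p in pmf.items())

# Var(X) = sum of (x - E(X))^2 * p(x) over all x.
variance = sum((x - expected_value) ** 2 * p for x, p in pmf.items())

print(expected_value, variance)  # 7/2 and 35/12
```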
A Situation
Acme Fruit and Vegetable Wholesalers buys tomatoes, then sells them to retailers. Acme currently pays Rs. 2000 per container. Tomatoes sold on the same day bring Rs. 5000 per container. Tomatoes are extremely perishable: any container not sold on the same day is worthless and must be disposed of (assume at no cost). The distribution manager’s problem is to determine the optimum number of containers he should order each day. On days when he stocks more than he sells, his profit is reduced by the cost of the unsold containers. On the other hand, when retailers request more containers than he has in stock, he loses sales and makes a smaller profit than he could have.
Developing Pay-off table
• Acme currently pays Rs. 2000 per container. Tomatoes sold on the same day bring Rs. 5000 per container. Profit = Rs. 3000 per container.

Pay-off table (in Rs. '00)
                    ACTIONS (Quantity ordered, Q)
EVENTS (Demand)   Q1 = 10   Q2 = 11   Q3 = 12   Q4 = 13
D1 = 10             300       280       260       240
D2 = 11             300       330       310       290
D3 = 12             300       330       360       340
D4 = 13             300       330       360       390

When D ≥ Q, P = 30Q, and when D < Q, P = 30D − 20(Q − D)



Probability of Occurrence principle
Let us suppose the manager kept a record of his sales for the past 100 days.

Daily sales   Number of days sold   Probability of each number being sold
D1 = 10              15                      0.15
D2 = 11              20                      0.20
D3 = 12              40                      0.40
D4 = 13              25                      0.25

The expected value (EV) of decision alternative d_i is defined as:

$EV(d_i) = \sum_{j=1}^{N} P(s_j)\, V_{ij}$

where: N = the number of states of nature
P(s_j) = the probability of state of nature s_j
V_ij = the payoff corresponding to decision alternative d_i and state of nature s_j
Expected profit from stocking 10 containers
ACTION (Quantity ordered is 10)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  300                      0.15                        45
D2 = 11                  300                      0.20                        60
D3 = 12                  300                      0.40                       120
D4 = 13                  300                      0.25                        75
Total EV                                                                     300

Expected profit from stocking 11 containers
ACTION (Quantity ordered is 11)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  280                      0.15                        42
D2 = 11                  330                      0.20                        66
D3 = 12                  330                      0.40                       132
D4 = 13                  330                      0.25                        82.50
Total EV                                                                     322.50

Expected profit from stocking 12 containers
ACTION (Quantity ordered is 12): strategy adopted, since it gives the highest expected profit
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  260                      0.15                        39
D2 = 11                  310                      0.20                        62
D3 = 12                  360                      0.40                       144
D4 = 13                  360                      0.25                        90
Total EV                                                                     335

Expected profit from stocking 13 containers
ACTION (Quantity ordered is 13)
EVENTS (Demand)   Conditional profit (1)   Probability of selling (2)   Expected profit = (1) x (2)
D1 = 10                  240                      0.15                        36
D2 = 11                  290                      0.20                        58
D3 = 12                  340                      0.40                       136
D4 = 13                  390                      0.25                        97.50
Total EV                                                                     327.50
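
The whole pay-off and expected-value calculation can be reproduced in a few lines. This is a sketch, with function and variable names chosen for illustration; amounts are in Rs. '00 as in the pay-off table, and the probabilities come from the 100-day sales record.

```python
from fractions import Fraction

PROFIT_PER_SOLD = 30   # Rs. '00 earned per container sold (5000 - 2000)
COST_PER_UNSOLD = 20   # Rs. '00 lost per container left unsold

# Demand level -> number of days it was observed in the past 100 days.
days_sold = {10: 15, 11: 20, 12: 40, 13: 25}
total_days = sum(days_sold.values())
prob = {d: Fraction(n, total_days) for d, n in days_sold.items()}

def payoff(q, d):
    """Conditional profit when q containers are ordered and demand is d."""
    if d >= q:
        return PROFIT_PER_SOLD * q
    return PROFIT_PER_SOLD * d - COST_PER_UNSOLD * (q - d)

# Expected profit of each order quantity: sum of P(demand) * payoff.
expected_profit = {
    q: sum(prob[d] * payoff(q, d) for d in prob) for q in days_sold
}
best_q = max(expected_profit, key=expected_profit.get)

for q, ev in expected_profit.items():
    print(q, float(ev))
print("order quantity with highest expected profit:", best_q)
```

Ordering 12 containers gives the highest expected profit (335, i.e. Rs. 33,500), because the gain from meeting the frequent demand of 12 or 13 outweighs the loss on days when only 10 or 11 are sold.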
Important discrete probability distribution: The binomial
The Binomial Distribution: Properties
• A fixed number of observations, n
  ex. 15 tosses of a coin; ten light bulbs taken from a warehouse
• Two mutually exclusive and collectively exhaustive categories
  ex. head or tail in each toss of a coin; defective or not defective light bulb; having a boy or girl
  Generally called “success” and “failure”
  Probability of success is p, probability of failure is 1 – p
• Constant probability for each observation
• The outcome of one observation does not affect the outcome of the other
• Two sampling methods
  Infinite population without replacement
  Finite population with replacement
Binomial distribution
Take the example of 5 coin tosses. What’s the
probability that you flip exactly 3 heads in 5 coin
tosses?
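
One way to answer this question is to list all possible sequences of 5 tosses and count those with exactly 3 heads. A brute-force sketch, assuming a fair coin so that every sequence is equally likely:

```python
from itertools import product

# All 2^5 = 32 equally likely sequences of heads (H) and tails (T).
outcomes = list(product("HT", repeat=5))

# Sequences containing exactly 3 heads.
favourable = [seq for seq in outcomes if seq.count("H") == 3]

probability = len(favourable) / len(outcomes)
print(len(favourable), len(outcomes), probability)  # 10 32 0.3125
```

There are 10 such sequences, each with probability (.5)^5, so the answer is 10/32 = .3125.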
Binomial distribution, generally
Note the general pattern emerging: if you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly X “successes” =

$\binom{n}{X} p^{X} (1-p)^{n-X}$

where
n = number of trials
X = number of successes out of n trials
p = probability of success
1 − p = probability of failure
Binomial distribution: example
If I toss a coin 20 times, what’s the probability of getting exactly 10 heads?

$\binom{20}{10} (.5)^{10} (.5)^{10} = .176$
Binomial distribution: example
If I toss a coin 20 times, what’s the probability of getting 2 or fewer heads?

$\binom{20}{0}(.5)^{0}(.5)^{20} = \frac{20!}{20!\,0!}(.5)^{20} = 9.5 \times 10^{-7}$
$\binom{20}{1}(.5)^{1}(.5)^{19} = \frac{20!}{19!\,1!}(.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}$
$\binom{20}{2}(.5)^{2}(.5)^{18} = \frac{20!}{18!\,2!}(.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}$

Sum of the three terms: $211 \times 9.5 \times 10^{-7} \approx 2.0 \times 10^{-4}$
**All probability distributions are characterized by an expected value and a variance:
If X follows a binomial distribution with parameters n and p: X ~ Bin (n, p)

Mean: $\mu = E(x) = np$

Variance and Standard Deviation: $\sigma^{2} = np(1-p)$, $\sigma = \sqrt{np(1-p)}$

Where n = sample size
p = probability of success
(1 – p) = probability of failure
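
The binomial formula and its mean and variance can be checked numerically for the case of 20 tosses of a fair coin. A sketch using only the standard library; the function name is illustrative.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.5

p_exactly_10 = binomial_pmf(10, n, p)                     # exactly 10 heads
p_at_most_2 = sum(binomial_pmf(x, n, p) for x in range(3))  # 0, 1 or 2 heads

mean = n * p
variance = n * p * (1 - p)
std_dev = sqrt(variance)

# Cross-check: the mean computed directly from the pmf should equal np.
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))

print(round(p_exactly_10, 3), f"{p_at_most_2:.2e}")
print(mean, variance, round(std_dev, 3), round(mean_from_pmf, 6))
```

For n = 20 and p = .5 the mean is 10 heads and the variance is 5, so the standard deviation is about 2.24 heads.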
Applications
A manufacturing plant labels items as either
defective or acceptable
A firm bidding for contracts will either get a
contract or not
A marketing research firm receives survey responses
of “yes I will buy” or “no I will not”
New job applicants either accept the offer or reject it
Your team either wins or loses the football game at
the company picnic
The Hypergeometric Distribution
The binomial distribution is applicable
when selecting from a finite population with
replacement or from an infinite population
without replacement.

The hypergeometric distribution is applicable when selecting from a finite population without replacement.
The Hypergeometric Distribution

$P(X) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}}$

Where
N = population size
A = number of successes in the population
N – A = number of failures in the population
n = sample size
X = number of successes in the sample
n – X = number of failures in the sample
The Hypergeometric Distribution
Example
3 different computers are checked from 10 in the department. 4 of the 10 computers have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal software loaded?
So, N = 10, n = 3, A = 4, X = 2

$P(X = 2) = \frac{\binom{A}{X}\binom{N-A}{n-X}}{\binom{N}{n}} = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3$

The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
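
The same calculation as a short sketch in Python, using the standard library's binomial coefficient; the function name and keyword arguments are illustrative and mirror the symbols N, A, n and X defined for the formula.

```python
from math import comb

def hypergeom_pmf(x, N, A, n):
    """P(X = x): x successes in a sample of n drawn without replacement
    from a population of N items containing A successes."""
    return comb(A, x) * comb(N - A, n - x) / comb(N, n)

p_two = hypergeom_pmf(2, N=10, A=4, n=3)
print(round(p_two, 2))  # 0.3
```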
The Hypergeometric Distribution
Characteristics

The mean of the hypergeometric distribution is:

$\mu = E(x) = \frac{nA}{N}$

The standard deviation is:

$\sigma = \sqrt{\frac{nA(N-A)}{N^{2}}} \cdot \sqrt{\frac{N-n}{N-1}}$

where $\sqrt{\frac{N-n}{N-1}}$ is called the “Finite Population Correction Factor”, which results from sampling without replacement from a finite population.
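
As a check, the mean and standard deviation formulas can be compared with values computed directly from the distribution. The sketch below reuses the computer-audit numbers (N = 10, A = 4, n = 3); the variable names are illustrative.

```python
from math import comb, sqrt

N, A, n = 10, 4, 3

# Values from the formulas.
mean = n * A / N
std_dev = sqrt(n * A * (N - A) / N**2) * sqrt((N - n) / (N - 1))

# Values computed directly from the hypergeometric pmf.
pmf = {x: comb(A, x) * comb(N - A, n - x) / comb(N, n) for x in range(n + 1)}
mean_check = sum(x * p for x, p in pmf.items())
var_check = sum((x - mean_check) ** 2 * p for x, p in pmf.items())

print(round(mean, 3), round(std_dev, 3))
print(round(mean_check, 3), round(sqrt(var_check), 3))
```

Both routes give a mean of 1.2 computers and a standard deviation of about 0.748.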


The Poisson Distribution: Definitions
An area of opportunity is a continuous unit or interval of time, volume, or any such area in which more than one occurrence of an event can occur.
ex. The number of scratches in a car’s paint
ex. The number of mosquito bites on a person
ex. The number of computer crashes in a day
The Poisson Distribution: Properties
Apply the Poisson Distribution when:
• You wish to count the number of times an event occurs in a given area of opportunity
• The probability that an event occurs in one area of opportunity is the same for all areas of opportunity
• The number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity
• The probability that two or more events occur in an area of opportunity approaches zero as the area of opportunity becomes smaller
• The average number of events per unit is λ (lambda)
The Poisson Distribution: Formula

$P(X) = \frac{e^{-\lambda}\lambda^{X}}{X!}$

where:
P(X) = the probability of X events in an area of opportunity
λ = expected number of events
e = mathematical constant approximated by 2.71828…
An example
Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a given minute, 7 cars will enter?

$P(7) = \frac{e^{-\lambda}\lambda^{X}}{X!} = \frac{e^{-5}\,5^{7}}{7!} = 0.104$

So, there is a 10.4% chance 7 cars will enter the parking lot in a given minute.

Mean = Variance = λ
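
The parking-lot calculation can be reproduced with the standard library. This is a sketch; the function name is illustrative, and the mean is checked by summing over the first 60 values of X, beyond which the probabilities are negligible for λ = 5.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number is lam."""
    return exp(-lam) * lam**x / factorial(x)

lam = 5  # average number of cars entering per minute
p_seven = poisson_pmf(7, lam)

# Mean and variance computed from the pmf (truncated at x = 59).
mean = sum(x * poisson_pmf(x, lam) for x in range(60))
variance = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in range(60))

print(round(p_seven, 3), round(mean, 6), round(variance, 6))
```

The output shows P(7) ≈ 0.104 and both the mean and the variance equal to λ = 5.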
