
Data, Models and Decisions

PGP 13-15

Dr. Rohit Joshi, IIM Shillong

Why Study Statistics?


Decision Makers Use Statistics To:
Present and describe business data and information properly
Draw conclusions about large populations, using information collected from samples
Make reliable forecasts about a business activity
Improve business processes

Why Collect Data?


A marketing research analyst needs to assess the effectiveness of a new television advertisement.

A pharmaceutical manufacturer needs to determine whether a new drug is more effective than those currently in use.

An operations manager wants to monitor a manufacturing process to find out whether the quality of product being manufactured is conforming to company standards.

An auditor wants to review the financial transactions of a company in order to determine whether the company is in compliance with generally accepted accounting principles.

Types of Statistics
Statistics
The branch of mathematics that transforms data into useful information for decision makers.

Descriptive Statistics
Collecting, summarizing, and describing data

Inferential Statistics
Drawing conclusions and/or making decisions concerning a population based only on sample data

Descriptive Statistics
Collect data
ex. Survey

Present data
ex. Tables and graphs

Characterize data

ex. Sample mean: $\bar{X} = \frac{\sum X_i}{n}$

Inferential Statistics
Estimation
ex. Estimate the population mean weight using the sample average weight
Hypothesis testing
ex. Test the claim that the population average weight is 65 kg
Drawing conclusions and/or making decisions
concerning a population based on sample results.

Basic Vocabulary of Statistics


VARIABLE
A variable is a characteristic of an item or individual.
DATA
Data are the different values associated with a variable.
POPULATION
A population consists of all the items or individuals about which you want to draw a
conclusion.
SAMPLE
A sample is the portion of a population selected for analysis.
PARAMETER
A parameter is a numerical measure that describes a characteristic of a population.
STATISTIC
A statistic is a numerical measure that describes a characteristic of a sample

Population vs. Sample


Population
Measures used to describe the population are called parameters

Sample
Measures computed from sample data are called statistics

Sources of Data
Primary Sources: The data collector is the one using the data for analysis
Data from a political survey
Data collected from an experiment
Observed data

Secondary Sources: The person performing data analysis is not the data collector
Analyzing census data
Examining data from print journals or data published on the internet.

Types of Variables
Categorical (qualitative) variables have values that can only be placed into categories, such as "yes" and "no".
Numerical (quantitative) variables have values that represent quantities.

Types of Data
Data
  Categorical
    Examples: Marital Status, Political Party, Eye Color (Defined categories)
  Numerical
    Discrete
      Examples: Number of Children, Defects per hour (Counted items)
    Continuous
      Examples: Weight, Voltage (Measured characteristics)

Probability
Empirical classic probability

Based on historical data
Computed after performing the experiment
Number of times an event occurred divided by the number of trials
Objective -- everyone correctly using the method assigns an identical probability

Subjective probability
Different individuals may (correctly) assign different numeric probabilities to the same event

Mutually Exclusive event


Collectively Exhaustive event
Equally Likely event

Random Variable
A random variable x takes on a defined set of values with different probabilities.

For example, if you roll a die, the outcome is random (not fixed) and there are 6 possible outcomes, each of which occurs with probability one-sixth.
For example, if you poll people about their voting preferences, the percentage of the sample that responds "Yes on Proposition 100" is also a random variable (the percentage will be slightly different every time you poll).

Roughly, probability is how frequently we expect different outcomes to occur if we repeat the experiment over and over (frequentist view)

Random variables can be discrete or continuous
Discrete random variables have a countable number of outcomes
Examples: Dead/alive, dice, counts, etc.
Continuous random variables have an infinite continuum of possible values.
Examples: blood pressure, weight, the speed of a car, the real numbers from 1 to 6.

Probability functions
A probability function maps the possible values of x against their respective probabilities of occurrence, p(x)
p(x) is a number from 0 to 1.0.
The area under a probability function is always 1.

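Discrete example: roll of a die

[Figure: bar chart of p(x) against x, with p(x) = 1/6 at each of x = 1, 2, ..., 6]

$\sum_{\text{all } x} p(x) = 1$

Probability mass function (pmf)

x        p(x)
1        p(x=1) = 1/6
2        p(x=2) = 1/6
3        p(x=3) = 1/6
4        p(x=4) = 1/6
5        p(x=5) = 1/6
6        p(x=6) = 1/6
Total    1.0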

Cumulative distribution function (CDF)

[Figure: step plot of P(x) against x, with levels 1/6, 1/3, 1/2, 2/3, 5/6 and 1.0]

x        $P(x \le A)$
1        $P(x \le 1) = 1/6$
2        $P(x \le 2) = 2/6$
3        $P(x \le 3) = 3/6$
4        $P(x \le 4) = 4/6$
5        $P(x \le 5) = 5/6$
6        $P(x \le 6) = 6/6$

Practice Problem:
The number of patients seen in the ER in any given hour is a random variable represented by x. The probability distribution for x is:

x        10    11    12    13    14
P(x)     .4    .2    .2    .1    .1

Find the probability that in a given hour:

a. Exactly 14 patients arrive
$p(x = 14) = .1$

b. At least 12 patients arrive
$p(x \ge 12) = (.2 + .1 + .1) = .4$

c. At most 11 patients arrive
$p(x \le 11) = (.4 + .2) = .6$
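The three answers can be checked with a short Python sketch. The dictionary `pmf` and the helper `prob` are illustrative names, not part of the original slides; the probabilities are the ones given in the practice problem.

```python
# Probability distribution from the practice problem: patients per hour -> probability
pmf = {10: 0.4, 11: 0.2, 12: 0.2, 13: 0.1, 14: 0.1}

def prob(event):
    """Sum p(x) over every outcome x for which event(x) is true."""
    return sum(p for x, p in pmf.items() if event(x))

p_exactly_14 = prob(lambda x: x == 14)   # part a
p_at_least_12 = prob(lambda x: x >= 12)  # part b
p_at_most_11 = prob(lambda x: x <= 11)   # part c

print(round(p_exactly_14, 2), round(p_at_least_12, 2), round(p_at_most_11, 2))
```

Parts b and c cover every outcome between them, so their probabilities add to 1.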

Review Question 1
If you toss a die, what's the probability that you roll a 3 or less?
a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0


Review Question 2
Two dice are rolled and the sum of the face values is six. What is the probability that at least one of the dice came up a 3?
a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0


How can you get a 6 on two dice?


1-5, 5-1, 2-4, 4-2, 3-3
One of these five has a 3, so the probability is 1/5.
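The counting argument can be reproduced by listing all 36 rolls of two dice. This is a minimal sketch that assumes two fair six-sided dice; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely ordered outcomes of two fair dice
rolls = list(product(range(1, 7), repeat=2))

# Condition on the sum being six, then count the outcomes that contain a 3
sum_is_six = [roll for roll in rolls if sum(roll) == 6]
with_a_three = [roll for roll in sum_is_six if 3 in roll]

p_three_given_six = Fraction(len(with_a_three), len(sum_is_six))
print(sum_is_six)
print(p_three_given_six)
```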

Example: Suppose we flip two identical coins simultaneously. What is the probability of obtaining a head on the first coin (call this event A) and a head on the second coin (call this event B)?
Example: A card is drawn from a well-shuffled pack of playing cards. What is the probability that it will be either a spade or a queen?
Example: In a DMD class there are 123 students, of whom 93 are males and 30 are females. Of these, 36 males and 18 females plan to major in Marketing. A student is selected at random from this class and it is found that this student plans to be a Marketing major. What is the probability that the student is a male?
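One way to work the three examples is sketched below. It assumes fair coins tossed independently and a standard 52-card pack (13 spades, 4 queens, one of which is the queen of spades). These assumptions are not stated on the slide, and the variable names are illustrative.

```python
from fractions import Fraction

# Two coins: A and B are independent, so P(A and B) = P(A) * P(B)
p_head = Fraction(1, 2)  # assumes fair coins
p_two_heads = p_head * p_head

# One card: addition rule, P(spade or queen) = P(spade) + P(queen) - P(queen of spades)
p_spade_or_queen = Fraction(13, 52) + Fraction(4, 52) - Fraction(1, 52)

# DMD class: conditional probability, P(male | Marketing)
marketing_males, marketing_females = 36, 18
p_male_given_marketing = Fraction(marketing_males, marketing_males + marketing_females)

print(p_two_heads, p_spade_or_queen, p_male_given_marketing)
```

Under these assumptions the three answers are 1/4, 4/13 and 2/3.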

Continuous case
The probability function that accompanies a
continuous random variable is a continuous
mathematical function that integrates to 1.

For example, recall the negative exponential function (in probability, this is called an "exponential distribution"):
$f(x) = e^{-x}$

This function integrates to 1:
$\int_0^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_0^{\infty} = 0 + 1 = 1$

For example, the probability of x falling within 1 to 2:


Clinical example: Survival times after lung transplant may roughly follow an exponential function.
Then, the probability that a patient will die in the second year after surgery (between years 1 and 2) is 23%.

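[Figure: plot of $p(x) = e^{-x}$ against x, with x = 1 and x = 2 marked]

$P(1 \le x \le 2) = \int_1^2 e^{-x}\,dx = \left[-e^{-x}\right]_1^2 = -e^{-2} + e^{-1} = -.135 + .368 = .23$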
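The 23% figure can be checked numerically. The sketch below compares the closed form with a simple midpoint-rule approximation; `midpoint_integral` is an illustrative helper written for this example, not a library function.

```python
import math

def exponential_density(x):
    """Density f(x) = e^(-x) from the slide (rate fixed at 1)."""
    return math.exp(-x)

def midpoint_integral(f, a, b, steps=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

# Closed form: -e^(-2) + e^(-1)
closed_form = math.exp(-1) - math.exp(-2)
numeric = midpoint_integral(exponential_density, 1.0, 2.0)

print(round(closed_form, 3), round(numeric, 3))
```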

Expected Value and Variance


All probability distributions are characterized by an expected value (mean) and a variance (standard deviation squared).

Expected value, formally


Discrete case:
$E(X) = \sum_{\text{all } x} x_i\, p(x_i)$

Continuous case:
$E(X) = \int_{\text{all } x} x\, p(x)\,dx$
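As a small illustration of the discrete case, the sketch below computes the expected value and variance of one roll of a fair die. The fair-die pmf is an assumption carried over from the earlier die example, and the names are illustrative.

```python
from fractions import Fraction

# pmf of a fair die: each face 1..6 has probability 1/6
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}

# E(X) = sum over all x of x * p(x)
expected_value = sum(x * p for x, p in die_pmf.items())

# Var(X) = sum over all x of (x - E(X))^2 * p(x)
variance = sum((x - expected_value) ** 2 * p for x, p in die_pmf.items())

print(expected_value, variance)
```

Exact fractions avoid rounding noise here: the mean is 7/2 and the variance is 35/12.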

A Situation
Acme Fruit and Vegetable Wholesalers buys tomatoes, then sells them to retailers. Acme currently pays Rs. 2000 per container. Tomatoes sold on the same day bring Rs. 5000 per container. Tomatoes are extremely perishable: any container not sold on the same day is worthless and has to be disposed of (assume at no cost). The distribution manager's problem is to determine the optimum number of containers he should order each day. On days when he stocks more than he sells, his profit is reduced by the cost of the unsold containers. On the other hand, when retailers request more containers than he has in stock, he loses sales and makes a smaller profit than he could have.

Developing Pay-off table


Acme currently pays Rs. 2000 per container. Tomatoes sold on the same day bring Rs. 5000 per container. Profit = Rs. 3000 per container.

Pay-off table (in Rs. '00)

                     ACTIONS (Quantity ordered, Q)
EVENTS (Demand)      Q1 = 10    Q2 = 11    Q3 = 12    Q4 = 13
D1 = 10              300        280        260        240
D2 = 11              300        330        310        290
D3 = 12              300        330        360        340
D4 = 13              300        330        360        390

When $D \ge Q$, $P = 30Q$; when $D < Q$, $P = 30D - 20(Q - D)$.
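The pay-off rule can be written as one small function and used to regenerate the table. This is a sketch: `payoff`, `margin` and `unit_cost` are illustrative names, and amounts are in hundreds of rupees as in the table.

```python
def payoff(demand, quantity, margin=30, unit_cost=20):
    """Profit in Rs. '00: margin on each container sold, minus cost of each unsold one."""
    sold = min(demand, quantity)
    unsold = quantity - sold
    return margin * sold - unit_cost * unsold

quantities = [10, 11, 12, 13]
demands = [10, 11, 12, 13]

# One row per demand level, one column per order quantity
table = {d: [payoff(d, q) for q in quantities] for d in demands}

for d, row in table.items():
    print(f"D = {d}: {row}")
```

Using `min(demand, quantity)` covers both branches of the rule: when demand meets or exceeds the order nothing is left unsold, so the profit is simply 30Q.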

Probability of Occurrence principle


Let us suppose the manager kept a record of his sales for the past 100 days.

Daily sales    Number of days sold    Probability of each number being sold
D1 = 10        15                     0.15
D2 = 11        20                     0.20
D3 = 12        40                     0.40
D4 = 13        25                     0.25

The expected value (EV) of decision alternative $d_i$ is defined as:
$EV(d_i) = \sum_{j=1}^{N} P(s_j)\, V_{ij}$

where:
N = the number of states of nature
$P(s_j)$ = the probability of state of nature $s_j$
$V_{ij}$ = the payoff corresponding to decision alternative $d_i$ and state of nature $s_j$

Expected profit from stocking 10 containers


ACTION (Quantity ordered is 10)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            300                       0.15                          45
D2 = 11            300                       0.20                          60
D3 = 12            300                       0.40                          120
D4 = 13            300                       0.25                          75
Total EV                                                                   300

Expected profit from stocking 11 containers


ACTION (Quantity ordered is 11)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            280                       0.15                          42
D2 = 11            330                       0.20                          66
D3 = 12            330                       0.40                          132
D4 = 13            330                       0.25                          82.50
Total EV                                                                   322.50

Expected profit from stocking 12 containers


ACTION (Quantity ordered is 12)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            260                       0.15                          39
D2 = 11            310                       0.20                          62
D3 = 12            360                       0.40                          144
D4 = 13            360                       0.25                          90
Total EV                                                                   335

Expected profit from stocking 13 containers


ACTION (Quantity ordered is 13)

EVENTS (Demand)    Conditional profit (1)    Probability of selling (2)    Expected profit = (1) x (2)
D1 = 10            240                       0.15                          36
D2 = 11            290                       0.20                          58
D3 = 12            340                       0.40                          136
D4 = 13            390                       0.25                          97.50
Total EV                                                                   327.50

Strategy adopted: stock 12 containers, the quantity with the highest expected profit (335).
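The four expected-profit totals can be reproduced in a few lines. This is a minimal sketch that reuses the pay-off rule and the 100-day sales record; the function and variable names are illustrative, and amounts are in hundreds of rupees.

```python
def payoff(demand, quantity, margin=30, unit_cost=20):
    """Profit in Rs. '00: margin on each container sold, minus cost of each unsold one."""
    sold = min(demand, quantity)
    return margin * sold - unit_cost * (quantity - sold)

# Demand distribution from the manager's 100-day sales record
demand_probs = {10: 0.15, 11: 0.20, 12: 0.40, 13: 0.25}

# EV(d_i) = sum over states of nature of P(s_j) * V_ij
expected_profit = {
    q: sum(p * payoff(d, q) for d, p in demand_probs.items())
    for q in demand_probs
}

best_quantity = max(expected_profit, key=expected_profit.get)

for q, ev in expected_profit.items():
    print(f"Q = {q}: EV = {ev:.2f}")
print("Best quantity:", best_quantity)
```

The comparison picks the order quantity whose expected profit is largest, which for these probabilities is 12 containers.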

Important discrete probability distribution: The binomial

The Binomial Distribution: Properties


A fixed number of observations, n
ex. 15 tosses of a coin; ten light bulbs taken from a warehouse
Two mutually exclusive and collectively exhaustive categories
ex. head or tail in each toss of a coin; defective or not defective light bulb; having a boy or girl
Generally called "success" and "failure"
Probability of success is p, probability of failure is 1 - p
Constant probability for each observation
The outcome of one observation does not affect the outcome of the other

Two sampling methods


Infinite population without replacement
Finite population with replacement

Binomial distribution
Take the example of 5 coin tosses. What's the probability that you flip exactly 3 heads in 5 coin tosses?

Binomial distribution, generally


Note the general pattern emerging: if you have only two possible outcomes (call them 1/0 or yes/no or success/failure) in n independent trials, then the probability of exactly X successes =
$\binom{n}{X} p^{X} (1 - p)^{n - X}$

where:
n = number of trials
X = number of successes out of n trials
p = probability of success
1 - p = probability of failure

Binomial distribution: example


If I toss a coin 20 times, what's the probability of getting exactly 10 heads?

$\binom{20}{10} (.5)^{10} (.5)^{10} = .176$

Binomial distribution: example


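If I toss a coin 20 times, what's the probability of getting 2 or fewer heads?

$P(X \le 2) = \binom{20}{0}(.5)^{0}(.5)^{20} + \binom{20}{1}(.5)^{1}(.5)^{19} + \binom{20}{2}(.5)^{2}(.5)^{18}$

$\binom{20}{0}(.5)^{0}(.5)^{20} = \frac{20!}{20!\,0!}(.5)^{20} = 9.5 \times 10^{-7}$
$\binom{20}{1}(.5)^{1}(.5)^{19} = \frac{20!}{19!\,1!}(.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}$
$\binom{20}{2}(.5)^{2}(.5)^{18} = \frac{20!}{18!\,2!}(.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}$

$P(X \le 2) = 9.5 \times 10^{-7} + 1.9 \times 10^{-5} + 1.8 \times 10^{-4} = 2.0 \times 10^{-4}$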
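Both coin-toss examples can be checked with a short binomial pmf sketch. `binomial_pmf` is an illustrative helper built on the standard-library `math.comb`, and a fair coin (p = 0.5) is assumed, as in the examples.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Exactly 10 heads in 20 tosses of a fair coin
p_exactly_10 = binomial_pmf(10, 20, 0.5)

# 2 or fewer heads: add the terms for x = 0, 1, 2
p_at_most_2 = sum(binomial_pmf(x, 20, 0.5) for x in range(3))

print(round(p_exactly_10, 3))
print(f"{p_at_most_2:.2e}")
```

The three terms for 0, 1 and 2 heads share the factor (.5)^20, so their sum is (1 + 20 + 190) x (.5)^20, roughly 0.0002.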

All probability distributions are characterized by an expected value and a variance.
If X follows a binomial distribution with parameters n and p: X ~ Bin(n, p)

Mean
$E(X) = np$

Variance and Standard Deviation
$\sigma^2 = np(1 - p)$
$\sigma = \sqrt{np(1 - p)}$

Where n = sample size
p = probability of success
(1 - p) = probability of failure
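The mean and variance formulas can be checked against the definition of expected value by summing over the pmf directly. This sketch uses n = 20 and p = 0.5 from the coin-toss examples; the helper and variable names are illustrative.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 20, 0.5

# Shortcut formulas for X ~ Bin(n, p)
mean_formula = n * p
variance_formula = n * p * (1 - p)
std_dev_formula = sqrt(variance_formula)

# The same quantities computed from the definition, summing over all x
mean_from_pmf = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance_from_pmf = sum(
    (x - mean_from_pmf) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1)
)

print(mean_formula, variance_formula, round(std_dev_formula, 3))
print(round(mean_from_pmf, 6), round(variance_from_pmf, 6))
```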

Applications
A manufacturing plant labels items as either defective or acceptable
A firm bidding for contracts will either get a contract or not
A marketing research firm receives survey responses of "yes I will buy" or "no I will not"
New job applicants either accept the offer or reject it
Your team either wins or loses the football game at the company picnic

The Hypergeometric Distribution


The binomial distribution is applicable when selecting from a finite population with replacement or from an infinite population without replacement.
The hypergeometric distribution is applicable when selecting from a finite population without replacement.

The Hypergeometric Distribution

$P(X) = \frac{\binom{A}{X}\binom{N - A}{n - X}}{\binom{N}{n}}$

Where
N = population size
A = number of successes in the population
N - A = number of failures in the population
n = sample size
X = number of successes in the sample
n - X = number of failures in the sample

The Hypergeometric Distribution


Example
Three different computers are checked from the 10 in the department. 4 of the 10 computers have illegal software loaded. What is the probability that 2 of the 3 selected computers have illegal software loaded?
So, N = 10, n = 3, A = 4, X = 2

$P(X = 2) = \frac{\binom{A}{X}\binom{N - A}{n - X}}{\binom{N}{n}} = \frac{\binom{4}{2}\binom{6}{1}}{\binom{10}{3}} = \frac{(6)(6)}{120} = 0.3$

The probability that 2 of the 3 selected computers have illegal software loaded is .30, or 30%.
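The same calculation can be sketched in Python with `math.comb`. `hypergeometric_pmf` is an illustrative helper whose arguments follow the slide's symbols (N, A, n, X).

```python
from math import comb

def hypergeometric_pmf(X, N, A, n):
    """P(X successes in a sample of n drawn without replacement from N items, A of them successes)."""
    return comb(A, X) * comb(N - A, n - X) / comb(N, n)

# 10 computers, 4 with illegal software, 3 checked, 2 found with illegal software
p_two_of_three = hypergeometric_pmf(X=2, N=10, A=4, n=3)
print(round(p_two_of_three, 2))
```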

The Hypergeometric Distribution


Characteristics
The mean of the hypergeometric distribution is:
$E(X) = \frac{nA}{N}$

The standard deviation is:
$\sigma = \sqrt{\frac{nA(N - A)}{N^2}} \cdot \sqrt{\frac{N - n}{N - 1}}$

Where $\sqrt{\frac{N - n}{N - 1}}$ is called the "Finite Population Correction Factor" from sampling without replacement from a finite population
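For the computer example (N = 10, A = 4, n = 3), the mean and standard deviation formulas can be evaluated and cross-checked against the pmf. This is a sketch with illustrative names.

```python
from math import comb, sqrt

def hypergeometric_pmf(X, N, A, n):
    """P(X successes in a sample of n drawn without replacement from N items, A of them successes)."""
    return comb(A, X) * comb(N - A, n - X) / comb(N, n)

N, A, n = 10, 4, 3

# Formulas from the slide
mean_formula = n * A / N
correction = sqrt((N - n) / (N - 1))  # finite population correction factor
std_dev_formula = sqrt(n * A * (N - A) / N**2) * correction

# Cross-check by summing over the pmf for X = 0..n
mean_from_pmf = sum(x * hypergeometric_pmf(x, N, A, n) for x in range(n + 1))
variance_from_pmf = sum(
    (x - mean_from_pmf) ** 2 * hypergeometric_pmf(x, N, A, n) for x in range(n + 1)
)

print(round(mean_formula, 3), round(std_dev_formula, 3))
```

Without the correction factor the standard deviation would be that of a binomial with p = A/N; the factor shrinks it because sampling without replacement reduces variability.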

The Poisson Distribution Definitions


An area of opportunity is a continuous unit or interval of time, volume, or such area in which more than one occurrence of an event can occur.
ex. The number of scratches in a car's paint
ex. The number of mosquito bites on a person
ex. The number of computer crashes in a day

The Poisson Distribution Properties


Apply the Poisson Distribution when:
You wish to count the number of times an event occurs in a given area of opportunity
The probability that an event occurs in one area of opportunity is the same for all areas of opportunity
The number of events that occur in one area of opportunity is independent of the number of events that occur in the other areas of opportunity
The probability that two or more events occur in an area of opportunity approaches zero as the area of opportunity becomes smaller
The average number of events per unit is $\lambda$ (lambda)

The Poisson Distribution Formula


$P(X) = \frac{e^{-\lambda} \lambda^{X}}{X!}$
where:
P(X) = the probability of X events in an area of opportunity
$\lambda$ = expected number of events
e = mathematical constant approximated by 2.71828

An example
Suppose that, on average, 5 cars enter a parking lot per minute. What is the probability that in a given minute, 7 cars will enter?

$P(7) = \frac{e^{-\lambda} \lambda^{X}}{X!} = \frac{e^{-5}\, 5^{7}}{7!} = 0.104$

So, there is a 10.4% chance 7 cars will enter the parking lot in a given minute.

Mean = Variance = $\lambda$
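The parking-lot calculation can be reproduced with a short Poisson pmf sketch. `poisson_pmf` and the parameter name `lam` are illustrative choices for this example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(exactly x events) when the expected number of events is lam."""
    return exp(-lam) * lam**x / factorial(x)

# On average 5 cars per minute; probability that exactly 7 arrive in a given minute
p_seven_cars = poisson_pmf(7, 5)
print(round(p_seven_cars, 3))
```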