
Pan-African e-Network Project

DBM
Quantitative Techniques in Management

Semester - 1
Session - 4

Dr. Sarika Jain

Examples of continuous probability distributions: the normal and the standard normal

The Normal Distribution as a mathematical function (pdf):

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$

Note the constants: π = 3.14159 and e = 2.71828.

This is a bell-shaped curve with different centers and spreads depending on μ and σ.
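The pdf can be evaluated directly. The sketch below is a minimal Python translation of the formula; the name `normal_pdf` is our own choice, and the result is cross-checked against `statistics.NormalDist.pdf` from the Python standard library (3.8+).

```python
import math
from statistics import NormalDist

def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """f(x) = 1/(sigma*sqrt(2*pi)) * exp(-0.5*((x - mu)/sigma)**2)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The peak sits at x = mu; a larger sigma gives a lower, wider bell.
for mu, sigma in [(0, 1), (0, 2), (3, 1)]:
    print(mu, sigma, round(normal_pdf(mu, mu, sigma), 4))

# Cross-check against the standard-library implementation.
assert math.isclose(normal_pdf(1.3, 0, 1), NormalDist(0, 1).pdf(1.3))
```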

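The Normal PDF

It's a probability function, so no matter what the values of μ and σ, it must integrate to 1:

$$\int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx = 1$$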
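The "must integrate to 1" claim can be checked numerically. This is a sketch using the trapezoid rule (our choice of method), with the range μ ± 10σ standing in for (−∞, +∞) because the tails beyond it are negligible; `total_area` is a hypothetical helper name.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def total_area(mu, sigma, steps=20_000):
    """Trapezoid-rule integral of the pdf over mu +/- 10 sigma."""
    a, b = mu - 10 * sigma, mu + 10 * sigma
    h = (b - a) / steps
    s = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    s += sum(normal_pdf(a + i * h, mu, sigma) for i in range(1, steps))
    return s * h

# Very different centers and spreads, same total area.
for mu, sigma in [(0, 1), (500, 50), (-3, 0.2)]:
    print(mu, sigma, round(total_area(mu, sigma), 6))
```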

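Normal distribution is defined by its mean and standard deviation:

$$E(X) = \mu = \int_{-\infty}^{\infty} x\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx$$

$$\operatorname{Var}(X) = \sigma^{2} = \int_{-\infty}^{\infty} x^{2}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx - \mu^{2}$$

$$\text{Standard Deviation}(X) = \sigma = \sqrt{\int_{-\infty}^{\infty} x^{2}\,\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx - \mu^{2}}$$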
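The three integrals above can be approximated numerically to confirm that they return μ, σ² and σ. This sketch uses the trapezoid rule over μ ± 10σ (an assumption standing in for the infinite range); `integrate` and `moments` are hypothetical helper names.

```python
import math

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def integrate(g, a, b, steps=20_000):
    """Trapezoid rule for the integral of g over [a, b]."""
    h = (b - a) / steps
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, steps))
    return s * h

def moments(mu, sigma):
    a, b = mu - 10 * sigma, mu + 10 * sigma   # tails beyond this are negligible
    mean = integrate(lambda x: x * normal_pdf(x, mu, sigma), a, b)
    second = integrate(lambda x: x * x * normal_pdf(x, mu, sigma), a, b)
    var = second - mean ** 2                  # Var(X) = E(X^2) - mu^2
    return mean, var, math.sqrt(var)

print(moments(500, 50))   # approximately (500.0, 2500.0, 50.0)
```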

The beauty of the normal curve:

No matter what μ and σ are, the area between μ − σ and μ + σ is about 68%; the area between μ − 2σ and μ + 2σ is about 95%; and the area between μ − 3σ and μ + 3σ is about 99.7%. Almost all values fall within 3 standard deviations.

68-95-99.7 Rule

[Figure: normal curve showing 68% of the data within μ ± 1σ, 95% of the data within μ ± 2σ, and 99.7% of the data within μ ± 3σ]

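68-95-99.7 Rule in math terms:

$$\int_{\mu-\sigma}^{\mu+\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx .68$$

$$\int_{\mu-2\sigma}^{\mu+2\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx .95$$

$$\int_{\mu-3\sigma}^{\mu+3\sigma} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx \approx .997$$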
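The three areas can be computed from the normal CDF instead of by integrating. A short sketch with Python's `statistics.NormalDist`; `area_within` is a name chosen here. It also shows that the answer does not depend on μ and σ.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    d = NormalDist(mu, sigma)
    return d.cdf(mu + k * sigma) - d.cdf(mu - k * sigma)

for k in (1, 2, 3):
    # Standard normal vs. an arbitrary normal (mu = 500, sigma = 50): same areas.
    print(k, round(area_within(k), 4), round(area_within(k, 500, 50), 4))
```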

What is the Normal (Gaussian) Distribution?

The normal distribution is a descriptive model that describes real-world situations.

It is defined as a continuous frequency distribution of infinite range: it can take any real value, not just integers as in the case of the binomial and Poisson distributions.
This is the most important probability distribution in statistics and an important tool in the analysis of epidemiological data and in management science.

Characteristics of Normal Distribution

It links frequency distribution to probability distribution.
It has a bell-shaped curve and is symmetric.
It is symmetric around the mean: the two halves of the curve are the same (mirror images).

Characteristics of Normal Distribution (contd.)

Hence Mean = Median = Mode.
The total area under the curve is 1 (or 100%).
Every normal distribution has the same shape as the standard normal distribution; it is only shifted and rescaled.

Characteristics of Normal Distribution (contd.)

In a Standard Normal Distribution, the mean (μ) = 0 and the standard deviation (σ) = 1.

Skewness
Positive Skewness: Mean > Median
Negative Skewness: Mean < Median

Pearson's Coefficient of Skewness:

$$\text{Skewness} = \frac{3\,(\text{Mean} - \text{Median})}{\text{Standard deviation}}$$

[Figure: positive skewness (tail to right) and negative skewness (tail to left)]
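Pearson's coefficient is simple to compute from data. A sketch, assuming the sample standard deviation (`statistics.stdev`) is meant, since the slide does not say which; the data sets are hypothetical.

```python
from statistics import mean, median, stdev

def pearson_skewness(data):
    """Pearson's coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)

symmetric = [1, 2, 3, 4, 5]           # hypothetical data
right_tailed = [1, 2, 2, 3, 10]       # long tail to the right: mean > median
left_tailed = [-10, -3, -2, -2, -1]   # long tail to the left: mean < median

for data in (symmetric, right_tailed, left_tailed):
    print(data, round(pearson_skewness(data), 3))
```

A normal distribution, being symmetric, has a coefficient of zero.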

Application/Uses of Normal Distribution

Its applications go beyond describing distributions.

It is used by researchers and modelers.

The major use of the normal distribution is the role it plays in statistical inference.

The z score, along with the t score, chi-square and F statistics, is important in hypothesis testing.

It helps managers make decisions.

How good is the rule for real data?

Check some example data (no. of runners = 120):
The mean weight of the runners = 127.8 lbs
The standard deviation (SD) = 15.5 lbs

68% of 120 = .68 × 120 ≈ 82 runners
In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean.

[Figure: histogram of runners' weights (x-axis: pounds, 80 to 160; y-axis: percent, 0 to 25) with the 1-SD interval 112.3 to 143.3 lbs around the mean of 127.8 marked]

95% of 120 = .95 × 120 = 114 runners
In fact, 115 runners fall within 2 SDs of the mean.

[Figure: histogram of runners' weights (x-axis: pounds, 80 to 160; y-axis: percent, 0 to 25) with the 2-SD interval 96.8 to 158.8 lbs around the mean of 127.8 marked]

99.7% of 120 = .997 × 120 = 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.

[Figure: histogram of runners' weights (x-axis: pounds, 80 to 160; y-axis: percent, 0 to 25) with the 3-SD interval 81.3 to 174.3 lbs around the mean of 127.8 marked]
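The arithmetic behind the three runner comparisons can be reproduced in a few lines. This sketch uses only the summary figures reported above (the raw weights are not given); the names `MEAN`, `SD`, `RULE` and `interval` are ours.

```python
MEAN, SD, N = 127.8, 15.5, 120              # runners example, weights in lbs
RULE = {1: 0.68, 2: 0.95, 3: 0.997}         # 68-95-99.7 shares
OBSERVED = {1: 79, 2: 115, 3: 120}          # counts reported in the slides

def interval(k):
    """Endpoints of mean +/- k SD, rounded to 0.1 lb."""
    return round(MEAN - k * SD, 1), round(MEAN + k * SD, 1)

for k, share in RULE.items():
    lo, hi = interval(k)
    print(f"{k} SD: {lo} to {hi} lbs; expected about {share * N:.1f}, observed {OBSERVED[k]}")
```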

Example
Suppose SAT scores roughly follow a normal distribution in the U.S. population of college-bound students (with range restricted to 200-800), and the average math SAT is 500 with a standard deviation of 50. Then:
68% of students will have scores between 450 and 550
95% will be between 400 and 600
99.7% will be between 350 and 650

Example
BUT what if you wanted to know the math SAT score corresponding to the 90th percentile (90% of students are lower)?

$$P(X \le Q) = .90$$

$$\int_{200}^{Q} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\,dx = .90$$

Solve for Q?
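Solving for Q is an inverse-CDF problem. A sketch using `statistics.NormalDist.inv_cdf`; note it assumes an untruncated normal, whereas the integral above starts at 200, but the area below 200 (six standard deviations under the mean) is negligible.

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=50)
q90 = sat.inv_cdf(0.90)          # Q such that P(X <= Q) = 0.90
print(round(q90, 1))             # 564.1

# Same answer through the standard normal: Q = mu + z * sigma
z90 = NormalDist().inv_cdf(0.90)
print(round(z90, 4), round(500 + z90 * 50, 1))
```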

The Standard Normal (Z): Universal Currency

The formula for the standardized normal probability density function is

$$p(Z) = \frac{1}{(1)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{Z-0}{1}\right)^{2}} = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}$$

The Standard Normal Distribution (Z)

All normal distributions can be converted into the standard normal curve by subtracting the mean and dividing by the standard deviation:

$$Z = \frac{X - \mu}{\sigma}$$

Somebody calculated all the integrals for the standard normal and put them in a table, so we never have to integrate! Even better, computers now do all the integration.
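How a computer replaces the table: the standard normal CDF can be written with the error function as Φ(z) = ½[1 + erf(z/√2)], and `math.erf` is in Python's standard library. A small sketch; `phi` and `z_score` are names chosen here.

```python
import math

def phi(z: float) -> float:
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_score(x: float, mu: float, sigma: float) -> float:
    """Convert an X value into Z units: Z = (X - mu) / sigma."""
    return (x - mu) / sigma

print(round(phi(1.51), 4))        # area left of Z = 1.51, about 0.9345
print(z_score(200, 100, 50))      # X = 200 with mu = 100, sigma = 50 gives Z = 2.0
```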

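Comparing X and Z units

[Figure: the same curve on two scales: X = 100 corresponds to Z = 0 and X = 200 corresponds to Z = 2.0, where X has μ = 100, σ = 50 and Z has μ = 0, σ = 1]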

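Example
What's the probability of getting a math SAT score of 575 or less, with μ = 500 and σ = 50?

$$Z = \frac{575 - 500}{50} = 1.5$$

i.e., a score of 575 is 1.5 standard deviations above the mean.

$$P(X \le 575) = \int_{200}^{575} \frac{1}{(50)\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-500}{50}\right)^{2}}\,dx \approx \int_{-\infty}^{1.5} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}Z^{2}}\,dz$$

We could integrate, but it is easier to look up Z = 1.5 in the standard normal chart: P(Z ≤ 1.5) = .9332.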
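The same probability can be obtained on the original scale or on the Z scale, which is exactly what standardizing promises. A sketch with `statistics.NormalDist`; the X-scale call uses an untruncated normal.

```python
from statistics import NormalDist

mu, sigma, x = 500, 50, 575
z = (x - mu) / sigma                      # 1.5 standard deviations above the mean
p_direct = NormalDist(mu, sigma).cdf(x)   # P(X <= 575) on the SAT scale
p_via_z = NormalDist().cdf(z)             # P(Z <= 1.5) on the standard scale
print(z, round(p_direct, 4), round(p_via_z, 4))
```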

Practice problem
If birth weights in a population are normally
distributed with a mean of 109 oz and a
standard deviation of 13 oz,
a. What is the chance of obtaining a birth
weight of 141 oz or heavier when sampling
birth records at random?
b. What is the chance of obtaining a birth weight of 120 oz or lighter?

Answer
a. What is the chance of obtaining a birth
weight of 141 oz or heavier when sampling
birth records at random?
$$Z = \frac{141 - 109}{13} = 2.46$$

From the chart, Z = 2.46 corresponds to a right-tail (greater than) area of: P(Z ≥ 2.46) = 1 − .9931 = .0069, or .69%.

Answer
b. What is the chance of obtaining a birth weight of 120 oz or lighter?

$$Z = \frac{120 - 109}{13} = .85$$

From the chart, Z = .85 corresponds to a left-tail area of: P(Z ≤ .85) = .8023 = 80.23%.
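Both answers can be checked with `statistics.NormalDist`. The sketch computes each probability twice: once with Z rounded to two decimals, as a printed table requires, and once exactly; the small gap in part (b), .8023 vs. roughly .801, comes only from rounding Z = 0.846 to .85.

```python
from statistics import NormalDist

births = NormalDist(mu=109, sigma=13)   # birth weight in oz
std = NormalDist()

# (a) P(X >= 141): right tail
z_a = (141 - 109) / 13
p_a_table = 1 - std.cdf(round(z_a, 2))  # table-style, Z = 2.46
p_a_exact = 1 - births.cdf(141)

# (b) P(X <= 120): left tail
z_b = (120 - 109) / 13
p_b_table = std.cdf(round(z_b, 2))      # table-style, Z = .85
p_b_exact = births.cdf(120)

print(round(z_a, 2), round(p_a_table, 4), round(p_a_exact, 4))
print(round(z_b, 2), round(p_b_table, 4), round(p_b_exact, 4))
```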

Looking up probabilities in the standard normal table
What is the area to the left of Z = 1.51 in a standard normal curve?

[Figure: standard normal curve shaded to the left of Z = 1.51]

Area is 93.45%

Exercises
Assume that the heart rate (H.R.) in normal healthy individuals is normally distributed with Mean = 70 and Standard Deviation = 10 beats/min.

Exercise # 1

Then:
1) What area under the curve is
above 80 beats/min?

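Diagram of Exercise # 1
[Figure: normal curve divided into standard-deviation bands of about 34.1%, 13.6%, 2.1% and 0.1% on each side of the mean; the shaded area above +1 SD is 0.159]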

Since μ = 70 and σ = 10, the area under the curve above 80 beats per minute corresponds to the area above +1 standard deviation. The total shaded area above +1 standard deviation is 15.9%, i.e. 15.9/100 = 0.159. Alternatively, we can find z by substituting the values in the formula Z = (X − μ)/σ: Z = (80 − 70)/10 = 10/10 = 1.00. From the table, the area above z = 1.00 is 1 − 0.841 = 0.159.

Exercise # 2
Then:
2) What area of the curve is above 90
beats/min?

Diagram of Exercise # 2
[Figure: normal curve shaded above +2 SD; since Z = (90 − 70)/10 = 2.00, the shaded area is 0.023]

Exercise # 3
Then:
3) What area of the curve is between 50 and 90 beats/min?

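Diagram of Exercise # 3
[Figure: normal curve shaded between −2 SD and +2 SD; since Z = (50 − 70)/10 = −2.00 and Z = (90 − 70)/10 = 2.00, the shaded area is 0.954]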

Exercise # 4
Then:
4) What area of the curve is above 100
beats/min?

Diagram of Exercise # 4
[Figure: normal curve shaded above +3 SD; since Z = (100 − 70)/10 = 3.00, the shaded area is only about 0.0013 (0.13%)]
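All four heart-rate exercises reduce to CDF differences for N(70, 10²). A sketch with `statistics.NormalDist`; the variable names are ours.

```python
from statistics import NormalDist

hr = NormalDist(mu=70, sigma=10)            # heart rate, beats/min

above_80 = 1 - hr.cdf(80)                   # Exercise 1: Z > 1
above_90 = 1 - hr.cdf(90)                   # Exercise 2: Z > 2
between_50_90 = hr.cdf(90) - hr.cdf(50)     # Exercise 3: -2 < Z < 2
above_100 = 1 - hr.cdf(100)                 # Exercise 4: Z > 3

for label, p in [("above 80", above_80), ("above 90", above_90),
                 ("50 to 90", between_50_90), ("above 100", above_100)]:
    print(f"{label}: {p:.4f}")
```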

Are my data normal?

Not all continuous random variables are normally distributed!
It is important to evaluate how well the data are approximated by a normal distribution.

Are my data normally distributed?
1. Look at the histogram! Does it appear bell-shaped?
2. Compute descriptive summary measures: are the mean, median, and mode similar?
3. Do 2/3 of the observations lie within 1 std dev of the mean? Do 95% of the observations lie within 2 std dev of the mean?
4. Look at a normal probability plot: is it approximately linear?
5. Run tests of normality (such as Kolmogorov-Smirnov). But be cautious: these tests are highly influenced by sample size!
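Checks 2 and 3 can be automated with the standard library. This sketch covers only those numeric checks (no plots, no formal test); `quick_normality_checks` is a hypothetical name, the sample standard deviation is assumed, and the data are made up.

```python
from statistics import mean, median, mode, stdev

def quick_normality_checks(data):
    """Numeric versions of checks 2 and 3 above."""
    m, s = mean(data), stdev(data)

    def share_within(k):
        # Fraction of observations within k sample standard deviations of the mean
        return sum(abs(x - m) <= k * s for x in data) / len(data)

    return {
        "mean": m,
        "median": median(data),
        "mode": mode(data),
        "within_1sd": share_within(1),   # compare with about 2/3
        "within_2sd": share_within(2),   # compare with about 95%
    }

# Hypothetical data, for illustration only
print(quick_normality_checks([2, 4, 4, 4, 5, 5, 7, 9]))
```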

Normal approximation to the binomial
When you have a binomial distribution where n is large and p is middle-of-the-road (not too small, not too big, closer to .5), the binomial starts to look like a normal distribution; in fact, this doesn't even take a particularly large n.
Recall: if the probability of being a smoker among a group of lung cancer cases is .6, what's the probability that in a group of 8 cases you have fewer than 2 smokers?

Normal approximation to the binomial
When you have a binomial distribution where n is large and p isn't too small (rule of thumb: mean > 5), the binomial starts to look like a normal distribution.
Recall: smoking example

[Figure: bar chart of the smoking example's binomial distribution (n = 8, p = .6), peaking at about .27]

Starting to have a normal shape even with fairly small n. You can imagine that if n got larger, the bars would get thinner and thinner and this would look more and more like a continuous function, with a bell curve shape. Here np = 4.8.

Normal approximation to binomial

[Figure: bar chart of the smoking example, peaking at about .27]

What is the probability of fewer than 2 smokers?

Exact binomial probability (from before) = .00066 + .00786 = .00852

Normal approximation probability:
μ = 4.8
σ = 1.39

$$Z = \frac{2 - 4.8}{1.39} = \frac{-2.8}{1.39} = -2.01$$

P(Z < −2.01) = .022

A little off, but in the right ballpark. We could also use the value to the left of 1.5, since we really wanted to know "less than but not including 2" (this is called the continuity correction):

$$Z = \frac{1.5 - 4.8}{1.39} = \frac{-3.3}{1.39} = -2.37$$

P(Z < −2.37) = .0089

A fairly good approximation of the exact probability, .00852.
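The exact and approximate answers for the smoking example can be compared directly. A sketch using `math.comb` for the binomial and `statistics.NormalDist` for the approximation; it uses the unrounded σ = √1.92 ≈ 1.386, so the approximations differ from the hand calculation (which rounds σ to 1.39) in the third significant digit.

```python
import math
from statistics import NormalDist

n, p = 8, 0.6                                # smoking example
# Exact P(X < 2) = P(X = 0) + P(X = 1)
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(2))

mu = n * p                                   # 4.8
sigma = math.sqrt(n * p * (1 - p))           # about 1.39
approx_plain = NormalDist(mu, sigma).cdf(2)    # no continuity correction
approx_cc = NormalDist(mu, sigma).cdf(1.5)     # with continuity correction

print(round(exact, 5), round(approx_plain, 4), round(approx_cc, 4))
```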

SAMPLING and
SAMPLING DISTRIBUTIONS

Sampling Distributions
A sampling distribution is created by, as the name suggests, sampling.
The method we will employ relies on the rules of probability and the laws of expected value and variance to derive the sampling distribution.
For example, consider the roll of one and two dice.

Sampling Distribution of the Mean

A fair die is thrown infinitely many times, with the random variable X = # of spots on any throw. The probability distribution of X is:

x       1     2     3     4     5     6
P(x)    1/6   1/6   1/6   1/6   1/6   1/6

and the mean and variance are calculated as well:

$$\mu = \sum x\,P(x) = 3.5, \qquad \sigma^{2} = \sum (x - \mu)^{2}\,P(x) = 2.92$$
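The die's mean and variance can be computed exactly with fractions. A minimal sketch using Python's `fractions.Fraction`; it shows that the 2.92 above is 35/12 rounded.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)                               # fair die: each face has P(x) = 1/6
mu = sum(x * p for x in faces)                   # E(X) = sum of x P(x)
var = sum((x - mu) ** 2 * p for x in faces)      # V(X) = sum of (x - mu)^2 P(x)
print(mu, var, round(float(var), 2))             # 7/2 35/12 2.92
```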

Sampling Distribution of Two Dice

A sampling distribution is created by looking at all samples of size n = 2 (i.e. two dice) and their means.

While there are 36 possible samples of size 2, there are only 11 values for x̄, and some (e.g. x̄ = 3.5) occur more frequently than others (e.g. x̄ = 1).

Sampling Distribution of Two Dice

The sampling distribution of x̄ is shown below:

x̄       1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
P(x̄)    1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

[Figure: bar chart of P(x̄) against x̄, rising from 1/36 at 1.0 to 6/36 at 3.5 and falling back to 1/36 at 6.0]

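Compare

Compare the distribution of X (flat: 1/6 for each value from 1 to 6) with the sampling distribution of x̄ (peaked at 3.5).

[Figure: bar charts of the distribution of X and the sampling distribution of x̄ side by side]

As well, note that:

$$\mu_{\bar{x}} = 3.5 = \mu_x, \qquad \sigma^{2}_{\bar{x}} = 1.46 = \frac{\sigma^{2}_x}{2}$$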
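The two-dice table can be rebuilt by brute force: enumerate all 36 equally likely samples, average each, and count. A sketch with `itertools.product` and exact fractions; it also confirms that the mean is unchanged and the variance is halved.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely samples of size n = 2, and the mean of each
means = [Fraction(a + b, 2) for a, b in product(range(1, 7), repeat=2)]
counts = Counter(means)
dist = {m: Fraction(c, 36) for m, c in counts.items()}

for m in sorted(counts):
    print(f"{float(m):.1f}  {counts[m]}/36")

mu_bar = sum(m * pr for m, pr in dist.items())
var_bar = sum((m - mu_bar) ** 2 * pr for m, pr in dist.items())
print(mu_bar, var_bar)          # 7/2 and 35/24, i.e. (35/12) / 2
```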

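Generalize
We can generalize the mean and variance of the sampling of two dice:

$$\mu_{\bar{x}} = \mu_x, \qquad \sigma^{2}_{\bar{x}} = \frac{\sigma^{2}_x}{2}$$

to n dice:

$$\mu_{\bar{x}} = \mu_x, \qquad \sigma^{2}_{\bar{x}} = \frac{\sigma^{2}_x}{n}$$

The standard deviation of the sampling distribution is called the standard error:

$$\sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}}$$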
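The generalization σ²_x̄ = σ²/n can be verified exactly for small n by enumerating every possible sample of n dice. A sketch; `var_of_mean` is a hypothetical helper, and it enumerates 6ⁿ samples, so it is only practical for small n.

```python
import math
from fractions import Fraction
from itertools import product

SIGMA2 = Fraction(35, 12)            # variance of one fair die

def var_of_mean(n):
    """Exact variance of the mean of n fair dice, over all 6**n equally likely samples."""
    means = [Fraction(sum(s), n) for s in product(range(1, 7), repeat=n)]
    mu = sum(means) / len(means)
    return sum((m - mu) ** 2 for m in means) / len(means)

for n in range(1, 5):
    v = var_of_mean(n)
    # Standard error = sqrt(variance of the mean) = sigma / sqrt(n)
    print(n, v, v == SIGMA2 / n, round(math.sqrt(v), 4))
```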

Central Limit Theorem

The sampling distribution of the mean of a random sample drawn from any population (with finite variance) is approximately normal for a sufficiently large sample size.
The larger the sample size, the more closely the sampling distribution of X̄ will resemble a normal distribution.

Central Limit Theorem [Most important theorem in statistics]

The sampling distribution of the sample mean will be approximately normal as the sample size increases.

In many practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of X̄.
Note: If X is normal, X̄ is normal for any sample size, so we don't need the Central Limit Theorem in this case.
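A quick simulation makes the theorem concrete. This sketch (seed, sample size and number of repetitions are our choices) draws many samples of size 30 from a flat die-roll population and checks that the sample means center on 3.5, spread like σ/√n, and put roughly 68% of their mass within one standard error, as a normal curve would.

```python
import random
from statistics import mean, pstdev

rng = random.Random(42)          # fixed seed so the run is repeatable
n, reps = 30, 10_000             # sample size 30, repeated 10,000 times

# Population: one roll of a fair die, flat rather than bell-shaped (mu = 3.5, sigma^2 = 35/12)
sample_means = [sum(rng.randint(1, 6) for _ in range(n)) / n for _ in range(reps)]

center = mean(sample_means)              # should be close to mu = 3.5
spread = pstdev(sample_means)            # should be close to sigma / sqrt(n)
theory_se = (35 / 12 / n) ** 0.5         # about 0.312
within_1se = sum(abs(m - 3.5) <= theory_se for m in sample_means) / reps   # near 68% if roughly normal
print(round(center, 3), round(spread, 3), round(theory_se, 3), round(within_1se, 3))
```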

Example 1
The foreman of a bottling plant has observed that
the amount of soda in each 32-ounce bottle is
actually a normally distributed random variable,
with a mean of 32.2 ounces and a standard
deviation of .3 ounce.
If a customer buys one bottle, what is the
probability that the bottle will contain more than
32 ounces?
This is just an ordinary normal probability lookup.

Example 1

We want to find P(X > 32), where X is normally distributed with μ = 32.2 and σ = .3:

$$P(X > 32) = P\left(Z > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = 1 - .2514 = .7486$$

There is about a 75% chance that a single bottle of soda contains more than 32 oz.

Example 1
The foreman of a bottling plant has observed
that the amount of soda in each 32-ounce
bottle is actually a normally distributed random
variable, with a mean of 32.2 ounces and a
standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what
is the probability that the mean amount of the
four bottles will be greater than 32 ounces?

Example 1

We want to find P(X̄ > 32), where X is normally distributed with μ = 32.2 and σ = .3.

Things we know:
1) X is normally distributed, therefore so is X̄.
2) μ_x̄ = μ = 32.2 oz.
3) σ_x̄ = σ/√n = .3/√4 = .15 oz.

Example 1
If a customer buys a carton of four bottles, what is the
probability that the mean amount of the four bottles
will be greater than 32 ounces?

$$P(\bar{X} > 32) = P\left(Z > \frac{32 - 32.2}{.15}\right) = P(Z > -1.33) = 1 - .0918 = .9082$$

There is about a 91% chance the mean of the four bottles will exceed 32 oz.
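The two bottle questions differ only in the standard deviation used: σ for one bottle, σ/√4 for the mean of four. A sketch with `statistics.NormalDist`; exact Z values are used, so the results differ slightly from the table values for Z rounded to −.67 and −1.33.

```python
import math
from statistics import NormalDist

mu, sigma = 32.2, 0.3                                   # ounces per bottle

one_bottle = 1 - NormalDist(mu, sigma).cdf(32)          # P(X > 32)
se_four = sigma / math.sqrt(4)                          # standard error, 0.15
four_bottles = 1 - NormalDist(mu, se_four).cdf(32)      # P(X-bar > 32)
print(round(one_bottle, 4), round(four_bottles, 4))
```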

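Graphically Speaking

[Figure: the distribution of X for one bottle and the narrower distribution of X̄ for four bottles, both centered at mean = 32.2, with shaded areas answering "what is the probability that one bottle will contain more than 32 ounces?" and "what is the probability that the mean of four bottles will exceed 32 oz?"]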

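Example
The dean of the School of Business claims that the average salary of the school's graduates one year after graduation is $800 per week (μ_x) with a standard deviation of $100 (σ_x). Note: this is the population. A second-year student would like to check whether the claim about the mean is correct. He does a survey of 25 people who graduated one year ago and determines their weekly salary. He discovers the sample mean to be $750. Is this consistent with the dean's claim?

$$\mu_{\bar{x}} = \mu_x = 800, \qquad \sigma_{\bar{x}} = \frac{\sigma_x}{\sqrt{n}} = \frac{100}{\sqrt{25}} = 20$$

$$P(\bar{X} \le 750) = P\left(Z \le \frac{750 - 800}{20}\right) = P(Z \le -2.5) = .0062$$

A sample mean this low would be very unlikely if the dean's claim were true.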
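The dean example is a standard-error calculation followed by a left-tail lookup. A short sketch with `statistics.NormalDist`; the variable names are ours.

```python
import math
from statistics import NormalDist

mu, sigma, n, xbar = 800, 100, 25, 750      # claimed population values and survey result
se = sigma / math.sqrt(n)                   # standard error = 20
z = (xbar - mu) / se                        # -2.5
p_low = NormalDist().cdf(z)                 # P(X-bar <= 750) if the claim is true
print(se, z, round(p_low, 4))
```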

Sampling Distribution of a Proportion

The estimator of a population proportion of successes is the sample proportion. That is, we count the number of successes in a sample and compute:

$$\hat{p} = \frac{X}{n}$$

(read this as p-hat), where X is the number of successes and n is the sample size.

Normal Approximation to Binomial

[Figure: binomial distribution with n = 20 and p = .5, with a normal approximation superimposed (μ = 10 and σ = 2.24)]


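Where did these values come from?

From 7.6 we saw that:

$$\mu = np, \qquad \sigma^{2} = np(1-p)$$

Hence:

$$\mu = 20(.5) = 10$$

and

$$\sigma = \sqrt{20(.5)(1 - .5)} = 2.24$$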
Normal Approximation to Binomial

The normal approximation to the binomial works best when the number of experiments, n (sample size), is large and the probability of success, p, is close to 0.5.
For the approximation to provide good results, two conditions should be met:
1) np ≥ 5
2) n(1 − p) ≥ 5
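The n = 20, p = .5 example and the two conditions can be checked together. A sketch; `normal_approx_ok` is a hypothetical helper, and the comparison event P(X ≤ 8) is our own illustrative choice, not from the slides. It also shows that the smoking example (n = 8, p = .6) narrowly fails np ≥ 5.

```python
import math
from statistics import NormalDist

def normal_approx_ok(n, p):
    """Rule of thumb from the text: np >= 5 and n(1 - p) >= 5."""
    return n * p >= 5 and n * (1 - p) >= 5

n, p = 20, 0.5
mu, sigma = n * p, math.sqrt(n * p * (1 - p))      # 10 and about 2.24

# Illustrative comparison: P(X <= 8), exact vs. normal with continuity correction
exact = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(9))
approx = NormalDist(mu, sigma).cdf(8.5)

print(normal_approx_ok(20, 0.5), normal_approx_ok(8, 0.6))
print(mu, round(sigma, 2), round(exact, 4), round(approx, 4))
```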

Sampling Distribution of a Sample Proportion
Using the laws of expected value and variance, we can determine the mean, variance, and standard deviation of p̂:

$$E(\hat{p}) = p, \qquad V(\hat{p}) = \frac{p(1-p)}{n}, \qquad \sigma_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}$$

(The standard deviation of p̂ is called the standard error of the proportion.)

Sample proportions can be standardized to a standard normal distribution using this formulation:

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
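The standardization formula turns a question about p̂ into a Z lookup. A sketch with a hypothetical example (true p = 0.5, n = 100, chance that p̂ exceeds 0.6); `prob_phat_above` is our name, and no continuity correction is applied.

```python
import math
from statistics import NormalDist

def prob_phat_above(c, p, n):
    """P(p-hat > c) using Z = (p-hat - p) / sqrt(p(1 - p)/n)."""
    se = math.sqrt(p * (1 - p) / n)          # standard error of the proportion
    return 1 - NormalDist().cdf((c - p) / se)

# Hypothetical example: p = 0.5, n = 100, so se = 0.05 and Z = 2
print(round(prob_phat_above(0.6, 0.5, 100), 4))
```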

Sampling Distribution: Difference of two means
The final sampling distribution introduced is that of the difference between two sample means. This requires that independent random samples be drawn from each of two normal populations.
If this condition is met, then the sampling distribution of the difference between the two sample means, X̄₁ − X̄₂, will be normally distributed.
(Note: if the two populations are not both normally distributed, but the sample sizes are large (> 30), the distribution of X̄₁ − X̄₂ is approximately normal, by the Central Limit Theorem.)

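Sampling Distribution: Difference of two means

X̄₁ − X̄₂ is normally distributed with mean:

$$\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$$

and standard deviation:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^{2}}{n_1} + \frac{\sigma_2^{2}}{n_2}}$$

(also called the standard error of the difference between two means)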

Example 2

Starting salaries for MBA grads at two universities are normally distributed with the following means and standard deviations. Samples from each school are taken:

                   University 1    University 2
Mean               62,000 $/yr     60,000 $/yr
Std. Dev.          14,500 $/yr     18,300 $/yr
Sample size n      50              60

What is the sampling distribution of X̄₁ − X̄₂?

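Sampling Distribution

X̄₁ − X̄₂ is normally distributed with mean:

$$\mu_{\bar{x}_1 - \bar{x}_2} = 62{,}000 - 60{,}000 = 2{,}000$$

and standard deviation:

$$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{14{,}500^{2}}{50} + \frac{18{,}300^{2}}{60}} = 3{,}128.3$$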
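The MBA salary numbers can be reproduced directly. A sketch; the last line, the probability that University 1's sample mean exceeds University 2's, is an illustrative follow-up not asked in the original.

```python
import math
from statistics import NormalDist

mu1, sd1, n1 = 62_000, 14_500, 50     # University 1
mu2, sd2, n2 = 60_000, 18_300, 60     # University 2

mean_diff = mu1 - mu2                               # 2,000
se_diff = math.sqrt(sd1**2 / n1 + sd2**2 / n2)      # about 3,128.3
print(mean_diff, round(se_diff, 1))

# Illustrative follow-up: P(X1-bar - X2-bar > 0)
p_first_higher = 1 - NormalDist(mean_diff, se_diff).cdf(0)
print(round(p_first_higher, 4))
```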

Sampling
Population: A group that includes all the cases (individuals, objects, or groups) in which the researcher is interested.
Sample: A relatively small subset from a population.

Random Sampling
Simple Random Sample: A sample designed in such a way as to ensure that (1) every member of the population has an equal chance of being chosen and (2) every combination of N members has an equal chance of being chosen.
This can be done using a computer, a calculator, or a table of random numbers.
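"Using a computer" can be as simple as Python's `random.sample`, which draws without replacement so that every subset of the requested size is equally likely. A sketch with a hypothetical population of 500 numbered members; the seed is fixed only to make the example repeatable.

```python
import random

population = list(range(1, 501))     # hypothetical population of 500 numbered members
rng = random.Random(7)               # seeded only so the example is repeatable

# Draw a simple random sample of 25 members without replacement
srs = rng.sample(population, k=25)
print(sorted(srs))
```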

Random Sampling
Systematic random sampling: A method of sampling in which every Kth member (K is a ratio obtained by dividing the population size by the desired sample size) in the total population is chosen for inclusion in the sample after the first member of the sample is selected at random from among the first K members of the population.
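The systematic procedure translates almost line for line into code. A sketch; `systematic_sample` is a hypothetical name, and when N/n is not a whole number it assumes K is rounded down, which the definition does not specify.

```python
import random

def systematic_sample(population, n, rng=random):
    """Every K-th member after a random start among the first K (K = N // n, rounded down)."""
    k = len(population) // n           # sampling interval K
    start = rng.randrange(k)           # random start among positions 0 .. K-1
    return population[start::k][:n]

members = list(range(1, 101))          # hypothetical population, N = 100, so K = 10
picked = systematic_sample(members, 10, random.Random(3))
print(picked)
```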

[Figure: Systematic Random Sampling]

Stratified Random Sampling

Proportionate stratified sample: The size of the sample selected from each subgroup is proportional to the size of that subgroup in the entire population (self-weighting).
Disproportionate stratified sample: The size of the sample selected from each subgroup is disproportionate to the size of that subgroup in the population (needs weights).

[Figure: Disproportionate Stratified Sample]

Stratified Random Sampling

Stratified random sample: A method of sampling obtained by (1) dividing the population into subgroups based on one or more variables central to our analysis and (2) then drawing a simple random sample from each of the subgroups.
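A proportionate stratified sample combines the two steps: allocate the sample in proportion to stratum size, then draw a simple random sample within each stratum. A sketch with hypothetical strata; `proportionate_stratified_sample` is our name, and it assumes simple rounding for the allocation, so totals can be off by one in awkward cases.

```python
import random

def proportionate_stratified_sample(strata, n, rng=random):
    """Draw an SRS from each stratum, sized in proportion to the stratum.

    `strata` maps a stratum name to its list of members.
    """
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)       # proportional allocation
        sample[name] = rng.sample(members, size)     # SRS within the stratum
    return sample

# Hypothetical population: 600 undergraduates, 300 graduates, 100 staff
strata = {
    "undergrad": [f"u{i}" for i in range(600)],
    "grad": [f"g{i}" for i in range(300)],
    "staff": [f"s{i}" for i in range(100)],
}
result = proportionate_stratified_sample(strata, 50, random.Random(1))
print({name: len(chosen) for name, chosen in result.items()})   # {'undergrad': 30, 'grad': 15, 'staff': 5}
```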

Please forward your queries to: sjain@amity.edu