
INTRODUCTION TO PROBABILITY DISTRIBUTIONS
Learning Objectives

• Have a broad understanding of what probability distributions are and
why they are important.
• Understand the role that probability distributions play in determining
whether an event is a random occurrence or significantly different.
• Understand the common measures used to characterize a population's
central tendency and dispersion.
• Understand the concept of Shift & Drift.
• Understand the concept of significance testing.
How does it help?

An understanding of Probability Distributions is necessary to:
• Understand the concept and use of statistical tools.
• Understand the significance of random variation in everyday measures.
• Understand the impact of significance on the successful resolution of a project.
IMPROVEMENT ROADMAP
Uses of Probability Distributions

Breakthrough Strategy and Project Uses

Characterization
Phase 1: Measurement    • Establish baseline data characteristics.
Phase 2: Analysis       • Identify and isolate sources of variation.

Optimization
Phase 3: Improvement    • Demonstrate before and after results are not random chance.
Phase 4: Control        • Use the concept of shift & drift to establish project expectations.
KEYS TO SUCCESS

Focus on understanding the concepts


Visualize the concept
Don’t get lost in the math….
Measurements are critical...

• If we can’t accurately measure something, we really don’t know much about it.
• If we don’t know much about it, we can’t control it.
• If we can’t control it, we are at the mercy of chance.
Types of Measures

• Measures where the metric is a classification into one of two (or more)
categories are called Attribute data. This data is usually presented as a
“count” or “percent”.
• Good/Bad
• Yes/No
• Hit/Miss etc.
• Measures where the metric is a number that indicates a precise value are
called Variable data.
• Time
• Miles/Hr
COIN TOSS EXAMPLE

• Take a coin from your pocket and toss it 200 times.

• Keep track of the number of times the coin falls as “heads”.

• When complete, the instructor will ask you for your “head” count.
COIN TOSS EXAMPLE
Results from 10,000 people doing a coin toss 200 times.

[Figure: "Count Frequency" histogram with the "Cumulative Count" curve overlaid. Horizontal axis: "Head Count" (70 to 130). Vertical axes: Frequency (0 to 600) and Cumulative Frequency (0 to 10,000).]
Cumulative count is simply the total frequency count accumulated as you move from left to right until we account for the total population of 10,000 people.

Since we know how many people were in this population (i.e. 10,000), we can divide each of the cumulative counts by 10,000 to give us a curve with the cumulative percent of population.

[Figure: "Cumulative Percent" curve, results from 10,000 people doing a coin toss 200 times. Horizontal axis: "Head Count" (70 to 130). Vertical axis: Cumulative Percent (0 to 100).]
COIN TOSS PROBABILITY EXAMPLE

Results from 10,000 people doing a coin toss 200 times.

This means that we can now predict the chance that certain values can occur based on these percentages.

Note here that 50% of the values are less than our expected value of 100.

This means that in a future experiment set up the same way, we would expect 50% of the values to be less than 100.

[Figure: "Cumulative Percent" curve. Horizontal axis: "Head Count" (70 to 130). Vertical axis: Cumulative Percent (0 to 100).]
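As a rough illustration of the experiment above, the following Python sketch simulates 10,000 people each tossing a fair coin 200 times and builds the cumulative percent curve from the head counts. The function names, the fixed seed and the use of a pseudo-random generator are assumptions made for this sketch; they are not part of the original exercise.

```python
import random
from collections import Counter
from itertools import accumulate


def simulate_head_counts(people=10_000, tosses=200, seed=0):
    """Return one "head count" per person, each from `tosses` fair coin flips."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    # getrandbits(tosses) draws `tosses` independent fair bits; the 1s are the heads.
    return [bin(rng.getrandbits(tosses)).count("1") for _ in range(people)]


def cumulative_percent(head_counts):
    """Map each observed head count to the percent of people at or below it."""
    freq = Counter(head_counts)                        # count frequency
    values = sorted(freq)                              # move from left to right
    running = accumulate(freq[v] for v in values)      # cumulative count
    total = len(head_counts)
    return {v: 100.0 * c / total for v, c in zip(values, running)}


counts = simulate_head_counts()
cum_pct = cumulative_percent(counts)
print(f"lowest = {min(counts)}, highest = {max(counts)}")
print(f"percent of people at or below 100 heads: {cum_pct[100]:.1f}")
```

Because a head count of exactly 100 is itself fairly common, the simulated percent at or below 100 tends to land a little above 50%, and the percent strictly below 100 a little under it.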
COIN TOSS EXAMPLE
Results from 10,000 people doing a coin toss 200 times.

We can now equate a probability to the occurrence of specific values or groups of values.

For example, we can see that the occurrence of a “Head count” of less than 74 or greater than 124 out of 200 tosses is so rare that a single occurrence was not registered out of 10,000 tries.

[Figure: "Count Frequency" histogram. Horizontal axis: "Head Count" (70 to 130). Vertical axis: Frequency (0 to 600).]
On the other hand, we can see that the chance of getting a count near (or at) 100 is much higher. With the data that we now have, we can actually predict each of these values.

[Figure: "Cumulative Percent" curve, results from 10,000 people doing a coin toss 200 times. Horizontal axis: "Head Count" (70 to 130). Vertical axis: Cumulative Percent (0 to 100).]
COIN TOSS PROBABILITY DISTRIBUTION
PROCESS CENTERED ON EXPECTED VALUE

% of population = probability of occurrence

If we know where we are in the population we can equate that to a probability value. This is the purpose of the sigma value (normal data).

SIGMA (σ) IS A MEASURE OF “SCATTER” FROM THE EXPECTED VALUE THAT CAN BE USED TO CALCULATE A PROBABILITY OF OCCURRENCE

[Figure: coin toss frequency histogram with one σ marked. Horizontal axis: head count (70 to 130). Vertical axis: Frequency (0 to 600).]

NUMBER OF HEADS       58      65      72      79      86      93      100     107     114     121     128     135     142
SIGMA VALUE (Z)       -6      -5      -4      -3      -2      -1      0       1       2       3       4       5       6
CUM % OF POPULATION                   .003    .135    2.275   15.87   50.0    84.1    97.7    99.86   99.997
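The scale above can be approximated with a short Python sketch. It assumes the head count behaves like normal data centered on 100 with σ = sqrt(200 × 0.5 × 0.5), about 7.07 (the slide's scale appears to round this to 7), and it uses the standard library's `statistics.NormalDist`. The helper names `z_value` and `cumulative_percent_below` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

TOSSES = 200
P_HEADS = 0.5

expected = TOSSES * P_HEADS                      # expected value: 100 heads
sigma = sqrt(TOSSES * P_HEADS * (1 - P_HEADS))   # about 7.07 heads


def z_value(heads, center=expected, spread=sigma):
    """Sigma value (Z): distance from the expected value in units of sigma."""
    return (heads - center) / spread


def cumulative_percent_below(z):
    """Percent of a normal population that falls below the given Z."""
    return 100.0 * NormalDist().cdf(z)


# Reproduce the CUM % OF POPULATION row for Z = -4 .. 4.
for z in range(-4, 5):
    print(f"Z = {z:+d}   cum % = {cumulative_percent_below(z):.3f}")

print(f"Z for 114 heads = {z_value(114):.2f}")
```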
WHAT DOES IT MEAN?

• Common Occurrence
• Rare Event

What are the chances that this “just happened”? If they are small, chances are that an external influence is at work that can be used to our benefit….
Probability and Statistics

• “The odds of Colorado University winning the national title are 3 to 1”
• “Drew Bledsoe’s pass completion percentage for the last 6 games is 58% versus 78% for the first 5 games”
• “The Senator will win the election with 54% of the popular vote with a margin of +/- 3%”

• Probability and Statistics influence our lives daily


• Statistics is the universal language for science
• Statistics is the art of collecting, classifying,
presenting, interpreting and analyzing numerical
data, as well as making conclusions about the
system from which the data was obtained.
Population Vs. Sample (Certainty Vs. Uncertainty)

• A sample is just a subset of all possible values.

[Figure: a sample shown as a subset of the population.]

• Since the sample does not contain all the possible values, there
is some uncertainty about the population. Hence any statistics,
such as mean and standard deviation, are just estimates of
the true population parameters.
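A small Python sketch can make the distinction concrete. The population here is hypothetical (10,000 values drawn from a normal curve with mean 100 and standard deviation 7, chosen only to resemble the coin toss data), and the sample size of 30 and the fixed seed are likewise assumptions.

```python
import random
from statistics import mean, pstdev, stdev

rng = random.Random(42)  # fixed seed (an assumption) so the sketch is repeatable

# Hypothetical population: every possible value is known, so there is no uncertainty.
population = [rng.gauss(100, 7) for _ in range(10_000)]

# A sample is just a subset of the population.
sample = rng.sample(population, 30)

pop_mean, pop_sd = mean(population), pstdev(population)   # true parameters
est_mean, est_sd = mean(sample), stdev(sample)            # estimates from the sample

print(f"population: mean = {pop_mean:.2f}, std dev = {pop_sd:.2f}")
print(f"sample:     mean = {est_mean:.2f}, std dev = {est_sd:.2f}  (estimates only)")
```

Re-running with a different seed gives different sample statistics while the population parameters barely move, which is the uncertainty the slide refers to.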
Descriptive Statistics

Descriptive Statistics is the branch of statistics with which
most people are familiar. It characterizes and summarizes
the most prominent features of a given set of data (means,
medians, standard deviations, percentiles, graphs, tables
and charts).

Descriptive Statistics describe the elements of
a population as a whole, or describe data that represent
just a sample of elements from the entire population.

Inferential Statistics

Inferential Statistics is the branch of statistics that deals with
drawing conclusions about a population based on information
obtained from a sample drawn from that population.

While descriptive statistics has been taught for centuries,
inferential statistics is a relatively new phenomenon having
its roots in the 20th century.

We “infer” something about a population when only information
from a sample is known.

Probability is the link between Descriptive and Inferential Statistics
WHAT DOES IT MEAN?
WHAT IF WE MADE A CHANGE TO THE PROCESS?

And the first 50 trials showed “Head Counts” greater than 130?

Chances are very good that the process distribution has changed. In fact, there is a probability greater than 99.999% that it has changed.

[Figure: coin toss frequency histogram with one σ marked. Horizontal axis: head count (70 to 130). Vertical axis: Frequency (0 to 600).]

NUMBER OF HEADS       58      65      72      79      86      93      100     107     114     121     128     135     142
SIGMA VALUE (Z)       -6      -5      -4      -3      -2      -1      0       1       2       3       4       5       6
CUM % OF POPULATION                   .003    .135    2.275   15.87   50.0    84.1    97.7    99.86   99.997
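To see why such a result would be so unlikely for an unchanged process, the following Python sketch estimates the chance of a head count above 130 from a fair coin. It assumes a normal approximation with mean 100 and σ = sqrt(200 × 0.5 × 0.5), and it treats the 50 trials as independent; neither assumption is spelled out on the slide.

```python
from math import sqrt
from statistics import NormalDist

sigma = sqrt(200 * 0.5 * 0.5)            # about 7.07 heads
z = (130 - 100) / sigma                  # sigma value of a head count of 130

# Chance that one fair-coin trial of 200 tosses lands above 130 heads.
p_single = 1.0 - NormalDist().cdf(z)

# Chance that 50 independent trials in a row all land above 130 heads.
p_fifty = p_single ** 50

print(f"Z = {z:.2f}")
print(f"P(one trial > 130 heads)  = {p_single:.2e}")
print(f"P(all 50 trials > 130)    = {p_fifty:.2e}")
```

Under these assumptions a single head count above 130 already has a chance of roughly one in 100,000 of occurring naturally, so 50 in a row points very strongly to a changed process.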
USES OF PROBABILITY DISTRIBUTIONS

Primarily these distributions are used to test for significant differences in data sets.

To be classified as significant, the actual measured value must exceed a critical value. The critical value is a tabular value determined by the probability distribution and the risk of error. This risk of error is called α risk and indicates the probability of this value occurring naturally. So, an α risk of .05 (5%) means that this critical value will be exceeded by a random occurrence less than 5% of the time.

[Figure: distribution curve with a Critical Value marked on each tail. The center region is labeled Common Occurrence and each tail beyond a critical value is labeled Rare Occurrence.]
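For normal data, the tabular critical value can also be computed directly. The sketch below is one way to do it in Python with `statistics.NormalDist`; the function names and the choice to split α across both tails (matching the two critical values in the figure) are assumptions of this sketch, and other distributions would need their own tables.

```python
from statistics import NormalDist


def critical_z(alpha, two_sided=True):
    """Critical sigma value (Z) for a given alpha risk on a standard normal curve."""
    tail = alpha / 2 if two_sided else alpha   # split the risk across both tails
    return NormalDist().inv_cdf(1.0 - tail)


def is_significant(z_observed, alpha=0.05):
    """True when the observed Z falls beyond the two-sided critical value."""
    return abs(z_observed) > critical_z(alpha)


print(f"two-sided critical Z at alpha = .05: {critical_z(0.05):.3f}")
print(f"one-sided critical Z at alpha = .05: {critical_z(0.05, two_sided=False):.3f}")
print("Z = 3.0 significant?", is_significant(3.0))
print("Z = 1.0 significant?", is_significant(1.0))
```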
SO WHAT MAKES A DISTRIBUTION UNIQUE?
CENTRAL TENDENCY
Where a population is located.

DISPERSION
How wide a population is spread.

DISTRIBUTION FUNCTION
The mathematical formula that best describes the data (we will cover this in detail in the next module).
COIN TOSS CENTRAL TENDENCY
[Figure: coin toss histogram. Horizontal axis: head count (70 to 130). Vertical axis: Number of occurrences (0 to 600).]

What are some of the ways that we can easily indicate
the centering characteristic of the population?

Three measures have historically been used: the mean,
the median and the mode.
WHAT IS THE MEAN?
The mean has already been used in several earlier modules and is the most common measure of central tendency for a population. The mean is simply the average value of the data.

ORDERED DATA SET (n=12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$

[Figure: the data set plotted on a number line from -6 to 6, with the Mean marked.]
WHAT IS THE MEDIAN?
If we rank order (descending or ascending) the data set for this distribution we could represent central tendency by the order of the data points.

If we find the value half way (50%) through the data points, we have another way of representing central tendency. This is called the median value.

ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

[Figure: the data set plotted on a number line from -6 to 6, with 50% of the data points on each side of the Median Value.]
WHAT IS THE MODE?
If we rank order (descending or ascending) the data set for this distribution we find several ways we can represent central tendency.

We find that a single value occurs more often than any other. Since we know that there is a higher chance of this occurrence in the middle of the distribution, we can use this feature as an indicator of central tendency. This is called the mode.

ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

[Figure: the data set plotted on a number line from -6 to 6, with the Mode marked.]
MEASURES OF CENTRAL TENDENCY, SUMMARY
ORDERED DATA SET (n=12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

MEAN ($\bar{X}$)
(Otherwise known as the average)

$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$

MEDIAN
(50 percentile data point)
n=12, n/2=6
Here the median value falls between two zero values and therefore is zero. If the values were say 2 and 3 instead, the median would be 2.5.

MODE
(Most common value in the data set)
Mode = 0
The mode in this case is 0 with 5 occurrences within this data.

[Figure: the data set plotted on a number line from -6 to 6, once for each measure.]
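The three measures summarized above can be checked with Python's standard `statistics` module. This is only a sketch against the ordered data set from these slides; the variable names are illustrative.

```python
from statistics import mean, median, mode

# Ordered data set from the slides (n = 12).
data = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]

center_mean = mean(data)      # sum of the values (-2) divided by n (12)
center_median = median(data)  # midpoint of the 6th and 7th ordered values
center_mode = mode(data)      # most common value in the data set

print(f"mean   = {center_mean:.2f}")
print(f"median = {center_median}")
print(f"mode   = {center_mode} ({data.count(center_mode)} occurrences)")
```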
SO WHAT’S THE REAL DIFFERENCE?

MEAN
The mean is the most consistently accurate measure of central tendency, but is more difficult to calculate than the other measures.

MEDIAN AND MODE
The median and mode are both very easy to determine. That’s the good news…. The bad news is that both are more susceptible to bias than the mean.
SO WHAT’S THE BOTTOM LINE?

MEAN
Use on all occasions unless a circumstance prohibits its use.

MEDIAN AND MODE
Only use if you cannot use mean.
COIN TOSS POPULATION DISPERSION
[Figure: coin toss histogram. Horizontal axis: head count (70 to 130). Vertical axis: Number of occurrences (0 to 600).]

What are some of the ways that we can easily indicate the dispersion
(spread) characteristic of the population?

Three measures have historically been used: the range, the standard
deviation and the variance.
WHAT IS THE RANGE?
The range is a very common metric which is easily determined from any ordered sample. To calculate the range simply subtract the minimum value in the sample from the maximum value.

ORDERED DATA SET: -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

$$\text{Range} = X_{MAX} - X_{MIN} = 4 - (-5) = 9$$

[Figure: the data set plotted on a number line from -6 to 6, with the Range spanning Min to Max.]
WHAT IS THE VARIANCE/STANDARD DEVIATION?
The variance ($s^2$) is a very robust metric which requires a fair amount of work to determine. The standard deviation ($s$) is the square root of the variance and is the most commonly used measure of dispersion for larger sample sizes.

$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{61.67}{12 - 1} = 5.6$$

DATA SET
Xi     Xi − X̄                 (Xi − X̄)²
-5     -5-(-.17) = -4.83      (-4.83)² = 23.32
-3     -3-(-.17) = -2.83      (-2.83)² = 8.01
-1     -1-(-.17) = -.83       (-.83)² = .69
-1     -1-(-.17) = -.83       (-.83)² = .69
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
1      1-(-.17) = 1.17        (1.17)² = 1.37
3      3-(-.17) = 3.17        (3.17)² = 10.05
4      4-(-.17) = 4.17        (4.17)² = 17.39
Σ                             61.67
MEASURES OF DISPERSION
ORDERED DATA SET (n=12): -5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4

RANGE (R)
(The maximum data value minus the minimum)
Min=-5, Max=4

$$R = X_{max} - X_{min} = 4 - (-5) = 9$$

VARIANCE ($s^2$)
(Squared deviations around the center point)

$$\bar{X} = \frac{\sum X_i}{n} = \frac{-2}{12} = -0.17$$

$$s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1} = \frac{61.67}{12 - 1} = 5.6$$

DATA SET
Xi     Xi − X̄                 (Xi − X̄)²
-5     -5-(-.17) = -4.83      (-4.83)² = 23.32
-3     -3-(-.17) = -2.83      (-2.83)² = 8.01
-1     -1-(-.17) = -.83       (-.83)² = .69
-1     -1-(-.17) = -.83       (-.83)² = .69
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
0      0-(-.17) = .17         (.17)² = .03
1      1-(-.17) = 1.17        (1.17)² = 1.37
3      3-(-.17) = 3.17        (3.17)² = 10.05
4      4-(-.17) = 4.17        (4.17)² = 17.39
Σ                             61.67

STANDARD DEVIATION ($s$)
(Absolute deviation around the center point)

$$s = \sqrt{s^2} = \sqrt{5.6} = 2.37$$
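The dispersion measures above can be reproduced with a few lines of Python. This sketch uses the standard `statistics` module, whose `variance` and `stdev` functions divide by n − 1 as the slide formula does; the variable names are illustrative.

```python
from statistics import stdev, variance

# Ordered data set from the slides (n = 12).
data = [-5, -3, -1, -1, 0, 0, 0, 0, 0, 1, 3, 4]

data_range = max(data) - min(data)   # maximum minus minimum: 4 - (-5)
s2 = variance(data)                  # sample variance, divides by n - 1
s = stdev(data)                      # standard deviation, square root of s2

print(f"range    = {data_range}")
print(f"variance = {s2:.2f}")
print(f"std dev  = {s:.2f}")
```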
SAMPLE MEAN AND VARIANCE EXAMPLE
$$\hat{\mu} = \bar{X} = \frac{\sum X_i}{n}$$

$$\hat{\sigma}^2 = s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$$

i      Xi     Xi − X̄     (Xi − X̄)²
1      10
2      15
3      12
4      14
5      10
6      9
7      11
8      12
9      10
10     12

Σ Xi =
X̄ =
s² =
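One way to check a completed worksheet is the Python sketch below, which follows the same columns step by step. The ten values are assumed to be the Xi column as read from the worksheet above; the variable names are illustrative.

```python
from statistics import mean, variance

# Xi column, assumed to be the ten worksheet values in order.
x = [10, 15, 12, 14, 10, 9, 11, 12, 10, 12]

n = len(x)
x_bar = mean(x)                            # estimate of the population mean
deviations = [xi - x_bar for xi in x]      # Xi - X-bar column
squared = [d ** 2 for d in deviations]     # (Xi - X-bar)^2 column
s2 = sum(squared) / (n - 1)                # estimate of the population variance

for i, (xi, d, sq) in enumerate(zip(x, deviations, squared), start=1):
    print(f"{i:>2}  {xi:>3}  {d:>6.2f}  {sq:>6.2f}")
print(f"sum Xi = {sum(x)}, X-bar = {x_bar}, s^2 = {s2:.2f}")
```

The hand-built `s2` should agree with `statistics.variance(x)`, which is a convenient cross-check on the arithmetic.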
SO WHAT’S THE REAL DIFFERENCE?
VARIANCE/STANDARD DEVIATION
The standard deviation is the most consistently accurate measure of dispersion for a single population. The variance has the added benefit of being additive over multiple populations. Both are difficult and time consuming to calculate.

RANGE
The range is very easy to determine. That’s the good news…. The bad news is that it is very susceptible to bias.
SO WHAT’S THE BOTTOM LINE?

VARIANCE/STANDARD DEVIATION
Best used when you have enough samples (>10).

RANGE
Good for small samples (10 or less).
SO WHAT IS THIS SHIFT & DRIFT STUFF...

[Figure: population distribution between the LSL and USL, on an axis from -12 to 12.]

The project is progressing well and you wrap it up. 6 months
later you are surprised to find that the population has taken a
shift.
SO WHAT HAPPENED?
All of our work was focused in a narrow time frame. Over time, other long term influences come and go which move the population and change some of its characteristics. This is called shift and drift.

[Figure: the Original Study distribution moving over Time.]

Historically, this shift and drift primarily impacts the position of the mean and shifts it 1.5σ from its original position.
VARIATION FAMILIES

Sources of Variation

• Within Individual Sample: Variation is present upon repeat measurements within the same sample.
• Piece to Piece: Variation is present upon measurements of different samples collected within a short time frame.
• Time to Time: Variation is present upon measurements collected with a significant amount of time between samples.
SO WHAT DOES IT MEAN?

To compensate for these long term variations, we must consider two sets of metrics. Short term metrics are those which typically are associated with our work. Long term metrics take the short term metric data and degrade it by an average of 1.5σ.
IMPACT OF 1.5σ SHIFT AND DRIFT
Z      PPM ST     Cpk    PPM LT (+1.5σ)
0.0    500,000    0.0    933,193
0.1    460,172    0.0    919,243
0.2    420,740    0.1    903,199
0.3    382,089    0.1    884,930
0.4    344,578    0.1    864,334
0.5    308,538    0.2    841,345
0.6    274,253    0.2    815,940
0.7    241,964    0.2    788,145
0.8    211,855    0.3    758,036
0.9    184,060    0.3    725,747
1.0    158,655    0.3    691,462
1.1    135,666    0.4    655,422
1.2    115,070    0.4    617,911
1.3    96,801     0.4    579,260
1.4    80,757     0.5    539,828
1.5    66,807     0.5    500,000
1.6    54,799     0.5    460,172
1.7    44,565     0.6    420,740

Here, you can see that the impact of this concept is potentially very significant. In the short term, we have driven the defect rate down to 54,800 ppm and can expect to see occasional long term ppm to be as bad as 460,000 ppm.
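The table rows appear to follow three simple relationships, which the Python sketch below uses: short term PPM is the upper tail area of a standard normal curve beyond Z, long term PPM is the same tail after reducing Z by 1.5, and Cpk is Z divided by 3. These relationships are inferred from the table values rather than stated on the slide, and the function names are hypothetical.

```python
from statistics import NormalDist

SHIFT = 1.5  # long term shift and drift, in sigma units


def ppm_short_term(z):
    """Defects per million beyond Z on a standard normal curve (one tail)."""
    return 1_000_000 * (1.0 - NormalDist().cdf(z))


def ppm_long_term(z_st, shift=SHIFT):
    """Defects per million after degrading the short term Z by the shift."""
    return ppm_short_term(z_st - shift)


def cpk_from_z(z):
    """Cpk implied by a sigma value, assuming Cpk = Z / 3."""
    return z / 3.0


# Rebuild a few table rows: Z, PPM ST, Cpk, PPM LT.
for z in (0.0, 1.0, 1.6):
    print(f"{z:.1f}  {ppm_short_term(z):>9,.0f}  {cpk_from_z(z):.1f}  {ppm_long_term(z):>9,.0f}")
```

The same three functions can be applied to any short term Z, so they are also a convenient way to sanity-check hand calculations of long term metrics.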
SHIFT AND DRIFT EXERCISE

We have just completed a project and have presented the following
short term metrics:
• Zst=3.5
• PPMst=233
• Cpkst=1.2

Calculate the long term values for each of these metrics.
Learning Objectives

• Have a broad understanding of what probability distributions are and
why they are important.
• Understand the role that probability distributions play in determining
whether an event is a random occurrence or significantly different.
• Understand the common measures used to characterize a population's
central tendency and dispersion.
• Understand the concept of Shift & Drift.
• Understand the concept of significance testing.
