
Statistics for Six Sigma

Why a Six Sigma Practitioner Needs to
Know about Statistics
› To conduct Six Sigma investigations effectively. Without
statistics it would be very difficult to make decisions
based on the data collected.
› To further develop critical and analytical thinking
skills.
› To act as an informed investigator.
› To know how to properly analyze information.
› To know how to draw conclusions about
populations based on sample information.
Key Definitions
› A population (universe) is the collection of
things under consideration
› A sample is a portion of the population
selected for analysis
› A parameter is a summary measure computed
to describe a characteristic of the population
› A statistic is a summary measure computed
to describe a characteristic of the sample
Population and Sample

Population: use parameters to summarize its features.
Sample: use statistics to summarize its features.

Inference on the population is drawn from the sample.


Statistical Methods
› Descriptive statistics
– Collecting and describing data
› Inferential statistics
– Drawing conclusions and/or making decisions
concerning a population based only on sample data
Descriptive Statistics
› Collect data
– e.g. Survey
› Present data
– e.g. Tables and graphs
› Characterize data
– e.g. Sample mean = ΣXᵢ / n
Why We Need Data
› To provide input to survey
› To provide input to study
› To measure performance of service or production
process
› To evaluate conformance to standards
› To assist in formulating alternative courses of action
› To satisfy curiosity
Data Sources
› Primary: data collection
– Observation
– Survey
– Experimentation
– Statistical inquiry
› Secondary: data compilation
– Print or electronic sources
Primary and Secondary Data

The difference between primary and secondary data is
only one of degree of detachment from the original
source: data that is primary in one person's hands may
become secondary in another's. For example, if an
investigation is conducted into the working conditions
of workers in textile mills, the facts collected by the
investigators directly from the workers themselves are
termed primary data. But if the same information is
obtained from a report prepared by the labour
department of the Government, it is called secondary
data.
Types of Data

Data
› Categorical (Qualitative)
› Numerical (Quantitative)
– Discrete
– Continuous
Key Terms
› Measures of central tendency: statistical measurements
such as the mean, median or mode that indicate how
data groups toward the center.
› Measures of variation or dispersion: statistical
measurements such as the range and standard deviation
that indicate how data is dispersed or spread.
Measures of Central Tendency

› Find the mean
› Find the median
› Find the mode
Key Terms
› Mean: the arithmetic average of a set of data or sum of
the values divided by the number of values.
› Median: the middle value of a data set when the values
are arranged in order of size.
› Mode: the value or values that occur most frequently in a
data set.
Find the mean of a data set.

1. Find the sum of the values.


2. Divide the sum by the total number of values.

Mean = sum of values / number of values
Here’s an example.

Sales figures for the last week for the Western region
have been as follows:

› Monday Rs 4,200
› Tuesday Rs 3,980
› Wednesday Rs 2,400
› Thursday Rs 3,100
› Friday Rs 4,600
› What is the average daily sales figure?
› Rs 3,656
Try these examples.

› Mileage for the new salesperson has been 243, 567, 766,
422 and 352 this week. What is the average number of
miles traveled?
› 470 miles daily

› Prices from different suppliers of 500 sheets of copier
paper are as follows: Rs 399, Rs 475, Rs 375 and Rs 425.
What is the average price?
› Rs 418.50
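As a quick illustrative sketch (the slides contain no code; the helper name `mean` is ours), the worked averages above can be checked in Python:

```python
# Hedged sketch: verifying the mean examples from the slides.
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

daily_sales = [4200, 3980, 2400, 3100, 4600]   # Mon-Fri sales in Rs
print(mean(daily_sales))    # 3656.0

mileage = [243, 567, 766, 422, 352]            # miles per day
print(mean(mileage))        # 470.0

paper_prices = [399, 475, 375, 425]            # Rs per 500 sheets
print(mean(paper_prices))   # 418.5
```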
Find the median.

› Arrange the values in the data set from smallest to largest


(or largest to smallest) and select the value in the middle.
› If the number of values is odd, it will be exactly in the
middle.
› If the number of values is even, identify the two middle
values. Add them together and divide by two.
Here is an example.

› A recent survey of the used car market for the particular


model John was looking for yielded several different
prices. Find the median price.

› $9,400, $11,200, $5,900, $10,000, $4,700, $8,900, $7,800


and $9,200.

› Arrange from highest to lowest:


$11,200, $10,000, $9,400, $9,200, $8,900, $7,800, $5,900
and $4,700.

› Calculate the average of the two middle values.


› $9,050 is the median price.
Try this example.
› Five local moving companies quoted the following
prices to Bob’s Best Company: $4,900, $3,800, $2,700,
$4,400 and $3,300. Find the median price.
› $3,800
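The two median examples above can be sketched in Python (an illustration, not part of the original slides; the function name `median` is ours):

```python
# Hedged sketch: median of the used-car prices and moving-company quotes above.
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd count: exact middle value
    return (s[mid - 1] + s[mid]) / 2     # even count: average the two middle values

car_prices = [9400, 11200, 5900, 10000, 4700, 8900, 7800, 9200]
print(median(car_prices))     # 9050.0

moving_quotes = [4900, 3800, 2700, 4400, 3300]
print(median(moving_quotes))  # 3800
```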
Find the mode.

› Find the mode in a data set by counting the


number of times each value occurs.
› Identify the value or values that occur most
frequently.
› There may be more than one mode if the same
value occurs the same number of times as
another value.
› If no one value appears more than once, there is
no mode.
Find the mode in this data set.

› Results of a placement test in mathematics


included the following scores:
65, 80, 90, 85, 95, 85, 80, 70 and 80.
› Which score occurred the most frequently?
› 80 is the mode. It appeared three times.
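The mode rules above (including ties and the no-mode case) can be sketched like this (illustrative Python, not from the slides; `modes` is our name):

```python
# Hedged sketch: finding the mode(s) of the placement-test scores above.
from collections import Counter

def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []      # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

scores = [65, 80, 90, 85, 95, 85, 80, 70, 80]
print(modes(scores))   # [80]  (appears three times)
```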
Key Terms

› Range: the difference between the highest and lowest


values in a data set. (also called the spread)
› Deviation from the mean: the difference between a value
of a data set and the mean.
› Standard deviation: a statistical measurement that shows
how data is spread above and below the mean.
Key Terms

› Variance: a statistical measurement that is the


average of the squared deviations of data from
the mean. The square root of the variance is
the standard deviation.
› Square root: the number which, when multiplied by
itself, gives the original number. The
square root of 81 is 9. (9 x 9 = 81)
› Normal distribution: a characteristic of many
data sets that shows that data graphs into a
bell-shaped curve around the mean.
Find the range in a data set

› Find the highest and lowest values.


› Find the difference between the two.
› Example: The grades on the last exam were 78,
99, 87, 84, 60, 77, 80, 88, 92, and 94.
The highest value is 99.
The lowest value is 60.
The difference or the range is 39.
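The range calculation above is a one-liner (an illustrative Python sketch, not from the slides):

```python
# Hedged sketch: range of the exam grades above.
grades = [78, 99, 87, 84, 60, 77, 80, 88, 92, 94]
data_range = max(grades) - min(grades)
print(max(grades), min(grades), data_range)   # 99 60 39
```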
Find the standard deviation

› The deviation from the mean of a data value is


the difference between the value and the mean.
› Get a clearer picture of the data set by
examining how much each data point differs or
deviates from the mean.
Deviations from the mean

› When the value is smaller than the mean, the


difference is represented by a negative number
indicating it is below or less than the mean.

› Conversely, if the value is greater than the


mean, the difference is represented by a positive
number indicating it is above or greater than the
mean.
Find the deviation from the mean.
› Find the mean of a set of data.
› Mean = Sum of data values / Number of values
› Find the amount that each data value deviates or is
different from the mean.
› Deviation from the mean = Data value - Mean
Here’s an example.

› Data set: 38, 43, 45, 44


› Mean = 42.5
› First value: 38 – 42.5 = -4.5 below the mean
› Second value: 43 – 42.5 = 0.5 above the mean
› Third value: 45 – 42.5 = 2.5 above the mean
› Fourth value: 44 – 42.5 = 1.5 above the mean
Interpret the information

› One value is below the mean and its deviation is


-4.5.
› Three values are above the mean and the sum of
those deviations is 4.5.
› The sum of all deviations from the mean is zero.
This is true of all data sets.
› We have not gained any statistical insight or new
information by analyzing the sum of the deviations
from the mean.
Average deviation

› Average deviation =
Sum of deviations / Number of values = 0 / n = 0
Find the standard deviation
of a set of data.

› A statistical measure called the standard


deviation uses the square of each deviation
from the mean.
› The square of a negative value is always
positive.
› The squared deviations are averaged (mean)
and the result is called the variance.
Find the standard deviation
of a set of data.

› The square root is taken of the variance so that


the result can be interpreted within the context
of the problem.
› This formula averages the squared deviations by
dividing by the number of values (n).
› Several calculations are necessary and are best
organized in a table.
Find the standard deviation
of a set of data.
1. Find the mean.
2. Find the deviation of each value from the mean.
3. Square each deviation.
4. Find the sum of the squared deviations.
5. Divide the sum of the squared deviations by the
number of values in the data set. This amount
is called the variance.
6. Find the standard deviation by taking the square
root of the variance.
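The six steps above can be sketched directly, using the data set 38, 43, 45, 44 from the deviation example (an illustrative Python sketch in population form, dividing by n, as the slides do):

```python
# Hedged sketch of the six steps above for the data set 38, 43, 45, 44.
import math

data = [38, 43, 45, 44]
mean = sum(data) / len(data)             # step 1: mean = 42.5
deviations = [x - mean for x in data]    # step 2: deviation of each value
squared = [d ** 2 for d in deviations]   # step 3: square each deviation
variance = sum(squared) / len(data)      # steps 4-5: average of squared deviations
std_dev = math.sqrt(variance)            # step 6: square root of the variance
print(variance, round(std_dev, 3))       # 7.25 2.693
```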
Standard Deviation

Standard deviation measures the variation of values
from the mean, using the following formula:

σ = √( Σ(x – x̄)² / n )

where Σ = sum of, x = observed values, x̄
(x with a line over the top) = arithmetic mean,
and n = number of observations.
Standard Deviation (Contd..)

Roughly speaking, the standard deviation is the typical difference
between a value in a series and the mean of all the values in that
series. This statistic is a measure of the variation in a distribution of values.
If we plot enough values, we’ll likely find that the distribution
of values forms some variant of a bell-shaped curve. This
curve can assume various shapes. However, in a normal
curve, statisticians have determined that about 68.2% of
the values will be within 1 standard deviation of the mean,
about 95.5% will be within 2 standard deviations, and
99.7% will be within 3 standard deviations.
Standard Deviation (Contd..)

Specification limit

One of two values (lower and upper)


that indicate the boundaries of
acceptable or tolerated values for a
process.
Draw and interpret
a bar graph

› Write an appropriate title.

› Make appropriate labels for bars and scale. The intervals


should be equally spaced and include the smallest and
largest values.

› Draw horizontal or vertical bars to represent the data.


Bars should be of uniform width.

› Make additional notes as appropriate to aid


interpretation.
Here’s an example.

[Bar chart: "Sales Volume 2001-2004", horizontal bars for Products 1-3
in each of 2001-2004, measured in thousands of units (0-50)]
Interpret and draw
a line graph

› Write an appropriate title.


› Make and label appropriate horizontal and
vertical scales, each with equally spaced
intervals. Often, the horizontal scale represents
time.
› Use points to locate data on the graph.
› Connect data points with line segments or a
smooth curve.
Here’s an example.

[Line graph: "First Semester Sales", thousands of $ (0-100) against
Jan-Jun, one line each for Judy, Denise, Linda and Wally]
Interpret and draw
a circle graph (Pie-Graph).

› Write an appropriate title.


› Find the sum of values in the data set.
› Represent each value as a fraction or decimal
part of the sum of values.
› For each fraction, find the number of degrees in
the sector of the circle to be represented by the
fraction or decimal. (100% = 360°)
› Label each sector of the circle as appropriate.
Here’s an example.

[Pie chart: "Local Daycare Market Share", sectors for Teddy Bear,
La La Land, Little Gems and Other, with shares of 43%, 35%, 16% and 6%]
Make and interpret a frequency distribution.

› Identify appropriate intervals for the data.


› Tally the data for the intervals.
› Count the number in each interval.

[Bar chart: quarterly values for East, West and North regions,
1st-4th Qtr, scale 0-90]
Key Terms

› Class intervals: special categories for grouping


the values in a data set.
› Tally: a mark that is used to count data in class
intervals.
› Class frequency: the number of tallies or values
in a class interval.
› Grouped frequency distribution: a compilation
of class intervals, tallies, and class frequencies
of a data set.
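Tallying values into class intervals can be sketched as follows (an illustrative Python sketch; the scores and the interval width of 10 are our hypothetical choices, not from the slides):

```python
# Hedged sketch: a grouped frequency distribution with tally marks.
from collections import Counter

scores = [65, 80, 90, 85, 95, 85, 80, 70, 80]   # hypothetical data
width = 10
# Map each score to the lower bound of its class interval, e.g. 85 -> 80.
counts = Counter((s // width) * width for s in scores)
for lower in sorted(counts):
    print(f"{lower}-{lower + width - 1}: {'|' * counts[lower]} ({counts[lower]})")
```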
HISTOGRAM

A histogram is a graphical representation of a frequency
distribution, which is a summary of variation in a
product or process.
Dr. W. A. Shewhart, a physicist from Bell Laboratories,
explained such variation in 1931 in his publication
“Economic Control of Quality of Manufactured
Product”.
A histogram is basically a graphical presentation of a
series of measurements grouped into continuous
classes or intervals.
[Histogram: age of criminal (class intervals 10-15 through 50-55) on
the x-axis against number of crimes (0-45) on the y-axis]
DISTRIBUTION
While individual measured values may all be different, as a
group they tend to exhibit a pattern. This is called a
distribution, which can be described by:

› Location (Process level or centering)

› Spread or dispersion (Range of values from smallest


to largest)

› Shape (Pattern of variation, whether symmetrical or


skewed etc.)
Distribution of Data
› Normal distributions › Skewed distribution

[Figure: Change in process variation. A is the original process spread;
B shows an increase in spread with the same location]

[Figure: Change in pattern of variation. A is the original pattern;
B is the new pattern]

In the figure "Change in pattern of variation" the original pattern (A) is
symmetrical but the new pattern (B) is skewed. Even though the centering is
the same, the shapes or patterns are different.
STABILITY
If the distribution characterising the process remains unchanged over a period
of time, the process is said to be stable and repeatable. This can be
understood from the following depiction of the process over a period of time,
shown in the figure below:

[Figure: a stable and repeatable process, with successive distributions
centered on the target over time]


This pattern results when only common causes are present in the process.
COMMON CAUSES

The common causes are minute and many, and are
individually not measurable. The pattern resulting from
the influence of common causes is called a “state of
statistical control”, or sometimes just “in control”.
It is called statistical because the variation can be
described by statistical laws. If only common causes are
present and do not change, the output of a process is
predictable.
The advantages of maintaining a state of statistical
control are:

› Variation (inherent) is restricted to common causes.

› Since variability exhibits a regularity in its pattern, process


is repeatable.

› Since process is repeatable, quality of future production


can be predicted.

However, process level and variation may change due


to influence of causes additional to common causes.
Such causes are called special causes.
Special Causes
Examples of special causes are changes in setting, operator, material
input, etc. When they occur, they make the (overall) process distribution
change. Unless they are arrested, they will continue to affect the process
output in unpredictable ways as shown below:

[Figure: an unstable process. Over time, special causes produce a shift
in process level, an increase in variation, or a shift in both level and
variation relative to the original process]
Changes in process pattern due to
special causes can be either
detrimental or beneficial. When
detrimental, they need to be
identified and eliminated. When
beneficial, they need to be
perpetuated by making them a
permanent part of the process.
PROCESS CONTROL

This is the state where only common causes


are present. The proof of this situation is
when the pattern of variation conforms to the
statistical normal distribution.
It involves continuous monitoring of the
process for special causes and eliminating
them. Evidence of special causes is provided
by systematic patterns in process variability.
PROCESS CAPABILITY

A process should not only be in control but


also satisfactory in the sense that all the
production should meet specification
requirements.
This ability of a process to produce within the
variation permitted by tolerance is called
process capability.
Process with reference to specification limits (LSL and USL):

› Process is in control (stable) and capable
› Process is in control but not satisfactory
› Process is capable but not in control,
because the process level is not properly
centered
› Process is not in control and not capable

The above can be used to classify a process based on capability and control.
Process Capability

› Product Specifications
– Preset product or service dimensions, tolerances
– e.g. bottle fill might be 16 oz. ±.2 oz. (15.8oz.-16.2oz.)
– Based on how product is to be used or what the customer expects

› Process Capability – Cp and Cpk


– Assessing capability involves evaluating process variability relative to
preset product or service specifications
– Cp assumes that the process is centered in the specification range
– Cpk helps to address a possible lack of centering of the
process
Cp = specification width / process width = (USL − LSL) / 6σ

Cpk = min( (USL − μ) / 3σ , (μ − LSL) / 3σ )

(© Wiley 2007)
Process capability…. (contd.)

The goal of Six Sigma is to reduce the standard deviation


of your process variation to the point that six standard
deviations (six sigma) can fit within your specification limits.
The capability index (Cp) of a process is usually expressed
as the specification width (the difference between USL and LSL)
divided by six times the standard deviation (six sigma) of
the process:

Cp = (USL − LSL) / 6σ

The higher your Cp, the less variation in your process.


Process capability…. (contd.)
There’s a second process capability index, Cpk. In essence, this splits
the process capability of Cp into two values.

Cpk = the lesser of these two calculations:

(USL − mean) / 3σ or (mean − LSL) / 3σ
In addition to the lower and upper specification limits, there’s another
pair of limits that should be plotted for any process – the lower control
limit (LCL) and the upper control limit (UCL). These values mark the
minimum and maximum inherent limits of the process, based on data
collected from the process. If the control limits are within the
specification limits or align with them, then the process is considered to
be capable of meeting the specifications. If either or both of the control
limits are outside the specification limits, then the process is
considered incapable of meeting the specifications.
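The two indices above can be sketched as small functions (an illustrative Python sketch, not from the original text; `cp` and `cpk` are our names), using the bottle-fill numbers from the later Cocoa Fizz example:

```python
# Hedged sketch: Cp and Cpk as defined above.
def cp(usl, lsl, sigma):
    """Specification width divided by process width (6 sigma)."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Lesser of the two one-sided capabilities; penalises an off-centre mean."""
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))

# Bottle-fill spec 16.0 oz +/- 0.2 oz (USL = 16.2, LSL = 15.8)
print(round(cp(16.2, 15.8, 0.05), 2))        # 1.33  (capable machine)
print(round(cpk(16.2, 15.8, 15.9, 0.1), 2))  # 0.33  (shifted mean: not capable)
```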
Relationship between Process Variability
and Specification Width
› Three possible ranges for Cp
– Cp = 1, as in Fig. (a): process
variability just meets specifications
– Cp < 1, as in Fig. (b): process not capable
of producing within specifications
– Cp > 1, as in Fig. (c): process
exceeds minimal specifications
› One shortcoming: Cp assumes that the
process is centered on the specification
range
› Cp = Cpk when the process is centered
Computing the Cp Value at Cocoa Fizz: three bottling machines are being
evaluated for possible use at the Fizz plant. The machines must be capable of
meeting the design specification of 15.8-16.2 oz. with at least a process capability
index of 1.0 (Cp≥1)

› The table below shows the information gathered from
production runs on each machine. Are they all acceptable?

Machine   σ      USL−LSL   6σ
A         .05    .4        .3
B         .1     .4        .6
C         .2     .4        1.2

› Solution:
– Machine A: Cp = (USL − LSL) / 6σ = .4 / 6(.05) = 1.33
– Machine B: Cp = .4 / .6 = 0.67
– Machine C: Cp = .4 / 1.2 = 0.33
Only Machine A meets the requirement Cp ≥ 1.
Computing the Cpk Value at Cocoa Fizz

› Design specifications call for a target


value of 16.0 ±0.2 OZ.
(USL = 16.2 & LSL = 15.8)
› Observed process output has now
shifted and has a µ of 15.9 and a
σ of 0.1 oz.
 16.2  15.9 15.9  15.8 
Cpk  min , 
 3(.1) 3(.1) 
.1
Cpk   .33
.3

› Cpk is less than 1, revealing that the


process is not capable
When we start making efforts, many of the chance causes which were
persisting start disappearing, and improvement starts coming in. This
helps to reduce the present spread of ±3σ to a smaller and smaller
span, as shown in the picture below:

[Figure: as variation shrinks, the specification limits that once sat
at ±3σ from the specification mean come to sit at ±6σ]
The Six Sigma concept also professes a similar idea,
with certain changes of approach.

With Six Sigma strategy an organisation can


achieve an incredible level of efficiency i.e. the
defects level can be brought down to a level of 3.4
parts per million.
±6 Sigma versus ± 3 Sigma

› Motorola coined “Six Sigma” to
describe their higher-quality efforts
in the 1980s
› PPM defective for ±3σ versus
±6σ quality

› Six-sigma quality standard is now a


benchmark in many industries
(including services)
– Before design, marketing ensures
customer product characteristics
– Operations ensures that product
design characteristics can be met by
controlling materials and processes to
6σ levels
– Other functions like finance and
accounting use 6σ concepts to
control all of their processes
[Table: expected defects listed for six processes with Cp values ranging from 1.00 to 2.00]
The relationship between x and y

› Correlation: is there a relationship between 2 variables?


› Regression: how well does an independent variable predict
the dependent variable?
Correlation

› A statistical technique that is used to measure and describe
a relationship between two variables (X and Y).
SCATTER DIAGRAM
A scatter diagram is a graphical representation of the
relationship between two variables. It can be between
a cause and an effect, or between two causes. It also
reveals the nature of the relationship between two
variables and its approximate strength.
Dr. Buxton developed printed graph paper, which
spurred the use of scatter diagrams. In 1837 J. F. W.
Herschel, an Englishman, used a scatter diagram.
In the 1950s Dr. K. Ishikawa popularised the use of
scatter diagrams.
SCATTER DIAGRAM (Contd..)
Let us say that we are interested in finding out the orientation angle
(measured before and after lapping) in a quartz crystal unit.
Plot the data on the graph.
If the emerging picture is something like this, we say that there is a
positive relationship or positive correlation.

[Scatter diagram: angle after lapping (y-axis) against angle before
lapping (x-axis); the points rise steadily from lower left to upper right]
Some examples of series of positive
correlation are:
Heights and weights;
Household income and
expenditure;
Amount of rainfall and yield of
crops.
SCATTER DIAGRAM (Contd..)
If the picture is slightly spread like this, then we say that there is a
possibility of positive correlation.

[Scatter diagram: angle after lapping against angle before lapping; the
points rise from lower left to upper right but with a wider spread]
SCATTER DIAGRAM (Contd..)
If it is like this, we can say that there is ‘no correlation’ between them.

[Scatter diagram: angle after lapping against angle before lapping; the
points are scattered with no visible trend]
SCATTER DIAGRAM (Contd..)

Sometimes the emerging diagram can be like this; then we can say that
there is a possibility of negative correlation.

[Scatter diagram: angle after lapping against angle before lapping; the
points fall from upper left to lower right with a wide spread]
SCATTER DIAGRAM (Contd..)

If it is like this, we can say that there is a negative relationship or
negative correlation between the two variables.

[Scatter diagram: angle after lapping against angle before lapping; the
points fall steadily from upper left to lower right]
Some examples of series of negative
correlation are:
Volume and pressure of perfect
gas;
Current and resistance [keeping the
voltage constant] (R =V / I) ;
Price and demand of goods.
SCATTER DIAGRAM (Contd..)

Sometimes we may have a scatter like this, i.e. positive correlation up
to a certain level and then negative.

[Scatter diagram: angle after lapping against angle before lapping; the
points rise, peak, and then fall]
SCATTER DIAGRAM (Contd..)

It can be vice versa also, i.e. negative correlation up to a particular
level and then positive.

[Scatter diagram: angle after lapping against angle before lapping; the
points fall, bottom out, and then rise]
The Coefficient of Correlation

One of the most widely used statistics is the coefficient of
correlation ‘r’, which measures the degree of association
between the two related variables in a data set.
• It takes values from +1 to –1.
• If two sets of data have r = +1, they are said to be
perfectly correlated positively.
• If r = –1 they are said to be perfectly correlated
negatively; and if r = 0 they are uncorrelated.
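As an illustrative sketch (not from the slides; the function name `pearson_r` and the sample angles are our hypothetical choices), Pearson's r for paired data can be computed like this:

```python
# Hedged sketch: Pearson's coefficient of correlation r for paired data.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical before/after lapping angles that trend together
before = [10, 20, 30, 40, 50]
after = [12, 19, 33, 41, 48]
print(round(pearson_r(before, after), 3))   # close to +1: strong positive correlation
```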
Regression
› Correlation tells you if there is an association between x and y
but it doesn’t describe the relationship or allow you to predict
one variable from the other.

› To do this we need REGRESSION!


Regression
› The statistical technique for finding the best-fitting straight line
for a set of data.
› To find the line that best describes the relationship for a
set of X and Y data.
Regression Analysis
› Question asked: Given one variable, can we predict values
of another variable?

› Examples: Given the weight of a person, can we predict


how tall he/she is; given the IQ of a person, can we
predict their performance in statistics; given the
basketball team’s wins, can we predict the extent of a
riot. ...
Best-fit Line

› The aim of linear regression is to fit a straight line, ŷ = ax + b
(a = slope, b = intercept), to the data so as to give the best
prediction of y for any value of x.
› This will be the line that minimises the distance between the
data and the fitted line, i.e. the residuals.

[Figure: a fitted line through scattered points; ŷ is the predicted
value on the line, yᵢ the true value, and ε = yᵢ − ŷ the residual error]
Regression Equation

Suppose we have a sample of size n with two sets
of measures, denoted by x and y. We can predict the values
of y given the values of x by using the equation, called
the regression equation:
y* = a + bx
where the coefficients a and b are given by the usual
least-squares formulas:
b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,  a = ȳ − b x̄

The symbol y* refers to the predicted value of y from a given
value of x from the regression equation.
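A minimal sketch of fitting the regression equation by least squares (illustrative Python, not from the slides; `fit_line` and the sample points are our hypothetical choices):

```python
# Hedged sketch: least-squares coefficients for y* = a + bx.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # points lying on y = 1 + 2x
print(a, b)         # 1.0 2.0
print(a + b * 10)   # predicted y* at x = 10: 21.0
```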
Example

› Local tennis club charges $5 per hour plus an annual


membership fee of $25.
› Compute the total cost of playing tennis for 10 hours per
month.
(predicted cost) Y = (slope b)X + (intercept a)

When X = 10:
Y = $5(10 hrs) + $25
Y = $75
When X = 30:
Y = $5(30 hrs) + $25
Y = $175
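The tennis-club cost line above is a simple linear equation (an illustrative Python sketch; the helper name `monthly_cost` is ours):

```python
# Hedged sketch: cost line Y = bX + a with b = $5/hour and a = $25 annual fee.
def monthly_cost(hours, rate=5, fee=25):
    return rate * hours + fee

print(monthly_cost(10))   # 75
print(monthly_cost(30))   # 175
```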
Why Learn Probability?
› Nothing in life is certain. In everything we do, we gauge
the chances of successful outcomes, from business to
medicine to the weather
› A probability provides a quantitative description of the
chances or likelihoods associated with various outcomes
› It provides a bridge between descriptive and inferential
statistics

Probability reasons from the population to a sample;
statistics reasons from a sample back to the population.
Probabilistic vs Statistical Reasoning
› Suppose I know exactly the proportions of car makes in
California. Then I can find the probability that the first car
I see in the street is a Ford. This is probabilistic reasoning
as I know the population and predict the sample

› Now suppose that I do not know the proportions of car


makes in California, but would like to estimate them. I
observe a random sample of cars in the street and then I
have an estimate of the proportions of the population.
This is statistical reasoning
What is Probability?

› We measure “how often” using the
relative frequency = f/n
› As n gets larger, the sample looks more like the
population, and the relative frequency approaches
the probability.
Basic Concepts

› An experiment is the process by which an


observation (or measurement) is obtained.
› An event is an outcome of an experiment,
usually denoted by a capital letter.
– The basic element to which probability is applied
– When an experiment is performed, a particular event
either happens, or it doesn’t!
Experiments and Events
› Experiment: Record an age
– A: person is 30 years old
– B: person is older than 65
› Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2
Basic Concepts
› Two events are mutually exclusive if, when one
event occurs, the other cannot, and vice versa.

• Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2
– C: observe a 6
– D: observe a 3
B and C? Not mutually exclusive (a toss of 6 satisfies both).
C and D? Mutually exclusive (one toss cannot show both a 6 and a 3).
Basic Concepts
› An event that cannot be decomposed is
called a simple event.
› Denoted by E with a subscript.
› Each simple event will be assigned a
probability, measuring “how often” it
occurs.
› The set of all simple events of an
experiment is called the sample space, S.
Example
› The die toss:
› Simple events: E1 = toss a 1, E2 = toss a 2, …, E6 = toss a 6
› Sample space: S = {E1, E2, E3, E4, E5, E6}
Basic Concepts
› An event is a collection of one or more simple
events.
• The die toss:
– A: an odd number → A = {E1, E3, E5}
– B: a number > 2 → B = {E3, E4, E5, E6}
The Probability
of an Event
› The probability of an event A measures “how
often” A will occur. We write P(A).
› Suppose that an experiment is performed n times.
The relative frequency for an event A is

Relative frequency = (number of times A occurs) / n = f / n

• If we let n get infinitely large,
P(A) = lim(n→∞) f / n
The Probability
of an Event
› P(A) must be between 0 and 1.
– If event A can never occur, P(A) = 0. If event A
always occurs when the experiment is performed,
P(A) =1.
› The sum of the probabilities for all simple
events in S equals 1.

• The probability of an event A is


found by adding the probabilities of
all the simple events contained in A.
Finding Probabilities
› Probabilities can be found using
– Estimates from empirical studies
– Common sense estimates based on equally likely
events.

• Examples:
–Toss a fair coin. P(Head) = 1/2
– Suppose that 10% of the U.S.
population has red hair. Then for a
person selected at random,
P(Red hair) = .10
Using Simple Events
› The probability of an event A is equal to the
sum of the probabilities of the simple events
contained in A
› If the simple events in an experiment are
equally likely, you can calculate

P(A) = n_A / N
= (number of simple events in A) / (total number of simple events)
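The equally-likely rule can be sketched with the die events from the earlier slides (illustrative Python, not from the original):

```python
# Hedged sketch: P(A) = n_A / N for equally likely simple events of a die toss.
faces = [1, 2, 3, 4, 5, 6]
A = [f for f in faces if f % 2 == 1]   # A: observe an odd number
B = [f for f in faces if f > 2]        # B: observe a number greater than 2
print(len(A) / len(faces))             # 0.5
print(len(B) / len(faces))             # 4 of 6 faces
```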
Example

Toss a fair coin twice. What is the probability
of observing at least one head?

Outcome   P(Eᵢ)
HH        1/4
HT        1/4
TH        1/4
TT        1/4

P(at least 1 head) = P(E1) + P(E2) + P(E3)
= 1/4 + 1/4 + 1/4 = 3/4
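The two-toss example can be checked by enumerating the sample space (an illustrative Python sketch, not from the slides):

```python
# Hedged sketch: enumerate the sample space of two fair coin tosses.
from itertools import product

sample_space = list(product("HT", repeat=2))   # HH, HT, TH, TT
at_least_one_head = [e for e in sample_space if "H" in e]
print(len(at_least_one_head) / len(sample_space))   # 0.75
```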
