
Statistics for Six Sigma

Why a Six Sigma Practitioner Needs to
Know about Statistics
› To conduct Six Sigma investigations effectively. Without
statistics it would be very difficult to make decisions
based on the data collected.
› To further develop critical and analytical thinking
skills.
› To act as an informed investigator.
› To know how to properly analyze information.
› To know how to draw conclusions about
populations based on sample information.
Key Definitions
› A population (universe) is the collection of
things under consideration
› A sample is a portion of the population
selected for analysis
› A parameter is a summary measure computed
to describe a characteristic of the population
› A statistic is a summary measure computed
to describe a characteristic of the sample
Population and Sample

Population: use parameters to summarize its features.
Sample: use statistics to summarize its features.

Inference on the population is drawn from the sample.


Statistical Methods
› Descriptive statistics
– Collecting and describing data
› Inferential statistics
– Drawing conclusions and/or making decisions
concerning a population based only on sample data
Descriptive Statistics
› Collect data
– e.g. Survey
› Present data
– e.g. Tables and graphs
› Characterize data
– e.g. Sample mean = ΣXᵢ / n
Why We Need Data
› To provide input to survey
› To provide input to study
› To measure performance of service or production
process
› To evaluate conformance to standards
› To assist in formulating alternative courses of action
› To satisfy curiosity
Data Sources
› Primary: data collection
– Observation
– Survey
– Experimentation
– Statistical inquiry
› Secondary: data compilation
– Print or electronic sources
Primary and Secondary Data

The difference between primary and secondary data is
only one of degree of detachment from the original
source: data that is primary in one person's hands may
become secondary in another's. For example, if an
investigation is conducted into the working conditions
of workers in textile mills, the facts collected by the
investigators directly from the workers themselves are
termed primary data. But if the same information is
obtained from a report prepared by the labour
department of the Government, it is called secondary
data.
Types of Data

Data
› Categorical (Qualitative)
› Numerical (Quantitative)
– Discrete
– Continuous
Key Terms
› Measures of central tendency: statistical measurements
such as the mean, median or mode that indicate how
data groups toward the center.
› Measures of variation or dispersion: statistical
measurements such as the range and standard deviation
that indicate how data is dispersed or spread.
Measures of Central Tendency

› Find the mean
› Find the median
› Find the mode
Key Terms
› Mean: the arithmetic average of a set of data or sum of
the values divided by the number of values.
› Median: the middle value of a data set when the values
are arranged in order of size.
› Mode: the value or values that occur most frequently in a
data set.
Find the mean of a data set.

1. Find the sum of the values.


2. Divide the sum by the total number of values.

Mean = sum of values / number of values
Here’s an example.

Sales figures for the last week for the Western region
have been as follows:

› Monday Rs 4,200
› Tuesday Rs 3,980
› Wednesday Rs 2,400
› Thursday Rs 3,100
› Friday Rs 4,600
› What is the average daily sales figure?
› Rs 3,656
Try these examples.

› Mileage for the new salesperson has been 243, 567, 766,
422 and 352 this week. What is the average number of
miles traveled?
› 470 miles daily

› Prices from different suppliers of 500 sheets of copier
paper are as follows: Rs 399, Rs 475, Rs 375 and Rs 425.
What is the average price?
› Rs 418.50
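As a quick illustrative sketch (the slides contain no code; the helper name `mean` is ours), the worked averages above can be checked in Python:

```python
# Hedged sketch: verifying the mean examples from the slides.
def mean(values):
    """Sum of the values divided by the number of values."""
    return sum(values) / len(values)

daily_sales = [4200, 3980, 2400, 3100, 4600]   # Mon-Fri sales in Rs
print(mean(daily_sales))    # 3656.0

mileage = [243, 567, 766, 422, 352]            # miles per day
print(mean(mileage))        # 470.0

paper_prices = [399, 475, 375, 425]            # Rs per 500 sheets
print(mean(paper_prices))   # 418.5
```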
Find the median.

› Arrange the values in the data set from smallest to largest


(or largest to smallest) and select the value in the middle.
› If the number of values is odd, it will be exactly in the
middle.
› If the number of values is even, identify the two middle
values. Add them together and divide by two.
Here is an example.

› A recent survey of the used car market for the particular


model John was looking for yielded several different
prices. Find the median price.

› $9,400, $11,200, $5,900, $10,000, $4,700, $8,900, $7,800


and $9,200.

› Arrange from highest to lowest:


$11,200, $10,000, $9,400, $9,200, $8,900, $7,800, $5,900
and $4,700.

› Calculate the average of the two middle values.


› $9,050 is the median price.
Try this example.
› Five local moving companies quoted the following
prices to Bob’s Best Company: $4,900, $3,800, $2,700,
$4,400 and $3,300. Find the median price.
› $3,800
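The two median examples above can be sketched in Python (an illustration, not part of the original slides; the function name `median` is ours):

```python
# Hedged sketch: median of the used-car prices and moving-company quotes above.
def median(values):
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                    # odd count: exact middle value
    return (s[mid - 1] + s[mid]) / 2     # even count: average the two middle values

car_prices = [9400, 11200, 5900, 10000, 4700, 8900, 7800, 9200]
print(median(car_prices))     # 9050.0

moving_quotes = [4900, 3800, 2700, 4400, 3300]
print(median(moving_quotes))  # 3800
```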
Find the mode.

› Find the mode in a data set by counting the


number of times each value occurs.
› Identify the value or values that occur most
frequently.
› There may be more than one mode if the same
value occurs the same number of times as
another value.
› If no one value appears more than once, there is
no mode.
Find the mode in this data set.

› Results of a placement test in mathematics


included the following scores:
65, 80, 90, 85, 95, 85, 80, 70 and 80.
› Which score occurred the most frequently?
› 80 is the mode. It appeared three times.
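The mode rules above (including ties and the no-mode case) can be sketched like this (illustrative Python, not from the slides; `modes` is our name):

```python
# Hedged sketch: finding the mode(s) of the placement-test scores above.
from collections import Counter

def modes(values):
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []      # no value repeats: no mode
    return [v for v, c in counts.items() if c == top]

scores = [65, 80, 90, 85, 95, 85, 80, 70, 80]
print(modes(scores))   # [80]  (appears three times)
```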
Key Terms

› Range: the difference between the highest and lowest


values in a data set. (also called the spread)
› Deviation from the mean: the difference between a value
of a data set and the mean.
› Standard deviation: a statistical measurement that shows
how data is spread above and below the mean.
Key Terms

› Variance: a statistical measurement that is the


average of the squared deviations of data from
the mean. The square root of the variance is
the standard deviation.
› Square root: the number which, when multiplied by
itself, gives the original number. The
square root of 81 is 9. (9 x 9 = 81)
› Normal distribution: a characteristic of many
data sets that shows that data graphs into a
bell-shaped curve around the mean.
Find the range in a data set

› Find the highest and lowest values.


› Find the difference between the two.
› Example: The grades on the last exam were 78,
99, 87, 84, 60, 77, 80, 88, 92, and 94.
The highest value is 99.
The lowest value is 60.
The difference or the range is 39.
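The range calculation above is a one-liner (an illustrative Python sketch, not from the slides):

```python
# Hedged sketch: range of the exam grades above.
grades = [78, 99, 87, 84, 60, 77, 80, 88, 92, 94]
data_range = max(grades) - min(grades)
print(max(grades), min(grades), data_range)   # 99 60 39
```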
Find the standard deviation

› The deviation from the mean of a data value is


the difference between the value and the mean.
› Get a clearer picture of the data set by
examining how much each data point differs or
deviates from the mean.
Deviations from the mean

› When the value is smaller than the mean, the


difference is represented by a negative number
indicating it is below or less than the mean.

› Conversely, if the value is greater than the


mean, the difference is represented by a positive
number indicating it is above or greater than the
mean.
Find the deviation from the mean.
› Find the mean of a set of data.
› Mean = Sum of data values / Number of values
› Find the amount that each data value deviates or is
different from the mean.
› Deviation from the mean = Data value - Mean
Here’s an example.

› Data set: 38, 43, 45, 44


› Mean = 42.5
› First value: 38 – 42.5 = -4.5 below the mean
› Second value: 43 – 42.5 = 0.5 above the mean
› Third value: 45 – 42.5 = 2.5 above the mean
› Fourth value: 44 – 42.5 = 1.5 above the mean
Interpret the information

› One value is below the mean and its deviation is


-4.5.
› Three values are above the mean and the sum of
those deviations is 4.5.
› The sum of all deviations from the mean is zero.
This is true of all data sets.
› We have not gained any statistical insight or new
information by analyzing the sum of the deviations
from the mean.
Average deviation

› Average deviation =
Sum of deviations / Number of values = 0 / n = 0
Find the standard deviation
of a set of data.

› A statistical measure called the standard


deviation uses the square of each deviation
from the mean.
› The square of a negative value is always
positive.
› The squared deviations are averaged (mean)
and the result is called the variance.
Find the standard deviation
of a set of data.

› The square root is taken of the variance so that


the result can be interpreted within the context
of the problem.
› This formula averages the squared deviations by
dividing by the number of values (n).
› Several calculations are necessary and are best
organized in a table.
Find the standard deviation
of a set of data.
1. Find the mean.
2. Find the deviation of each value from the mean.
3. Square each deviation.
4. Find the sum of the squared deviations.
5. Divide the sum of the squared deviations by the
number of values in the data set. This amount
is called the variance.
6. Find the standard deviation by taking the square
root of the variance.
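The six steps above can be sketched directly, using the data set 38, 43, 45, 44 from the deviation example (an illustrative Python sketch in population form, dividing by n, as the slides do):

```python
# Hedged sketch of the six steps above for the data set 38, 43, 45, 44.
import math

data = [38, 43, 45, 44]
mean = sum(data) / len(data)             # step 1: mean = 42.5
deviations = [x - mean for x in data]    # step 2: deviation of each value
squared = [d ** 2 for d in deviations]   # step 3: square each deviation
variance = sum(squared) / len(data)      # steps 4-5: average of squared deviations
std_dev = math.sqrt(variance)            # step 6: square root of the variance
print(variance, round(std_dev, 3))       # 7.25 2.693
```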
Standard Deviation

Standard deviation measures the variation of values
from the mean, using the following formula:

σ = √( Σ(x – x̄)² / n )

where Σ = sum of, x = observed values, x̄
(x with a line over the top) = arithmetic mean,
and n = number of observations.
Standard Deviation (Contd..)

Roughly speaking, the standard deviation is the typical difference
between a value in a series and the mean of all the values in that
series. This statistic is a measure of the variation in a distribution of values.
If we plot enough values, we’ll likely find that the distribution
of values forms some variant of a bell-shaped curve. This
curve can assume various shapes. However, in a normal
curve, statisticians have determined that about 68.2% of
the values will be within 1 standard deviation of the mean,
about 95.5% will be within 2 standard deviations, and
99.7% will be within 3 standard deviations.
Standard Deviation (Contd..)

Specification limit

One of two values (lower and upper)


that indicate the boundaries of
acceptable or tolerated values for a
process.
Draw and interpret
a bar graph

› Write an appropriate title.

› Make appropriate labels for bars and scale. The intervals


should be equally spaced and include the smallest and
largest values.

› Draw horizontal or vertical bars to represent the data.


Bars should be of uniform width.

› Make additional notes as appropriate to aid


interpretation.
Here’s an example.

[Bar chart: "Sales Volume 2001-2004", horizontal bars for Products 1-3
in each of 2001-2004, measured in thousands of units (0-50)]
Interpret and draw
a line graph

› Write an appropriate title.


› Make and label appropriate horizontal and
vertical scales, each with equally spaced
intervals. Often, the horizontal scale represents
time.
› Use points to locate data on the graph.
› Connect data points with line segments or a
smooth curve.
Here’s an example.

[Line graph: "First Semester Sales", thousands of $ (0-100) against
Jan-Jun, one line each for Judy, Denise, Linda and Wally]
Interpret and draw
a circle graph (Pie-Graph).

› Write an appropriate title.


› Find the sum of values in the data set.
› Represent each value as a fraction or decimal
part of the sum of values.
› For each fraction, find the number of degrees in
the sector of the circle to be represented by the
fraction or decimal. (100% = 360°)
› Label each sector of the circle as appropriate.
Here’s an example.

[Pie chart: "Local Daycare Market Share", sectors for Teddy Bear,
La La Land, Little Gems and Other, with shares of 43%, 35%, 16% and 6%]
Make and interpret a frequency distribution.

› Identify appropriate intervals for the data.


› Tally the data for the intervals.
› Count the number in each interval.

[Bar chart: quarterly values for East, West and North regions,
1st-4th Qtr, scale 0-90]
Key Terms

› Class intervals: special categories for grouping


the values in a data set.
› Tally: a mark that is used to count data in class
intervals.
› Class frequency: the number of tallies or values
in a class interval.
› Grouped frequency distribution: a compilation
of class intervals, tallies, and class frequencies
of a data set.
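Tallying values into class intervals can be sketched as follows (an illustrative Python sketch; the scores and the interval width of 10 are our hypothetical choices, not from the slides):

```python
# Hedged sketch: a grouped frequency distribution with tally marks.
from collections import Counter

scores = [65, 80, 90, 85, 95, 85, 80, 70, 80]   # hypothetical data
width = 10
# Map each score to the lower bound of its class interval, e.g. 85 -> 80.
counts = Counter((s // width) * width for s in scores)
for lower in sorted(counts):
    print(f"{lower}-{lower + width - 1}: {'|' * counts[lower]} ({counts[lower]})")
```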
HISTOGRAM

A histogram is a graphical representation of a frequency
distribution, which is a summary of variation in a
product or process.
Dr. W. A. Shewhart, a physicist from Bell Laboratories,
explained such variation in 1931 in his publication
“Economic Control of Quality of Manufactured
Product”.
A histogram is basically a graphical presentation of a
series of measurements grouped into continuous
classes or intervals.
[Histogram: age of criminal (class intervals 10-15 through 50-55) on
the x-axis against number of crimes (0-45) on the y-axis]
DISTRIBUTION
While individual measured values may all be different, as a
group they tend to exhibit a pattern. This is called a
distribution, which can be described by:

› Location (Process level or centering)

› Spread or dispersion (Range of values from smallest


to largest)

› Shape (Pattern of variation, whether symmetrical or


skewed etc.)
Distribution of Data
› Normal distributions › Skewed distribution

[Figure: Change in process variation. A is the original process spread;
B shows an increase in spread with the same location]

[Figure: Change in pattern of variation. A is the original pattern;
B is the new pattern]

In the figure "Change in pattern of variation" the original pattern (A) is
symmetrical but the new pattern (B) is skewed. Even though the centering is
the same, the shapes or patterns are different.
STABILITY
If the distribution characterising the process remains unchanged over a period
of time, the process is said to be stable and repeatable. This can be
understood from the following depiction of the process over a period of time,
shown in the figure below:

[Figure: a stable and repeatable process, with successive distributions
centered on the target over time]


This pattern results when only common causes are present in the process.
COMMON CAUSES

The common causes are minute and many, and are
individually not measurable. The pattern resulting from
the influence of common causes is called a “state of
statistical control”, or sometimes just “in control”.
It is called statistical because the variation can be
described by statistical laws. If only common causes are
present and do not change, the output of a process is
predictable.
The advantages of maintaining a state of statistical
control are:

› Variation (inherent) is restricted to common causes.

› Since variability exhibits a regularity in its pattern, process


is repeatable.

› Since process is repeatable, quality of future production


can be predicted.

However, process level and variation may change due


to influence of causes additional to common causes.
Such causes are called special causes.
Special Causes
Examples of special causes are changes in setting, operator, material
input, etc. When they occur, they make the (overall) process distribution
change. Unless they are arrested, they will continue to affect the process
output in unpredictable ways as shown below:

[Figure: an unstable process. Over time, special causes produce a shift
in process level, an increase in variation, or a shift in both level and
variation relative to the original process]
Changes in process pattern due to
special causes can be either
detrimental or beneficial. When
detrimental, they need to be
identified and eliminated. When
beneficial, they need to be
perpetuated by making them a
permanent part of the process.
PROCESS CONTROL

This is the state where only common causes


are present. The proof of this situation is
when the pattern of variation conforms to the
statistical normal distribution.
It involves continuous monitoring of the
process for special causes and eliminating
them. Evidence of special causes is provided
by systematic patterns in process variability.
PROCESS CAPABILITY

A process should not only be in control but


also satisfactory in the sense that all the
production should meet specification
requirements.
This ability of a process to produce within the
variation permitted by tolerance is called
process capability.
Process with reference to specification limits (LSL and USL):

› Process is in control (stable) and capable
› Process is in control but not satisfactory
› Process is capable but not in control,
because the process level is not properly
centered
› Process is not in control and not capable

The above can be used to classify a process based on capability and control.
Process Capability

› Product Specifications
– Preset product or service dimensions, tolerances
– e.g. bottle fill might be 16 oz. ±.2 oz. (15.8oz.-16.2oz.)
– Based on how product is to be used or what the customer expects

› Process Capability – Cp and Cpk


– Assessing capability involves evaluating process variability relative to
preset product or service specifications
– Cp assumes that the process is centered in the specification range
– Cpk helps to address a possible lack of centering of the
process
Cp = specification width / process width = (USL − LSL) / 6σ

Cpk = min( (USL − μ) / 3σ , (μ − LSL) / 3σ )

(© Wiley 2007)
Process capability…. (contd.)

The goal of Six Sigma is to reduce the standard deviation


of your process variation to the point that six standard
deviations (six sigma) can fit within your specification limits.
The capability index (Cp) of a process is usually expressed
as the specification width (the difference between USL and LSL)
divided by six times the standard deviation (six sigma) of
the process:

Cp = (USL − LSL) / 6σ

The higher your Cp, the less variation in your process.


Process capability…. (contd.)
There’s a second process capability index, Cpk. In essence, this splits
the process capability of Cp into two values.

Cpk = the lesser of these two calculations:

(USL − mean) / 3σ or (mean − LSL) / 3σ
In addition to the lower and upper specification limits, there’s another
pair of limits that should be plotted for any process – the lower control
limit (LCL) and the upper control limit (UCL). These values mark the
minimum and maximum inherent limits of the process, based on data
collected from the process. If the control limits are within the
specification limits or align with them, then the process is considered to
be capable of meeting the specifications. If either or both of the control
limits are outside the specification limits, then the process is
considered incapable of meeting the specifications.
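The two indices above can be sketched as small functions (an illustrative Python sketch, not from the original text; `cp` and `cpk` are our names), using the bottle-fill numbers from the later Cocoa Fizz example:

```python
# Hedged sketch: Cp and Cpk as defined above.
def cp(usl, lsl, sigma):
    """Specification width divided by process width (6 sigma)."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Lesser of the two one-sided capabilities; penalises an off-centre mean."""
    return min((usl - mu) / (3 * sigma), (mu - lsl) / (3 * sigma))

# Bottle-fill spec 16.0 oz +/- 0.2 oz (USL = 16.2, LSL = 15.8)
print(round(cp(16.2, 15.8, 0.05), 2))        # 1.33  (capable machine)
print(round(cpk(16.2, 15.8, 15.9, 0.1), 2))  # 0.33  (shifted mean: not capable)
```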
Relationship between Process Variability
and Specification Width
› Three possible ranges for Cp
– Cp = 1, as in Fig. (a): process
variability just meets specifications
– Cp < 1, as in Fig. (b): process not capable
of producing within specifications
– Cp > 1, as in Fig. (c): process
exceeds minimal specifications
› One shortcoming: Cp assumes that the
process is centered on the specification
range
› Cp = Cpk when the process is centered
Computing the Cp Value at Cocoa Fizz: three bottling machines are being
evaluated for possible use at the Fizz plant. The machines must be capable of
meeting the design specification of 15.8-16.2 oz. with at least a process capability
index of 1.0 (Cp≥1)

› The table below shows the information gathered from
production runs on each machine. Are they all acceptable?

Machine   σ      USL−LSL   6σ
A         .05    .4        .3
B         .1     .4        .6
C         .2     .4        1.2

› Solution:
– Machine A: Cp = (USL − LSL) / 6σ = .4 / 6(.05) = 1.33
– Machine B: Cp = .4 / .6 = 0.67
– Machine C: Cp = .4 / 1.2 = 0.33
Only Machine A meets the requirement Cp ≥ 1.
Computing the Cpk Value at Cocoa Fizz

› Design specifications call for a target


value of 16.0 ±0.2 OZ.
(USL = 16.2 & LSL = 15.8)
› Observed process output has now
shifted and has a µ of 15.9 and a
σ of 0.1 oz.
 16.2  15.9 15.9  15.8 
Cpk  min , 
 3(.1) 3(.1) 
.1
Cpk   .33
.3

› Cpk is less than 1, revealing that the


process is not capable
When we start making efforts, many of the chance causes which were
persisting start disappearing, and improvement starts coming in. This
helps to reduce the present spread of ±3σ to a smaller and smaller
span, as shown in the picture below:

[Figure: as variation shrinks, the specification limits that once sat
at ±3σ from the specification mean come to sit at ±6σ]
The Six Sigma concept also professes a similar idea,
with certain changes of approach.

With Six Sigma strategy an organisation can


achieve an incredible level of efficiency i.e. the
defects level can be brought down to a level of 3.4
parts per million.
±6 Sigma versus ± 3 Sigma

› Motorola coined “Six Sigma” to
describe their higher-quality efforts
in the 1980s
› PPM defective for ±3σ versus
±6σ quality

› Six-sigma quality standard is now a


benchmark in many industries
(including services)
– Before design, marketing ensures
customer product characteristics
– Operations ensures that product
design characteristics can be met by
controlling materials and processes to
6σ levels
– Other functions like finance and
accounting use 6σ concepts to
control all of their processes
[Table: expected defects listed for six processes with Cp values ranging from 1.00 to 2.00]
The relationship between x and y

› Correlation: is there a relationship between 2 variables?


› Regression: how well does an independent variable predict
the dependent variable?
Correlation

› A statistical technique that is used to measure and describe
a relationship between two variables (X and Y).
SCATTER DIAGRAM
A scatter diagram is a graphical representation of the
relationship between two variables. It can be between
a cause and an effect, or between two causes. It also
reveals the nature of the relationship between two
variables and its approximate strength.
Dr. Buxton developed printed graph paper, which
spurred the use of scatter diagrams. In 1837 J. F. W.
Herschel, an Englishman, used a scatter diagram.
In the 1950s Dr. K. Ishikawa popularised the use of
scatter diagrams.
SCATTER DIAGRAM (Contd..)
Let us say that we are interested in finding out the orientation angle
(measured before and after lapping) in a quartz crystal unit.
Plot the data on the graph.
If the emerging picture is something like this, we say that there is a
positive relationship or positive correlation.

[Scatter diagram: angle after lapping (y-axis) against angle before
lapping (x-axis); the points rise steadily from lower left to upper right]
Some examples of series of positive
correlation are:
Heights and weights;
Household income and
expenditure;
Amount of rainfall and yield of
crops.
SCATTER DIAGRAM (Contd..)
If the picture is slightly spread like this, then we say that there is a
possibility of positive correlation.

[Scatter diagram: angle after lapping against angle before lapping; the
points rise from lower left to upper right but with a wider spread]
SCATTER DIAGRAM (Contd..)
If it is like this, we can say that there is ‘no correlation’ between them.

[Scatter diagram: angle after lapping against angle before lapping; the
points are scattered with no visible trend]
SCATTER DIAGRAM (Contd..)

Sometimes the emerging diagram can be like this; then we can say that
there is a possibility of negative correlation.

[Scatter diagram: angle after lapping against angle before lapping; the
points fall from upper left to lower right with a wide spread]
SCATTER DIAGRAM (Contd..)

If it is like this, we can say that there is a negative relationship or
negative correlation between the two variables.

[Scatter diagram: angle after lapping against angle before lapping; the
points fall steadily from upper left to lower right]
Some examples of series of negative
correlation are:
Volume and pressure of perfect
gas;
Current and resistance [keeping the
voltage constant] (R =V / I) ;
Price and demand of goods.
SCATTER DIAGRAM (Contd..)

Sometimes we may have a scatter like this, i.e. positive correlation up
to a certain level and then negative.

[Scatter diagram: angle after lapping against angle before lapping; the
points rise, peak, and then fall]
SCATTER DIAGRAM (Contd..)

It can be vice versa also, i.e. negative correlation up to a particular
level and then positive.

[Scatter diagram: angle after lapping against angle before lapping; the
points fall, bottom out, and then rise]
The Coefficient of Correlation

One of the most widely used statistics is the coefficient of
correlation ‘r’, which measures the degree of association
between the two related variables in a data set.
• It takes values from +1 to –1.
• If two sets of data have r = +1, they are said to be
perfectly correlated positively.
• If r = –1 they are said to be perfectly correlated
negatively; and if r = 0 they are uncorrelated.
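As an illustrative sketch (not from the slides; the function name `pearson_r` and the sample angles are our hypothetical choices), Pearson's r for paired data can be computed like this:

```python
# Hedged sketch: Pearson's coefficient of correlation r for paired data.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical before/after lapping angles that trend together
before = [10, 20, 30, 40, 50]
after = [12, 19, 33, 41, 48]
print(round(pearson_r(before, after), 3))   # close to +1: strong positive correlation
```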
Regression
› Correlation tells you if there is an association between x and y
but it doesn’t describe the relationship or allow you to predict
one variable from the other.

› To do this we need REGRESSION!


Regression
› The statistical technique for finding the best-fitting straight line
for a set of data.
› To find the line that best describes the relationship for a
set of X and Y data.
Regression Analysis
› Question asked: Given one variable, can we predict values
of another variable?

› Examples: Given the weight of a person, can we predict


how tall he/she is; given the IQ of a person, can we
predict their performance in statistics; given the
basketball team’s wins, can we predict the extent of a
riot. ...
Best-fit Line

› The aim of linear regression is to fit a straight line, ŷ = ax + b
(a = slope, b = intercept), to the data so as to give the best
prediction of y for any value of x.
› This will be the line that minimises the distance between the
data and the fitted line, i.e. the residuals.

[Figure: a fitted line through scattered points; ŷ is the predicted
value on the line, yᵢ the true value, and ε = yᵢ − ŷ the residual error]
Regression Equation

Suppose we have a sample of size n with two sets
of measures, denoted by x and y. We can predict the values
of y given the values of x by using the equation, called
the regression equation:
y* = a + bx
where the coefficients a and b are given by the usual
least-squares formulas:
b = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)²,  a = ȳ − b x̄

The symbol y* refers to the predicted value of y from a given
value of x from the regression equation.
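A minimal sketch of fitting the regression equation by least squares (illustrative Python, not from the slides; `fit_line` and the sample points are our hypothetical choices):

```python
# Hedged sketch: least-squares coefficients for y* = a + bx.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b

a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])   # points lying on y = 1 + 2x
print(a, b)         # 1.0 2.0
print(a + b * 10)   # predicted y* at x = 10: 21.0
```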
Example

› Local tennis club charges $5 per hour plus an annual


membership fee of $25.
› Compute the total cost of playing tennis for 10 hours per
month.
(predicted cost) Y = (slope b)X + (intercept a)

When X = 10:
Y = $5(10 hrs) + $25
Y = $75
When X = 30:
Y = $5(30 hrs) + $25
Y = $175
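The tennis-club cost line above is a simple linear equation (an illustrative Python sketch; the helper name `monthly_cost` is ours):

```python
# Hedged sketch: cost line Y = bX + a with b = $5/hour and a = $25 annual fee.
def monthly_cost(hours, rate=5, fee=25):
    return rate * hours + fee

print(monthly_cost(10))   # 75
print(monthly_cost(30))   # 175
```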
Why Learn Probability?
› Nothing in life is certain. In everything we do, we gauge
the chances of successful outcomes, from business to
medicine to the weather
› A probability provides a quantitative description of the
chances or likelihoods associated with various outcomes
› It provides a bridge between descriptive and inferential
statistics

Probability reasons from the population to a sample;
statistics reasons from a sample back to the population.
Probabilistic vs Statistical Reasoning
› Suppose I know exactly the proportions of car makes in
California. Then I can find the probability that the first car
I see in the street is a Ford. This is probabilistic reasoning
as I know the population and predict the sample

› Now suppose that I do not know the proportions of car


makes in California, but would like to estimate them. I
observe a random sample of cars in the street and then I
have an estimate of the proportions of the population.
This is statistical reasoning
What is Probability?

› We measure “how often” using the
relative frequency = f/n
› As n gets larger, the sample looks more like the
population, and the relative frequency approaches
the probability.
Basic Concepts

› An experiment is the process by which an


observation (or measurement) is obtained.
› An event is an outcome of an experiment,
usually denoted by a capital letter.
– The basic element to which probability is applied
– When an experiment is performed, a particular event
either happens, or it doesn’t!
Experiments and Events
› Experiment: Record an age
– A: person is 30 years old
– B: person is older than 65
› Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2
Basic Concepts
› Two events are mutually exclusive if, when one
event occurs, the other cannot, and vice versa.

• Experiment: Toss a die
– A: observe an odd number
– B: observe a number greater than 2
– C: observe a 6
– D: observe a 3
B and C? Not mutually exclusive (a toss of 6 satisfies both).
C and D? Mutually exclusive (one toss cannot show both a 6 and a 3).
Basic Concepts
› An event that cannot be decomposed is
called a simple event.
› Denoted by E with a subscript.
› Each simple event will be assigned a
probability, measuring “how often” it
occurs.
› The set of all simple events of an
experiment is called the sample space, S.
Example
› The die toss:
› Simple events: E1 = toss a 1, E2 = toss a 2, …, E6 = toss a 6
› Sample space: S = {E1, E2, E3, E4, E5, E6}
Basic Concepts
› An event is a collection of one or more simple
events.
• The die toss:
– A: an odd number → A = {E1, E3, E5}
– B: a number > 2 → B = {E3, E4, E5, E6}
The Probability
of an Event
› The probability of an event A measures “how
often” A will occur. We write P(A).
› Suppose that an experiment is performed n times.
The relative frequency for an event A is

Relative frequency = (number of times A occurs) / n = f / n

• If we let n get infinitely large,
P(A) = lim(n→∞) f / n
The Probability
of an Event
› P(A) must be between 0 and 1.
– If event A can never occur, P(A) = 0. If event A
always occurs when the experiment is performed,
P(A) =1.
› The sum of the probabilities for all simple
events in S equals 1.

• The probability of an event A is


found by adding the probabilities of
all the simple events contained in A.
Finding Probabilities
› Probabilities can be found using
– Estimates from empirical studies
– Common sense estimates based on equally likely
events.

• Examples:
–Toss a fair coin. P(Head) = 1/2
– Suppose that 10% of the U.S.
population has red hair. Then for a
person selected at random,
P(Red hair) = .10
Using Simple Events
› The probability of an event A is equal to the
sum of the probabilities of the simple events
contained in A
› If the simple events in an experiment are
equally likely, you can calculate

P(A) = n_A / N
= (number of simple events in A) / (total number of simple events)
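The equally-likely rule can be sketched with the die events from the earlier slides (illustrative Python, not from the original):

```python
# Hedged sketch: P(A) = n_A / N for equally likely simple events of a die toss.
faces = [1, 2, 3, 4, 5, 6]
A = [f for f in faces if f % 2 == 1]   # A: observe an odd number
B = [f for f in faces if f > 2]        # B: observe a number greater than 2
print(len(A) / len(faces))             # 0.5
print(len(B) / len(faces))             # 4 of 6 faces
```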
Example

Toss a fair coin twice. What is the probability
of observing at least one head?

Outcome   P(Eᵢ)
HH        1/4
HT        1/4
TH        1/4
TT        1/4

P(at least 1 head) = P(E1) + P(E2) + P(E3)
= 1/4 + 1/4 + 1/4 = 3/4
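The two-toss example can be checked by enumerating the sample space (an illustrative Python sketch, not from the slides):

```python
# Hedged sketch: enumerate the sample space of two fair coin tosses.
from itertools import product

sample_space = list(product("HT", repeat=2))   # HH, HT, TH, TT
at_least_one_head = [e for e in sample_space if "H" in e]
print(len(at_least_one_head) / len(sample_space))   # 0.75
```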
