
Descriptive Statistics

PROPRIETARY
Descriptive Statistics

Using scientific methods to collect, organize, summarize, and present data.

AUTOLIV-AOA/FW-TR/6-27-03/What is 6 Sigma - 2 PROPRIETARY


Population

The population consists of all data points (finite or infinite) that meet a stated definition

Examples:
• All of the Ford seatbelts ever made on A-line
• All purchase orders dated 2003
• All parts made on machine D since the last
improvement was implemented
• The height of every person in the US



Sample

A sample consists of a selected number of data points from the population that are representative of the population, without having to account for all of it

Examples:
• 100 of the Ford seatbelts from A-line
• 20 purchase orders from every month of 2003
• The first 60 parts from machine D since the last
improvement was implemented
• The heights of 3,000 people from all over the US
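One common way to get a representative sample is a simple random sample, in which every member of the population has the same chance of being picked. A minimal Python sketch, assuming the whole population is available as a list; the function name and the population of 10,000 serial numbers are hypothetical:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members of population, each equally likely to be chosen."""
    rng = random.Random(seed)  # a seeded generator makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical population: serial numbers of 10,000 parts made on one line
population = list(range(1, 10_001))
sample = simple_random_sample(population, 100, seed=1)
print(len(sample), sample[:5])
```

Sampling a fixed number from each month, as in the purchase-order example, would instead apply the same draw separately to each month's records.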



Describing Data

Central tendency
• Where is the middle?

Dispersion
• How spread out are the data?

Shape of the distribution
• How are the data distributed?



Measures of Central Tendency

Mean
• The sum of all the data
divided by the number of
data

Median
• The numeric middle

Mode
• The most frequently
occurring value



Mean
• The sum of all the data divided by the number of data
• Commonly called the “average”
• Usually the best indicator of central tendency

Mathematical Notation

$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$$

X_i = X_1, X_2, etc.
n = Number of data in a sample
Σ = Summation of

Example: 5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6

$$\bar{X} = \frac{5+3+6+4+7+5+9+6+4+3+2+6}{12} = \frac{60}{12} = 5$$



Symbols
                 Sample Estimates     Population Parameters
                 (Roman letters)      (Greek letters)
Sample Size      n                    N
Mean             X̄                    μ



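Median

• The middle value
• As many values above as there are below when arranged in numerical sequence

Example: 5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6
Sorted: 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 9
median = 5

With an even number of values (12 here), the median is the average of the two middle values: (5 + 5) / 2 = 5.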
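The sorting rule above, including the even-count case, can be sketched in Python as follows (the function name is our own; the standard `statistics.median` follows the same rule):

```python
def median(data):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(data)
    n = len(ordered)
    if n == 0:
        raise ValueError("median requires at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

data = [5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6]
print(median(data))  # 5.0
```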



Mean Vs. Median
Salaries of randomly selected US government employees

Case 1                           Case 2
Worker 1        $30,000          Worker 1        $30,000
Worker 2        $35,000          Worker 2        $35,000
Worker 3        $40,000          Worker 3        $40,000
Worker 4        $45,000          Worker 4        $45,000
Worker 5        $50,000          US President    $400,000

Find the mean and the median for both cases.

Sometimes the median is a better indicator of central tendency (e.g. when outliers exist in the data).
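Python's standard `statistics` module can check the exercise. A short sketch using the salaries from the table:

```python
from statistics import mean, median

case_1 = [30_000, 35_000, 40_000, 45_000, 50_000]
case_2 = [30_000, 35_000, 40_000, 45_000, 400_000]  # worker 5 replaced by the US President

for name, salaries in (("Case 1", case_1), ("Case 2", case_2)):
    print(name, "mean =", mean(salaries), "median =", median(salaries))
```

Case 1 gives a mean and median of $40,000. In Case 2 the single $400,000 salary pulls the mean up to $110,000, while the median stays at $40,000.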



Mode

• The most common value
• The peak of a distribution
• A very weak indicator of the center
• Distributions with two peaks are called “bimodal”

Example: 5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6
Sorted: 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 9
mode = 6
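Python's `statistics` module provides `mode` and, for data with more than one peak, `multimode` (available from Python 3.8). A short sketch using the example data; the bimodal list is made up for illustration:

```python
from statistics import mode, multimode

data = [5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6]
print(mode(data))       # 6
print(multimode(data))  # [6]

# A bimodal data set has two peaks, so multimode returns both values
print(multimode([1, 2, 2, 3, 4, 4, 5]))  # [2, 4]
```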



Measures of Dispersion

Range
• The largest number minus
the smallest

Variance
• A measure of how far the points lie from the mean, based on their squared differences from it

Standard Deviation
• The square root of the
variance



Symbols
                      Sample Estimates     Population Parameters
                      (Roman letters)      (Greek letters)
Sample Size           n                    N
Mean                  X̄                    μ
Range                 R                    R
Standard Deviation    s                    σ
Variance              s²                   σ²



Range

• Largest number minus the smallest
• Good for small groups of data
• Easy to calculate
• Easily distorted by one unusually large or small datum (outlier)

Example: 5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6
Sorted: 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 9
R = 9 - 2 = 7
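A short Python sketch of the range, including how a single added outlier distorts it; the function name and the extra value 60 are hypothetical:

```python
def data_range(data):
    """Largest value minus the smallest value (R)."""
    return max(data) - min(data)

data = [5, 3, 6, 4, 7, 5, 9, 6, 4, 3, 2, 6]
print(data_range(data))  # 7

# One outlier is enough to distort the range
print(data_range(data + [60]))  # 58
```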



Variance

• A measure of the difference between the points and the mean
• Used in statistical tests to represent dispersion
• Units of variance are the data’s original units squared

$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

Exercise: Determine the mean, variance, and standard deviation of the following sample:
8, 13, 7, 10, 12, 11, 10, 9



Calculations

n = 8
X̄ = 10
Sum of Squares = 28
s² = 4
s = 2

Xi             8   13    7   10   12   11   10    9    Total: 80
Xi - X̄        -2    3   -3    0    2    1    0   -1
(Xi - X̄)²      4    9    9    0    4    1    0    1    Total: 28

Instructions
• Plot each point on the graph
• Count the number of samples (n)
• Add up all the points to find the total
• Divide the total by the number of samples to find the mean
• Subtract the mean from each point to find its deviation with respect to the mean {find (Xi - X̄)}, and write it in the second row
• Square each term in the second row and write it in the third row
• Total the squared terms to find the sum of squares
• Divide the sum of squares by n-1 (degrees of freedom) to find the variance
• Take the square root of the variance to find the standard deviation
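The worksheet steps can be sketched in Python as follows. The function name `sample_summary` is hypothetical; it returns n, the mean, the sum of squares, the variance, and the standard deviation, in that order:

```python
import math

def sample_summary(data):
    """Follow the worksheet: n, mean, sum of squares, variance, standard deviation."""
    n = len(data)                                # count the samples
    if n < 2:
        raise ValueError("need at least two values for n - 1")
    x_bar = sum(data) / n                        # total divided by n
    deviations = [x - x_bar for x in data]       # second row: Xi - X-bar
    sum_sq = sum(d ** 2 for d in deviations)     # third row total: sum of squares
    variance = sum_sq / (n - 1)                  # divide by n - 1 (degrees of freedom)
    std_dev = math.sqrt(variance)                # square root of the variance
    return n, x_bar, sum_sq, variance, std_dev

print(sample_summary([8, 13, 7, 10, 12, 11, 10, 9]))  # (8, 10.0, 28.0, 4.0, 2.0)
```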



Standard Deviation

• The square root of the variance

$$s = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$$

• Most common measure of spread for collections of data larger than 10 items
• Works fine for small samples
• Less easily “pulled” by one outlier than the range, although extreme values still inflate it
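The symbols table distinguishes the sample standard deviation s (sum of squares divided by n - 1) from the population standard deviation σ (divided by N). Python's `statistics` module provides both as `stdev` and `pstdev`. A short comparison using the exercise data, treating it first as a sample and then, purely for illustration, as a complete population:

```python
from statistics import pstdev, stdev

data = [8, 13, 7, 10, 12, 11, 10, 9]

s = stdev(data)       # sample standard deviation: sum of squares / (n - 1), then square root
sigma = pstdev(data)  # population standard deviation: sum of squares / N, then square root
print(s)      # 2.0
print(sigma)  # about 1.87
```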



Exercise
For each data set, circle which measure of “middle” and “spread” to use. (More than one is possible.)

Data Set                                          Middle                Spread
2, 5, 3                                           Mean  Median  Mode    Range  Variance  Standard Deviation
3, 4, 6, 1, 4, 5, 7, 2, 4, 1, 1000, 5, 7, 3       Mean  Median  Mode    Range  Variance  Standard Deviation
3, 4, 6, 1, 4, 5, 7, 2, 4, 1, 5, 7, 3, 3, 7, 4    Mean  Median  Mode    Range  Variance  Standard Deviation
2, 4, 6, 651                                      Mean  Median  Mode    Range  Variance  Standard Deviation



Measures of Data

Mean, median, range, standard deviation, and variance apply regardless of how the data are distributed.



Distribution Shapes

• Describes how the data is dispersed
• Types of distributions
  • Normal (also called Gaussian or “bell curve”)
  • F distribution
  • T distribution
  • Chi-square distribution
  • Uniform distribution
  • Weibull distribution



Dispersion (Spread)

[Figure: marbles sorted by size along a scale from 0.5 to 1.5 cm]

A bag of marbles is sorted according to size.
Dotplot
[Figure: dotplot of number of cases vs. size of marble, 0.5 to 1.5 cm]

The result is a natural graph of the number of cases (frequency) vs. the marble diameter. This is called a dotplot.
Histogram
[Figure: histogram of number of cases vs. size of marble, 0.5 to 1.5 cm]

If a bar chart is made with bars at the height of the marble stacks, the result is called a histogram.



Distribution Curve
[Figure: continuous distribution curve of number of cases vs. size of marble, 0.5 to 1.5 cm]

If an infinite number of marbles were measured, and the increment size was infinitesimal, the result would be a continuous distribution curve.
Estimating Frequency
[Figure: distribution curve of number of cases vs. size of marble, with the area between two sizes shaded]

The number of cases that happen between two points is approximately the area under the distribution curve between those two points.
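The area idea can be checked numerically. A sketch assuming, purely for illustration, that marble diameters follow a normal distribution with mean 1.0 cm and standard deviation 0.15 cm; the helper name `fraction_between` is hypothetical. It approximates the area with the trapezoid rule and compares it with the exact value from the cumulative distribution function of Python's `statistics.NormalDist` (Python 3.8+):

```python
from statistics import NormalDist

# Hypothetical marble sizes: normal with mean 1.0 cm and standard deviation 0.15 cm
sizes = NormalDist(mu=1.0, sigma=0.15)

def fraction_between(dist, low, high, steps=10_000):
    """Approximate the area under dist's curve from low to high with the trapezoid rule."""
    width = (high - low) / steps
    area = 0.5 * (dist.pdf(low) + dist.pdf(high))
    for k in range(1, steps):
        area += dist.pdf(low + k * width)
    return area * width

approx = fraction_between(sizes, 0.9, 1.2)
exact = sizes.cdf(1.2) - sizes.cdf(0.9)  # the same area from the cumulative distribution
print(round(approx, 4), round(exact, 4))
```

Both numbers are the expected fraction of marbles between 0.9 and 1.2 cm; multiplying by the number of marbles gives the expected count.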
Data Distributions

Mathematical models
• If the data fits the model, the model can be used to represent the population
• Used in established mathematical formulas
• Entire population can be expressed with a few parameters
• Future points from the same population can be predicted

Types of Distributions
• Normal (also called Gaussian or “bell curve”)
• F distribution
• T distribution
• Chi-square distribution
• Uniform distribution
• Weibull distribution



The Normal Distribution

• Occurs often in nature
• Symmetric
• The left half is the mirror image of the right half
• The mean, median, and mode all occur exactly in the middle of the curve
• Once the mean and standard deviation of the normal curve have been specified, the curve is completely known

[Figure: normal curve with the mean, median, and mode marked at its center]



Properties of the Normal Distribution

• About 68% of all cases occur within +/-1 standard deviation of the mean
• About 95% of all cases occur within +/-2 standard deviations of the mean
• About 99.7% of all cases occur within +/-3 standard deviations of the mean

[Figure: normal curve marked at -3, -2, -1, +1, +2, and +3 standard deviations, with 68%, 95%, and 99.7% of cases between the paired marks]
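The 68/95/99.7 figures are rounded areas under the standard normal curve. A short sketch using `statistics.NormalDist` (Python 3.8+) to compute them:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal curve: mean 0, standard deviation 1

# Area between -k and +k standard deviations = share of cases within +/-k SD
coverage = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}
for k, share in coverage.items():
    print(f"within +/-{k} SD: {share:.1%}")
```

The computed shares are about 68.3%, 95.4%, and 99.7%.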



Marble Exercise

If the marble sizes are normally distributed:
• What percent of marbles will be rejected if the size specifications are:
  • 2 standard deviations above and below the mean?
  • The upper specification is set at the mean, and the lower specification is set 12 standard deviations below the mean?
  • 6 standard deviations below and 1 standard deviation above the mean?
• If nothing disturbs the process, what are the chances of getting a marble more than 6 standard deviations below the mean?



Describing Data with
Graphs

Pareto Chart
[Figure: Pareto chart of defect; bars show the count for each defect (left axis, 0 to 250) and a line shows the cumulative percent (right axis, 0 to 100)]

Defect     scratch   chip   dent   bubble   Other
Count      125       65     32     15       8
Percent    51.0      26.5   13.1   6.1      3.3
Cum %      51.0      77.6   90.6   96.7     100.0

“The important few vs. the unimportant many.”
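The Percent and Cum % rows follow from the counts alone. A minimal Python sketch (the function name `pareto_table` is hypothetical) that sorts the categories by count and rounds to one decimal place as the chart does:

```python
counts = {"scratch": 125, "chip": 65, "dent": 32, "bubble": 15, "Other": 8}

def pareto_table(counts):
    """Sort categories by count (largest first) and add percent and cumulative percent."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((category, count,
                     round(100 * count / total, 1),     # percent of all defects
                     round(100 * running / total, 1)))  # cumulative percent
    return rows

for row in pareto_table(counts):
    print(row)
```

Charting tools commonly keep an “Other” bar last regardless of its size; here it is already the smallest, so a plain sort gives the same order.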


Pareto Chart

• A Pareto chart is a bar chart of frequencies with the categories sorted so that the category with the most occurrences is on the far left
• The red line is the cumulative frequency, expressed as a percent

[Figure: Pareto chart of defect, with counts 125 (scratch), 65 (chip), 32 (dent), 15 (bubble), and 8 (Other)]

A Pareto chart is used to direct resources to where they will have the most impact.



Time Series Plot
[Figure: time series plot of travel time; travel time (about 10 to 70) vs. index (1 to 100)]



Time Series Plot

• Plots the individual points in consecutive order
• Places the data in the context of time
• Visually allows changes over time to be seen
• Interpretation can still be subjective

[Figure: time series plot of travel time vs. index]

A time series plot is used to put data in context with time.



Process Behavior Chart

[Figure: I chart of travel time; individual value vs. observation (1 to 100), with mean X̄ = 34.60, UCL = 64.35, and LCL = 4.85. The limits indicate unusual outcomes, and several points beyond them are flagged.]



Process Behavior Chart

• Data is displayed in the same manner as in a Time Series Plot
• Green line indicates the mean of the data
• Red lines indicate process limits
  • They are calculated from the data itself
  • Often called control limits
• Points outside the lines indicate unusual outcomes or potential signals
• Interpretation is more objective

[Figure: I chart of travel time with mean X̄ = 34.60, UCL = 64.35, and LCL = 4.85]

A process behavior chart is used to look for exceptional variation in time series data.
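The slides say the limits are calculated from the data itself but do not say how. A common method for an individuals (I) chart, assumed here rather than taken from the slides, uses the average moving range: limits = X̄ ± 2.66 × (average moving range), where 2.66 ≈ 3 / 1.128. The slide's travel-time data are not shown, so the values and function names below are hypothetical:

```python
def i_chart_limits(values):
    """(LCL, center, UCL) for an individuals chart, using the average moving range."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    center = sum(values) / len(values)
    # Moving range: absolute difference between each point and the one before it
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = sum(moving_ranges) / len(moving_ranges)
    return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

def unusual_points(values):
    """Indices of points outside the limits (potential signals)."""
    lcl, _, ucl = i_chart_limits(values)
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical travel times in minutes
times = [30, 34, 29, 37, 33, 31, 70, 35, 32, 36]
print(i_chart_limits(times))
print(unusual_points(times))
```

With these made-up values the limits come out at about 6.0 and 67.4 minutes, so only the 70-minute trip (index 6) is flagged.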



Histogram
[Figure: histogram of travel time with a fitted normal curve; frequency vs. travel time (0 to 70); Mean 34.60, StDev 10.48, N 100]



Histogram

• The histogram plots the variable (travel time) vs. the number of occurrences (frequency)
• Most of the travel times occur between 20 and 50 minutes
• The data seems to be centered at about 35 minutes
• The normal curve (the blue line) represents normally distributed data with the mean and standard deviation estimated from the sample

[Figure: histogram of travel time with fitted normal curve; Mean 34.60, StDev 10.48, N 100]

A histogram is used to visually understand how the data is distributed.
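A histogram starts by counting how many values fall into each bin. A minimal Python sketch that prints text bars, assuming whole-number values and bin width; the function name, the travel times, and the 10-minute bin width are hypothetical, since the slide's raw data are not shown:

```python
from collections import Counter

def histogram_counts(values, bin_width, start=0):
    """Count values per bin; bin k covers [start + k*bin_width, start + (k+1)*bin_width).
    Assumes integer values and bin width; empty bins between the extremes are kept."""
    counts = Counter((v - start) // bin_width for v in values)
    first, last = min(counts), max(counts)
    return {start + k * bin_width: counts[k] for k in range(first, last + 1)}

# Hypothetical travel times in whole minutes
times = [12, 18, 25, 31, 33, 38, 47]
width = 10
for lower, n in histogram_counts(times, width).items():
    # One asterisk per occurrence, like a sideways bar
    print(f"{lower:>2}-{lower + width:<2} {'*' * n}")
```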



Summary

Measures of central tendency
• Mean, Median, and Mode
• Mean is usually preferred
Measures of dispersion
• Range, Variance, and Standard Deviation
• Standard Deviation is most often used
There are many types of data distributions
If your data are normally distributed, about:
• 68% fall within one standard deviation of the mean
• 95% within two
• 99.7% within three



Exercise

Class Heights
• Collect the heights of all the people in the class
• Determine the Mean, Median, Range, and
Standard Deviation of those heights
• Assume that heights are normally distributed,
and that the class members are typical of all
Autoliv employees. About what percent of
Autoliv employees are over 6 feet (1.8 meters)
tall?
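Once the heights are collected, the calculation can be sketched in Python. This assumes, as the exercise does, that heights are normally distributed, and uses 1.8 m as the cutoff. The function name and the sample heights are hypothetical placeholders for the class's own measurements:

```python
from statistics import NormalDist, mean, median, stdev

def height_summary(heights_m, threshold_m=1.8):
    """Summary statistics plus the estimated percent taller than threshold_m,
    assuming heights are normally distributed."""
    m = mean(heights_m)
    s = stdev(heights_m)  # sample standard deviation (n - 1)
    percent_over = 100 * (1 - NormalDist(m, s).cdf(threshold_m))
    return {
        "mean": m,
        "median": median(heights_m),
        "range": max(heights_m) - min(heights_m),
        "std_dev": s,
        "percent_over": percent_over,
    }

# Hypothetical class heights in meters; replace with the measured values
heights = [1.62, 1.70, 1.75, 1.68, 1.83, 1.78, 1.65, 1.80, 1.72, 1.90]
print(height_summary(heights))
```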



Answers to Marble Exercise

• Question
• What percent of marbles will be rejected if the size specifications are set at 2
standard deviations above and below the mean?
• Solution
• Two standard deviations on both sides will include about 95% of the parts.
Therefore, the amount rejected will be about: 100% - 95% = 5%

• Question
• What percent of marbles will be rejected if the upper specification is set at
the mean and the lower specification is set at 12 standard deviations below
the mean?
• Solution
• Half of the parts will fall below the mean and almost no parts will fall below
12 standard deviations. Therefore, the amount rejected will be: 50%-0% =
50%



Answers to Marble Exercise cont.

• Question
• What percent of marbles will be rejected if the size specifications are set
at 6 standard deviations below the mean and 1 standard deviation above
the mean?
• Solution
• The region from 6 standard deviations below the mean up to the mean includes about 50% of the parts. The region from the mean up to 1 standard deviation above it includes about 68%/2 = 34%. Therefore, the amount rejected will be about: 100% - (50% + 34%) = 16%

• Question
• If nothing disturbs the process, what are the chances of getting a marble
more than 6 standard deviations below the mean?
• Solution
• Almost zero
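The answers can be checked against exact normal-curve areas. A short sketch using `statistics.NormalDist` (Python 3.8+), with specification limits given in standard deviations from the mean; the function name `percent_rejected` is hypothetical:

```python
from statistics import NormalDist

z = NormalDist()  # work in standard-deviation units: mean 0, SD 1

def percent_rejected(lower_sd, upper_sd):
    """Percent of parts outside [lower_sd, upper_sd], limits in SDs from the mean."""
    inside = z.cdf(upper_sd) - z.cdf(lower_sd)
    return 100 * (1 - inside)

print(percent_rejected(-2, 2))   # about 4.6 (the slide rounds to 5%)
print(percent_rejected(-12, 0))  # 50.0
print(percent_rejected(-6, 1))   # about 15.9 (the slide rounds to 16%)
print(z.cdf(-6))                 # chance of a marble below -6 SD: about one in a billion
```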



References

“Descriptive Statistics” - Promontory Management Group
• (c) 2001 Promontory Management Group, Inc.

