
The Nature of Data


 Data is derived from objects, situations, or phenomena.
 Data is used to classify, describe, improve, or control objects, situations, or phenomena.
 Data can be captured using some kind of continuous scale for differentiation; in other words, the scale can be subdivided into meaningful increments of precision.
 Data can also be captured by simply counting the frequency of occurrence. Such data cannot be subdivided and is therefore said to be discrete.



Types of Data
Processes generate two different types of data.

Continuous Data (variable data; measurement data)
This data uses some sort of measurement scale and can be subdivided into increments.

Discrete Data (countable data; attribute data; pass/fail)
This data records the count of occurrences of something happening or not happening. Discrete data cannot be meaningfully subdivided.



Populations and Samples
Data for making decisions can be captured from two
different sources, the Population and a Sample of the
population.
Population (N)
A set, or collection, of all possible objects or individuals of interest. This includes measurements of a specific parameter or characteristic of that set or collection. A population might be all the bags of potato chips coming out of a chip factory. Population data is generally not available, or it is too expensive to collect and use.

Sample (n)
A subset of a population. In statistics, we'll deal with a "random sample": a sample chosen so that every such sample has an equal chance of being selected. If we select a random sample of 10 bags of potato chips from the chip factory, we need to select the sample in such a way that it could come from any of the produced bags of potato chips.

We'll use sample data to make decisions about the population.
Continuous Data
 Measurements that are infinitely divisible on a scale or continuum.
 e.g., size, height, weight, temperature, decibels, money.



All Processes Have Variation
 Systematic variation
Measurement differences that are expected and predictable. We expect temperatures to be warmer in Louisville, Kentucky in the summer than in the winter.
 Random variation
Measurement differences that are not predictable. A race horse will gallop five furlongs with a different time on different days.

We expect data to vary; if it didn't, we'd question how accurate it is. But because it varies, using data for decision making is a little more challenging. We normally won't use just one data point for a decision, but rather collect multiple pieces of data, and we'll manage that collection to minimize variation.
Thus variation is natural and expected, and it is the foundation of statistics.
Precise or Accurate – Which Way?
Data can be precise (small variation) but not accurate, like these arrows clustered off-center on the target.
Or it can be accurate but lack precision (large variation).
The Primary Sources of Variation
 Inadequate Design Margin
 Insufficient Process Capability
 Unstable Parts and Material

(Figure: Minitab Process Capability Analysis for Camshaft Support — LSL 598.000, Target 600.000, USL 602.000; Mean 599.548, Sample N 100; StDev (Within) 0.576429, StDev (Overall) 0.620865; Cp 1.16, Cpk 0.90, Cpm 0.87; Pp 1.07, Ppk 0.83.)
Types of Variation
Common Cause
 Unknown or chance causes of variation inherent in any process. This variation is not controllable with the technology used in the process.
 It is also known as residual or background noise.
 It limits the achievable variation in the process, so the common cause variation in a process represents the best a process can be, from a variation perspective.
 Control or improvement of common cause variation requires action on the system or process.

Special Cause
 Causes that are distinct and assignable to a specific element or input to a process.
 These causes are generally controllable with the existing technology.
 These causes will affect the variation in the process output over time.
 Special causes are often categorized by the 5 M's (plus Environment):
Manpower, Machinery, Method, Measurement, Materials, Environment



What is Statistics?

 Statistics is the science of collecting data, classifying it, and graphing, interpreting, and analyzing that data to derive information and make decisions about the data as well as the system from which the data came.



Types of Statistics

 Descriptive Statistics
Descriptive statistics include statistical data that describes the present condition. Just as we describe a person we met, or a movie we have seen, we can use data to describe a process output or a defect occurrence.

 Inferential Statistics
Inferential statistics deal with drawing conclusions about a population using information drawn from a sample of that population. Inferential statistics is the science that enables the news media to make predictions about election campaigns. Of course, inferential statistics have limitations, as experienced in the 2000 presidential elections.

Probability is the link between Descriptive Statistics and Inferential Statistics.
The Quincunx
 Latin: Quinque Uncia
 Five ounces
 The quincunx is a training tool
that simulates a stable
process. While your process
may have a formed or
punched part or a cycle time
as an output, the quincunx's
output is a marble falling into
numbered slots that range
from 44 to 56. Also, like most
processes, the quincunx has
adjustments for centering or
targeting the output.



Normal Distribution
 The normal distribution is defined by two parameters: its mean and
standard deviation. It is a continuous, bell-shaped distribution
which is symmetric about its mean and can take on values from
negative infinity to positive infinity.
 The equation for calculation of the density is given as:


$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

 Note that calculating a value of the density function requires knowledge of the value of x, the value of the mean (μ), and the value of the standard deviation (σ).
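A minimal Python sketch of this density calculation; the function name and the numeric inputs are illustrative only, not part of the course material:

```python
import math

def normal_density(x, mu, sigma):
    """Normal probability density f(x) for mean mu and standard deviation sigma."""
    return (1.0 / (sigma * math.sqrt(2 * math.pi))) * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Illustrative values only
print(normal_density(600.0, mu=600.0, sigma=0.6))   # density at the mean (the peak of the curve)
print(normal_density(601.2, mu=600.0, sigma=0.6))   # density two standard deviations above the mean
```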



Describing Data Distributions
 Shape
This reflects the pattern of variation. Is it symmetrical around the mean, peaked, etc.?
 Location or Central Tendency
This measurement indicates the center or midpoint of the distribution of data. The most common of these is the mean, or average, but the median, or middlemost point, is also used. The mode is still another indicator, but one not generally used in statistics.
 Dispersion (Spread)
This measure provides information about the variability of the data.
The most common measures of variability are data range, data
variance and data standard deviation.



Distribution Shape
 Normal - the normal distribution is one of the
most important distributions used in probability.
It is useful for describing a variety of random
processes such as student test scores or the
size of metal parts.

 Poisson - The Poisson distribution is useful to


describe situations concerned with counting the
number of times that a certain type of event
occurs within a specified opportunity frame
such as a period of time or a physical region or
part.
 Binomial – The Binomial Distribution describes
the data that arises from counts or proportions
which are realizations of a discrete random
variable such as how many times an event
occurs in n repetitions of an experiment.
Measures of Central Tendency
 Average or Mean
The mean of a set of n values is simply the sum of all the
values divided by n
$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where X represents the name of the variable being observed, $x_i$ represents the i-th value of x in the set of data, $\Sigma$ represents "sum of", and $\bar{X}$ represents the mean of the $x_i$'s. Note that $\mu$ is used in lieu of $\bar{X}$ when the mean is of the entire population.
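A minimal Python sketch of the mean calculation, using made-up data values:

```python
data = [5, 7, 8, 9, 12, 15, 16]   # illustrative sample values

# X-bar = (sum of the x_i) / n
x_bar = sum(data) / len(data)
print(x_bar)   # 10.2857...
```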



Measures of Central Tendency
 Median
The median is the value of the middlemost term in a distribution. If the data set has an odd number of data points, the median is the middle term. If the number of data points is even, the median is the mean of the two middle values.

Example:
For the data set 5, 7, 8, 9, 12, 15, 16, the median is 9.
For the data set 5, 7, 8, 9, 12, 16, the median is 8.5 (the average of 8 and 9).
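A minimal Python sketch of this median rule, reusing the data sets from the example above:

```python
def median(values):
    """Middlemost value; the average of the two middle values when the count is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([5, 7, 8, 9, 12, 15, 16]))  # 9
print(median([5, 7, 8, 9, 12, 16]))      # 8.5
```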



Measures of Central Tendency
Mode
The mode is the value that appears most frequently in a set of data.
It’s not used much in statistics.

Example: Shoe sizes sold today


7.5, 8, 7.5, 10, 10.5, 11, 11.5, 10.5, 9, 10.5, 8

Count the number of times each size appears. 10.5 is the mode.
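A minimal Python sketch of counting occurrences to find the mode, using the shoe-size data above:

```python
from collections import Counter

sizes = [7.5, 8, 7.5, 10, 10.5, 11, 11.5, 10.5, 9, 10.5, 8]  # shoe sizes sold today

counts = Counter(sizes)                    # count how many times each size appears
mode, frequency = counts.most_common(1)[0]
print(mode, frequency)                     # 10.5 appears 3 times, so 10.5 is the mode
```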



Measures of Dispersion
 Range
The simplest of dispersion measurements. The range is simply
the measured difference between the largest measurement
and the smallest.
Deviation
The deviation is the difference between a measured value and the mean of the data set from which it is drawn:

$$x_i - \bar{x} \quad\text{or}\quad x_i - \mu$$

Variance
The variance is the sum of the squared deviations divided by the number N in the population, or by the degrees of freedom (n - 1) for a sample set:

$$\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N} \qquad\qquad s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$$
Measures of Dispersion
 Standard Deviation
This is the most common means of measuring dispersion in statistics. The standard deviation is simply the square root of the variance. It is denoted by σ for a population and by s or σ̂ (sigma hat) for a sample.

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}} \qquad\qquad s = \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$$

(Figure: histogram of a normal data set ("normal2") with a fitted normal curve, annotated with the mean (μ) and standard deviation (σ).)
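For reference, Python's statistics module computes both forms of the standard deviation directly; a small sketch with illustrative values:

```python
import statistics

data = [5, 7, 8, 9, 12, 15, 16]   # illustrative sample values

sigma = statistics.pstdev(data)   # population standard deviation (divides by N)
s = statistics.stdev(data)        # sample standard deviation (divides by n - 1)
print(sigma, s)                   # s is slightly larger than sigma for the same data
```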



Discrete Data
 Counted data
Number of orders processed; number of defects
 Product attributes or levels
First-class tickets; technically degreed people on staff. This is commonly counted data.
 Artificial scales
Likert scales: Good/Better/Best; Agree/Neutral/Disagree



Binomial Distribution
 The Binomial Distribution is commonly associated with binary data (two possible outcomes: good/bad, pass/fail).
 The sampled trials are identical; i.e., the same trial is performed under identical conditions.
 The trials are independent – the outcome of one trial does not affect or influence the outcome of the next trial.
 The probability (p) of "success" on each trial is the same.
 Where n is the number of items sampled and x is the number of items having a defect:

$$P(x) = \binom{n}{x}\, p^{x}\, (1-p)^{n-x}$$

 Notice that the distribution is very slightly skewed.
 This distribution is the basis for the common forms of attribute control charts.

(Plot: binomial distribution of defect counts, x = 0 to 10.)
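A minimal Python sketch of this binomial probability calculation; the sample size n = 10 and defect probability p = 0.1 are made-up illustration values:

```python
import math

def binomial_pmf(x, n, p):
    """Probability of exactly x defective items in a sample of n, with defect probability p per item."""
    return math.comb(n, x) * (p ** x) * ((1 - p) ** (n - x))

# Illustrative numbers: a sample of 10 items with a 10% chance of a defect on each item
for x in range(4):
    print(x, round(binomial_pmf(x, n=10, p=0.1), 4))
```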
Poisson Distribution
The Poisson distribution is most commonly associated with modeling
the number of random occurrences of some phenomenon in a specified
space or time.
To calculate the probability of x random occurrences of an event when the average over time is μ, the Poisson distribution is described by the equation

$$P(x) = \frac{e^{-\mu}\,\mu^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \quad\text{and}\quad e = 2.71828
$$

(Plot: Poisson distribution, x = 0 to 15.)
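A minimal Python sketch of the Poisson probability calculation; the average rate μ = 3 is an illustration value only:

```python
import math

def poisson_pmf(x, mu):
    """Probability of exactly x random occurrences when the average number of occurrences is mu."""
    return (math.exp(-mu) * mu ** x) / math.factorial(x)

# Illustrative number: an average of 3 occurrences per unit of time or space
for x in range(6):
    print(x, round(poisson_pmf(x, mu=3.0), 4))
```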
Defects per Unit
 Since we are counting to accumulate data, a common thread is that the data will be a count per unit of measure.
 e.g., the number of defects on a PC board.
 It is common to report this data as Defects per Unit (DPU).
 But a computer motherboard will not have as many opportunities for a defect as an Alcatel card or a Sun CPU.
 Is it accurate to report defects as DPU in both instances and compare the results?

DPU forms the foundation for Six Sigma using discrete data.
Defects per Opportunity
 To accurately compare discrete defect data from
different processes or products, it is appropriate to
include the number of opportunities for a defect to
occur in each unit as well as the number of units.
$$\text{Defects per opportunity (DPO)} = \frac{\text{Defect count}}{\text{Units} \times \text{Opportunities per unit}}$$
 Given this relationship, what should be our goal for
the opportunities?
 Reduce the number of opportunities
 Increase the ability of each opportunity to perform without
defects

DPO is the probability of a defect on any one CTQ or step of the process.
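A minimal Python sketch of the DPU and DPO calculations; all counts here are invented for illustration:

```python
# Illustrative counts only: 48 defects found across 200 units,
# each unit having 120 opportunities for a defect
defects = 48
units = 200
opportunities_per_unit = 120

dpu = defects / units                              # defects per unit
dpo = defects / (units * opportunities_per_unit)   # defects per opportunity
print(dpu, dpo)                                    # 0.24 defects per unit, 0.002 defects per opportunity
```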



The Bead Box
 This bead box represents a
population of product or service
outputs. The white beads
represent good outcomes and
the colored beads represent
outcomes with some defect.

 We can use the power of statistical sampling to learn the relationship of defective outcomes to good outcomes without looking at every output in the population.
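A small Python simulation in the spirit of the bead box; the bead counts and sample size are invented for illustration:

```python
import random

# Hypothetical bead box: 1,000 beads, 50 colored (defective) and 950 white (good)
population = ["defect"] * 50 + ["good"] * 950
random.shuffle(population)

sample = random.sample(population, 50)                   # draw a random sample of 50 beads
defect_fraction = sample.count("defect") / len(sample)
print(defect_fraction)   # estimates the true 5% defect rate without inspecting every bead
```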



Yield: A Common Discrete Measure
How do you measure Yield?

$$\text{Yield} = \frac{\text{Number of units that pass}}{\text{Number of units tested}}$$

Does this yield take into account all the defects generated by your process?

What's missing?

Would a yield measurement that looked at yield at each operational element be more accurate?

$$\text{Yield} = \text{Yield}_{Op1} \times \text{Yield}_{Op2} \times \text{Yield}_{Op3} \times \cdots \times \text{Yield}_{Opn}$$
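A minimal Python sketch contrasting a single end-of-line yield with yield measured at each operational element; all numbers are made up for illustration:

```python
# End-of-line view: only counts units that pass the final test
units_tested = 500
units_passed_final_test = 470
final_yield = units_passed_final_test / units_tested   # 0.94, but hides upstream rework and scrap

# Per-operation view: multiply the yield of every operational element
op_yields = [0.98, 0.97, 0.99, 0.96]                   # illustrative per-operation yields
overall_yield = 1.0
for y in op_yields:
    overall_yield *= y
print(final_yield, overall_yield)                      # 0.94 vs. roughly 0.903
```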



Rolled Throughput Yield
 What is the probability of accomplishing a process error free?
Prob(good) Op1 × Prob(good) Op2 × Prob(good) Op3 × ... × Prob(good) Opn
 Does this look familiar?
 This is known as Rolled Throughput Yield
 Rolled Throughput yield takes into account the yield at each step
of completion, the yield for every opportunity for a defect.
 Complexity plays a major part here.
• 4σ capability across 50 steps produces a rolled-throughput yield of 0.99379^50 = 0.7324, or 73.24%.
• 4σ capability across 100 steps produces a rolled-throughput yield of 0.99379^100 = 0.5364, or 53.64%.
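A one-line Python check of the rolled throughput yield arithmetic quoted above:

```python
# 4-sigma capability corresponds to a per-step yield of 0.99379 (from the slide)
step_yield = 0.99379

rty_50_steps = step_yield ** 50     # yield across 50 steps in a row
rty_100_steps = step_yield ** 100   # yield across 100 steps in a row
print(round(rty_50_steps, 4), round(rty_100_steps, 4))   # 0.7324 and 0.5364
```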

