
07/03/2023

Basic Statistics

Course Objectives D M A I C

At the conclusion of the training module, the participants are expected to:

1) Understand the meaning of statistics and appreciate its uses in studies
2) Learn basic methods to plan, collect, organize and present data; and
3) Learn how to use basic statistical tools to draw conclusions from sample data.

Confidential Proprietary


Course Outline

1. Introduction
2. Rationale of Statistics
3. Data Collection
4. The Normal Distribution

1. Descriptive Statistics
2. Measures of Location (Central Tendency)
3. Measures of Spread (Variability)
4. Measures of Shape

Part 1.0
Introduction

Page 3 • Basic Statistics • Jan-15 Confidential Proprietary Page 4 • Basic Statistics • Jan-15 Confidential Proprietary


1.1
Rationale of Statistics

What is Statistics

A branch of applied mathematics concerned with describing and interpreting a collection of data and with drawing conclusions about populations from a knowledge of the characteristics of a sample.

The science of data handling

What is Statistics: Population (Parameter)

✓ The entire set of observations that are of interest in a statistical investigation; denoted by the letter N (the population size)
✓ Parameters (or population parameters) are terms used to describe the key characteristics of a population; usually denoted by a small Greek letter, e.g., μ, σ

What is Statistics: Sample (Statistic)

✓ A subset of data taken from a population; denoted by the small letter n (the sample size)
✓ Statistics (or sample statistics) are terms used to describe the key characteristics of a sample; usually denoted by a Latin letter, e.g., x̄, s
✓ Sample Statistics are usually measured in order to learn something about Population Parameters


Population Parameters vs. Sample Statistics

[Figure: random samples of size n = 3 drawn from a population of size N. The population has parameters (μ, σ); each sample yields its own sample statistics: (x̄1, s1), (x̄2, s2), (x̄3, s3), (x̄4, s4).]

Consider a few different samples of 3 drawn from a population of interest to us. Would we expect to get the same average and range for each sample? Why or why not?

Advantages and Disadvantage of Sampling

ADVANTAGES
✓ Faster and cheaper than 100% data collection
✓ Avoids handling damage during inspection
✓ Requires less manpower

DISADVANTAGE
✓ Data may not be as precise or exact as in 100% data collection
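The question above can be explored with a small simulation. This is only a sketch: the population below is hypothetical (1000 values generated from an assumed normal process with mean 100 and standard deviation 15), and the helper name `draw_sample_stats` is invented for illustration.

```python
import random
import statistics

def draw_sample_stats(population, n, rng):
    """Return (mean, range) of one simple random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.mean(sample), max(sample) - min(sample)

rng = random.Random(2023)  # fixed seed so the run is repeatable

# Hypothetical population of N = 1000 measurements (illustrative values only)
population = [rng.gauss(100, 15) for _ in range(1000)]

# Draw four samples of size n = 3, as in the figure
results = [draw_sample_stats(population, 3, rng) for _ in range(4)]
for i, (mean, spread) in enumerate(results, start=1):
    print(f"sample {i}: mean = {mean:.1f}, range = {spread:.1f}")
print(f"population mean = {statistics.mean(population):.1f}")
```

Each sample gives a different average and range, and none of them has to equal the population value. This is the imprecision listed as the disadvantage of sampling.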


Why Generate Statistics

✓ To help manage processes by answering:

Categories of Statistics

Descriptive Statistics
❑ Those methods for summarizing data
❑ Take the form of either visual displays of the data or numerical summaries
❑ Methods:
Visual: Histograms, Pareto Charts, Box Plots, Scatter Plots
Numerical: Means, Medians, Ranges, SDs, Variances

Inferential Statistics
❑ Those methods whose results can be extrapolated beyond the data to a more general setting
❑ Used, for example, when one is estimating an entire day’s process variation by examining a small sample from the daily output of a process
❑ Methods: Hypothesis Testing, Analysis of Variance, Experimental Design


1.2
Data Collection

What is a Measure

▪ A quantified evaluation of a characteristic and/or level of performance based on observable data
▪ Examples include:
– Length of time (cycle rate, age)
– Size (length, height, weight)
– Monetary Value (costs, sales revenue, profits)
– Counts of characteristics or “attributes” (types of customer, types of defects, gender)
– Counts of defects (number of errors, late checkouts, complaints)

Why Measure

✓ To gain knowledge about the problem, process, customer or organization
✓ To establish the current performance level (baseline)
✓ To determine priorities for action; and whether or not to take action (substantiate the magnitude of the problem)
✓ To gain insight into potential causes of problems and the basis for changes in the process
✓ To prevent problems and predict future performance
✓ To hold performance gains and set the stage for future improvements

Measurement Starts with a Point

▪ A point in time
▪ A point (or step) in a process
▪ A point (or element) in a complex product
▪ A point (or component) in a system

We are forced to gather data point-by-point, but our goal is information about the process, product, or system both in the present and over time. Don’t lose sight of the goal!


Statistical Measurements

▪ A key part of the Six Sigma culture change is learning to measure both Variability and Central Tendency.
– The concept of Central Tendency is familiar to most people. It is often computed as the “average” of a data set.
– Explicitly measuring Variability is unfamiliar to most people, but it is an important part of understanding the nature of a process.
▪ Measuring both Central Tendency and Variability is necessary when fully describing a set of data.
▪ Central Tendency and Variability are the “right hand” and “left hand” of Six Sigma measurements.

Types of Data

Qualitative versus Quantitative


Qualitative Data

✓ Deals with descriptions
✓ Data can be observed but not measured
✓ Qualitative → Quality
✓ Examples: Colors, textures, smells, tastes, appearance, beauty, etc.

Quantitative Data

✓ Deals with numbers
✓ Data which can be measured
✓ Quantitative → Quantity
✓ Examples: Length, height, area, volume, weight, speed, time, temperature, humidity, sound levels, cost, members, ages, etc.


Classification of Quantitative Data

Quantitative Data: Continuous or Discrete

Discrete: Only certain values are possible (there are gaps between possible values), e.g., 1, 2, 3, 4, 5, 6, 7, 8
Continuous: Theoretically, any value within an interval is possible with a fine enough measuring device, e.g., any value between 0 and 1000

Continuous Data

▪ Also called measurement data; the measurement scale can be meaningfully divided into finer and finer increments of precision
▪ Very rich in information, i.e., a small number of samples can provide large amounts of information
▪ Characterizes a product or process feature in terms of a parameter such as size, weight or time


Discrete Data

▪ Also called Count, Categorical or Attribute data
▪ It is measured as the frequency of occurrence.
▪ Very poor in information; limited to the identification of categories, say “good” or “bad” parts; but cannot be further subdivided into more precise increments meaningfully

Continuous vs Discrete

What is the balance telling us?


Continuous vs Discrete

✓ Continuous data is generally preferable to discrete data since you can derive more information with less data.
✓ If continuous data is not available, discrete data can be analyzed, results found, and decisions made but will require more samples.

Who Cares?
✓ The type of data affects the choice of data display and analysis that can be made.

Scales of Measurement

(Rows are listed in order of increasing information.)

Scale Type | Data Type | Description | Statistics | Example/s
Nominal | Discrete | Data are classified into two or more categories; unordered; the values of the scale have no 'numeric' meaning | Mode, Chi-Square | Male/Female, With/Without, Pass/Fail
Ordinal | Discrete | Data are grouped according to rank or order; scale assignment is by the property of "greater than," "equal to," or "less than." | Median, Percentile | 1st/2nd/3rd, Small/Medium/Large
Interval | Continuous | Data where ordering and arithmetic differences of the observations have meaning | Mean, Std. Deviation, Correlation, Regression, ANOVA | The difference between 8 and 9 is the same as the difference between 76 and 77
Ratio | Continuous | Data where equality of ratio or proportion has meaning | All stats for Interval plus: Geometric Mean, Harmonic Mean, Coefficient of Variation, Logarithms | The ratio of 2 to 1 is the same as the ratio of 8 to 4


Characteristics of Data Collection

Extra caution must be taken when collecting data. The enumerated items below are some of the most important considerations…

1. Data integrity or validity must be high (100% as much as possible)
2. Data traceability must be present
3. The right type of data needs to be collected
4. The system must be on line and on time, where appropriate

Characteristics of Data Collection

Because all sampling studies always have errors, uncertainties and risks associated with them, an experimenter needs to do two things to ensure that the result of the study would be close to the “true” population characteristic:

1. Get samples at random
- this avoids bias, ensuring that each sample has an equal chance of being selected
2. Get the right size of the sample
- this ensures that the population is sufficiently represented
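One way to "get samples at random" is a simple random sample drawn without replacement, sketched below. The lot of 500 numbered units, the sample size of 30, and the helper name `simple_random_sample` are assumptions for illustration only; choosing the right sample size is a separate question that this sketch does not address.

```python
import random

def simple_random_sample(unit_ids, n, seed=None):
    """Pick n distinct units so that every unit is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(unit_ids, n)

# Hypothetical lot of 500 serialized units (IDs 1..500)
lot = list(range(1, 501))
picked = simple_random_sample(lot, n=30, seed=42)
print(sorted(picked))
```

Letting a random number generator choose the unit IDs, rather than taking the first or most convenient units, is what removes selection bias.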


Phases of Statistical Application

1 Collection of Data
- population or sample
2 Organization of Data
- tables, charts, graphs, etc.
3 Analysis of Data
- involves concise numerical measures like central tendency, spread and shape
4 Interpretation of Data
- conclusions are based on the charts, graphs

Any sampling study starts with data collection. As soon as the data are available, these need to be organized into some form to make the analysis and interpretation of data easier.

1.3
The Normal Distribution

What is a Normal Distribution

▪ It is the most common continuous distribution in the entire field of statistics; its graph, called the normal curve, is a bell-shaped curve which approximately describes many phenomena that occur in nature, industry, and research, e.g., weight, IQ, thickness, etc.
▪ In process control, it is used as a model of a good or stable process behavior
▪ It is completely described by two parameters: mean and standard deviation

[Figure: normal curve with the horizontal axis marked at μ − 3σ, μ − 2σ, μ − 1σ, μ, μ + 1σ, μ + 2σ, μ + 3σ. Useful probabilities: 68.27% within ±1σ, 95.45% of product within ±2σ, 99.73% of product within ±3σ. Areas by band on each side of the mean: 34.13% (μ to 1σ), 13.60% (1σ to 2σ), 2.14% (2σ to 3σ), 0.13% (beyond 3σ).]

Properties of the Normal Distribution

1. Bell-shaped and extends indefinitely in both directions
2. Asymptotic: the curve comes closer and closer to the horizontal axis without ever reaching it
3. Symmetric with respect to the mean
4. Unimodal (has only one mode or “hump”)
5. Skewness = 0; Kurtosis = 3.0
6. Dependent on mean, μ and standard deviation, σ
7. Mean = Median = Mode
8. Area under the curve = 1
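The useful probabilities quoted for the normal curve can be checked numerically. The sketch below uses the standard identity P(μ − kσ < X < μ + kσ) = erf(k/√2) for a normally distributed X; the function name `prob_within` is an illustrative choice.

```python
import math

def prob_within(k):
    """Probability that a normal value falls within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

# Coverage, in percent, for 1, 2 and 3 standard deviations
coverage = {k: round(100 * prob_within(k), 2) for k in (1, 2, 3)}
print(coverage)  # {1: 68.27, 2: 95.45, 3: 99.73}
```

The result does not depend on the particular values of μ and σ, which is why the same three percentages apply to any normally distributed process.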


Workshop

The Normal Distribution
(Instruction to be given by the facilitator)

Part 2.0
Descriptive Statistics


Summary Statistics

• Data can be summarized both numerically and graphically using Summary Statistics and graphs or plots.
• Attribute data can usually be summarized by counts, proportions or time graphs of these two statistics.
• Variables data can be summarized by:
– A measure of the center or location of the data
– A measure of the spread of the data
– Various plots or graphs of the data
• Summary statistics are numbers based on samples from a population. They are point estimates (single numbers) of characteristics of the distribution of population values.

Important Attributes of Population

Location, Spread, Shape


2.1
Measures of Central Tendency (Location)

Central Tendency

▪ A property that data tends to group around a “center” point
▪ This “center” may be the:
→ Mean: mathematical average
→ Median: the data point in the center of the data set, or
→ Mode: the most frequently occurring data value

[Figure: distribution plot; horizontal axis: Measure, vertical axis: Count]


Mean

The arithmetic center or the average of all data
The most common measure of central tendency

Population Mean: $\mu = \frac{\sum x}{N}$
Sample Mean: $\bar{x} = \frac{\sum x}{n}$

Example:
Given: xi : 62, 73, 78, 78, 78, 86, 86, 89, 90, 95

Mean = (62 + 73 + 78 + … + 95) / 10 = 81.5

Median

The physical center of rank-ordered data
Obtained in two different ways, i.e., for odd and even sets of data
Can easily be determined by first arranging the data in order (ascending or descending)

Example:
Given: (Even) xi : 62, 73, 78, 78, 78, 86, 86, 89, 90, 95
(Odd) xi : 62, 73, 78, 78, 78, 86, 86, 89, 90

Median (even) = (78+86)/2 = 82
Median (odd) = 78


Mode

The number that occurred most often
Can easily be determined by arranging the data in order

Example:
Given: xi : 62, 73, 78, 78, 78, 86, 86, 89, 90, 95

Mode = 78

Discussion

What is the simplest measure of “location”? The most reliable? Explain why.
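The three measures of location can be checked against the worked examples with Python's standard `statistics` module. This is a minimal sketch; the variable name `scores` is just a label for the example data.

```python
import statistics

# Example data from the Mean, Median and Mode slides
scores = [62, 73, 78, 78, 78, 86, 86, 89, 90, 95]

mean = statistics.mean(scores)               # arithmetic average
median_even = statistics.median(scores)      # 10 values: average of the two middle values
median_odd = statistics.median(scores[:-1])  # first 9 values: the single middle value
mode = statistics.mode(scores)               # most frequently occurring value

print(mean, median_even, median_odd, mode)   # 81.5 82.0 78 78
```

For the discussion question, it may help to replace 62 with a much smaller value and rerun: the mean shifts, while the median and mode stay where they are.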


2.2
Measures of Variability (Spread)

Variability

▪ Variability recognizes that processes do not produce identical results every time
▪ Variability may be caused by identifiable forces acting on the process or by minute fluctuations in the process itself
▪ The common measures are:
→ Range
→ Variance
→ Standard Deviation
▪ Standard deviation is the most useful because it is expressed in the same unit of measure as the mean

[Figure: distribution plot; horizontal axis: Measure, vertical axis: Count]


Workshop

Beads Experiment
(Instruction to be given by the facilitator)

Range

The difference between the largest (max) and the smallest (min) measurements

$\text{Range} = x_{\max} - x_{\min}$

Example:
Given: xi : 62, 73, 78, 78, 78, 86, 86, 89, 90, 95

Range = 95 – 62 = 33


Variance

The average squared distance from the center
The sum of the squared deviations of measurements from their mean divided by N (population) or n − 1 (sample)

Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Example:
Given: xi : 62, 73, 78, 78, 78, 86, 86, 89, 90, 95

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(62 - 81.5)^2 + (73 - 81.5)^2 + \dots + (95 - 81.5)^2}{10 - 1} = 93.39$

Standard Deviation

Defines how the numbers in a data set vary from the mean; the average distance from the center
The positive square root of the variance

Population SD: $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
Sample SD: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Example:
Given: xi : 62, 73, 78, 78, 78, 86, 86, 89, 90, 95

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\frac{(62 - 81.5)^2 + (73 - 81.5)^2 + \dots + (95 - 81.5)^2}{10 - 1}} = 9.66$
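The range, variance and standard deviation examples can be reproduced with the standard `statistics` module, as sketched below. `variance` and `stdev` use the n − 1 (sample) divisor, while `pvariance` and `pstdev` use the N (population) divisor; the variable names are illustrative.

```python
import statistics

# Example data from the Range, Variance and Standard Deviation slides
scores = [62, 73, 78, 78, 78, 86, 86, 89, 90, 95]

data_range = max(scores) - min(scores)      # largest minus smallest
sample_var = statistics.variance(scores)    # divides by n - 1
sample_sd = statistics.stdev(scores)        # square root of the sample variance
pop_var = statistics.pvariance(scores)      # divides by N
pop_sd = statistics.pstdev(scores)          # square root of the population variance

print(f"range = {data_range}")
print(f"sample variance = {sample_var:.2f}, sample SD = {sample_sd:.2f}")
print(f"population variance = {pop_var:.2f}, population SD = {pop_sd:.2f}")
```

Treating the same ten values as a whole population gives a slightly smaller variance (84.05 instead of 93.39) because the sum of squared deviations, 840.5, is divided by 10 rather than 9.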


Discussion

What is the simplest measure of “spread”? The most reliable? Explain why.

2.3
Measures of Shape

Skewness

A measure of asymmetry

$\text{Skewness} = \frac{n}{(n-1)(n-2)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^3$

Zero indicates perfect symmetry; the normal distribution has a skewness of zero
Positive skewness indicates that the tail of the distribution is more stretched on the side above the mean; negative skewness indicates that the tail is more stretched on the side below the mean

Normal Distribution, Symmetrical: S = 0
Skewed to the Right (Positively Skewed): S = +
Skewed to the Left (Negatively Skewed): S = −

Skewness

[Figure: three distribution curves.
Normal Distribution, Symmetrical (S = 0): Mean = Median = Mode.
Skewed to the Right (Positively Skewed, S = +): from left to right, Mode, Median, Mean.
Skewed to the Left (Negatively Skewed, S = −): from left to right, Mean, Median, Mode.]
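The skewness formula above can be written out directly, as in the sketch below. The function name `sample_skewness` and the two small test data sets are illustrative assumptions; `s` is the sample standard deviation with the n − 1 divisor, matching the Standard Deviation slide.

```python
import math

def sample_skewness(data):
    """Skewness = n / ((n-1)(n-2)) * sum(((x_i - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("at least 3 values are required")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in data)

symmetric = [1, 2, 3, 4, 5]        # balanced around the mean
long_right_tail = [1, 1, 1, 2, 10]  # one large value stretches the upper tail
scores = [62, 73, 78, 78, 78, 86, 86, 89, 90, 95]  # example data from the earlier slides

print(sample_skewness(symmetric))
print(sample_skewness(long_right_tail))
print(sample_skewness(scores))
```

The symmetric set gives zero, the set with one large value gives a positive result, and the example scores give a negative result because the low value 62 stretches the tail below the mean.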


Kurtosis

A measure of flatness of the distribution
Heavier tailed distributions have larger kurtosis measures; the normal distribution has a kurtosis of 3

Mesokurtic (Normal Distribution): K = 3
Leptokurtic (Peaked Distribution): K > 3
Platykurtic (Flat Distribution): K < 3

$\text{Kurtosis} = \left\{ \frac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 \right\} - \frac{3(n-1)^2}{(n-2)(n-3)}$

Note: this sample formula returns excess kurtosis (kurtosis minus 3), so data from a normal distribution score close to 0 on it; add 3 to compare against the K = 3 scale above.

Exercise

Customer A is noticing on their end some inconsistencies in the soldering performance of Device Z. To verify if this observation is something to worry about, the most senior PE in OSPI performed a sampling study on “plating thickness”. Because of time constraints, only 30 well-prepared samples were obtained, with readings as follows:

611 372 773 329 528 439 544 710 486 377
560 314 423 791 339 605 356 421 568 563
555 541 569 475 618 703 323 457 325 790

Q1: Characterize the performance of Plating by calculating the different measures of location and spread. Assuming that the process is stable, draw the process distribution with respect to the spec limits of 300 ~ 800 micro inches.

Q2: Is there really a problem in Plating? Should Customer A be nervous about ON’s process capability? If there is a problem, what will be the focus of improvement (process location or spread)?
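The kurtosis formula from the Kurtosis slide can be sketched in code as follows. The function name `excess_kurtosis` is an illustrative choice, reflecting that the subtracted correction term makes the result come out near 0, not 3, for normally distributed data; adding 3 puts it on the K = 3 scale. The five-value data set is a hypothetical flat example, not the exercise data.

```python
import math

def excess_kurtosis(data):
    """Sample kurtosis formula from the slide; approximately 0 for normal data."""
    n = len(data)
    if n < 4:
        raise ValueError("at least 4 values are required")
    mean = sum(data) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))  # sample SD
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction

flat = [1, 2, 3, 4, 5]  # evenly spread values, no peak
k_excess = excess_kurtosis(flat)
print(f"excess kurtosis = {k_excess:.2f}, on the K scale = {k_excess + 3:.2f}")
```

For the evenly spread values the result is −1.2 (K = 1.8 on the slide's scale), which falls in the platykurtic range K < 3.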


Summary

1. Population refers to the entire set of observations that are of interest in a statistical investigation; denoted by the letter N (the population size)
2. Sample refers to a subset of data taken from a population; denoted by the small letter n (the sample size)
3. Sample Statistics are measured in order to learn something about Population Parameters
4. Advantages of sampling studies: (a) faster and cheaper than 100% data collection, (b) avoids handling damage during inspection, (c) requires less manpower
5. Disadvantage of sampling studies: risks are involved, data may not be as precise or exact as in 100% data collection
6. Data collection to draw sample statistics must be carried out carefully, considering the right sample size and employing randomization
7. There are two general types of data for collection:
a) Qualitative
b) Quantitative: Discrete (attribute) and Continuous (Variables)
8. The Normal Distribution is the most common continuous distribution in the entire field of statistics
9. Vital information is extracted from the samples by measuring:
a) Central Tendency (Location): Mean, Median, Mode
b) Variability (Spread): Range, Variance, Standard Deviation
c) Shape: Skewness, Kurtosis


