
Statistical Process Control

CDB3013

Chemical Engineering Program


Universiti Teknologi PETRONAS
May 2018

05/26/2021 Statistical Process Control 1


Chapter 2
Review of Statistical Methods



2.1 Describing Variation
2.1.1 The Stem-and-Leaf Plot
2.1.2 The Histogram
2.1.3 Numerical Summary of Data
2.1.4 The Box Plot
2.2 Important Probability Distributions
2.2.1 The Hypergeometric Distribution
2.2.2 The Binomial Distribution
2.2.3 The Poisson Distribution
2.2.4 The Normal Distribution



At the end of this chapter, the student should be able to do the
following:
1. Construct and interpret visual data displays, including the
stem-and-leaf plot, the histogram, and the box plot
2. Compute and interpret the sample mean, the sample variance,
the sample standard deviation, and the sample range
3. Explain the concepts of a random variable and a probability
distribution
4. Understand and interpret the mean, variance, and standard
deviation of a probability distribution
5. Determine probabilities from probability distributions
6. Select an appropriate probability distribution for use in specific
applications



2.1.1 The Stem-and-Leaf Plot
• The stem-and-leaf plot gives a visual impression of shape,
spread (variability), and the central tendency (middle) of the
data.

• To construct a stem-and-leaf plot, we divide each number into
two parts:
  - a stem, consisting of one or more of the leading digits
  - a leaf, consisting of the remaining digits.

• It is generally recommended to choose between 5 and 20
stems.



The Stem-and-Leaf Plot: How to construct
Example
The record of the operating temperature of a batch isomerization
reactor is given below. Construct the stem-and-leaf plot of the data.
76 74 82 96 66 76 78 72 52 68
86 84 62 76 78 92 82 74 88 95

Solution

• Stems: since the data lie between 52 and 96, the stems are
5, 6, 7, 8 and 9
• Leaf: the remaining digit
• Key: 5 | 2 means 52



The Stem-and-Leaf Plot: How to construct
• Construct the stem and leaf columns

• List the stems in the “Stem” column

Stem | Leaf
5 |
6 |
7 |
8 |
9 |



• List the leaf of the observed data values in the order in which
they are encountered in the data set.

76 74 82 96 66 76 78 72 52 68
86 84 62 76 78 92 82 74 88 95

• Write down the title and the key.

Temperature Record
Stem | Leaf
5 | 2
6 | 6 8 2
7 | 6 4 6 8 2 6 8 4
8 | 2 6 4 2 8
9 | 6 2 5

Key: 5 | 2 means 52 °C

• Arrange the leaves by magnitude to get the ordered stem-and-leaf
plot

Ordered stem-and-leaf plot
Temperature Record (°C)
Count  Stem | Leaf
  1     5   | 2
  4     6   | 2 6 8
 (8)    7   | 2 4 4 6 6 6 8 8
  8     8   | 2 2 4 6 8
  3     9   | 2 5 6

• A split stem-and-leaf plot is obtained if each stem is split into a
lower and an upper half, i.e., leaves 0 to 4 go in the lower half and
leaves 5 to 9 in the upper half.



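• The column to the left of the stems gives a cumulative count of
the number of observations that are at or below that stem for the
smaller stems, and at or above that stem for the larger stems. For the
stem that contains the median, the number of observations in that stem
is shown in parentheses.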
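The construction can be sketched in a few lines of Python. This is a minimal illustration for the reactor temperature record, assuming two-digit integer data; the function names `stem_and_leaf` and `depth_column` are hypothetical and are not part of the lecture material.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=10):
    """Group integer values into stems (leading digits) and sorted leaves (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, leaf_unit)
        plot[stem].append(leaf)
    return dict(plot)

def depth_column(plot):
    """Cumulative counts from the nearer end; the median stem shows its own count in parentheses."""
    stems = sorted(plot)
    n = sum(len(plot[s]) for s in stems)
    depths, below = [], 0
    for s in stems:
        size = len(plot[s])
        above = n - below - size            # observations in larger stems
        if below < n / 2 and above < n / 2:
            depths.append(f"({size})")      # stem containing the median
        elif above >= n / 2:
            depths.append(str(below + size))  # count at or below this stem
        else:
            depths.append(str(above + size))  # count at or above this stem
        below += size
    return depths

temperatures = [76, 74, 82, 96, 66, 76, 78, 72, 52, 68,
                86, 84, 62, 76, 78, 92, 82, 74, 88, 95]

plot = stem_and_leaf(temperatures)
depths = depth_column(plot)
for stem, depth in zip(sorted(plot), depths):
    leaves = " ".join(str(leaf) for leaf in plot[stem])
    print(f"{depth:>4}  {stem} | {leaves}")
```

Sorting the values before splitting them gives the ordered plot directly, so the unordered intermediate step is skipped here.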

Reading Stem-and-Leaf Plots
• Find the least value, greatest value, mean, median, mode, and
range of the data.
Temperature Records
Count  Stem | Leaf
  5     4   | 0 0 1 5 7
  9     5   | 1 1 2 4
 (6)    6   | 3 3 3 5 9 9
  8     7   | 0 4 4
  5     8   | 3 6 7
  2     9   | 1 4

Key: 4 | 0 means 40

• The least stem and least leaf give the least value, 40.
• The greatest stem and greatest leaf give the greatest value, 94.
• Use the data values to find the mean: (40 + … + 94) ÷ 23 = 1472 ÷ 23 = 64.0



• The median is the middle value of the 23 observations (the 12th), 63.

• To find the mode, look for the number that occurs most
often in a row of leaves. Then identify its stem. The
mode is 63.

• The range is the difference between the greatest and
the least value: 94 − 40 = 54.
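As a quick check, the same summary values can be computed with Python's standard `statistics` module. This is a sketch: the list below is simply the 23 observations read off the stem-and-leaf plot above, and the variable names are illustrative.

```python
import statistics

# Observations read from the stem-and-leaf plot (key: 4 | 0 means 40)
temperatures = [40, 40, 41, 45, 47,
                51, 51, 52, 54,
                63, 63, 63, 65, 69, 69,
                70, 74, 74,
                83, 86, 87,
                91, 94]

mean = statistics.mean(temperatures)
median = statistics.median(temperatures)
mode = statistics.mode(temperatures)
data_range = max(temperatures) - min(temperatures)

print("n =", len(temperatures))
print("mean =", mean, "median =", median, "mode =", mode, "range =", data_range)
```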



Reading Stem-and-Leaf Plots
• Finding percentiles of the data.
  - The 100k-th percentile is a value such that at least 100k% of
the data values are at or below this value
  - and at least 100(1 − k)% of the data values are at or
above this value.

• The fiftieth percentile of the data distribution is called the
sample median.

• The median can be thought of as the data value that
exactly divides the sample in half, with half of the
observations smaller than the median and half of them
larger.



Percentiles
• The tenth percentile is the observation with rank (0.1)(23) + 0.5 = 2.8
(between the second and the third observations), or (40 + 41)/2 = 40.5.

• The first quartile (lower quartile, or Q1) is the observation with rank
(0.25)(23) + 0.5 = 6.25 (between the 6th and 7th observations), or (51 +
51)/2 = 51,

• and the third quartile (upper quartile, or Q3) is the observation with rank
(0.75)(23) + 0.5 = 17.75 (between the 17th and 18th observations), or (74 +
74)/2 = 74.

• The first and third quartiles are occasionally denoted by the symbols Q1
and Q3, respectively, and the interquartile range IQR = Q3 − Q1 is
occasionally used as a measure of variability.

• For this example, the interquartile range is IQR = Q3 − Q1 = 74 − 51 = 23.
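The rank rule used above can be written as a small function. This is a sketch under the slide's convention: rank = k·n + 0.5, and a fractional rank is resolved by averaging the two neighbouring observations (not by linear interpolation, which other software may use). The name `percentile_by_rank` is hypothetical.

```python
import math

def percentile_by_rank(sorted_values, k):
    """100k-th percentile using rank = k*n + 0.5 and the midpoint of the neighbouring observations."""
    n = len(sorted_values)
    rank = k * n + 0.5
    lower = min(max(math.floor(rank), 1), n)   # 1-based rank just below (clamped to the data)
    upper = min(max(math.ceil(rank), 1), n)    # 1-based rank just above (clamped to the data)
    return (sorted_values[lower - 1] + sorted_values[upper - 1]) / 2

temperatures = [40, 40, 41, 45, 47, 51, 51, 52, 54, 63, 63, 63,
                65, 69, 69, 70, 74, 74, 83, 86, 87, 91, 94]

p10 = percentile_by_rank(temperatures, 0.10)
q1 = percentile_by_rank(temperatures, 0.25)
q2 = percentile_by_rank(temperatures, 0.50)
q3 = percentile_by_rank(temperatures, 0.75)
iqr = q3 - q1
print(p10, q1, q2, q3, iqr)
```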


The Histogram



The Histogram
• A histogram is a more compact summary of data than a
stem-and-leaf plot.
• The histogram is a technique best suited for larger data sets
containing, say, 75 to 100 or more observations.

• To construct a histogram for continuous data, we must divide
the range of the data into intervals, which are usually called
class intervals, cells, or bins.

• If possible, the bins should be of equal width to enhance the
visual information in the histogram.



The Histogram
• The number of bins depends on the number of observations and
the amount of scatter or dispersion in the data.

• Between 5 and 20 bins is satisfactory in most cases, and the number of
bins should increase with n.

Guidelines
• Scott’s (1979) rule for the bin width: $h = 3.49\, s\, n^{-1/3}$

• Sturges’ rule for the number of bins: $k = 1 + \log_2 n$

• h = bin width, k = number of bins, s = sample standard deviation

• n = number of observations
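The two guidelines are easy to evaluate numerically. The sketch below assumes the standard forms of the rules (Scott: h = 3.49 s n^(-1/3); Sturges: 1 + log2 n bins, rounded up here). The function names are hypothetical, and the inputs n = 100 and s = 13.43 are the sample size and sample standard deviation computed from the wafer thickness data of Table 2.2 used in the example that follows.

```python
import math

def scott_bin_width(s, n):
    """Scott's rule: bin width from the sample standard deviation s and sample size n."""
    return 3.49 * s * n ** (-1 / 3)

def sturges_bin_count(n):
    """Sturges' rule: number of bins, rounded up to a whole bin."""
    return math.ceil(1 + math.log2(n))

n = 100        # number of wafers in Table 2.2
s = 13.43      # sample standard deviation of the Table 2.2 data, in angstroms

h = scott_bin_width(s, n)
k = sturges_bin_count(n)
print(f"Scott bin width: {h:.2f}")
print(f"Sturges bin count: {k}")
```

Both rules point to roughly 8 bins of width 10 for these data, which is the binning used in the worked solution.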



The Histogram
EXAMPLE
Table 2.2 presents the thickness of a metal layer on 100 silicon
wafers resulting from a chemical vapor deposition (CVD) process in a
semiconductor plant. Construct a histogram for these data.

Layer Thickness on Semiconductor Wafers


438 450 487 451 452 441 444 461 432 471
413 450 430 437 465 444 471 453 431 458
444 450 446 444 466 458 471 452 455 445
468 459 450 453 473 454 458 438 447 463
445 466 456 434 471 437 459 445 454 423
472 470 433 454 464 443 449 435 435 451
474 457 455 448 478 465 462 454 425 440
454 441 459 435 446 435 460 428 449 442
455 450 423 432 459 444 445 454 449 441
449 445 455 441 464 457 437 434 452 439



The Histogram
Solution

• When plotting manually, first sort the data in increasing
order to make it easier to count the frequency in each
bin.
• Use the midpoint of each bin to represent the bin.



The Histogram

No.  Bin          Midpoint thickness x (Å)  Frequency  Relative frequency p
1    [410, 420)   415                        1         0.01
2    [420, 430)   425                        4         0.04
3    [430, 440)   435                       17         0.17
4    [440, 450)   445                       25         0.25
5    [450, 460)   455                       32         0.32
6    [460, 470)   465                       11         0.11
7    [470, 480)   475                        9         0.09
8    [480, 490)   485                        1         0.01
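The frequency table can be reproduced by counting the Table 2.2 observations bin by bin. The following is a minimal sketch using only the standard library; the bin edges (410 to 490 in steps of 10) are taken from the table above, and the variable names are illustrative.

```python
from itertools import accumulate

thickness = [
    438, 450, 487, 451, 452, 441, 444, 461, 432, 471,
    413, 450, 430, 437, 465, 444, 471, 453, 431, 458,
    444, 450, 446, 444, 466, 458, 471, 452, 455, 445,
    468, 459, 450, 453, 473, 454, 458, 438, 447, 463,
    445, 466, 456, 434, 471, 437, 459, 445, 454, 423,
    472, 470, 433, 454, 464, 443, 449, 435, 435, 451,
    474, 457, 455, 448, 478, 465, 462, 454, 425, 440,
    454, 441, 459, 435, 446, 435, 460, 428, 449, 442,
    455, 450, 423, 432, 459, 444, 445, 454, 449, 441,
    449, 445, 455, 441, 464, 457, 437, 434, 452, 439,
]

width = 10
edges = list(range(410, 500, width))      # 410, 420, ..., 490: nine edges, eight bins
counts = [0] * (len(edges) - 1)
for x in thickness:
    counts[(x - edges[0]) // width] += 1  # bin [a, b) includes a and excludes b

cumulative = list(accumulate(counts))
for left, count, total in zip(edges, counts, cumulative):
    midpoint = left + width // 2
    print(f"[{left}, {left + width})  midpoint {midpoint}  "
          f"frequency {count:2d}  p {count / len(thickness):.2f}  cumulative {total}")
```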





The Histogram
• Most computer packages have a default setting for the number
of bins.

• Histograms can be relatively sensitive to the choice of the
number and width of the bins.

• For small data sets, histograms may change dramatically in
appearance if the number and/or width of the bins changes.

• For this reason, the histogram is considered a technique best
suited for larger data sets containing, say, 75 to 100 or more
observations.



The Histogram
• Cumulative frequencies are often very useful in data
interpretation.
• For example, we can read directly from the cumulative
frequency plot that about 80 of the 100 wafers (79, by the frequency
table) have a metal layer thickness that is less than 460 Å.

[Figure: cumulative frequency plot; x-axis: Metal Thickness (x), 405 to 495; y-axis: Frequency, 0 to 120]



2.1.3 Numerical Summary of Data
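• Sample average: the most important measure of central
tendency of the sample. Suppose that $x_1, x_2, \ldots, x_n$ are the
observations in a sample; the sample average is defined as

$\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

• For the metal thickness example

$\bar{x} = \dfrac{1}{100}\sum_{i=1}^{100} x_i = \dfrac{45001}{100} = 450.01$ Å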



2.1.3 Numerical Summary of Data
• The variability in the sample data is measured by the sample
variance:

$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

• If there is no variability in the sample, then all sample
observations are equal and the sample variance is $s^2 = 0$.



2.1.3 Numerical Summary of Data
• The units of the sample variance $s^2$ are the square of the
original units of the data.

• This is often inconvenient and awkward to interpret, and so
the square root of the sample variance, called the sample
standard deviation s, is used as a measure of variability.
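For the metal thickness data of Table 2.2, the three summary statistics can be computed with the standard `statistics` module, whose `variance` and `stdev` functions use the n − 1 divisor of the sample variance. This is a sketch; the variable names are illustrative.

```python
import statistics

thickness = [
    438, 450, 487, 451, 452, 441, 444, 461, 432, 471,
    413, 450, 430, 437, 465, 444, 471, 453, 431, 458,
    444, 450, 446, 444, 466, 458, 471, 452, 455, 445,
    468, 459, 450, 453, 473, 454, 458, 438, 447, 463,
    445, 466, 456, 434, 471, 437, 459, 445, 454, 423,
    472, 470, 433, 454, 464, 443, 449, 435, 435, 451,
    474, 457, 455, 448, 478, 465, 462, 454, 425, 440,
    454, 441, 459, 435, 446, 435, 460, 428, 449, 442,
    455, 450, 423, 432, 459, 444, 445, 454, 449, 441,
    449, 445, 455, 441, 464, 457, 437, 434, 452, 439,
]

x_bar = statistics.mean(thickness)        # sample average
s2 = statistics.variance(thickness)       # sample variance (divisor n - 1)
s = statistics.stdev(thickness)           # sample standard deviation

print(f"sample average  = {x_bar:.2f} angstroms")
print(f"sample variance = {s2:.4f} angstroms^2")
print(f"sample std dev  = {s:.2f} angstroms")
```

The variance is in squared ångströms, while the standard deviation is back in the original units, which is why s is the more convenient measure of spread.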



2.1.4 The Box and Whisker Plot
• The box and whisker plot displays
  - central tendency
  - spread (variability)
  - departure from symmetry
  - outliers

• A box plot displays the three quartiles (Q1, Q2 and Q3), the
minimum, and the maximum of the data on a rectangular box,
aligned either horizontally or vertically.



Algorithm for Box and Whisker Plot
• If the number of observations is even, the first quartile (Q1) and the
third quartile (Q3) are the medians of the lower and upper halves of the
data. Otherwise, they are the medians of the two halves with the middle
observation (the median) included in both halves.

• Step 1: Draw vertical lines at Q1, Q2 and Q3.

• Step 2: Connect the ends of the vertical lines with horizontal
lines to form two adjacent boxes.

• Step 3: Draw the “whiskers”: a horizontal line from the middle of the Q1
line to the smallest observation greater than Q1 − 1.5 IQR, and one from
the middle of the Q3 line to the largest observation less than
Q3 + 1.5 IQR.

• Step 4: Any point beyond the end of a whisker is considered an
outlier and is plotted as an individual point.



2.1.4 The Box and Whisker Plot
• Example: Draw the box and whisker plot for the data
{15 27 34 35 37.5 39 42.5 45 46 58 59}
Solution
The data are already ranked.

n = 11
Median, Q2 = 39
Q1 = (34 + 35)/2 = 34.5
Q3 = (45 + 46)/2 = 45.5
min = 15
max = 59
IQR = 45.5 − 34.5 = 11
Outlier range: x < Q1 − 1.5 IQR = 18 or x > Q3 + 1.5 IQR = 62

[Figure: box-and-whisker plot of the data on a horizontal axis from 10 to 60]
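The quartile rule and the outlier limits can be sketched as follows. Two assumptions are made: the quartiles are the medians of the two halves (with the middle observation included in both halves when n is odd), and the outlier limits are the usual 1.5 × IQR fences. The function names are hypothetical.

```python
import statistics

def quartiles_by_halves(sorted_values):
    """Q1, Q2, Q3 as medians of the halves; the middle value joins both halves when n is odd."""
    n = len(sorted_values)
    half = n // 2
    if n % 2 == 0:
        lower, upper = sorted_values[:half], sorted_values[half:]
    else:
        lower, upper = sorted_values[:half + 1], sorted_values[half:]
    return (statistics.median(lower),
            statistics.median(sorted_values),
            statistics.median(upper))

def box_plot_summary(values, fence_factor=1.5):
    """Quartiles, IQR, fences, whisker ends, and outliers for a box-and-whisker plot."""
    data = sorted(values)
    q1, q2, q3 = quartiles_by_halves(data)
    iqr = q3 - q1
    low_fence = q1 - fence_factor * iqr
    high_fence = q3 + fence_factor * iqr
    inside = [x for x in data if low_fence < x < high_fence]
    outliers = [x for x in data if not (low_fence < x < high_fence)]
    return {"q1": q1, "q2": q2, "q3": q3, "iqr": iqr,
            "fences": (low_fence, high_fence),
            "whiskers": (inside[0], inside[-1]),
            "outliers": outliers}

summary = box_plot_summary([15, 27, 34, 35, 37.5, 39, 42.5, 45, 46, 58, 59])
print(summary)
```

With these data the lower whisker stops at 27 and the observation 15 falls below the lower limit of 18, so it is drawn as an individual outlier point.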



2.1.4 The Box and Whisker Plot
Example:
Find Q1 , Q2 , and Q3 for the following data set, and draw a box-
and-whisker plot.
{2,6,7,8,8,11,12,13,14,15,22,27}
Solution
• The data are already ranked
• n = 12
• Median, Q2 = (11 + 12)/2 = 11.5
• Rank k = (0.25)(12) + 0.5 = 3.5, between the 3rd and 4th observations:
Q1 = (7 + 8)/2 = 7.5
• Rank k = (0.75)(12) + 0.5 = 9.5, between the 9th and 10th observations:
Q3 = (14 + 15)/2 = 14.5



2.1.4 The Box and Whisker Plot
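• min = 2
• max = 23
• IQR = 14.5 − 7.5 = 7
• Outlier range: x < Q1 − 1.5 IQR = −3 or x > Q3 + 1.5 IQR = 25

[Figure: box-and-whisker plot of the data on a horizontal axis from 0 to 25]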



2.1.4 The Box and Whisker Plot
• Box plots are very useful in graphical comparisons among
data sets, because they have visual impact and are easy to
understand.
• Example : - In the following box plot it is observed that there
is too much variability at plant 2, and that plants 2 and 3
need to raise their quality index performance.



2.1.5 Probability Distributions
• The histogram (or stem-and-leaf plot, or box plot) is used to
describe sample data.

• A sample is a collection of measurements selected from
some larger source or population.

• By using statistical methods, it is possible to analyze the
sample data and draw certain conclusions about the process
from which the data are obtained.



2.1.5 Probability Distributions
• A probability distribution is a mathematical model that
relates the value of the variable with the probability of
occurrence of that value in the population.

• There are two types of probability distributions:
1. Continuous distributions: the variable being measured
is expressed on a continuous scale. Examples: length, mass,
density.
2. Discrete distributions: the variable being measured
takes only discrete values. Example: the distribution of the
number of nonconformities in a product.



Discrete and Continuous Probability Distributions

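[Figure: a discrete probability distribution, drawn as spikes of height p(x1), …, p(x5) at the values x1, …, x5, beside a continuous probability distribution f(x), drawn as a smooth curve with the interval from a to b marked on the x axis]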



2.1.5 Probability Distributions
• We write the probability that a random variable x takes on a
specific value $x_i$ as

$P\{x = x_i\} = p(x_i)$   (2.3)

• A continuous distribution appears as a smooth curve, with the area
under the curve equal to probability, so that the probability that x
lies in the interval from a to b is written as

$P\{a \le x \le b\} = \int_a^b f(x)\,dx$   (2.4)



2.1.5 Probability Distributions
• The height of each spike in a discrete distribution plot is
proportional to the probability.

• The mean μ of a probability distribution is a measure of the
central tendency in the distribution, or its location. The mean
is defined as

$\mu = \int_{-\infty}^{\infty} x\, f(x)\,dx$, x continuous   (2.5a)

$\mu = \sum_{i=1}^{\infty} x_i\, p(x_i)$, x discrete   (2.5b)



2.1.5 Probability Distributions
• For the case of a discrete random variable with exactly N
equally likely values [that is, $p(x_i) = 1/N$], equation 2.5b
reduces to

$\mu = \dfrac{1}{N}\sum_{i=1}^{N} x_i$

• The mean is simply the center of mass of the probability
distribution.





2.1.5 Probability Distributions
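• The fiftieth percentile of the distribution is the median.

• The most likely value of the variable is called the mode.

• The scatter, spread, or variability in a distribution is expressed
by the variance, which is defined as

$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$, x continuous   (2.6a)

$\sigma^2 = \sum_{i=1}^{\infty} (x_i - \mu)^2\, p(x_i)$, x discrete   (2.6b)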



2.1.5 Probability Distributions
• When the random variable is discrete with N equally likely
values, equation 2.6b becomes

$\sigma^2 = \dfrac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2$

• and we observe that in this case the variance is the average
squared distance of each member of the population from the
mean.
• If there is no variability in the population, then $\sigma^2 = 0$. As the
variability increases, the variance increases.
• The variance is expressed in the square of the units of the
original variable.
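To make the discrete definitions concrete, the sketch below evaluates the mean and variance sums for a fair six-sided die, a hypothetical example with N = 6 equally likely values that does not come from the lecture. Exact fractions are used so that the results are not blurred by rounding; the function names are illustrative.

```python
from fractions import Fraction

def discrete_mean(values, probabilities):
    """Mean of a discrete distribution: sum of x_i * p(x_i)."""
    return sum(x * p for x, p in zip(values, probabilities))

def discrete_variance(values, probabilities):
    """Variance of a discrete distribution: sum of (x_i - mu)^2 * p(x_i)."""
    mu = discrete_mean(values, probabilities)
    return sum((x - mu) ** 2 * p for x, p in zip(values, probabilities))

faces = [1, 2, 3, 4, 5, 6]                 # hypothetical example: a fair die
probabilities = [Fraction(1, 6)] * 6       # N = 6 equally likely values

mu = discrete_mean(faces, probabilities)
sigma2 = discrete_variance(faces, probabilities)
print("mean =", mu, " variance =", sigma2, " std dev =", float(sigma2) ** 0.5)
```

Because every value has probability 1/N, the two sums reduce to the plain average of the values and the average squared distance from the mean.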



2.1.5 Probability Distributions
• The square root of the variance, σ, is called the standard
deviation.

• The standard deviation is a measure of spread or scatter in the
population expressed in the original units.



2.1.5 Probability Distributions

Figure 2.4. Normal distributions with the same mean and different standard
deviations



2.1.5 Probability Distributions

Figure 2.4. Normal distributions with different means and the same standard deviation



2.2 Important Discrete Distributions
• Several discrete probability distributions arise frequently in
statistical quality control. In this section, we discuss
  - the hypergeometric distribution
  - the binomial distribution
  - the Poisson distribution



2.2.1 The Hypergeometric Distribution
• Suppose that there is a finite population consisting of N items.
Some number, say D (D ≤ N), of these items fall into a class of
interest. A random sample of n items is selected from the
population without replacement, and the number of items in the
sample that fall into the class of interest, say x, is observed. Then
x is a hypergeometric random variable with the probability
distribution defined as:

$p(x) = \dfrac{\binom{D}{x}\binom{N-D}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, 2, \ldots, \min(n, D)$   (2.8)

The mean and variance of the distribution are

$\mu = \dfrac{nD}{N}$   (2.9)

and

$\sigma^2 = \dfrac{nD}{N}\left(1 - \dfrac{D}{N}\right)\left(\dfrac{N-n}{N-1}\right)$   (2.10)
2.2.1 The Hypergeometric Distribution
• In the above definition, the quantity

$\binom{a}{b} = \dfrac{a!}{b!\,(a-b)!}$

is the number of combinations of a items taken b at a
time; it is called the binomial coefficient.

• In quality control applications of the hypergeometric distribution,
x usually represents the number of nonconforming items found in the
sample.



2.2.1 The Hypergeometric Distribution
• Example: Suppose that a lot contains 100 items, 5 of
which do not conform to requirements. If 10 items are
selected at random without replacement, what is the
probability of finding one or fewer nonconforming items
in the sample?
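One way to work this example is to evaluate the hypergeometric probabilities for x = 0 and x = 1 directly and add them. The sketch below uses `math.comb` for the binomial coefficients, with N = 100, D = 5 and n = 10 from the problem statement; the function name `hypergeometric_pmf` is hypothetical.

```python
from math import comb

def hypergeometric_pmf(x, N, D, n):
    """P{x items of interest in a sample of n drawn without replacement from N items, D of interest}."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

N, D, n = 100, 5, 10                       # lot size, nonconforming items, sample size

p0 = hypergeometric_pmf(0, N, D, n)
p1 = hypergeometric_pmf(1, N, D, n)
p_one_or_fewer = p0 + p1

print(f"P(x = 0)  = {p0:.5f}")
print(f"P(x = 1)  = {p1:.5f}")
print(f"P(x <= 1) = {p_one_or_fewer:.5f}")
```

The result is a probability of about 0.923 of finding one or fewer nonconforming items in the sample.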



2.2.2 The Binomial Distribution
• Consider a process that consists of a sequence of n
independent trials. When the outcome of each trial is either a
“success” or a “failure,” the trials are called Bernoulli trials. If the
probability of “success” on any trial, say p, is constant, then the
number of “successes” x in n Bernoulli trials has the binomial
distribution with parameters n and p, defined as follows:

The binomial distribution with parameters n ≥ 0 and 0 < p < 1 is

$p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n$   (2.11)

The mean and variance of the binomial distribution are

$\mu = np$   (2.12)

$\sigma^2 = np(1-p)$   (2.13)



2.2.2 The Binomial Distribution
• The binomial distribution is the appropriate probability
model for sampling from an infinitely large population,
where p represents the fraction of defective or
nonconforming items in the population.

• In these applications, x usually represents the number of
nonconforming items found in a random sample of n
items.



2.2.2 The Binomial Distribution
• Example: A random sample of 15 items is taken from a process in which
each item has a 10% probability of being nonconforming. Determine the
probability of obtaining 3 or fewer nonconforming items in the sample.
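This example can be worked by summing the binomial probabilities for x = 0, 1, 2, 3 with n = 15 and p = 0.1. The following is a minimal sketch using `math.comb`; the function names are hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P{x successes in n independent trials with success probability p}."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x_max, n, p):
    """P{x <= x_max}, obtained by summing the individual probabilities."""
    return sum(binomial_pmf(x, n, p) for x in range(x_max + 1))

n, p = 15, 0.1                              # sample size and fraction nonconforming

p_three_or_fewer = binomial_cdf(3, n, p)
print(f"P(x <= 3) = {p_three_or_fewer:.4f}")
print(f"mean = {n * p:.2f}, variance = {n * p * (1 - p):.2f}")
```

The probability of three or fewer nonconforming items is about 0.944.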



2.2.2 The Binomial Distribution
• The shape of these examples is typical of all binomial
distributions.

• For a fixed n, the distribution becomes more symmetric as p
increases from 0 to 0.5 or decreases from 1 to 0.5. For a fixed p,
the distribution becomes more symmetric as n increases.

• A random variable that arises frequently in statistical quality
control is

$\hat{p} = \dfrac{x}{n}$   (2.14)

where x has a binomial distribution with parameters n and p.
Often $\hat{p}$ is the ratio of the observed number of defective or
nonconforming items in a sample (x) to the sample size (n).



2.2.2 The Binomial Distribution
• Several binomial distributions are shown graphically in
Figure 2.14.



• $\hat{p}$ is usually called the sample fraction defective or sample
fraction nonconforming.

• The “ ˆ ” symbol is used to indicate that $\hat{p}$ is an estimate of the
true, unknown value of the binomial parameter p.

• The probability distribution of $\hat{p}$ is obtained from the binomial, since

$P\{\hat{p} \le a\} = P\left\{\dfrac{x}{n} \le a\right\} = P\{x \le na\} = \sum_{x=0}^{[na]} \binom{n}{x} p^x (1-p)^{n-x}$

• where [na] denotes the largest integer less than or equal to na.
• The mean of $\hat{p}$ is p and the variance of $\hat{p}$ is

$\sigma^2_{\hat{p}} = \dfrac{p(1-p)}{n}$



2.2.3 The Poisson Distribution
• A useful discrete distribution in statistical quality control is the
Poisson distribution, defined as follows:



2.2.3 The Poisson Distribution
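• The Poisson distribution is

$p(x) = \dfrac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, \ldots$

where the parameter λ > 0. The mean and variance of the
Poisson distribution are

$\mu = \lambda$

and

$\sigma^2 = \lambda$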



2.2.3 The Poisson Distribution
• Note that the mean and variance of the Poisson distribution are both
equal to the parameter λ.

• A typical application of the Poisson distribution in quality control is
as a model of the number of defects or nonconformities that occur in
a unit of product.

• Any random phenomenon that occurs on a per unit (or per unit area,
per unit volume, per unit time, etc.) basis is often well approximated
by the Poisson distribution.



2.2.3 The Poisson Distribution
• Example
The number of wire-bonding defects per unit that occur in a
semiconductor device is Poisson distributed with parameter
λ = 4. Determine the probability that a randomly selected
semiconductor device will contain two or fewer wire-bonding defects.
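The requested probability is the sum of the Poisson probabilities for x = 0, 1 and 2 with parameter 4. A short sketch using only the standard library follows; the function name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P{x events} for a Poisson distribution with parameter lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 4                                     # mean number of wire-bonding defects per unit

p_two_or_fewer = sum(poisson_pmf(x, lam) for x in range(3))
print(f"P(x <= 2) = {p_two_or_fewer:.4f}")
```

The sum equals 13e⁻⁴, or about 0.238.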



2.2.3 The Poisson Distribution
• Several Poisson distributions are shown in Figure 2.15. Note
that the distribution is skewed; that is, it has a long tail to the
right.

• As the parameter λ becomes larger, the Poisson distribution
becomes symmetric in appearance.

• The Poisson distribution can be considered as a limiting form
of the binomial distribution.

• That is, in a binomial distribution with parameters n and p, if
we let n approach infinity and p approach zero in such a way
that np = λ is a constant, then the Poisson distribution results.
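The limiting behaviour can be checked numerically. The sketch below holds np = λ fixed at 4 (an illustrative choice) and measures the largest gap between the binomial and Poisson probabilities over x = 0, …, 20 as n grows; the function names and the chosen values of n are assumptions made for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability of x events with parameter lam."""
    return exp(-lam) * lam ** x / factorial(x)

def largest_gap(n, lam, x_max=20):
    """Largest absolute difference between the two distributions for x = 0..min(n, x_max)."""
    p = lam / n                             # keeps np = lam constant
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(min(n, x_max) + 1))

lam = 4
gaps = {n: largest_gap(n, lam) for n in (10, 100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}, p = {lam / n:.4f}, largest gap = {gap:.6f}")
```

The gap shrinks steadily as n increases, which is why the Poisson distribution is a convenient approximation to the binomial for large n and small p.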



2.2.3 The Poisson Distribution
Poisson distributions



2.2.4 The Normal Distribution
• The normal distribution is probably the most important
distribution in both the theory and application of statistics.
• If x is a normal random variable, then the probability distribution
of x is defined as follows:

The normal distribution is

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty$

The mean of the normal distribution is μ (−∞ < μ < ∞) and the variance is σ² > 0.



2.2.4 The Normal Distribution
• The normal distribution is used so much that we frequently
employ a special notation, $x \sim N(\mu, \sigma^2)$, to imply that x is normally
distributed with mean μ and variance σ².

• The visual appearance of the normal distribution is a symmetric,
unimodal, bell-shaped curve.



2.2.4 The Normal Distribution
• Note that 68.26% of the population values fall between the limits
defined by the mean plus and minus one standard deviation (μ ± 1σ)
• 95.46% of the values fall between the limits defined by the mean
plus and minus two standard deviations (μ ± 2σ)



2.2.4 The Normal Distribution
• 99.73% of the population values fall within the limits defined by
the mean plus and minus three standard deviations.
• Thus, the standard deviation measures the distance on the
horizontal scale associated with the 68.26%, 95.46%, and
99.73% containment limits.
• It is common practice to round these percentages to 68%, 95%,
and 99.7%.
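These containment percentages can be checked with the error function, since for a normal variable P{μ − kσ ≤ x ≤ μ + kσ} = erf(k/√2). The sketch below uses only `math.erf`; the function name is hypothetical. Evaluated this way, the one-sigma and two-sigma values round to 68.27% and 95.45%, which differ in the last digit from the table-based figures quoted above.

```python
from math import erf, sqrt

def fraction_within(k):
    """Fraction of a normal population within k standard deviations of the mean."""
    return erf(k / sqrt(2))

within = {k: fraction_within(k) for k in (1, 2, 3)}
for k, fraction in within.items():
    print(f"mean +/- {k} sigma: {100 * fraction:.2f}%")
```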



2.2.4 The Normal Distribution
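• The cumulative normal distribution is defined as the probability that
the normal random variable x is less than or equal to some value a, or

$P\{x \le a\} = F(a) = \int_{-\infty}^{a} \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2} dx$

• This integral cannot be evaluated in closed form. However, by using
the change of variable $z = \dfrac{x-\mu}{\sigma}$, the evaluation can be made independent of
μ and σ².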



2.2.4 The Normal Distribution
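$P\{x \le a\} = P\left\{z \le \dfrac{a-\mu}{\sigma}\right\} \equiv \Phi\left(\dfrac{a-\mu}{\sigma}\right)$

• where Φ(·) is the cumulative distribution function of the
standard normal distribution (mean = 0, standard deviation
= 1).
• The transformation $z = (x-\mu)/\sigma$ is usually called standardization, because it
converts an N(μ, σ²) random variable into an N(0, 1) random variable.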



2.2.4 The Normal Distribution
Example
The time to resolve customer complaints is a critical quality
characteristic for many organizations. Suppose that this time in
a financial organization, say x, is normally distributed with mean
μ = 40 hours and standard deviation σ = 2 hours, denoted $x \sim N(40, 2^2)$.
What is the probability that a customer complaint will be
resolved in less than 35 hours?



2.2.4 The Normal Distribution
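Solution
• The desired probability is

$P\{x < 35\}$

To use standard normal tables we standardize the point 35 and find

$P\{x < 35\} = P\left\{z < \dfrac{35 - 40}{2}\right\} = P\{z < -2.5\}$

Consequently, the desired probability is

$P\{x < 35\} = \Phi(-2.5) = 0.0062$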
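In place of a standard normal table, the cumulative probability can be evaluated with the error function, using Φ(z) = ½[1 + erf(z/√2)]. The sketch below applies the standardization step to this example (mean 40 hours, standard deviation 2 hours, point 35 hours); the function name `normal_cdf` is hypothetical.

```python
from math import erf, sqrt

def normal_cdf(a, mu, sigma):
    """P{x <= a} for x ~ N(mu, sigma^2), via standardization z = (a - mu) / sigma."""
    z = (a - mu) / sigma
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 40, 2                           # mean and standard deviation, in hours

z = (35 - mu) / sigma
probability = normal_cdf(35, mu, sigma)
print(f"z = {z}")
print(f"P(x < 35) = {probability:.4f}")
```

Roughly 0.6% of complaints are resolved in less than 35 hours.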



2.2.4 The Normal Distribution

[Figure: the fraction of customer complaints resolved in 35 hours or less]
