
Resolving the Mysteries of Six Sigma:

Statistical Constructs and Engineering Rationale

April 21, 2003

Mikel J. Harry, Ph.D.


Founder and Chairman of the Board

Six Sigma Management Institute


Scottsdale, Arizona

Copyright 2003, Mikel Harry, Ph.D.


Resolving the Mysteries of Six Sigma:
Statistical Constructs and Engineering Rationale
by Mikel Harry, Ph.D.

Copyright © 2003 by Mikel Harry, Ph.D.

Six Sigma is a registered trademark of Motorola, Inc.

All Rights Reserved. No part of this book may be used or reproduced in any manner whatsoever
without written permission from the publisher except in the case of brief quotations embodied in
critical articles and reviews.

Publisher:
Palladyne Publishing

Distributor:
Tri Star Visual Communications
3110 North 35th Avenue, Suite 4
Phoenix, Arizona 85017
(602) 269-2900
sixsigma@tristarvisual.com

Design, layout and printing:


Tri Star Visual Communications, Phoenix, Arizona
www.tristarvisual.com

ISBN 0-9715235-1-7
Table of contents

Foreword

1.0 Introducing the Context
1.1 Unfolding the history
1.2 Adopting the principles

2.0 Extending the Context
2.1 Defining the root
2.2 Expanding the function
2.3 Describing the interaction
2.4 Rationalizing the sample
2.5 Detecting the error
2.6 Classifying the error
2.7 Declaring the opportunity
2.8 Qualifying the interface

3.0 Interrogating the Context
3.1 Articulating the goal
3.2 Polishing the definition
3.3 Inflating the error
3.4 Calibrating the shift
3.5 Rationalizing the shift
3.6 Applying the shift
3.7 Framing the correction
3.8 Establishing the center

4.0 Understanding the shift
4.1 Identifying the expectations
4.2 Conducting the analysis
4.3 Considering the implications
4.4 Constructing the worst-case
4.5 Exploring the consequences
4.6 Visualizing the distributions

5.0 Examining the shift
5.1 Establishing the equality
5.2 Developing the correction
5.3 Advancing the concepts
5.4 Analyzing the system

6.0 Validating the shift
6.1 Conducting the simulation
6.2 Generalizing the results
6.3 Pondering the issues

7.0 Contracting the error
7.1 Conducting the analysis
7.2 Drawing the conclusions
7.3 Verifying the conclusions
7.4 Establishing the shortcut

8.0 Partitioning the error
8.1 Separating the noise
8.2 Aggregating the error
8.3 Rationalizing the sample

9.0 Analyzing the partitions
9.1 Defining the components
9.2 Analyzing the variances
9.3 Examining the entitlement

10.0 Computing the Correction
10.1 Computing the shift
10.2 Resolving the shift
10.3 Calculating the minimum
10.4 Connecting the capability

11.0 Harnessing the Chaos
11.1 Setting the course
11.2 Framing the approach
11.3 Limiting the history
11.4 Understanding the chaos
11.5 Evolving the heuristics
11.6 Timing the geometry
11.7 Exemplifying the fractal
11.8 Synthesizing the journey

12.0 Concluding the discussion

Appendix A: Guidelines for the Mean Shift

References and Bibliography


Foreword

Two pillars of seemingly mystical origin and uncertain composition have long
supported the practice of six sigma. The first pillar is characterized by the quantity "six"
in the phrase "six sigma." The second pillar is related to the 1.5 sigma shift. This book
sets forth the theoretical constructs and statistical equations that underpin and validate
both of these pillars, as well as several other intersecting issues related to the subject.
The reader should be aware that this book has been prepared from a design
engineering perspective. Owing to this, it can fully support many of the aims associated
with design-for-six-sigma (DFSS). Although skewed toward design engineers, this book
provides a methodology for risk analysis that would be of keen interest to producibility
engineers. In addition, the book is also intended for quality professionals and process
engineers that are responsible for the "qualification" of a process prior to its adoption.
With these aims in mind, the ensuing discussion will mathematically demonstrate
that the "1.5 sigma shift" can be attributed solely to the influence of random error. In
this context, the 1.5 sigma shift is a statistically based correction for scientifically
compensating or otherwise adjusting a postulated model of instantaneous reproducibility
for the inevitable consequences associated with random sampling variation. Naturally,
such an adjustment (1.5 sigma shift) is only considered and instituted at the opportunity
level of a product configuration. Thus, the model performance distribution of a given
critical performance characteristic can be effectively attenuated for many of the
operational uncertainties associated with a design-process qualification (DPQ).
Based on this quasi-definition, it should be fairly evident that the 1.5 sigma shift factor
can often be treated as a "statistical correction," but only under certain engineering
conditions that would generally be considered “typical.” By no means does the shift factor
(1.5 sigma) constitute a "literal" shift in the mean of a performance distribution
– as many quality practitioners and process engineers falsely believe or try to postulate
through uninformed speculation and conjecture. However, its judicious application during
the course of designing a system, product, service, event, or activity can greatly facilitate
the analysis and optimization of "configuration repeatability."


By the conscientious application of the 1.5 sigma shift factor (during the course of
product configuration), an engineer can meaningfully "design in" the statistical and
pragmatic confidence necessary to ensure or otherwise assure that related performance
safety margins are not violated by unknown (but anticipated) process variations. Also of
interest, its existence and conscientious application has many pragmatic implications (and
benefits) for reliability engineering. Furthermore, it can be used to "normalize" certain
types and forms of benchmarking data in the interests of assuring a "level playing field"
when considering heterogeneous products, services, and processes.
In summary, the 1.5 sigma shift factor should only be viewed as a mathematical
construct of a theoretical nature. When treated as a "statistical correction," its origin can
be mathematically derived as an equivalent quantity representing or otherwise reflecting
the "worst-case error" inherent to an estimate of short-term process capability. As will be
demonstrated, the shift factor is merely an "algebraic byproduct" of the chi-square
distribution that will vary depending on the accepted level of risk and prevailing degrees-
of-freedom. However, when typical application circumstances are postulated and
rationally evaluated, the resulting shift will prove to be approximately equivalent to 1.5
sigma.
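
For readers who prefer to see the arithmetic before the formal treatment, the short sketch
below shows one way such a chi-square computation can be arranged. It is only an
illustrative preview, not the book's derivation: the subgroup size of n = 30, the 0.005 risk
level, and the 3 sigma reference are assumed "typical" conditions, and the complete
construction is developed in the chapters that follow.

    # Hedged sketch (not the formal derivation): a worst-case chi-square correction
    # evaluated under assumed "typical" conditions of n = 30 and alpha = 0.005.
    from scipy.stats import chi2

    n = 30          # assumed subgroup (sampling) size
    df = n - 1      # degrees of freedom
    alpha = 0.005   # assumed level of accepted risk (worst-case tail)
    z_st = 3.0      # short-term capability benchmark of 3 sigma

    # Worst-case inflation of a short-term standard deviation estimate.
    inflation = (df / chi2.ppf(alpha, df)) ** 0.5

    # Expressed as an equivalent mean-shift correction at the 3 sigma level.
    shift = z_st * (inflation - 1.0)
    print(round(shift, 2))   # prints 1.46 -- approximately 1.5 sigma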

Dr. Mikel Harry,


Scottsdale, Arizona
April 21, 2003


1.0 Introducing the Context

1.1 Unfolding the history


Today, a great many professionals already know the story of six sigma.
As one of the original architects of six sigma, this author has carefully
observed its phenomenal growth over the years. During this course of time,
numerous business executives have watched six sigma expand from a simple
quality target to a viable system of business management. In fact, Jack Welch
(former CEO of General Electric) has said that six sigma is the first initiative
to come along that reaches the “control function” of a corporation.
During the evolution of six sigma, numerous questions have been
formulated, fielded and addressed by this author. These questions span the
chasm between the issues of business executives and the concerns of technical
practitioners. In this author’s recollection, two recurring questions have
dominated the broad field of technical inquiry, at least where the statistical
theory of six sigma is concerned.
The first type of recurring inquiry can be described by the global
question: “Why 6σ and not some other level of capability?” The second type
of inquiry is more molecular in nature. It can be summarized by the
compound question: “Where does the 1.5σ shift factor come from – and why
1.5 versus some other magnitude?”
Although some quality professionals still debate the merits of a six
sigma level of capability (per opportunity) and the mysterious, so-called “shift
factor,” many have judiciously ignored such rhetoric. These pioneers have
persevered by the continued demonstration of successful application – reaping
considerable treasure along the way. Such individuals are concerned only
with results, not with statistical theory and academic debate that is best left to
the mathematicians and statisticians of the world.
Still, regardless of success, pervasive questions regarding six sigma and
the 1.5σ shift remain in the minds of many theorists and quality
professionals. Although well-intentioned answers are often set forth in the
form of cursory explanations, there is a dearth of technical exposition. In
short, such attempts to clearly prescribe the genetic code of six sigma and its
associated 1.5σ shift factor have fallen short of their mark. To satisfy this
apparent need, our discussion will uncloak the theoretical origins of six sigma,
and concurrently demystify the 1.5σ correction factor, by directly examining
their underlying determinants in the context of producibility analysis.1
Further, the examination will be conducted and presented from a theoretical as
well as pragmatic frame of reference.2 As well, a series of sidebar discussions
and several progressive case examples will be provided during the course of
presentation so as to reinforce certain aspects and features of the instructive
content.
At the onset of six sigma in 1985, this writer was working as an engineer
at the Government Electronics Group of Motorola. By chance connection,
this practitioner linked up with another engineer by the name of Bill Smith
(originator of the six sigma concept in 1984). At that time, Bill’s proposition
was eloquently simple. He suggested that Motorola should require 50 percent
design margins for all of its key product performance specifications. When
considering the performance tolerance of any given critical design feature, he
believed that a conventional 25 percent “cushion” was not sufficient for

1
Based on application experiences, Mr. William “Bill” Smith proposed the 1.5σ shift factor more than 18
years ago (as a compensatory measure for use in certain reliability and engineering analyses). At that
time, the author of this book conducted several theoretical studies into its validity and judiciously
examined its applicability to design and process work. The generalized application components were
subsequently published in several works by this author (see bibliography). While serving at Motorola,
this author was kindly asked by Mr. Robert “Bob” Galvin not to publish the underlying theoretical
constructs associated with the shift factor, as such “mystery” helped to keep the idea of six sigma alive.
He explained that such a mystery would help “keep people talking about six sigma in the many hallways
of our company.” To this end, he fully recognized that no matter how valid an initiative may be, if people
stop talking about it, interest will be greatly diminished, or even lost. In this vein, he rightfully believed
that the 1.5σ mystery would motivate further inquiry, discussion and lively debate – keeping the idea
alive as six sigma seated itself within the corporation. For such wisdom and leadership, this author
expresses his deepest gratitude. However, after 18 years, the time has come to reveal the theoretical
basis of six sigma and that of the proverbial 1.5σ shift.
2
At all times, the reader must remain cognizant of the fact that the field of producibility assessment is
relatively new territory in terms of engineering. Because of this, and its enormous scope, the full and
complete articulation of certain details is not possible within the confines of this book. As a
consequence, emphasis is placed on the development of a conceptual understanding at many points in
the discussion. Also recognize that the focus of this book is on the statistical theory and supporting
engineering rationale surrounding the 1.5σ shift factor advocated by the practice of six sigma.
Nonetheless, the discussion is framed in the context of design engineering and producibility analysis.

absorbing a sudden shift in process centering on the order of 1.5σ (relative to
the target value).
Regardless of the exact magnitude of such a disturbance (shock) to the
centering of a critical performance characteristic, those of us working this
issue fully recognized that the initial estimate of process capability will often
erode over time in a “very natural way” – thereby increasing the expected rate
of product defects (when considering a protracted period of production).
Extending beyond this, we concluded that the product defect rate was highly
correlated to the long-term process capability, not the short-term capability.
Of course, such conclusions were predicated on the statistical analysis of
empirical data gathered on a wide array of electronic devices. At that time,
those of us involved in the initial research came to understand that an estimate
of long-term capability is mostly influenced by two primary contributors – the
extent of instantaneous reproducibility and the extent of process centering
error. In essence, we began to see the pragmatic connection between design
margin, process capability, defects and field reliability.
It must be remembered that Bill’s initial assertions (prior to such
research) were seemingly extremist in nature, at least during that period of
time. Although the ideas had strong intuitive appeal, many design engineers
experienced a very high degree of skepticism, to say the least. The notion of
using a 50 percent margin on both sides of a bilateral performance
requirement (for key design features only) was certainly outside the bounds of
conventional wisdom. Again, at that point in time, conventional engineering
practice advocated a 25 percent design margin for most applications.
Moreover, the unconventional practice of imposing a 1.5σ shift on all of the
critical performance features of a design (so as to test certain system-level
producibility and performance assumptions) did seem somewhat bizarre, even
to very liberal engineers of the time. However, Bill’s many years of
successful manufacturing and engineering experience gave just cause for our
attention – thus meriting further inquiry by this researcher and practitioner.

After several more highly interactive discussions with Bill, this
researcher established that his assertions were well grounded and
quite rational – from an engineering and statistical point of view. Essentially,
he was saying that the design margins affixed to certain key design features
(often called CTQs) should be bilaterally increased from 25 percent to 50
percent so as to compensate for the aggregate effect of “normal” process
perturbations that would otherwise not be accounted for over a relatively short
period of sampling. As this research would later validate, such errors are
inherently and progressively manifested in the form of an enlarged standard
deviation, therein expanding or otherwise inflating the performance
distribution. Although not a part of the initial discussions with Bill, it became
all too apparent that he was indirectly attempting to account for long-term
sources of random sampling error (on the order of 1.5σ).
Such temporal error can sometimes manifest itself in a variety of ways,
among which is dynamic momentary shifts in process centering. Naturally,
such shifting is inevitably encountered during the course of production (as can
be readily verified by statistical sampling over protracted periods of time).3
Along these lines, we reasoned that such process behaviors could be
anticipated and subsequently compensated for early on in the design process.
By doing so, the observed in-process dpu could be lowered (by design) to
such an extent that the overall product reliability (MTBF) would be
significantly enhanced. From a design and reliability point of view, his
assertion was well taken.
From this perspective, Bill’s arguments were persuasive. He argued
that, as a result of increasing design margins, the “instantaneous failure rate”
of the product would naturally improve. Also, the need for “in-process
testing” could be significantly reduced. In addition, there would be huge

3
Although a particular set of subgroup-to-subgroup centering errors may be classified as “random,” their
individual existence is nonetheless unique and real. Just because the subgroup-to-subgroup variation is
random does not preclude the resulting units of product from being different. In short, the subgroup
differences (in terms of centering) may be statistically insignificant (owing to random sampling error), but
the difference is real – in an absolute sense, no matter how small or large. Owing to this, it is rational to
assert that such variation would induce unit-to-unit differences in reliability.

benefits associated with the reduction of “burn in” time. Furthermore, the
resulting decrease in dpu (by way of increased design margins) would
virtually eliminate the need for in-line test and inspection, thereby reducing
production costs, not to mention the implications on reducing warranty costs,
work in process and process cycle-time. From all of this, he proposed a huge
economic benefit to Motorola and more satisfied customers.
As a member of the engineering community, this researcher found Bill’s
ideas about quality, applied engineering and production management most
intriguing, even though at the time his ideas were mostly undefined and fully
undefended from a statistical point-of-view. In fact, much of the related
literature available in the mid-eighties often posed contrary arguments to
much of Bill’s thinking. To justify any professional use of these concepts,
this practitioner needed to see the statistical architecture and analytical
building blocks that would support his core of reasoning – subsequently
validated with “real world” data. Following one discussion along these lines,
Bill asked this writer if he could investigate the matter from a statistical point
of view. Little did this author know (at that time), a simple “look into the
matter” would trigger an 18-year quest for “enlightenment.”
Over the years to come, this researcher and practitioner enjoyed many
discoveries, among which was the mathematical basis and necessary empirical
evidence to support several of Bill’s original precepts. In fact, this quest
ultimately impacted this writer’s views about statistics, quality, engineering
and how a business should be organized and run. The resulting pursuit led
this investigator to invent and subsequently disseminate such contributions as
the Breakthrough Strategy® (DMAIC); the black belt concept, terminology
and infrastructure; the plan-train-apply-review (PTAR) cycle of learning; the
idea of Cp* and Cpk* (now known as Pp and Ppk). Suffice it to say, this
author has had a few epiphanies along the way, and his career has never been
the same since.4

1.2 Adopting the Principles


For purposes of simplified knowledge transfer and meaningful
communication, this researcher (and several other founding agents of six
sigma) decided the idea of a shifted distribution would have far more
cognitive appeal within the general workforce (at Motorola) than the idea of
an “inflated” short-term sampling standard deviation (used principally for
purposes of design analysis). Underlying this assertion was the general belief
that most process workers can readily understand that a process will naturally
“shift and drift” over time. However, to assert that the process standard
deviation might dynamically expand over time required additional time-
consuming explanation.
Of course, to yield a meaningful discussion, such explanation had to be
received by someone with the necessary prerequisite statistical knowledge.
Invariably, when given this prerequisite training, participants did not perceive
the sometimes-voluminous statistical details as “value-added.” As a
consequence, many application opportunities were sidestepped and their
potential beneficial effects were forever lost, simply because they perceived
the base of training as too “complicated” and “technical.”
In short, the idea of an expanding and contracting standard deviation
was found to be outside the realm of “common sense reasoning” without the
provision of statistical instruction. However, the idea of a “shift correction”
carried high appeal and inevitably promoted lively and meaningful discussion

4
With this as a backdrop, this writer feels compelled to acknowledge the very fine technical contributions
and enhancements provided over the years by such accomplished engineers as Dr. Thomas Cheek, Dr.
Jack Prins, Dr. Douglas Mader, Dr. Ron Lawson and Mr. Reigle Stewart, just to name a few. During this
author’s years at Motorola, their personal insights and “late at night over a beer, pencil and calculator”
discussions significantly aided in adding to the body of six sigma research. Perhaps most of all, this
writer would like to recognize Mr. Robert “Bob” Galvin. His many words of wisdom, piercing leadership
acumen and personal encouragement provided this scientist the “intellectually-rich and politically-risk-
free” environment from which to reach out and question conventional thinking. Only with his support
were the beginnings of this investigator’s journey made possible (and relatively painless). He is truly an
icon of leadership and a solid testament to what can happen when a senior executive embodies and
empowers the “idea of ideas.”

– without the prerequisite education. Therefore, those of us at Motorola
involved in the initial formulation of six sigma (1984-1985) decided to adopt
and support the idea of a “1.5σ equivalent mean shift” as a simplistic (but
effective) way to account for the underlying influence of long-term, random
sampling error. Of course, the shift factor was viewed as a means to facilitate
the creation of sound design specifications, the study of certain types of
reliability problems and the forecasting of sustainable producibility. In classic
engineering style, we further decided that our optimization efforts should be
wrapped around the idea of “worst case” analysis, but only in a statistical
sense.

2.0 Extending the Context


2.1 Defining the root
At the very heart of six sigma is the idea of determinism. As most
would likely agree, the foundation of this idea is scientific in nature and
advocates that every existing phenomenon is caused by another existing
phenomenon or phenomena. For example, we know that “answers” stem from
“questions.” In turn, we can say that a “question” is the result of “thinking.”
Thus, many believe that our daily reality can be defined by a series of
intersecting cause-and-effect relationships. Of course, we can influence some
of these causative chains, while others are outside our span of control.
Nonetheless, we strive to discover, understand and harness the power of
determinism.
So as to simplify this idea, we set forth the notion that Y = f (X), where
Y is a certain dependent variable, X is an explanatory variable of an
independent causative nature, and f is the function that relates or otherwise
associates Y to X.5 For all intents and purposes, this relation tells us that the
performance of Y can only be made fully known when the performance of X

5
Given the model Y = f (X), it should be recognized that the function can be of a linear or nonlinear form.
For a linear transfer function f, we would rightfully expect that any given incremental change in X would
necessarily induce a corresponding and incremental change in Y. Given the same increment of change
in X, a nonlinear function would induce a disproportional change in Y.

is fully known – given that the function f is valid and reliable. Consequently,
it can be said that (at any moment in time) the output variable Y is
conditioned by the input variable X.
If for some reason X is not fully explanatory or causative, then we must
assert that Y = f (X) + ε, where ε constitutes the extent or degree of
uncertainty in the forecast of Y. It is from this perspective that the scientific
notion of error is made synonymous with the idea of uncertainty. In this
context, an error (per se) is not related to the phrase “to blunder.” Rather, it is
related to the scientific understanding, which is “to deviate or be different
from.”
In this context, we understand that uncertainty (risk) is constituted or
otherwise manifested by any type or form of variation in Y or X. Owing to
this, the concept of variation is also made germane to the idea of
reproducibility and repeatability, both of which are related to the concept of
replication error.6 For example, let us contrast an arbitrary performance
observation of Y to its model condition ζ. If the observation Yi is not equal to
ζ, it can be said that the observation varies (deviates) from the model
(expected) condition such that |δi| > 0.7
In general, ζ can assume the form of a nominal specification (such as T),
a measure of central tendency (such as µ), or even some other case of Y (such
as Yk). To illustrate, consider the difference δi = Yi – µ. In this instance, we
recognize the error δi to be a particular “mean deviation.” Given this
understanding, we might then seek to characterize the aggregate set of
deviations in terms of its specific magnitude, vector, dwell, and timing. Of
course, the outcome of such a characterization study would allow us to better

6
Generally speaking, such variation can be of the random or nonrandom variety. Random variation is
also referred to as “white noise,” whereas nonrandom variation is referenced as “black noise.”
7
As may be apparent, such a deviation from expectation could be the result of a random or nonrandom
effect (as the case may be). Of course, the discovery, classification, and subsequent study of such
effects are of central concern to the field of mathematical statistics.

define or otherwise describe the underlying system of causation.8 Only in this
manner can Y be scientifically linked to X. It should go without saying that
the progressive classification of error lies at the heart of modern problem
solving and the practice of six sigma.
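
To render the notation above in concrete terms, the brief sketch below simulates the
relation Y = f (X) + ε and then reports a handful of mean deviations δi = Yi – µ; the linear
form chosen for f and every numeric setting are arbitrary assumptions intended only to make
the idea tangible.

    # Illustrative sketch of Y = f(X) + eps and the mean deviation delta_i = Y_i - mu.
    # The linear transfer function and all numeric settings are arbitrary assumptions.
    import numpy as np

    rng = np.random.default_rng(1)

    def f(x):
        return 2.0 * x + 5.0          # assumed (linear) transfer function

    x = rng.normal(10.0, 1.0, 500)    # independent (causative) variable X
    eps = rng.normal(0.0, 0.5, 500)   # unexplained random error
    y = f(x) + eps                    # dependent variable Y

    mu = y.mean()                     # model (expected) condition, here the mean of Y
    delta = y - mu                    # deviations: sign gives vector, size gives magnitude
    print(delta[:5].round(3), round(float(np.abs(delta).mean()), 3))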

2.2 Expanding the function


Unfortunately, most types of phenomena in nature are not so simple
that they can be adequately or sufficiently described by the influence of a
single independent variable. In fact, virtually all such mono-variable cases
would reveal that ε > 0, at least to some extent. To fully eliminate such
uncertainty (error), it would be necessary to isolate all of the other causative
agents (independent variables). Following this, it would be crucial to examine
and subsequently characterize their independent and interactive effects –
instantaneously and longitudinally. Only when such effects are made known
or rationally postulated can it be said that Y = f ( X1 , … , XN ). As before, we
recognize Y as the dependent variable, f as the transfer function (mechanism
of causation), X as an independent variable, and N as the last possible X.
When all of the Xs have been accounted for or otherwise defined in a
valid and reliable manner, the resulting set of independent variables would be
fully comprehensive, or “exhaustive” as some would say. This means that all
possible independent variables (of a causative nature) are present or otherwise
accounted for. Expressed more succinctly, we would logically assert that as
the quantity N approaches its natural upper limit, the inherent error in Y
would necessarily approach zero.9

8
Of interest, most errors can be classified into one of four broad categories: 1) random transient; 2)
nonrandom transient; 3) random temporal; and 4) nonrandom temporal. While transient errors are
relatively instantaneous in nature, temporal errors require time to be fully created or otherwise
manifested. Without saying, random errors cannot be predicted or otherwise forecast (in a statistical
sense) whereas nonrandom errors can be. In this context, random errors do not have an “assignable
cause,” but the occurrence of nonrandom errors can be assigned. This is to say that nonrandom errors
can be directly attributed to the influence of one or more independent variables or some interactive
combination thereof.
9
This theoretical understanding naturally assumes that the partial derivatives associated with the
contributing Xs have been rank ordered in terms of influence and then subjected to the transformative
process f. Under this assumption, the residual error will decrease as the accumulation of influence
increases. Of course, the inverse of this is also true.

However, in practice, it is often not possible to fully establish the
function that operationally connects Y to its corresponding set of Xs. In such
cases, it would be very rare to find that N is fully exhaustive and that the
function f is absolutely valid and reliable.10 Owing to this, we innately
acknowledge the presence of error in our statement of Y, at least to some
statistical or pragmatic extent.11 Thus, we must modify the aforementioned
relation and subsequently proclaim that Y = f ( X1 , … , XN ) + ε. Here again,
ε constitutes the extent or degree of uncertainty (error) that is present in our
forecast of Y.12 Only when given a valid and reliable transfer function f and
an exhaustive set of preconditioned independent causative variables is it
rationally possible to declare that ε = 0. Consequently, we are most often
forced to grapple with the case ε > 0. Hence, the ever-present need for
mathematical statistics.
With respect to any dependent variable Y, each X within the
corresponding system of causation exerts a unique and contributory influence
(W). Of course, the weight of any given X is provided in the range
0.0 < Wi < 1.0, where Wi is the contributory weight of the ith independent variable.13
10
To this end, a statistical experiment is often designed and executed. Such experiments are intended to
efficiently isolate the underlying variable effects that have an undue effect on the mean and variance of
Y. As a part of such an exercise, a polynomial equation is frequently developed so as to interrelate or
otherwise associate Y to the “X effects” that prove to be of statistical and practical concern. In such
cases, it is not feasible to isolate the exhaustive set of Xs and all of their independent and interactive
effects. In other words, it would not make pragmatic or economic sense to attempt a full explanation or
accounting of the observed behavior in Y. Consequently, we observe that ε > 0 and conclude that the
given set of causative variables is not exhaustive.
11
For the moment, let us postulate that N is exhaustive. As any given X is made to vary, we would
naturally observe some corresponding variation in Y, subject only to the mechanistic nature of the
function f. Of course, such variation (in Y and X) is also referred to as “error.” Thus, the function f is
able to “transmit” the error from X to Y. If the errors assignable to X are independent and random, the
corresponding errors in Y will likewise be independent and random. Naturally, the inverse of this would
be true – nonrandom error in X would transmit to Y in the form of nonrandom error. From a more
technical perspective, it can be said that any form of autocorrelated error in X would necessarily
transmit to Y in a consistent and predictable fashion – to some extent, depending on the function f. In
any such event, it is quite possible that a particular “blend” of nonrandom input variation could be
transmitted through the given function in such a way that the output variation would not exhibit any
outward signs of autocorrelation (for any given lag condition). Since Y would exhibit all the statistical
signs of random behavior, it would be easy to falsely conclude that the underlying system of causation
is non-deterministic.
12
Uncertainty is often manifested when: a) one or more causative variables are not effectively contained
within the composite set of such variables; b) the transfer function f is not fully valid or reliable; c) one or
more of the causative (independent) variables has undergone a momentary or temporal change of
state; d) two or more of the causative (independent) variables are somehow made interactive,
instantaneously or longitudinally; or e) some combination thereof.

Given this knowledge, the adequacy and sufficiency of f, as well as the
declaration that N is exhaustive, it would then be reasonable to assert that Y
and its corresponding set of Xs can be fully characterized without error. In
other words, there would be no “error” inherent to our characterization of Y –
owing to the inclusion of all possible variables operating in the light of a valid
and fully reliable transfer function f. This is to say that, for any unique set of
momentary or longitudinal conditions, it would be possible to forecast or
otherwise characterize the nature of Y with 100 percent certainty.
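
As a concrete companion to this argument, the hedged sketch below builds an assumed linear
system Y = f (X1, …, XN) and shows the residual error shrinking toward zero as more of the
weight-ordered Xs are admitted; the weights (a deliberately "Pareto" pattern) and the linear
form are illustrative assumptions only.

    # Hedged sketch: residual error in Y approaches zero as the set of Xs approaches
    # exhaustiveness. The linear form and the contributory weights are assumptions.
    import numpy as np

    rng = np.random.default_rng(7)
    n_obs, n_x = 1000, 6
    weights = np.array([0.50, 0.25, 0.12, 0.08, 0.04, 0.01])  # "vital few, trivial many"
    X = rng.normal(0.0, 1.0, (n_obs, n_x))
    Y = X @ weights                                           # exhaustive system: eps = 0

    for k in range(1, n_x + 1):
        resid = Y - X[:, :k] @ weights[:k]      # model using only the first k variables
        print(k, round(float(resid.std()), 4))  # error falls toward zero as k approaches N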

2.3 Describing the interaction


With the same form of reasoning, we must also concede that the
resulting influence of any given interactive combination of Xs will likely
change over time – owing to the instantaneous and longitudinal states that are
naturally manifest to each X. For example, let us suppose that Xi and Xj have
the potential to be interactive when both variables dwell on the high side of
their respective performance scales, but the interactive effect is not nearly as
strong when both variables dwell on their low side.
We will further postulate that when both variables are operating near
their central condition, they are no longer interactive, per se. Based on this set
of circumstances, it is easy to understand how it is possible that their joint
influence ( βij XiXj ) might radically change over time – owing to a change of
state in Xi and Xj respectively. Thus, we conclude that replication error is
often quite circumstantial by nature. For example, many types and forms of
variable interactions are dependent upon the sustained coincidence of certain
respective frequencies and amplitudes among the independent variables.
Given such reasoning, we theoretically recognize that the progressive
behavior of any given X can be described in terms of frequency and
13
For virtually any relatively complex system of causation, it is widely accepted that a small number of the
Xs will generally account for a majority of the total weight. This is often referred to as the “Pareto” effect
– the case where most of the influence emanates from the “vital few” variables versus the “trivial many.”

amplitude. By the laws of nature, there would exist a hierarchical progression
of causation that would continue through the Zth level, where Z could
represent infinity.
In this context, every X is a contributor to, or the resultant of, some other
X – in some way, shape, or form. Hence, the declaration of a Y variable
provides us with an indirect reference to one of the infinite steps on the
“staircase of causation.” Here again, we are reminded that everything is
relative. Only at the Zth level of a given causative system would each X
exhibit a distinct and perfectly predictable pattern of behavior. This is to say
that, at the lowest possible level of causation, the pattern of each X would be
stable in terms of its operating frequency and amplitude.
From this perspective, it should be relatively easy to understand how the
instantaneous or longitudinal effect of interactive variables could induce the
illusion of random behavior in Y, even though the unique operating frequency
and amplitude of each X is deterministic. From another angle, it is possible
that each X (associated with a complex system of causation) could exhibit a
high level of autocorrelation (for a lag 1 condition), but the dependent variable
Y might not exhibit such autocorrelation (for any given lag condition).
Naturally, such a phenomenon results from the blending of the many
instantaneous and longitudinal effects stemming from the underlying system
of causation.
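
The hedged sketch below illustrates this blending effect under assumed settings: several Xs,
each strongly autocorrelated at lag 1, are summed into a Y whose lag-1 autocorrelation is
markedly weaker. The AR(1) structure and the coefficients are assumptions made purely for
demonstration.

    # Hedged sketch: individually patterned (autocorrelated) Xs can blend into a Y
    # whose autocorrelation is much weaker. Structure and coefficients are assumed.
    import numpy as np

    rng = np.random.default_rng(3)

    def ar1(phi, n=2000):
        x = np.zeros(n)
        for t in range(1, n):
            x[t] = phi * x[t - 1] + rng.normal()   # first-order autoregressive series
        return x

    def lag1_corr(series):
        return float(np.corrcoef(series[:-1], series[1:])[0, 1])

    xs = [ar1(phi) for phi in (0.9, -0.85, 0.8, -0.75)]   # each X patterned at lag 1
    y = sum(xs)                                           # blended dependent variable

    print([round(lag1_corr(x), 2) for x in xs])   # large magnitudes for each X
    print(round(lag1_corr(y), 2))                 # much weaker for the blended Y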

2.4 Rationalizing the sample

As many practitioners of process improvement already know, it is often
the case that the influence of certain background effects must be significantly
reduced or eliminated so as to render a “statistically valid” estimate of process
capability. Of course, this goal is often achieved or greatly facilitated by the
deliberate and conscientious design of a sampling plan.
Through such a plan, the influence of certain variables and related
effects can be effectively and efficiently “blocked” or otherwise neutralized.
For example, it is possible to block the first-order effect of an independent
variable by controlling its operative condition to a specific level. When this
principle is linked to certain analytical tools, it is fully possible to ensure that
one or more causative variables do not “contaminate” or otherwise unduly
bias the extent of natural error inherent to the response characteristic under
consideration.
As a given sampling strategy is able to concurrently block the influence
of more and more independent variables, the response replication error is
progressively attenuated. In other words, as the influence of each independent
variable is progressively blocked, it is theoretically possible to eventually
reach a point where it is not possible to observe any type, form, or magnitude
of replication error. At this point, only one measurement value could be
realized during the course of sampling. In short, the system of classification
would be so stringent (as prescribed by the sampling plan) that no more than
one observation would be possible at any given moment in time.
Should such a sampling plan be invoked, the same response
measurement would be observed upon each cycle of the process – over and
over again it would be the same measurement – the replication error would be
zero (assuming a fully valid and reliable system of measurement). However,
for any given sampling circumstance, there does exist a theoretical
combination of blocking variables and corresponding control settings that will
allow only random errors to be made observable. Even so, the pragmatic
pursuit of such an idealized combination would be considered highly
infeasible or impractical, to say the least. For this reason, we simply elect to
block on the variable called “time.” In this manner, we are able to indirectly
and artificially “scale” the system of blocking to such an extent that only
random errors are made known or measurable.
If the window of time is made too small, the terminal estimate of pure
error (extent of random variation) is underestimated, owing to the forced
exclusion of too many variable effects of a random nature. On the other hand,
if the window of time is too large, the terminal estimate of pure error (extent
of random variation) is overestimated, owing to the natural inclusion of
nonrandom variable effects. However, by the age-old method of trial and
error, it is possible to define a window size (sampling time frame) that
captures the “true” magnitude of background variations (white noise) but yet
necessarily precludes nonrandom sources of error from joining the mix.
In short, it is pragmatically feasible to discover a sampling interval (in
terms of time) that will capture the full extent of white noise while preserving
the primary “signal effect.” Only when this has been rationally and reasonably
accomplished can the instantaneous and longitudinal reproducibility of a
performance characteristic be established in a valid manner. Such a sampling
plan is also called a rational sampling strategy, as execution of the plan
rationally (sensibly and judiciously) partitions the array of “signal effects”
from the mix of indiscernible background noises. In this sense, a rational
sampling strategy can effectively and efficiently preserve the array of signal
effects, while concurrently capturing the full extent of random error.
From this perspective, it is easy to understand why the idea of rational
subgrouping is so important when attempting to estimate the short-term
standard deviation of a response characteristic (CTQ). Only when the
“signal” effects are removed from the total mix of variations can the
instantaneous (short-term) reproducibility be made known in a statistically
valid way.
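
A hedged numerical sketch of the idea follows: a process is simulated with random
within-group noise plus subgroup-to-subgroup centering error, and the pooled within-subgroup
(short-term) standard deviation is contrasted with the overall (long-term) value. The
subgroup size, the magnitude of the centering error, and the noise level are all assumptions
chosen for illustration.

    # Hedged sketch of rational subgrouping: pooled within-subgroup ("short-term")
    # variation versus total ("long-term") variation. All settings are assumptions.
    import numpy as np

    rng = np.random.default_rng(11)
    groups, n = 50, 5                                # 50 rational subgroups of size 5
    centers = 10.0 + rng.normal(0.0, 0.8, groups)    # subgroup-to-subgroup centering error
    data = np.array([rng.normal(c, 1.0, n) for c in centers])

    sigma_st = float(np.sqrt(np.mean(data.var(axis=1, ddof=1))))  # pooled within groups
    sigma_lt = float(data.std(ddof=1))                            # across all observations
    print(round(sigma_st, 3), round(sigma_lt, 3))    # long-term exceeds short-term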

2.5 Detecting the error


The idea of error is certainly not a new concept – by any stretch of the
imagination. We naturally recognize the existence of error whenever there is
a departure from some type of model expectation, regardless of the
magnitude, direction, dwell, or timing of that departure. To illustrate, let us
suppose that a certain performance variable can be fully described as Y ~
NID(µ, σST), where µ is the distribution mean and σST is the short-term standard deviation.
If a single member of such a population is arbitrarily (randomly) selected, but
its instantaneous performance cannot be immediately assessed or otherwise
made known, the momentary expectation (best guess) would be µ. Under
these circumstances, the odds of correctly guessing (estimating) the true value
of Y are maximized since 50 percent of the values lie above and below µ.
Should we discover a difference between a value and the corresponding
model expectation, then such a differential would be referred to as a “mean
deviation.” In the context of our discussion, we would observe | Yi – µ | > 0.
Given this condition, we would necessarily declare an error in our estimate of
Y. When such errors are amalgamated and then subsequently summarized in
the form of a standard deviation σ, the resulting index only reflects the
perturbing influences that would have been present during the interval of
observation. If the period of observation (duration of sampling) is relatively
short, it is rational to assert that not all sources of potential error would be
accounted for or otherwise represented by the given standard deviation. In
other words, it can be said that as the period of observation approaches its
natural upper limit, the likelihood of detecting or otherwise “trapping” all
possible sources of error approaches 100 percent.
Conversely, as the period of observation (time) approaches the natural
limit of zero, there exists a point at which it would no longer be possible to
make more than one uniquely independent observation. Obviously, under
such a condition, the underlying system of causation would be virtually
invariant. Consequently, it would not be possible to identify and subsequently
authenticate (validate) any given source of variation related to the dependent
variable Y.
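
By way of a hedged illustration of this point, the sketch below estimates the standard
deviation of a slowly drifting process over observation windows of increasing length; the
drift model and every setting are assumptions, intended only to show the estimate growing
toward its long-term value as the window widens.

    # Hedged sketch: a short observation window "traps" fewer sources of error, so the
    # estimated standard deviation grows as the window widens. Settings are assumed.
    import numpy as np

    rng = np.random.default_rng(5)
    t = np.arange(5000)
    drift = 1.5 * np.sin(t / 400.0)                   # slow temporal disturbance
    y = 10.0 + drift + rng.normal(0.0, 1.0, t.size)   # observed performance variable

    for window in (25, 100, 500, 5000):
        print(window, round(float(y[:window].std(ddof=1)), 3))  # grows toward long-term value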

2.6 Classifying the error


Holistically speaking, the exact nature and magnitude of any given
replication error is fully determined by the net effect of many ever-changing
conditions and circumstances within the underlying system of causation.
Globally speaking, such errors can be classified into two distinct categories.
The first category is called “random error,” while the second is referenced as
“nonrandom error.” We fully recognize that random error (white noise) is
unpredictable in a mathematical sense.14 However, nonrandom error (black
noise) is often found to be predictable – at least to some extent greater than
zero.15 We also must concede that white noise is due to unassignable
(untraceable) causes, while black noise can be attributed to assignable causes
(those sources of causation that can be made accountable or traceable).
Pertaining to both classifications of error (random and nonrandom), we must
fully consider two discrete types of effects. The first type is called a
“transient effect” while the second is referred to as a “temporal effect.”
To illustrate the nature of a transient effect, consider a dependent
variable Y that has just experienced a momentary oscillation, brief
disturbance, or instantaneous shock – much like a sudden pulse or surge of
energy. Of course, such an effect can be due to the sudden influence of one or
more random or nonrandom forces within the underlying system of
causation.16 It is also understood that the dwell of such an effect is relatively

14
In many cases, the nature of such error is often so complex, compounded, and confounded that existing
analytical technologies do not have the “diagnostic power” to discern or otherwise “source trace” its
independent origins through the many chains of causation. When it is not pragmatically feasible or
economically sensible to “track down” the primary sources of variation, we simply declare (assume) that
each individual error constitutes an “anomaly.” For any given anomaly, the circumstantial state of the
underlying cause system is momentarily declared to be “indeterminate” and, as a consequence, the
perturbation is treated as if it emanated from a system of random causes.
15
The momentary or longitudinal blending (mix) of many independent variables (each with a unique
weighting) can effectively “mask” the presence of a nonrandom signal condition inherent to the
dependent variable Y. As may be apparent, this would create the illusion of random variation (with
respect to Y). However, as the sources of variation (Xs) are progressively blocked or otherwise
neutralized (by virtue of a rational sampling scheme coupled with the appropriate analytical tools), the
dominant signal conditions would then be discernable from the white noise. When such a signal
condition is detected, the composite (total) variation would no longer be considered fully random. In
other words, as the background variations are minimized, the likelihood of detecting some type or form
of underlying signal increases. From a purely classical point-of-view, some would assert that nothing in
nature happens by chance (everything is theoretically deterministic). In other words, everything moves
in some form of trend, shift, or cycle. Holding this as an axiom, it would then be reasonable to assert
that Y is always perfectly predictable (theoretically speaking), regardless of how complex or
sophisticated the underlying system of causation may be. Accepting that Y = f (X1, … , XN) and given
that the influence of all variables is effectively eliminated except that of XK, then Y would necessarily
exhibit the same behavior as XK (momentarily and longitudinally). Thus, it can be theoretically argued
that any collective set of independent variables, each having a unique signal effect of a nonrandom
nature, can be momentarily or longitudinally blended or otherwise mixed in such a manner so as to form
a seemingly nondeterministic system. When this type of condition is at hand, it is often far more
convenient (for purposes of analysis) to assume a random model than it is to progress under the
constraints of a nonrandom model.
16
As an independent agent, any given source of variation has the capacity and capability to induce a
transient or temporal effect (error). However, when two or more such forces work in unison (at certain
operational settings), it is often possible to form an effect that is larger than the simple sum of their
independent contributions. In general, as the number of independent contributory forces increases, it
becomes less likely that the resulting effect (error) can be dissected or otherwise decomposed for
independent consideration and analysis. Consequently, higher order interactions are often treated as a
random effect when, in reality, that effect is comprised of several deterministic causes.

instantaneous and, as a consequence, is not sustained over time. Naturally, a
transient effect can be periodic or sporadic. However, when the timing of
such an effect is sporadic (random), it is often referred to as an “anomaly.”
Although the magnitude and direction of a transient effect can be random or
nonrandom in nature, its timing and dwell are generally found to be
unpredictable. From a statistical perspective, the magnitude of such effects
can often be made known by progressively tracking the general quantity δ = Y
– µ. Of course, the direction of effect can often be established by noting the
vector of δ. In other words, the sign of a deviation (positive or negative)
reports on the direction of effect.
The basic nature of a temporal effect is time-dependent. In other words,
a temporal effect requires the passage of time before its influence can be fully
manifested and subsequently detected. Of course, a temporal effect can be of
the random or nonrandom variety. For example, suppose a random normal
independent variable experiences a temporary interaction with another such
variable. It is quite possible that the outcome of such a phenomenon would be
manifested in the form of a “performance dwell,” where the period of rise or
fall in the dependent variable is sustained for a moderate period of time within
the system of causation.
From a process control perspective, such a condition could constitute a
“temporal shift” in the signal condition of the performance characteristic. By
nature, this type of shift would exhibit a particular magnitude and vector –
both of which would be fully attributable to the two-variable interaction
within the system of causation. However, the timing and dwell may be fully
nondeterministic (random). Moreover, the overall time-series pattern may or
may not exhibit a “statistically significant” autocorrelation.17 Consequently, it

17
To better understand the idea of autocorrelation, let us consider a set of time-series data. Given this,
we say that the data are “sequentially realized over time.” First, let us consider a lag one condition. For
this condition, it can be said that any given error cannot be used to statistically forecast the next
observed error. For a lag two condition, the error from the two previous time periods cannot be used
(individually or collectively) to forecast the next observed error. Of course, this line of reasoning would
apply for all possible lag conditions. If no statistical correlation is observed over each of the possible
lags, it would then be reasonable to assert that the data is not patterned (the data would be free of any
discernable trends, shifts, or cycles).

would be extremely difficult (if not impossible), or at least generally
impractical, to undertake a comprehensive characterization (classification) of
the composite variations (errors).18
In light of such considerations, we seek to employ various types and
forms of rational sampling strategies so as to ensure the random transient
effects are reflected or otherwise trapped within sampling groups and the
temporal effects are duly reflected or otherwise accounted for between
sampling groups. Given that the sampling strategy is adequately and
sufficiently “blocked,” the primary signal effects will then be forced into their
respective blocks. As a continuous and progressive strategy, the goal of
rational sampling is fairly straightforward – separate the random and
nonrandom errors so they may be categorically analyzed, independently
compared, and statistically contrasted. Of course, there exist various types of
statistical tools to facilitate this aim.
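
The hedged sketch below illustrates such a partitioning: a simulated series contains white
noise, one sporadic transient spike, and one sustained temporal shift, and simple subgroup
statistics are used to separate the within-group (transient) behavior from the between-group
(temporal) behavior. Every numeric setting is an assumption made for demonstration only.

    # Hedged sketch: trapping transient effects within subgroups while exposing a
    # temporal (between-group) shift. All numeric settings are assumptions.
    import numpy as np

    rng = np.random.default_rng(13)
    groups, n = 40, 5
    data = rng.normal(50.0, 1.0, (groups, n))   # white noise (random transient error)
    data[7, 2] += 6.0                           # one sporadic transient spike (anomaly)
    data[25:, :] += 2.0                         # sustained temporal shift from group 25 onward

    within_sd = float(np.sqrt(np.mean(data.var(axis=1, ddof=1))))  # within-group variation
    group_means = data.mean(axis=1)                                # carries the temporal signal
    print(round(within_sd, 3))
    print(group_means.round(2)[:10])    # early subgroup centers
    print(group_means.round(2)[-10:])   # later subgroup centers reveal the shift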

2.7 Declaring the opportunity


At the inception of six sigma, the issue of “opportunity counting” was a
source of heated debate and analytical confusion – often centered on the
criteria that constitute an opportunity. Simply stated, an opportunity is merely
a set of conditions favorable to some end. In view of the two possible fates of
a CTQ – success or failure – we have the idea of a “yield opportunity” and
that of a “defect opportunity.” Since one is merely the flip side of the other

18
From the dictionary, it should be noted the word “temporal” is taken to mean “of or related to time.” Of
course, this definition could be applied to a short-term or long-term effect. However, for purposes of this
book and six sigma work, we naturally apply its meaning in a long-term sense. For example, when
characterizing a performance variable, we often seek to accomplish two things. First, we attempt to
isolate the short-term influence of random, “transient” effects (instantaneous errors). In general,
transient errors usually prove to be of the random variety. Second, we isolate those factors that require
the passage of time before their unique character can be fully identified or otherwise assessed. Such
errors are time-dependent and, as a consequence, are often referred to as “temporal errors.” From this
perspective, it is easy to understand how the collective influence of transient effects can govern the
short-term capability (instantaneous reproducibility) of a process. Given this, it is now easy to reason
how the total set of temporal effects (coupled with the aggregate transient effects) determines the long-
term capability (sustainable reproducibility) of a process. Again, we must take notice of the fact that any given
transient or temporal effect can be of the random or nonrandom variety. However, as previously stated,
transient effects most generally induce a random influence whereas temporal effects are generally
manifested as both.

(as they are mutually exclusive), we choose most frequently to use the form
“defect opportunity” in recognition of certain quality conventions.19
From an industrial or commercial perspective, the “set of conditions”
just mentioned can be associated with a set of performance standards. For
example, we can offer a set of performance standards in the form often given
as LSL < T < USL. In this form, a “potential opportunity” can be fully
described by the relation Op = f (LSL, T, USL), where Op is the potential
opportunity, LSL is the lower specification limit, T is the target value
(nominal specification) and USL is the upper specification limit. In addition,
the operational condition of the corresponding process distribution (defined by
the parameters µ and σ) must be made known or rationally estimated, and
then “mated” or otherwise contrasted to the performance specifications so as
to place the opportunity in a kinetic state.
Thus, a “kinetic opportunity” can be fully prescribed by the simple
relation Ok = f (LSL, T, USL, µ, σ), where µ is the corresponding process
mean, and σ is the standard deviation of the corresponding process.20
Essentially, this relation implies that a kinetic opportunity can only be created
when these five key factors are mechanistically interacted or otherwise
interrelated in real time and space. From a different angle, we can say that a
kinetic opportunity can be brought forth into real time and space only when
the performance specification of a given design feature is married or
otherwise operationally mated to its corresponding process capability (process
distribution). Only then can a probability of success or failure be rationally
estimated, declared or consequentially established.

19
The reader should recognize that the idea of “error” and that of a “defect” are closely related, but not
necessarily synonymous. For example, let us postulate the marriage of a certain process to a
symmetrical bilateral specification, such that µ = T. In addition, we will also postulate the existence of a
particular error described as δi = Yi - µy. In this case, the deviation δi is fully recognized as an error, but
its vectored magnitude may not be large enough to constitute a defect (nonconformance to
specification).
20
The reader is again reminded that our discussion is based on the assumption of a random normal
variable.

It should go without saying that if any of the underlying conditions are
fully absent, unknown or not established, a kinetic opportunity cannot be
declared. However, a potential opportunity can be acknowledged. For
example, if the performance specifications USL, T and LSL do in fact exist,
but the corresponding parameters µ and σ are unknown or have not been
empirically estimated and made relational to the specifications, then the
opportunity would only exist in a potential state. As a result, it would not be
possible to estimate the probability of a nonconformance to standard. In other
words, if an opportunity does not kinetically exist in real time and space, it
should not be counted among those that do exist (for reporting purposes).
On the other hand, if the opportunity is kinetic, but the performance is
not regularly assessed or otherwise measured and reported, the opportunity
would be declared as “passive” in nature. Consequently, it should not be
included among those opportunities that are active by nature (regularly
measured and reported). Thus, we have the operational guideline that says: a
defect opportunity should only be “counted” if it is regularly assessed
(measured in terms of conformance to standards) and subsequently reported
for purposes of quality management. This is to say the opportunity must not
only be kinetic, it must be active as well.
Application of this general rule and its underlying precepts will
significantly reduce the spurious practice of denominator management where
such performance metrics as defects-per-million-opportunities (dpmo) are
concerned.21 Given the nature of such quality metrics (like dpmo), the

21
The colorful term denominator management is used to describe the practice of inflating or otherwise
distorting the denominator term of the classic quality metric called defects-per-opportunity. As should
be apparent to the informed practitioner, such a practice is most often applied to effectively mask or
confound the true quality of a product or service. For example, consider a simple printed circuit board
(PCB) that employs through-hole technology. In this case, we will exemplify the soldered connection
between the two leads of a standard carbon resistor and the PCB. Given this, it is understood that each
component lead must be adequately soldered to the PCB at two different but related points (i.e., on the
top-side and bottom-side of the board). For the sake of discussion, let us say that the performance
category called “solder joint pull strength” is the CTQ of concern. Given the nature of this CTQ and
application technology at hand, it should be quite evident that each PCB connection constitutes an
independent opportunity to realize a pull-test failure. In other words, each lead of the resistor
represents a defect opportunity. If one lead of the resistor passes the pull test and the other lead fails
the test, then the defects-per-opportunity metric would be properly presented as dpo = d / o = 1 / 2 =
.50. A more liberal perspective would hold there are four defect opportunities since there would exist

management focus should be on minimizing the numerator, not maximizing
the denominator. Naturally, the practice of denominator management should
be highly discouraged as it does nothing more than thwart judicious attempts
to create “true” quality improvements. Such false reporting not only harms
the producer but the customer as well. Although the practice of denominator
management can create the illusion of quality improvement, such fictitious
gains are inevitably brought to light over time as the lack of correlation
between field performance, reliability and process performance becomes
known.
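
As a minimal arithmetic illustration (using the resistor example from the accompanying footnote), the short Python sketch below shows how the same single defect yields progressively “better” figures as the opportunity count is inflated; only the numerator reflects true quality.

defects = 1                                    # one failed pull test
for opportunities in (2, 4, 8):                # legitimate versus inflated opportunity counts
    dpo = defects / opportunities
    print(opportunities, dpo, dpo * 1_000_000) # dpmo shrinks as the denominator grows
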

2.8 Qualifying the interface


As many practitioners of six sigma know, designing a product or service
is often a tenuous and iterative process, fraught with many uncertainties.22 As
an integral part of such a process, various interventions are made to either
eliminate or reduce various forms of risk. For example, producibility analyses
are commonly undertaken to examine and ultimately enhance the viability of
manufacture. Unfortunately, such efforts frequently miss or fall short of their
aims and intents, often due to the absence of a science-based methodology
that will sufficiently interface a performance specification to its corresponding
process distribution.
To conceptually illustrate such a shortfall, consider the absence or
misconduct of a design-process qualification procedure (DPQ). Without a

four separate solder joints. In this event, the defects-per-opportunity would be wrongfully reported as
dpo = d / o = 1 / 4 = .25. Even more liberal would be the case that advocates six defect opportunities –
four solder joints and two leads. Taken to an extreme, some conniving managers might even try to say
there exist eight defect opportunities – four solder joints, two leads, and two through-holes. In this case,
the product quality would be given as dpo = d / o = 1 / 8 = .125. In this way, management could
inappropriately create a 4X quality improvement by simply changing the “rules of defect accounting.”
Thus, we have improvement by denominator management. To avoid such an error of leadership, we
must recognize that any given unit of product or service will inherently possess “Y” number of critical
failure modes, where each mode has “X” number of active chances. Thus, the total number of defect
opportunities can be described by the general relation O = Σ( Y * X ).
22
For purposes of simplified communication, the author shall define the term “product” to mean any form of
deliverable resulting from a commercial or industrial endeavor or process. In some cases the “product”
may be a process, such as those often encountered in the service sector. In addition, any performance
characteristic that is vital to customer or provider satisfaction will be herein referred to as a “critical-to-
quality characteristic,” or CTQs for short.

statistically valid way to qualify a design (relative to its corresponding
processes), we are unable to confidently establish “interface quality.” In other
words, we cannot speak to the “quality of marriage” that exists between the
design and its corresponding production process. In such instances, it is
possible that the allowable bandwidth of a performance specification does not
adequately “fit” the operational bandwidth of the process. Analogously
speaking, the process owner’s automobile is inconveniently wider than the
designer’s garage door. When assessment of the union between a design and
its process is postponed until initial production is already
underway, we certainly have a formula for disappointment, failure or both, as
reproducibility errors will likely be bountiful.
As most practitioners of six sigma are all too aware, many product
design organizations simply put an idea on paper and then “throw it over the
wall” to see if the configuration is producible or viable in terms of
reproducibility. In some cases, to make such an assessment, the design is
exercised or otherwise tested during a limited production run. Of course, this
type of approach for studying producibility is undertaken to work out or
otherwise resolve any “unanticipated design flaws and process bugs” prior to
full-scale production.
When problems arise, stopgaps are plugged into the process, or the
product design is somehow changed to accommodate the unwanted
intervening circumstances. Needless to say, such a highly reactionary
approach is not a very productive or efficient way of assuring performance
and producibility. But without a scientific process to follow, perhaps the trial-and-error approach serves as well as any.
If the results of a DPQ prove unfavorable, the design (and process) is
subsequently “tweaked” until an acceptable result is obtained. Sometimes, a
substantial redesign is undertaken. Other times, the many “marriages”
embedded within a design prove to be so inconvenient and costly that the
entire product is wiped off the business radar screen.

Nevertheless, an alternative to the test-discover-tweak approach is the
six sigma method. Essentially, the six sigma way is a more scientific
approach that is grounded in mathematical statistics. Of course, the
overriding purpose of a six sigma DPQ is to statistically prescribe, analyze
and validate the marriage between the design and its conjugal processes.
From this perspective, it is easy to see how the quality of such a marriage can
be used to forecast the relative extent to which the value entitlements will be
realized.23
As many already know, there are a variety of existing statistical tools
and methods that are fully capable of characterizing and optimizing the
producibility of a design before the fact, not after. In other words, the intent is
to assure the realization of value entitlements during the process of design, not
during the course of production. Even more importantly, the six sigma method
of analysis (as prescribed in this book) will provide such assurances with a
known degree of statistical risk and confidence.

3.0 Interrogating the Context

3.1 Articulating the goal


Before proceeding with a pervasive discussion that will answer the
driving questions underpinning this book, we should first review several of the
key tenets associated with the statistical idea of six sigma . To enrich this
perspective, let us briefly comment on what six sigma “is” and what it “is
not.” This is important because many newcomers to the world of six sigma do
not fully appreciate the fact that the idea of six sigma originates at the
opportunity level of a deliverable. In this context, the word “deliverable”

23
There is usually a performance expectation for each and every critical feature in a system. Of course,
such specifications and requirements are derived from higher-order negotiations between the customer
and provider about what constitutes “value” in the business relationship. When such value is achieved or
exceeded, even for a single CTQ, we can say that entitlement has been realized. In this sense, value
entitlements are rightful expectations related to the various aspects of product utility, access and worth.
For example, there are three primary physical aspects (expectations) of utility – form, fit, and function. In
terms of access, we have three basic needs – volume, timing and location. With respect to worth, there
exist three fundamental value states – economic, intellectual and emotional.

should be interpreted as a product, service, transaction, event or activity. It is
generally described as “that which the customer seeks to purchase.”
For example, when we refer to a certain deliverable as being “six
sigma,” we do not mean that each unit will contain only 3.4 defects.
Furthermore, we do not mean that only 3.4 units-per-million production units
will contain a defect, as this would imply that (on average) only 1 out of about
every 294,118 units will exhibit a quality infringement of some form or type.
What we do mean is quite simple. For any type of deliverable, each defining
critical-to-quality characteristic (CTQ) will exhibit a 6σ level of
instantaneous reproducibility, or “capability” as some would say.24 However,
this model level of capability is degraded to 3.4 defects-per-million
opportunities (dpmo), owing to certain process variations.25

3.2 Polishing the definition


Perhaps, through an example, we should set forth and interrogate a more
technical definition of six sigma. By doing so, we will be able to gain deeper
insight into its original intent and meaning. To this end, let us consider a
random performance variable (Y) in the context of a symmetrical-bilateral
specification (two sided tolerance with a centered nominal specification).26

24
Holistically speaking, any design feature (or requirement) constitutes a quality characteristic.
Interestingly, such characteristics are also known as “potential defect opportunities.” If a defect
opportunity is vital or otherwise critical to the realization of quality, it is most typically called a critical-to-
quality characteristic and designated as a “CTQ.”
25
Based on this, it is only natural that the defects-per-unit (dpu) will increase as the number of CTQs is
increased, given a constant and uniform level of process capability. As a result of this, the DPU metric is
not a good comparative index for purposes of benchmarking. In other words, DPU should not be used to
compare the inherent quality capability of one deliverable to some other type of deliverable, owing to
differences in complexity. However, by normalizing the DPU to the opportunity level, and then converting
the defect rate to a sigma value (equivalent Z), it is possible to compare apples-to-oranges, if you will.
Only then do we have a level playing field for purposes of benchmarking and for subsequently comparing
dissimilar phenomena.
26
We naturally recognize that a symmetrical-bilateral specification is arguably the most common type of
performance requirement. As a consequence, this particular type of design expectation was selected to
conventionally idealize a statistically-based definition of six sigma capability. Nevertheless, we must also
acknowledge the existence of asymmetrical-bilateral specifications, as well as unilateral specifications
(one-sided). While the unilateral case can be defined by either side of a symmetrical-bilateral
specification (with or without a nominal specification), the short-term error rate is consequently reduced
to one defect-per-billion-opportunities, or DPBO = 1.0. However, the asymmetrical bilateral case
presents some interesting challenges when attempting to define a six sigma level of capability. For
example, consider an asymmetrical-bilateral performance specification while recognizing that a normal
distribution is symmetrical – indeed, an interesting set of circumstances. Given this framework, a six

From a design engineering perspective, a six sigma level of capability can be
theoretically prescribed by the a priori assignment of 50 percent design
margins.
Of course, such “guard banding” of the specification limits is imposed to
account for or otherwise counterbalance the influence of uncertainties that
induce process repeatability errors. Naturally, such uncertainties are reflected
in the form of variation during the course of production. In light of such
variation, we establish a bilateral design margin of M = .50 so as to provide a
measure of resilience. Given this, the magnitude of necessary guard banding
can be theoretically and equivalently realized by hypothesizing a six sigma
model of reproducibility during the course of design.
Naturally, such a postulated performance distribution would be normal
in its form and comprised of infinite degrees-of-freedom (df). In addition,
the three-sigma limits of such a distribution are conveniently used as the
pragmatic boundaries that prescribe unity. Given these factors, we are able to
establish an operating margin with respect to the performance specification
that is theoretically equivalent to 50 percent.
By the conventions of quality engineering, we naturally understand that
the instantaneous reproducibility of a design feature can be described by
several different but related indices of short-term capability. For example, it
is widely known that the short-term (instantaneous) capability of a process can
be generally described by the relation ZST = |T – SL| / σST, where ZST is the
short-term standard normal deviate, T is the specified target value, SL is a
specification limit (upper or lower), and σST is the short-term standard
deviation.27 Of course, this particular performance metric assumes µ = T

sigma level of capability must be conditionally associated with the most restrictive side of the
specification. In other words, the capability must be made relational to the smallest semi-tolerance zone.
But if for some pragmatic reason it is more beneficial to locate the process center off target (in the form
of a static mean offset), the short-term definition of six sigma becomes highly relative. For such
instances, sound statistical reasoning must prevail so as to retain a definition that is rational, yet
theoretically sound.
27
It must be recognized that the short-term standard deviation (root-mean-square) is a statistical measure
of random error that, when properly estimated, provides an index of instantaneous reproducibility. In this
regard, it only reports on the relative extent to which random background variation (extraneous noise)
influences the “typical mean deviation” that can be expected at any given moment in time. In this sense,

(centered process). Given that µ = T, and the fact σ ST constitutes a measure of
instantaneous reproducibility, it should be evident that ZST represents the
inherent performance capability of the corresponding process. Thus, ZST must
be viewed as a “best case” index of reproducibility.
Another common and closely related index of capability is given by the
relation Cp = |T – SL| / 3σ ST. For purposes of comparison, it can be
algebraically demonstrated that Cp = ZST / 3, where 3 is a statistical constant
that defines the corresponding limit of unity.28 In this context, we naturally
understand that the process capability ratio Cp is merely one-third of the
quantity ZST. Thus, a six sigma level of short-term capability (instantaneous
repeatability) is given as ZST = 6.0, or Cp = 2.0 if preferred. Consequently, it
can be said that a six sigma level of instantaneous reproducibility is
distinguished by a ±6σST random normal distribution that is centered between
the limits of a symmetrical, bilateral performance specification, thus realizing
the design expectation M = .50. In this context, the ±6σST limits exactly
coincide with their corresponding design limits – the upper specification limit
(USL) and lower specification limit (LSL), respectively. To better visualize
the six sigma model of instantaneous reproducibility, the reader’s attention is
directed to figure 3.2.1.
The reader must recognize that the given figure only provides the right-
hand side of a symmetrical-bilateral performance specification. Since the left-
hand side is a mirror image of the right-hand side, there is little need to
discuss both. Consequently, the ensuing discussion is simplified without loss
of specificity.

it constitutes the magnitude of instantaneous error that emanates from the system of causation and is,
therefore, a measure of inherent capability, also called entitlement capability.
28
The uninformed reader should understand that unity (per se) is statistically constituted by 100 percent of
the area under the normal distribution. Given this, we naturally recognize that the “tails” of a normal
distribution bilaterally extend to infinity. However, conventional quality practice often “trims the tails” of
such a distribution and declares that unity exists between the three sigma limits. This is done in the
interests of enjoying certain analytical conveniences. Of course, this convention logically assumes the
area extending beyond the three-sigma limits is trivial and, therefore, inconsequential. Perhaps such an
assumption is reasonable when balancing statistical precision against the demands of quality reporting.

Figure 3.2.1
Depiction of a Centered Short-Term Six Sigma Critical-to-Quality Characteristic
that Reflects Only Transient Sources of Random Error

Although previously stated, it should again be recognized that Y is an
independent random normal performance variable. Thus, we naturally
understand that Y ~ NID(µ,σST) such that µ = T, where T is the specified
target value (nominal specification). Based on these model circumstances, the
short-term quality goal of a six sigma characteristic is statistically translated to
reflect one random error per 500 million chances for such an error, or simply
two defects-per-billion opportunities, but only for the centered, symmetrical,
bilateral case. The unilateral case (no target specified) would reflect only one
defect per billion opportunities.
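
As an illustrative check of these figures, the short sketch below (Python, assuming an exactly normal and centered variable) recovers Cp = 2.0 and the one- and two-sided nonconformance rates of roughly one and two per billion opportunities.

from statistics import NormalDist

phi = NormalDist().cdf              # standard normal CDF
z_st = 6.0                          # centered short-term capability
cp = z_st / 3                       # Cp = Z_ST / 3 = 2.0

tail = 1.0 - phi(z_st)              # area beyond +6 sigma (one side)
print(cp, round(tail * 1e9, 2), round(2 * tail * 1e9, 2))   # 2.0, ~1 and ~2 per billion
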

3.3 Inflating the error
Over a great many cycles of production, we are inevitably confronted
with the natural occurrence of transient and temporal effects of a
circumstantial and random nature. It should go without saying that the
cumulative effect of these errors can be quite significant, as they ultimately
induce a consequential impact on the long-term reproducibility of Y. Of
course, the pragmatic nature of this impact is manifested in the form of an
“inflated” short-term standard deviation over many cycles of the process.
Stated differently, the short-term (instantaneous) error model is
compensated or otherwise corrected by enlarging the “typical” root-mean-square
deviation, also called the standard deviation. However, in practice, the exact
magnitude of such a compensatory measure is theoretically established by
way of the chi-square distribution (for a given df and α). Again, this is the
primary means of compensating or otherwise mitigating the short-term
performance model for a wide array of transient and temporal uncertainties (of
a random nature). As should be intuitively evident, such variations will
inevitably arise during the course of protracted process operation.
It should now be noted that such a compensatory inflation of the short-
term (instantaneous) standard deviation is employed purely for the purposes of
conducting a producibility analysis or a design optimization study.29 In the
context of a six sigma reproducibility model, the magnitude of inflation is
29
Many practitioners that are fairly new to six sigma work are often erroneously informed that the proverbial
“1.5σ shift factor” is a comprehensive empirical correction that should somehow be overlaid on active
processes for purposes of “real time” capability reporting. In other words, some unjustifiably believe that
all processes will exhibit a 1.5σ shift. Owing to this false conclusion, they consequentially assert that the
measurement of long-term performance is fully unwarranted (as it could be algebraically established).
Although the “typical” shift factor will frequently tend toward 1.5σ (over the many heterogeneous CTQs
within a relatively complex product or service), each CTQ will retain its own unique magnitude of dynamic
variance expansion (expressed in the form of an equivalent mean offset). Of course, we also recognize
that the centering condition of a CTQ can be deliberately offset – independently, or concurrently.
Naturally, such a deliberate offset in the process center is frequently employed to enjoy some type of
performance or business-related benefit. In no way can or should a “generalized” shift factor be defined
to characterize or otherwise standardize such an offset in the mean, nor should it be confused with the
idea of a compensatory static mean offset (such as discussed in this book). Although both types of
mean offset constitute “shifting” the process center, their basic nature and purpose is radically different
and should not be confused.

expressed as an expansion factor and quantified in the form c = 1.33. In this
context, c is often referred to as the six sigma correction. Thus, the general
reproducibility of Y is degraded or otherwise diminished via a compensatory
inflation of the short-term standard deviation such that σLT = c σST, where c is
the inflationary correction, σ ST is the short-term standard deviation (index of
instantaneous random error), and σ LT is the expected long-term standard
deviation (index of sustained random error).
The long-term inflationary effect of transient and temporal sources of
random error on a short-term six sigma performance distribution is presented
in figure 3.3.1. The reader should notice that the net effect of transient and
temporal error (of the random variety) results in a 1.5σ ST loss of design
margin. Here is yet another perspective of the proverbial “shift factor”
commonly employed in six sigma work.30

[Figure: Case A (short-term) = 6.0σ, ppm = .001, 50% margin; Case B (long-term) = 4.5σ, ppm = 3.4, 25% margin; the 1.5σ offset is shown relative to µ = T = 100, with scale points at 115.0, 122.5 and 130.0 (USL).]

Figure 3.3.1
Depiction of a Six Sigma Critical-to-Quality Characteristic that
Reflects Transient and Temporal Sources of Random Error
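
For the reader who prefers to verify the arithmetic, the brief Python sketch below (assuming the correction is taken as exactly c = 4/3) shows how inflating the short-term standard deviation converts the centered ±6σST model into a 4.5σ long-term expectation, a loss of 1.5σST of design margin.

sigma_st = 1.0                    # standardized short-term standard deviation
c = 4.0 / 3.0                     # the six sigma correction, approximately 1.33

sigma_lt = c * sigma_st           # inflated long-term standard deviation
z_lt = 6.0 * sigma_st / sigma_lt  # 6.0 / 1.333... = 4.5 long-term sigmas
print(round(sigma_lt, 2), round(z_lt, 2), round(6.0 - z_lt, 2))   # 1.33, 4.5, 1.5
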

30
These assertions will be theoretically demonstrated later in this discussion. For the moment, the reader
is kindly asked to faithfully accept this premise without proof.

3.4 Calibrating the shift
So as to conceptually simplify the inflationary effect of transient and
temporal errors (of a random nature), and to enjoy a more convenient form of
application, an equivalent mean offset is often applied to the model
distribution of Y. In the spirit of six sigma, such a quantity is expressed in the
form of δ = 1.5σST. Of course, the relative direction of such a linear
correction to µ can be positive or negative, but not both concurrently.
However, it is most often applied in the “worst-case” direction – when testing
or otherwise analyzing the performance of a design.
Applying this compensatory correction to the short-term distribution
(illustrated in figure 3.4.1) reveals a long-term performance expectation of
6σST - 1.5σST = 4.5σLT. Expressed differently, the resulting long-term
capability is given as an “equivalent” figure of merit and expressed in the
form ZLT = 4.5. Under this condition, the design margin is consequentially
reduced to M = .25. Of course, the remaining safety margin of 25 percent is
still large enough to absorb a fairly substantial shock to process centering,
owing to some type or form of transient or temporal perturbation of a
nonrandom nature. Of course, such a shock may or may not be manifested as
a momentary disturbance to the process center.31 Statistically translating the
4.5σ LT level of capability into defects-per-million-opportunities reveals that
dpmo = 3.4. For the reader’s convenience, the long-term “shifted” model of

31
However, such a shock effect is often manifested as a transient (short-term) disturbance to the process
center. When this happens, the probability of a defect temporarily increases. Of course, the exact
duration of this effect is generally indeterminate, owing to the random nature of the underlying system of
causation. Because of this, it should now be apparent that if a design engineer seeks to establish a long-
term safety margin of M = .25, the short-term marginal expectation must be generously greater than 25
percent. By enlargement of M, the engineer is able to provide a more realistic level of “guard banding”
that cushions a performance distribution against certain types of disturbances resulting from transient
and temporal effects that tend to upset process centering. Again, more will be said about this later in this
book.

six sigma capability is depicted in figure 3.4.1. The reader must recognize the
probabilistic equivalency between figures 3.3.1 and 3.4.1.

Figure 3.4.1
Depiction of a Long-term Six Sigma Critical-to-Quality Characteristic
Presented as an Equivalent Short-term Shifted Distribution
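
The probabilistic equivalency noted above can also be illustrated with a short computation (Python, assuming normality and a worst-case 1.5σST offset): the near-limit tail at 4.5σ dominates and yields approximately 3.4 defects-per-million-opportunities.

from statistics import NormalDist

phi = NormalDist().cdf
z_st, shift = 6.0, 1.5
near_tail = 1 - phi(z_st - shift)        # 4.5 sigma to the nearer specification limit
far_tail = 1 - phi(z_st + shift)         # 7.5 sigma to the farther limit (negligible)
print(round((near_tail + far_tail) * 1e6, 2))    # ~3.4 dpmo
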

Thus, whenever we refer to a system, product, process, service, event or
activity as being “six sigma,” what we are really saying is that any given CTQ
related to that deliverable will maintain a short-term capability (instantaneous
reproducibility) of ±6σST and will exhibit no more than 3.4 dpmo over the
long-term (after many cycles or iterations of the corresponding process).

3.5 Rationalizing the shift
In order to better grasp the original constructs underpinning the six
sigma shift factor, we must jointly examine the engineering rationale and
statistical context by considering a hypothetical performance characteristic.
Doing so will provide us with a conceptual platform from which to view the
justification for employing an equivalent 1.5σST static off-set to the mean of a
critical-to-quality performance characteristic. For purposes of our discussion,
we will simply refer to a critical performance variable as a CTQ . As related
to the process capability of our CTQ, it will be accepted that the population
standard deviation σ is short-term in nature, known, rational, and statistically
stable over time.32 We will also assert that the center of this normal
distribution is positioned such that µ = T, where µ is the process mean and T
is the target value of the design (nominal specification).
Let us now say the referenced CTQ will be independently replicated K
number of times during the course of executing a standard cycle of
production, where K is a relatively small quantity – often called a “production
batch.” It will also be known that, for any given batch, only N = 4 of the K
replicates would be randomly selected for performance verification.33
Following this, the sample average (Xbar) would be dutifully computed for
the N = 4 performance measurements. Given many independent occurrences
of Xbar, it would then be possible to form a distribution of sampling averages.
With such a distribution, certain decisions about process centering could be

32
As this discussion point would naturally infer, the population standard deviation is fully known a priori
and genuinely reflects all known and unknown sources of random error (white noise).
33
For purposes of this discussion, it will be known to the reader (by definition) that the given sample
consisting of N = 4 members prescribes a “rational” sub-grouping of the measurements. In recognition
of conventional quality practice, this assertion stipulates that the observed within-group errors are fully
independent, random and normally distributed. Furthermore, sampling plans that involve the formation
of rational subgroups often rely on a subgroup size within the general range 4 ≤ N ≤ 6, where the typical
subgroup size is often defined as N = 5. Since subgroup size is positively correlated to statistical
precision, it is proposed that the case of N = 4 can be pragmatically and operationally viewed as a
“worst-case” sampling construct, especially when declaring the expected theoretical error associated
with process centering. In other words, a design engineer is often not privy to the sampling plan that
manufacturing intends to implement (for purposes of statistical process control). As a consequence, the
design engineer should be somewhat pessimistic when attempting to analyze the influence of natural
process centering errors on design performance. Hence, the reliance on “worst-case” sampling
assumptions when analyzing the producibility of a design (prior to its release for production).

rationally made and scientifically interrogated during the course of design, as
well as production.34,35
To better illustrate the import of our latter discussion, we should closely
examine the simple case of N = 4 and α = .0027, where Xbar = T. With these
conditions in mind, it is more than reasonable to ask the central question: “For
any given cycle of production, what is the expected bandwidth around T
within which Xbar should fall – given only the expectation of random
sampling error?” In different form, the same question could be presented as:
“For any given cycle of production, how much could a sampled process center
be expected to momentarily shift from the nominal specification before such a
deviation can no longer be attributable to random variation?” In short, this
question seeks to uncover the maximum statistical extent to which a sample
average can be off-set from the target specification in the instance µ = T, but
only for the special case of N = 4 and 1 – α = .9973.
Obviously, an answer to this question would reveal the theoretical extent
to which a common process could momentarily “shift and drift” from its ideal
centering condition (owing to the sole influence of random sampling error),
given that the population mean µ is, in reality, centered on the nominal
specification such that µ = T. With such a rule-of-thumb, a product design
engineer could realistically emulate and better simulate the extent to which
normal process centering bias could influence the performance of a given
product design. It is from this perspective that we will explore the six sigma
shift factor and consider its implications for product design analysis and
optimization.

34
For example, the distribution of sampling averages is one of several theoretical constructs that is
essential to the proper construction and operation of an Xbar and R chart. Such statistical devices are
often employed during the course of production to ensure the proper and sufficient management of
process centering.
35
Knowledge of the distribution of sample averages would make it possible (and highly advantageous) to
account for natural process centering error during the course of design. In this manner, the natural and
expected errors in process centering (as would be normally experienced during the course of
production) could be effectively neutralized or otherwise managed at the time of design configuration.
Of course, the principles of robust design and mathematical optimization could be invoked to realize this
aim.

To fully answer this historically stubborn question, we must first
consider the confidence interval surrounding the process average µ. With this
aim in mind, the experienced practitioner will recall that such a boundary
condition about µ can be given as Xbar – Zα/2 σST / √N < µ < Xbar + Zα/2 σST / √N,
where µ is the population mean, N is the sample size, Xbar is the sample
average, Zα/2 is the required type I decision risk (expressed as a standard
normal deviate), and σST is the short-term population standard deviation. If
we standardize to the case NID (0,1) and let Xbar = µ, it would be most
apparent that the given confidence interval can be reduced to the form of
–Zα/2 / √N < 0 < +Zα/2 / √N.
For the special case of N = 4 and α / 2 = .00135, we can easily obtain
the solution - 3 / 2 < 0 < +3 / 2, therein providing the standardized interval of
0 ± ZShift = 0 ± 1.5. Under such theoretical but conventional conditions, it is
reasonable to assert that only 2,700 subgroups out of every 1,000,000 would
produce a sampling average outside the interval T ± 1.5σST. In other words, it
is not likely that µ would be momentarily shifted more than 1.5σST from T,
owing to the presence of random sampling error. Hence, it can be statistically
concluded that the 1.5σ shift factor is a rational means for realistically and
meaningfully injecting the bias of process centering error into the ways and
means of a producibility analysis (repeatability study). Of course, the
statistical confidence underpinning such an assertion would be given as 100( 1
– α ) = 100( 1 – .0027 ) = 99.73 percent confidence, but only when
considering the special case of N = 4 randomly selected measurements drawn
from a normal distribution and where the population parameters µ and σST are
known a priori.36
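
The arithmetic of this special case can be reproduced with a few lines of Python (offered only as a sketch of the computation just described, not as a general rule):

from statistics import NormalDist

alpha, n = 0.0027, 4
z_half_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # approximately 3.0
z_shift = z_half_alpha / n ** 0.5                    # 3 / sqrt(4) = 1.5
print(round(z_half_alpha, 2), round(z_shift, 2))     # 3.0, 1.5
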
In many organizations, these sampling conditions and process control
criteria are often at the forefront of daily operation. Given this, it would be

36
The reader must recognize that such a level of confidence (1 – α = 1 – .0027 = .9973, or 99.73 percent)
is frequently employed in the quality sciences, especially in the application of statistical process control
(SPC) charts. Statistically speaking, this particular level of confidence is defined or otherwise
circumscribed by the ± 3.0σXbar limits commonly associated with the distribution of sampling averages.

most reasonable to artificially induce a 1.5σST shift in the target value of each
CTQ during the course of design analysis. By doing so, the engineer can
better study the performance repeatability of a product configuration prior to
its release for full-scale production. To this end, the methods and tools
associated with the practice of design for six sigma (DFSS) can be readily
employed so as to avoid, tolerate, or otherwise neutralize the influence of a
momentary shift in the process center.

3.6 Applying the shift


To better understand the implications of the six sigma definition and the
1.5σST shift, let us consider a simple example. Suppose that a particular
system is characterized by K = 2,500 opportunities. For the system design
discussed in this example, it will be know there is only one opportunity per
CTQ. 37 It will also be known that each opportunity is fully independent and
4.5σLT capable (in the long-term). Thus, for such a level of sustained
capability, we would expect (on average) only one nonconformance out of
every 1 / .0000034 ≈ 294,118 opportunities.
Based on these facts, the total defects-per-unit would be given as dpu =
p(d) * K = .0000034 * 2,500 = .0085. Given this, we recognize p(d) as the
statistical probability of nonconformance (per opportunity, over many cycles
of process operation). We also understand that K is the total number of
independent opportunities contained within the system (unit). With these
facts in mind, we note that about one out of every 118 systems (units) could
be expected to contain at least one CTQ that is defective (verified to be in a
state of nonconformance).
Owing to this level of long-term quality, it is often desirable to
approximate the probability of zero defects, or “throughput yield” as it is
frequently referred to. It is possible to provide such an estimate by way of the

37
The reader must recognize that a critical-to-quality characteristic (CTQ) is, by definition, a defect
opportunity (assuming it is actively assessed and reported). To illustrate, let us consider a product
called “Z.” As expected, Z would most likely consist of Y number of CTQs, where any given CTQ could
have X number of occurrences. Therefore, the total number of defect opportunities per unit of product
would be computed as O = Σ(Y * X).

Poisson function. Considering this function, we would recognize that Y =
(np)^r e^(–np) / r!, where n is the number of trials, p is the event probability, and r is
the number of such events. By rational substitution, we would further observe
that Y = (dpu)^r e^(–dpu) / r!, where dpu is the defects-per-unit. Thus, for the
special case of r = 0 (zero defects), we are able to ascertain the throughput
yield (probability of zero defects) by the simple relation YTP = e^(–dpu).
For the case example at hand, we compute the throughput yield to be YTP
= e^(–dpu) = e^(–.0085) = .9915, or about 99 percent. This is to generally say there is
99 percent confidence that all K = 2,500 CTQ opportunities will “yield”
during the course of production – assuming that each characteristic
(opportunity) is fully normal, independent and exhibits a long-term capability
of 4.5σLT. If so, each system would then maintain a 99 percent probability of
zero defects.38
Reasoning from the flip side of our example, it can be said that the long-
term, first-time yield expectation is known to be YFT = .9999966. Since there
exists K = 2,500 independent yield opportunities per unit, the throughput yield
should be given as YTP = YFT^2,500 = .9915, or about 99 percent. In turn, the
defects-per-unit expectation could be computed as dpu = -ln(YTP) = -ln(.9915)
= .0085. Here again, the Poisson distribution is used to facilitate this
approximation.
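
Both directions of this approximation can be sketched in a few lines of Python (using the rounded long-term rate of 3.4 nonconformances per million opportunities):

import math

p_d, k = 3.4e-6, 2_500                # per-opportunity defect probability; opportunities per unit
dpu = p_d * k                         # 0.0085 defects-per-unit
y_tp = math.exp(-dpu)                 # Poisson P(r = 0): throughput yield, ~0.9915
print(round(dpu, 4), round(y_tp, 4), round(-math.log(y_tp), 4))   # 0.0085, 0.9915, 0.0085
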
So as to provide a first-order approximation of the short-term capability,
we merely add the “standard shift correction” of 1.5σ to the long-term
capability.39 In this case, we would compute ZST = ZLT + 1.5 = 4.5 + 1.5 = 6.0.

38
Of sidebar interest, the advanced reader will understand that the Poisson distribution can be employed
to establish the throughput yield of a process (likelihood of zero defects) when the dpu is known or has
been rationally estimated. This is done by considering the special case of r = 0 (zero defects), where
the quantity Y = [(dpu)^r * e^(–dpu)] / r! is reduced to Y = e^(–dpu). In this reduced form, the quantity Y
represents the statistical probability of first-time yield. In other words, e^(–dpu) is a quantity that reports the
statistical probability of a unit of product (or service) being realized with zero defects (based on the
historical dpu or projection thereof).
39
Generally speaking, the shift factor is added to an estimate of long-term capability in order to remove
long-term influences, therein providing an approximation of the short-term capability. Conversely, the
shift factor is subtracted from an estimate of the short-term capability in order to inject long-term
influences, thereby providing an approximation of the long-term capability. For example, if the long-
term capability of a process was known to be 4.5σ, and we seek to approximate the short-term

At this point, the reader is admonished to recognize that ZST is merely a
general figure of merit (performance index) that constitutes nothing more than
a high-level approximation of the prevailing instantaneous reproducibility (per
CTQ opportunity).40
In this example, we would naturally recognize that the approximated
value of ZST = 6.0 would statistically translate to an equivalent bilateral short-
term throughput yield expectation of 99.9999998 percent (per opportunity).
Based on this level of short-term yield, we could expect about one out of
every 1 / ( 1 - .999999998) = 1 / .000000002 = 500,000,000 defect opportunities to
be validated in a state of nonconformance.41 Based on these facts, the total
defects-per-unit expectation would be given as dpu = p(d) * K = .000000002 *
2,500 = .000005. Thus, we could expect roughly one out of every 200,000
units to contain at least one defective CTQ opportunity.
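
A brief sketch of this first-order approximation is given below (Python, using the rounded two-per-billion short-term rate); the exact unit count depends on the rounding carried through the tail probability.

import math

z_lt = 4.5
z_st = z_lt + 1.5                     # first-order short-term approximation: 6.0
p_d_st = 2e-9                         # ~2 nonconformances per billion opportunities (bilateral)
dpu_st = p_d_st * 2_500               # ~0.000005 defects-per-unit
print(z_st, dpu_st, round(1 / (1 - math.exp(-dpu_st))))   # 6.0, ~5e-06, ~200,000 units per defect
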

3.7 Framing the correction


At this point in our discussion, it should be generally understood that the
short-term standard deviation is at the core of many process capability
metrics. Without the short-term standard deviation, it would not be possible
to establish the instantaneous reproducibility of a process. To better
understand the nature of this particular performance measure, let us consider a
rational sampling strategy consisting of g subgroups, each of which is
comprised of n sequential observations – where g is sequentially obtained and
substantially large, n is relatively small and the sampling intervals are
randomly determined. In this manner, we are able to “trap” the vast majority

capability, then 1.5σ would be added to 4.5σ, therein providing the short-term estimate of 6.0σ.
Conversely, if the short-term capability was known to be 6.0σ, and we seek to approximate the long-
term capability, then 1.5σ must be subtracted from 6.0σ, therein providing the long-term estimate of
4.5σ.
40
In other words, it is a “best guess” in the light of ambiguity – especially in the absence of actual short-
term performance information. As such, it must not be viewed as an empirical measure of inherent
capability or instantaneous reproducibility – as many uninformed practitioners might falsely believe. It is
simply a rational “ballpark” approximation, expectation, or projection of short-term performance – made
in the absence of empirical data or experiential information.
41
Naturally, this assumes that the process of verification is perfect. This is to say that the test or
inspection process is fully devoid of any type or form of error. In other words, the probability of decision
error is zero, regardless of its nature -- Type I ( α ) or Type II ( β ).

of white noise (random variation) over time, while concurrently disallowing
the influence of nonrandom sources of variation.
As the ng observations are sequentially and progressively made
available over time, the short-term standard deviation will asymptotically
approach its maximum limit – reflecting only random sources of variation. Of
course, this provides a rational estimate of inherent capability (instantaneous
reproducibility). However, whenever ng is relatively small and sequentially
originated (say ng = 30), it is often not possible to trap all of the non-
deterministic sources of error (owing to the fact that some of the sources could
be time-dependent). Consequently, it would be analytically desirable to
compensate the biased estimate of error (due to the constrained sampling
strategy) so as to approximate or otherwise forecast the true magnitude of
random error naturally inherent to the defined population.
Since all of the random errors cannot be made instantaneously available
for analytical consideration and evaluation, we attempt to compensate the
estimate of instantaneous reproducibility (short-term standard deviation) by
simply expanding its relative magnitude by a rational correction. As we shall
come to understand later in this book, the specific magnitude of expansion in
the short-term standard deviation (due to un-sampled time-dependent sources
of white noise) can be expressed in the form of an equivalent mean offset (on
the order of ZShift = 1.5σ). Such an offset can be given in unilateral or
bilateral form, depending upon the application circumstances. So doing
provides an analytical model of the long-term capability, but expressed as an
equivalent short-term distribution experiencing a transient shift.
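
A minimal simulation (Python, with arbitrary parameters, offered only to illustrate the mechanics) shows how a pooled within-subgroup estimate isolates the short-term error while the overall standard deviation also absorbs the time-dependent component that the correction is meant to anticipate:

import random, statistics

random.seed(1)
g, n = 50, 5                                   # g subgroups of n sequential readings
subgroups = []
for _ in range(g):
    center = 100 + random.gauss(0, 1.0)        # temporal wandering of the process center
    subgroups.append([random.gauss(center, 2.0) for _ in range(n)])

s_st = (sum(statistics.variance(sg) for sg in subgroups) / g) ** 0.5   # pooled within-subgroup (short-term)
s_lt = statistics.stdev([x for sg in subgroups for x in sg])           # overall (long-term)
print(round(s_st, 2), round(s_lt, 2), round(s_lt / s_st, 2))           # long-term exceeds short-term
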
Without such a correction, it would often not be possible to
meaningfully execute certain types of engineering and producibility analyses
that are dependent on a consideration or evaluation of the full range of random
variation inherent to a system of causation. Consequently, the six sigma shift
factor should be thought of as a corrective measure for calibrating the
instantaneous reproducibility of a process for unaccounted, but natural long-
term random variations. More specifically, the shift factor is a statistically

based mechanism that is intended to adjust the short-term capability of a
performance variable for the influence of unknown (but yet anticipated) long-
term random variations.
In this regard, the unilateral correction can be implemented by
considering the quantity µ + 1.5σ, or µ −1.5σ depending on the worst-case
direction of effect. Of course, the bilateral correction is recognized as µ ±
1.5σ, but is seldom implemented as such – owing to the fact that the expanded
standard deviation is of more convenient form. Thus, the net effect of
unknown long-term sources of random error can be rationally postulated and
generally accounted for.
The ability to postulate and subsequently induce such a statistical bias is
often essential for the effective study of certain types of physical phenomena
and engineering conditions. For example, it might make more “engineering
sense” to simulate an electrical circuit or conduct a mechanical tolerance
analysis under the condition where the operating center of each component is
temporarily repositioned to a location other than the nominal (target)
specification. This is often done to facilitate or otherwise enhance the larger
engineering analysis. But without a corrective guideline (such as the 1.5σ
shift factor), the designer is often uncertain as to the extent of adjustment that
should be applied.
In most engineering applications, the engineer naturally recognizes the
design will be subjected to some “process perturbations and anomalies” over
time, but the relative extent to which such random errors should be considered
is often unclear or ambiguous. In other words, the practicing engineer often
does not know how to declare or define a corrective device to sufficiently
compensate for the overestimation or underestimation of process capability.
Because of this, the designer simply reverts to classical worst-case analysis so
as to be “absolutely sure.” Of course, the practice of worst-case analysis
inevitably leads to overly conservative design specifications. To this end, the
six sigma concept of an equivalent static mean shift and that of the expanded
short-term standard deviation provides the engineer with the analytical

capability and flexibility to combine the benefits of statistical reasoning with
the merits of worst-case analysis.

3.8 Establishing the center


Related to the previous discussion is the issue of establishing an
operating center for a process. To illustrate some of the more prevalent
nuances associated with establishing a pre-determined process center (target
specification), let us evaluate a particular CTQ. For the sake of discussion, it
will be known that the related process is a “system of material removal,” such
as a milling operation. In this particular instance, we will say the CTQ is an
outside dimension related to a particular steel block. With respect to this
CTQ, we will also declare that the design specification sets forth a certain
centering condition called the “target value,” simply abbreviated as “T.”
However, we will further assert that after some number of process cycles, it is
economically demonstrated that another process set point would be more
beneficial. For the sake of our discussion, we shall refer to such a point as ζ.
For the most part, the manufacturing implications of ζ are fairly self-
apparent. If the process is initially centered on T such that µ = T, we surely
understand that µ will eventually “drift” toward the upper specification limit
(USL) over time – owing to the unavoidable effects of natural tool wear.42
Interestingly, at some point called “P” the tool must be replaced, regardless of
how much “useful life” might remain in the tool. However, if the initial
process centering is established at ζ such that ζ < T, the average tool life can
be significantly extended, thereby resulting in a legitimate cost savings
without sacrificing capability or quality.
Hence, the idea of “optimality” now revolves around ζ and not T, even
though the design clearly specifies T as the required nominal condition. In the
scenario at hand, the process owner is more concerned about the disparity

42
In this particular case example, the “centering drift” is biased toward the USL, owing to the fact that an
outside dimension (OD) is being considered. If an inside dimension (ID) is being considered, the drift will
be biased toward the LSL.

between the natural process center (µ) and ζ, versus any discrepancy that
might exist between µ and T. From this vantage point, we recognize that
such a bias provides the process owner with a significant benefit.
To better understand the practical meaning of this discussion, let us form
another application example. Suppose we are considering the thickness of
nickel-plating on a particular engine part. In this case, we would say that
plating thickness is the CTQ of concern. For the sake of discussion, we will
also say that the design engineer established a symmetrical-bilateral plating
specification (in terms of metal thickness).
Under this condition, the process owner would seek to set ζ < T because
an under-plated part can be resubmitted to the process for additional plating,
but over-plated parts must be scrapped. Such scrap is generated because it is
not economical or practical to remove progressive layers of plating. Hence, to
set ζ < T makes more “manufacturing sense.” Given this, it is easy to see why
engineering and manufacturing often butt heads. Simply stated, the fate of
ζ is subject to negotiation because everyone has their own idea of what is
“optimum.” Again, the idea of “success” is relative.
This point is most dramatically reinforced when we consider a case that
involves the use of an asymmetrical bilateral tolerance. For purposes of
illustration, let us say that the design target T is intentionally located off-
center relative to the specification limits. In other words, T is asymmetrical
with respect to the USL and LSL. Design engineers employ this type of
performance specification to realize some form of technical benefit.
However, the process owner may recognize some other type of benefit
(usually of an operational nature) that supports a process center other than T,
say ζ. More specifically we will say that ζ is symmetrical with respect to
USL and LSL, but T is not.
Given that ζ is symmetrically located with respect to the specification
limits, and the process distribution is also symmetrical in terms of its shape
(i.e., a normal distribution), we naturally recognize that defects are inherently
minimized, especially when contrasted to the case µ = T, where T is

asymmetrical. As well, the process owner runs the risk of getting “dinged” by
engineering for not centering the process on the nominal specification T;
however, she stands to be praised by management for minimizing defects, cost
and cycle-time by the action of centering µ on ζ. Again, it is easy to see why
the idea of “success” is often relative.
The reader must fully understand that an equivalent shift factor can be
circumstantially computed for the quantities T – ζ and ζ − µ, but not in the
context of six sigma. The reason for this is quite simple – the exact form or
magnitude of these disparities cannot be statistically interrogated or studied a
priori in a generic way, as each case is unique. Consequentially, any attempt
to establish a global correction or standardized constant based on such
disparities would be highly inappropriate and constitute a spurious practice,
owing to the “context sensitivity” of ζ.
While such examples certainly highlight the need for judicious design
practices, we will constrain our ensuing discussion about process centering to
the most common instance. In short, we will constrain our focus to the
symmetrical-bilateral case where T = ζ. The other cases have been
intentionally omitted to limit the length of this book without loss of
specificity. In other words, by only considering the instance T = ζ, we lose
some breadth of discussion, but realize significant depth and color.

4.0 Understanding the Shift

4.1 Identifying the expectations

Given the qualifying understandings presented thus far, we may now
proceed with our discussion about the practice of six sigma and the basis for
employing a 1.5σST shift. To begin, let us once again construct an application
scenario. For purposes of illustration, we shall consider the design of a certain
electrical system. To this end, the project design manager reported that the
initial system configuration was comprised of M = 1,000 performance
features. Of this number, V = 300 “value-centric” features were subsequently

determined to possess leverage with regard to utility, access or worth (the
basic ingredients of quality).43 Of the V = 300 value-centric features, Q = 160
were deemed essential to the realization of utilitarian value (form, fit and
function). Thus, it was established that 160 of the 1,000 design features (16
percent) were critical-to-quality. Consequently, these features were
designated as CTQs.44
With these circumstances in mind, the project manager declared the
short-term system-level confidence to be .99, or 99 percent. Of course, this
constitutes the collective confidence for all Q = 160 CTQs. Based on this, the
project manager was fairly confident about the system reproducibility,
especially since he had established the expectation based on statistical worst-
case assumptions. 45
Based on the system requirements, the instantaneous reproducibility for
each CTQ (also known as the short-term capability) was established by one of
the resident engineers. He was able to make such an estimate by statistically
normalizing the short-term system-level confidence to the opportunity level.
For the scenario at hand, the normalization was presented as CQ = Y^(1/Q) =
.99^(1/160) = .99994, or 99.994 percent. This is to say there would exist at least

43
Utility has to do with the form, fit and function of a deliverable. Access has to do with the various timing,
volume and location aspects associated with the delivery of a product or service. Worth covers the
emotional value, intellectual value and economic value of any given deliverable.
44
We naturally recognize that the configuration and composition of a system’s design is unique in every
case. In fact, the interactions within and between these two aspects of a design can spawn a very
complex system of classification in terms of scope and depth. Owing to this, we often see the Pareto
principle at play when considering a certain aspect of the design. Translated, this principle holds that a
certain 15 percent of a system’s complexity will fully account for 85 percent of the value associated with a
given aspect of quality (utility, access, worth). However, when the various aspects of quality are
considered as a collective whole, the Pareto principle is often severely mitigated or otherwise distorted.
In general, however, the Pareto Principle (85/15 rule) will emerge and become self-apparent as a given
system of quality classification is hierarchically and progressively interrogated. The reader should be
aware that many practitioners advocate the rule of Pareto to be 80/20. Regardless of analytical
precision, the main lesson under girding the Pareto Principle is about how the vital few often has more
influence than the trivial many.
45
In the namesake of pragmatic communication, this author has made liberal use of the term “worst-case.”
For the given context, it must not be interpreted as a mathematical absolute or engineering construct, but
rather as a statistical boundary condition (much like the natural limits of a confidence interval). For
example, one of the confidence bounds related to some mean (or standard deviation) can be thought of
as the “statistical worst-case condition” of that parameter. In this context, the term is quite relative to
such things as alpha risk and degrees-of-freedom, not to mention various distributional considerations.
Nonetheless, its use carries high appeal for those not intimately involved with the inner workings of
statistical methods. More will be said on this topic later on in the discussion.

99.994 percent certainty that any given utility-centric CTQ would comply
with its respective performance specification. Another interpretation would
be that 99.994 percent of the attempts to replicate or otherwise create any
given utility-centric CTQ would prove successful (with respect to the
performance specifications).
In this case, the instantaneous reproducibility (short-term confidence of
replication) statistically translated to a ±4.00σST level of bilateral capability
per CTQ (per quality opportunity). From a unilateral perspective, the
capability would be given as 3.83σST. Either way, the odds are about one out
of 15,920 that any given attempt to produce a CTQ will fail or otherwise fall
short of performance expectation (unilateral or bilateral, as the case may be).
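
For readers who wish to verify these figures, the roll-down from system-level confidence to opportunity-level capability can be reproduced with a few lines of Python (a minimal sketch, assuming the scipy library is available; the variable names are illustrative only):

    from scipy.stats import norm

    Y_system = 0.99        # short-term system-level confidence
    Q = 160                # number of utilitarian (CTQ) opportunities

    # Normalize the system confidence to the opportunity level: C_Q = Y^(1/Q)
    C_Q = Y_system ** (1.0 / Q)                    # ~0.99994

    # Unilateral capability (all allowable nonconformance in one tail)
    Z_unilateral = norm.ppf(C_Q)                   # ~3.83

    # Bilateral capability (allowable nonconformance split across both tails)
    Z_bilateral = norm.ppf(1 - (1 - C_Q) / 2.0)    # ~4.00

    # Odds that any one replication falls outside its specification
    odds = 1.0 / (1.0 - C_Q)                       # ~15,920, i.e., about 1 in 15,920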
Once the system configuration was finalized and agreed upon, the
project manager then appointed a certain quality engineer to continue the
producibility analysis and optimization exercise. At this point, several of the
engineer’s colleagues pondered: “Given the performance specifications of a
CTQ, how can someone go about analyzing the manufacturing viability
(reproducibility) of a design when a corresponding production process has yet
to be selected?”
For this case, the quality engineer (producibility analyst) decided to
begin her study by first isolating one of the most critical utilitarian-centric
design features. In this case, we will say she selected CTQ4. We will further
suppose that CTQ4 was assigned a nominal specification (target value) such
that T = 100 units. It will also be known that the tolerance bandwidth was
specified as B = ±30 units. Thus, the range of performance expectation for
CTQ4 was given as LSL < Y < USL, or simply 70 < Y < 130. At the onset of
the study, the analyst discovered that the original product designer imposed a
conventional safety margin of M = .25, or 25 percent. The analyst also
learned that this particular level of guard-banding was selected and
subsequently specified because of several reliability considerations
surrounding CTQ4. For the reader’s convenience, figure 4.1.1 provides a

graphical understanding of the design margins used in the case scenario under
consideration.

Figure 4.1.1
Visualization of the Design Margins Imposed on CTQ4

4.2 Conducting the analysis

At this point, the analyst decided it would be necessary to set forth the
short-term standard deviation that would be associated with CTQ4. Using the
short-term system-level producibility analysis as a backdrop, she computed
the quantity σA = (SL – T) / ZST = (130 – 100) / 4.0 = 7.50. Of course, this
particular standard deviation represents or otherwise constitutes the
instantaneous capability of CTQ4.46 For purposes of our discussion, we will

46
Instantaneous capability only reports on the short-term reproducibility of a characteristic. In other words,
it only considers the influence of random background variations (white noise, or pure error as some
would say). In this context, the instantaneous (short-term) capability offers a moment-in-time “snapshot”
of the expected performance error. An extension of this idea provides the understanding of “longitudinal
capability.” The longitudinal capability (also called temporal capability) not only considers the influence

simply refer to this idealized standard deviation as the “short-term variation
model,” or SVM.47 In this case example, the specified SVM represented the
analyst’s assertion (preliminary expectation) about the instantaneous process
capability that would be needed to realize the short-term system-level
producibility goal as well as the anticipated value entitlements.48
Remember that the original design engineer established a uniform safety
margin (guard band) of 25 percent at both ends of the tolerance. With this in
mind, the analyst was able to employ a second approach for establishing a
short-term standard deviation. Thus, she computed the SVM and provided the
result as:

σA = (1 − M)(USL − T) / 3 = (1 − .25)(130 − 100) / 3 = 7.50 .

Eq.( 4.2.1 )

of black noise (nonrandom variations), but includes the influence of white noise as well. In the real world,
short-term capability is always greater than long-term capability (in terms of Z) for a wide range of
pragmatic reasons (e.g., the influence of tool wear, machine set-up and the like). Only when there is an
absence of black noise will the two forms of capability be equal. Under this condition, the characteristic
of concern is (by definition) said to be in a perfect state of “statistical control.” In other words, variation in
the characteristic’s performance is free of assignable and special causes and, as a consequence, is
subject to only random sources of error.
47
The short-term variation model (SVM) is offered as an analytical contrast to the long-term variation model
(LVM). By definition, the SVM only reflects the influence of random variation (extraneous error of a
transient nature), also called “white noise.” The LVM not only reflects random sources of error, but
nonrandom sources as well. In this sense, the LVM echoes “gray noise” because it reflects the mixture
of random and nonrandom sources of error. The differential between the SVM and the LVM portrays the
pure effect of nonrandom variation, or “black noise” as it is often called. In general, it can be said that the
influence of random error determines the bandwidth of a performance distribution, whereas the signal
(central tendency) of that distribution is governed by nonrandom error. Thus, we say that T = W + B,
where T is the total noise, W is the white noise and B is the black noise. Owing to this relationship, it
should be apparent that the total noise can be decomposed into its component parts for independent
analysis and optimization.
48
There are a number of different types and forms of variation design models (VDMs), such as that for a
hypothetical mean and variance. In most cases, the VDM is a theoretical construct (or set of constructs)
that is postulated so as to engage or otherwise facilitate some type of design-related analysis, simulation
or optimization. Interestingly, in more progressive design organizations, the VDMs are provided in a
database that consists of actual process capabilities and various types of parametric data. Such
databases provide a distinct advantage when attempting to “mate” a CTQ specification to a production
process. The pairing of a design specification with the performance capability of a candidate process is
a key topic in the field of design for six sigma, or DFSS as it is most often called. Of course, the primary
aim of such “pairing” is to optimize all value entitlements, not just those that are product performance or
quality related.

Of interest, the reader should recognize that the denominator quantity of
3 was given as the analytical equivalent of unity. By providing such a
quantity, the analyst was able to rationally postulate a distribution based on
the specification limits. Thus, the SVM for CTQ4 was declared to be σA =
7.50, thereby satisfying the expectation M = .25, where M was made relative
to the upper specification limit (USL). Of course, the same would hold true
for the left side of the specification owing to its symmetrical-bilateral
character. We also recognize that the value σA = SVM = 7.50 is theoretical by
nature. As a consequence of construction, it was implicitly prescribed with
infinite degrees of freedom (df).49
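
The two routes to the short-term variation model can be stated compactly in code (a small Python sketch using only the quantities defined in the case; the variable names are illustrative):

    T, USL = 100.0, 130.0
    Z_ST = 4.0       # targeted short-term (instantaneous) capability
    M = 0.25         # conventional 25 percent safety margin

    sigma_A_from_Z = (USL - T) / Z_ST                 # 7.50
    sigma_A_from_margin = (1 - M) * (USL - T) / 3.0   # 7.50, per Eq. (4.2.1)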
As previously stated, this magnitude of planned variation (instantaneous
error) provided the analyst with a short-term process capability expectation of
±ZσA = ±4σA = ±30 units. Based on this, the ±4σA range of CTQ4 was
recognized to be 70 < Yi < 130, where Yi is the ith replication of CTQ4. A
visualization of this condition is fully illustrated in figure 4.2.1 and referenced
in the form of case “A.”

49
As a theoretical construct, the notion of degrees of freedom is fully independent of time, but not so in
practice. For example, it would take an infinite period of time to produce an infinite number of units.
However, when considering the many approaches to the conduct of a producibility analysis, it should be
recognized that any given VDM containing an infinite degrees of freedom can be declared as a short-
term or long-term model, depending on application circumstances. For the case scenario at hand, we
can say that the designer postulated the referenced VDM as a short-term construct, owing to the
application context. In other words, the designer treated the given VDM as an “instantaneous model,”
versus a “temporal model.” As a result, the analytical focus is on “error expansion” as compared with
“error contraction.” More will be said later in this discussion about these two unique but interrelated
concepts.

Figure 4.2.1

Theoretical Short-Term Performance Capability of CTQ4

During a brief discussion with the manufacturing manager, the analyst
discovered that, by convention, a design-process qualification procedure
(DPQ) would be executed for all of the utilitarian-centric CTQs. She also
learned that the qualification procedure would be conducted on the basis of n
= 30 random samples.50 Such a sample size is often recognized as “ideal” in

50
During the execution of a design-process qualification (DPQ) it is often not possible to obtain the
measurements by way of a random sampling plan. For example, a newly developed process might be
brought “on-line” just long enough to get it qualified for full-scale production. Given this, it is likely that
only a few units of product will be produced (owing to the preparation and execution costs associated
with a short production run). As yet another example, the candidate process might currently exist (and
have a performance history), but has been selected to produce a newly developed product (with no
production history). In either case, there is no “steady stream” of measurements from which to “randomly
sample”. When such constraints are at hand, such as presented in our application scenario, the
performance measurements are usually taken in a sequential fashion. Owing to this, one must often
assume that the resulting measurements are independent and random (for purposes of statistical
analysis). The validity of this assumption can be somewhat substantiated by autocorrelation (testing the
measurements at various lag conditions). If the resulting correlation coefficients are statistically
insignificant (for the first several lag conditions) it is reasonable to assume that the measurements are
random, even though they were sequentially obtained. Given the general absence of correlation, it would
then be rational to assert their random nature and independence. In addition, we are also often forced to
assume that the measurements are normally distributed. Employing a simple normal probability plot can
test this assumption (to a reasonable extent). In essence, we are often forced to employ a sequential

terms of the tradeoffs between statistical precision and sampling costs. The
point of diminishing returns on sample size can be visually understood by
referencing figure 4.2.2. This illustration clearly shows why many
statisticians and quality practitioners attempt to enforce this rule of thumb. 51
Following this sampling, the measurements would be recorded and
subsequently analyzed at some point prior to the design’s release for full-scale
production. Unsurprisingly, the DPQ would be invoked so as to verify that
the selected process could (in reality) fully satisfy the SVM expectation.52

sampling strategy and then subsequently utilize a family of statistical tools that assumes the data is
normal, independent, and random. Fortunately, many statistical procedures (such as those often used
during a DPQ) are relatively robust to moderate violations of the aforementioned assumptions.
51
Statistically speaking, we recognize that the given sample size (n = 30) constitutes a point of “diminishing
return” with respect to “precision of estimate.” To better understand this point, let us consider the
standard error of the mean. This particular statistic is defined by the quantity σ/sqrt(n). Now suppose we
were to plot this quantity for various cases of n under the condition σ = 1.0. Such a plot would reveal
several break points, or “points of diminishing return.” The first point occurs at about n = 5, the second
point at around n = 30, and the third point in the proximity of n = 100. Thus, as n is incrementally
increased, the quantity σ/sqrt(n) decreases disproportionately. This is one of the reasons statisticians
often say that n = 30 is the ideal sample size – it represents a rational tradeoff between statistical
precision and sampling costs.
52
The reader should recognize that many manufacturing organizations “buy off” on a process (during the
design phase) on the basis of only a few samples. In fact, some execute a practice called “first article
inspection.” From a statistical point of view, this is a very spurious practice, since it is virtually impossible
to construct meaningful (useful) confidence intervals with only a few degrees of freedom. Without proof,
there are valid reasons for supporting the case of n = 30. For the purpose of process qualification, it may
be necessary to form g rational subgroups consisting of n observations to realize ng = 30 samples.
Rational subgrouping is often employed to block sources of black noise. In essence, such a practice
enables the benefits of a larger df, but minimizes the likelihood of “black noise contamination” in the final
estimate of instantaneous capability.

Figure 4.2.2
The Point of Diminishing Return on Sample Size

4.3 Considering the implications

Since the given DPQ called for n = 30 samples, the analyst recognized
that she would only have df = n – 1 = 30 – 1 = 29 degrees of freedom with
which to statistically verify the SVM during the course of process evaluation.
Given this, she reasoned to herself that such a sampling constraint might
produce a biased estimate of the “true” short-term process standard deviation.
Owing to this phenomenon, there would exist some statistical likelihood of
rejecting a candidate process that might have otherwise been fully qualified.
As we shall come to understand, the implications of this are quite profound.
For example, if the true short-term process standard deviation of a
particular “candidate process” is in reality 7.50, it is quite likely that a limited
sampling of n = 30 will reveal a biased short-term standard deviation. This is
to say that any given estimate of instantaneous reproducibility could provide a
short-term standard deviation greater than 7.50, owing to a pragmatic
constraint on the degrees of freedom made available for the process

evaluation. Of course, any given random sampling of n = 30 could just as
well provide an estimate less than 7.50.
If the resulting estimate proves to be greater than 7.50, management
would falsely reject a process that, in reality, would have otherwise been fully
qualified. In statistical lingo, the probability of such a decision error is called
“alpha risk” and is understandably designated as α. Of interest, it is also called
“producer’s risk.” From this perspective, we can treat the alpha state as if it
were a worst-case condition, but of a statistical nature. In this sense, the
worst-case condition is statistically defined by the given degrees of freedom
and the selected level of decision risk.
On the other hand, if the random sample of n = 30 revealed a short-term
standard deviation less than 7.50, management would falsely believe they
adopted a supremely qualified process when, in reality, it would prove to be
only marginal (but yet acceptable). Obviously, the alpha state is of more
concern to the project manager (and analyst) since this particular type of
decision risk is far more likely to produce negative consequences (with
respect to the realization of value entitlement). Owing to this, the alpha state
is often referred to as the “statistical worst-case condition,” or SWC in
abbreviated form.53

4.4 Constructing the worst case


Because of such reasoning, the analyst decided to compute an upper
confidence limit for the SVM so as to account for random sampling error
(under the constraint df = 29 and α = .005).54 In other words, the analyst

53
The idea of an alpha state can be applied to any type of sampling distribution (empirical or theoretical) or,
more specifically, to any or all of the parameters associated with a sampling distribution (such as
measures of central tendency and variability, or mean and standard deviation, respectively). Owing to
this, a “statistical worst-case distribution” is also called the “alpha sampling distribution.” As such, it
constitutes a producibility risk condition that prescribes the “statistical state of affairs” in the presence of
random sampling error.
54
By convention, alpha risk is often established at the .05 level. Of course, this translates to 95 percent
decision confidence. Since the statistical analysis of a design almost always involves multiple decisions,
a higher level of decision confidence is often required to compensate for the degradation of confidence
when considering the cross-multiplication of decision probabilities. Therefore, we impose a 99.5 percent
level of confidence (as a convention) for purposes of producibility analysis. This substantially improves
the aggregate confidence when considering multiple decisions. For example, a .95 level of decision

wanted to compute the upper confidence bound of the SVM, but with 100(1 -
α) = 100(1 - .005) = 99.5 percent certainty, given the limitation of df = 29.
Knowing this, she set about estimating the upper confidence interval, also
known as the UCL. This was accomplished with assistance of the chi-square
distribution. Following the computations, she presented her result as:

σB = σA · sqrt( (n − 1) / χ²(1−α) ) = 7.5 · sqrt( (30 − 1) / 13.12 ) = 11.15

Eq.( 4.4.1 )

Thus, she was able to compute (estimate) the worst-case condition of the
short-term standard deviation model (SVM). The analyst then concluded that
if the DPQ team isolated a process that exhibited a short-term standard
deviation of 7.50 (on the basis of n = 30 random samples), it would be
possible that the “true” standard deviation could be as large as 11.15 (worst-
case condition). Obviously, if such a magnitude of variation eventually
proved to be true (because of random error in the qualification sample), there
would be a practical (as well as statistical) discontinuity between the analyst’s
reproducibility expectation and reality.55
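
The worst-case (upper confidence bound) computation of Eq. (4.4.1) can be checked directly with the chi-square quantile function (a Python sketch assuming scipy; note that the tabled value 13.12 is the chi-square value exceeded with probability 1 − α = .995):

    from math import sqrt
    from scipy.stats import chi2

    sigma_A = 7.5
    n, alpha = 30, 0.005
    df = n - 1

    chi2_low = chi2.ppf(alpha, df)              # ~13.12 for df = 29

    # 99.5 percent upper confidence bound on the true short-term sigma
    sigma_B = sigma_A * sqrt(df / chi2_low)     # ~11.15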

confidence applied to 10 decisions provides an aggregate (joint) confidence of only 60 percent, whereas
a .995 level reveals the joint certainty to be about 95 percent. Of course, it is fully recognized that some
circumstances might require a more stringent alpha while others might tolerate a more relaxed criterion.
55
For the reader’s edification, it should be pointed out that the general combination α = .005 and df = 29 is
arguably the most “generally optimal” set of such decision criteria to employ when conducting a design-
process qualification (DPQ). When considering this, and other factors, the given combination offers a
standard convention from which to initiate the practice of design for six sigma (DFSS). Of course, this
particular convention should give way to other combinations as DFSS becomes entrenched in an
organization. Experience will naturally show the path to more optimal conditions of α and df, owing to the
unique circumstances associated with each application – sampling costs, destructive testing, production
volume, background knowledge, internal procedures, customer requirements and so on. Owing to the
consideration of these and many other factors, the combination α = .005 and df = 29 was employed by
this researcher to originally establish the first six sigma DPQ and subsequently validate the 1.5σ shift
factor. Again, it must be recognized that this particular set of decision criteria was judiciously selected
and practiced by this researcher for an array of theoretical and pragmatic reasons, many of which are far
beyond the scope and intent of this book. Consequently, we must recognize and always bear in mind
that the 1.5σ shift factor is a dynamic construct of a theoretical nature. As such, it is only retained in
static form when the aforementioned decision criteria are considered and practiced as a convention.

Given the level of sampling risk on the upper end (α = .005), she could
nominally expect such an adverse discrepancy (or smaller) in one out of every
200 samplings. On the flip side, the analyst would have 99.5 percent certainty
that the true standard deviation would eventually prove to be less than the
worst-case condition of 11.15, given a “qualified” standard deviation of 7.50,
as estimated on the basis of n = 30 random samples.
More poignantly, there would exist a 50/50 nominal chance that the true
standard deviation would eventually prove larger than what the DPQ estimate
would suggest.56 Obviously, the odds would not be in the analyst’s favor if
such a sampling criterion was set forth (and subsequently obtained during the
qualification trial). In other words, the likelihood of being satisfied with such
a process would not be biased in the analyst’s favor. At the risk of
redundancy, this is to say that an estimate of σ = 7.50 (at the time of
qualification) could ultimately lead to the adoption of an incapable process.

4.5 Exploring the consequences

To better understand the import of the latter discussion, let us reason
from the other side of the coin. In other words, let us assume that the true
standard deviation of CTQ4 was in reality 11.15, but the sampling standard
deviation was found to be 7.50 upon execution of the DPQ. Of course, such
an estimate of the short-term standard deviation would be due to a statistically
biased sample at the time of qualification. Given this, the project manager
could unwittingly adopt a highly biased “4σ” process.
Based on such a biased DPQ result, it is reasonable to assert that the true
short-term standard deviation would eventually stabilize at its genuine value
of 11.15 (across an infinite number of observations). Of course, such an
inflationary effect would become quite evident during the course of

56
Based on an expected short-term standard deviation of 7.50, and given that α = .50 and df = 29, the 50th
percentile of the chi-square distribution reveals a theoretical short-term standard deviation of 7.68. This
small discrepancy is attributable to the fact that df = 29. However, as the degrees of freedom
approaches infinity, the consequential discrepancy would necessarily approach zero. Thus, we
recognize the 50/50 odds that the true short-term standard deviation will be greater (or less) than 7.50.

cumulative sampling. In other words, the progressive discovery and
integration of other sources of random sampling error during ongoing
production would naturally tend to inflate the short-term standard deviation
until it stabilized at its true value.57 Without a reasonable doubt, the veritable
capability would ultimately prove to be

ZUSL = (USL − T) / σB = (130 − 100) / 11.15 = 2.69 ,

Eq.( 4.5.1 )

owing to the unknown presence of random sampling error at the time of
qualification. Of course, such a level of reproducibility would translate to a
capability ratio of Cp = (USL − LSL)/(6σ) = Z/3 = 2.69/3 = .897. Based on this,
it could be said that 1/.897 = 1.115, or about 112 percent of the design
bandwidth (USL - LSL) would be consumed by the process bandwidth
(±3σST).58 Thus, we recognize that, by establishing a short-term standard

57
Following a DPQ, it is conventional practice to continually monitor the instantaneous and longitudinal
capability of CTQs. For a continuous, high-volume performance characteristic (such as CTQ4), this task
can be effectively accomplished by way of a statistical process control device called an “Xbar and S
chart.” The use of such an oversight mechanism forces the white noise (extraneous variations) to
appear in the S chart while the influence of black noise (nonrandom variations) is forced to emerge in the
Xbar chart. The general use of such analytical tools requires the implementation of a sampling technique
called “rational subgrouping.” Essentially, this method of sampling forces the random sampling errors to
be retained within groups, while the nonrandom sources of error is forced to develop between groups.
By virtue of the merits associated with rational subgrouping, one-way analysis of variance can be
naturally employed to interrogate the root-mean-square of the error term (within-group standard
deviation). As would be intuitively apparent to the seasoned practitioner, the within-group component is
a direct measure of instantaneous reproducibility. As a natural consequence, the various components of
error can be subsequently utilized to formulate certain other indices of short-term capability (ZST, Cp, Ppk
and so on). To achieve this aim, we employ the general model SST = SSW + SSB, where SS is the sum
of squares, T is the total estimate, W is the within group estimate and B is the between group estimate.
In this form, the SSW term can be continually updated to obtain an ongoing estimate of the background
noise (random error) without the contaminating influence of nonrandom sources of error, as this type of
error is continually integrated into the SSB term. As a side benefit of this, the SSB term can be employed
to establish the “equivalent mean shift.” Of course, all of these assertions can be directly verified by
mathematical examination or by way of a simple Monte Carlo simulation. More will be said about this
later on in the discussion.
58
From this perspective, it should be evident that a process capability ratio of Cp = 2.00 defines a six sigma
level of performance. For this level of capability, only 50 percent of the design bandwidth is consumed
by the process bandwidth. Of course, the remaining 50 percent of the design bandwidth is dedicated as
“design margin.” Given this, it should be self-evident that a process capability ratio of Cp = 2.00
corresponds to a 50 percent design margin (M = .50). Here again, the criterion of “six sigma” would be
fully satisfied.

deviation of 7.50 as a process qualification target, random sampling error will
dictate that the true capability could be worse than ±4.0σ, under the condition
n = 30 and α = .005. Of course, the same may be said when reasoning on the
opposite side of this discussion.
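
The degradation described above is easy to quantify once the inflated standard deviation is in hand (a brief Python sketch; the variable names are illustrative):

    T, USL, LSL = 100.0, 130.0, 70.0
    sigma_B = 11.15

    Z_USL = (USL - T) / sigma_B           # ~2.69, per Eq. (4.5.1)
    Cp = (USL - LSL) / (6 * sigma_B)      # ~0.897
    consumed = 1 / Cp                     # ~1.115, about 112 percent of the tolerance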
Flipping the coin back to its original side, let us return to the unique
circumstances of our case example. Remember, at this point in the analysis,
the SVM was established at 7.50. Also recall that n = 30 measurements are to
be taken during the DPQ. From this point forward, we will say that the DPQ
revealed a short-term standard deviation of 7.50. As a consequence, it would
seemingly appear that the sampled variation was “in tune” with the theoretical
design expectation.

4.6 Visualizing the distributions


For purposes of visualization, the nominal design distribution (Case A)
and its corresponding worst-case sampling distribution (Case B) are presented
and contrasted in figure 4.6.1. By careful examination of this illustration, the
reader can better visualize the differences between these two cases and reason
through the implications. Doing this will produce a better appreciation for the
potential impact of random sampling error under the constrained condition n =
30 and 1 – α = .995.

Figure 4.6.1
Nominal Design Distribution for CTQ4
(Case A) Contrasted to its Worst-Case Sampling Distribution (Case B).

From this figure it is quite easy to see that the “inflationary effect” of
random sampling error (case B) can be quite profound when reasoning from
the classical engineering mindset of worst-case analysis.59 This is particularly
apparent when the nominal expectation of performance capability (case A) is
contrasted to its statistical worst-case expectation (case B).
The reader will recall that in the field of quality engineering (as well as
other technical fields) it is conventional practice to constrain the idea of unity
between the ± 3.0σ limits of a distribution. Such an understanding of unity
conventionally applies when researching process capability. When
considering the limits of unity related to case B, such a level of inflation (c =
11.15/7.50 = 1.487) is probabilistically equivalent to shifting the theoretical
design distribution (case A) by µA = T ± 1.46σA, or simply 1.5σ. In this
particular case, the equivalent condition is exemplified by cases A1 and A2,
respectively. When considering the upper limit of unity for case A2 ( given as
+3σA2), notice the exact coincidence to the upper limit of unity for case B
(given as +3σB). This is also true when comparing case B to case A1, but on
the left-hand side of the distribution.

59
The reader is admonished to recognize that the general practice of worst-case analysis is, in and of itself,
generally not a good thing. Such a position is rational because the statistical probability of such events
often proves to be quite remote. For example, if the probability of failure for a single characteristic is 10
percent and there are only five such interdependent characteristics, the likelihood of worst-case would be
.10^5, or .00001. Of course, this translates to one in 100,000. Obviously, this probabilistic circumstance
would imply an “overly conservative” design. Although the aim of worst-case design is to secure a
“guarantee” of conformance to performance standards, it usually does nothing more than suboptimize
the total value chain. However, when descriptive and inferential statistics are integrated into the general
practice of worst-case analysis, we are able to scientifically secure a “conditional guarantee” of sorts, but
without absorbing any of the principal drawbacks. In other words, the application of modern statistics to
worst-case analysis provides us a distinct opportunity to significantly enhance the ways and means by
which the producibility of a product design can be assessed and subsequently optimized. From this
perspective, it is easy to understand why this six sigma practitioner advocates the use of applied
statistics when establishing performance specifications (nominal values and tolerances) during the
course of product and process design.

5.0 Examining the Shift

5.1 Establishing the equality


Although we can easily visualize the equivalent mean shift with the aid
of an illustration, such as that presented in figure 4.6.1, it is often more
convenient to formulate a mathematical understanding of the situation. To
this end, we seek to mathematically equate the statistical cases of A2 and B,
but only at their respective upper limits of unity. In order to develop an
equivalent mean shift, the analyst decided that she should begin by
establishing a fundamental equality. In other words, the analyst recognized
that a “shifted distribution” must equate to an “inflated distribution.” Given
this, she offered such equality in the form

T + 3σA + Zshift σA = T + 3σB .

Eq. ( 5.1.1 )

Applying this equality to the case data, the analyst reaffirmed that T =
100, σA = 7.50, Zshift = 1.46, σB = 11.15. In addition, she used the conventional
value of Z = 3.0 as a constant to prescribe the upper limit of unity. By
substitution, the analyst computed the equality as 100 + (3 * 7.5) + (1.46 *
7.5) = 100 + (3 * 11.15) = 133.45.
Recognizing the equality of these two quantities, the analyst was able to
successfully establish that the upper limit of unity related to case A exactly
coincided with the worst-case condition given by case B. Simply aligning the
elements of Eq. (5.1.1) with the corresponding elements provided in figure
4.6.1 provides even greater insight into the stated equality. To further such
insight, she determined that the standardized equivalent mean offset (Zshift)

could be isolated by simple algebraic rearrangement of Eq. (5.1.1). Doing so
provided her the solution:

Zshift = (3σB − 3σA) / σA = ((3 ∗ 11.15) − (3 ∗ 7.50)) / 7.50 = 1.46 .

Eq. ( 5.1.2 )

Thus, she was able to recognize that the quantity Zshift describes the
relative differential between the mean of case A2 and that of case B, but
scaled by the nominal standard deviation associated with case A. Because of
the nature of Zshift, the analyst clearly understood that it could not be
technically referenced as a “mean shift” in the purest and most classical sense.
However, she did come to understand that it could be declared as an
“equivalent mean shift,” but only in a theoretical sense. From another
perspective, she recognized that the quantity Zshift provided her with yet
another benchmark from which to gain deeper insight into the “statistical
worst-case” condition of the design, but only with respect to the temporal
reproducibility of CTQ4 set in the context of a DPQ.
Owing to this line of reasoning, the analyst concluded that the quantity
Zshift is simply a compensatory static (stationary) off-set in the mean of a
theoretical performance distribution (TPD) reflecting the potential influence
of dynamic sampling error (of a random nature) that would otherwise inflate
the postulated short-term standard deviation of the TPD (at the time of
performance validation). In light of this understanding, the analyst noted to
herself that Zshift cannot be statistically described as a “true” standard normal
deviate, simply because its existence is fully dependent upon the chi-square
distribution (owing to the theoretical composition of σB).
Stated in more pragmatic terms, the analyst recognized that Zshift does
not infer a “naturally occurring” shift in a distribution mean (in an absolute or
classical sense). Rather, it is an “equivalent and compensatory shift”
employed to statistically emulate or otherwise account for the long-term

sampling uncertainties that could potentially cause an initial estimate of short-
term process capability to be unfavorably biased in its worst-case direction.
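
The equality of Eq. (5.1.1) and the resulting equivalent shift of Eq. (5.1.2) can be confirmed numerically with a few lines (a minimal Python sketch restating the case values):

    T = 100.0
    sigma_A, sigma_B = 7.5, 11.15

    # Equivalent mean shift, Eq. (5.1.2): growth of the 3-sigma limit of unity,
    # scaled by the nominal (design) standard deviation
    Z_shift = (3 * sigma_B - 3 * sigma_A) / sigma_A     # ~1.46

    # Both sides of Eq. (5.1.1) land at the same point, ~133.45
    lhs = T + 3 * sigma_A + Z_shift * sigma_A
    rhs = T + 3 * sigma_B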

5.2 Developing the correction


As an addendum to the analyst’s reasoning, she made brief recollection
of the dynamic correction called “c.” As presented earlier on in the case, the
analyst reasoned that c is a compensatory measure used to adjust or otherwise
correct for the influence of dynamic random sampling error over a protracted
period of time or many cycles of operation. Given this, she reconciled that σB
= σAc. By simple algebraic rearrangement, the analyst was able to solve for c
and presented her results as

c = σB / σA = 11.15 / 7.50 = 1.487 ≅ 1.49 ,

Eq. ( 5.2.1 )

where the resultant 1.49 indicated that the SVM (σA = 7.50) should be
artificially inflated or “expanded” to about 149 percent of its nominal value to
account for the potential effect of statistical worst-case sampling error.
At this point, the analyst decided to transform σA to the standard normal
case. Thus, she was able to declare that σA = 1.0 and thereby constitute a unit-
less quantity. For this case, she then rationalized that c² = (n − 1) / χ², where
χ² is the chi-square value corresponding to the selected α and df. Thus, the
analyst was able to establish the theoretical connection between c and the chi-
square distribution. Given this, she then formulated Zshift from a rather unique
perspective by providing the standardized equivalent mean shift in the form:

Zshift = 3(c − 1) = 3(1.487 − 1) = 1.46 .

Eq. ( 5.2.2 )

Based on the case data, the analyst discovered that the standardized
equivalent shift should be given as Zshift = 1.46, or approximately 1.5σA. She
also noted the same result when considering the left-hand side of Case B in
relation to A1. Hence, the analyst now better understood the theoretical basis
for the proverbial “1.5σ shift factor” commonly employed in six sigma work.
To a large extent, this answered her colleague’s initial question: “Where does
the 1.5σ shift factor come from – and why 1.5 versus some other magnitude?”
However, the analyst went on to reason that Zshift will vary somewhat,
depending on the selection of α and n. Later in the day she was informed
by another practitioner that such a combination of α and n is fairly typical
when conducting a six sigma DPQ. The more experienced practitioner
informed her that if the aggregate decision confidence (C) was to be
established such that C = 100(1 − α) = 95 percent, and there were k = 10
independent CTQs to be concurrently but independently qualified, then each
SVM should maintain a statistical confidence of C^(1/k) = .95^(1/10) = .995, or
about 99.5 percent. Hence, the rationale for the six sigma convention of
setting n = 30 and 1 - α = .995 for each CTQ during the course of a DPQ.
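
Both the inflation factor c and the convention of a .995 per-CTQ confidence can be traced in a short calculation (a Python sketch assuming scipy; the value of k is taken from the colleague's example):

    from math import sqrt
    from scipy.stats import chi2

    n, alpha = 30, 0.005
    df = n - 1

    # Standard-normal form of the inflation factor: c^2 = (n - 1) / chi-square
    c = sqrt(df / chi2.ppf(alpha, df))       # ~1.487
    Z_shift = 3 * (c - 1)                    # ~1.46, per Eq. (5.2.2)

    # Per-CTQ confidence needed to hold a 95 percent aggregate confidence
    C_aggregate, k = 0.95, 10
    C_per_CTQ = C_aggregate ** (1.0 / k)     # ~0.995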
Following these insights, the analyst (and one of her colleagues)
reflected on their experiences and realized that a great many design
configurations (perhaps the vast majority) are predicated on the assumption
that each CTQ will possess a short-term capability of 4.0σST. They fully
recognized that such a level of capability is generally inferred by virtue of the
conventional reliance on 25 percent design margins. Because such a level of
capability is often targeted when a process is first brought into service, they
believed it was reasonable to assert that the long-term, statistically-based,
worst-case capability expectation (per opportunity) could be rationally
approximated as 4.0σST − 1.5σST = 2.5σLT. Naturally, they both understood
that such a level of capability translates to a long-term first-time yield
expectation of YFT = .99379, or 99.38 percent. As a consequence, they began
to discuss the implications of this for the design of their system.

5.3 Advancing the concepts
Of sidebar interest to the analyst, her colleague continued his dialogue
by reminding her that the ratio of two variances can often be fully described
by the F distribution. He then called upon the classical F distribution to define
yet another approach for establishing the equivalent static mean off-set (Zshift).
Without further consideration, he demonstrated that

c = σB / σA = sqrt( 1 / F ) .

Eq. ( 5.3.1 )

As may be apparent from Eq. 5.2.2 and Eq. 5.3.1, the analyst was able
to formulate the quantity Zshift = 3( sqrt(1 / F ) − 1 ). Using this particular
relationship as a backdrop, she then computed Zshift = 3( sqrt( 1 / .4525) − 1) =
1.46, or 1.50 in its rounded form. Of course, she referenced the F distribution
with the appropriate degrees of freedom. In this case, she utilized df = ( n – 1 ) = ( 30
– 1 ) = 29 degrees of freedom in the numerator term and declared an infinite
degrees of freedom for the denominator term. In addition to this, she
referenced the F distribution with a decision confidence of C = (1 - α ) = ( 1 -
.005) = .995, or 99.5 percent. Given these criteria, the analyst discovered that
F = .4525.
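
Because an F distribution with infinite denominator degrees of freedom reduces to a chi-square divided by its degrees of freedom, the F-based route can be checked without an F table (a Python sketch assuming scipy):

    from math import sqrt
    from scipy.stats import chi2

    n, alpha = 30, 0.005
    df = n - 1

    # F(29, infinity) at the lower .005 point equals chi-square(.005, 29) / 29
    F = chi2.ppf(alpha, df) / df             # ~0.4525

    Z_shift = 3 * (sqrt(1 / F) - 1)          # ~1.46, or 1.5 when rounded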

5.4 Analyzing the system


At this point in our discussion, recall that the system was known to
contain Q = 160 independent critical-to-quality characteristics of a utilitarian
nature. Given this, the analyst reasoned that if each of the CTQs exhibited a
long-term, first-time yield expectation of YFT = 99.38 percent (also equivalent
to ZLT = 2.5), then the aggregate success rate at the system level could be
rationally projected as YSYS = YFT^Q = .99379^160 = .3691, or about 37 percent.
In other words, there would exist a 37 percent likelihood of realizing a system
without exceeding the specification limits related to any of the Q = 160 CTQs.

In essence, the long-term first-time yield projection of 37 percent
represented the statistical probability of realizing a system with zero utility-
related defects. Consequentially, the analyst determined that the defects-per-
unit metric should be computed. Relying on certain theoretical properties of
the Poisson function, she knew that the long-term defects-per-unit could be
given as dpu = –ln(YSYS) = –ln(.3691) = .9966, or about 1.0. Of course, she
also recognized that this result (dpu = 1.0) assumed that all Q = 160
utilitarian-centric characteristics were postulated in their alpha state.
Without regard to the statistical worst-case condition, she again
judiciously interrogated the total system confidence. This time around, she
declared all Q = 160 critical-to-quality characteristics to be at their respective
nominal states of reproducibility (ZST = 4.0), and then estimated the system
throughput yield expectation. She presented the results of this analysis as YSYS
= YFT^Q = .999968^160 = .9949, or about 99.5 percent. Given such a high level
of system confidence, she immediately recognized that the corresponding
defects-per-unit would be quite favorable. Again relying on the Poisson
function, she computed the system-level defect rate as dpuSYS = –ln( YSYS ) = -
ln(.995) = .005. Given this computational outcome, she reasoned that, in the
short-term, only one out of about every 200 electrical systems would contain a
CTQ that was not in conformance to its respective specification.
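
The two system-level roll-ups can be reproduced as follows (a Python sketch assuming scipy; the per-CTQ yields are taken directly from the case):

    from math import log
    from scipy.stats import norm

    Q = 160

    # Alpha (statistical worst-case) state: every CTQ at Z_LT = 2.5
    Y_FT = norm.cdf(2.5)                 # ~0.99379
    Y_sys = Y_FT ** Q                    # ~0.369
    dpu = -log(Y_sys)                    # ~1.0 defect per system

    # Nominal state: every CTQ at Z_ST = 4.0 (unilateral yield ~0.999968)
    Y_sys_nominal = norm.cdf(4.0) ** Q   # ~0.995
    dpu_nominal = -log(Y_sys_nominal)    # ~0.005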

6.0 Validating the Shift

6.1 Conducting the simulation


To validate the statistical assertions presented thus far in our discussion
and highlight our case example, a simple Monte Carlo simulation was easily
designed and executed by this researcher. The simulation-based study was
undertaken to empirically demonstrate that the influence of random sampling
error (in and of itself) is often much larger and more profound than most
quality professionals believe it to be, simply because the nature of its

influence is felt across the full bandwidth of unity, not just in the dispersion
parameter called the “standard deviation.”
In other words, the random error assignable to a single sampling
standard deviation must be multiplied by 3 when estimating any of the
common indices of process capability (e.g., ZUSL), owing to the fact that unity
is declared at the ±3.0σ limits of a distribution. Due to such an accumulation
of error across the total bandwidth of unity, the resulting long-term
inflationary effect is statistically equivalent to bilaterally shifting the short-
term sampling distribution by approximately 1.5σST.
In this instance, several thousand cases (each consisting of n = 30
randomly selected observations) were compiled under the generating
condition µ = 100, SVM = σST = 7.50. For each sampling case (subgroup),
the standard deviation was estimated and made relative to the upper semi-
tolerance zone of the performance specification (USL - T) by expressing the
ratio in the form ZUSL. In turn, each unique estimate of short-term capability
(ZUSL) was then contrasted to the theoretical design expectation ZUSL = 4.0,
therein noting the simple difference as Zshift. The result of this author’s Monte
Carlo simulation is presented in figure 6.1.1.

Figure 6.1.1
Results of the Monte Carlo Simulation

Not surprisingly, a rudimentary analysis of the Monte Carlo simulation
confirmed this author’s theoretical assertion – the most central condition for
the Zshift metric should be given as approximately 1.5σST. Again, the reader is
reminded that this particular magnitude of equivalent shift is a theoretical
construct based on the influence of random sampling error over a great many
sampling opportunities.
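
Although the author's original simulation is not reproduced here, the probabilistic claim behind it can be spot-checked with a small simulation of one's own. The sketch below (Python, assuming numpy; the seed and trial count are arbitrary) asks a closely related question: if the true short-term standard deviation were 11.15, how often would a qualification sample of n = 30 report 7.50 or better? The answer is about 0.5 percent, which is the alpha risk built into the worst-case construction.

    import numpy as np

    rng = np.random.default_rng(1)            # seed fixed only for repeatability
    true_sigma, n, trials = 11.15, 30, 100_000

    data = rng.normal(100.0, true_sigma, size=(trials, n))
    s = data.std(axis=1, ddof=1)              # sample standard deviations

    # Fraction of qualification samples that would (misleadingly) report
    # a short-term standard deviation of 7.50 or better
    p = np.mean(s <= 7.5)                     # ~0.005, roughly 1 in 200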
Owing to the equations previously set forth, and given the results of the
aforementioned simulation, this researcher and long-time practitioner strongly
asserts the following points. First, it is scientifically rational and operationally
prudent to contend that a ±4σST model of instantaneous reproducibility can be
meaningfully shifted off its target condition (in a worst-case direction) on the
order of ±1.50σST so as to analytically consider or otherwise compensate for
the influence of long-term random effects when examining or attempting to
optimize the producibility of a design. Second, such a corrective device

should be methodically applied when attempting to establish a short-term
qualification standard deviation during the course of a DPQ.
It should also be recognized that the exact magnitude of compensation
(contraction) will vary somewhat depending on the selection of α and df.
Third, the compensatory mean offset provides an excellent way to assess the
short-term and long-term reproducibility of a new design prior to its release
for full-scale production, or for conscientiously trouble-shooting an existing
design. Thus, a designer can rationally and judiciously study the
instantaneous and temporal reproducibility of a design without many of the
intellectual encumbrances often associated with direct application of
mathematical statistics.
In further considering our case example, we can say that if the
theoretical short-term standard deviation (SVM = 7.50) was to be utilized as a
decision threshold for process adoption (under the constraint n = 30), the
designer would have at least 99.5 percent confidence that the adopted process
(based on the alpha sampling distribution) would ultimately prove to be less
than 11.15 – thereby offering a potential violation of the 25 percent bilateral
safety margin. Expressed another way, it could be said that, if the “buy-off”
sampling distribution revealed a short-term standard deviation of 7.50, and the
candidate process was subsequently adopted on the basis of this result, there
would be 99.5 percent confidence that the terminal capability would prove to
be greater than 2.7σ. Obviously, such a level of "worst-case" capability is not
very appealing.

6.2 Generalizing the results


As we have already stated, the expansion (inflation) factor c describes
the statistical impact of worst-case sampling error during the course of a DPQ.

Based on the example design qualification set forth in our case example, we
demonstrated that c = 11.15 / 7.5 = 1.49. As our example calculations
revealed, such a magnitude of long-term expansion is equivalent to bi-
directionally “shifting” the mean of a short-term theoretical process
distribution by approximately ±1.5σST from its natural center. Given such an
equivalent shift in a short-term theoretical process distribution, the resulting
limits of unity will exactly coincide with the corresponding limits of the
inflated long-term sampling distribution, given that the six sigma design
qualification guidelines were followed.
Given these discussions, the reader should now be on familiar terms
with the idea that the 1.5σ shift factor (as associated with the practice of six
sigma) is not a process centering issue, although it can be applied in this
context in a highly restricted sense. In a theoretical sense, it is very much
intertwined with the mechanics of certain statistical process control methods;
however, it is not a leading indicator or measure of how much a process will
shift and drift over time due to assignable causes. It is simply a mathematical
reconstruction of the natural but random error associated with a theoretical
performance distribution.
The reconfigured form of such sampling error (1.5σ shift) helps the
uninformed practitioner better comprehend the consequential relationship
between a four sigma theoretical design model and its corresponding alpha
sampling distribution in the context of a DPQ. Use of the 1.5σ shift factor is
an effective means to assure that the influence of long-term random sampling
error is accounted for when assigning tolerances and specifying processes.
We should also understand that the 1.5σ shift factor has many other six sigma
applications where producibility analysis, risk management and benchmarking
are concerned.

6.3 Pondering the issues
Statistically speaking, the 1.5σ shift factor represents the extent to which
a theoretical ±4σ design model (based on infinite degrees of freedom) should
be “shifted” from the target value of its performance specification in order to
study the potential influence of long-term random effects. In this sense, it
provides an equivalent first-order description of the expected “worst-case”
sampling distribution under the condition df = 29 and 1 - α = .995.
Again, the general magnitude of expected random error (across many
periods of rational sampling) is sufficient to support an equivalent 1.5σ shift
in the theoretical design model, when considering the upper confidence limit
of the SVM. As previously discussed, this statement is fully constrained to
the case n = 30 and α = .005, and is generally limited to the purpose of design
qualification and producibility analysis, as well as several other common
types of engineering assessments.
As with any compensatory measure, the given correction factor can be
inappropriately applied or otherwise improperly utilized. However, it is fairly
robust and can be confidently employed when a) there are many CTQs being
simultaneously considered, b) there is an absence of empirical data from
which to estimate the true long-term capability expectation and/or c) the need
for a high level of statistical precision is questionable or unnecessary.

7.0 Contracting the Error

7.1 Conducting the analysis


To accomplish the overall design goal related to our case example, we
may now consider working the problem “in reverse.” This is to say the
analyst would start with the initial product performance expectations related to
CTQ4 and then “reverse compute” the target short-term sampling standard
deviation that would be necessary to “qualify” a candidate process with 99.5
percent certainty, but under the limitation of n = 30. In this manner, she
would be able to account for the potential influence of sampling bias by
“deflating” or otherwise contracting the SVM (short-term variation model).

In an effort to rationally establish a contracted short-term standard
deviation, the analyst once again called upon the merits of the chi-square
distribution. With this distribution, she could “reverse compute” a short-term
sampling standard deviation that could be used as a qualification criterion
during the course of a DPQ. After some careful study, the analyst formulated
an approach and presented her results as

σC = sqrt( σA² · χ²(1−α) / (n − 1) ) = sqrt( 7.5² ∗ 13.12 / (30 − 1) ) = 5.045 ≅ 5.0 .

Eq.( 7.1.1 )

She then informed one of her colleagues that the given value of 5.045
represents the criterion short-term sampling standard deviation that must be
acquired by the production manager upon execution of the DPQ in order for
the related process to “qualify” as a viable candidate for full-scale production.
By isolating a process with a short-term sampling standard deviation of 5.045
or less (at the time of the DPQ), the production manager would be at least
99.5 percent certain that the true short-term standard deviation would not be
greater than σA = 7.50, given df = n – 1 = 30 –1 = 29 at the time of sampling.
As should be evident, such a “target” standard deviation will virtually ensure
that the design engineer’s minimum producibility expectation will be met.
Merging the statistical mechanics of a confidence interval with the idea
of design margins and specification limits, the analyst computed the same
result by interrogating the relation
σC = (1 − M)(USL − T) / [ 3 · sqrt( (n − 1) / χ²(1−α) ) ] = (1 − .25)(130 − 100) / [ 3 · sqrt( (30 − 1) / 13.12 ) ] = 5.045 ≅ 5.0 .

Eq.( 7.1.2 )

After some algebraic rearrangement of this equation, she established a
more convenient and general model in the form

σC = sqrt( χ²(1−α) (1 − M)² (USL − T)² / [ 9(n − 1) ] ) = sqrt( 13.12 (1 − .25)² (130 − 100)² / [ 9(30 − 1) ] ) = 5.045 ≅ 5.0 .

Eq.( 7.1.3 )

Based on the model constraint µ = T, it was then evident the production
manager would have to isolate a prospective (candidate) process with a short-
term sampling standard deviation of 5.045 or less in order to qualify as a
viable process (in terms of reproducibility). This represented a 32.8 percent
reduction in the theoretical design expectation of SVM = 7.50.
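
The contracted qualification target can be computed by either of the routes shown in Eq. (7.1.1) through Eq. (7.1.3); the sketch below (Python, assuming scipy) carries two of them through and notes the resulting reduction:

    from math import sqrt
    from scipy.stats import chi2

    T, USL, M = 100.0, 130.0, 0.25
    n, alpha = 30, 0.005
    df = n - 1
    chi2_low = chi2.ppf(alpha, df)                  # ~13.12

    sigma_A = (1 - M) * (USL - T) / 3.0             # 7.50, the SVM

    # Eq. (7.1.1): contract the SVM so a passing sample implies, with 99.5
    # percent confidence, a true short-term sigma no larger than 7.50
    sigma_C = sigma_A * sqrt(chi2_low / df)         # ~5.045

    # Eq. (7.1.3): the same result taken directly from the specification
    sigma_C_alt = sqrt(chi2_low * (1 - M)**2 * (USL - T)**2 / (9 * df))

    reduction = 1 - sigma_C / sigma_A               # ~0.33, about one third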

7.2 Drawing the conclusions


Since it was resolved that the criterion short-term sampling standard
deviation of 5.045 could support a ±4σLT level of reproducibility with at least
99.5 percent certainty under the assumption of statistical worst-case sampling
conditions, she then concluded that the short-term capability would
necessarily need to be

ZUSL = (USL − T) / σC = (130 − 100) / 5.045 = 5.947 ≅ 6.0 .

Eq.( 7.2.1 )

Following this calculation, the analyst suddenly realized that, if a 4σLT
(long-term) level of capability is to be statistically assured (with a rational
level of confidence) over many cycles of production, a 6σST (short-term) level

of capability must be “targeted” as the prerequisite condition for process
adoption (during the course of a DPQ). Thus, she indirectly answered the
pervasive and sometimes perplexing question: “Why did Motorola adopt a
six sigma level of capability as a performance standard and not some other
level?”
Based on the outcome of Eq. 7.2.1, the analyst decided to express the
given level of capability in the form of Cp. That particular estimate was
presented as

Cp = (USL − T) / (3σC) = (130 − 100) / (3 * 5.045) = 1.982 ≅ 2.0 .

Eq.( 7.2.2 )

Since the analyst’s goal was to net Cp = 1.33 with at least 99.5 percent
certainty, she determined that a process capability of Cp = 2.0 or greater would
have to be discovered upon execution of the DPQ (based on df = 29). Of
course, this was to say that the ±3.0σST limits of such a process would
naturally consume only one-half of the specification bandwidth (tolerance
zone), owing to the understanding that 1 / Cp = .50, or 50 percent. Of course,
it is fully recognized that this quantity is just another form of design margin.
If such a process could be isolated and subsequently qualified, there would be
a very small risk (α = .005) of inappropriately accepting that process as a
candidate for adoption and implementation. Figure 7.2.1 graphically
summarizes all of the fundamental assertions related to our discussion thus
far.
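
The capability implied by the contracted qualification target follows immediately (a short Python sketch restating Eq. (7.2.1) and Eq. (7.2.2)):

    T, USL, LSL = 100.0, 130.0, 70.0
    sigma_C = 5.045

    Z_target = (USL - T) / sigma_C            # ~5.95, i.e., a six sigma target
    Cp_target = (USL - LSL) / (6 * sigma_C)   # ~1.98, i.e., Cp of about 2.0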

Figure 7.2.1
A Comparison of the Theoretical Design Distribution (Case A)
and the Corresponding Alpha Sampling Distribution (Case C)

Here again, it is relatively easy to visualize the driving need for an initial
capability of ±6σST when the “statistical worst-case” goal is ±4σLT. As
previously stated, this portion of the book has sufficiently answered the two
key questions that have been puzzling the world of quality for many years.
Almost without saying, those two central questions are: “Why six sigma and
not some other value?” and “Where does the 1.5σ shift come from?”
As the reader should now understand, a short-term sampling capability
of ±6σST must be realized (during qualification) in order to be at least 99.5
percent confident that the net effect of random sampling error (as accumulated
over an extended period of time) will not compromise or otherwise degrade
the idealized design capability of ±4σLT, given that the DPQ was founded on n
= 30 random samples collected over a relatively short period of time.
Extending this reasoning, we naturally come to understand that the 1.5σST shift
factor is merely a linear offset in the idealized process center that compensates
for “statistical worst-case” random sampling error over the long haul, given
that certain statistical criteria have been satisfied. In this sense, the 1.5σ shift
factor represents an equivalent short-term statistical condition that is
otherwise manifested in the form of long-term “inflated” standard deviation.

7.3 Verifying the conclusions


Confirmation of these assertions was realized by the construction of a
Monte Carlo simulation under the condition of a random normal distribution
defined by the parameters µ = 100 and σST = 5.045. The reader will recognize
the specified standard deviation as the “target” index of variation discussed in
our case example. For this case, g = 3,000 subgroups were randomly
assembled, each consisting of n = 30 random observations. Following this,
the capability of each subgroup was computed and subsequently graphed for
the purpose of visual and statistical interrogation. For the reader’s
convenience, the results of this simulation have been summarized in the form
of a histogram and located in figure 7.3.1.
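The following sketch (Python with NumPy; the random seed and the use of each subgroup's sample standard deviation are assumptions of this illustration, not a transcript of the author's simulation) reproduces the general shape of this exercise.

```python
# Hedged sketch of the described simulation: g = 3,000 subgroups of n = 30
# drawn from a normal distribution with mu = 100 and sigma_ST = 5.045.
import numpy as np

rng = np.random.default_rng(2003)          # assumed seed, for repeatability
mu, sigma_st = 100.0, 5.045
n, g = 30, 3000
USL, T = 130.0, 100.0

data = rng.normal(mu, sigma_st, size=(g, n))
s_subgroup = data.std(axis=1, ddof=1)      # short-term s of each subgroup
z_usl = (USL - T) / s_subgroup             # capability of each subgroup

print(round(z_usl.min(), 2), round(z_usl.mean(), 2), round(z_usl.max(), 2))
# roughly: minimum near 4, center near 6, maximum near 10 or a little above
```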

Figure 7.3.1
Histogram of Process Capabilities Related
to the Monte Carlo Simulation

From this figure, it is apparent that the general range of capability


extends from approximately +4σST to a little over +10σST, while maintaining a
central condition of about +6σST. Focusing our attention on the minimum
capability (left side of the distribution), it is virtually certain that the analyst’s
original long-term producibility goal of ±4σLT can be realized if a ±6σST
process capability can be isolated – even in light of worst-case sampling error.

7.4 Establishing the shortcut


Given that the DPQ criteria are sufficiently recognized and understood,
we may now establish a “shortcut” method for specifying a candidate process

based solely on the design specifications. In general, that method may be
summarized in the context of our example case and presented as

\[
\hat{\sigma}_{spec} = \frac{SL - T}{6} = \frac{130 - 100}{6} = 5.0
\]

Eq. ( 7.4.1 )

Thus, we have the theoretical basis for a design-related rule of thumb.


This rule advocates that, if the design goal is to realize a long-term capability
of 4σLT, then the semi-tolerance zone of a symmetrical, bilateral performance
specification should be divided by a quantity generally known as the “six
sigma constant,” where the value of that constant is 6. Naturally, this constant
is indicative of ZST = 6.0. Of course, when considering the full range of a
bilateral tolerance, the constant should be given as 6 * 2 = 12. Essentially, the
rule of thumb says that a “six sigma specification” is defined or otherwise
constituted when a bilateral performance requirement is “mated” with a
corresponding process performance distribution whose standard deviation
consumes no more than 8.33 percent of the total specification bandwidth.
As may be reasoned, this rule of thumb can be translated into a design
margin. In other words, if a designer seeks to “net” a 4σLT level of long-term
reproducibility for a given performance characteristic under the assumption of
worst-case random sampling error, then it would be necessary to assign a
“gross” design margin of 50 percent. This means that, in the unlikely event
that worst-case sampling error is present at the time of process qualification,
the design margins will not drop below 25 percent. In comparison, this is
twice the margin advocated by conventional engineering practice.
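A hedged numerical restatement of the rule of thumb follows; the symmetric LSL = 70 is an assumption added here only to complete the example.

```python
# Sketch of the "six sigma constant" rule of thumb from Eq. (7.4.1).
LSL, T, USL = 70.0, 100.0, 130.0           # LSL assumed symmetric about T

sigma_spec = (USL - T) / 6                      # semi-tolerance divided by the constant 6
share_of_bandwidth = sigma_spec / (USL - LSL)   # about 0.0833, i.e., 8.33 percent
gross_margin = 1 - (3 * sigma_spec) / (USL - T) # +/-3 sigma_ST consumes half the semi-tolerance

print(sigma_spec, round(share_of_bandwidth, 4), gross_margin)   # 5.0 0.0833 0.5
```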

8.0 Partitioning the Error

8.1 Separating the noise


For many years, practitioners of quality methods have directly or
indirectly utilized the idea of “rational subgroups.” The reader should
recognize that this particular sampling strategy is inherently germane to a
wide array of statistical applications. We see this practice commonly
employed whenever such tools as control charts and design of experiments are
selected as the instruments of analysis.
At the theoretical core of many statistical tools and methods is the idea
of “error partitioning.” Essentially, an error partition is a “screen” that
excludes certain types of variation from contaminating or otherwise biasing
one or more of the other partitions. In this manner, a rational sampling
strategy can block certain types of error so as to ensure that the effects of
interest are not confounded or statistically biased. In addition, such a strategy
ensures that the response measurements remain fully independent.60 For
example, when using an Xbar and S chart, we seek to separate the background
noise (reflected in the S chart) from the signal effect (displayed in the Xbar
chart). Doing so provides us various types of “reports” on how well the
process center is controlled over time – relative to the extent of inherent
background noise (random error).
When designing a statistically based test plan (experiment) we have the
same ultimate goal – the deliberate partitioning of “random noises” from the
various signal effects induced by the experimental factors. In this manner, we
can examine the inherent repeatability of the response characteristic while

60
The idea of independence is essential to the existence of modern statistics and the practice of six sigma.
To illustrate, consider a cupcake pan. If we prepared n = 8 cupcakes from the same “mix” and then put
them in a standard 8-hole pan for purposes of baking, we could subsequently measure the “rise height”
of each cupcake once removed from the oven. In this scenario, all n = 8 cupcakes would have likely
experienced very little difference in the baking conditions during preparation. In other words, each hole
would have simultaneously experienced the same causal conditions during preparation and baking. As a
consequence, we could not consider the “within pan” measurements to be independent of each other. It
is interesting to notice that by preparing all n = 8 cupcakes at the same time, we would likely have
“blocked” the influence of many variables (of a random and nonrandom nature).

concurrently evaluating any changes in the central condition (mean) that may
result from changes in the competing settings among the test variables.
Only in this manner can the signal effects of a given test variable be
separated from the other signal effects that may be present during the
experiment. At the same time, the signal effects of concern must be made
relative to the observed level of background noise (emanating from all of the
other variables not germane to the experiment or otherwise controlled). Only
by virtue of the experimental test plan can the nonrandom noise (variable
effects) be separated from and subsequently contrasted to the random noise
(experimental error). From this perspective, it is easy to see why the idea of
rational sub-grouping is so important.
As previously stated, rational sub-grouping is a sampling strategy that
has the principal aim of separating white noise from black noise. Regardless
of the basis for such sampling (sequential or random) the overall aim is the
same.61 When the data are to be sequentially gathered, a rational sub-
grouping strategy would necessarily seek to exclude any special effects (also
called assignable causes or nonrandom sources of error) that may be present
across the total range or interval of sampling. Essentially, this has the effect
of forcing any assignable causes (nonrandom effects) that might be present to
appear between subgroups (partitions) rather than within subgroups
(partitions). Only if this intent is sufficiently satisfied in a statistical and
pragmatic way can the experimental test plan or control chart “do what it was
designed to do.”
Process characterization studies are not exempt from the discussion at
hand – at least where rational sub-grouping is concerned. Naturally, our
terminal goal is to estimate how capable the process actually is. Most
generally, we seek an estimate of the “short-term” capability, as well as the

61
The reader is kindly asked to remember that throughout this portion of this book the term “process
center” is used without regard to the nominal (target) specification. It simply implies the central location
of a normal distribution relative to some continuous scale of measure. Furthermore, it must always be
remembered that the employment of a rational sampling strategy gracefully supports the study of
autocorrelations and time-series phenomena during the course of a process characterization and
optimization initiative. Of course, this is another discussion in and of itself – perhaps at some other time.

“long-term.” An estimate of the short-term capability reports on the inherent
repeatability (reproducibility) of the process without regard to the overall
process mean. In this context, the random errors are “trapped” within
subgroups. Thus, when the quadratic mean deviations of all the individual
subgroups are “pooled” and subsequently normalized to the subgroup level,
the resulting variance only reflects the influence of random error – assuming
that the basic intents of rational sub-grouping have been fully satisfied. By
taking the square root of this variance, we are left with an instantaneous
measure of error – the short-term standard deviation.
As the informed reader might have surmised, the influences of both
random (un-assignable causes) and nonrandom errors (assignable causes) are
reflected in the long-term standard deviation – also recognized as a
longitudinal measure of reproducibility. Where this particular type of
standard deviation is concerned, the individual quadratic mean deviations are
based on the grand mean rather than the individual subgroup means. Owing
to this, the long-term estimate of the standard deviation reflects all sources of
error, not just those of the random variety. By contrasting the long-term
standard deviation to its short-term counterpart, we are able to surmise how
well the short-term measure of repeatability can be sustained over time.
Hence, the simple ratio of these two types of estimates provides great insight
into how well a process is controlled over time (in terms of process centering).
In one way or another, virtually all of the commonly used indices of
capability are predicated on the short-term standard deviation. Expressed
differently, it can be said that most indices that claim to report on process
repeatability (capability) assume that the underlying measure of variation
(standard deviation) only reflects the influence of white noise (un-assignable
causes). Only when this assumption is fully and rationally satisfied can we
reliably proclaim a certain level of instantaneous repeatability
(reproducibility), or momentary capability as some would say. However, by
formulating the given index of capability to include the corresponding long-
term standard deviation, the resulting performance measure reports on the

“sustainable” reproducibility of the process, or longitudinal capability as it is
often called.
All too frequently, this architect of six sigma has observed that such
performance metrics (indices of capability) are often improperly formulated,
incorrectly applied or inappropriately reported in such a way that the terminal
result is counterproductive to the aims of management. This is to say that the
“analytical pieces” of a capability metric are often found to be inappropriate
or somehow deficient in their construction or compilation. The net effect is a
performance metric that does not have the contextual ability to report on what
it purports to measure or otherwise assess.
For example, this architect of six sigma has been told (on numerous
occasions) that the “Cp” of a particular process is of a certain magnitude, only
to later discover that the data used to compute the underpinning short-term
standard deviation were gathered over a relatively lengthy period of time
without regard to the fundamental principles of rational sub-grouping. On
such occasions, it is usually discovered that the standard deviation should
have been classified as “long-term” in nature.
Obviously, when this occurs, management is presented with a
misleading indicator of instantaneous reproducibility. Without elaboration it
should be fairly obvious how such an understatement of entitlement capability
(owing to improper or insufficient partitioning) could easily mislead someone
attempting to make use of that metric for purposes of decision-making.62
So as to fully avoid such problems, the “rationality” of each subgroup
should always be ensured before the onset of a DPQ or process
characterization study. Generally speaking, the subgroup sampling interval
should be large enough to trap the primary sources of white noise that may
exist, but not so large that nonrandom influences are captured or otherwise

62
Numerous times, this practitioner of six sigma has witnessed (after the fact) precious resources
squandered on new capital-intensive technology because the true capability of the existing technology
was not properly estimated, or was improperly computed. It is professionally shameful that, virtually
every day across the world, many key quality and financial decisions are founded upon highly biased
indices of capability. Arguably, the most common error in the use of Cp is the inclusion of a standard
deviation that, unknown to the analyst, was confounded with or otherwise contaminated by sources of
nonrandom error (black noise) – thereby providing a less favorable estimate of short-term capability.

“trapped” within the subgroup data. Unfortunately, there is no conventional
or standardized guidance on how this should be accomplished.63 Only when
these principles are theoretically and pragmatically understood, and then
linked to an intimate knowledge of the process, can the practitioner properly
prescribe, compute, report and subsequently interpret a given measure of
capability, such as ZST, ZLT, Cp, Cpk, Pp, Ppk, and the like.
As many of us are painfully aware, the relative economic and functional
vitality of commercial and industrial products, processes and systems is often
positively correlated to the extent of replication error that is present in or
otherwise inherent to a situation. In general, the relative capability of a
deliverable (or the ways and means of realizing a deliverable) is directly
proportional to one’s capacity and capability to repeat a set of conditions that
yields success.

63
So as to facilitate the execution of a process characterization study, this six sigma practitioner has often
employed a rational sampling strategy that is “open ended” with respect to subgroup size. In other
words, the sample size of any given subgroup is undefined at the onset of sampling. The performance
measurements are progressively and sequentially accumulated until there is a distinct “slope change” in
the plotted cumulative sums-of-squares. As a broad and pragmatic generality, the cumulative sums-of-
squares will aggregate as a relatively straight line on a time-series plot – but only if the progressive
variations are inherently random. It can generally be said that as the sampling progresses, a pragmatic
change in process centering will reveal itself in the form of a change in slope. Naturally, the source of
such a change in slope is attributable to the sudden introduction of nonrandom variation. Of course, the
point at which the slope change originated is also a declaration for terminating the interval of subgroup
sampling. Although it can be argued that this particular procedure is somewhat inefficient, it does help to
ensure that virtually all of the key sources of random error have had the opportunity to influence the
subgroup measurements. At the same time, the sampling disallows the aggregate mean square
(variance) from being biased or otherwise contaminated by the influence of assignable causes. After
defining the first subgroup, the second subgroup is formed in the same manner, but not necessarily with
the same sample size. This process continues until the “cumulative pooled variance” has reached a
rational plateau (in terms of its cumulative magnitude). Naturally, the square root of this quantity
constitutes the composite within-group error and is presented as the “short-term” standard deviation. As
such, it constitutes the instantaneous capability of the process and constitutes the pragmatic limit of
reproducibility. However, it is often necessary to continue the formation of such subgroups until all
principal sources of nonrandom variation have been accounted for. This objective is usually achieved
once the between-group mean square has reached its zenith (in terms of its cumulative magnitude). At
this point, the composite sums-of-squares can be formed and the total error variance estimated. Of
course, the square root of this quantity constitutes the overall error and is known as the “long-term”
standard deviation. As such, it constitutes the sustainable capability of the process. The relative
differential between these two indices constitutes the extent of “centering control” that has been exerted
during the interval of sampling. To facilitate the computational mechanics of such analyses, several
years ago this researcher provided the necessary statistical methods to MiniTab. In turn, they created
the “add-on” analytics (and reports) now known as the “Six Sigma Module.” This particular module has
been specifically designed to facilitate the execution of a six sigma process characterization study.

8.2 Aggregating the error

As the sources of replication error vary in number and strength, one’s


ability to deliver a product or service changes accordingly. In other words,
the error associated with the underlying determinants (input error) is
propagated (or otherwise transmitted) from the system of causation through
the transformational function, ultimately manifesting itself in the resultant
(outcome). Expressed differently, we may state
\[
\sigma_T^2 = f\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_N^2\right)
\]

Eq.( 8.2.1 )

where σΤ2 is the total performance error exhibited by a system, product,


process, service, event or activity. Of course, σ2N denotes the Nth source of error
inherent to the characteristic of concern. If we assume that the sources of
error are independent such that
\[
2\rho_{ij}\,\sigma_i\,\sigma_j = 0
\]

Eq.( 8.2.2 )

the total error may be fully described by

\[
\sigma_T^2 = \sigma_1^2 + \sigma_2^2 + \cdots + \sigma_N^2
\]

Eq. ( 8.2.3 )
Obviously, as the leverage or vital few sources of replication error are
discovered and subsequently reduced in strength and number, our capability
and capacity to replicate a set of “success conditions” will improve
accordingly.
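As a small worked illustration of Eq. (8.2.3), consider three hypothetical and independent error sources; the numbers below are invented purely for illustration.

```python
# Independent error sources add in variance, not in standard deviation.
sigmas = [3.0, 4.0, 1.0]                   # hypothetical component standard deviations
var_total = sum(s ** 2 for s in sigmas)    # Eq. (8.2.3): 9 + 16 + 1 = 26
sigma_total = var_total ** 0.5             # about 5.10, well short of 3 + 4 + 1 = 8
print(var_total, round(sigma_total, 3))
```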
When reporting the performance of an industrial or commercial product,
process or service, it is customary and recommended practice to prepare three
separate but related measures of capability. The first performance measure
reflects the minimum capability or "longitudinal reproducibility" of the
characteristic under investigation. In this case, the performance measure is
given by σ2T. As previously indicated, the total error accounts for all sources

of variation (error) that occur over time, regardless of their type or nature.
The second type of performance measure reports the maximum capability or
"instantaneous reproducibility" of the process and is designated as σ2W. It
should be understood that σ2W reflects only random sources of error and is
exclusive to those variations that occur within sampling subgroups.64 As may
be apparent, the third performance measure, σ2B, must only reflect those sources
of variation (error) that occur between sampling subgroups.
By virtue of the latter definitions, we may conceptually consolidate the
right-hand side of Eq. (8.2.3) in a rational manner so as to reveal

\[
\sigma_T^2 = \sigma_B^2 + \sigma_W^2
\]

Eq. ( 8.2.4 )

In a great many process scenarios, it would be considered quite desirable


to obtain an estimate of the long-term error component in the form σ2T and the
short-term error in the form σ2W. As previously indicated, such estimates of
repeatability are most conveniently assisted via a rational sampling strategy.65
With such estimates at hand, we can rearrange Eq. (8.2.4) so as to solve for
the between-group error component σ2B. Such a rearrangement would
disclose that

64
The instantaneous reproducibility of a manufacturing process reflects the state of affairs when the
underlying system of causation is entirely free of nonrandom sources of error (i.e., free of variations
attributable to special or "assignable" causes). Given this condition, the related process would be
operating at the upper limit of its capability. Hence, this particular category of error cannot be further
reduced (in a practical or economical sense) without a change in technology, materials or design.
65
For purposes of process characterization, such a sampling strategy has been thoroughly described by
Harry and Lawson (1990). The theoretical principals and practical considerations that underpin the issue
of "rational sub-grouping" have also been addressed by Juran (1979), as well as Grant and Leavenworth
(1980). In the context of this book, it should be recognized that the intent of a rational sampling strategy
is to allow only random effects within groups. This is often accomplished by “blocking” on the variable
called "time." When this is done, the nonrandom or assignable causes will tend to occur between
groups. Hence, one-way ANOVA can be readily employed to decompose the variations into their
respective components. With this accomplished, the practitioner is free to make an unbiased estimate of
the instantaneous reproducibility of the process under investigation. With the same data set, an estimate
of the sustained reproducibility may also be made, independent of existing background noise.

\[
\sigma_B^2 = \sigma_T^2 - \sigma_W^2
\]

Eq. ( 8.2.5 )

In this manner the signal effects (if any) can be ascertained by


subtraction since the total variations are reflected by σ2T and the random
influences are forced into the σ2W term. At great risk of redundancy, the author
again points out that, for a wide range of processes, the effect of nonrandom
and random perturbations can be estimated and subsequently separated for
independent consideration. As previously stated, the concept of “rational sub-
grouping” must be employed to realize this aim.
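The sketch below (Python with NumPy; the simulated drift between subgroups is an assumption used only to give the partitions something to separate) shows how the three error components of Eqs. (8.2.4) and (8.2.5) might be estimated from rationally subgrouped data.

```python
# Estimate sigma^2_W (pooled within-group), sigma^2_T (total) and, by
# subtraction, sigma^2_B from a g x n array of rational subgroups.
import numpy as np

rng = np.random.default_rng(7)
g, n = 50, 5
white_noise = rng.normal(0.0, 1.0, size=(g, n))
drifting_center = rng.normal(0.0, 1.5, size=(g, 1))   # assumed between-group (black) noise
data = 100.0 + white_noise + drifting_center

var_w = data.var(axis=1, ddof=1).mean()    # instantaneous (short-term) component
var_t = data.var(ddof=1)                   # longitudinal (long-term) component
var_b = max(var_t - var_w, 0.0)            # Eq. (8.2.5), by subtraction

print(round(var_w, 2), round(var_t, 2), round(var_b, 2))
```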

8.3 Rationalizing the sample


As many practitioners of process improvement already know, it is often
the case that the influence of certain background variables must be
significantly reduced or eliminated so as to render “statistically valid”
estimates of process capability. Of course, this goal is often achieved or
greatly facilitated by the deliberate and conscientious design of a sampling
plan.
Through such a plan, the influence of certain causative variables can be
effectively and efficiently “blocked” or otherwise neutralized. For example, it
is possible to block the effect of an independent variable by controlling its
operative condition to a specific level. When this principle is linked to certain
analytical tools, it is fully possible to ensure that one or more causative
variables do not “contaminate” or otherwise unduly bias the extent of natural
error inherent to the response characteristic under consideration.
As a given sampling strategy is able to block the influence of more and
more independent variables, the total observed replication error σ2T is
progressively decreased in magnitude. In other words, as each of the
independent variables within a system of causation are progressively blocked,
it is theoretically possible to reach a point where it is not possible to observe

any type, form or magnitude of replication error. At this point, only one
measurement value could be repeatedly realized during the course of
sampling. For such a theoretical case, we would observe σ2T = σ2B = σ2W = 0.
In short, the system of classification would be so stringent (as prescribed by
the sampling plan) that no more than one measurement value would be
possible at any given moment in time.
Should such a highly theoretical sampling plan be activated, the same
response measurement would be observed upon each cycle of the process –
over and over again it would be the same measurement, and the replication
error would be zero. However, for any given sampling circumstance, there
does exist a rational combination of blocking variables and corresponding
control settings that will allow only random errors to be made observable.
However, the pragmatic pursuit of such an idealized combination would be
considered highly infeasible or impractical in most circumstances, to say the
least. For this reason, we simply elect to block on the variable called “time.”
In this manner, we are able to indirectly and artificially “scale” the system of
blocking to such an extent that only random errors are made known or
measurable.
If the window of sampling time is made too small, the terminal estimate
of pure error σ2W is underestimated, owing to the forced exclusion of too
many sources of random error. On the other hand, if the window of time is too
large, the terminal estimate of σ2W is overestimated, owing to the natural
inclusion of nonrandom effects. However, by the age-old method of trial-and-
error, it is possible to define a window size (sampling time frame) that
captures the “true” magnitude of σ2W but yet necessarily precludes nonrandom
sources of error from joining the mix. In this instance, the nonrandom errors
would be assigned to the σ2B term.
In short, it is pragmatically feasible to discover a sampling interval (in
terms of time) that will separate or otherwise partition the primary signal
effect of a response characteristic from the background noise (random
variation) that surrounds that effect. Only when this has been rationally

accomplished can the instantaneous reproducibility of a performance
characteristic be assessed in a valid manner. Such a sampling plan is also
called a rational sampling strategy, as execution of the plan rationally
(sensibly and judiciously) partitions the array of signal effects from the mix of
indiscernible background noises. In this sense, a rational sampling strategy
can effectively unmask or otherwise make known the array of signal effects,
thereby allowing σ2B and σ2W to be statistically contrasted in a meaningful and
rational manner.

9.0 Analyzing the Partitions

9.1 Defining the components


The analytical separation of nonrandom and random process variations
may be readily accomplished via the application of one-way analysis-of-
variance, herein referred to as "one-way ANOVA." In the context of a rational
sampling strategy, one-way ANOVA provides a means to decompose the total
observed variation (SST) into two unique parts, namely SSB and SSW.
The first component of the total variation (SST) is the between-group
sums-of-squares (SSB). This particular component (partition) describes the
cumulative effect of the between-group variations attributable to nonrandom
sources, often called "assignable causes." The second component (partition) is
the within-group sums-of-squares (SSW). This partition contains the
variations that cannot be assigned. As previously indicated, any error
(variation) that cannot be statistically described or assigned is generally
referred to as "white noise," or random error. As such, the within-group
component (partition) reports on the extent of instantaneous reproducibility
naturally inherent to the underlying system of causation (process). In this
context, the experienced practitioner will attest that the total variation can be
expressed as
\[
SS_T = \sum_{j=1}^{g} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}\right)^2
\]

Eq. ( 9.1.1 )

where Xij is the ith variate associated with the jth subgroup, X is the grand
average and nj is the total number of variates within the jth subgroup. In order
to utilize the information given by the g subgroups, we must decompose Eq.
(9.1.1) into its component parts. The first component of variation to be
considered is the within-group sum of squares. This particular partition is
given as

\[
SS_W = \sum_{j=1}^{g} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_j\right)^2
\]

Eq. ( 9.1.2 )
where Xj is the average of the jth subgroup. The second component to be
examined is the between-groups sum of squares. Considering the general
case, this partition can be described by

\[
SS_B = \sum_{j=1}^{g} n_j\left(\bar{X}_j - \bar{X}\right)^2
\]

Eq. ( 9.1.3 )

or, for the special case where n1 = n2 = ... = ng, the between-group sum of
squares can be presented in the form

\[
SS_B = n\sum_{j=1}^{g} \left(\bar{X}_j - \bar{X}\right)^2
\]

Eq. ( 9.1.4 )
Thus, we may now say that

\[
SS_T = SS_B + SS_W
\]

Eq.( 9.1.5 )

In expanded form, the special case is expressed as

\[
\sum_{j=1}^{g}\sum_{i=1}^{n}\left(X_{ij} - \bar{X}\right)^2 = n\sum_{j=1}^{g}\left(\bar{X}_j - \bar{X}\right)^2 + \sum_{j=1}^{g}\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^2
\]

Eq. ( 9.1.6 )

Recognizing the quantities n and g, it is naturally understood that the total


degrees of freedom (dfT) must also be rationally evaluated. In view of Eq. (
9.1.5 ), we note that

\[
df_T = df_B + df_W
\]

Eq. ( 9.1.7 )
which expands to reveal

\[
ng - 1 = (g - 1) + g(n - 1)
\]

Eq. ( 9.1.8 )
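For the balanced case, the decomposition of Eqs. (9.1.1) through (9.1.8) can be verified numerically with a few lines of Python; the data below are simulated, so only the identities, not the particular numbers, matter.

```python
# One-way decomposition of the total sums-of-squares for g balanced subgroups of size n.
import numpy as np

rng = np.random.default_rng(11)
g, n = 25, 5
data = rng.normal(100.0, 2.0, size=(g, n))     # assumed g x n rational subgroups

grand_mean = data.mean()
sub_means = data.mean(axis=1)

ss_t = ((data - grand_mean) ** 2).sum()                # Eq. (9.1.1)
ss_w = ((data - sub_means[:, None]) ** 2).sum()        # Eq. (9.1.2)
ss_b = n * ((sub_means - grand_mean) ** 2).sum()       # Eq. (9.1.4)

assert np.isclose(ss_t, ss_b + ss_w)                   # Eq. (9.1.5)
assert n * g - 1 == (g - 1) + g * (n - 1)              # Eq. (9.1.8)
print(round(ss_t, 2), round(ss_b, 2), round(ss_w, 2))
```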

9.2 Analyzing the variances


Relating the sums-of-squares of each partition to the corresponding
degrees of freedom provides the mean square ratio, or variance as some would
say. Of course, the variance reports on the “typical” quadratic mean deviation
of a given error component (partition). Making such a formulation for the
mean square ratios commonly associated with a process characterization study
reveals that

\[
MS_T = \frac{SS_T}{ng - 1}
\]

Eq. ( 9.2.1 )
and

\[
MS_B = \frac{SS_B}{g - 1}
\]

Eq. ( 9.2.2 )
as well as

\[
MS_W = \frac{SS_W}{g(n - 1)}
\]

Eq. ( 9.2.3 )

To further our understanding, let us turn our attention to the F statistic


and one-way ANOVA. As most six sigma practitioners are aware, F = MSB /
MSW. Both the F statistic and ANOVA seek to highlight the presence of a
signal effect (as represented by the MSB term) and then contrast the absolute
magnitude of that effect to the absolute magnitude of random background
noise (recognized as the MSW term). In other words, the F statistic is
designed to “pull” a signal out of the total variations – but only if that signal is
large enough to be statistically discernable from the variations that constitute
the bandwidth of random error.
Expressed in another way, we may say that if a signal effect can be
statistically differentiated from the pure error (white noise), we would have
sufficient scientific evidence to conclude that the signal is “real,” with some
degree of statistical confidence (1 - α). On the other hand, if the signal was
not sufficiently large, we would then conclude that it was just an artifact of
naturally occurring random variations. In such cases, it would be statistically
inappropriate to further interrogate the between group sums-of-squares.
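For readers who prefer to see the contrast in executable form, the sketch below leans on scipy.stats.f_oneway to pull the between-group signal out of simulated subgroups; the data and the amount of injected centering drift are assumptions of the illustration.

```python
# Contrasting the between-group signal with the within-group noise via one-way ANOVA.
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(17)
g, n = 25, 5
centers = 100.0 + rng.normal(0.0, 1.0, size=g)     # assumed drift in the subgroup centers
groups = [rng.normal(c, 2.0, size=n) for c in centers]

F, p = f_oneway(*groups)
print(round(F, 2), round(p, 4))    # a large F and small p say the signal is "real"
```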
By all means, we must recognize that the square root of Eq. ( 9.2.3 )
constitutes a short-term standard deviation (estimate of instantaneous process
reproducibility). As such, it represents the typical mean deviation that can be
expected at any moment in time, without any nonrandom bias. Expressed
differently, the square root of Eq. ( 9.2.3 ) is a relative measure of dispersion
that reflects the “momentary capability” of a performance characteristic, fully

independent of any nonrandom influences that would otherwise be considered
“assignable.”66
From this vantage point, the within-group errors can be fully evaluated
as if the process center never varied. Only then can the “true” instantaneous
reproducibility of the process be rightfully declared, given that all the normal
statistical assumptions have been fully satisfied or adequately rationalized. In
this manner, we are able to discover the true extent of entitlement capability
(in terms of instantaneous reproducibility). Of course, such an estimate of
short-term capability would constitute the “best possible performance
scenario” that could be expected, given the prevailing system of causation.67
Within this context, we often say the short-term standard deviation only
reflects the influence of pure error (effects that have no deterministic origin).
At the risk of unnecessary iteration, we employ a rational sampling strategy to
ensure the within-group errors are fully independent and unpredictable
(random). In this regard, the short-term standard deviation is not analytically
biased by any momentary centering condition, or unduly influenced by any
assignable cause that might have been present at some point during or across
the total interval of sampling.
By retaining the between-group partition, the reproducibility of process
centering is not considered when developing the short-term standard
deviation. In other words, retention of this partition ensures that the errors
occurring between groups (random or otherwise) do not become confounded
or manifested within the short-term standard deviation. 68

66
The reader must bear in mind that a temporal source of error requires the passage of time before it can
develop or otherwise realize its full contributory influence – such as the effects of machine set-up, tool
wear, environmental changes and so on.
67
From a process engineering perspective, the system of causation relates directly to the technologies
employed to create the performance characteristic. Thus, the measure of instantaneous reproducibility
(short-term standard deviation) reports on how well the implemented technologies could potentially
function in a world that is characterized by only random variations (error), and where the process is
always postulated to be centered on its nominal specification (target value).
68
For example, consider a high-speed punch press. Natural wear in the die is not generally discernable
within a small subgroup (say n=5), but can be made to appear between subgroups in the context of a
rational sampling strategy. With this understanding as a backdrop, we can say that the influence of die
wear will be manifested over time in the form of a “drifting process center.” To some extent, wear in the
die does exert an influence during the short periods of subgroup sampling, but that effect is so miniscule
it would not be practical or economical to analytically separate its unique contributory effect for
independent consideration. With respect to the within-group partition, the influence of die wear should
remain confounded with the many other sources of background noise. Consequently, its momentary
influence should be reflected in the MSW term, but its temporal influence should be reflected in the MSB
term. Thus, the long-term standard deviation would likely be considerably larger than the short-term
standard deviation. Given this line of reasoning, it is easy to understand why nonrandom sources of
temporal error generally have a destabilizing effect on the subgroup averages rather than their respective
standard deviations. Based on this line of reasoning, we must recognize that a rational sampling
strategy can greatly facilitate “bouncing” nonrandom variations into the between-group partition while
restraining the inherent random variations within groups. Only in this manner can the short-term
standard deviation (instantaneous capability) of a given performance characteristic be contrasted or
otherwise made relative to the long-term standard deviation (temporal capability).

Thus, the root mean square of the between group partition (once
corrected for the influence of n) can be thought of as the “typical” difference
that could be expected between any two subgroup means (over a protracted
period of time). As we shall come to understand later on in the discussion,
such a difference can be reduced to an equivalent mean shift. In this context,
the equivalent mean shift is a subtle but direct measure of process control, but
only as a measure of “centering reproducibility.”
With this in mind, we will now turn our attention to the “longitudinal
state of affairs” by further considering the total sums-of-squares and its related
degrees of freedom. To this end, we must recognize that the square root of
MST represents the long-term standard deviation. In this form, the long-term
standard deviation is a direct measure of sustainable capability, or longitudinal
reproducibility as some would say. By nature, it can be thought of as the
“typical” mean deviation that can be expected over a protracted period of
time. Because the long-term standard deviation is composed of both random
error (white noise) and nonrandom error (black noise), it is quite sensitive to
time-centric sources of error, especially when the subgroup sampling is
protracted over a long period.69

69
A temporal source of error generally exerts its unique and often interactive influence over a relatively
protracted period of time. Although such errors can be fully independent or interactive by nature, their
aggregate (net) effect generally tends to “inflate” or otherwise “expand” the relative magnitude of the
short-term standard deviation. Naturally, a statistically significant differential between the temporal and
instantaneous estimates of reproducibility error necessarily constitutes the relative extent to which the
process center is “controlled” over time. This understanding cannot be overstated, as it is at the core of
six sigma practice, from a process as well as design point of view.

9.3 Rearranging the model
As may be apparent, the aforementioned statistics are excellent tools
with which to assess short-term and long-term process capability
(reproducibility). Given this perspective, it is not a stretch to compare the
estimate of long-term reproducibility error (as reflected by the RMST term) to
the estimate of short-term reproducibility error (as revealed by the MSW term).
As stated earlier in this book, such a contrast can be expressed in the form of a
ratio, otherwise known as “c.” Recall that c is a multiplication factor used to
inflate the short-term standard deviation so as to account for the influence of
temporal sources of error. In this context, c is somewhat analogous to a
signal-to-noise ratio.
But if we consider the ratio MSB / MSW, we discover the F statistic. In
this context, the F statistic fully constitutes a “true” signal-to-noise ratio.
Furthermore, this classic (if not cornerstone) statistic answers a question
where process characterization work is concerned: “How big is the signal
effect relative to the inherent background noise?” With this question
answered, we may now ask: “How big can the typical between-group
deviation become before it can no longer be dismissed as statistically insignificant?” As we
shall come to see, some simple algebraic manipulation of the one-way
ANOVA model will reveal the answer.
By simple rearrangement of the one-way ANOVA model, we quickly
take note that SSB = SST – SSW. By conducting a longitudinal capability
study in the context of a rational sampling plan, we can fully estimate the total
variation, as well as the inherent extent of random variation, and then uncover
the relative impact of nonrandom variation by subtraction. In this manner, we
are able to gain insight into the long-term performance of the process, as well
as the short-term performance.
It is well recognized that one-way ANOVA has the ability to fully
decompose the total variations into two fundamental sources: the random
error component (which reflects the relative extent of white noise) and the
between-group component (which reflects the relative extent of black noise).

With a good rational sampling plan and one-way ANOVA, we have the
analytical power to say that “if it wiggles, we can trap it, slice it and dice it
any way we want.” In other words, we can look at the reproducibility of a
performance characteristic from any angle. Without this power, it is doubtful
that a process or design characterization and optimization study could be
rationally planned and executed in a meaningful way.
To underscore the latter point, let us further our understanding of the
power behind one-way ANOVA by way of some simple algebra. By
subtracting the observed within-group variations from the total variations
comprising the capability study, we are left with the between-group variations.
In closed form, we can express this operation by the relation SSB = SST - SSW.
In this case, the SSB term is an absolute measure that, when normalized by the
related degrees of freedom, reports on the extent to which we can
successfully replicate the overall centering condition of the process. In other
words, the SSB term is an aggregation of process centering errors that, when
normalized for the given degrees of freedom, reports on our efforts to
maintain a stable location parameter over time, irrespective of the target
specification (if given by the design).

9.3 Examining the entitlement


So as to expand our discussion, recall that SST/(ng-1) is an estimate of the
long-term variance and SSW/g(n-1) is an estimate of the short-term variance –
for reasons previously stated. Of course, the square roots of these respective
variances reveal the long-term and short-term performance of
the process in the form of a root mean square (standard deviation). The reader
will also recall that these two types of root mean squares are merged in the
form of a ratio called “c” and estimated as c = σLT / σST. From this
perspective, c is a hybrid performance metric that provides us with an insight
into how much “dynamic centering error” is occurring in the process over the
total period of sampling.

When the ratio is such that c = σLT / σST = 1.0, then SSB = 0, but only
after adjusting for differences in degrees-of-freedom (between the numerator
and denominator terms). On the other hand, we note that when σLT > σST, then
SSB > 0. Thus, we assert that c is a general measure of capability related to
process centering.70 Given the circumstance that SSB = 0, we naturally
recognize that all of the subgroup means would necessarily be identical in
value. Under this condition, one could proclaim a “state of perfection” in the
“centering repeatability” of the given process.
On the flip side, we recognize that as SSB approaches its theoretical
limit, our confidence of repeating a given centering condition approaches
zero. Naturally, there would be some point far short of infinity at which one
would conclude, based on pragmatic argument, that the potential of replicating
any given centering condition would become nonexistent. Of course, there
would also exist a “point of pragmatism” on the bottom end as well – a point
where the subgroup-to-subgroup variations are not of any statistical or
practical significance, but merely “noise.” In this circumstance, we would
conclude that such variations were segmented from the SSW term and then
inappropriately assigned to the SSB term. Under this condition, it is quite
possible that the subgroup sampling interval was too small.
Given the previous discussions it should now be fairly easy to see that
the SSW term is a foundational measure from which to launch a better
understanding of what is meant by the phrase “instantaneous reproducibility.”
In this context, the terminal measure of instantaneous process capability (ZST
or Cp), would be unbiased and uncontaminated in terms of perturbing
influences of a non-random nature (free of black noise influences). In other
words, the SSW term (once normalized to the form of a standard deviation)
reflects the relative “upper limit of capability” inherent to a given process.
More to the point, it represents the “best-case capability scenario” given the
70
As a sidebar of interest, it should be recognized that there is a special mathematical relationship between
“c” and the “F” statistic. More specifically, we may mathematically define the dynamic expansion factor c
and its interrelationship with the F statistic by way of the equation c = sqrt(1+(((F-1)*(g-1))/(ng-1))). As
you investigate this relationship through algebraic manipulation and Monte Carlo simulation, you will
likely gain many new insights into the field of statistical process control (SPC).

current operational settings and technological circumstances related to that
process.
In this sense, Cp can be thought of as a measure of “entitlement
capability,” only if the underlying standard deviation is a true and full
measure of white noise. Of course, such noise would stem from the effects
due to uncontrolled background variables. If the sampling strategy allowed
sources of nonrandom or systematic errors of some form (black noise) to
influence the data, the resulting calculation of SSW would necessarily be
inflated, thereby forcing the estimate of Cp to be worse than should be
rightfully known. The decision-making consequences related to such an
understatement of inherent capability need not be stated or expounded upon,
as they are intuitively evident.
Since the idea of “entitlement” is defined as a “rightful level of
expectation,” the term “entitlement capability” would seem to make pragmatic
and theoretical sense, given that the variations associated with the SSW term
would only reflect random influences, assuming sufficient and appropriate
blocking. Regardless of terminology, the SSW term reports on the short-term
repeatability of the process, but without any regard to performance
specifications or degrees of freedom. In this sense, the short-term standard
deviation is, in its own right, an absolute measure of repeatability and,
therefore, an absolute measure of instantaneous capability.
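The relationship quoted in footnote 70 between c and the F statistic can be checked directly; the sketch below (simulated data, with an assumed between-group drift) computes c both ways and shows that the two routes agree.

```python
# c = sigma_LT / sigma_ST computed directly, and again from the F statistic
# via c = sqrt(1 + (F - 1)(g - 1)/(ng - 1)) as noted in footnote 70.
import numpy as np

rng = np.random.default_rng(23)
g, n = 50, 5
data = rng.normal(0.0, 1.0, size=(g, n)) + rng.normal(0.0, 0.8, size=(g, 1))

grand = data.mean()
sub = data.mean(axis=1)
ss_w = ((data - sub[:, None]) ** 2).sum()
ss_b = n * ((sub - grand) ** 2).sum()

ms_w = ss_w / (g * (n - 1))                # short-term (within-group) variance
ms_b = ss_b / (g - 1)
F = ms_b / ms_w

sigma_lt = np.sqrt((ss_b + ss_w) / (n * g - 1))
sigma_st = np.sqrt(ms_w)
c_direct = sigma_lt / sigma_st
c_from_f = np.sqrt(1 + (F - 1) * (g - 1) / (n * g - 1))

print(round(c_direct, 4), round(c_from_f, 4))   # the two estimates coincide
```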

10.0 Computing the Correction

10.1 Computing the shift


At this point in our discussion we must turn our attention to the
research of Bender (1975), Gilson (1951), and Evans (1975). In essence,
their work focused on the problems associated with establishing engineering
tolerances in light of certain manufacturing variations that assume the form
of mean shifts and drifts. In synopsis, Evans pointed out that

... shifts and drifts in the mean of a component occur for a number of
reasons ... for example, tool wear is one source of a gradual drift...
which can cause shifts in the distribution. Except in special cases, it is
almost impossible to predict quantitatively the changes in the
distribution of a component value that will occur, but the knowledge
that they will occur enables us to cope with the difficulty. A solution
proposed by Bender ... allows for [nonrandom] shifts and drifts.
Bender suggests that one should use

V = 1.5 √VAR(X)

as the standard deviation of the response ... [so as] to relate the
component tolerances and the response tolerance.

Of particular interest is the idea that we cannot forecast any or all of


the random or nonrandom errors at any given moment in time, but mere
knowledge that they will occur over time provides us with an engineering
edge, so to speak. In view of this research, we may redefine Bender’s
correction in the form

\[
\sigma_T^2 = \left(c\,\sigma_W\right)^2
\]

Eq. ( 10.1.1 )
or

\[
c = \frac{\sigma_T}{\sigma_W}
\]

Eq. ( 10.1.2 )
where c is the relative magnitude of inflation imposed or otherwise overlaid
on the measure of instantaneous reproducibility.
In this context, c is a corrective measure used to adjust the
instantaneous reproducibility of a performance characteristic. This
corrective device is intended to generally account for the influence of
random temporal error, and an array of transient effects that periodically

emerge over extended periods of process operation. Again, such a
compensatory measure is used to inflate or otherwise expand the
instantaneous estimate of reproducibility (short-term standard deviation).
As discussed throughout this book, the correction c can be established in
different theoretical and empirical forms, then subsequently employed for a
variety of purposes.
In general, the correction is most often employed to better project the
first-time yield of a performance characteristic or product. To this end, the
correction facilitates a consideration of unknown (but yet expected) sources
of error that, ultimately, have the effect of upsetting the momentary
condition of a process center. Of course, such a correction provides us with
a more realistic basis for assigning and subsequently analyzing performance
specifications.
Calling upon the previously mentioned research, we would discover
that the general range of c, for "typical manufacturing processes," can be
confidently given in the general range of

1.4 ≤ c ≤ 1.8 .
Eq. ( 10.1.3 )

Recognize that this particular range of c is considered “normal” for the


general case – per the previously mentioned research. The author personally
conducted additional investigations into these phenomena over a three-year
period of time during the mid 1980’s. The results reaffirmed the
aforementioned conclusion – empirically and theoretically. Several key
elements of this supporting research have been set forth in this book.
By algebraic manipulation of the one-way ANOVA model, it is
possible to isolate and declare the between-group root mean square (RMSB).
In turn, the RMSB term can be transformed to reveal the “typical shift”
occurring between subgroups but expressed in the form of an equivalent Z
value. This is to say that the absolute mean deviation can be normalized by

the extent of extraneous error inherent to the system of causation (short-term
standard deviation). The final result of these manipulations provides an
estimate of the “typical” momentary mean offset, but expressed in the form
of a standard normal deviate and designated as a Zshift.
Now, if we were to consider the case c = 1.8, n = 5, and g = 25, the
resulting calculations would reveal that ZshiftσST = 1.498σST. Thus, we have
the “shift expectation” of 1.5σ, as often discussed in the six sigma literature.
This would be to say that the “normalized and standardized” mean deviation
for a typical subgroup, as related to a common process, is likely to be about
1.5σST, given no other knowledge about the process or prevailing
circumstances. In other words, when the amount of long-term dynamic
expansion is such that c = 1.8, we would mathematically determine that the
typical-but-equivalent subgroup mean shift expectation would be about
1.5σST. To better understand the derivation of this quantity, it is necessary to
expand the correction c and then solve for the “typical” momentary mean
offset.
To accomplish the latter aim, we will first consider the total and
within-group error components obtained by way of a rational sampling
strategy. Given this, we may rewrite Eq. (10.1.2) as

\[
c = \sqrt{\frac{SS_T / (ng - 1)}{SS_W / \left(g(n - 1)\right)}}
\]

Eq. ( 10.1.4 )

By virtue of the additive properties associated with Eq. (9.1.6), it can be


shown that

\[
c^2 = \frac{SS_T / (ng - 1)}{SS_W / \left(g(n - 1)\right)} = \frac{SS_B + SS_W}{SS_W} \cdot \frac{g(n - 1)}{ng - 1}
\]

Eq. ( 10.1.5 )

Further rearrangement reveals

\[
\frac{SS_B}{SS_W} + 1 = c^2\,\frac{ng - 1}{g(n - 1)}
\]

Eq. ( 10.1.6 )

We may now solve for SSB and present the result as

\[
SS_B = SS_W\,\frac{c^2(ng - 1) - g(n - 1)}{g(n - 1)}
\]

Eq. ( 10.1.7 )

By expanding the left side of Eq.(10.1.7), we observe that

\[
n\sum_{j=1}^{g}\left(\bar{X}_j - \bar{X}\right)^2 = \hat{\sigma}_W^2\left[c^2(ng - 1) - g(n - 1)\right]
\]

Eq. ( 10.1.8 )

and dividing by n reveals

\[
\sum_{j=1}^{g}\left(\bar{X}_j - \bar{X}\right)^2 = \hat{\sigma}_W^2\,\frac{c^2(ng - 1) - g(n - 1)}{n}
\]

Eq. ( 10.1.9 )

To define the average quadratic deviation, we divide both sides by g, therein
providing the relation

\[
\frac{\displaystyle\sum_{j=1}^{g}\left(\bar{X}_j - \bar{X}\right)^2}{g} = \hat{\sigma}_W^2\,\frac{c^2(ng - 1) - g(n - 1)}{ng}
\]

Eq. ( 10.1.10 )

Taking the square root of both sides, we are left with the "typical" absolute
mean deviation (shift). Of course, this is provided in the form

\[
\delta = \sqrt{\frac{\displaystyle\sum_{j=1}^{g}\left(\bar{X}_j - \bar{X}\right)^2}{g}} = \hat{\sigma}_W\sqrt{\frac{c^2(ng - 1) - g(n - 1)}{ng}}
\]

Eq. ( 10.1.11 )

By standardizing to the normalized case NID(0,1), we observe that

\[
Z_{Shift.Typ} = \sqrt{\frac{c^2(ng - 1) - g(n - 1)}{ng}}
\]

Eq. ( 10.1.12 )

However, for the case c = 1, Eq. ( 10.1.12 ) reduces to

\[
Z_{Shift.Typ} = \sqrt{\frac{(ng - 1) - g(n - 1)}{ng}} = \sqrt{\frac{g - 1}{ng}}
\]

Eq. ( 10.1.13 )
For purposes of application, it would be highly desirable to set ZShift.Typ
= 0 for the case c = 1 so as to maintain computational appeal. To accomplish

this, we correct Eq. (10.1.12) by subtraction of Eq. (10.1.13). 71 The result
of this algebraic operation is given as

\[
Z_{Shift} = \sqrt{\frac{(c^2 - 1)(ng - 1)}{ng}}
\]

Eq. ( 10.1.14 )

For the reader's convenience, a case specific comparison of Eq.(10.1.12) and


Eq. (10.1.14) is presented in figure 10.1.1.


71
The mathematically inclined reader will quickly recognize that such a proposal is somewhat spurious
from a theoretical point of view. However, the author asserts that the practical benefits tied to the
proposal far outweigh the theoretical constraints. For example, when c=1, the uninformed practitioner
would intuitively reason that ZShift = 0 since the variances are equal. However, from Eq. (10.1.13) it is
apparent that ZShift would necessarily prove to be greater than zero, owing to a differential in the degrees
of freedom. Needless to say, this would present the uninformed practitioner with a point of major
contention or confusion. Although less precise, Eq. (10.1.14) provides a more intuitive result over the
theoretical range of c. It should also be noted that this operation has a negligible effect on ZShift for
typical combinations of n and g. As a consequence of these characteristics, the author believes the
application of Eq. (10.1.13) as a corrective device or compensatory measure is justified, particularly in
the spirit of many conventional design engineering practices and forms of producibility analysis.

Figure 10.1.1
The Absolute Difference Between Eq. ( 10.1.12 ) and
Eq. ( 10.1.14 ) for the case n=5, g=50.

Of particular interest, figures 10.1.2 and 10.1.3 display the relation


between ZShift and c for a selected range of ng. The most prominent
conclusion resulting from these graphs is that the relationship is reasonably
robust to sample size and sub-grouping constraints, as well as their product.
It should also be noted that Z²Shift asymptotically approaches the quantity
c² − 1 as ng approaches infinity.
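The two forms are easy to tabulate; the sketch below evaluates Eq. (10.1.12) and Eq. (10.1.14) for c = 1.8 at a few values of ng (chosen here only for illustration) and prints the limiting value sqrt(c² − 1) toward which Eq. (10.1.14) tends.

```python
# Comparing Eq. (10.1.12) with its zero-referenced counterpart Eq. (10.1.14).
import math

def z_shift_typ(c, n, g):                  # Eq. (10.1.12)
    return math.sqrt((c ** 2 * (n * g - 1) - g * (n - 1)) / (n * g))

def z_shift(c, n, g):                      # Eq. (10.1.14)
    return math.sqrt((c ** 2 - 1) * (n * g - 1) / (n * g))

c, n = 1.8, 5
for g in (10, 50, 500):
    print(n * g, round(z_shift_typ(c, n, g), 4), round(z_shift(c, n, g), 4))

print(round(math.sqrt(c ** 2 - 1), 4))     # limit of Eq. (10.1.14) as ng grows, about 1.4967
```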

Figure 10.1.2
The Effect of ng on ZShift for c=1 to c=3.
(ZShift plotted against c for ng = 10, 100 and 1,000)


Figure 10.1.3
The Effect of ng on ZShift for c=1.5 to c=2.0.

In the spirit of establishing a relative but standard mean correction


(shift factor), let us consider the general range of conventional sampling
practice. Arguably, such a range is given by the combinations formed under
the constraint that

4 ≤ n ≤ 6

and

25 ≤ g ≤ 100.

Perhaps the most commonly employed combination is that of n = 5


and g = 50. Under this combination, the total sample size is given as ng =
250. Other practitioners writing on the topic of process capability studies

have often recommended such a sample size as a general guideline,
particularly when a statistical process control (SPC) chart for variables data
is employed to facilitate the study.

10.2 Resolving the shift


When presented with the case c = 1.8 in the context of a common
rational sampling strategy (n = 5 and g = 50), we compute ZShiftσST = 1.49σST.
Notice that the given value of c corresponds to the worst-case expectation as
per the research. As with many engineering conventions, the worst-case
condition is most often utilized as a critical threshold. Thus, we have
rationally established that δ = 1.5σST can be considered as a general correction
that accounts for “typical” momentary disturbances to process centering (but
only with regard to µ and not T).
Based on these findings and other discoveries, the range of
generalization for the “shift expectation” is given as 0.50 < ZShift < 2.00,
retaining a modal condition of ZShift = 1.50. Of course, this assertion is
constrained to “normal and typical” sampling strategies involving the practice
of rational sub-grouping. It is also generally constrained to those processes
exhibiting a short-term capability in the range of 3.0 < ZST < 5.0, with the
modal condition of ZST = 4.0. The aforementioned range of short-term
capability should be considered most reasonable owing to its consistency with
the stated research and conclusions resulting from extensive empirical
benchmarking studies.72
Essentially, the argument should not be whether or not a compensatory
shift of 1.5σST is valid for each and every CTQ. We know that every CTQ
will exhibit its own unique shift value; however, when considering a great
many such CTQs, it is a safe bet that the typical shift will be about 1.5σST.

72
The vast majority of benchmarking data gathered by this researcher and practitioner (since 1984) has
revealed the process capability associated with a great many products and services to exist in the range
of 3.5σ to 4.5σ, with the trailing edges dropping off at 3.0σ and 5.0σ, respectively. Obviously, this tends to imply that the typical CTQ of a product or service will exhibit a performance capability of about 4.0σ.
This is to be generally expected, given the conventional practice of establishing 25 percent design
margins. Of course, 4.0σ is the equivalent form of such a safety margin.

This author would, therefore, suggest that any such debate on this topic should
be focused on how to get people using such “research-based” rules of thumb.
After all, it is far more rational to evaluate the reproducibility of a design
under the assumption of an unfavorably vectored 1.5σST shift (in all of the
critical components of a system) than it is to simply set those parts at their
respective nominal condition and then perform the evaluation. Obviously, the
latter type of analysis is unrealistic and will only reveal the best-case
performance condition. At the other extreme, the probability of a “worst-case
stack” is virtually zero – even when evaluating designs of relatively low
complexity. Therefore, the six sigma practice of imposing a 1.5σST shift on
each critical performance opportunity represents a rational method for
analyzing the robustness of a design.

10.3 Calculating the minimum


In accord with the one-way ANOVA model, we readily recognize that
the between-group mean square can be contrasted to the within-group mean
square so as to form a type of signal-to-noise ratio. As theoretically known in
the field of mathematical statistics, such a ratio can be evaluated via the F
distribution in the form

$F = \dfrac{MS_B}{MS_W}$.
Eq. (10.3.1)
By standardizing to the case NID(0,1), it will be recognized that

$F = MS_B$
Eq. (10.3.2)

since $MS_W = 1.0$. In expanded form, Eq. (10.3.2) is given as

$F = \dfrac{SS_B}{g - 1}$
Eq. (10.3.3)

or

$SS_B = F(g - 1)$.
Eq. (10.3.4)

Expanding the between-group sums of squares yields

$n \sum_{j=1}^{g} \left( \bar{X}_j - \bar{\bar{X}} \right)^2 = F(g - 1)$
Eq. (10.3.5)

and dividing both sides by ng gives

$\dfrac{\sum_{j=1}^{g} \left( \bar{X}_j - \bar{\bar{X}} \right)^2}{g} = \dfrac{F(g - 1)}{ng}$.
Eq. (10.3.6)

After correcting for degrees of freedom, we have

$\dfrac{\sum_{j=1}^{g} \left( \bar{X}_j - \bar{\bar{X}} \right)^2}{g} - \dfrac{g - 1}{ng} = \dfrac{F(g - 1)}{ng} - \dfrac{g - 1}{ng}$.
Eq. (10.3.7)

By standardizing and some simple rearrangement, we are left with the quantity

$Z_{Shift} = \sqrt{\dfrac{(F - 1)(g - 1)}{ng}}$.
Eq. (10.3.8)

Drawing upon the merits of our previous discussion, we may now state the equality

$(c^2 - 1)(ng - 1) = (F - 1)(g - 1)$
Eq. (10.3.9)

from which we obtain

$c = \sqrt{1 + \dfrac{(F - 1)(g - 1)}{ng - 1}}$.
Eq. (10.3.10)

Thus, we may compute the minimum expected ZShift, or the corresponding value of c, at the critical threshold of H0 for any combination of α, n, and g. Of course, such estimation is predicated on application of the one-way ANOVA model. As may be apparent, the equations presented in this portion of the book have many implications for the practice of six sigma.
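As one illustration of how these equations might be exercised, the hedged sketch below computes the minimum expected ZShift and the corresponding c at the critical threshold of H0 for a chosen α, n, and g. It assumes the degrees of freedom of the standard one-way ANOVA (g − 1 between groups, g(n − 1) within groups) and uses scipy's F quantile function; the function name is illustrative.

```python
from math import sqrt
from scipy.stats import f  # F-distribution quantiles

def minimum_shift(alpha: float, n: int, g: int) -> tuple:
    """Minimum expected Z_shift and corresponding c at the critical
    threshold of H0, per Eqs. (10.3.8) and (10.3.10). Assumes one-way
    ANOVA degrees of freedom: g - 1 between and g*(n - 1) within."""
    ng = n * g
    f_crit = f.ppf(1.0 - alpha, dfn=g - 1, dfd=g * (n - 1))
    z_shift = sqrt((f_crit - 1.0) * (g - 1.0) / ng)          # Eq. (10.3.8)
    c = sqrt(1.0 + (f_crit - 1.0) * (g - 1.0) / (ng - 1.0))  # Eq. (10.3.10)
    return z_shift, c

# Example: alpha = 0.05 with the common n = 5, g = 50 sampling scheme
z_min, c_min = minimum_shift(0.05, 5, 50)
print(round(z_min, 3), round(c_min, 3))
```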

10.4 Connecting the capability


We shall now turn our focus to the various indices of capability by
building upon the analytical foundation constructed thus far. The informed
reader will recall the basic Z transformation as


$Z = \dfrac{X - \mu}{\sigma}$
Eq. (10.4.1)

where µ is the population mean, σ is the population standard deviation, and X
is a random normal measurement obtained from the corresponding population.
In terms of Z, we may now estimate the short-term performance of a process
as

$Z_{ST} = \dfrac{T - SL}{\hat{\sigma}_{ST}}$
Eq. (10.4.2)

where T is the nominal or target specification, SL is the specification limit of interest, and σST is an estimator of the short-term population standard deviation. Notice that Eq. (10.4.2) assumes µ = T. However, when the mean does not coincide with the target value, we may calculate

$Z_1 = \dfrac{\bar{\bar{X}} - SL}{\hat{\sigma}_{ST}}$
Eq. (10.4.3)

where $\bar{\bar{X}}$ is the grand mean of the sample data, i.e., the estimator of µ.
Due to dynamic perturbations of a transient or temporal nature, we often
witness an inflation of the initial short-term standard deviation that, over many
cycles of a process, will degrade the value of ZST . To compensate for this
phenomenon, we calculate the quantity

$Z_2 = \dfrac{T - SL}{\hat{\sigma}_{LT}} = \dfrac{T - SL}{\hat{\sigma}_{ST}\, c}$
Eq. (10.4.4)

When considering the simultaneous occurrence of static and dynamic sources of error, the resultant Z value is expressed as

$Z_3 = \dfrac{\bar{\bar{X}} - SL}{\hat{\sigma}_{LT}} = \dfrac{\bar{\bar{X}} - SL}{\hat{\sigma}_{ST}\, c}$
Eq. (10.4.5)

By convention, we recognize that unity is often defined as existing between the three sigma limits of a performance distribution. Given this, we may describe the short-term process capability ratio as

$C_P = \dfrac{1}{3} \cdot \dfrac{T - SL}{\hat{\sigma}_{ST}} = \dfrac{Z_{ST}}{3}$
Eq. (10.4.6)

In accordance with existing literature, we may account for the effect of a static mean offset by computing the ratio

$k_1 = \dfrac{T - \bar{\bar{X}}}{T - SL}$
Eq. (10.4.7)

which may be restated in the form

$1 - k_1 = \dfrac{\bar{\bar{X}} - SL}{T - SL}$.
Eq. (10.4.8)

Finally, the cross multiplication of Eq. (10.4.6) and Eq. (10.4.8) reveals

$C_{PK1} = C_P (1 - k_1) = \dfrac{T - SL}{3\hat{\sigma}_{ST}} \cdot \dfrac{\bar{\bar{X}} - SL}{T - SL} = \dfrac{\bar{\bar{X}} - SL}{3\hat{\sigma}_{ST}} = \dfrac{Z_1}{3}$
Eq. (10.4.9)

Hence, CPK1 may be expressed as an equivalent Z value. As a consequence, we may calculate the quantity

$\dfrac{Z_1}{3} = \dfrac{1}{3} Z_{ST} (1 - k_1) = \dfrac{Z_{ST}(1 - k_1)}{3}$
Eq. (10.4.10)

or simply

$\dfrac{Z_1}{Z_{ST}} = 1 - k_1$.
Eq. (10.4.11)
By analogy, we write the equation

$\dfrac{Z_2}{Z_{ST}} = 1 - k_2$
Eq. (10.4.12)

and

$\dfrac{Z_3}{Z_{ST}} = 1 - k_3$.
Eq. (10.4.13)
By the manipulation of Eq. (10.4.4), we discover that

$\dfrac{\hat{\sigma}_{ST}}{\hat{\sigma}_{LT}} = \dfrac{1}{c} = 1 - k_2$
Eq. (10.4.14)

from which we observe

$k_2 = 1 - \dfrac{1}{c}$.
Eq. (10.4.15)
By substitution, we recognize that

$1 - k_2 = \dfrac{1}{c} = \dfrac{1 - k_3}{1 - k_1}$
Eq. (10.4.16)
which may be rearranged to reveal

$k_3 = 1 - \dfrac{1 - k_1}{c}$.
Eq. (10.4.17)
Thus, from the latter arguments, it follows that

$1 - k_3 = (1 - k_1) \cdot \dfrac{1}{c} = (1 - k_1)(1 - k_2)$
Eq. (10.4.18)

from which we obtain

$k_3 = k_1 + k_2 - k_1 k_2$.
Eq. (10.4.19)

Based on this, we may conclude that the joint occurrence of static and dynamic error is additive by nature, but that their cross-product must be statistically accounted for. If the product k1k2 is zero, Eq. (10.4.19) reduces to k3 = k1 + k2.
In summary, we can express the long-term Cpk's as

$C_{pk2} = C_p (1 - k_2) = \dfrac{C_p}{c}$
Eq. (10.4.20)

and

$C_{pk3} = C_p (1 - k_3) = \dfrac{C_p (1 - k_1)}{c} = \dfrac{C_{pk1}}{c}$.
Eq. (10.4.21)
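As a quick numeric check of these relationships, consider the hedged sketch below. All parameter values are illustrative only (not taken from the text): it builds k1 from a hypothetical target, specification limit, and grand mean, takes k2 from an assumed inflation factor c, and then confirms Eq. (10.4.19) and Eq. (10.4.21) numerically.

```python
# Illustrative one-sided case: T = 10.0, SL = 7.0, grand mean Xbar = 9.4,
# sigma_ST = 0.5, inflation factor c = 1.3 (all values hypothetical)
T, SL, Xbar, sigma_st, c = 10.0, 7.0, 9.4, 0.5, 1.3

Z_st = (T - SL) / sigma_st            # Eq. (10.4.2)
Cp   = Z_st / 3.0                     # Eq. (10.4.6)

k1 = (T - Xbar) / (T - SL)            # Eq. (10.4.7)  static offset
k2 = 1.0 - 1.0 / c                    # Eq. (10.4.15) dynamic inflation
k3 = 1.0 - (1.0 - k1) / c             # Eq. (10.4.17) combined effect

assert abs(k3 - (k1 + k2 - k1 * k2)) < 1e-12   # Eq. (10.4.19)

Cpk1 = Cp * (1.0 - k1)                # Eq. (10.4.9), divided by 3
Cpk2 = Cp * (1.0 - k2)                # Eq. (10.4.20) -> Cp / c
Cpk3 = Cp * (1.0 - k3)                # Eq. (10.4.21) -> Cpk1 / c

assert abs(Cpk2 - Cp / c) < 1e-12
assert abs(Cpk3 - Cpk1 / c) < 1e-12
print(round(Cp, 3), round(Cpk1, 3), round(Cpk2, 3), round(Cpk3, 3))
```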

11.0 Harnessing the Chaos

11.1 Setting the course


During the last several decades, we have experienced an enormous
explosion of technology. As a result, a large proportion of today's product
designers have been liberated from many types of material and component
related constraints. Because of this technological liberation, we see many
relatively familiar products that operate faster, have more features, occupy
less space, and in some instances, even cost less than their predecessors.
The overriding design implication of the so-called "technology boom"
is quite clear -- for every incremental increase in design complexity and
sophistication there must be a corresponding increase in producibility;
otherwise, the manufacturer will not remain competitive. In many factories,
producibility has become a major business issue. It is often the key to
economic success or catastrophe.
The purpose of this portion of the book is to provide the reader with a
novel method to enhance the study of complex designs using a dynamic
simulation method based upon chaos theory and fractal geometry. During
the course of discussion, it will be demonstrated how such an approach can
significantly enhance our engineering understanding of manufacturing
behavior and its consequential impact on performance projections.

11.2 Framing the approach

Classical product design often employs Monte Carlo simulation to study the resultant distribution and stochastic properties of certain functions of several variables. This is accomplished by specifying a cumulative distribution function (c.d.f.) for each of the c independent variables and then randomly selecting r members from each c.d.f., thus forming a matrix, X, consisting of r rows and c columns. The X matrix is given by

$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1c} \\ x_{21} & x_{22} & \cdots & x_{2c} \\ \vdots & \vdots & & \vdots \\ x_{r1} & x_{r2} & \cdots & x_{rc} \end{bmatrix}$

and the response column c+1 would then be constructed subsequent to the
application of an appropriate transfer function to each of the r rows.
To illustrate the aforementioned method, let us assume the case NID(µ,σ²) for each of the c independent variables. In this case, we might be interested in studying the resultant distributional form, D, of their sum. In other words, each response would be obtained by summing the elements in the corresponding row of X. The response vector, V, would then be located in column c+1. Following this, we would compose a histogram of V and compute its germane indices. In
turn, the D vector and its associated indices would serve as the basis for
drawing various conclusions. In some instances, it may be desirable to study
certain unique segments of D. One method for achieving this involves a
stratification of the c.d.f. This is accomplished by dividing the c.d.f. into k
intervals of equal probability. For example, if k = 5, the c.d.f. intervals are
from 0-.2 , ... , .8-1, respectively. From each of the k intervals, a random
selection is made. Here again, D and its associated properties would be used
to make certain decisions.
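A minimal sketch of this classical (static) procedure is given below, assuming c = 3 normally distributed inputs and a simple additive transfer function; the variable names and parameter values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

r, c = 10_000, 3                      # r Monte Carlo draws, c independent variables
mu    = np.array([10.0, 5.0, 2.0])    # illustrative population means
sigma = np.array([0.10, 0.05, 0.02])  # illustrative standard deviations

# Build the r-by-c matrix X by sampling each variable's (static) c.d.f.
X = rng.normal(loc=mu, scale=sigma, size=(r, c))

# Apply the transfer function to each row; here, a simple sum
V = X.sum(axis=1)                     # response vector (column c + 1)

# Germane indices of the resulting distribution D
print(V.mean(), V.std(ddof=1), V.min(), V.max())
```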

11.3 Limiting the history

The classical form of the Monte Carlo method is a very powerful and commonly used simulation tool; however, in the instance of product design, the approach frequently suffers a major limitation -- it assumes a static universe for each selected c.d.f. In most industrial applications, the µ and σ² associated with any given c.d.f. are not necessarily immobile. In fact, it is almost an idealization to make such an assumption. For example, such nonrandom phenomena as tool wear, supplier selection, personnel differences, equipment calibration, etc. will synergistically contribute to nonrandom parametric perturbations. For many varieties of nonlinear transfer functions, the resulting pattern of D and V will appear random. In fact, many analytical methods would substantiate such a qualitative assertion.
As may be apparent, the previously mentioned limitation can adversely
influence the decision making process during the course of product
configuration (design). Therefore, it is here postulated that to account for
these seemingly random perturbations, it becomes mandatory to sample from

multiple distribution functions corresponding to a set of dynamic population
parameters, S, where

$S = \left\{ \mu_{ij},\, \sigma^{2}_{ij} \right\}, \quad i = 1 \text{ to } r, \;\; j = 1 \text{ to } c$.
Eq. (11.3.1)

Hence, the resulting response distribution, D', will more realistically


reflect the "true" manufacturing state-of-affairs. Intrinsically, the paradigm of
S follows certain natural rules of mathematical order. To explore such rules
and the consequential impact of S on D, we shall undertake a study of chaos
theory and fractal geometry.

11.4 Understanding the chaos

To begin our discussion on the use of chaos theory and fractal geometry,
let us qualitatively define what is meant by the term "chaos." According to
Gleick (1987), the phenomenon of chaos may be described by its unique
properties. In a mathematical sense, chaos is " ... the complicated, aperiodic,
attracting orbits of certain (usually low-dimensional) dynamical systems." It
is also described by " ... the irregular unpredictable behavior of deterministic,
nonlinear dynamical systems." Yet another description is " ... dynamics with
positive, but finite, metric entropy ... the translation from math-ease: behavior
that produces information (amplifies small uncertainties), but is not utterly
unpredictable."
Obviously, the latter descriptions may prove somewhat bewildering to
the uninformed reader, to say the least. So that we may better understand the
unique properties associated with the chaos phenomenon, let us consider a
simple example. Suppose that we have some experimental space given by Q,
where Q is a quadrilateral. We shall label the northwest corner as A, the
southwest corner as B, the southeast corner as C, and the northeast corner as
D.
With the task of labeling accomplished, we must locate a randomly
selected point within the confines of Q. This is a starting point and shall be
referred to as τ0. Next, we shall select a random number, r, between 0 and 1.
Based on the value of r, we follow one of three simple rules:
Rule 1: If r <= .333, then move one-half the distance to vertex A

Rule 2: If r >= .667, then move one-half the distance to vertex B
Rule 3: If .333 < r < .667, then move one-half the distance to vertex C

The selection process would be iterated a substantial number of times. The astonishing discovery is that this process of iteration reveals a distinct pattern, often referred to as a fractal shape or "mosaic." For the given rule set, the resulting mosaic has been displayed in figure 11.4.1.

Figure 11.4.1
Fractal Mosaic Created by Successive Iteration of a Rule Set
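A compact sketch of this "chaos game," assuming the three rules above and unit-square vertex coordinates, is given below; plotting the accumulated points would reveal a mosaic of the kind shown in figure 11.4.1.

```python
import random

# Vertices of the experimental space Q (unit square); only A, B, and C are used
A, B, C = (0.0, 1.0), (0.0, 0.0), (1.0, 0.0)

def chaos_game(iterations: int = 50_000, seed: int = 1):
    """Iterate the three-rule chaos game, returning the visited points."""
    random.seed(seed)
    x, y = random.random(), random.random()   # arbitrary starting point tau_0
    points = []
    for _ in range(iterations):
        r = random.random()
        if r <= 0.333:
            vx, vy = A          # Rule 1: move halfway toward vertex A
        elif r >= 0.667:
            vx, vy = B          # Rule 2: move halfway toward vertex B
        else:
            vx, vy = C          # Rule 3: move halfway toward vertex C
        x, y = (x + vx) / 2.0, (y + vy) / 2.0
        points.append((x, y))
    return points

pts = chaos_game()
print(len(pts), pts[-1])        # e.g., feed pts to a scatter plot to view the mosaic
```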

Such a fractal phenomenon has been observed and studied by many


mathematicians, perhaps most notably by Mandelbrot (1982). In particular,
Barnsley (1979) was able to make the following conclusion: "fractal shapes,
though properly viewed as the outcome of a deterministic process, has a
second equally valid existence as the limit of a random process." The
implications of this conclusion are quite profound for product design
simulation and process control.

11.5 Evolving the heuristics

In order to evolve the aforementioned phenomenon into a practical


design simulation methodology, we must set forth a generalization of the
mathematics that underlie the fractal mosaic displayed in figure 11.4.1.

So as to translate the concept of rule-based geometric reduction, we will


incorporate the Cartesian coordinates (x,y) in a Euclidean plane, ψ. In this instance, the boundary constraints may be given by α, β, and γ. Hence, we will now say that ψ = f(α,β,γ). Without loss of generality, we shall assume that the vertices α and γ are 1 unit removed from the origin and that β = (0,0). Recognize that the resulting x,y intersect is denoted as τi. Therefore, we may
say that τi = f (x,y) at the ith generation. Furthermore, the reader must remain
cognizant of the fact that τ0 is initially established as an arbitrary location
within ψ.
A simple algebraic manipulation of the arbitrary rules pertaining to the
chaos game, yields the three point fractal generator. In general form, the
generator may be described by

$x_{t+1} = \varphi x_t + \theta_1(\varphi - x_t)$
Eq. (11.5.1)

and

$y_{t+1} = \varphi y_t + \theta_2(\varphi - y_t)$
Eq. (11.5.2)

given a decision such that θi = 0 or 1. Naturally, the decisions are based on a


random number, r, such that if

r ≤ ξ, then θ1 = θ2 = 0
Rule (11.5.1)

or, if

r ≥ 1 − ξ, then θ1 = 1, θ2 = 0
Rule (11.5.2)

otherwise,

θ1 = 0, θ2 = 1
Rule (11.5.3)

where r is a uniform random number between the limits 0 ≤ r ≤ 1, φ is a constant that is always less than 1, θi assumes the value of 0 or 1, and ξ is a constant such that 0 < ξ ≤ .5. For the sake of reading ease, we shall employ the notation GRS to represent any given set of rules.73
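A hedged sketch of this generalized generator (GRS) follows. The function name and defaults are illustrative, with φ = .5 and ξ = .333 chosen to match the values noted in footnote 73.

```python
import random

def fractal_generator(phi: float = 0.5, xi: float = 0.333,
                      iterations: int = 50_000, seed: int = 1):
    """Iterate Eqs. (11.5.1) and (11.5.2) under Rules (11.5.1)-(11.5.3),
    returning the sequence of points tau_1 ... tau_n."""
    random.seed(seed)
    x, y = random.random(), random.random()   # arbitrary tau_0 within psi
    points = []
    for _ in range(iterations):
        r = random.random()
        if r <= xi:                    # Rule (11.5.1)
            t1, t2 = 0, 0
        elif r >= 1.0 - xi:            # Rule (11.5.2)
            t1, t2 = 1, 0
        else:                          # Rule (11.5.3)
            t1, t2 = 0, 1
        x = phi * x + t1 * (phi - x)   # Eq. (11.5.1)
        y = phi * y + t2 * (phi - y)   # Eq. (11.5.2)
        points.append((x, y))
    return points

tau = fractal_generator()
print(tau[:3])
```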

11.6 Timing the geometry

In order to make use of the fractal described in figure 11.4.1, we must project τi ... τn into a domain representative of time, T. To perform this task, we will construct an axis (Z) such that it is perpendicular to the defining axes of ψ. In two-dimensional space, the latter condition can be portrayed as a straight line (λ) at an angle of π/4 radians with the X axis. The projection of τi on λ is given by

$\omega_i = \tau_i \sin\theta$
Eq. (11.6.1)

where θ is defined as

$\theta = \alpha - \dfrac{\pi}{4}$
Eq. (11.6.2)

and

$\alpha = \tan^{-1}(\tau_i)$.
Eq. (11.6.3)

73 The reader should recognize that the mosaic displayed in figure 11.4.1 was generated by letting φ = .5, ξ = .333 and then plotting the resulting observations τi ... τr.

Notice that the resultant projections ωi on λ are not restricted to positive values, owing to the fact that θ can also be negative.
We may now generate the set {ω1, ω2, ..., ωn}, where the indices 1, ..., n
correspond to progressive equally spaced points in time (t1 ... tn). Under this
condition, ω1 would occur at time t1, ω2 at time t2, and so on. Thus, we are
able to project τi ... τn into the domain T, as displayed in figure 11.6.1.

Figure 11.6.1
Transformation of the Fractal Mosaic into a Time Series
(Vertices α = (0,1), β = (0,0), γ = (1,0); the projections ωi onto the time axis span −.3535 to +.3535.)
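The projection step might be sketched as follows, assuming that α in Eq. (11.6.3) is read as the angle of the point τi = (xi, yi) from the origin, i.e., tan⁻¹(yi/xi), and that τi in Eq. (11.6.1) denotes the point's distance from the origin. With φ = .5 the resulting ω series stays within roughly ±.3535, consistent with figure 11.6.1 and Eq. (11.7.4). The helper name is illustrative; in practice the input would be the τ series produced by the generator sketched in section 11.5.

```python
import math

def project_to_time_axis(points):
    """Project each tau_i = (x_i, y_i) onto the line lambda at pi/4 radians,
    per Eqs. (11.6.1)-(11.6.3); returns the series omega_1 ... omega_n."""
    omegas = []
    for x, y in points:
        tau = math.hypot(x, y)                # distance of tau_i from the origin
        alpha = math.atan2(y, x)              # assumed reading of Eq. (11.6.3)
        theta = alpha - math.pi / 4.0         # Eq. (11.6.2)
        omegas.append(tau * math.sin(theta))  # Eq. (11.6.1), signed projection
    return omegas

# Example with a few points; feed the tau series from the fractal generator
# (section 11.5 sketch) to obtain the full time series.
print(project_to_time_axis([(0.10, 0.40), (0.30, 0.20), (0.25, 0.25)]))
```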

Based on Eq. (11.5.1) and Eq. (11.5.2), it can be demonstrated that any (xi, yi) is derived from (xi−1, yi−1). This phenomenon can be expressed in the form of an autoregressive model given as

$z_t = \varphi z_{t-1}$
Eq. (11.6.4)

where z is an observation at time t and φ is a constant that ranges between −1 and 1. The reader will recognize that time is invariant under the τ and ω
transformation schemes. As a result, the pattern in figure 11.6.1 will be the
deterministic portion of the general AR(1) model.
From the preceding arguments, the reader may have gleaned the close
association between the previously defined fractal rule set and a standard time
series model of an autoregressive nature. In other words, the association is
related to the deterministic aspects of the model. To demonstrate the
aforementioned association, we shall consider the x coordinate of the
Cartesian system. In this case, we will subtract xt from xt+1. This operation
will yield

$x_{t+1} = (1 + \varphi)x_t - \varphi x_{t-1} + \theta_1 (x_{t-1} - x_t)$.
Eq. (11.6.5)

When θ1 = 0, we discover

$x_{t+1} = (1 + \varphi)x_t - \varphi x_{t-1}$
Eq. (11.6.6)

and for the case θ1 = 1, we find that

$x_{t+1} = \varphi x_t + (1 - \varphi) x_{t-1}$.
Eq. (11.6.7)

Naturally, the y coordinate of the Cartesian system would reflect the same
mathematical constructs. From either perspective, Eq. (11.6.6) and Eq.
(11.6.7) will be recognized as the deterministic portion of an AR(2) time

series model. Of general interest, the expression of a stationary autoregressive model of order p is given by

$X_t = \sum_{i=1}^{p} \varphi_i X_{t-i} + a_t$
Eq. (11.6.8)

where at is the shock at time t. Notice that the shock is also referred to as
"white noise." It is imperative to understand that the stochastic nature of Eq.
(11.6.8) manifests itself in the distribution of at. This particular distribution is
described by g(at) = N(0,σat). Of course, it has been well established that a
great many industrial processes display a time dependency of some form. It is
interesting that the time series phenomenon is also displayed by the chaotic
pattern described in this book. The reader is directed to Box and Jenkins
(1976) for a more thorough discussion on the nature of autoregressive models.

11.7 Exemplifying the fractal

Now that we have generated the set {ω1, ω2, ..., ωn}, each ωi may be
employed as a new universe mean (µi). This is done for purposes of
enhancing the Monte Carlo simulation; e.g., it provides a mechanism for
introducing dynamic perturbations in process centering during the course of
simulation. The same logic and methodology may be applied to the
distribution variance (σ2). However, for the sake of illustration, we shall
constrain the ensuing example by only perturbing µ.
Let us suppose that we are concerned with the likelihood of assembly
pertaining to a certain product design, say a widget such as displayed in figure
11.7.1.

Figure 11.7.1
Illustration of the Widget Product Example
(Parts 1 through 4, each 1.240 in. ± .003 in., stack within the envelope, Part 5, of 4.976 in. ± .003 in.)

In this case, we are concerned with predicting the probability of assembly; i.e., the likelihood that P1 ... P4 will fit into the envelope (P5), given the specified design tolerance (∆ = .003 in.). Given this, we will assume that the process capability for all Pj is Cp = 1.0, where

$C_p = \dfrac{\Delta}{3\sigma}$,
Eq. (11.7.1)

and T is the nominal or "target" specification.74 Based on the process capability ratio, we may estimate the process standard deviation as σ = ∆/3 = .001 in. If we let Xij be the measured length of any given manufactured part, the assembly gap (G) may be computed as

$G_i = X_{i5} - \sum_{j=1}^{4} X_{ij}$
Eq. (11.7.2)

74
Recognize that such information is obtained from a process characterization study. The reader is
directed to Harry and Lawson (1988) for additional information on this topic.

With the aforementioned ingredients and the assumption Xij ~ NID(µ,σ),
as well as the constraint µ = T, we are fully prepared to conduct a static Monte
Carlo simulation as outlined in section 11.2. With an additional process
parameter, k, and the aforementioned methodology, we are postured to
significantly enhance simulation accuracy. For the reader's convenience, we
shall define the parameter k as

$k = \dfrac{\mu - T}{\Delta}$.
Eq. (11.7.3)

As may be apparent, k constitutes the proportion of ∆ consumed by a given amount of offset in µ.

To begin, we must first establish the functional limits for perturbing µ,


relative to T, during the course of dynamic Monte Carlo simulation. To do
this, we must set α = γ = 1 so that the maximum hypotenuse of ψ is equal to
.5. As we shall see, this is done to simplify subsequent calculations. In turn,
this leads us to the limiting projection that is given by

$\omega_{max} = \dfrac{1}{2\sqrt{2}}$.
Eq. (11.7.4)

Since the maximum mean shift is k∆, it can be demonstrated that the scaling factor is

$\rho = k\Delta \cdot 2\sqrt{2}$.
Eq. (11.7.5)

Thus, the corrected universe mean is given by

$\mu_i = T + \rho\,\omega_i$.
Eq. (11.7.6)

Given the previously mentioned product parameters, we shall now


evaluate the Monte Carlo outcomes under two different process centering
conditions; namely, when k=.00 and k=.75. We shall say that the latter
condition was established on the basis of empirical evidence resulting from a
process characterization study. In this case, the simulation was conducted
across N = 2,500 iterations for both values of k. Recognize that a direct
comparison is possible since both simulations were conducted using the same
seed (originating number). The overall results of the simulations are
displayed in figure 11.7.2.

Figure 11.7.2
Effects of Dynamic Mean Perturbations on the
Widget Monte Carlo Simulation
(Note: ordinates not to scale)
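An end-to-end sketch of the dynamic widget simulation might look as follows. It reuses the hypothetical fractal_generator and project_to_time_axis helpers sketched in sections 11.5 and 11.6, assigns each part its own ω series (an assumption; the text does not say whether the parts share a single perturbation pattern), and is illustrative rather than a reconstruction of the original program.

```python
import numpy as np

# Widget parameters from figure 11.7.1 and the surrounding text
T_part, T_env = 1.240, 4.976          # nominal lengths (in.)
delta, sigma  = 0.003, 0.001          # tolerance and sigma = delta/3 (Cp = 1.0)
k, N          = 0.75, 2500            # mean-offset fraction and iteration count

rho = k * delta * 2.0 * np.sqrt(2.0)  # Eq. (11.7.5)

# One omega series per part, built from the helpers sketched earlier
omegas = np.array([project_to_time_axis(fractal_generator(iterations=N, seed=s))
                   for s in range(5)])
mu = np.vstack([T_part + rho * omegas[:4],    # dynamic means for P1 ... P4
                T_env  + rho * omegas[4:5]])  # dynamic mean for P5 (envelope)

rng = np.random.default_rng(seed=11)
X = rng.normal(loc=mu, scale=sigma)           # shape (5, N): one row per part

gap = X[4] - X[:4].sum(axis=0)                # Eq. (11.7.2)
print(round(gap.mean(), 5), round(gap.std(ddof=1), 5))
```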

From this figure, the inflationary effect of chaotic perturbations in µ is quite apparent. It is also evident that the introduction of mean shifts during
the course of simulation expanded the variances over many sampling
intervals. Obviously, this constitutes a more realistic picture of expected
performance. A more detailed comparative view of the assembly gap
conditions is given in figure 11.7.3. This figure reveals the critical Z value
changed from 7.1σ to 4.5σ as a result of the dynamic simulation. Tables
11.7.1 and 11.7.2 present the summary statistics related to the simulation.

Table 11.7.1. Summary Statistics for the Widget Monte Carlo


Simulation Under the Condition k=.00

Index Part 1 Part 2 Part 3 Part 4 Part 5 Gap


Frequency = 2500 2500 2500 2500 2500 2500
Mean = -1.24000 -1.24000 -1.24000 -1.24000 4.97600 0.01604
Median = -1.24000 -1.24000 -1.24000 -1.23990 4.97600 0.01605
Std Dev = 0.00101 0.00100 0.00099 0.00101 0.00102 0.00226
Range = 0.00761 0.00649 0.00728 0.00665 0.00726 0.01662
Variance = 0.00000 0.00000 0.00000 0.00000 0.00000 0.00001
Minimum = -1.24380 -1.24300 -1.24340 -1.24330 4.97250 0.00682
Maximum = -1.23610 -1.23660 -1.23610 -1.23670 4.97980 0.02344
Skewness = 0.06587 0.03926 -0.04197 -0.01383 -0.09580 -0.03210
Kurtosis = 0.10720 -0.01050 0.02323 -0.14992 0.23387 -0.08135

Table 11.7.2. Summary Statistics for the Widget Monte Carlo


Simulation Under the Condition k=.75

Index Part 1 Part 2 Part 3 Part 4 Part 5 Gap


Frequency = 2500 2500 2500 2500 2500 2500
Mean = -1.24000 -1.24000 -1.23990 -1.23990 4.97600 0.01617
Median = -1.24010 -1.24000 -1.24000 -1.23980 4.97610 0.01617
Std Dev = 0.00156 0.00154 0.00154 0.00159 0.00157 0.00355
Range = 0.00990 0.00992 0.01004 0.00956 0.00951 0.02327
Variance = 0.00000 0.00000 0.00000 0.00000 0.00000 0.00001
Minimum = -1.24530 -1.24460 -1.24480 -1.24480 4.97160 0.00456
Maximum = -1.23540 -1.23470 -1.23470 -1.23520 4.98110 0.02783
Skewness = 0.02861 -0.02511 -0.01509 -0.15426 -0.05947 -0.02460
Kurtosis = -0.40932 -0.35181 -0.38077 -0.40191 -0.45821 -0.02434

The autocorrelation of the assembly gap is displayed in figures 11.7.4 and 11.7.5 for both values of k. It is interesting to note that the first-order autoregressive model revealed a lack of fit for the case k = .00. For the case k = .75, a highly significant fit was observed, in spite of the fact that the terminal response underwent several unique nonlinear transformations. It is
generally believed that this observation supports the assertion that the pattern
of simulated means will follow a time series model.

Figure 11.7.3
Effect of Dynamic Mean Perturbations on the Widget Assembly Gap
(Overlaid gap histograms for Cp = 1.0 with k = .00 and k = .75; no shift: Z = 7.1, shifted: Z = 4.5.)

Figure 11.7.4
Autocorrelation (Lag = 1) of the Widget Assembly Gap Under the Condition k=.00
(Fitted line: y = .003585x + .015979, r² = .000013.)

Figure 11.7.5
Autocorrelation (Lag = 1) of the Widget Assembly Gap Under the Condition k=.75
(Fitted line: y = .485063x + .008325, r² = .235286.)

11.8 Synthesizing the journey

As discussed in this portion of the book, the study of complex designs


can be significantly enhanced using dynamic Monte Carlo simulation. In
particular, it was concluded that the use of chaos theory and fractal geometry
has the potential to shed light on the structure of engineering problems and
manufacturing behavior. It is also generally believed the suggested approach
has the potential to constitute a new paradigm in engineering simulation
methodology.
As demonstrated, the approach consists of projecting the intersect of k
sets of Cartesian coordinates (resulting from a fractal rule set) into the time
domain. The resulting pattern emulates the behavior of a dynamic process
mean. Interestingly, it was noted that such behavior may be described by an
autoregressive model. Once the projections are made, the means are used as a
basis for dynamic Monte Carlo simulation. After application of the transfer
function, the resulting autocorrelated response vector is formed into a
histogram for subsequent study.
The net effect of such dynamic simulation is an expanded variance when compared to that of the classical Monte Carlo method. From
this perspective, the suggested methodology provides a better basis for the a
priori study of producibility during the product design cycle.

12.0 Concluding the Discussion

Where the theory and practice of six sigma is concerned, there has been
much debate since its inception in 1984. Central to this debate is the idea of reproducibility, often discussed in the form of process capability – a contrast of the actual operating bandwidth of a response characteristic to some established, theoretical, or expected performance bandwidth. It is widely recognized among quality professionals that there exist many types of performance metrics to report on process capability, and that all such valid measures are interrelated by a web of well-established statistics and application practices.
In order to better understand the pragmatic logic of six sigma, we have
underscored the key ideas, tenets, and statistical concepts that form its core. Specifically, this book has presented and interrogated the theoretical underpinnings,

analytical rationale and supporting practices that facilitate the assessment of
configuration reproducibility, from both a design as well as process point of view.
We have also set forth the arguments necessary to develop and support a
technically sound design qualification procedure. Throughout the related
discussion, we have established that the valid assessment of process capability is
highly dependent upon a solid understanding of a) the inherent nature of variation,
b) the idea of “rational” sampling, c) analysis of variance and d) basic control
chart theory – just to name a few key concepts, tools and practices.
Of course, many of these concepts underlie the field often called “quality
science” and should be relatively familiar to most practitioners of six sigma.
However, embedded within the discussion of such ideas, we have set forth and
explored several unique twists and turns that depart from conventional theory and
practice. To this end, we considered conventional wisdom, then constructed the
unconventional arguments that differentiate the practice of six sigma from other well-known (but less effective) initiatives.
At this point in time, the world generally acknowledges that six sigma has
many practical applications and economic benefits, some of which are inclusive
of, but not at all limited to, a) benchmarking, b) parameter characterization, c) parameter optimization, d) system design, e) detail design, f) reliability analysis, g) product simulation, and h) process simulation. The ideas, methods and
practices set forth in this book can greatly extend the reach of such applications
by providing the bedrock upon which a network of unique knowledge can be
built.
In turn, new knowledge spawns new insights, which foster new questions.
Naturally, the process of investigation drives the discovery of answers. As a
consequence, ambiguity diminishes and new direction becomes clear. Only with
clear direction can people be mobilized toward a super-ordinate goal – six sigma.
Thus, the intellectual empowerment of this goal represents the ultimate aim of six
sigma.

Appendix A: Guidelines for the Mean Shift

Guideline 1: If an opportunity-level metric is computed on the basis of discrete data


gathered over many cycles or time intervals, the equivalent Z transform should be
regarded as a long-term measure of performance. If we seek to forecast short-term
performance (ZST), we must add a shift factor (ZShift) to ZLT so as to remove time-related
sources of error that tend to degrade process capability. Recognize that the actual value
of ZShift is seldom known in practice when the measurements are discrete in nature (pass-
fail). Therefore, it may be necessary to apply the accepted convention and set ZShift at
1.50. As a consequence of this linear transformation, the resultant Z value is merely a
projection (approximation) of short-term performance. Thus, we are able to approximate
the effect of temporal influences (i.e., normal process centering errors) and remove this
influence from the analysis via the transform ZST = ZLT + ZShift.

Guideline 2: If a metric is computed on the basis of continuous data gathered over a


very limited number of cycles or time intervals, the resultant Z value should be regarded
as a short-term measure of performance. Naturally, the short-term metric ZST must be
converted to a probability by way of a table of area-under-the-normal-curve, or any
acceptable computational device. If we seek to forecast long-term performance, we must
subtract ZShift from ZST so as to approximate the long-term capability. Recognize that the
actual value of ZShift is seldom known in practice. Therefore, it may be necessary to
apply the accepted convention and set ZShift =1.50. As a consequence of this linear
transformation, the resulting Z value is a projection of long-term performance. Thus, we
are able to artificially induce the effect of temporal influences into the analysis by way of
ZST - ZShift = ZLT.

Guideline 3: In general, if the originating data are discrete by nature, the resulting Z
transform should be regarded as long-term. The logic of this guideline is simple: a fairly
large number of cycles or time intervals is often required to generate enough nonconformities from which to form a relatively stable estimate of Z. Hence, it is

reasonable to conclude that both random and nonrandom influences (of a transient nature)
are reflected in the data. In this instance, guideline 1 would be applied.

Guideline 4: In general, if the originating data are continuous by nature and were
gathered under the constraint of sequential or random sampling across a very limited
number of cycles or time intervals, the resulting Z value should be regarded as short-
term. The logic of this guideline is simple: data gathered over a very limited number of cycles or time intervals only reflects random influences (white noise) and, as a consequence, tends to exclude temporal sources of variation.

Guideline 5: Whenever it is desirable to report the corresponding “sigma” of a given


performance metric, the short-term Z must be used. For example, let us suppose that we
find 6210 ppm defective. In this instance, we must translate 6210 ppm into its
corresponding sigma value. Doing so reveals ZLT = 2.50. Since the originating data were long-term by nature, guidelines 1 and 3 apply. In this case, ZLT + ZShift = 2.5 + 1.5 = 4.0.
Since no other estimate of ZShift was available, the convention of 1.5 was employed.
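A small sketch of this conversion, using the standard normal quantile function from scipy (the helper name is illustrative), follows.

```python
from scipy.stats import norm

def ppm_to_sigma(ppm: float, z_shift: float = 1.5) -> float:
    """Translate a long-term defect rate (ppm) into the reported
    short-term 'sigma' value via Z_ST = Z_LT + Z_shift (Guideline 1)."""
    z_lt = norm.ppf(1.0 - ppm / 1_000_000.0)   # long-term Z from the defect rate
    return z_lt + z_shift

print(round(ppm_to_sigma(6210), 1))            # -> 4.0, as in the example above
```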

References and Bibliography
Barnsley, M. and Demko, S. (1986). Chaotic Dynamic and Fractals. Academic Press, San
Diego, California.

Bender, A. (1975). "Statistical Tolerancing as it Relates to Quality Control and the Designer." Automotive Division Newsletter of ASQC.

Briggs, J. and Peat, F. (1989). Turbulent Mirror. Harper and Row, New York, New York.

Ekeland, I. (1988). Mathematics and the Unexpected. University of Chicago Press, Chicago, Illinois.

Evans, David H., (1974). “Statistical Tolerancing: The State of the Art, Part I:
Background,” Journal of Quality and Technology, 6 (4), pp. 188-195.

Evans, David H., (1975). “Statistical Tolerancing: The State of the Art, Part II: Methods
for Estimating Moments,” Journal of Quality and Technology, 7 (1), pp. 1-12.

Evans, David H., (1975). “Statistical Tolerancing: The State of the Art, Part III: Shifts
and Drifts,” Journal of Quality and Technology, 7 (2), pp. 72-76.

Gilson, J. (1951). "A New Approach to Engineering Tolerances," Machinery Publishing Co., Ltd., London.

Gleick, J. (1987). Chaos, Making a New Science. Penguin Books, New York, New York.

Grant, E.L., and Leavenworth, R.S. (1972). Statistical Quality Control (4th Edition). New
York: McGraw-Hill Book Company.

Harry, M.J. (1986). The Nature of Six Sigma Quality. Motorola University Press,
Motorola Inc., Schaumburg Illinois.

Harry, M.J. and Lawson, R.J. (1988). Six Sigma Producibility Analysis and Process
Characterization. Publication Number 6σ-3-03/88. Motorola University Press, Motorola
Inc., Schaumburg Illinois.

Harry, M.J. and Stewart, R. (1988). Six Sigma Mechanical Design Tolerancing.
Publication Number 6σ-2-10/88. Motorola University Press, Motorola Inc., Schaumburg
Illinois.

Harry, M.J. and Prins, J. (1991). The Vision of Six Sigma: Mathematical Constructs
Related to Process Centering. Publication Pending. Motorola University Press, Motorola
Inc., Schaumburg Illinois.

Harry, M.J. and Schroeder R. (2000). Six Sigma: The Breakthrough Management
Strategy Revolutionizing the World’s Top Corporations. New York, NY: Doubleday.

Juran, J.M., Gryna, F.M., and Bingham, R.S. (1979). Quality Control Handbook. New
York, NY: McGraw-Hill Book Co.

Krasner, S. (1990). The Ubiquity of Chaos. American Association for the Advancement
of Science. Washington D.C.

Mandelbrot, B. (1982). The Fractal Geometry of Nature. W.H. Freeman, San Francisco.

Mood, A. and Graybill, F. (1963). Introduction To The Theory of Statistics (2nd Edition).
New York: McGraw-Hill Book Co.

Motorola Inc. (1986). Design for Manufacturability: Eng 123 (Participant Guide).
Motorola Training and Education Center, Motorola Inc., Schaumburg, IL.

Pearson, E.S. and Hartley, H.O. (1972). Biometrika Tables for Statisticians. Vol. 2,
Cambridge University Press, Cambridge.

Shewhart, W.A. (1931). Economic Control of Quality of Manufactured Products. D. Van Nostrand Company, Inc.
