Artificial Neural Networks As A Quality Loss Function For Six Sigma

Total Quality Management & Business Excellence
ISSN: 1478-3363 (Print) 1478-3371 (Online) Journal homepage: http://www.tandfonline.com/loi/ctqm20
Artificial Neural Networks as a quality loss function for Six Sigma
Meryem Uluskan
To cite this article: Meryem Uluskan (2018): Artificial Neural Networks as a quality
loss function for Six Sigma, Total Quality Management & Business Excellence, DOI:
10.1080/14783363.2018.1520597
To link to this article: https://doi.org/10.1080/14783363.2018.1520597
Published online: 25 Sep 2018.
Total Quality Management, 2018
https://doi.org/10.1080/14783363.2018.1520597
Artificial Neural Networks as a quality loss function for Six Sigma

Meryem Uluskan (a, b)*
(a) Department of Industrial Engineering, Faculty of Engineering, Eskisehir Osmangazi University, Eskisehir, Turkey; (b) ESOGU, Muhendislik Mimarlik Fakultesi (Faculty of Engineering and Architecture), Endustri Muhendisligi Bolumu (Department of Industrial Engineering), Meselik Kampusu (Meselik Campus), Eskisehir, Turkey
In this study, Artificial Neural Networks (ANNs) are proposed to be used as a quality
loss function for Six Sigma projects. An industrial data set consisting of power
consumption rates of refrigerators and thermal camera readings around their
compressors is analysed. For industrial data, relationships between inputs and outputs
can be nonlinear and complex. Therefore, traditional statistical models may result in
poor inferences. At this point, ANNs emerge as effective tools because of their ability
to learn nonlinear and complex relationships. While Six Sigma remains the major
quality initiative, its popularity has started to decline among industrial practitioners.
Since the methodology was established, the Six Sigma toolbox has not been radically
improved. Therefore, to enhance the Six Sigma toolbox, an ANN-based structure is proposed to
detect the refrigerators not complying with power specifications through thermal
camera readings. Four quality loss function models are compared: one-dimensional
parabolic Taguchi loss functions, multivariate Maximum Likelihood cost, logistic
regression and finally ANNs. The analyses are conducted by Monte Carlo cross-
validation to obtain precision-recall curves for these methods. The ANN-based cost
function is shown to outperform the other three methods. Finally, ANNs are found to be
an effective tool which may bring new dimensions to Six Sigma concept.
Keywords: Six Sigma; Artificial Neural Networks; quality loss function; Maximum
Likelihood; Taguchi loss function; logistic regression; quality control; pattern
recognition
1. Introduction
Six Sigma is a disciplined, project-oriented, statistically based, highly quantitative approach
to improving product and process quality and is a major force in business improvement
(Hahn, Doganaksoy, & Hoerl, 2000; Montgomery & Woodall, 2008). While Six Sigma
maintains its status as the major methodological quality initiative, the popularity of this
method has already started to decline among industrial practitioners. It can be argued stat-
istically that the enthusiasm for Six Sigma and expectations for it have started to decrease
(Uluskan & Erginel, 2017). There are many reasons for this situation including issues
arising during implementation, inefficient training programmes, and ineffective or improper
use of tools. While these issues account for a major part of the declining enthusiasm for Six Sigma, an important fact must not be ignored: neither the Six Sigma methodology nor the Six Sigma toolbox has been radically improved since the methodology was established.
The enhancement of Six Sigma, which includes topics such as new Six Sigma models and new tools in Six Sigma, has long been studied in the quality engineering literature (Nonthaleerak & Hendry, 2006). These studies proposed new tools for Six Sigma projects.
*Email: meryemulus@yahoo.com
© 2018 Informa UK Limited, trading as Taylor & Francis Group

However, further research is needed on the effectiveness of those new tools (Nonthaleerak
& Hendry, 2006). Moreover, prior literature found that Artificial Neural Networks (ANNs), an advanced machine-learning tool, are among the least frequently used tools in Six Sigma projects (Uluskan, 2017). Therefore, in this study, it is proposed that ANNs can be used effectively as a quality loss function for Six Sigma projects. The effectiveness of ANNs is tested using data collected in an industrial setting.
In practice, the relationships between predictors (inputs) and outcomes (outputs) in real industrial data are generally nonlinear and complex. Therefore, traditional statistical models may lead to poor inferences about real-world data. ANNs emerge as an effective tool in such industrial situations because they can learn and model nonlinear and complex relationships. Moreover, once ANNs have learned from inputs and their corresponding outputs, they can infer hidden relationships between them and generalise these relationships to new data. This characteristic makes ANNs more powerful than traditional techniques when dealing with real data. Finally, unlike many other prediction techniques, ANNs do not impose restrictions on the input variables, such as assumptions about how the data should be distributed. Therefore, ANNs provide great flexibility in real industrial practice.
As machine learning and data mining technologies have developed, several disciplines have taken advantage of these developments. Many disciplines previously accustomed to statistical models have quickly adopted contemporary ANNs. While many different disciplines take advantage of machine learning, quality engineering does not yet utilise these tools effectively.
The use of statistical tools and the correct interpretation of statistical tests require at least a moderate understanding of statistics, so a significant level of training is needed. However, industrial processes can be so complicated that basic statistical tools cannot model their complexity effectively. In contrast, machine-learning tools such as ANNs can lead to new understanding in quality engineering. Once practitioners become experts in using machine-learning tools, they can assess and predict industrial processes more accurately.
Therefore, in this study, ANNs are proposed as an effective tool for modelling the quality loss function in Six Sigma projects. This study argues that, to increase its effectiveness, Six Sigma must be enriched with powerful new tools. To support this argument, an industrial data set that includes the power consumption rates of refrigerators as well as thermal camera readings around the compressors and the refrigerators themselves is analysed. An ANN-based model is proposed to detect refrigerators that do not comply with power specifications by means of thermal camera readings. It is hypothesised that ANN-based models outperform other statistical models. Consequently, ANNs, an advanced machine-learning tool, are proposed to help provide new insights in quality engineering and to increase the effectiveness of Six Sigma projects.
The preceding paragraphs have set out the motivation for this study. The Background section first provides a detailed explanation of ANNs and how they compare with traditional statistical methods. Then, a literature review on the use of ANNs in quality initiatives is provided. Next, three competing methods whose performance will be compared with that of ANNs are introduced: the parabolic Taguchi loss function, the Maximum Likelihood cost (a multivariate cost function) and logistic regression. The Background section also briefly introduces retrieval systems, precision and recall, all of which are pattern-recognition concepts. The Methods section describes how the different quality loss functions are used to create retrieval systems for the refrigerator data; it includes many figures and videos to make these concepts more tangible to readers. The Maximum Likelihood and logistic regression-based retrieval systems, and then the proposed solution (i.e. ANN-based retrieval), are explained in detail in this section. In the Experiments and Results section, the experimental procedure is described, the precision-recall curves for all four methods are presented, and the results are discussed. Finally, the Conclusion section discusses the success of ANN-based systems as strong candidates for inclusion in the Six Sigma toolbox.
2. Background
2.1. ANNs: a brief introduction
ANNs are information processing models inspired by the way biological nervous systems process information (Basu, Bhattacharyya, & Kim, 2010); thus, they are computational models of the brain. They are increasingly being used to model complex, nonlinear data (Yang, 2009). Like nervous systems, neural networks are composed of a large number of highly interconnected processing elements, i.e. neurons, working together to solve particular problems (Basu et al., 2010). Accordingly, they can learn quantitative relationships between input variables and the corresponding output variables (Tu, 1996). Neural networks are therefore configured for a specific application through a learning process (Basu et al., 2010).
Learning is achieved by training the network with a training data set consisting of input variables and the known or associated outcomes (Tu, 1996). Once a network has been trained, it can be applied to a separate test data set (Tu, 1996). In general, a neural network can be divided into three layers: (a) the input layer, (b) the hidden layers and (c) the output layer. The input layer receives information (data) from the external environment, whereas the hidden layers are composed of neurons responsible for the internal processing of the data. Finally, the output layer, which is also composed of neurons, produces and presents the final network outputs (Da Silva, Spatti, Flauzino, Liboni, & Dos Reis Alves, 2017).
The main characteristics of neural networks are their ability to learn complex nonlinear input-output relationships, their use of sequential training procedures, and their ability to adapt to the data (Basu et al., 2010). Recent advances in neural network-based models have produced outstanding accuracy in a variety of fields (Cerquitelli, Quercia, & Pasquale, 2017). Mathematically, a neuron k can be described by the following equations (Tosun, Aydin, & Bilgili, 2016):

$$u_k = \sum_{j=1}^{m} w_{kj}\, x_j,$$
$$y_k = \varphi(u_k + b_k),$$
where the $x_j$ are the inputs, the $w_{kj}$ are the weights of neuron $k$, and $u_k$ is the linear combiner output due to the input signals; $\varphi(\cdot)$ is the activation function, $b_k$ is the bias and, finally, $y_k$ is the output signal of the neuron.
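The two equations can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation; the logistic sigmoid and tanh are assumed example choices for φ(·), since the text does not fix a particular activation function, and the inputs, weights and bias are hypothetical values.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation, one possible choice for phi(.)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)  # equivalent form that avoids overflow for large negative z
    return ez / (1.0 + ez)

def neuron_output(x, w, b, activation=math.tanh):
    """Output y_k of neuron k for inputs x_j, weights w_kj and bias b_k."""
    if len(x) != len(w):
        raise ValueError("x and w must have the same length")
    # Linear combiner: u_k = sum_j w_kj * x_j
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))
    # Activation applied to the biased combiner output: y_k = phi(u_k + b_k)
    return activation(u + b)

# Two inputs, two weights and a bias (hypothetical values)
y = neuron_output([1.0, 2.0], [0.5, -0.25], 0.1, activation=sigmoid)
print(round(y, 4))
```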
Neural networks can be divided into two main types: deep and shallow networks. A deep neural network has two or more hidden layers, whereas a shallow neural network usually has only one hidden layer. A network with more than 10 layers is referred to as very deep learning. Additional layers make it possible to extract data features from the lower layers, i.e. to extract features from features, which creates the potential to model complex data with fewer neurons than in a shallow network (Schmidhuber, 2015).
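To make the shallow/deep distinction concrete, the sketch below stacks fully connected layers, each applying the neuron equations of Section 2.1 to every neuron in the layer. The layer sizes, the tanh activation and the random weights are illustrative assumptions, and `random_layer` is a hypothetical helper; no training is performed here.

```python
import math
import random

def dense_layer(x, weights, biases):
    """Apply y_k = tanh(sum_j w_kj x_j + b_k) for every neuron k in the layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def random_layer(n_in, n_out, rng):
    """Hypothetical helper: random weights (one row per neuron) and zero biases."""
    weights = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(x, layers):
    """Feed the input through the layers; each layer's output feeds the next."""
    for weights, biases in layers:
        x = dense_layer(x, weights, biases)
    return x

rng = random.Random(0)
# Shallow network: 2 inputs -> one hidden layer of 10 neurons -> 1 output
shallow = [random_layer(2, 10, rng), random_layer(10, 1, rng)]
# Deep network: 2 inputs -> two hidden layers of 10 neurons -> 1 output
deep = [random_layer(2, 10, rng), random_layer(10, 10, rng), random_layer(10, 1, rng)]

print("hidden layers:", len(shallow) - 1, "vs", len(deep) - 1)
print(forward([0.3, -1.2], shallow), forward([0.3, -1.2], deep))
```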
2.2. ANNs versus traditional statistical methods

With the growing complexity of industrial systems, traditional statistical tools and methods cannot satisfy all the demands of today's complex industrial environments, so data-driven modelling based on computational intelligence and machine-learning methods such as ANNs has been brought into practice. Traditional parametric modelling uses data to search for an optimal value of a parameter that varies within a space of specified dimension. Complex data, such as those encountered in contemporary data analysis, can seldom be fully studied by this traditional approach (Ciampi & Lechevallier, 2007).
Traditional statistical modelling is the formalisation of relationships between variables in the form of mathematical equations, and it relies on a number of assumptions. For example, linear regression assumes that there is a linear relationship between the independent and dependent variables, that observations are independent of each other and that the errors are normally distributed. In statistical modelling, even a nonlinear model has to conform to a continuous separation boundary.
On the other hand, machine learning, on which data-driven modelling is based, comprises algorithms that learn from data without relying on rules-based programming. Machine-learning algorithms are, in general, free from most of these assumptions. The biggest advantage of a machine-learning algorithm is that the decision boundary need not be continuous. In data-driven modelling, which can be used successfully to model complex industrial data, the underlying relationship among the measured data is determined by the model itself (Stundner & Al-Thuwaini, 2001).
From a practical or industrial point of view, ANNs can appear somewhat difficult to use. For industrial practitioners, gaining the experience needed to use ANNs efficiently can be costly in terms of time and financial resources. These methods are often considered black boxes, and most of the rules for building a neural network model are empirical rather than theoretical (Kislov & Gravirov, 2018). The computational burden of model development and the tendency to overfit are sometimes regarded as disadvantages of ANNs (Tu, 1996). On the other hand, neural networks offer a number of advantages: the ability to implicitly detect complex nonlinear relationships between dependent and independent variables, the ability to detect all possible interactions between predictor variables, and the availability of multiple training algorithms (Tu, 1996). Therefore, once practitioners become proficient in ANNs, they are likely to obtain superior results in their industrial processes and projects.
2.3. ANNs in quality initiatives

Neural networks represent a novel approach that can provide solutions to problems for which
traditional mathematics is not able to find a reasonable solution. These problems are gener-
ally complex in nature and some of the mechanisms involved in the problem have not been
fully understood by the researchers studying them (Lolas & Olatunbosun, 2008).
Neural networks model the neural connections in the human brain and imitate the
human ability to learn from experience. Their ability to capture knowledge is among the
main advantages that make these expert systems attractive for a large variety of applications
(Lolas & Olatunbosun, 2008). Therefore, complex problems in quality and Six Sigma pro-
jects can be better addressed by ANNs. Previous literature proposed the potential use of
ANNs in Six Sigma projects and provided examples of their usage (Mahanti & Antony,
2005; Brady & Allen, 2006).
Pyzdek’s (2009) Six Sigma paradox states that Six Sigma focuses on the reduction of
process variation in order to meet or exceed customer requirements, but significant quality
levels or improvements can be achieved only by changing the way of thinking within the
organisation. In other words, the Six Sigma paradox addresses the necessity of changing the
mindset of the organisation in order to eliminate the variation within the processes. This
paradox emphasises the significance of creativity within organisations (Pyzdek, 2003). Keeping this paradox in mind, Pyzdek utilised ANNs as a tool for Six Sigma projects (Pyzdek, 1999). He used ANNs to predict the level of defects from the process parameters and compared the surface created by ANNs with the corresponding response surface model from a classical design of experiments (DOE). He found that, compared with the classical DOE method, an ANN-based surface can take a considerably more complex shape and therefore produce better predictions of defect rates. He emphasised that his study simply points out the potential applications of ANNs for quality and performance improvement.
Prior to Pyzdek, Su and Hsieh (1998) similarly examined the potential use of ANNs in
creating more efficient response surface models. They argued in their paper that prac-
titioners with limited statistical training, especially engineers, can more easily use ANNs
in quality control. In Mahanti and Antony (2005), ANNs were mentioned as one of the computational intelligence techniques that support software quality assessment in Six Sigma projects for software development. ANNs are also mentioned in additional studies as a potential tool for Six Sigma projects (Patterson, Bonissone, & Pavese, 2005; Brady & Allen, 2006).
In a study by Johnston, Maguire, and McGinnity (2009), the authors utilised ANNs as a
key component in their Six Sigma project. In their case study, after observing the nonlinear-
ity between read/write capability of hard disc drives (HDD) and the associated predictor
parameters, the authors trained ANNs to better predict the performance of HDDs. In
research conducted by El-Midany, El-Baz, and Abd-Elwahed (2012), an ANN-based
approach was used for performance prediction within a manufacturing environment. The
study evaluated the manufacturing system by integrating ANNs with other Six Sigma
tools. Fahey and Carroll (2016) proposed a neural network approach in biopharmaceutical
manufacturing and compared ANNs to multiple linear regression. In the context of Six
Sigma, their research showed how ANNs can be used to interpret the data gathered
during the manufacturing process.
In a study by Wu, Wang, Zhang, and Huang (2011) the authors investigated a new
Design for Six Sigma (DFSS) approach by employing ANNs to optimise burnishing for-
mation process quality and yield. Their results indicated that the DFSS-Neural networks
method is an effective tool to improve the yield in machining. Kuthe and Tharakan
(2009) applied ANNs in a Six Sigma DMADV project in the steel industry and compared
the ANNs with regression analysis. Sen (2015) presented a case study of an iron manufac-
turing plant where the variation of CO from the blast furnace was the problem. The
researcher identified the significant process parameters responsible for CO emission and
then used ANNs as a Six Sigma tool to model the process.
The previous literature makes the potential of ANNs as a Six Sigma tool clear; they can be well utilised as an advanced tool in Six Sigma projects. Further studies that integrate ANNs into Six Sigma projects are needed to persuade quality engineers to use ANNs.
2.4. Multivariate quality loss function: Maximum Likelihood cost

The quality loss function is defined as a quantitative way of evaluating quality. Taguchi, Chowdhury, and Wu (2005) stated that quality loss can be expressed as a function of the deviation from the target value. The most basic version of the quality loss function is the parabolic loss function (Taguchi et al., 2005):
$$L = k\,(y - m)^2 \qquad (1)$$
where y is the value of the quality characteristic, m is the target value of y, k is a constant and
L is the quality loss. This parabolic definition of the loss function is similar to the Gaussian
squared error concept. When there exist multiple quality characteristics to be evaluated, the
multivariate version of the quality loss function is defined as the following (Suhr & Batson,
2001):

$$L = \sum_{i=1}^{n} \sum_{j=1}^{n} k_{ij}\,(y_i - m_i)(y_j - m_j) \qquad (2)$$
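Equations (1) and (2) translate directly into code. In this sketch the targets, measurements and loss coefficients are hypothetical numbers, not values from the refrigerator study. With a diagonal coefficient matrix, Equation (2) reduces to a sum of one-dimensional parabolic losses; off-diagonal coefficients add cross terms that couple the characteristics.

```python
def taguchi_loss(y, m, k):
    """Parabolic loss, Equation (1): L = k (y - m)^2."""
    return k * (y - m) ** 2

def multivariate_loss(y, m, K):
    """Equation (2): L = sum_i sum_j k_ij (y_i - m_i)(y_j - m_j)."""
    d = [yi - mi for yi, mi in zip(y, m)]
    n = len(d)
    return sum(K[i][j] * d[i] * d[j] for i in range(n) for j in range(n))

# Hypothetical example: two characteristics with targets m and coefficients K
y, m = [23.0, 41.0], [20.0, 40.0]
K_diag = [[0.5, 0.0], [0.0, 2.0]]
K_coupled = [[1.0, 0.5], [0.5, 1.0]]
print(multivariate_loss(y, m, K_diag))     # equals the sum of the two 1-D losses
print(multivariate_loss(y, m, K_coupled))  # cross terms add 2 * 0.5 * 3 * 1 = 3
```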
Given the analogy between these loss functions and the Gaussian squared error, a more compact loss function can be written as the Maximum Likelihood cost:
$$L = x\, C^{-1} x^{T} \qquad (3)$$
where $x$ is the row vector of the deviations from the target values for multiple quality characteristics:
$$x = \left[\,(y_1 - m_1)\;\; (y_2 - m_2)\;\; \ldots\;\; (y_n - m_n)\,\right] \qquad (4)$$
and $C^{-1}$ is the inverse of the covariance matrix of the quality characteristics. When the Maximum Likelihood cost is expressed in the form of Equation (2), the following relation must hold:
$$k_{ij} = C^{-1}(i, j) \qquad (5)$$
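The sketch below estimates the mean vector and covariance matrix from a handful of hypothetical two-camera readings (invented for illustration, not the study's data), evaluates Equation (3), and checks that it matches Equation (2) when $k_{ij} = C^{-1}(i, j)$ as in Equation (5). Using the sample mean as the target follows the description of Figure 1; the n − 1 divisor for the covariance is an assumption. The resulting quantity is also known as the squared Mahalanobis distance.

```python
def mean_vector(data):
    n = len(data)
    return [sum(row[i] for row in data) / n for i in range(len(data[0]))]

def covariance_2x2(data):
    """Sample covariance matrix (n - 1 divisor) of 2-D data."""
    mx, my = mean_vector(data)
    n = len(data)
    sxx = sum((a - mx) ** 2 for a, _ in data) / (n - 1)
    syy = sum((b - my) ** 2 for _, b in data) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in data) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def inverse_2x2(C):
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    if det == 0:
        raise ValueError("covariance matrix is singular")
    return [[C[1][1] / det, -C[0][1] / det],
            [-C[1][0] / det, C[0][0] / det]]

def ml_cost(y, m, C_inv):
    """Equation (3): L = x C^{-1} x^T, with x the row vector of deviations (Equation 4)."""
    x = [y[0] - m[0], y[1] - m[1]]
    v = [C_inv[0][0] * x[0] + C_inv[0][1] * x[1],   # C^{-1} x^T
         C_inv[1][0] * x[0] + C_inv[1][1] * x[1]]
    return x[0] * v[0] + x[1] * v[1]

# Hypothetical thermal camera readings (camera A, camera B) in degrees Celsius
readings = [(40.1, 55.2), (41.0, 56.0), (39.5, 54.1), (40.7, 55.9), (42.2, 57.3)]
m = mean_vector(readings)
C_inv = inverse_2x2(covariance_2x2(readings))

# Equation (2) with k_ij = C^{-1}(i, j), i.e. Equation (5)
y = (43.0, 55.0)
d = [y[0] - m[0], y[1] - m[1]]
eq2 = sum(C_inv[i][j] * d[i] * d[j] for i in range(2) for j in range(2))
print(ml_cost(y, m, C_inv), eq2)
```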
When there exist only two quality characteristics, the maximum likelihood cost function
can be expressed by means of a surface as shown in Figure 1. In Figure 1, the maximum
likelihood surface is plotted based on the mean vector and the covariance matrix of
Thermal Cameras 1 and 3 of the refrigerators. This surface touches the x–y plane at the
mean point of Thermal Cameras 1 and 3. As the quality characteristics move away from
the mean point, the cost starts to increase in all directions. However, the cost surface is ellip-
tically aligned in accordance with the multivariate distribution of the data. In the next sec-
tions, a threshold value will be applied to this cost function to detect the items which do not
comply with the power specifications.
2.5. Logistic regression

Binary logistic regression is a model-building technique developed for the case where the outcome variable is binary or dichotomous (Hosmer & Lemeshow, 2000). Unlike in linear regression, the dependent variable has a ceiling and a floor. The function describing the transition from the floor to the ceiling is generally desired to be an S-shaped curve, like the cumulative distribution function of a normal random variable (Hosmer & Lemeshow, 2000).
Figure 1. Maximum Likelihood cost surface for thermal cameras 1 and 3.
Linear regression faces significant problems in dealing with binary dependent variables. Logistic regression applies a transformation that allows the dependent variable to approach a ceiling and a floor while the independent variables range from −∞ to +∞. Although many nonlinear functions can represent the S-shaped curve, the logit transformation has become popular because it is a flexible and easily used function (Pampel, 2000). Consequently, when there are N independent variables (i.e. x1, x2, …, xN), logistic regression estimates a multiple linear regression model for the logit as follows (Hosmer & Lemeshow, 2000):

$$\ln \frac{p(x)}{1 - p(x)} = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_N x_N \qquad (6)$$
where p(x) represents the conditional mean of the output Y given x, namely E(Y|x). The left-hand side of the above equation is the logit transformation of p(x) mentioned above. After the logistic model is fitted and the estimated coefficients (i.e. b̂0, b̂1, …, b̂N) are obtained, the estimated logistic probability p̂(x) can be calculated by means of the logistic function as follows:
$$\hat{p}(x) = \frac{e^{\hat{b}_0 + \hat{b}_1 x_1 + \cdots + \hat{b}_N x_N}}{1 + e^{\hat{b}_0 + \hat{b}_1 x_1 + \cdots + \hat{b}_N x_N}} \qquad (7)$$
While the training data used in model fitting include a binary output variable, the estimated logistic probability p̂(x) is a continuous function of x. This property allows logistic regression to be used as a quality loss function, as described in Section 3.2.
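Equations (6) and (7) can be checked numerically: the logistic function maps the linear predictor to a probability in (0, 1), and applying the logit to that probability recovers the linear predictor. The coefficients below are hypothetical, not fitted to the refrigerator data, and the two-branch evaluation is simply a standard way to avoid overflow in `math.exp`.

```python
import math

def logistic_probability(x, b):
    """Equation (7) with b = [b0, b1, ..., bN] and x = [x1, ..., xN]."""
    z = b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))
    # Two algebraically equivalent forms of e^z / (1 + e^z), chosen to avoid overflow
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def logit(p):
    """Left-hand side of Equation (6): ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

b_hat = [-2.0, 0.5, 1.0]  # hypothetical estimated coefficients
for x in ([1.0, 1.5], [2.0, 0.3], [6.0, 4.0]):
    p = logistic_probability(x, b_hat)
    print(x, round(p, 4), round(logit(p), 4))
```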
2.6. Precision and recall

In pattern recognition, precision is defined as the ratio of the number of relevant items retrieved to the number of all retrieved items. Recall is defined as the ratio of the number of relevant items retrieved to the number of all relevant items. Usually, as a retrieval system applies a stricter rule to retrieve items, recall (i.e. the ability to retrieve more of the relevant items) decreases, while precision (i.e. the proportion of relevant items among all retrieved items) increases. There is thus a trade-off between recall and precision. Consequently, precision-recall curves are used to compare the performance of retrieval systems: if the precision-recall curve of one system encloses all the other curves, that system is considered the better retrieval system.
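The definitions above, and the effect of making the retrieval rule stricter, can be illustrated with a short sketch. Items whose cost is at or above a threshold are retrieved, and sweeping the threshold from high to low traces a precision-recall curve. The costs and labels are hypothetical, and defining precision as 1.0 when nothing is retrieved is an assumed convention.

```python
def precision_recall(costs, labels, threshold):
    """Retrieve items with cost >= threshold; label 1 marks a relevant item."""
    retrieved = [lab for cost, lab in zip(costs, labels) if cost >= threshold]
    relevant_retrieved = sum(retrieved)
    total_relevant = sum(labels)
    precision = relevant_retrieved / len(retrieved) if retrieved else 1.0
    recall = relevant_retrieved / total_relevant if total_relevant else 0.0
    return precision, recall

def precision_recall_curve(costs, labels):
    """One (precision, recall) point per distinct threshold, strictest first."""
    return [precision_recall(costs, labels, t)
            for t in sorted(set(costs), reverse=True)]

# Hypothetical costs from some quality loss function, with 1 = incompatible item
costs = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
for p, r in precision_recall_curve(costs, labels):
    print(f"precision={p:.2f} recall={r:.2f}")
```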
In this study, an ANN-based retrieval system will be created to detect the refrigerators
that do not comply with the power specifications through thermal camera readings. The
ANN retrieval system will be compared to the parabolic Taguchi loss, maximum likelihood
cost and logistic regression models.
3. Methods
The industrial data set used in this study includes thermal camera readings as well as the
power consumption rates for the refrigerators that were tested in an isolated test room
after production. In the test room, several different thermal cameras read the temperature
levels in degrees Celsius at different regions of the refrigerator and its compressor. To sim-
plify the overall process and to be able to visualise the models that are created, only two of
these thermal cameras were included in the present study. The industrial data set includes thermal camera readings of 478 refrigerators. The power specification set by the company is between 150 and 180 W. The aim of the research is to detect refrigerators that do not comply with the power specifications through thermal camera readings, assuming that the incompatible items differ from the other items in their thermal camera readings.
Four different types of retrieval systems are created to detect refrigerators that do not
comply with power specifications by means of thermal camera readings. These are the para-
bolic Taguchi loss, Maximum Likelihood cost, logistic regression and finally ANNs. The
major aim of the study is to show that ANNs outperform all the other competing methods. This section explains in detail how retrieval systems are created with the Maximum Likelihood cost, logistic regression and, finally, ANNs.
3.1. Maximum Likelihood cost-based retrieval

The aim of this section is to describe how the maximum likelihood cost is used to detect items that do not comply with power specifications by means of thermal camera readings. The power consumption levels of refrigerators are partially correlated with the thermal camera readings; therefore, the thermal camera readings should provide some cues as to whether a refrigerator complies with the specifications. From this point of view, refrigerators whose power consumption does not comply with the specifications should yield outlying thermal camera readings in a multivariate sense. Therefore, when a threshold is applied to the maximum likelihood cost of the items, the items with incompatible power consumption can be retrieved.
Figure 2 is the two-dimensional contour plot version of Figure 1 when a threshold, i.e. a cut-off value, is applied to the upper part of this 3D cost surface. Accordingly, a threshold on the maximum likelihood cost implies an elliptical boundary in the two-dimensional space, as shown in Figure 2. Items that lie outside this ellipse are therefore considered outlying items and are retrieved as items with incompatible power consumption.
Figure 2. The retrieval system and elliptical threshold based on Maximum Likelihood cost.
As shown in Figure 2, empty circles represent compatible items, which are within the power specifications, whereas filled circles represent incompatible items, which are out of the power specifications. True positives are the incompatible items detected through the threshold applied to the maximum likelihood cost function. Only a few compatible items, i.e. false alarms, are located outside the threshold ellipse, while the majority of the outlying items are incompatible. The existence of compatible items outside the threshold ellipse decreases the precision of the retrieval system.
Moreover, there exist some incompatible items remaining inside of the threshold
ellipse. These items cannot be retrieved easily without retrieving a lot of irrelevant items.
The existence of these incompatible items inside of the ellipse decreases the recall of the
retrieval system.
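The elliptical-threshold retrieval can be sketched as follows. For each item, the Maximum Likelihood cost around the sample mean is computed with the closed-form inverse of a 2 × 2 covariance matrix; items whose cost exceeds the threshold (i.e. that lie outside the ellipse) are flagged, and true positives, false alarms and missed incompatible items are counted. The readings, labels and threshold values are hypothetical, and the sample covariance with an n − 1 divisor is an assumption.

```python
def ml_costs(points):
    """Maximum Likelihood cost x C^{-1} x^T of each 2-D point around the sample mean."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    costs = []
    for a, b in points:
        dx, dy = a - mx, b - my
        # C^{-1} = [[syy, -sxy], [-sxy, sxx]] / det for a 2x2 covariance matrix
        costs.append((syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det)
    return costs

def retrieve(costs, labels, threshold):
    """Flag items outside the ellipse {cost <= threshold}; label 1 = incompatible."""
    flagged = [i for i, c in enumerate(costs) if c > threshold]
    true_positives = sum(1 for i in flagged if labels[i] == 1)
    false_alarms = len(flagged) - true_positives
    missed = sum(labels) - true_positives
    return flagged, true_positives, false_alarms, missed

# Hypothetical (camera A, camera B) readings and compliance labels
points = [(40.1, 55.2), (41.0, 56.0), (39.5, 54.1), (40.7, 55.9), (42.2, 57.3),
          (40.4, 55.0), (41.5, 56.6), (39.9, 54.8), (45.0, 52.0), (36.0, 60.0)]
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
costs = ml_costs(points)
for threshold in (6.0, 3.0, 1.0):
    flagged, tp, fa, missed = retrieve(costs, labels, threshold)
    print(threshold, flagged, "TP:", tp, "false alarms:", fa, "missed:", missed)
```

Lowering the threshold shrinks the ellipse, so recall can only rise while more compatible items may be flagged as false alarms.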
The retrieval process by means of Maximum Likelihood cost is also depicted in the fol-
lowing video:
Video 1. The retrieval process by means of Maximum Likelihood cost

https://youtu.be/ujohDoS8tGI
3.2. Logistic regression-based retrieval

In Section 2.5, the mathematical background of logistic regression has been described. This
section will explain how logistic regression is utilised as a quality loss function. It has been
mentioned that in logistic regression, while the training data used in model fitting includes a
binary output variable, the estimated logistic probability p̂(x) is a continuous function of x.
Therefore, the logistic regression model can be viewed as a 3D surface to which thresholds
are applied for the retrieval systems.
As a three-dimensional surface plot, Figure 3(a) shows the logistic regression model trained on the refrigerator data. As can be seen, the model yields values close to 1 in the region where the incompatible items are located and values close to 0 in the region dominated by compatible items. As mentioned in Section 2.5, the transition from 1 to 0 is a smooth S-shaped transition along the surface. Therefore, this logistic probability can be used as a cost surface to create a successful retrieval system for incompatible items. As the threshold decreases from 1 to 0, more incompatible items can be retrieved while precision is reduced, as shown in Figure 3(b). Video 2 visually demonstrates the use of logistic regression for the refrigerator data in a more tangible manner.
Figure 3. The logistic regression model for the refrigerator data: (a) 3D surface of logistic model, (b) the retrieval system by means of logistic regression.
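A logistic regression cost surface and its threshold sweep can be sketched without any statistics package by fitting the coefficients with plain batch gradient ascent on the log-likelihood. This fitting loop is a swapped-in, simple stand-in for the maximum likelihood routines of statistical software, not the procedure used in the study, and the standardised readings, labels, learning rate and number of epochs are all hypothetical.

```python
import math

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict(b, x):
    """Estimated logistic probability p(x) for coefficients b = [b0, b1, ..., bN]."""
    return sigmoid(b[0] + sum(bj * xj for bj, xj in zip(b[1:], x)))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Batch gradient ascent on the mean log-likelihood of the logistic model."""
    b = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(b)
        for xi, yi in zip(X, y):
            err = yi - predict(b, xi)  # d(log-likelihood) / d(linear predictor)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        b = [bj + lr * g / len(X) for bj, g in zip(b, grad)]
    return b

# Hypothetical standardised readings of two thermal cameras; 1 = incompatible item
X = [(-1.0, -0.8), (-0.5, -0.2), (0.0, 0.1), (0.3, -0.1),
     (0.8, 1.2), (1.5, 1.0), (1.2, 1.6), (2.0, 1.8)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
b = fit_logistic(X, y)

# Lowering the threshold from 1 towards 0 retrieves more items
for threshold in (0.9, 0.5, 0.1):
    retrieved = [i for i, x in enumerate(X) if predict(b, x) >= threshold]
    print(threshold, retrieved)
```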
Video 2. The retrieval process by means of Logistic Regression

https://youtu.be/uXHfy3L9xnw
3.3. Artificial Neural Network cost-based retrieval

A more advanced retrieval scheme can be modelled with ANNs. Before applying ANNs, the data are converted to polar coordinates around the mean of the data to make the data more interpretable for the ANN. The polar coordinates consist of the radial and angular coordinates (i.e. r and θ), as shown in Figure 4. The radial coordinate, or radius, r is the distance from the reference point (in our case, the mean of the data), and the angular coordinate, or polar angle, θ is the counterclockwise angle from the horizontal axis.
Figure 4. Cartesian to polar conversion around the mean of the data.
The compatible and incompatible items are now characterised by their radial distance from the origin and their angle with respect to the x-axis. The original alignment of the data is shown in Figure 5(a), and the new alignment in accordance with the polar coordinates is shown in Figure 5(b). The radial coordinate uses a log scale so that the data can be analysed in a more compact form.
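The conversion can be sketched as follows: each reading is shifted by the sample mean, its radius is taken on a log scale, and its angle is measured counterclockwise from the horizontal axis. Mapping the angle to [0, 2π) and flooring the radius at a small `eps` (to avoid log(0) for a point exactly at the mean) are assumptions of this sketch, not details given in the text.

```python
import math

def to_log_polar(points, eps=1e-9):
    """Convert 2-D points to (log r, theta) around their mean."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    converted = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)                        # radial coordinate
        theta = math.atan2(dy, dx) % (2.0 * math.pi)  # counterclockwise angle in [0, 2*pi)
        converted.append((math.log(max(r, eps)), theta))
    return converted

# Hypothetical points centred on the origin
print(to_log_polar([(1.0, 0.0), (-1.0, 0.0), (0.0, 2.0), (0.0, -2.0)]))
```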
By means of polar coordinates, the incompatible items are gathered more closely together, while the compatible items scatter over a larger area. In this way, the data are spread over the plane more homogeneously, which helps the ANN to better ‘learn’ the structure. The conversion of the data to polar coordinates is also depicted in the following video:
Video 3. Conversion of the data to polar coordinates to help ANN

https://youtu.be/VNZWm7RrNr0
The next step is to train an ANN that returns higher values for the items that do not comply with the power specifications. During the training phase, the training data are labelled as incompatible or compatible based on their power consumption rates. The input is two-dimensional: the radial coordinate (in log scale) and the angular coordinate. The output is a cost value of 1 for incompatible and 0 for compatible items. To prevent the ANN from overfitting the data, 20% of the data are reserved as a validation dataset, and training iterations stop when no further gain is obtained on the validation data. Moreover, the number of neurons in the hidden layer is set to 10, a relatively small number, again to prevent overfitting. In other words, the ANN does not learn unnecessary details of the training data.
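A network of this kind can be sketched in NumPy. The paper specifies only the 10 hidden neurons, the 0/1 cost targets and early stopping on a 20% validation split. Everything else here is an illustrative assumption, not the author's setup:

- the tanh hidden layer;
- the sigmoid output trained with cross-entropy by full-batch gradient descent;
- the learning rate and the patience of 50 epochs;
- the hypothetical function name `train_cost_ann`.

```python
import numpy as np

def train_cost_ann(X, y, hidden=10, val_frac=0.2, lr=0.3,
                   max_epochs=5000, patience=50, seed=0):
    """Fit a one-hidden-layer network mapping inputs to a 0/1 cost (sketch only)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(val_frac * len(X)))
    val, tr = idx[:n_val], idx[n_val:]              # hold out a validation split
    Xtr, ytr, Xval, yval = X[tr], y[tr], X[val], y[val]

    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0

    def forward(A):
        H = np.tanh(A @ W1 + b1)                    # hidden-layer activations
        return H, 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))

    def val_loss():
        p = np.clip(forward(Xval)[1], 1e-9, 1 - 1e-9)
        return -np.mean(yval * np.log(p) + (1 - yval) * np.log(1 - p))

    best_loss, best_params, wait = np.inf, None, 0
    for _ in range(max_epochs):
        H, p = forward(Xtr)
        d_out = (p - ytr) / len(Xtr)                # d(mean cross-entropy)/d(output logit)
        d_hid = np.outer(d_out, W2) * (1 - H ** 2)  # back-propagate through tanh
        W2 -= lr * (H.T @ d_out)
        b2 -= lr * d_out.sum()
        W1 -= lr * (Xtr.T @ d_hid)
        b1 -= lr * d_hid.sum(axis=0)
        v = val_loss()
        if v < best_loss - 1e-6:
            best_loss, wait = v, 0
            best_params = (W1.copy(), b1.copy(), W2.copy(), b2)
        else:
            wait += 1
            if wait >= patience:                    # no further gain on validation data
                break
    W1, b1, W2, b2 = best_params                    # roll back to the best epoch
    return lambda A: forward(A)[1]

# Hypothetical synthetic data: items with a low second input are "incompatible" (1).
rng = np.random.default_rng(42)
X = rng.uniform(-1.0, 1.0, (300, 2))
y = (X[:, 1] < -0.3).astype(float)
cost = train_cost_ann(X, y)
print(cost(np.array([[0.0, -0.9], [0.0, 0.9]])))
```

Keeping `hidden` small and stopping on validation loss are the two overfitting controls described above. Raising `hidden` to 100 would be the analogue of the fluctuating surface discussed next.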
Because the ANN is prevented from overfitting the data, a smooth ANN-based cost surface is created, as shown in the 3D plot in Figure 6(a). The output of the training data is binary (i.e. either 1 or 0). For the output of the trained ANN, however, the transition from compatible to incompatible states (i.e. from low to high values) is a smooth ramp rather than a sharp step. This is a natural consequence of preventing the ANN from overfitting the data. To demonstrate the opposite case, overfitting, Figure 6(b) shows an ANN surface obtained with 100 neurons in the hidden layer. As can be seen, the surface in Figure 6(a) is smooth enough to be used as a quality loss function. By contrast, the surface in Figure 6(b) contains many unnecessary fluctuations, indicating overfitting.
Figure 5. The original vs. the new alignment of the data obtained by means of polar coordinates.
The agreement between the ANN cost surface and the data is displayed in Figure 7, where the contour plot of the cost surface is superimposed on the scatter of the data.
Finally, based on this ANN-based cost surface, the items that do not comply with the power specifications can be retrieved by applying a proper threshold level. In Figure 8, this retrieval process is depicted by the boundary implied by a certain threshold level. As before, true positives in Figure 8 are incompatible items that the ANN-based cost function correctly detects as incompatible. False alarms are compatible items that fall beyond this threshold and are therefore incorrectly classified as incompatible by the ANN-based cost function.
Figure 6. The 3D plot of the Artificial Neural Network-based multivariate cost surfaces: number of neurons in the hidden layer is (a) 10 and (b) 100.
Figure 7. The superposition of the contour of the ANN cost surface and the data.
Figure 8. The retrieval system and the threshold which is based on ANN Cost.
The ANN-based retrieval process is also depicted by the following video:
Video 4. The retrieval system and the threshold which is based on ANN Cost
https://youtu.be/2hSL0I9YtKs
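The thresholding step, and the true positives and false alarms it produces, can be sketched as follows. Three choices here are assumptions made for illustration:

- the function name `retrieve`;
- the strict `>` comparison;
- the label convention (1 = incompatible, 0 = compatible).

Precision and recall use their standard definitions.

```python
def retrieve(costs, labels, threshold):
    """Flag items whose cost exceeds `threshold` and score the retrieval.

    labels: 1 for incompatible (out-of-spec) items, 0 for compatible ones.
    """
    flagged = [c > threshold for c in costs]
    true_pos = sum(1 for f, l in zip(flagged, labels) if f and l == 1)
    false_alarms = sum(1 for f, l in zip(flagged, labels) if f and l == 0)
    n_incompatible = sum(1 for l in labels if l == 1)
    # By convention, precision is 1.0 when nothing is flagged.
    precision = true_pos / (true_pos + false_alarms) if true_pos + false_alarms else 1.0
    recall = true_pos / n_incompatible if n_incompatible else 0.0
    return true_pos, false_alarms, precision, recall

# 2 true positives, 1 false alarm, precision = recall = 2/3
print(retrieve([0.9, 0.8, 0.3, 0.7, 0.1], [1, 0, 1, 1, 0], threshold=0.5))
```

Sweeping `threshold` from high to low trades precision for recall, which is what the precision-recall curves in Section 4 summarise.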
4. Experiments and results

4.1. Monte Carlo cross-validation
In this section, the precision-recall curves of the single-dimensional parabolic Taguchi loss function, the multivariate Maximum Likelihood cost, logistic regression and, finally, the ANN-based cost are compared. To provide valid results, a repeated training-test split technique (i.e. Monte Carlo cross-validation) is applied: each time, the data are randomly divided into two parts, the training data and the test data. The ANNs, logistic regression models, mean vectors and covariance matrices are all trained or estimated using only the training data. The test data are then used with the trained models to obtain a precision-recall curve. These experiments are repeated 100 times to obtain a smooth, averaged precision-recall curve. Figure 9 shows the flowchart of a single iteration of this process for ANN-based retrieval systems.
Figure 9. The flowchart of a single iteration of Monte Carlo cross-validation.

Figure 10. Precision-recall curves.
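The repeated training-test split can be sketched generically. The callables `fit` and `evaluate` are hypothetical placeholders, for example training the ANN and returning precision at fixed recall levels. The paper does not state the split ratio, so the 30% test fraction is an arbitrary assumption.

```python
import random

def monte_carlo_cv(data, fit, evaluate, n_iter=100, test_frac=0.3, seed=0):
    """Repeat random training/test splits and average the evaluation vectors.

    fit(train) -> model, built from the training part only;
    evaluate(model, test) -> list of numbers (e.g. precision at fixed recall levels).
    """
    rng = random.Random(seed)
    results = []
    for _ in range(n_iter):
        shuffled = list(data)
        rng.shuffle(shuffled)                       # a fresh random split each iteration
        n_test = max(1, int(len(shuffled) * test_frac))
        test, train = shuffled[:n_test], shuffled[n_test:]
        model = fit(train)                          # test items are never seen here
        results.append(evaluate(model, test))
    # Element-wise average over iterations gives one averaged curve.
    return [sum(vals) / len(vals) for vals in zip(*results)]

# Toy check: the "model" is the training-set size, so every result is [n_train + n_test].
print(monte_carlo_cv(list(range(10)), fit=len, evaluate=lambda m, t: [m + len(t)]))
```

Averaging element-wise over fixed recall levels is one simple way to turn 100 noisy curves into the single smooth curve per method that the text describes.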
4.2. Precision-Recall curves: the results of the experiments

To compare the performance of the retrieval systems, precision-recall curves are established. The precision-recall curves of the different methods are plotted in Figure 10. A method whose precision-recall curve encloses all the other curves is the better retrieval system. In other words, the higher the precision and recall values simultaneously, the more accurate the retrieval process of the method. As can be seen, the single-dimensional Taguchi loss functions are the least effective solutions for these data. The multivariate Maximum Likelihood cost appears to be superior to the single-dimensional Taguchi loss functions, and logistic regression outperforms the Maximum Likelihood cost. Finally, the ANN-based cost function outperforms all the other methods by producing a precision-recall curve that encloses all the others. This indicates that ANNs have a high potential to provide accurate interpretations of industrial data in Six Sigma projects.
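Building a precision-recall curve from cost values, and checking whether one curve encloses another, can be sketched as follows. Three conventions here are assumptions, not the paper's exact procedure:

- the threshold is swept over every observed cost;
- curves are compared by the common "interpolated precision" (the best precision at recall ≥ r) at fixed recall levels;
- "encloses" is read as "at least as high at every recall level".

The function names are hypothetical.

```python
def pr_curve(costs, labels):
    """Sweep the threshold over every distinct cost; return (recall, precision) pairs."""
    n_pos = sum(labels)
    points = []
    for t in sorted(set(costs), reverse=True):
        tp = sum(1 for c, l in zip(costs, labels) if c >= t and l == 1)
        fp = sum(1 for c, l in zip(costs, labels) if c >= t and l == 0)
        points.append((tp / n_pos, tp / (tp + fp)))  # at least one item is flagged
    return points

def precision_at(points, recall_levels):
    """Interpolated precision: the best precision achieved at recall >= r."""
    return [max((p for rec, p in points if rec >= r), default=0.0) for r in recall_levels]

def encloses(curve_a, curve_b):
    """True if curve A is at least as high as curve B at every recall level."""
    return all(a >= b for a, b in zip(curve_a, curve_b))

levels = [0.5, 1.0]
good = precision_at(pr_curve([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 1]), levels)
weak = precision_at(pr_curve([0.6, 0.7, 0.8, 0.9], [1, 1, 0, 1]), levels)
print(good, weak, encloses(good, weak))
```

In practice a finer grid of recall levels (e.g. 0.0 to 1.0 in steps of 0.05) would be used, so that averaged curves from different methods can be compared point by point.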
4.3. Discussion
The Maximum Likelihood cost model performs better than the single-dimensional Taguchi loss functions because it handles the retrieval problem in the multivariate sense. An additional dimension enriches the total available information, and the joint relationship between the two dimensions produces a more sophisticated retrieval system. Logistic regression demonstrates superior performance compared to the Maximum Likelihood cost. This is mainly because the incompatible items in Figure 2 do not surround the compatible items in all directions; instead, they are mostly located below the compatible items. If this were not the case (i.e. if the incompatible items surrounded the compatible majority from all directions), logistic regression, which creates only a linear boundary to separate these items, would fail. In that scenario, the Maximum Likelihood cost, which creates an elliptical boundary, would perform better than logistic regression. Nevertheless, ANNs would still be the best method because they can be easily adjusted to any type of scenario. As a result, this study demonstrates that ANNs can be used effectively on industrial data as long as Six Sigma practitioners correctly understand the main philosophy of this machine-learning technique.
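The argument about surrounding incompatible items can be checked on hypothetical synthetic data. The sketch makes four illustrative assumptions:

- compatible items are uniform in a square;
- incompatible items lie evenly on a ring around them;
- the elliptical cost is the Gaussian quadratic form (squared Mahalanobis distance) around the mean and covariance of all the data;
- the best linear boundary is found by brute-force search over directions and cut-offs, not by fitting logistic regression.

```python
import numpy as np

rng = np.random.default_rng(0)
good = rng.uniform(-1.0, 1.0, (200, 2))                 # compatible majority
phi = np.linspace(0.0, 2 * np.pi, 60, endpoint=False)
bad = 4.0 * np.c_[np.cos(phi), np.sin(phi)]             # incompatible items on a surrounding ring
P = np.vstack([good, bad])
y = np.r_[np.zeros(len(good)), np.ones(len(bad))]

# Elliptical cost: squared Mahalanobis distance to the mean of the data.
mu = P.mean(axis=0)
S_inv = np.linalg.inv(np.cov(P, rowvar=False))
D = P - mu
ellip = np.einsum("ij,jk,ik->i", D, S_inv, D)
elliptic_separates = ellip[y == 0].max() < ellip[y == 1].min()

def best_linear_accuracy(P, y, n_dirs=180):
    """Best accuracy of any half-plane rule 'flag if w . p >= t' over a grid of directions."""
    best = 0.0
    for a in np.linspace(0.0, 2 * np.pi, n_dirs, endpoint=False):
        s = P @ np.array([np.cos(a), np.sin(a)])
        for t in s:                                     # every observed score as a cut-off
            best = max(best, float(np.mean((s >= t) == (y == 1))))
    return best

linear_acc = best_linear_accuracy(P, y)
print(elliptic_separates, round(linear_acc, 3))
```

On these data the elliptical cost separates the two groups perfectly, whereas no single straight boundary comes close. This is the failure mode the discussion attributes to logistic regression.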
5. Conclusion
ANNs can be used in a wide range of applications to meet many different needs. As machine learning and data mining technology developed, several disciplines took advantage of these developments. However, while many different disciplines take advantage of advanced machine-learning tools, quality engineering does not utilise these structures effectively. Six Sigma maintains its status as the major methodological quality initiative, yet its popularity has already started to decline among industrial practitioners because of improper and inefficient use of Six Sigma tools. The Six Sigma methodology, or Six Sigma toolbox, has not been radically improved since the methodology was established. Therefore, this study argues that, in order to revive enthusiasm for Six Sigma, the Six Sigma toolbox must be replenished with new, powerful machine-learning tools.
In real practice, the relationships between inputs and outcomes are generally nonlinear as well as complex. Therefore, traditional statistical models may lead to poor inferences about real industrial data. Statistical modelling relies on a number of assumptions. For example, linear regression assumes that there is a linear relationship between the independent and dependent variables, that observations are independent of each other, and that errors are normally distributed. ANNs, by contrast, can flexibly fit the distributions that occur in industrial data without such prior assumptions. This characteristic of ANN-based learning makes it attractive for industrial applications.
ANNs can be considered effective tools because they are able to learn and model nonlinear and complex relationships. After ANNs learn from inputs and their corresponding outputs, they can infer hidden relationships between them and use these relationships to make predictions on future data sets. This characteristic makes ANNs attractive compared to traditional techniques when dealing with real data. Unlike many other prediction techniques, ANNs do not impose restrictions on the input variables, such as assumptions about how the data should be distributed. Therefore, ANNs provide great flexibility in real practice; for example, they require less formal statistical training from practitioners.
Considering these advantages, this study proposes ANNs as an effective tool for quality management implementations on industrial data sets. To show how ANNs provide superior results compared to traditional techniques, four quality loss function models are compared in terms of their ability to detect the defective items within a dataset: the one-dimensional parabolic Taguchi loss function, the multivariate Maximum Likelihood cost function, logistic regression and ANNs. It is shown that, on this dataset, ANNs outperform all the other methods because they can easily adjust themselves to any type of scenario. Therefore, ANNs are offered as an effective tool for quality management implementations, especially for Six Sigma projects.
Six Sigma training includes many statistical tools and tests, which may require practitioners to understand and apply many statistical rules in depth. Each statistical test has its own steps, which must be carefully followed to achieve a valid test. Moreover, industrial datasets can be so complicated that basic statistical tools may not uncover the true relationships within the data. On the other hand, the use of machine-learning tools such as ANNs can lead to new understanding in quality engineering. Once practitioners learn how to use ANNs, they can use this tool to solve different kinds of problems. By applying ANNs to many different problems, engineers will become proficient in using them, and as they do, new insights in quality engineering will be obtained. As new advanced tools prove successful in future Six Sigma projects, practitioners will also adopt other techniques, such as fuzzy logic systems and Adaptive Network-Based Fuzzy Inference Systems (ANFIS). Ultimately, this will bring new dimensions and understanding to the Six Sigma concept.
A video depicting the steps in this study can be found at the following link:
Video 5. Complete Description Video - Artificial Neural Networks as a Quality Loss Function for Six Sigma
https://youtu.be/y6bSp3RnULQ
Disclosure statement
No potential conflict of interest was reported by the author.
Funding
This research is funded by TUBITAK (Scientific and Technological Research Council of Turkey)
with the project number 115C079.
References
Basu, J. K., Bhattacharyya, D., & Kim, T. H. (2010). Use of artificial neural network in pattern recognition. International Journal of Software Engineering and Its Applications, 4, 23–34.
Brady, J. E., & Allen, T. T. (2006). Six sigma literature: A review and agenda for future research.
Quality and Reliability Engineering International, 22(3), 335–367.
Cerquitelli, T., Quercia, D., & Pasquale, F., Eds. (2017). Transparent data mining for Big and small
data. Cham: Springer International.
Ciampi, A., & Lechevallier, Y. (2007). Statistical models and artificial neural networks: Supervised
classification and prediction via soft trees. In Advances in statistical methods for the health
sciences (pp. 239–261). Boston: Birkhäuser.
Da Silva, I. N., Spatti, D. H., Flauzino, R. A., Liboni, L. H. B., & Dos Reis Alves, S. F. (2017).
Artificial neural network architectures and training processes. In Artificial neural networks
(pp. 21–28). Cham: Springer.
El-Midany, T. T., El-Baz, M. A., & Abd-Elwahed, M. S. (2012, January). A proposed prediction approach for manufacturing performance processes using ANNs. Proceedings of the 2012 international conference on industrial engineering and operations management (pp. 192–200). Istanbul: IEEE.
Fahey, W., & Carroll, P. (2016). Improving biopharmaceutical manufacturing yield using neural
network classification. BioProcessing Journal, 14(4), 39–50.
Hahn, G. J., Doganaksoy, N., & Hoerl, R. (2000). The evolution of six sigma. Quality Engineering, 12(3), 317–326.
Hosmer, D. W., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York, NY: John
Wiley & Sons.
Johnston, A. B., Maguire, L. P., & McGinnity, T. M. (2009). Downstream performance prediction for
a manufacturing system using neural networks and Six-sigma improvement techniques.
Robotics and Computer-Integrated Manufacturing, 25(3), 513–521.
Kislov, K. V., & Gravirov, V. V. (2018). Deep artificial neural networks as a tool for the analysis of
seismic data. Seismic Instruments, 54, 8–16.
Kuthe, A. M., & Tharakan, B. D. (2009). Application of ANN in Six Sigma DMADV and its comparison with regression analysis in view of a case study in a leading steel industry. International Journal of Six Sigma and Competitive Advantage, 5(1), 59–74.
Lolas, S., & Olatunbosun, O. A. (2008). Prediction of vehicle reliability performance using artificial
neural networks. Expert Systems with Applications, 34(4), 2360–2369.
Mahanti, R., & Antony, J. (2005). Confluence of Six Sigma, simulation and software development.
Managerial Auditing Journal, 20(7), 739–762.
Montgomery, D. C., & Woodall, W. H. (2008). An overview of Six Sigma. International Statistical
Review, 76(3), 329–346.
Nonthaleerak, P., & Hendry, L. C. (2006). Six Sigma: Literature review and key future research areas.
International Journal of Six Sigma and Competitive Advantage, 2(2), 105–161.
Pampel, F. C. (2000). Logistic regression: A primer. London: Sage.
Patterson, A., Bonissone, P., & Pavese, M. (2005). Six Sigma applied throughout the lifecycle of an
automated decision system. Quality and Reliability Engineering International, 21(3), 275–292.
Pyzdek, T. (1999). Virtual-DOE, data mining and artificial neural networks. Retrieved from http://qualityamerica.com/LSS-Knowledge-Center/designedexperiments/virtual_doe_data_mining_and_artificial_neural_networks.php (Accessed February, 2017)
Pyzdek, T. (2003). The Six Sigma handbook. New York, NY: McGraw-Hill.
Pyzdek, T. (2009). The Six Sigma management paradox. Retrieved from http://sixsigmatraining.com/leading-six-sigma/the-six-sigma-management-paradox.html
Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
Sen, P. (2015). Application of ANN in Six Sigma for CO modelling and energy efficiency of blast
furnace: A case study of an Indian pig iron manufacturing organisation. International
Journal of Six Sigma and Competitive Advantage, 9(2–4), 109–125.
Stundner, M., & Al-Thuwaini, J. S. (2001). How data-driven modeling methods like neural networks
can help to integrate different types of data into reservoir management. In SPE Middle East oil
show. Manama, Bahrain: Society of Petroleum Engineers.
Su, C. T., & Hsieh, K. L. (1998). Applying neural network approach to achieve robust design for
dynamic quality characteristics. International Journal of Quality and Reliability
Management, 15(5), 509–519.
Suhr, R., & Batson, R. G. (2001). Constrained multivariate loss function minimization. Quality
Engineering, 13(3), 475–483.
Taguchi, G., Chowdhury, S., & Wu, Y. (2005). Taguchi’s quality engineering handbook. Hoboken,
NJ: John Wiley and Sons.
Tosun, E., Aydin, K., & Bilgili, M. (2016). Comparison of linear regression and artificial neural
network model of a diesel engine fueled with biodiesel-alcohol mixtures. Alexandria
Engineering Journal, 55, 3081–3089.
Tu, J. V. (1996). Advantages and disadvantages of using artificial neural networks versus logistic
regression for predicting medical outcomes. Journal of Clinical Epidemiology, 49, 1225–1231.
Uluskan, M. (2017). Analysis of lean Six Sigma tools from a multidimensional perspective. Total
Quality Management & Business Excellence, 1–22.
Uluskan, M., & Erginel, N. (2017). Six Sigma experience as a stochastic process. Quality
Engineering, 29(2), 291–310.
Wu, J., Wang, Y., Zhang, Q., & Huang, P. (2011, December). Improve burnishing formation yield applying design for Six Sigma. In Industrial engineering and engineering management (IEEM), 2011 IEEE international conference (pp. 804–808). Singapore: IEEE.
Yang, X. (2009). Artificial neural networks. In Handbook of research on geoinformatics (pp. 122–
128). Hershey, PA: IGI Global.

© 2018 Informa UK Limited, trading as Taylor & Francis Group
