Machine Learning and Data Mining in Manufacturing

Journal Pre-proofs
Review
Alican Dogan, Derya Birant
PII: S0957-4174(20)30823-X
DOI: https://doi.org/10.1016/j.eswa.2020.114060
Reference: ESWA 114060
To appear in: Expert Systems with Applications
Received Date: 3 February 2019

Revised Date: 2 September 2020
Accepted Date: 24 September 2020
Please cite this article as: Dogan, A., Birant, D., Machine Learning and Data Mining in Manufacturing, Expert
Systems with Applications (2020), doi: https://doi.org/10.1016/j.eswa.2020.114060
This is a PDF file of an article that has undergone enhancements after acceptance, such as the addition of a cover
page and metadata, and formatting for readability, but it is not yet the definitive version of record. This version
will undergo additional copyediting, typesetting and review before it is published in its final form, but we are
providing this version to give early visibility of the article. Please note that, during the production process, errors
may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
© 2020 Published by Elsevier Ltd.

Title Page
Author #1
Name: Alican DOGAN
Affiliation: Dokuz Eylul University, The Graduate School of Natural and Applied Sciences,
Izmir, TURKEY
Email: alican.dogan@deu.edu.tr
Author #2
Name: Derya BIRANT
Affiliation: Dokuz Eylul University, Department of Computer Engineering, Izmir, TURKEY
Email: derya@cs.deu.edu.tr
Machine Learning and Data Mining in Manufacturing
Alican Dogan (a), Derya Birant (b, 1)
a Dokuz Eylul University, The Graduate School of Natural and Applied Sciences, Izmir, TURKEY
b Dokuz Eylul University, Department of Computer Engineering, Izmir, TURKEY
Abstract - Manufacturing organizations need a variety of techniques and tools to fulfill their core
goals. Machine learning (ML) and data mining (DM) techniques and tools can be very helpful in dealing
with the challenges of manufacturing. This paper therefore presents a comprehensive literature review
that gives an overview of how machine learning techniques can be applied to realize manufacturing
mechanisms with intelligent actions. It also points to several significant research questions that
remain unanswered in the recent literature with the same aim. Our survey aims to provide researchers
with a solid understanding of the main approaches and algorithms used to improve manufacturing
processes over the past two decades. It presents previous ML studies and recent advances in
manufacturing by grouping them under four main subjects: scheduling, monitoring, quality, and failure.
It discusses existing solutions in manufacturing according to various aspects, including tasks (e.g.,
clustering, classification, regression), algorithms (e.g., support vector machine, neural network),
learning types (e.g., ensemble learning, deep learning), and performance metrics (e.g., accuracy, mean
absolute error). The main steps of the knowledge discovery in databases (KDD) process to be followed in
manufacturing applications are explained in detail, and statistics about the current state of the field
are given from different perspectives. Finally, the paper explains the advantages of using machine
learning techniques in manufacturing, describes ways to overcome certain challenges, and offers some
possible directions for further research.
Keywords - Machine learning, data mining, manufacturing, classification, clustering.
1. INTRODUCTION
Machine learning (ML) is an important research field of artificial intelligence that enables computers to
build models from experience and to predict future events accurately. The major ML approaches can be
classified into two main categories: supervised learning (Parvin, Alinejad-Rokny, Minaei-Bidgoli, &
Parvin, 2013) and unsupervised learning (Minaei-Bidgoli, Parvin, Alinejad-Rokny, Alizadeh, & Punch,
2014). A typical problem in supervised learning is classification (Rokach, 2010), while a typical problem
in unsupervised learning is clustering (Ahmadinia, Meybodi, Esnaashari, & Alinejad-Rokny, 2013).
Commonly used techniques for classification include neural networks, support vector machines,
and decision trees (Parvin, MirnabiBaboli, & Alinejad-Rokny, 2015), and the most widely used clustering
technique is k-means (Ahmad & Dey, 2007). ML techniques have been widely and successfully applied in
many different fields such as health, education (Seyedaghaee, Rahati, Alinejad-Rokny, & Rouhi, 2013),
wireless sensor networks (Ahmadinia, Alinejad-Rokny, & Ahangarikiasari, 2014), and finance. This paper
provides an overview of the use of ML techniques in manufacturing.
Modern manufacturing plants use powerful data acquisition systems to electronically collect and transfer
data from almost all the processes of the organization. Many manufacturing variables are continuously
measured at various stages, and their values are stored in the organizations' databases. This data may be
related to the characteristics of products, machines, the production line (e.g., which machine has been
used with which setup parameters), the human resources that operate the production line (e.g., the
experience level of the worker, shift type), the raw materials used in the process, the environment
(humidity, temperature, etc.), sensors attached to the machines (vibration, force, pressure, tension,
etc.), machine failures and maintenance, product quality, and other significant manufacturing factors.
1 Corresponding author.
E-mail addresses: alican.dogan@deu.edu.tr (A. Dogan), derya@cs.deu.edu.tr (D. Birant).
As a result of developments in technology, enormous amounts of raw data are generated every day in the
manufacturing industry. The availability and growing volume of this data have drawn attention to
machine learning. Machine learning (ML) and data mining (DM) applications emerged in this sector nearly
two decades ago to solve manufacturing problems. Examples that use ML methods to perform manufacturing
tasks include intelligent systems that support effective decision-making (Cheng et al., 2018;
Kujawinska, Rogalewicz, Muchowski, & Stankowska, 2018), programs that schedule simultaneous production
lines (Priore, Ponte, Puente, & Gómez, 2018), and arrangements for the maintenance of machines (Gandhi,
Schmidt, & Ng, 2018; Zhang, Ren, Liu, & Si, 2017). Other examples are failure prediction (Nedelkoski &
Stojanovski, 2017; Pavlyshenko, 2016), estimation of the energy consumption of machines (Cupek,
Ziebinski, Zonenberg, & Drewniak, 2018), product quality assessment (Rostami, 2015), and defect
detection (Huang, Pan, Lin, & Guo, 2018; Wang, 2013) in manufacturing.
The machine learning field, which includes deep learning, ensemble learning (Parvin, Alinejad-Rokny, &
Parvin, 2013; Parvin, Minaei-Bidgoli, Alinejad-Rokny, & Punch, 2013), and linkage learning (Parvin,
Helmi, Minaei-Bidgoli, Alinejad-Rokny, & Shirgahi, 2011), has been considered one of the most promising
developments for the manufacturing domain. Its applications in manufacturing are varied, ranging from
automobile manufacturing (Syafrudin, Alfian, Fitriyani, & Rhee, 2018) to the garment industry (Lee,
Choy, Ho, Chin, Law, & Tse, 2013), and from the semiconductor industry (Lingitz et al., 2018) to many
other fields of science and engineering. A great deal of research in machine learning has focused on
classification, which is the task of assigning an object to one of a set of predefined categories. On
the other hand, some manufacturing problems fall under the category of clustering, which is the task of
partitioning objects into groups, called clusters, according to their similarities.
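To make the clustering task concrete, the following minimal sketch partitions objects into groups with k-means, the clustering technique named in the opening paragraph. It is an illustration under stated assumptions, not a method taken from any cited study: the sensor readings, the function name `k_means`, the choice of two clusters, and the hand-picked initial centroids are all hypothetical.

```python
import math

def k_means(points, initial_centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update until stable."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (vibration, temperature) readings from two operating regimes.
readings = [(0.2, 40.0), (0.3, 42.0), (0.25, 41.0),
            (1.1, 70.0), (1.0, 72.0), (1.2, 71.0)]
labels, centroids = k_means(readings, initial_centroids=[readings[0], readings[3]])
print(labels)
print(centroids)
```

No labels are supplied to the algorithm; the groups emerge only from the similarity (here, Euclidean distance) between readings, which is what distinguishes clustering from classification. In practice the features would usually be rescaled first so that no single variable dominates the distance.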
Recently, several reviews concerning ML and DM in the manufacturing industry have appeared. However,
some of them focus on only one field of manufacturing, such as the electronics industry (Lv, Kim, Zheng,
& Jin, 2018), additive manufacturing (Alabi, Nixon, & Botef, 2018), or semiconductor manufacturing
(Stanisavljevic & Spitzer, 2016). Some focus on only one subject, such as quality assessment (Rostami,
2015; Köksal, Batmaz, & Testik, 2011); some (Wang, 2007) include only data mining studies, not
machine learning ones; some (Harding, Shahbaz, Srinivas, & Kusiak, 2005; Wang, Tong, & Eynard, 2007)
are not comprehensive; and some (Pham & Afify, 2005; Choudhary, Harding, & Tiwari, 2009) are dated.
In contrast to these previous studies, this article provides a systematic review of both machine
learning and data mining in manufacturing, describes the status quo in research, gives a comprehensive
list of available studies in the field, clearly states the advantages and challenges specific to the
manufacturing area, and opens new perspectives for future applications.
The novelties and main contributions of this review paper are sixfold.
• First, it comprehensively discusses existing solutions in manufacturing according to various aspects,
including tasks (clustering, classification, association rule mining, etc.), algorithms (k-nearest
neighbor, neural network, etc.), and performance metrics (accuracy, mean absolute error, etc.).
• Second, it points to some significant research questions that are either unanswered in the recent
literature as a whole or whose answers have changed with technological developments in the field.
• Third, to describe the current situation more clearly, ML and DM studies in manufacturing are
categorized into distinct groups according to the learning types they apply (supervised, unsupervised).
• Fourth, some statistics on the studies published between 2000 and 2019 are given from different
perspectives: according to their subjects (scheduling, monitoring, quality, failure) and
according to their learning types (ensemble learning, deep learning).
• Fifth, it presents the advantages of using ML techniques in manufacturing, as well as the challenges
and the ways to overcome them.
• Last, it indicates some promising directions for further research that can help to reveal possible ML
applications in manufacturing in the future.
The following research questions (RQ) are addressed in this paper:

RQ1. What kind of manufacturing problems are solved using ML and DM techniques?
RQ2. Which machine learning methods are commonly used to handle manufacturing tasks?
RQ3. What are the free and/or open-source data processing technologies (tools, engines, libraries, and
frameworks) used in manufacturing in the last few years?
RQ4. Which manufacturing datasets are popular and used as a benchmark?
RQ5. What are the main steps to be followed for developing effective ML-based manufacturing
applications?
RQ6. Which performance metrics are generally used to evaluate ML models constructed in manufacturing
studies?
RQ7. How can machine learning techniques be improved to deal with data-driven manufacturing problems?
RQ8. What are the advantages of using ML techniques in manufacturing?
RQ9. What are the main challenges of ML that are faced in manufacturing?
RQ10. What are the possible research directions for ML studies in manufacturing in the near future?
The rest of this paper is organized as follows. Section 2 describes the relationship between the
different disciplines that contribute to manufacturing studies and reports the number of studies
published in the last two decades. Manufacturing studies are also categorized by their objectives in
this section. In addition, the variables used in ML studies in the manufacturing field are presented,
common data processing tools, libraries, and engines are introduced, and widely used benchmark datasets
related to manufacturing are given. Section 3 explains the consecutive critical steps that should be
followed to discover useful knowledge in a manufacturing environment. It gives the order of the
necessary operations in this process and explains how data flows among them. In section 4,
manufacturing applications are divided into subgroups, including supervised, unsupervised, association
rule mining, ensemble, and deep learning-based studies. Studies published in highly reputed journals
are listed, and the specific methods they used, their categories, and their purposes are stated.
Section 5 discusses the benefits of ML and DM approaches in the manufacturing sector, while section 6
highlights some difficulties encountered in this field. Finally, section 7 suggests possible scientific
and practical directions for improving the efficiency of manufacturing in the future.
2. OVERVIEW
Machine learning is generally more effective than traditional mathematical and statistical models in
manufacturing, since the latter often fail to capture complex relations among the features of data
samples and to predict unknown feature values for a new sample. For this reason, ML techniques, which
are applied in a wide range of scientific disciplines, have also been used in the manufacturing field
in recent years.
The use of ML and DM techniques is well established in manufacturing because intelligently analyzed data
is a valuable resource: it yields new insights and can provide a significant competitive advantage.
While experts in one discipline (manufacturing or machine learning) are relatively easy to find, very
few researchers have strong expertise in both fields. Therefore, ML in the manufacturing field is mostly
addressed by data scientists and manufacturers working together. ML techniques are applied to
manufacturing data that are collected by a data technology system. The important features and
structures are determined by data analytics; implicit knowledge, rules, and patterns in the data are
discovered by data mining; and effective models are constructed by machine learning to learn the
behavior of a manufacturing system. Data analytics has played an important role in decision-making
(decision support) in the manufacturing industry (Ge, Song, Ding, & Huang, 2017). The most common
manufacturing tasks for which ML and DM techniques are used are scheduling, monitoring, quality
assessment, and failure detection. Other manufacturing tasks that benefit from ML capabilities are
layout planning (Ishizuka, Izui, Yamada, & Nishiwaki, 2016), sales forecasting (Packianather, Davies,
Harraden, Soman, & White, 2017), and process mining (Pospisil, Bartik, & Hruska, 2016). ML techniques
have also been used for many other manufacturing tasks such as product design (Tootooni et al., 2017),
time/cost prediction (Meidan, Lerner, Rabinowitz, & Hassoun, 2011), yield prediction, anomaly
detection (Rivetti, Busnel, & Gal, 2017; Susto, Terzi, & Beghi, 2017), and the prediction of the energy
consumption of machines (Cupek, Ziebinski, Zonenberg, & Drewniak, 2018).
Figure 1 shows the number of publications that used ML and DM techniques in manufacturing between 2000
and 2019. As can be seen from the figure, the number of studies continues to increase, which shows the
popularity of the topic. The large and growing volume of manufacturing data will probably make ML and
DM even more essential in the coming years. The following keywords were searched in three sources
(Scopus, Google Scholar, and Web of Science) to obtain the statistical results given in Figure 1:
("manufacturing") AND ("machine learning" OR "data mining" OR "supervised learning" OR
"unsupervised learning" OR "classification" OR ("regression" AND "prediction") OR "clustering" OR
"ensemble learning" OR "deep learning" OR "decision trees" OR "neural networks" OR
"support vector machines" OR "random forest" OR "k-nearest neighbors" OR "naive Bayes" OR
"convolutional neural network" OR "association rule mining" OR "sequential pattern mining" OR
"text mining" OR "web mining")
[Figure 1 image: three panels titled "ML and DM in Manufacturing": (a) Scopus, (b) Google Scholar, and
(c) Web of Science. x-axis: year, 2000 to 2019; y-axis: number of publications (axis scales up to 2400,
250, and 1200, respectively).]
Figure 1. The number of publications taken from (a) Scopus, (b) Google Scholar, and (c) Web of Science related to
ML and DM in the manufacturing field by year.
ML and DM studies in manufacturing can be grouped under four main subjects [RQ1]:
• Scheduling, including order processing, shop scheduling (Jong, Rubrico, Adachi, Nakamura, & Ota,
2017), sequencing (Ismail, Othman, & Abu Bakar, 2012), resource allocation, job scheduling
(Bergmann, Feldkamp, & Strassburger, 2017), and planning in manufacturing (Waschneck et al., 2018).
• Monitoring, including decision support systems (DSS) and process monitoring (Syafrudin, Alfian,
Fitriyani, & Rhee, 2018; Qu, Li, & Chen, 2017) to avoid key performance indicator (KPI) value
deviations and to increase the visibility of manufacturing systems.
• Quality, including quality prediction of products (Bai, Li, Sun, & Chen, 2018; Mohammadi & Wang,
2016; Arif, Suryana, & Hussin, 2013), quality improvement in a large and complex process
(Kamsu-Foguem, Rigal, & Mauget, 2013), quality monitoring/control/diagnosis, and defect detection in
manufacturing (Das, Pal, & Bag, 2017).
• Failure, including the detection of abnormal situations (faults) (Mangal & Kumar, 2016; Nakata,
Orihara, Mizuoka, & Takagi, 2017), machine maintenance (Djelloul, Sari, & Sidibe, 2018), failure
prediction (Lim, Kim, & Kim, 2017), equipment monitoring (Zhao et al., 2019), equipment downtime,
and systematic analysis of information about equipment failure.
Figure 2 shows the distribution of ML- and DM-based manufacturing studies according to their specific
objectives. These categories are scheduling, monitoring, quality, and failure (fault) detection. The
goals of manufacturing projects may differ from company to company, and they are not limited to these
categories. However, most manufacturing studies that use machine learning aim to solve these types of
problems. As can be seen in Figure 2, the number of machine learning studies has increased continuously
for all manufacturing tasks, and this growth has accelerated in the last three years. The trend may be
driven by the incentives offered by governments and multinational production companies and by the
popularity of Industry 4.0. Figure 2 also reveals that studies aiming to improve quality in
manufacturing are significantly more numerous than the others. The following terms were added
separately to the aforementioned keywords, and the search queries were then executed in Scopus and Web
of Science to obtain the results given in Figure 2:
• AND ("quality control" OR "quality prediction" OR "quality assurance" OR "quality management" OR
"defect detection" OR "defect prediction")
• AND ("fault diagnosis" OR "fault detection" OR "fault prediction" OR "fault classification" OR
"failure analysis")
• AND ("process monitoring" OR "condition monitoring" OR "monitoring system")
• AND ("scheduling")
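The way the keyword query and the task-specific terms above combine can be sketched as plain string composition. This is only an illustration: the helper names (`quoted`, `any_of`, `task_query`) and the dictionary keys are hypothetical, and each database has its own field and operator syntax, so the strings would need adapting before being run in Scopus or Web of Science.

```python
def quoted(term):
    """Wrap a search phrase in double quotes for exact-phrase matching."""
    return f'"{term}"'

def any_of(clauses):
    """Join clauses with OR inside one pair of parentheses."""
    return "(" + " OR ".join(clauses) + ")"

# Method-related clauses, in the order given in the text.
METHOD_CLAUSES = (
    [quoted(t) for t in ("machine learning", "data mining", "supervised learning",
                         "unsupervised learning", "classification")]
    + ['("regression" AND "prediction")']
    + [quoted(t) for t in ("clustering", "ensemble learning", "deep learning",
                           "decision trees", "neural networks",
                           "support vector machines", "random forest",
                           "k-nearest neighbors", "naive Bayes",
                           "convolutional neural network", "association rule mining",
                           "sequential pattern mining", "text mining", "web mining")]
)

# Task-specific terms; the keys are labels chosen here for the four subjects.
TASK_TERMS = {
    "quality": ["quality control", "quality prediction", "quality assurance",
                "quality management", "defect detection", "defect prediction"],
    "failure": ["fault diagnosis", "fault detection", "fault prediction",
                "fault classification", "failure analysis"],
    "monitoring": ["process monitoring", "condition monitoring", "monitoring system"],
    "scheduling": ["scheduling"],
}

BASE_QUERY = '("manufacturing") AND ' + any_of(METHOD_CLAUSES)

def task_query(task):
    """Base query narrowed to one manufacturing subject."""
    return BASE_QUERY + " AND " + any_of(quoted(t) for t in TASK_TERMS[task])

for task in TASK_TERMS:
    print(task, "->", task_query(task))
```

Keeping the terms in data structures rather than in one long string makes it easy to rerun the same search per subject and per year when reproducing counts of this kind.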
[Figure 2 image: two panels titled "ML and DM Studies Grouped Under Manufacturing Tasks": (a) Scopus and
(b) Web of Science. x-axis: year, 2000 to 2019; y-axis: number of publications; series: Scheduling,
Monitoring, Failure, Quality.]
Figure 2. The number of publications taken from (a) Scopus and (b) Web of Science according to their objectives
between the years 2000 and 2019.
Weichert et al. (2019) categorized the manufacturing data used in ML as follows: qualitative vs.
quantitative data, time series vs. workpiece-related data, controllable vs. uncontrollable data, present
vs. historical data, measured vs. simulated data, and observable quantities vs. process state variables.
Wang (2007) also classified the manufacturing variables often used in ML into resource variables,
machining variables, and working condition variables. Here, we provide a more extended list in which
the manufacturing variables often used in machine learning are categorized by type as follows:
• Product variables, including the characteristics of products such as color, size, and shape
• Machine variables, including the properties of the machines such as speed, force, vibration, pressure,
temperature, voltage, lubricants, coolants, energy, and maintenance
• Manufacturing process variables such as forging, casting, cleaning, packing, extrusion, stamping,
assembling, and current signals
• Raw material variables, including the characteristics of the materials used in the process, such as
density, particle size, chemical composition, layer thickness, surface mapping characteristics,
conductivity, and thickness
• Environment variables such as humidity, moisture, and temperature
• Operator variables such as gender, age, experience level, team, mental condition, and health
• Production line variables such as machine setup parameters, shift, staff size, breakdowns, production
rate, and accidents
• Scheduling variables such as load demand and the priority of orders
• Quality control variables such as purity, surface condition, appearance, concentration, and impurity
levels
• Service variables such as service name, service provider, service price, and executive status
• Supply chain variables such as lead time, manufacturing capacity, labor rate, inventory levels, backlog
levels, delivery status, and supporters
• Target variables such as yield, quality, performance index, and productivity.
These variables are mainly collected by information systems such as supervisory control and data
acquisition (SCADA), computer-aided design/computer-aided manufacturing (CAD/CAM), programmable logic
controller (PLC), enterprise resource planning (ERP), customer relationship management (CRM), product
data management (PDM), and product lifecycle management (PLM) systems.
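One way to carry these variable categories into an ML workflow is to tag each attribute of a training sample with its category, so that input variables and target variables can be separated automatically. The sketch below is illustrative only: the class `ProductionRecord`, its field names, units, and values are hypothetical, and only the category labels come from the list above.

```python
from dataclasses import dataclass, field, fields

def var(category):
    """Declare a required field tagged with its variable category."""
    return field(metadata={"category": category})

@dataclass
class ProductionRecord:
    # Field names and units are hypothetical examples of each category.
    product_size_mm: float = var("product")
    machine_speed_rpm: float = var("machine")
    machine_vibration_mm_s: float = var("machine")
    material_density_g_cm3: float = var("raw material")
    humidity_pct: float = var("environment")
    operator_experience_years: int = var("operator")
    shift: str = var("production line")
    order_priority: int = var("scheduling")
    yield_pct: float = var("target")

def columns_by_category(record_type):
    """Group field names by the category stored in their metadata."""
    grouped = {}
    for f in fields(record_type):
        grouped.setdefault(f.metadata["category"], []).append(f.name)
    return grouped

def split_inputs_and_targets(record):
    """Separate target variables from all other (input) variables."""
    inputs, targets = {}, {}
    for f in fields(record):
        bucket = targets if f.metadata["category"] == "target" else inputs
        bucket[f.name] = getattr(record, f.name)
    return inputs, targets

record = ProductionRecord(
    product_size_mm=12.5, machine_speed_rpm=1500.0, machine_vibration_mm_s=0.8,
    material_density_g_cm3=7.85, humidity_pct=45.0, operator_experience_years=6,
    shift="night", order_priority=2, yield_pct=96.4,
)
inputs, targets = split_inputs_and_targets(record)
print(columns_by_category(ProductionRecord))
print(targets)
```

Storing the category as field metadata keeps the record itself flat, which is convenient for learners that expect one row of attributes per sample.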
Figure 3 shows the general categories of manufacturing tasks according to their goals (see footnote 2).
The publications that carried out studies to fulfill these objectives are cited under each category.
Although each manufacturing task is relatively specific to its working conditions, the tasks can be
united under these main groups, which are covered under the titles product design, decision support,
production, process, monitoring, quality, defect, failure/fault, scheduling, layout planning, sales,
and energy.
[Figure 3 image: cited studies arranged under the task categories Sales, Production, Monitoring,
Quality, Failure / Fault, Defect, Layout planning, Energy, Process, Scheduling, Decision support, and
Product design. Studies shown in the figure, in extraction order: Packianather et al., 2017; Lingitz et
al., 2018; Zhao et al., 2019; Bustillo et al., 2018; Purnama et al., 2015; Rivetti et al., 2017; Ren et
al., 2018; Lee et al., 2018; Susto et al., 2017; Syafrudin et al., 2018; Bai et al., 2018; Ko et al.,
2017; Lei et al., 2017; Kao et al., 2017; Nakata et al., 2017; Lee et al., 2017; Mohammadi et al., 2016;
Djatnaa et al., 2015; Djelloul et al., 2018; Liukkonen et al., 2018; Lim et al., 2017; Huang et al.,
2018; Lee et al., 2017; Hu et al., 2018; Shao et al., 2017; Das et al., 2017; Kim et al., 2017; Zidek et
al., 2016; Ishizuka et al., 2016; Pavlyshenko, 2016; Wang, 2013; Lee et al., 2013; Cupek et al., 2018;
Wang et al., 2018; Kim et al., 2018; Moldovan et al., 2017; Zhang et al., 2017; Ge et al., 2017; Dolgui
et al., 2018; Zhou et al., 2017; Priore et al., 2018; Cheng et al., 2018; Tootooni et al., 2017; Sand et
al., 2016; Jong et al., 2017; Gandhi et al., 2018; Wang et al., 2007; Pospisil et al., 2016; Bergmann
et al., 2017.]
Figure 3. ML and DM studies grouped under manufacturing tasks.
A multitude of data processing tools offer modeling and predictive capabilities based on ML
techniques. The literature presents several studies that examine the implementation of data processing
tools in manufacturing (Nagorny, Lima-Monteiro, Barata, & Colombo, 2017). In manufacturing studies (at
least for academic use), the most widely used free and/or open-source data processing tools are Weka, R,
RapidMiner, KNIME, Orange, ELKI, Tanagra, MALLET, and KEEL. These tools are easy to apply in many
cases and make it convenient to adjust parameters to increase the accuracy of models. Python libraries
such as Keras, Theano, TensorFlow, Caffe, and Scikit-Learn are available to make programming ML
relatively easy. Some of the most commonly used open-source data processing engines are Hadoop, Spark,
Samza, Flink, and Storm. Machine learning techniques are generally applied to large manufacturing
datasets; for this reason, they should be able to cope with high dimensionality (datasets having more
than 20 attributes). Some examples of currently available large-scale distributed machine learning
frameworks are MLlib, Mahout, SAMOA, H2O, and MLbase. [RQ3]
In scientific studies, benchmark datasets are used to demonstrate the capability of a proposed approach
and to compare the performance of algorithms. One of the most widely used benchmark datasets related to
manufacturing is SECOM (Kim, Han, & Lee, 2016; Kerdrasop & Kerdrasop, 2011; Munirathinam &
Ramadoss, 2016), which was obtained from a semiconductor manufacturing process and thus consists of
manufacturing process variables. Another well-known manufacturing dataset, named 'Steel Plates
Faults', was used in a variety of studies (Tian, Fu, & Wu, 2015) to train machine learning algorithms
for automatic pattern recognition. It contains information about steel plate products and so mainly
includes product variables. A notable benchmark manufacturing dataset, called 'Bosch Production Line
Performance', has also been used in several studies (Pavlyshenko, 2016; Nedelkoski & Stojanovski, 2017;
Mangal & Kumar, 2016) to test the classification performance of proposed algorithms. Production line
variables are the basic features of this dataset. [RQ4]
2 Background image: http://www.aurinkapv.com/

3. KNOWLEDGE DISCOVERY PROCESS IN MANUFACTURING
The overall knowledge discovery in databases (KDD) process applied in manufacturing is given in Figure 4.
This process often includes five main steps: understanding the manufacturing domain, data preparation,
machine learning/data mining, evaluation, and presentation. [RQ5]
[Figure 4 image: flow diagram of the KDD process with the following blocks.
Data sources: production plant and industrial machines, providing quality, machine, product, CAD/CAM,
operational, and process data.
Extract, Transform, Load (ETL): data collection, data integration, data cleaning, data reduction, and
data transformation, feeding a data warehouse.
Machine learning and data mining: the supervised learning block lists classification (NN, SVM, DT, KNN,
NB), ensemble learning (RF, AdaBoost), deep learning (CNN, deep NN), and regression (SVR); the
unsupervised learning block lists clustering (K-Means, DBSCAN), association rule mining (Apriori,
FP-Growth), sequential pattern mining (GSP, SPAM, SPADE), and outlier detection (LOF).
Evaluation: accuracy, precision, recall, F-measure, ROC curve, SSE, RMSE, MAE, R².
Model presentation and interpretation.
Application purposes: process optimization, yield improvement, market competitiveness, product quality,
cost reduction, product design, etc.]
Figure 4. Overall KDD process applied in manufacturing.
The first phase can be regarded as a design phase. In this phase, the objective of the application, the
available resources, the constraints of the mining process, the success criteria of the problem, and
the costs and benefits of the application are determined. The second phase, called data preparation,
consists of data collection, integration, cleaning, reduction, and transformation. Data collection
gathers data about many different manufacturing variables such as raw materials, end products, or
machine adjustments (temperature, pressure, production settings, time scales, etc.) with the help of
sensors or external automatic recorders. Data integration combines multiple data sources. Data cleaning
improves the quality of the data by filling in missing values, handling noisy data, resolving
inconsistencies, considering imbalanced data (Parvin, Minaei-Bidgoli, & Alinejad-Rokny, 2013), and
detecting and removing outliers. Data reduction, for example through feature selection (Minaei-Bidgoli,
Asadi, & Parvin, 2011), obtains the target dataset from the original data without a significant loss of
information. Data transformation converts data into forms suitable for mining when necessary, for
example through normalization and discretization. After the data preparation phase, the dataset is
stored in a data warehouse. The third phase applies appropriate machine learning algorithms to the data
in the warehouse to extract patterns/rules or to develop a model. While supervised learning algorithms
can be used for classification and regression problems, unsupervised learning algorithms can be used
for clustering, association rule mining, sequential pattern mining, and outlier detection problems. In
the fourth phase, the constructed model is evaluated using an appropriate performance indicator. For
instance, the most popular performance metrics for regression problems are root mean square error
(RMSE), mean absolute error (MAE), and the coefficient of determination (R²). The last phase covers the
interpretation and visualization of patterns; e.g., patterns may be represented as a key performance
indicator (KPI) on a dashboard, an alarm may be raised when an anomaly is detected, or the prediction
obtained by a regression model may be presented on the screen. Usually, some of the KDD steps need to
be iterated several times until a satisfactory result is obtained. Finally, the constructed model is
incorporated into the manufacturing domain. The model should be modified further when new data become
available. For example, due to the dynamic nature of manufacturing systems, the regression model should
be updated periodically to maintain its generalization ability. The knowledge discovered through the
KDD process may support operators/managers in their decision-making or be used to improve the
manufacturing system automatically.
4. MACHINE LEARNING APPLICATIONS IN MANUFACTURING
Machine learning tasks can be categorized as supervised, unsupervised, and reinforcement learning.
Supervised and unsupervised learning techniques have already been widely used in the manufacturing
industry and together account for approximately 90-95% of all applications, whereas reinforcement
learning has been studied less extensively. For this reason, this section presents a selection of the
important research concerning supervised and unsupervised learning in manufacturing.
4.1. Supervised Learning in Manufacturing
Supervised learning aims to learn the mapping between sample input and output pairs. In its simplest form, a
supervised learning algorithm takes many input variables and predicts a single output variable. In general, the
number of examples available for learning affects the prediction capability of a supervised learner.
Supervised learning is generally performed for two separate tasks: classification and regression. The major
difference is that classification predicts discrete or nominal (categorical) values such as
low, medium, and high, while regression predicts continuous (numeric) or ordered values such
as the price of a car. A wide range of machine learning algorithms is available to serve these goals, each
with its pros and cons, such as decision trees (DT), neural networks (NN), support vector machines (SVM),
k-nearest neighbors (KNN), and naive Bayes (NB).
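The difference between the two tasks can be illustrated with a minimal sketch. It assumes a plain k-nearest-neighbors rule and a small hypothetical dataset (temperature and pressure readings, with invented quality labels and thickness values); none of these names or numbers come from the reviewed studies.

```python
from collections import Counter

def nearest(train, x, k):
    """Return the k (features, target) pairs whose features are closest to x."""
    return sorted(train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x)))[:k]

def knn_classify(train, x, k=3):
    """Classification: majority vote over the neighbours' categorical labels."""
    votes = Counter(label for _, label in nearest(train, x, k))
    return votes.most_common(1)[0][0]

def knn_regress(train, x, k=3):
    """Regression: average of the neighbours' numeric targets."""
    neighbours = nearest(train, x, k)
    return sum(target for _, target in neighbours) / len(neighbours)

# Hypothetical samples: (temperature, pressure) -> quality class / thickness
quality = [((200, 1.0), "low"), ((205, 1.1), "low"), ((250, 1.6), "high"),
           ((255, 1.5), "high"), ((252, 1.7), "high")]
thickness = [((200, 1.0), 2.0), ((205, 1.1), 2.2), ((250, 1.6), 3.0),
             ((255, 1.5), 3.2), ((252, 1.7), 3.1)]

print(knn_classify(quality, (251, 1.6)))   # a categorical label
print(knn_regress(thickness, (251, 1.6)))  # a numeric estimate
```

The same neighbours are used in both functions; only the way their targets are combined (vote versus average) distinguishes classification from regression here.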
The studies related to supervised machine learning in manufacturing are listed and compared in Table 1,
where the columns indicate the names of the authors, the year in which the work was carried out, the
objective of the study, and the task and algorithms applied to the manufacturing datasets. Table 1 shows that both
classification and regression are frequently used in manufacturing because they model
discrete and continuous values, respectively. Apart from novel task-specific algorithms proposed by
researchers, the NN (Alfaro-Cortes, Alfaro-Navarro, Gamez, & Garcia, 2020), SVM (Forero-Ramirez,
Restrepo-Giron, & Nope-Rodriguez, 2019), DT (Tootooni et al., 2017), KNN (Bergmann, Feldkamp, &
Strassburger, 2017), and NB (Munirathinam & Ramadoss, 2016) classification algorithms are widely used
in manufacturing [RQ2]. In manufacturing, some commonly used regression methods are
support vector regression (SVR) (Zhang, Kano, Tani, Mori, Ise, & Harada, 2020), NN (Ferreira, Sabbaghi,
& Huang, 2020), and random forest (RF) (Cho, Jun, Chang, & Choi, 2020). It can also be observed from the
table that failure (fault) detection and quality assessment are the manufacturing problems most often
targeted in those studies [RQ1]. While accuracy (ACC) and F-measure are commonly used as
performance measures for classification (Kim, Han, & Lee, 2016), the coefficient of determination (R²) is
generally preferred for evaluating the performance of regression models (Das, Pal, & Bag, 2017) [RQ6].
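For reference, the three measures named above can be computed as in the following sketch. It uses the standard textbook definitions; the label and value lists are hypothetical and serve only to exercise the functions.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, positive):
    """Harmonic mean of precision and recall for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Hypothetical fault-detection labels and regression targets
labels_true = ["fault", "ok", "ok", "fault", "ok", "ok"]
labels_pred = ["fault", "ok", "fault", "ok", "ok", "ok"]
values_true = [1.0, 2.0, 3.0, 4.0]
values_pred = [1.1, 1.9, 3.2, 3.8]

print(accuracy(labels_true, labels_pred))
print(f_measure(labels_true, labels_pred, positive="fault"))
print(r_squared(values_true, values_pred))
```

On this toy data the accuracy is noticeably higher than the F-measure for the "fault" class, which is one reason the F-measure is often reported alongside accuracy when faults are rare.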
Since manufacturing data are generally multisource (i.e., machine, product, operator),
heterogeneous, and noisy (Lv, Kim, Zheng, & Jin, 2018), certain data pre-processing steps were applied
in many studies, such as normalization (Cheng et al., 2018), attribute construction (Bergmann, Feldkamp, &
Strassburger, 2017), feature selection (Forero-Ramirez, Restrepo-Giron, & Nope-Rodriguez, 2019), and
eliminating missing values (Mohammadi & Wang, 2016). The quality of data mining results depends on the
quality of data (Alfaro-Cortes, Alfaro-Navarro, Gamez, & Garcia, 2020). All the preprocessing operations
are expected to deal with different kinds of problems related to manufacturing variables such as product,
process, and machine variables. Recently, the class imbalance problem has received much attention in
manufacturing communities (Ong, Choo, & Muda, 2015) since the class distribution of manufacturing data
is often imbalanced. Since imbalance may cause performance loss for ML algorithms, the synthetic minority
oversampling technique (SMOTE) has been used in several studies (Kim, Oh, Jung, & Kim, 2018; Kim,
Han, & Lee, 2017) to overcome this problem. Many manufacturing organizations store processing data as
time series and intend to improve quality control by building prediction models on large-scale
temporal data (Cho, Jun, Chang, & Choi, 2020).
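The core idea of SMOTE, creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbors, can be sketched as follows. This is a simplified illustration rather than the implementation used in the cited studies; the function name, parameters, and the hypothetical defect measurements are assumptions.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from minority-class samples (simplified SMOTE)."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority-class neighbours of x, excluding x itself
        neighbours = sorted(
            (p for p in minority if p is not x),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, x)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from x to the neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, neighbour)))
    return synthetic

# Hypothetical measurements of the rare "defective" class
minority = [(1.0, 2.0), (1.2, 2.1), (0.9, 1.8), (1.1, 2.3)]
for point in smote(minority, n_new=3):
    print(point)
```

Because every synthetic point lies on a segment between two existing minority samples, the new data stay inside the region already occupied by the minority class.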
Supervised machine learning methods have been widely used to predict and eliminate defects and
failures in the early production steps of the steel industry (Zhang, Kano, Tani, Mori, Ise, & Harada, 2020).
They have also been utilized to build effective models in additive manufacturing. In this field,
neural networks in particular help to construct models with high prediction accuracy (Ferreira, Sabbaghi, & Huang,
2020). Control procedures in manufacturing have been improved with supervised learning
techniques such as the random forest method (Alfaro-Cortes, Alfaro-Navarro, Gamez, & Garcia, 2020).
Since smart manufacturing enables the production of high-quality goods, many studies have been conducted
to create quality prediction models using machine learning methods (Cho, Jun, Chang, & Choi, 2020).
Some manufacturing raw materials and process equipment are prone to deteriorate over time, which makes
manufacturing processes riskier and more hazardous. In this respect, supervised methods have been
utilized to detect internal failures of materials in advance (Forero-Ramirez, Restrepo-Giron, &
Nope-Rodriguez, 2019).
Table 1. Comparison of supervised machine learning studies in manufacturing.
Task | Authors | Year | Algorithm √ (DT / NN / SVM / KNN / NB / Reg.) | Others | Short Explanation | Dataset | Results | Subject √ (Quality / Failure / Monitoring / Scheduling)
Regression | Zhang et al. | 2020 | √ √ | RF | Prediction and causal analysis of defects | Defect data | R = 0.86 | √
Regression | Ferreira et al. | 2020 | √ | - | Modeling deviations in the manufactured products | Product shapes | RMSE = 1.87 × 10^-3 | √
Regression | Lingitz et al. | 2018 | √ √ √ √ | RF | Lead time prediction in semiconductor manufacturing | Production data | NRMSE = 12.5 | √
Regression | Das et al. | 2017 | √ √ √ | - | Defect detection and quality modelling | Real-time torque signals | R² = 0.55 | √
Classification | Cho et al. | 2020 | √ √ | RF | Quality prediction in manufacturing process | Process data | ACC = 89.96% | √
Classification | Alfaro-Cortes et al. | 2020 | √ | RF | Manufacturing process control | Control signals | Correlation levels > 0.9 | √
Classification | Forero-Ramirez | 2019 | √ | - | Defect detection in manufacturing process | Thermal data | ACC = 98.4% | √
Classification | Bergmann et al. | 2017 | √ √ √ √ √ | - | Scheduling strategies of production jobs | PDA datasets | ACC = 99% | √
Classification | Tootooni et al. | 2017 | √ √ √ √ √ | SRC | Classifying manufactured parts | 3D point cloud data | F-score > 95% | √
Classification | Kim et al. | 2016 | √ | - | Fault detection prediction | SECOM | ACC = 0.887 | √
Classification | Munirathinam and Ramadoss | 2016 | √ √ √ √ √ √ | - | Equipment fault detection | SECOM | F-measure = 0.641 | √
Classification | Tian et al. | 2014 | √ | GS, GA, PSO | Fault diagnosis | Faults dataset of steel plates | ACC = 80.74% | √
Classification | Wang | 2013 | √ √ √ | - | Defect detection for product quality | 3D point cloud of the wheel | F-measure = 0.902 | √
Classification | Arif et al. | 2013 | √ | PCA | Quality prediction in multi-stage manufacturing | SECOM | ACC = 91.13% | √
Classification | Kim et al. | 2012 | √ √ | PCA | Faulty wafer detection | Fault detection and classification (FDC) data | TPR = 82.99%, FPR = 34.61% | √
Classification | Meidan et al. | 2011 | √ √ √ √ | - | Cycle-time prediction in semiconductor manufacturing | SEMATECH | ACC = 73.2% | √
Classification | Kerdprasop and Kerdprasop | 2011 | √ √ √ √ | - | Fault detection | SECOM | F-measure = 0.641 | √
4.2. Unsupervised Learning in Manufacturing
Unsupervised learning is one of the paradigms in machine learning that is used to identify regularities and
dependencies in unlabeled data. Clustering, association rule mining, anomaly (outlier) detection, density
estimation, and representation learning can be listed as the most popular unsupervised learning methods. In
all of these mentioned tasks, the main objective is to induce the inner data structure without the help of
explicit class labels, in a manner that produces a useful representation. Unsupervised learning studies are
fewer than supervised ones because data with class labels are very commonly available in
manufacturing.
4.2.1. Clustering and Anomaly Detection in Manufacturing
Clustering divides instances into different groups with respect to their similarities. Clusters are created
using similarity or distance measures, which determine how similar or how different the
data are from each other. Major clustering methods can be organized into five categories. Partitioning
clustering methods attempt to decompose the data into k clusters such that items in each cluster are closely
related to each other. Hierarchical clustering methods construct a tree of clusters by either repeatedly
merging smaller clusters into larger ones (agglomerative) or splitting larger clusters into smaller ones
(divisive). Density-based clustering methods try to find high-density clusters separated by sparse areas, so that
clusters can differ in size and shape (e.g., nonconvex, spherical, or elongated). The remaining two
categories are grid-based and model-based methods.
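As an illustration of the partitioning category, the following sketch implements the basic k-means loop (assign each point to its nearest centroid, then recompute the centroids). The sensor-like points and the initial centroids are hypothetical, and fixed initial centroids are used instead of random initialization to keep the example deterministic.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))

def kmeans(points, centroids, iterations=10):
    """Basic k-means: alternate between assignment and centroid update."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            idx = min(range(len(centroids)), key=lambda i: dist2(p, centroids[i]))
            clusters[idx].append(p)
        # Keep the old centroid if a cluster happens to be empty
        centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical two-dimensional process measurements
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, clusters = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```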
In the manufacturing industry, cluster analysis was used for the recognition of patterns (Liukkonen &
Hiltunen, 2018), for improving yield (Nakata, Orihara, Mizuoka, & Takagi, 2017), for quantitative
evaluation (Onel et al., 2019), and for equipment condition diagnosis (Rostami, Blue, & Yugma, 2016).
Furthermore, clustering was also used for the detection of product errors (Zidek, Maxim, Pitel, &
Hosovsky, 2016), for aiding the decision-making process (Kujawinska, Rogalewicz, Muchowski, &
Stankowska, 2018) and for layout planning problems (Ishizuka, Izui, Yamada, & Nishiwaki, 2016). In
addition, human-robot interaction plays an important role in the manufacturing industry. Necessary
operations in this field have been determined by unsupervised learning models and event-driven reactions
of robots have been developed (Wang, Jiao, Yu, Johnson, & Zhang, 2019).
As put forward by some researchers (Syafrudin, Alfian, Fitriyani, & Rhee, 2018; Sand, Kunz, Hubbert, &
Franke, 2016; Wuest, Irgens, & Thoben, 2014; Lieber, Stolpe, Konrad, Deuse, & Morik, 2013), clustering
can be used as a preprocessing step before a classification algorithm is applied to the data, which can increase
the classification performance. Similarly, before performing a supervised learning task, outliers can be detected by an
unsupervised method, since the removal of outliers can significantly increase the prediction capabilities of
supervised models. Defect detection in manufacturing data has also been realized via a clustering task
followed by a classification task (Jin, Na, Piao, Pok, & Ryu, 2019). Clustering tools have also been used to
evaluate the degree of similarity in manufacturing data and thereby support decision-making processes
(Onel et al., 2019).
Clustering results obtained from manufacturing data are evaluated using various performance metrics to
ensure the quality of clusters. Typical criteria for assessing the performance of clustering algorithms
are either internal or external. Internal criteria include the Davies-Bouldin index,
Silhouette index, Dunn index, and Calinski-Harabasz index, while external criteria include purity,
entropy, normalized mutual information, Rand index, and F-measure.
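Two of the external criteria, purity and the Rand index, are simple enough to sketch directly from their standard definitions. The cluster assignments and reference classes below are hypothetical.

```python
from collections import Counter
from itertools import combinations

def purity(clusters, classes):
    """Share of items that belong to the majority class of their cluster."""
    total = 0
    for c in set(clusters):
        members = [cls for cl, cls in zip(clusters, classes) if cl == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(clusters)

def rand_index(clusters, classes):
    """Share of item pairs on which the clustering and the reference classes agree."""
    pairs = list(combinations(range(len(clusters)), 2))
    agree = sum((clusters[i] == clusters[j]) == (classes[i] == classes[j])
                for i, j in pairs)
    return agree / len(pairs)

# Hypothetical clustering result and reference labels for six items
cluster_ids = [0, 0, 0, 1, 1, 1]
true_classes = ["a", "a", "b", "b", "b", "b"]

print(purity(cluster_ids, true_classes))
print(rand_index(cluster_ids, true_classes))
```

Both measures need reference labels, which is what makes them external; internal indices such as the Silhouette index rely only on the distances between the clustered items.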
Anomaly (outlier) detection is the recognition of items that occur rarely and are not easily noticed.
These observations or events have significantly different values or behaviours
from the general tendency of the dataset to which they belong. ML-based anomaly detection techniques are
divided into three main groups: density-based, clustering-based, and SVM-based methods. KNN and local
outlier factor (LOF) are well-known density-based methods. As a clustering-based algorithm,
DBSCAN (density-based spatial clustering of applications with noise) is generally used to detect noise points.
Lastly, SVM-based techniques such as one-class SVM are also used efficiently for anomaly detection. Data
mining enables monitoring and observation based on defined patterns in a manufacturing system. This
could provide hints about anomalies such as machine failures (Amruthnath & Gupta, 2018), abnormal
energy consumption of machines (Wang, Li, & Gan, 2018), unexpected quality outcomes (Ko, Lee, Cho,
& Cho, 2017), outliers in process values (Sand, Kunz, Hubbert, & Franke, 2016), or other events.
Problems or conditions identified by the algorithm can trigger notifications; for example, an operator and/or
a system can be notified to remedy the problem or condition. Outliers can also be represented as a key
performance indicator (KPI) on a dashboard.
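A minimal sketch of a KNN-style outlier score is given below: each point is scored by its mean distance to its k nearest neighbors, and points whose score greatly exceeds the typical score are flagged. The threshold rule (a multiple of the median score), the function names, and the readings are assumptions made for illustration; LOF and the methods in the cited studies are more elaborate.

```python
import math
from statistics import median

def knn_outlier_scores(points, k=2):
    """Mean distance of each point to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        distances = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(distances[:k]) / k)
    return scores

def flag_outliers(points, k=2, factor=3.0):
    """Indices of points whose score exceeds factor times the median score."""
    scores = knn_outlier_scores(points, k)
    cutoff = factor * median(scores)
    return [i for i, s in enumerate(scores) if s > cutoff]

# Hypothetical process readings; the last one lies far from the rest
readings = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (1.0, 1.1), (5.0, 5.0)]
print(flag_outliers(readings))
```

In a monitoring setting, the returned indices could feed the notification or dashboard mechanisms described above.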
Table 2 lists some of the important and recent clustering and anomaly detection studies with their authors,
publication years, objectives, and algorithms. As can be seen in Table 2, semiconductor manufacturing
data have been used in some studies (Nakata, Orihara, Mizuoka, & Takagi, 2017; Susto, Terzi, & Beghi,
2017; Rostami, Blue, & Yugma, 2016). Monitoring tasks are generally realized with the help
of unsupervised learning methods in order to create more efficient systems that can positively affect the
manufacturing process. The commonly used clustering algorithms in manufacturing are k-means (Wang,
Jiao, Yu, Johnson, & Zhang, 2019), hierarchical clustering (Wang, Jiao, Yu, Johnson, & Zhang, 2019),
DBSCAN (Jin, Na, Piao, Pok, & Ryu, 2019), and self-organizing map (SOM) (Liukkonen & Hiltunen,
2018) [RQ2]. Evaluation metrics include cost reduction percentage, number of
derived rules, and minimum defect coverage [RQ6].
Like supervised learning techniques, unsupervised learning techniques are often applied to
high-dimensional manufacturing data (i.e., datasets having more than 20 attributes). In order to reduce
the dimensionality of the feature space, some studies used dimensionality reduction techniques such as principal
component analysis (PCA) (Amruthnath & Gupta, 2018; Susto, Terzi, & Beghi, 2017). Dimensionality reduction not
only speeds up the algorithm but is sometimes also the best way to get good results on complex data (Wuest,
Irgens, & Thoben, 2014).
Table 2. List of clustering and anomaly detection studies applied in manufacturing.
Task | Authors | Year | Algorithm √ (K-Means / Hierarchical / DBSCAN / SOM / LOF) | Others | Short Explanation | Dataset | Subject √ (Quality / Failure / Monitoring / Scheduling)
Clustering | Jin et al. | 2019 | √ | - | Defect pattern detection | WM-811K data | √
Clustering | Onel et al. | 2019 | √ | - | Clustering of the manufacturing substance categories | Analytical chemistry data | √
Clustering | Wang et al. | 2019 | √ | - | Modeling human-robot interaction in manufacturing | Human welder data | √
Clustering | Kujawinska et al. | 2018 | √ √ | - | Cluster analysis in support of purchases of manufacturing materials | Flux-welding wire data | √
Clustering | Liukkonen and Hiltunen | 2018 | √ √ | - | Recognition of systematic spatial patterns | Silicon wafers data | √
Clustering | Cupek et al. | 2017 | √ | - | Determination of the machine energy consumption | Machine energy consumption data | √
Clustering | Nakata et al. | 2017 | √ | Fp-Growth, CNN | Monitoring system for yield enhancement in semiconductor manufacturing | Wafer map data | √
Clustering | Packianather et al. | 2017 | √ √ √ | - | Application of several data mining techniques in manufacturing | Product orders in commercial sales data | √
Clustering | Zidek et al. | 2016 | √ √ √ | - | Detection of product errors by clustering algorithms | Product error data | √
Clustering | Wuest et al. | 2014 | √ | SVM | Monitoring quality in manufacturing | Product state data | √
Anomaly Detection | Quatrini et al. | 2020 | - | RF | Anomaly detection to improve safety and maintenance activities | Granulation process data | √
Anomaly Detection | Syafrudin et al. | 2018 | √ | RF | Real-time monitoring system in automotive manufacturing | IoT-generated sensor data | √
Anomaly Detection | Wang et al. | 2018 | √ | COF | Outlier detection in a baking process to improve the energy efficiency in manufacturing | Oven data in a production line | √
Anomaly Detection | Susto et al. | 2017 | √ | ABOD, PCA | Anomaly detection approach for semiconductor manufacturing | Semiconductor manufacturing etching process | √
Anomaly Detection | Ko et al. | 2017 | √ √ | GMM, SVDD | Anomaly detection for quality management | Manufacturing, inspection, after-sales service data | √
4.2.2. Association Rule Mining in Manufacturing
Association rule mining (ARM) is a common technique in data mining that discovers relationships
among items in large datasets. Frequent patterns are extracted in the form of X→Y rules
with two measures of interestingness: support and confidence. Support represents the percentage of
transactions that contain both X and Y among all transactions in the dataset. Confidence expresses the
fraction of transactions containing X that also contain Y. In association rule mining studies, some
particular values for minimum support (MinSup) and minimum confidence (MinConf) should be
determined beforehand. Some well-known association rule mining algorithms are Apriori (Odabasi
and Yildirim, 2020), FP-Growth (Mutlu and Altuntas, 2019), and Eclat (Kamsu-Foguem, Rigal, &
Mauget, 2013).
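The two measures defined above can be computed directly from a list of transactions, as in the following sketch. The event names and the MinSup and MinConf thresholds are hypothetical and chosen only for illustration.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, x, y):
    """Fraction of transactions containing X that also contain Y."""
    return support(transactions, x | y) / support(transactions, x)

# Hypothetical process logs, one set of observed events per production run
transactions = [
    {"high_temp", "defect"},
    {"high_temp", "defect", "vibration"},
    {"high_temp"},
    {"vibration"},
    {"high_temp", "defect"},
]

x, y = {"high_temp"}, {"defect"}
rule_support = support(transactions, x | y)
rule_confidence = confidence(transactions, x, y)

# Assumed thresholds; in practice they are chosen per study (see Table 3)
MIN_SUP, MIN_CONF = 0.5, 0.7
print(rule_support, rule_confidence)
print(rule_support >= MIN_SUP and rule_confidence >= MIN_CONF)
```

Algorithms such as Apriori and FP-Growth differ mainly in how efficiently they enumerate the candidate itemsets whose support is evaluated in this way.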
Sequential pattern mining (SPM) finds frequent sequences of itemsets in a dataset in order to
identify patterns of ordered events. It generally aims to discover meaningful subsequences from a
group of sequences. Parameters such as occurrence frequency, length, and profit are
used as criteria for deciding how interesting a subsequence is. The most
popular SPM algorithms are PrefixSpan, SPADE (Sequential PAttern Discovery using Equivalence
classes) (Lim, Kim, & Kim, 2017), SPAM (Sequential PAttern Mining), and GSP
(Generalized Sequential Pattern). Apart from these traditional approaches, there are also more recently
introduced methods such as CM-SPADE (Co-occurrence Map SPADE), CM-SPAM (Co-occurrence
Map SPAM), FCloSM (frequent closed sequence mining), and FGenSM (frequent generator sequence
mining). In manufacturing applications, SPM techniques are used less frequently
than ARM techniques.
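The notion of a frequent subsequence can be made concrete with a small sketch that checks whether a pattern occurs in a sequence (in order, but not necessarily contiguously) and counts its support. For simplicity each element is a single event rather than an itemset, and the event logs are hypothetical.

```python
def contains(sequence, pattern):
    """True if the events of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `event in remaining` advances the iterator, so order is respected
    return all(event in remaining for event in pattern)

def sequence_support(sequences, pattern):
    """Fraction of sequences that contain `pattern` as a subsequence."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Hypothetical machine event logs
logs = [
    ["start", "overheat", "alarm", "stop"],
    ["start", "alarm", "overheat"],
    ["overheat", "vibration", "alarm"],
    ["start", "stop"],
]

print(sequence_support(logs, ["overheat", "alarm"]))
```

Unlike an association rule, the pattern is sensitive to order: the second log contains both events but not in the required sequence, so it does not count toward the support.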
Table 3 lists some important and recent association rule mining and sequential pattern mining studies
conducted in manufacturing. According to the table, the most preferred ARM algorithm
is Apriori (Park and Jung, 2020). In the manufacturing industry, ARM and SPM algorithms
are generally used for production scheduling (Ismail, Othman, & Abu Bakar, 2012; Dolgui et al.,
2018), text mining (Jung & Chang, 2018), machine performance analysis (Pospisil, Bartik, & Hruska,
2016), quality improvement (Lee et al., 2013; Qu, Li, & Chen, 2017; Kamsu-Foguem, Rigal, &
Mauget, 2013; Kao, Hsieh, Chen, & Lee, 2017; Li et al., 2019; Odabasi & Yildirim, 2020), process
design (Zhou, Li, Wang, & Niu, 2017), failure detection (Nakata, Orihara, Mizuoka, & Takagi, 2017;
Lim, Kim, & Kim, 2017; Ong, Choo, & Muda, 2015), and production analysis (Purnama,
Abdullah, Rokhmat, & Herawan, 2015; Djatnaa & Alitu, 2015; Mutlu & Altuntas, 2019).
Association rule mining techniques have been used effectively for many different manufacturing
operations. For example, ARM models have been used to help prevent occupational accidents in
manufacturing, since they can detect many related features that cause undesired conditions (Mutlu &
Altuntas, 2019). Decreasing pollution is one of the indispensable goals of manufacturing in order to reduce
the harmful effects of industrialization on the environment. For this purpose, association rules were
mined to assess the relationship between industrialization and air quality in China (Li, Li, An, Han,
Xu, Lu, & Crittenden, 2019). The stability of some manufacturing materials such as solar cells was
improved with ARM studies (Odabasi & Yildirim, 2020). Finally, ARM was also used to discover
deviant event patterns, since outliers and deviations are important factors in system monitoring for
manufacturing processes (Park & Jung, 2020).
Table 3. List of association rule mining studies applied in manufacturing.
Task | Authors | Year | Algorithm √ (Apriori / Fp-Growth / Eclat) | Others | Short Explanation | Dataset | Parameters | Subject √ (Quality / Failure / Monitoring / Scheduling)
Association Rule Mining | Odabasi and Yildirim | 2020 | √ | - | Analyzing the effects of cell manufacturing materials | Long term stability data | MinConf 0.36, Lift 1.71 | √
Association Rule Mining | Park and Jung | 2020 | √ | - | Deviant event pattern discovery | Manufacturing process dataset | MinSup 0.033, MinConf 0.632 | √
Association Rule Mining | Mutlu and Altuntas | 2019 | √ | - | Assessment of occupational risks in manufacturing | Occupational accident data | MinConf 0.8 | √
Association Rule Mining | Li et al. | 2019 | √ | - | Finding rules between industrialization level and quality | Quality data | MinConf 0.34 | √
Association Rule Mining | Dolgui et al. | 2018 | √ | - | Detection of factors for production scheduling | Truck assembly process data | Different levels of confidence | √
Association Rule Mining | Jung and Chang | 2018 | √ | - | Text mining based online news analysis about smart factory | Korean portal site data | MinSup 0.1, MinConf 0.7 | √
Association Rule Mining | Pospisil et al. | 2016 | √ | - | Analysis of machine performance in a manufacturing company | Product data (materials and construction types) | MinSup 0.15 | √
Association Rule Mining | Zhou et al. | 2017 | √ | - | Extracting rules for process design | Machining process data | MinConf 0.6 | √
Association Rule Mining | Qu et al. | 2017 | √ | - | Online monitoring of manufacturing process | Yarn production data | MinSup 0.06, MinConf 0.8 | √
Association Rule Mining | Ong et al. | 2015 | - | PCA-WARM | Manufacturing failure root cause analysis | Failure dataset from semiconductor manufacturing | - | √
Association Rule Mining | Purnama et al. | 2015 | √ | SLP-Growth | Production analysis | Productions of motorcycle/scooter dataset | MinSup 0.05 | √
Association Rule Mining | Djatnaa and Alitu | 2015 | √ | - | Extraction of rules related to overall equipment effectiveness (OEE) | Wooden door manufacturing industry data | MinSup 0.25 | √
Association Rule Mining | Lee et al. | 2013 | √ | - | Extraction of defect patterns in the garment industry | Garment manufacturing data (Hong-Kong) | MinSup 0.25, MinConf 0.9 | √
Association Rule Mining | Kamsu-Foguem et al. | 2013 | √ √ √ | - | Analyzing of the manufacturing process for quality improvement | Vam drilling data | MinSup 0.06, MinConf 1 | √
Association Rule Mining | Nakata et al. | 2017 | √ | - | Yield analysis to identify the cause of failure | Wafer map data | - | √
Sequential Pattern Mining | Lim et al. | 2017 | - | SPADE | Failure prediction | Samsung electronics dataset | MinSup 0.5 | √
Sequential Pattern Mining | Buddhakulsomsiri and Zakarian | 2009 | - | SPM | Finding patterns among occurrences of warranty claims over time | Automotive warranty data | P >= 0.45% | √
4.3. Ensemble Learning and Deep Learning in Manufacturing
When validated manufacturing research studies are analyzed, it is seen that
ensemble learning and deep learning are the approaches most often utilized to improve the performance of
machine learning tasks [RQ7]. Ensemble learners are a group of machine learning methods that unite a
committee of models in order to perform a classification or regression task. If the ensemble
learner is homogeneous, the committee is formed by the same algorithm with different settings, or by the
same algorithm trained on different training sets generated from the original dataset. In contrast, its
heterogeneous counterpart is composed of distinct types of classifiers. Deep learning, in turn, is
a more recent machine learning technique that processes data through many connected layers and has a
non-linear, complex structure.
Ensemble learning and deep learning are among the most widely used machine learning paradigms and have
achieved great success in a wide range of applications owing to their better generalization ability. In
this section, we mainly summarize state-of-the-art ensemble learning and deep
learning techniques in the field of manufacturing.
In recent years, there has been increasing use of ensemble learning in manufacturing due to its ability
to improve the performance of weak learners. Deep learning is used in manufacturing more frequently
than ensemble learning. In addition to their separate use, some
studies in the literature have utilized both ensemble and deep learning, either to benefit from both or to
compare their strengths and weaknesses. The number of deep learning studies in
manufacturing is growing rapidly due to their significant contributions to improving performance.
Deep learning has recently become more popular because convolutional neural network
structures have proven to be very effective in learning complicated patterns.
4.3.1. Ensemble Learning in Manufacturing
Ensemble learning is a technique that uses multiple learners to solve the same problem. Because it promises
answers to many challenges in manufacturing, ensemble learning is widely discussed by researchers.
Overall, it is agreed that, in the manufacturing industry, combining multiple models generally
provides a more accurate prediction than depending on a single model (Priore, Ponte, Puente, &
Gomez, 2018; Huang, Pan, Lin, & Guo, 2018). For example, in the study by Priore, Ponte, Puente, and
Gomez (2018), the decision tree reached 81.82% classification accuracy, while the bagged decision
tree achieved 83.36% accuracy. Researchers have also begun
using random forest (RF), an ensemble learning algorithm, instead of a single decision tree to achieve
better results (Huang, Pan, Lin, & Guo, 2018).
Various ensemble learning techniques exist; those commonly used in manufacturing are bagging,
boosting, stacking, and voting. In bagging-based ensemble classifiers, bootstrap replicates of the
training set are generated randomly and a model is constructed on each one. Because sampling is done with
replacement, data chosen to create one model can be reselected in other iterations (Gandhi, Schmidt, & Ng,
2018; Syafrudin, Alfian, Fitriyani, & Rhee, 2018; Quatrini, Costantino, Di Gravio, & Patriarca, 2020).
On the other hand, boosting methods adjust the weights of the data instances in each cycle according to
how well they were classified (Hu, Zhou, Xiang, & Feng, 2018; Bustillo, Urbikain, Perez, Pereirb, &
Lopez de Lacalle, 2018; Kim, Oh, Jung, & Kim, 2018; Gao, Chen, Zhang, Ren, Chen, & Chen, 2019).
Such methods increase the weights of the instances that the current models misclassify, so that these
instances are emphasized or more likely to be reselected in later cycles (Nedelkoski & Stojanovski, 2017;
Pavlyshenko, 2016; Pham & Afify, 2005). The most well-known bagging algorithm is random forest
(Lee, Noh, Kim, & Kang, 2018; Moldovan, Cioara, Anghel, & Salamie, 2017; Raktham & Piromsopa,
2011), while commonly used boosting algorithms are AdaBoost and Gradient Boosting. In
manufacturing, for example, the AdaBoost algorithm is used for steel-plate surface defect detection (Hu,
Zhou, Xiang, & Feng, 2018; Deng, Diao, Wu, Zhang, Ma, & Zhong, 2019), the optimization of a
friction-drilling process (Bustillo, Urbikain, Perez, Pereirb, & Lopez de Lacalle, 2018), and quality analysis
(Kim, Oh, Jung, & Kim, 2018), while Gradient Boosting is used to improve the efficiency of plant
production (Nedelkoski & Stojanovski, 2017), to manage production processes (Moldovan, Cioara,
Anghel, & Salamie, 2017), and to recognize faults (Pavlyshenko, 2016).
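The re-weighting idea behind boosting can be sketched with a small AdaBoost implementation that uses one-dimensional threshold rules (decision stumps) as weak learners. This is a textbook-style illustration under stated assumptions, not the setup of any cited study; the ten-point dataset with labels +1 and -1 is hypothetical.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` if x > threshold, otherwise -polarity."""
    return polarity if x > threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds=3):
    """Train an AdaBoost ensemble; returns the model and the final instance weights."""
    weights = [1.0 / len(xs)] * len(xs)
    model = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = max(err, 1e-10)  # guard against division by zero
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this weak learner
        model.append((alpha, threshold, polarity))
        # Misclassified instances gain weight, correctly classified ones lose weight
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model, weights

def predict(model, x):
    """Weighted vote of all weak learners."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in model)
    return 1 if score > 0 else -1

# Hypothetical data that no single threshold can separate
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model, _ = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

No single stump classifies this dataset correctly, but the weighted vote of three stumps does, which illustrates how boosting combines weak learners into a stronger one.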
The manufacturing studies that particularly benefit from ensemble learning are given in Table 4. While
some studies in this table include only boosting or bagging ensemble methods, others contain
both in order to compare their performances. As is clear from the table, almost all applications of
ensemble learning involve classification tasks. The most frequently preferred ensemble classification
algorithm is random forest. In particular, classification accuracy (ACC) (Syafrudin,
Alfian, Fitriyani, & Rhee, 2018), mean absolute percentage error (MAPE) (Gao, Chen, Zhang, Ren,
Chen, & Chen, 2019), and root mean squared error (RMSE) (Lingitz, Gallina, Ansari, Gyulai, Pfeiffer,
Sihn, & Monostori, 2018) were generally used as performance measures to
evaluate the success of ensemble learners in many manufacturing studies.
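For completeness, the two error measures mentioned above can be written out as a short sketch based on their standard definitions. The lead-time values are hypothetical.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (true values must be non-zero)."""
    return 100 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical actual and predicted lead times (e.g., in minutes)
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

print(rmse(actual, predicted))
print(mape(actual, predicted))
```

RMSE is expressed in the units of the target variable, whereas MAPE is scale-free, which makes it easier to compare across datasets.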
As a result of recent developments in IT, data acquisition systems collect an enormous amount of
manufacturing data with distinct types of variables such as operator, service, and raw material
variables (Mangal & Kumar, 2016). Since big manufacturing data requires more time and
processing power, some studies have used the Storm and Hadoop computing frameworks for
machine learning applications (Zhang, Ren, Liu, & Si, 2017). Distributed database systems (DDBS), the
Hadoop distributed file system (HDFS), and non-relational (NoSQL) data management systems have
been used to store heterogeneous big manufacturing data (Nagorny, Lima-Monteiro, Barata, &
Colombo, 2017).
Building a single prediction model for failure detection may not be sufficient for some manufacturing
tasks. For this reason, Deng et al. (2019) used an ensemble learning method, called D-CART, to
achieve fault detection, fault location, and quantitative diagnosis at the same time. High-performance
component manufacturing requires successful material removal. Boosting ensemble learning methods
such as k-fold XGBoost have proved promising in that field in the production environment (Gao,
Chen, Zhang, Ren, Chen, & Chen, 2019). The success of condition monitoring depends on the real-time
measurements of specific manufacturing stages. Quatrini et al. (2020) used a two-step ensemble
method to identify the ongoing production phase and classify the input data.
Table 4. List of ensemble learning studies conducted in manufacturing.
Authors | Year | Task / Algorithm √ (Bagging / Boosting / Stacking / Voting / RF / AdaBoost / Gradient Boosting) | Base Classifiers | Short Explanation | Dataset | Results | Subject √ (Quality / Failure / Monitoring / Scheduling)
Quatrini et al. | 2020 | √ √ | Decision Jungle | Anomaly detection and process phase classification | Granulation process data | F Score: 99.21% | √
Deng et al. | 2019 | √ √ √ | - | Online fault diagnosis for rotor systems | Fault simulation data | ACC = 96.77% | √
Gao et al. | 2019 | √ √ | XGBoost | Material removal prediction | Inconel 718 Data | MAPE = 4.373% | √
Priore et al. | 2018 | √ √ √ √ | SVM, DT, NN, CBR | Scheduling jobs in flexible manufacturing systems | Data from four machining centers | ACC = 99.08% | √
Gandhi | 2018 | √ √ | DT | Decision support in manufacturing maintenance | Ball screw and ballbar measurement data | ACC | √
Huang et al | 2018 | √ √ | KNN | Electrically failure detection | Through-silicon via data | ACC > 90% | √
Lingitz et al. | 2018 | √ √ | SVM, NN, KNN, MARS | Predicting of manufacturing lead times | Semiconductor industry data | NRMSE = 12.5 | √
Syafrudin et al. | 2018 | √ √ | NB, LR, NN | Monitoring in manufacturing | IoT-based sensor data | ACC = 100% | √
Hu et al. | 2018 | √ √ | NN | Recognition of steel-plate surface defects | Defect dataset | ACC = 88.35% | √
Bustillo et al. | 2018 | √ √ | DT, NN, ZeroR | Optimization of a friction-drilling process | Experimental data | ACC = 91.84% | √
Kim et al. | 2018 | √ √ √ | DT, KNN, CART, SVM | Quality prediction during manufacturing process | Die casting dataset | ACC = 92.78% | √
Lee et al. | 2018 | √ √ | DT, NN, SVM | Quality prediction and operation control in metal casting | IoT/MES data | ACC = 93.84% | √
Kannatey-Asibu et al. | 2017 | √ | HMM, Bayesian, GMM, K-mean | A monitoring system based on the classifier fusion and class-weighted voting | Acoustic Emission Monitoring Data | ACC = 98.5% | √
4.3.2. Deep Learning in Manufacturing
Deep learning (DL) refers to the modeling of neural networks that include many hidden layers. As
deep learning methods contribute to many fields of study, the manufacturing
sector benefits from them as well. To learn optimal weights for NN models, gradient descent is the most commonly used
method (Zhao et al., 2019). Even though deep neural networks (DNN) have a large number of critical
hyper-parameters that must be set during design, they are considered the most
promising machine learning technique in many manufacturing fields. In order to maximize accuracy,
a genetic algorithm (GA) can be used to determine a proper NN configuration, such as the number of
hidden layers, the number of neurons, the number of epochs, and the learning rate. DL has proven highly
effective to optimize condition prognosis (Wang, Wang, Wang, Xuang, & Hue, 2018), to improve/predict
quality (Ren, Sun, Chui, & Zhang, 2018; Bai, Li, Sun, & Chen, 2017), to detect failure causes (Lee, Kim,
& Kim, 2017; Lee, Cheon, & Kim, 2017; Shao, Sun, Yan, Wang, & Gao, 2017; Kim, Han, & Lee, 2017),
to improve the inspection process (Yasutomi & Enoki, 2020), to estimate the positions of mobile objects in
industrial environments (Niitsoo, Edelhauber, Eberlein, Hadaschik, & Mutschler, 2019), and to predict
degradation (Luo, Yan, Hu, Zhou, & Pang, 2015). Hybrid approaches (combining fuzzy logic with NN)
have also been developed for different applications due to the non-deterministic factors in
manufacturing. In particular, neural networks were combined with fuzzy logic to solve scheduling
optimization and production time prediction problems in manufacturing (Lv, Kim, Zheng, & Jin, 2018).
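How gradient descent learns weights, and where hyper-parameters such as the learning rate and the number of epochs enter, can be seen in a deliberately small sketch: a single linear neuron trained with a mean squared error loss. Real deep networks apply the same update rule to many layers via backpropagation; the data and hyper-parameter values here are assumptions chosen for illustration.

```python
def train_neuron(samples, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(samples)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b
        grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
        # Step against the gradient, scaled by the learning rate
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical noise-free samples generated from y = 2x + 1
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_neuron(samples)
print(w, b)
```

A learning rate that is too large makes the updates diverge and one that is too small needs many more epochs, which is why such hyper-parameters are often tuned automatically, for example with a genetic algorithm as noted above.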
As a deep learning technique, the convolutional neural network (CNN) has generally been preferred in
manufacturing (Nakata, Orihara, Mizuoka, & Takagi, 2017; Lee, Kim, & Kim, 2017; Imoto, Nakai, Ike,
Haruki, & Sato, 2019). The deep recurrent neural network (DRNN) has also been utilized in industrial
environments for different purposes, such as droplet evolution prediction and process dynamics
understanding (Huang, Segura, Wang, Zhao, Sun, & Zhou, 2020). The deep Boltzmann machine (DBM) is
another type of neural network structure, which is composed of symmetrically coupled stochastic
binary units. There is also the restricted Boltzmann machine (RBM), a form of Boltzmann
machine in which connections are restricted to those between the visible and hidden layers. These models
have been used effectively in manufacturing (Wang, Wang, Wang, Xuang, & Hue, 2018; Kim, Han, & Lee, 2017; Luo,
Yan, Hu, Zhou, & Pang, 2015).
A deep belief network (DBN) is a neural network with several layers of hidden
units, typically built by stacking RBMs; belief nets and RBMs are therefore the two key concepts for
understanding it. This method can analyze relationships between features at different levels. Several
manufacturing studies use it to monitor machine health (Zhao, Yan, Chen, Mao, Wang, & Gao, 2019), to predict product quality
(Bai, Li, Sun, & Chen, 2018), and to recognize failures in induction motors (Shao, Sun, Yan, Wang, &
Gao, 2017).
Another powerful deep learning technique is the deep autoencoder (DA). It combines two
symmetrical DBNs, an encoding half and a decoding half, each generally having four or five layers. Bearing health
analysis (Ren, Sun, Cui, & Zhang, 2018), data-driven fault diagnosis (Wen, Gao, & Li, 2019;
Arellano-Espitia, Delgado-Prieto, Martinez-Viol, Saucedo-Dorantes, & Osornio-Rios, 2020), and wafer fault
detection (Lee, Kim, & Kim, 2017) are some of the manufacturing tasks in which the DA principle has been
implemented.
Table 5 lists some important and recent deep learning studies conducted in manufacturing. DBM (Zhao et
al., 2019; Shao, Sun, Yan, Wang, & Gao, 2017) and DBN (Kim, Han, & Lee, 2017; Luo, Yan, Hu, Zhou,
& Pang, 2015) were commonly used in manufacturing studies. Deep learning techniques are frequently
applied to improve quality in manufacturing (Ren, Sun, Cui, & Zhang, 2018; Niitsoo, Edelhauber, Eberlein,
Hadaschik, & Mutschler, 2019; Huang, Segura, Wang, Zhao, Sun, & Zhou, 2020).
Deep learning studies serve specific manufacturing goals. For example, Huang et al. (2020) used deep
learning techniques to recognize motion information, which improves final quality in the inkjet printing
process. Deep learning methods may have an advantage over traditional models because they can work
without general assumptions, which makes them more adaptive (Wen, Gao, & Li, 2019). They enable
automatic defect classification and lower manufacturing costs (Imoto, Nakai, Ike, Haruki, & Sato, 2019).
Damage to transported objects can be prevented in advance via deep neural networks (Yasutomi &
Enoki, 2020). Niitsoo et al. (2019) used the CNN approach to predict the time-of-flight of radio burst
signals, which supports the digitalization of manufacturing processes. Electromechanical systems in smart
manufacturing require efficient monitoring strategies. In this respect, deep learning models such as auto-encoders can
meet the requirements (Arellano-Espitia, Delgado-Prieto, Martinez-Viol, Saucedo-Dorantes, &
Osornio-Rios, 2020).
Table 5. List of deep learning studies applied in manufacturing.
Authors | Year | Algorithm (DBM / CNN / DNN / DBN / DA) | Short Explanation | Dataset | Results | Subject (Monitoring / Scheduling / Quality / Failure)
Yasutomi and Enoki | 2020 | √ | Inspection device localization | Belt conveyor data | ACC=85.44% | √
Huang et al. | 2020 | √ | Droplet evolution prediction | Droplet jetting process video data | MSE=0.0109 | √
Arellano-Espitia et al. | 2020 | √ | Fault detection in electromechanical systems | Electrical motor driven system data | RMS=1.71 | √
Wen et al. | 2019 | √ | Sparse auto-encoder for fault detection | Motor bearing dataset | ACC=99.82% | √
Niitsoo et al. | 2019 | √ | Position estimation from channel impulse responses | Meander dataset | MAE=0.17 | √
Imoto et al. | 2019 | √ | Defect classification in semiconductor manufacturing | Semiconductor fabrication data | ACC=88.4% | √
Zhao et al. | 2019 | √ √ √ √ | Machine health monitoring | CNC machine data | MAE=9.3 | √
Wang et al. | 2018 | √ | Condition prediction for smart manufacturing | Centrifugal compressor data | MSE=0.0157 | √
Ren et al. | 2018 | √ √ | Bearing remaining useful life prediction | IEEE PHM2012 data | MSE=0.2 | √
Nakata et al. | 2017 | √ | Failure cause identification | Wafer map data | F-measure=0.92 | √
Lee et al. | 2017 | √ | Wafer fault monitoring | Wafer samples and sensor data | ACC=98.5% | √
Lee et al. | 2017 | √ | Fault classification and diagnosis in semiconductor manufacturing | Chemical vapor deposition process data | ACC=97.9% | √
Shao et al. | 2017 | √ √ | Fault diagnosis of induction motors in manufacturing | Machine fault simulator data | ACC=99.98% | √
Kim et al. | 2017 | √ √ | Fault detection | SECOM | ACC=86.59% | √
Luo et al. | 2015 | √ √ | Degradation prediction in semiconductor manufacturing | Wafer fabrication plant data | ACC=74.1% | √
5. ADVANTAGES OF DM AND ML IN MANUFACTURING
5.1. Benefits
Data mining has been widely used as a fundamental tool for knowledge discovery from manufacturing
databases. The data needed for analysis can be gathered during ordinary manufacturing
operations. In manufacturing, data mining provides many competitive advantages, such as higher product
quality, decreased cost, and improved production processes. It may help to automate the knowledge
discovery process, a capability that is considered very important for the development of
knowledge-based systems.
There are many manufacturing fields in which machine learning may have a positive effect. First,
machine learning strongly supports efficient demand forecasting, which estimates how much
product should be produced to meet future demand by analyzing past events.
Second, machine learning plays a part in the release of a new product: when a product is
introduced, machine learning is used to track the success of the release
from sales and customer data. Another effect is price optimization. Manufacturing companies can
take location, seasonality, weather, and demand into consideration in order to adjust
prices and offer products at optimal prices. It has been observed that ML enables manufacturers to reduce
the overall cycle time as well as improve resource utilization in certain NP-hard manufacturing problems.
Moreover, ML provides powerful approaches for continuous quality improvement in complex and large
processes (Wuest, Weimer, Irgens, & Thoben, 2016).
Other advantages of DM and ML in manufacturing are listed below [RQ8]; note, however, that the
importance of each advantage may vary depending on the chosen
algorithm.
• Predictive maintenance: ML has been successfully used for predictive maintenance in different
manufacturing industries (Amruthnath & Gupta, 2018; Han & Chi, 2016). It helps to predict the
remaining useful life of machine components (Ren, Sun, Cui, & Zhang, 2018). It allows
machine failure to be predicted by analyzing machine variables before the machine fails (Nedelkoski &
Stojanovski, 2017; Pavlyshenko, 2016; Lim, Kim, & Kim, 2017). ML techniques can therefore help
to prevent the breakdown of a generic manufacturing machine. Furthermore, ML improves
human-computer interaction and optimizes the manufacturing process (Traini, Bruno, D’antonio, &
Lombardi, 2019). Predicting component malfunctions from the data collected by sensors
attached to machines can save money by preventing subsequent major damage.
• Resource management: ML provides powerful tools for improving resource utilization in certain
manufacturing problems, such as human resource management, equipment management (Rivetti,
Busnel, & Gal, 2017; Rostami, Blue, & Yugma, 2016) and raw material management (Kujawinska,
Rogalewicz, Muchowski, & Stankowska, 2018). It is possible to develop a model to manage resources
effectively in the manufacturing sector or to identify staff-related patterns in a manufacturing factory.
Furthermore, models could be built to allow engineers to use less material in a component or reduce
the density of manufacturing materials such as wood, glass, paper, plastic, and metals (Lee, Noh, Kim,
& Kang, 2018), including conducting copper (Huang, Pan, Lin, & Guo, 2018), steel plates (Tian, Fu,
& Wu, 2015; Hu, Zhou, Xiang, & Feng, 2018), aluminum, magnesium, and titanium with the
application of machine learning methods on raw material variables.
• Product design: Extracting knowledge from historical data has become an important topic, since it can
assist designers in creating a new product and serve as a basis for a similar end-product design. Data
mining results obtained using product variables can provide additional hints for the potential product
(Wang, Tong, & Eynard, 2007).
• Quality control: Quality can be considered as the degree to which the manufacturing firm fulfills
customer requirements. Since data mining turns raw data into useful knowledge, it
can support quality and reduce costs due to damages or loss of production (Wuest, Irgens, & Thoben,
2014). Machine learning has been widely applied for quality assessment in manufacturing industries,
including quality prediction of products (Bai, Li, Sun, & Chen, 2018; Mohammadi & Wang, 2016;
Arif, Suryana, & Hussin, 2013), quality improvement (Kamsu-Foguem, 2013), defect detection (Das,
Pal, & Bag, 2017), and classification of defects (Wang, 2013). Quality control variables play an
important role in these tasks.
• Diagnosis: Data analysis can support the diagnosis of anomalies (Rivetti, Busnel, & Gal, 2017; Susto,
Terzi, & Beghi, 2017; Ko, Lee, Cho, & Cho, 2017) so that an alarm can be raised. Data
mining results can be saved as patterns that are usable for the automatic detection of
similar problems in the future. This can help to avoid failures (Ong, Choo, & Muda, 2015) and to
predict wear (Han & Chi, 2016), which reduces downtime. Machine learning can give deeper insights into
manufacturing processes and reveal which processes have gone wrong.
• Decision support: Data analytics has played an important role in decision-making/decision-support in
the manufacturing industry (Cheng, Chen, Cheng, Lin, & Yang, 2018; Kujawinska, Rogalewicz,
Muchowski, & Stankowska, 2018; Gandhi, Schmidt, & Ng, 2018; Syafrudin, Alfian, Fitriyani, &
Rhee, 2018; Ge, Song, Ding, & Huang, 2017) since it could provide new additional information for
operators and managers. Empirical studies have shown that ML is effective in improving the decision-
making capabilities of operation managers (Dubey, Gunasekaran, Childe, Bryde, Giannakis, Foropon,
Roubaud, & Hazen, 2020). For example, the records of financial transactions in the manufacturing
sector can be analyzed for better decision-making. Furthermore, it is also possible to apply process
mining techniques to the context of workflow management to improve the processes in the
manufacturing sector (Pospisil, Bartik, & Hruska, 2016).
• Optimization: Data mining can identify meaningful patterns that are useful for optimizing the related
manufacturing system (Bustillo, Urbikain, Perez, Pereirb, & Lopez de Lacalle, 2018; Wang, Wang,
Wang, Huang, & Xue, 2018; Kim, Han, & Lee, 2017). An example is the optimization of
process and machine variables such as production time per product and energy consumption of machines
(Cupek, Ziebinski, Zonenberg, & Drewniak, 2018; Wang, Li, & Gan, 2018; Raktham & Piromsopa,
2011). Further uses include machine maintenance planning and scheduling optimization for
manufacturing system/process improvements. In addition, machines used in the production process
can be optimized at critical stages.
• Descriptive analysis: Machine learning provides a basis for understanding the domain, since
most manufacturing problems are data-rich but knowledge-poor. Unsupervised learning techniques are
very powerful tools for creating descriptive models, since they reveal the relations among variables
and thus provide a better understanding of the process. They describe the current situation more clearly, for example
relationships between events, influencing factors, or causalities. For example, clustering the items in
manufacturing data allows operators or managers to plan different activities for different clusters
(Wuest, Irgens, & Thoben, 2014; Zidek, Maxim, Pitel, & Hosovsky, 2016). Similarly, association rule
mining has great potential in manufacturing, since it increases knowledge about the
analyzed manufacturing systems and processes (Kamsu-Foguem, 2013; Kao, Hsieh, Chen, & Lee,
2017; Ong, Choo, & Muda, 2015; Purnama, Abdullah, Rokhmat, & Herawan, 2015; Djatnaa &
Alitu, 2015).
• Predictive analysis: Forecasting is one of the main issues in the manufacturing industry. Applying ML
in manufacturing can provide a basis for developing models that approximate
the future behavior of the system. It may lead to approximately correct predictions about the possible
future actions and reactions of the manufacturing system. Depending on the characteristics of the ML
algorithm, the prediction ability on the available data may vary. However, the overall ability of ML
algorithms to achieve robust prediction results has been demonstrated in manufacturing
(Lingitz, Gallina, Ansari, Gyulai, Pfeiffer, Sihn, & Monostori, 2018; Meidan, Lerner, Rabinowitz, &
Hassoun, 2011; Dolgui et al., 2018; Wang, Wang, Wang, Huang, & Xue, 2018; Ren, Sun, Cui, &
Zhang, 2018; Kim, Han, & Lee, 2017; Luo, Yan, Hu, Zhou, & Pang, 2015). The predictive power of
ML methods helps to advance flexible, efficient, and high-quality manufacturing. Furthermore, these methods
make it possible to detect and measure internal faults in materials (Ferreira, Sabbaghi, & Huang, 2020;
Rodriguez-Martin, Fueyo, Gonzalez-Aguilera, Madruga, Garcia-Martin, Munoz, & Pisonero, 2020).
• Parameter analysis: It is possible to build inference mechanisms (soft sensors) where a process
parameter is inferred from other available (measured) variables in manufacturing. In other words, data
mining techniques can be used to improve the manufacturing process through the discovery of
correlations between manufacturing process variables (Arif, Suryana, & Hussin, 2013). Furthermore, it
is also possible to determine the most important parameter that affects the performance of a
manufacturing system by using a decision tree algorithm since the parameter placed in the root node of
the tree indicates the most fundamental element that has an impact on the process (Lieber, Stolpe,
Konrad, Deuse, & Morik, 2013).
• Market demand analysis: Market demand is growing day by day, and customer needs are shifting
toward higher-quality products and more efficient services (Wuest, Irgens, & Thoben, 2014;
Zidek, Maxim, Pitel, & Hosovsky, 2016). For this reason, manufacturing firms have started to search for
innovative solutions to analyze market demand (Purnama, Abdullah, Rokhmat, & Herawan, 2015).
Manufacturing organizations have to discover trend patterns to cope with the increasing complexity of
end-product expectations. Advanced analytics approaches make it possible to predict such trend
patterns.
• Cycle-time reduction: Cycle time consists of set-up, queueing, processing, inspection, and
transportation times, as well as waiting time for equipment due to machine breakdowns. ML makes it
possible to predict cycle time in a production line so that it can be reduced further (Meidan, Lerner,
Rabinowitz, & Hassoun, 2011).
• Adaptation: The nature of manufacturing systems is dynamic, complex, and at times even chaotic. ML
algorithms adapt automatically to changing conditions, so they do not depend on
static systems (Wuest, Irgens, & Thoben, 2014; Zidek, Maxim, Pitel, & Hosovsky, 2016). An ML
algorithm can continue to learn as new data arrive. The speed of adaptation may vary depending on the ML
algorithm, but in general it is faster than that of traditional mathematical and statistical methods.
• Multi-stage applications: Another advantage of DM is that algorithms can be used as a preprocessing
step before performing the essential manufacturing study. For example, clustering or outlier detection
may be used as a prior step (initial task) before applying a classification algorithm, in order to increase
accuracy (Syafrudin, Alfian, Fitriyani, & Rhee, 2018; Sand, Kunz, Hubbert, & Franke, 2016;
Wuest, Irgens, & Thoben, 2014; Lieber, Stolpe, Konrad, Deuse, & Morik, 2013).
• Document categorization and clustering: Data mining has also proved useful for
classifying and clustering manufacturing text documents according to their types and main
contents. Text mining is useful for extracting interesting patterns and knowledge from
unstructured manufacturing documents obtained from different sources (Shotorbani, Ameri,
Kulvatunyou, & Ivezic, 2016; Jung & Chang, 2018).
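The parameter-analysis item above states that the attribute placed in the root node of a decision tree indicates the parameter with the greatest impact on the process. The following minimal Python sketch shows one common way this choice is made, by selecting the attribute with the highest information gain (an ID3-style criterion). The criterion, the attribute names (`temperature`, `pressure`, `quality`), and the four records are assumptions made up for illustration; the cited study may use a different tree algorithm and splitting rule.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records, attribute, target):
    """Reduction in entropy of `target` obtained by splitting on `attribute`."""
    base = entropy([r[target] for r in records])
    remainder = 0.0
    for value in {r[attribute] for r in records}:
        subset = [r[target] for r in records if r[attribute] == value]
        remainder += len(subset) / len(records) * entropy(subset)
    return base - remainder


def root_attribute(records, attributes, target):
    """Attribute that would sit at the root: the one with the highest gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical process records (made-up values, for illustration only).
records = [
    {"temperature": "high", "pressure": "low", "quality": "defect"},
    {"temperature": "high", "pressure": "high", "quality": "defect"},
    {"temperature": "low", "pressure": "low", "quality": "ok"},
    {"temperature": "low", "pressure": "high", "quality": "ok"},
]

if __name__ == "__main__":
    for name in ("temperature", "pressure"):
        print(name, information_gain(records, name, "quality"))
    print("root:", root_attribute(records, ["temperature", "pressure"], "quality"))
```

In this toy data the temperature setting separates defective from acceptable items perfectly, so it is chosen for the root, while pressure carries no information about quality.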
5.2. Managerial Implications
Using ML methods in manufacturing has positive effects on the managerial side as well. Businesses
benefit greatly from DM approaches when handling multi-variety data in dynamic
environments. Since ML and DM are capable of data processing and real-time predictions, they enable
various department managers to make decisions more efficiently. They contribute to big data analytics
for cleaner manufacturing and maintenance processes of complex products. Managerial implications of
ML algorithms in manufacturing can be grouped into four main categories: marketing department, R&D
department, production department, and service department (Zhang, Ren, Liu, & Si, 2017).
• Benefits for the R&D department: To make the right decisions in the product design stage, to
improve product development experiments, and to guide the development of new products
by analyzing previous feedback (Wang, Tong, & Eynard, 2007; Tootooni et al., 2017;
Raktham & Piromsopa, 2011; Wang, Wang, Wang, Huang, & Xue, 2018).
• Benefits for the production department: To monitor product quality, to manage production equipment
(i.e., to estimate equipment wear, to predict equipment failures), to decrease machine energy
consumption, to analyze the dynamic and real-time production data, to optimize production
scheduling, and to solve the staff-allocation problem (Ismail, Othman, & Abu Bakar, 2012; Waschneck et
al., 2018; Kamsu-Foguem, 2013; Mangal & Kumar, 2016).
• Benefits for the service department: To increase customer satisfaction by analyzing
service-related feedback, to monitor services in real time, to predict service needs by analyzing historical
maintenance records, to select a suitable service strategy, to monitor a product's status continuously, to
trace products through their lifecycle, and to prevent failures before they occur (Ko, Lee, Cho, &
Cho, 2017; Buddhakulsomsiri & Zakarian, 2009; Luo, Yan, Hu, Zhou, & Pang, 2015).
• Benefits for the marketing department: To identify promising customers, to predict customer churn,
to forecast customers' unspoken needs by analyzing data from their search
records and historical purchasing behaviors, to match various products with various customers,
to pick the most suitable customers for a new product, to score customers, and so on
(Packianather, Davies, Harraden, Soman, & White, 2017; Purnama, Abdullah, Rokhmat, & Herawan,
2015).
6. CHALLENGES OF MACHINE LEARNING IN MANUFACTURING
Although machine learning is a helpful tool given the aforementioned advantages, it also poses challenges
that can be overcome once practitioners are aware of them. Not all of the tasks that form the basic steps of
the knowledge discovery process are easy to apply. The key challenges of ML in manufacturing that
most researchers agree upon are the following (Wuest, Weimer, Irgens, & Thoben, 2016). Learning
from and automatically adapting to changing environments is the main strength of machine learning. Due
to the dynamic and fast-changing manufacturing environment, the ML system should have the ability to
learn and adapt to changes, and the system designer needs to provide solutions for all possible situations.
Another major challenge is the acquisition of both accurate and relevant manufacturing data, since it has a
strong influence on the performance of ML algorithms. A very common challenge of ML application in
manufacturing is the pre-processing of data, since it has a critical impact on the results. Another key
challenge is the question of which ML method and algorithm to choose. The last major challenge is the
interpretation of the results (Wuest, Weimer, Irgens, & Thoben, 2016). The details of these challenges
and many others are listed below [RQ9]:
• Manufacturing data preparation problem:
The success of each ML technique depends on the structure of the data on which it is performed. However,
obtaining well-organized data in manufacturing is neither easy nor quick, because
manufacturing data are generally multisource (i.e., product, machine, process,
operator, raw material, environment, and service data), heterogeneous (i.e., structured or unstructured,
syntactic or semantic data), and noisy (i.e., incomplete, incorrect, improper, duplicated, and inconsistent
data) (Lv, Kim, Zheng, & Jin, 2018). Certain data pre-processing steps (data integration, cleaning,
reduction, transformation) are required before applying ML techniques. Data preprocessing has a
critical impact on the results. However, there are no standardized rules that indicate which data
preprocessing techniques should be applied for a specific type of manufacturing problem. Determining
the appropriate technique requires insight and knowledge, or trying and comparing
alternative options.
• Time-consuming problem:
When the time spent on the whole machine learning process is divided into
preprocessing, feature extraction, and classification, data preparation before
applying any data mining algorithm takes the largest share (Wang, 2007). This is because manufacturing
datasets are mostly composed of long sequences of events and non-standardized measurement values.
Sometimes the data preprocessing itself takes significant time, approximately 50% to 60% of the total
effort on a machine learning project.
• Missing data problem:
In manufacturing practice, the most common data preprocessing problem is the missing data problem
(Wuest, Weimer, Irgens, & Thoben, 2016), since it is not always possible to collect all values from
machine sensors over time. Because of machinery problems, some sensors may fail to record
correct measurement values for a specific period, and those values are recorded as missing or as an
irrelevant numeric value that lies outside the minimum or maximum limits of the attribute. Although there
are some solution methods (e.g., filling with the mean value or the most frequent value) for this problem, none
of them can eliminate information loss.
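The two filling strategies mentioned above (mean value and most frequent value) can be sketched in a few lines of Python. The function names, the sensor limits, and the sample readings below are assumptions chosen for illustration; the sketch also treats readings outside the attribute limits as missing, as described in the text.

```python
from statistics import fmean, mode


def impute_numeric(values, low, high):
    """Replace missing (None) or out-of-range readings with the mean of valid ones."""
    valid = [v for v in values if v is not None and low <= v <= high]
    fill = fmean(valid)
    return [v if v is not None and low <= v <= high else fill for v in values]


def impute_categorical(values):
    """Replace missing (None) entries with the most frequent observed value."""
    fill = mode([v for v in values if v is not None])
    return [fill if v is None else v for v in values]


# Hypothetical sensor readings: None is a missing value, 999.0 is outside the limits.
temperatures = [20.0, None, 22.0, 999.0, 24.0]
states = ["ok", None, "ok", "fault"]

if __name__ == "__main__":
    print(impute_numeric(temperatures, low=0.0, high=100.0))
    # -> [20.0, 22.0, 22.0, 22.0, 24.0]
    print(impute_categorical(states))
    # -> ['ok', 'ok', 'ok', 'fault']
```

Both filled readings collapse to the same value (22.0), which illustrates the information loss noted above: the imputed series no longer reflects what the sensor would actually have measured.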
• Data selection problem:
A very common challenge of ML application in manufacturing is the selection of the data relevant to
the analysis from the available database. Not all manufacturing data obtained from machine measurements
are useful in the data mining process for solving the targeted problem. Some data
relate to different types of problems and are useless for the task at hand. In principle, it is not obvious
which part of the whole manufacturing dataset should be used. This uncertainty forces data
miners to spend time sifting through piles of non-beneficial data.
• Imbalanced manufacturing data problem:
Another common challenge related to data preparation is imbalanced manufacturing data, which
may cause performance loss for ML algorithms (Ong, Choo, & Muda, 2015; Kim, Oh, Jung, & Kim,
2018; Kim, Han, & Lee, 2017). Inherent data imbalance may cause significant internal and external
failure costs (Fuqua & Razzaghi, 2020). Especially if the task is binary classification, the
capability of the chosen method can be adversely affected. For this reason, a balanced class
distribution becomes very important for the implementation of an ML technique. Nonetheless, collected
manufacturing data are generally imbalanced, so elements from one class can vastly
outnumber elements from the remaining classes. To overcome this problem, a balancing technique,
such as SMOTE (synthetic minority oversampling technique), may be used.
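The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified, pure-Python illustration of that idea, not the reference implementation; the function name `smote_like`, its parameters, and the sample points are assumptions introduced here.

```python
import math
import random


def smote_like(minority, n_new, k=2, seed=0):
    """Generate `n_new` synthetic points from minority-class samples.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its `k` nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority class (e.g., rare defective items), two features each.
defects = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]

if __name__ == "__main__":
    for point in smote_like(defects, n_new=6):
        print(point)
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class rather than being simple duplicates.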
• Interdisciplinary collaboration challenges:
Machine learning in manufacturing is a highly multi-disciplinary research area, which may require
expertise in both ML and manufacturing. Even though there are a great number of ML methods that are suitable
for different kinds of manufacturing data, they all require clear descriptions of the data and its
distinctive behaviors. The absence of data descriptions and improperly determined class labels
may degrade the performance of ML methods. The interpretation of domain experts may be
needed to clarify the meaning of each measurement. In this respect, it is necessary to bring researchers
and practitioners from different disciplines to work together.
• Manufacturing data protection and security concerns:
Manufacturing institutions are generally profit-oriented organizations, and they compete with rival
brands that produce the same kind of product or offer the same kind of services in the market. This
business competition makes them protect valuable information, which is extracted by data mining, from
others. That is to say, they tend to hide their manufacturing data due to data protection
and security concerns. At the very least, it becomes difficult for data miners to access original or full-size data.
There is a strong requirement for a test dataset to evaluate the performance of machine learning
methods, because a technique or an approach should not be proposed unless it is
supported by promising results obtained on a validated manufacturing dataset. However, many
manufacturing companies do not make test data publicly available. Although several benchmark
datasets related to manufacturing exist, more are needed to allow scientists to try new ML
techniques on real-life datasets.
• High dimensionality problem:
In ML studies, it is necessary to deal with the high dimensionality and complexity of manufacturing data.
The curse of dimensionality diminishes computational performance (Wu, Zhou, Tsai, Yu, &
Dauzere-Peres, 2020). To reduce the dimensionality of the feature vector, several studies used techniques such
as principal component analysis (PCA) (Munirathinam & Ramadoss, 2016; Ge, Song, Ding, & Huang,
2017) and conditional mutual information maximization (CMIM) (Meidan, Lerner, Rabinowitz, &
Hassoun, 2011). Dimensionality reduction not only speeds up the algorithm, but is sometimes also the best way to get good
results on complex data (Nedelkoski & Stojanovski, 2017; Wuest, Irgens, & Thoben, 2014). Some
algorithms (e.g., SVM) can handle high dimensionality better than others (Rostami, Dantan, & Homri,
2015; Tian, Fu, & Wu, 2015). In some manufacturing studies, there can be complex relationships
among multiple variables and various factors that cannot be easily captured with simple approaches.
Another problem is dealing with heterogeneous data, which may require adding problem-specific algorithms to
the solution. ML algorithms have to cope with different types of data (i.e., continuous, discrete,
nominal, and ordinal). Most distance metrics, such as Euclidean, Minkowski, and Manhattan, are calculated
on numeric values. There are also some distance metrics for
categorical data, such as the Jaccard distance, but in this case the method needs to span all values of
each attribute for each pair of elements.
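The distance metrics named above can be written compactly. The following Python sketch shows the Minkowski distance, of which Manhattan (p = 1) and Euclidean (p = 2) are special cases, and the Jaccard distance for sets of categorical values. The function names and sample values are assumptions introduced for illustration.

```python
def minkowski(a, b, p):
    """Minkowski distance between numeric vectors; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def jaccard_distance(a, b):
    """Jaccard distance between two collections of categorical values.

    Defined as 1 - |A intersect B| / |A union B|; two empty sets are treated
    as identical (distance 0).
    """
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return 1 - len(a & b) / len(a | b)


if __name__ == "__main__":
    u, v = (0.0, 0.0), (3.0, 4.0)
    print("Manhattan:", minkowski(u, v, 1))   # 7.0
    print("Euclidean:", minkowski(u, v, 2))   # 5.0
    # Hypothetical categorical descriptions of two items.
    print("Jaccard:", jaccard_distance({"steel", "coated"}, {"steel", "raw"}))
```

Numeric metrics compare values coordinate by coordinate, whereas the Jaccard distance only asks which categorical values two items share, which is why mixed-type manufacturing data usually needs the two kinds of measure to be handled separately.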
• Big data problem:
As a result of recent developments in information technology and machines with growing complexity
and sensor equipment, data acquisition systems produce an enormous amount of data that is continuously
increasing and contains hundreds of variables. The sheer volume of manufacturing data
is a big challenge in its own right, since it increases computational cost. Big data always needs more
time and processing power. There are various ML techniques; however, the huge amount of
data in this sector forces scientists to avoid some ML methods because of their time
complexity. To overcome this problem, big data platforms such as Hadoop and Spark can be used, as
well as big data enabling technologies such as cloud computing, in-memory computing, and grid
computing (Zhang, Ren, Liu, & Si, 2017; Lv, Kim, Zheng, & Jin, 2018; Alabi, Nixon & Botef, 2018;
Jong, Rubrico, Atachi, Nakamura, & Ota, 2017; Mangal & Kumar, 2016; Nagorny, Lima-Monteiro,
Barata, & Colombo 2017).
• Complex nature of manufacturing problems:
Manufacturing includes dynamic, non-linear, complex, and often chaotic processes. In particular,
many optimization tasks are NP-hard, and this creates a particular challenge in manufacturing.
An enormously large data volume adds a further difficulty to machine learning
in manufacturing. Reliable machine learning has to deal with four important aspects:
generalization, robustness, adaptation, and the growing size of the data.
• Algorithm selection challenges:
Every knowledge discovery operation is unique in its context. Therefore, the optimal ML technique
for solving the targeted manufacturing problem is not known in advance. Experienced
experts can only propose an approach by referring to past studies. However,
those suggestions do not guarantee positive results in that context.
Each manufacturing problem is different, and the performance of each algorithm depends on the data
available as well as on parameter settings. Researchers should be aware of the algorithms available in the
literature and should compare them according to their specific performance in manufacturing
applications with respect to different attributes. However, since manufacturers have worked solely in the
manufacturing sector, they often lack a strong background in and experience with ML
methods and their usage. The converse also creates a difficulty: ML experts may have
great difficulty in focusing on manufacturing because of insufficient
knowledge and experience in this field of study. Therefore, it would be advisable to organize
collaboration between data miners and manufacturing experts.
In addition, the optimal parameter settings of the algorithms should be determined through a number of
trials (Huang, Pan, Lin, & Guo, 2018). This can be considered a general challenge for most research in
manufacturing. Trying alternative algorithms and parameters costs time. For this
reason, it has taken a long time to put machine learning methods into daily practice in the
manufacturing sector. Furthermore, more than one machine learning technique may be applied one
after another (Syafrudin, Alfian, Fitriyani, & Rhee, 2018; Sand, Kunz, Hubbert, & Franke, 2016;
Wuest, Irgens, & Thoben, 2014; Lieber, Stolpe, Konrad, Deuse, & Morik, 2013). In addition, the
model should be updated dynamically with the most current data to reflect potential changes.
• Challenges about the evaluation of results:
Evaluation of the DM results obtained in manufacturing is one of the most difficult processes. The
question arises as to how to assess the validity of the results. It is also necessary to evaluate the
under-fitting and overfitting of algorithms. Data mining methods produce patterns and rules that are specific
to the domain, so their interpretation becomes challenging. The knowledge of data mining experts may
not be sufficient in this respect. Hence, they probably need to cooperate with the relevant domain experts
to succeed in knowledge discovery.
7. FUTURE DIRECTIONS
Machine learning applications will probably keep growing at an increasing rate, particularly in manufacturing
[RQ10], because computing power is increasing day by day and the size of available data is much greater
than it was a few years ago. Big data techniques and technologies can process high-dimensional data.
Especially given the increasing availability of manufacturing data, they will most likely become
even more important in the future.
Association rule mining is more frequently used in the manufacturing industry than sequential pattern
mining (SPM). The extended investigation of SPM, considering time information for the model
development, could enable manufacturers to respond promptly to time-related (temporal) situations.
Frequent sequential patterns provide potentially important knowledge to predict future activities.
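As a simplified illustration of how sequential patterns differ from association rules, the sketch below counts ordered event pairs ("a occurs before b") across event sequences and keeps those that reach a minimum support. It is not a full SPM algorithm such as PrefixSpan or GSP; it handles only patterns of length two, and the event names and the function `frequent_ordered_pairs` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a before b) found in at least min_support sequences."""
    support = Counter()
    for seq in sequences:
        # combinations() preserves the original order, so (a, b) means a precedes b.
        # The set ensures each pair is counted at most once per sequence.
        support.update(set(combinations(seq, 2)))
    return {pair: count for pair, count in support.items() if count >= min_support}


# Hypothetical machine event logs, one list per production run.
event_logs = [
    ["overheat", "vibration", "stop"],
    ["overheat", "stop"],
    ["vibration", "overheat", "stop"],
    ["startup", "vibration"],
]
patterns = frequent_ordered_pairs(event_logs, min_support=3)
print(patterns)
```

In this toy data, "overheat" precedes "stop" in three of the four runs, so it is the only pattern that meets the support threshold. Unlike an association rule, the pattern carries the order of the events.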
Text mining has recently become widespread in many fields. However, its usage in the
manufacturing industry is limited. Future work could focus on text mining related to manufacturing, because
the amount of textual data held by manufacturing suppliers is continually growing and more than
80% of corporate information is stored as text. A variety of text mining methods, such as association rule
generation, classification, or clustering, can be implemented in order to process large volumes of this data and
extract valuable knowledge about manufacturing. Manufacturing documents can be classified or
clustered according to their types, main contents, and similarities. Text mining methods may be very
helpful to manage those digital document sources. For example, sentiment analysis can be used as a
methodology to investigate the emotions in manufacturing-related content and as a tool to assist in
analyzing consumer trends. Wider use of text mining applications such as question answering, expert finding,
sentiment detection, recommendation, part-of-speech tagging, and parsing could improve
manufacturing.
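Grouping manufacturing documents by similarity can be sketched with a simple bag-of-words representation and cosine similarity. This is a minimal illustration under stated assumptions: the documents are invented, the tokenizer keeps only lowercase ASCII words, and real systems would typically add stop-word removal and term weighting.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Bag-of-words vector: lowercase word -> number of occurrences."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical manufacturing documents.
docs = {
    "report_a": "Spindle bearing failure caused machine stop",
    "report_b": "Machine stop after spindle bearing failure",
    "invoice": "Payment due for steel delivery",
}
vectors = {name: term_counts(text) for name, text in docs.items()}
sim_reports = cosine_similarity(vectors["report_a"], vectors["report_b"])
sim_invoice = cosine_similarity(vectors["report_a"], vectors["invoice"])
print(round(sim_reports, 3), sim_invoice)
```

The two maintenance reports share most of their vocabulary and score high, while the invoice shares no terms with them. Similarity scores of this kind can serve as input to the clustering or classification methods mentioned above.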
Manufacturing companies share information about their business (i.e., plans, processes, materials, and
technologies) on the web, which can help manufacturers make informed decisions.
Growing internet usage is accelerating the creation of complex digital manufacturing
information. Web mining provides an automated mechanism to collect manufacturing-relevant data
and to extract valuable and intelligible business information from this mass of web business data in order to
support decisions. However, the different jargon used by different manufacturing companies causes confusion
and difficulties in a practical and dynamic process. Therefore, more contributions are also expected on
the topic of web mining in manufacturing in the future.
Recently, several ontology-based approaches to representing information semantics have been
implemented to organize manufacturing data. However, more contributions are expected in the
development of ontology-based manufacturing systems using ML techniques in the coming years,
because ontology-based systems support the extraction of semantic relationships to improve accuracy
and to form better decision support systems. New ML algorithms or heuristic approaches may be
developed to match manufacturing concepts to ontologies. Several ontologies can be developed for the
manufacturing domains.
While both supervised and unsupervised learning have been widely applied in the manufacturing
sector, together accounting for approximately 90-95% of all applications, reinforcement learning (RL) has
been studied less extensively. However, RL offers goal-directed learning without requiring
external supervision, adapts to dynamic environments, and provides frameworks for understanding and
modeling systems in the face of rewards and punishments. RL can help solve complex combinatorial
decision-making problems in manufacturing, especially for addressing a wide variety of planning and
control problems. RL will probably become more popular in the manufacturing industry in the near
future.
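The idea of learning from rewards and punishments can be made concrete with the standard tabular Q-learning update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). The sketch below applies a single update; the machine states, actions, reward, and parameter values are hypothetical and chosen only for illustration.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update in place and return the new value."""
    best_next = max(q_table[next_state].values())   # value of the best next action
    td_target = reward + gamma * best_next          # reward plus discounted future value
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]


# Hypothetical two-state machine: keep running or perform maintenance.
q = {
    "healthy": {"run": 0.0, "maintain": 0.0},
    "worn": {"run": 0.0, "maintain": 2.0},
}
new_value = q_update(q, "healthy", "run", reward=1.0, next_state="worn")
print(new_value)
```

Repeating such updates while interacting with a real or simulated production system lets the agent learn a policy (for example, when to schedule maintenance) without labeled training data.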
Current clustering studies in the manufacturing sector generally use the k-means algorithm.
However, future research could adopt the k-means++ algorithm, which improves both the speed and
the accuracy of k-means. Very few studies (Nakata, Orihara, Mizuoka, & Takagi, 2017) have used
k-means++ so far in the manufacturing sector. DBSCAN will probably play a more important role in the near future
since it is capable of forming arbitrarily shaped clusters and dealing with noise in the data.
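The difference between k-means and k-means++ lies in how the initial cluster centers are chosen. The sketch below shows only that seeding step: the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance from the nearest center already chosen. The function names and sample points are illustrative, and the sketch assumes the data contain at least k distinct points.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: spread the initial centers using D(x)^2 weighting."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest already-chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        # Points far from all current centers are more likely to be picked;
        # already-chosen points have weight 0 and are never picked again.
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


# Hypothetical 2-D sensor readings forming three loose groups.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (9.8, 0.1), (10.0, 0.0)]
centers = kmeans_pp_init(points, k=3, rng=random.Random(0))
print(centers)
```

The chosen centers would then be passed to the usual k-means iterations. Because the seeds tend to be well separated, fewer iterations are typically needed and poor local optima are less likely than with purely random initialization.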
There are two common approaches to data processing: batch processing and stream processing (real-time
data processing). Batch data processing is conducted statically: a group of transactions is collected
over a period of time and processed to construct a model, and the constructed model is then updated
dynamically based on newly accumulated data. Stream (real-time) data processing handles
data as it arrives. Most data mining studies in the manufacturing area were conducted on batch
data. However, recent improvements in technology demand stream data processing to gain the
competitive advantage of real-time decision-making. In real-time or near real-time data processing,
fast response time is critical, so processing times on the order of seconds are acceptable. Real-time data mining and
learning in the manufacturing area will be an important and difficult subject for further research.
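The contrast between batch and stream processing can be sketched with a monitor that updates its statistics one reading at a time instead of recomputing them over a stored data set. The example uses Welford's online method for the running mean and variance and flags readings that deviate strongly from what has been seen so far. The class name, the sensor values, and the threshold of three standard deviations are illustrative assumptions.

```python
import math


class StreamingMonitor:
    """Running mean and variance (Welford's method), updated per reading."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self):
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomaly(self, x, threshold=3.0):
        s = self.std()
        return s > 0 and abs(x - self.mean) / s > threshold


# Hypothetical temperature readings arriving one by one.
readings = [10.0, 10.2, 9.8, 10.1, 9.9, 10.0, 25.0]
monitor = StreamingMonitor()
flagged = []
for value in readings:
    if monitor.is_anomaly(value):   # decide immediately, using past data only
        flagged.append(value)
    monitor.update(value)           # then fold the reading into the statistics
print(flagged)
```

Each reading is processed in constant time and memory, which is what makes a response within seconds feasible; a batch approach would instead store the readings and recompute the statistics periodically.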
Robots have started to be used in the manufacturing field. Industrial robots have become a
new trend for manufacturing companies and are drawing more attention each day. It is
expected that ML techniques, smart factories, and industrial robots will play a
much more important role, and that applications based on them will increase significantly in manufacturing in
the near future.
8. CONCLUSION
This paper provides a review of the literature about current trends of ML and DM applications developed
in the manufacturing industry. Current applications were identified along with suitable techniques to
perform the required assignment. Different types of machine learning such as supervised (classification
and regression), unsupervised (clustering, ARM, SPM, anomaly detection), ensemble learning, and deep
learning are briefly introduced with their applications in the manufacturing sector. This paper also
presents the advantages of ML-based studies in the industrial domain. It also gives a clear idea of the
difficulties that machine learning practitioners face when working with manufacturing processes
and equipment.
CRediT authorship contribution statement

Alican Dogan: Investigation; Methodology; Visualization; Writing - original draft.
Derya Birant: Supervision; Investigation; Validation; Writing - review & editing.
Declaration of interest
None.
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or
not-for-profit sectors.
APPENDIX
Although we aim to provide a comprehensive survey on “ML and DM in manufacturing”, we cannot
possibly cover every study in existence. Due to the sheer size of the literature on the topic, we focused
on the key insights, the most influential works, and the most important developments in the area over
the past twenty years. We took the following steps when conducting this systematic review:
Step 1 - Defining Research Questions (RQ): We specified unambiguous and structured
problems/questions to be addressed by the review. The RQs addressed in this paper are given in Section
1.
Step 2 - Selecting Search Terms: We selected search terms that we would use to identify relevant
articles in various databases. The selected search terms are given in Section 2.
Step 3 - Conducting the Search and Identifying Relevant Work: We conducted a comprehensive
search of the literature by using three sources (Scopus, Google Scholar, and Web of Science) to
identify relevant articles for our analysis. We focused especially on three parts of the articles: title,
abstract, and keywords. When preparing statistical results given in Figure 1, the queries were executed
in Scopus, Google Scholar, and Web of Science.
Step 4 - Assessing the Quality of Studies: We assessed the relevant studies with a more refined quality
assessment by using general critical appraisal and by considering inclusion and exclusion criteria. We
grouped the relevant studies according to various criteria such as their manufacturing subjects
(scheduling, monitoring, quality, and failure detection), tasks (e.g., clustering, classification,
regression), algorithms (e.g., support vector machine, neural network), learning types (e.g., ensemble
learning, deep learning), and performance metrics (e.g., accuracy, mean absolute error). As a result, we
selected approximately 130 papers from different groups. We especially focused on articles recently
published in prestigious journals within this scope.
Step 5 - Summarizing the Evidence: We extracted relevant data from individual studies and used
established methods, such as statistical methods, to synthesize the data.
Step 6 - Interpreting the Findings: For each RQ, we interpreted the results, summarized
them into review findings, and prepared a comprehensive report on all aspects of our systematic review.
REFERENCES
Ahmad, A., & Dey, L. (2007). A k-mean clustering algorithm for mixed numeric and categorical data.
Data & Knowledge Engineering, 63(2), 503-527. doi:10.1016/j.datak.2007.03.016.
Ahmadinia, M., Alinejad-Rokny, H., & Ahangarikiasari, H. (2014). Data aggregation in wireless sensor
networks based on environmental similarity: A learning automata approach. Journal of Networks,
9(10), 2567-2573. doi:10.4304/jnw.9.10.2567-2573.
Ahmadinia, M., Meybodi, M. R., Esnaashari, M., & Alinejad-Rokny, H. (2013). Energy-efficient and
multi-stage clustering algorithm in wireless sensor networks using cellular learning automata. IETE
Journal of Research, 59(6), 774-782. doi:10.4103/0377-2063.126958.
Alabi, M.O., Nixon, K., & Botef, I. (2018). A survey on recent applications of machine learning with big
data in additive manufacturing industry. American Journal of Engineering and Applied Sciences,
11(3), 1114-1124. doi:10.3844/ajeassp.2018.1114.1124.
Alfaro-Cortes, E., Alfaro-Navarro, J., Gamez, M., & Garcia, N. (2020). Using random forest to interpret
out-of-control signals. Acta Polytechnica Hungarica, 17(6), 115-130.
doi:10.12700/APH.17.6.2020.6.7.
Amruthnath, N., & Gupta, T. (2018). A research study on unsupervised machine learning algorithms for
early fault detection in predictive maintenance. In Proceedings of the 5th International Conference on
Industrial Engineering and Applications (ICIEA), Singapore, 26-28 April 2018, (pp. 355-361).
doi:10.1109/IEA.2018.8387124.
Arellano-Espitia, F., Delgado-Prieto, M., Martinez-Viol, V., Saucedo-Dorantes, J., & Osornio-Rios, R.
(2020). Deep-learning-based methodology for fault diagnosis in electromechanical systems. Sensors,
20(14), 3949. doi:10.3390/s20143949.
Arif, F., Suryana, N., & Hussin, B. (2013). A data mining approach for developing quality prediction
model in multi-stage manufacturing. International Journal of Computer Applications, 69(22), 5-40.
doi:10.5120/12106-8375.
Bai, Y., Li, C., Sun, Z., & Chen, H. (2017). Deep neural network for manufacturing quality prediction.
In Proceedings of the 8th IEEE Prognostics and System Health Management Conference, Harbin,
China, 9-12 July 2017. doi:10.1109/PHM.2017.8079165.
Bai, Y., Sun, Z., Deng, J., Li, L., Long, J., & Li, C. (2018). Manufacturing quality prediction using
intelligent learning approaches: A comparative study. Sustainability, 10, article no 85, 1-15.
doi:10.3390/su10010085.
Bergmann, S., Feldkamp, N., & Strassburger, S. (2017). Emulation of control strategies through machine
learning in manufacturing simulations. Journal of Simulation, 11(1), 38-50. doi:10.1057/s41273-016-
0006-0.
Buddhakulsomsiri, J., & Zakarian, A. (2009). Sequential pattern mining algorithm for automotive
warranty data. Computers & Industrial Engineering, 57(1), 137-147. doi:10.1016/j.cie.2008.11.006.
Bustillo, A., Urbikain, G., Perez, J., Pereira, O., & de Lacalle, L. (2018). Smart optimization of a friction-
drilling process based on boosting ensembles. Journal of Manufacturing Systems, 48 (Part C), 108-
121. doi:10.1016/j.jmsy.2018.06.004.
Cheng, Y., Chen, M., Cheng, F., Cheng, Y., Lin, Y., & Yang, C. (2018). Developing a decision support
system (DSS) for a dental manufacturing production line based on data mining. In Proceedings of 4th
IEEE International Conference on Applied System Innovation 2018 (ICASI 2018), Chiba, Japan, 13-
17 April 2018, (pp. 638-641). doi:10.1109/ICASI.2018.8394336.
Cho, E., Jun, J., Chang, T., & Choi, Y. (2020). Quality prediction modeling of plastic extrusion process.
ICIC Express Letters Part B: Applications, 11(5), 447–452. doi:10.24507/icicelb.11.05.447.
Choudhary, A., Harding, J., & Tiwari, M. (2009). Data mining in manufacturing: a review based on the
kind of knowledge. Journal of Intelligent Manufacturing, 20, 501–521. doi:10.1007/s10845-008-
0145-x.
Cupek, R., Ziebinski, A., Zonenberg, D., & Drewniak, M. (2018). Determination of the machine energy
consumption profiles in the mass-customised manufacturing. International Journal of Computer
Integrated Manufacturing, 31(6), 537-561. doi:10.1080/0951192X.2017.1339914.
Das, B., Pal, S., & Bag, S. (2017). Torque based defect detection and weld quality modelling in friction
stir welding process. Journal of Manufacturing Processes, 27, 8-17.
doi:10.1016/j.jmapro.2017.03.012.
Deng, H., Diao, Y., Wu, W., Zhang, J., Ma, M., & Zhong, X. (2019). A high speed D-CART online fault
diagnosis algorithm for rotor systems. Applied Intelligence, 50, 29-41. doi:10.1007/s10489-019-
01516-2.
Djatna, T., & Alitu, I. (2015). An application of association rule mining in total productive maintenance
strategy: An analysis and modelling in wooden door manufacturing industry. Procedia
Manufacturing, 4, 336-343. doi:10.1016/j.promfg.2015.11.049.
Djelloul, I., Sari, Z., & Sidibe, I. (2018). Fault diagnosis of manufacturing systems using data mining
techniques. In Proceedings of the 5th International Conference on Control, Decision and Information
Technologies, CoDIT 2018, Thessaloniki, Greece, April 10-13, 2018, (pp. 198-203).
doi:10.1109/CoDIT.2018.8394807.
Dolgui, A., Bakhtadze, N., Pyatetsky, V., Sabitov, R., Smirnova, G., Elpashev, D., & Zakharov, E.
(2018). Data mining-based prediction of manufacturing situations. IFAC-PapersOnLine, 51(11), 316-
321. doi:10.1016/j.ifacol.2018.08.302.
Dubey, R., Gunasekaran, A., Childe, S., Bryde, D., Giannakis, M., Foropon, C., Roubaud, D., & Hazen,
B. (2020). Big data analytics and artificial intelligence pathway to operational performance under the
effects of entrepreneurial orientation and environmental dynamism: a study of manufacturing
organizations. International Journal of Production Economics, 226, Article 107599. doi:
10.1016/j.ijpe.2019.107599.
Ferreira, R., Sabbaghi, A., & Huang, Q. (2020). Automatic geometric shape deviation modelling for
additive manufacturing systems via Bayesian neural networks. IEEE Transactions on Automation
Science and Engineering, 72, 584-598. doi:10.1109/TASE.2019.2936821.
Forero-Ramirez, J., Restrepo-Giron, A., & Nope-Rodriguez, S. (2019). Detection of internal defects in
carbon fiber reinforced plastic slabs using background thermal compensation by filtering and support
vector machines. Journal of Nondestructive Evaluation, 38(1), Article Number 33. doi:
10.1007/s10921-019-0569-6.
Fuqua, D., & Razzaghi, T. (2020). A cost sensitive convolutional neural network for control chart pattern
recognition. Expert Systems with Applications, 150, Article Number 113275.
doi:10.1016/j.eswa.2020.113275.
Gandhi, K., Schmidt, B., & Ng, A. (2018). Towards data mining based decision support in manufacturing
maintenance. Procedia CIRP, 72, 261-265. doi:10.1016/j.procir.2018.03.076.
Gao, K., Chen, H., Zhang, X., Ren, X., Chen, J., & Chen, X. (2019). A novel material removal prediction
method based on acoustic sensing and ensemble XGBoost learning algorithm for robotic belt grinding
of Inconel 718. International Journal of Advanced Manufacturing Technology, 105(1-4), 217-232.
doi:10.1007/s00170-019-04170-7.
Ge, Z., Song, Z., Ding, S., & Huang, B. (2017). Data mining and analytics in the process industry: The
role of machine learning. IEEE Access, 5, 20590-20616. doi:10.1109/ACCESS.2017.2756872.
Han, J., & Chi, S. (2016). Consideration of manufacturing data to apply machine learning methods for
predictive manufacturing. In Proceedings of the Eighth International Conference on Ubiquitous and
Future Networks (ICUFN), July 5-8, 2016, Vienna, Austria, (pp. 109-113).
doi:10.1109/ICUFN.2016.7536995.
Harding, J., Shahbaz, M., Srinivas, S., & Kusiak, A. (2005). Data mining in manufacturing: A review.
Journal of Manufacturing Science and Engineering, 128(4), 969-976. doi:10.1115/1.2194554.
Herrera, A., Stoyanov, S., Bailey, C., Walshaw, C., & Yin, C. (2019). Data analytics to reduce stop-on-
fail test in electronic manufacturing. Open Computer Science, 9, 200-211. doi:10.1515/comp-2019-
0014.
Hu, L., Zhou, M., Xiang, F., & Feng, Q. (2018). Modeling and recognition of steel-plate surface defects
based on a new backward boosting algorithm. International Journal of Advanced Manufacturing
Technology, 94(9), 4317-4328. doi:10.1007/s00170-017-1113-4.
Huang, J., Segura, L., Wang, T., Zhao, G., Sun, H., & Zhou, C. (2020). Unsupervised learning for droplet
evolution prediction and process dynamics understanding in inkjet printing. Additive Manufacturing,
35, 101197. doi:10.1016/j.addma.2020.101197.
Huang, Y., Pan, C., Lin, S., & Guo, M. (2018). Machine-learning approach in detection and classification
for defects in TSV-based 3-DIC. IEEE Transactions on Components, Packaging and Manufacturing
Technology, 8(4), 699-706. doi:10.1109/TCPMT.2017.2788896.
Imoto, K., Nakai, T., Ike, T., Haruki, K., & Sato, Y. (2019). A CNN-based transfer learning method for
defect classification in semiconductor manufacturing. IEEE Transactions on Semiconductor
Manufacturing, 32(4), 455-459. doi:10.1109/TSM.2019.2941752.
Ishizuka, D., Izui, K., Yamada, T., & Nishiwaki, S. (2016). Comparison of data mining techniques for
analysis of pareto optimal solutions in layout planning problems in manufacturing systems. In
Proceedings of the 46th International Conferences on Computers and Industrial Engineering (CIE
2016), Tianjin, China, 29-31 October, 2016.
Ismail, R., Othman, Z., & Abu Bakar, A. (2012). A production schedule generator framework for pattern
sequential mining. In Proceedings of the 7th International Conference on Computing and
Convergence Technology (ICCIT 2012), Seoul, South Korea, 3-5 December 2012, (pp. 784-788).
Jin, C., Na, H., Piao, M., Pok, G., & Ryu, K. (2019). A novel DBSCAN-based defect pattern detection
and classification framework for wafer bin map. IEEE Transactions on Semiconductor
Jong de, W. A., Rubrico, J., Adachi, M., Nakamura, T., & Ota, J. (2017). Big data in automation:
Towards generalized makespan estimation in shop scheduling problems. In Proceedings of the 13th
IEEE Conference on Automation Science and Engineering (CASE), Xi'an, China, August 20-23, 2017,
(pp. 1516-1521). doi:10.1109/COASE.2017.8256319.
Jung, Y-S., & Chang, T-W. (2018). Text mining based online news analysis about smart factory. ICIC
Express Letters, Part B: Applications, 9(6), 559-565. doi:10.24507/icicelb.09.06.559.
Kamsu-Foguem, B., Rigal, F., & Mauget, F. (2013). Mining association rules for the quality
improvement of the production process. Expert Systems with Applications, 40(4), 1034-1045.
doi:10.1016/j.eswa.2012.08.039.
Kannatey-Asibu, E., Yum, J., & Kim, T. (2017). Monitoring tool wear using classifier fusion.
Mechanical Systems and Signal Processing, 85, 651-661. doi:10.1016/j.ymssp.2016.08.035.
Kao, H., Hsieh, Y., Chen, C., & Lee, J. (2017). Quality prediction modeling for multistage manufacturing
based on classification and association rule mining. In Proceedings of the 2nd International
Conference on Precision Machinery and Manufacturing Technology, ICPMMT 2017, Pingtung,
Taiwan, 19-21 May 2017, MATEC Web of Conferences, 123, 21 September 2017, Article number
00029, 1-6. doi:10.1051/matecconf/201712300029.
Kerdrasop, K., & Kerdrasop, N. (2011). A data mining approach to automate fault detection model
development in the semiconductor manufacturing process. International Journal of Mechanics, 5(4),
336-344.
Kim, A., Oh, K., Jung, J., & Kim, B. (2018). Imbalanced classification of manufacturing quality
conditions using cost-sensitive decision tree ensembles. International Journal of Computer Integrated
Manufacturing, 31(8), 701-717. doi:10.1080/0951192X.2017.1407447.
Kim, D., Kang, P., Cho, S., Lee, H., & Doh, S. (2012). Machine learning-based novelty detection for
faulty wafer detection in semiconductor manufacturing. Expert Systems with Applications, 39(4),
4075-4083. doi:10.1016/j.eswa.2011.09.088.
Kim, J., Han, Y., & Lee, J. (2016). Euclidean distance based feature selection for fault detection
prediction model in semiconductor manufacturing process. Advanced Science and Technology
Letters, 133, 85-89.
Kim, J., Han, Y., & Lee, J. (2017). Particle swarm optimization–deep belief network–based rare class
prediction model for highly class imbalance problem. Concurrency and Computation Practice and
Experience, 29, 1-11. doi:10.1002/cpe.4128.
Ko, T., Lee, J., Cho, H., & Cho, S. (2017). Machine learning-based anomaly detection via integration of
manufacturing, inspection and after-sales service data. Industrial Management & Data Systems,
117(5), 927-945. doi:10.1108/IMDS-06-2016-0195.
Koksal, G., Batmaz, I., & Testik, M. (2011). A review of data mining applications for quality
improvement in manufacturing industry. Expert Systems with Applications, 38(10), 13448-13467.
doi:10.1016/j.eswa.2011.04.063.
Kujawinska, A., Rogalewicz, M., Muchowski, M., & Stankowska, M. (2018). Application of cluster
analysis in making decision about purchase of additional materials for welding process, Smart
Technology. 10-20. doi:10.1007/978-3-319-73323-4_2.
Lee, C., Choy, K., Ho, G., Chin, K., Law, K., & Tse, Y. (2013). A hybrid OLAP-association rule mining
based quality management system for extracting defect patterns in the garment industry. Expert
Systems with Applications, 40(7), 2435-2446. doi:10.1016/j.eswa.2012.10.057.
Lee, H., Kim, Y., & Kim, C. (2017). A deep learning model for robust wafer fault monitoring with sensor
measurement noise. IEEE Transactions on Semiconductor Manufacturing, 30(1), February 2017,
Article number 7744687, 23-31. doi:10.1109/TSM.2016.2628865.
Lee, J., Noh, S., Kim, H., & Kang, Y. (2018). Implementation of cyber-physical production systems for
quality prediction and operation control in metal casting. Sensors, 18(5), 1-17.
doi:10.3390/s18051428.
Lee, K., Cheon, S., & Kim, C. O. (2017). A convolutional neural network for fault classification and
diagnosis in semiconductor manufacturing processes. IEEE Transactions on Semiconductor
Manufacturing, 30(2), May 2017, 135-142. doi:10.1109/TSM.2017.2676245.
Lei, Q., Shao bo, L., & Jing kun, C. (2017). Online monitoring of manufacturing process based on
autoCEP. International Journal of Online Engineering, 13(6), 22-34. doi:10.3991/ijoe.v13i06.6812.
Li, T., Li, Y., An, D., Han, Y., Xu, S., Lu, Z., & Crittenden, J. (2019). Mining of association rules
between industrialization level and air quality to inform high-quality development in China. Journal
of Environmental Management, 246, 564-574. doi:10.1016/j.jenvman.2019.06.022.
Lieber, D., Stolpe, M., Konrad, B., Deuse, J., & Morik, K. (2013). Quality prediction in interlinked
manufacturing processes based on supervised & unsupervised machine learning. In Proceedings of the
Forty Sixth CIRP Conference on Manufacturing Systems 2013, Procedia CIRP, 7, 193-198.
doi:10.1016/j.procir.2013.05.033.
Lim, H., Kim, Y., & Kim, M. (2017). Failure prediction using sequential pattern mining in the wire
bonding process. IEEE Transactions on Semiconductor Manufacturing, 30(3), 285-292.
doi:10.1109/TSM.2017.2721820.
Lingitz, L., Gallina, V., Ansari, F., Gyulai, D., Pfeiffer, A., Sihn, W., & Monostori, L. (2018). Lead time
prediction using machine learning algorithms: A case study by a semiconductor manufacturer.
Procedia CIRP, 72, 1051-1056. doi:10.1016/j.procir.2018.03.148.
Liukkonen, M., & Hiltunen, Y. (2018). Recognition of systematic spatial patterns in silicon wafers based
on SOM and k-means. IFAC-PapersOnLine, 51(2), 1 January 2018, 439-444.
doi:10.1016/j.ifacol.2018.03.075.
Luo, M., Yan, H., Hu, B., Zhou, J., & Pang, C. (2015). A data-driven two-stage maintenance framework
for degradation prediction in semiconductor manufacturing industries. Computers and Industrial
Engineering, 85, 1 September 2015, Article number 4006, 414-422. doi:10.1016/j.cie.2015.04.008.
Lv, S., Kim, H., Zheng, B., & Jin, H. (2018). A review of data mining with big data towards its
applications in the electronics industry. Applied Sciences, 8(4), 582-616. doi:10.3390/app8040582.
Mangal, A., & Kumar, N. (2016). Using big data to enhance the bosch production line performance: A
Kaggle challenge. In Proceedings of the IEEE International Conference on Big Data. December 5-8,
Washington D.C, USA, (pp. 2029-2035). doi:10.1109/BigData.2016.7840826.
Meidan, Y., Lerner, B., Rabinowitz, G., & Hassoun, M. (2011). Cycle-time key factor identification and
prediction in semiconductor manufacturing using machine learning and data mining. IEEE
Transactions on Semiconductor Manufacturing, 24(2), 237-248. doi:10.1109/TSM.2011.2118775.
Minaei-Bidgoli, B., Asadi, M., & Parvin, H. (2011). An ensemble based approach for feature selection.
In L. Iliadis, & C. Jayne (Eds), Engineering Applications of Neural Networks. Advances in
Information and Communication Technology, 363, 240-246. doi:10.1007/978-3-642-23957-1_27.
Minaei-Bidgoli, B., Parvin, H., Alinejad-Rokny, H., Alizadeh, H., & Punch, W. F. (2014). Effects of
resampling method and adaptation on clustering ensemble efficacy. Artificial Intelligence Review, 41,
27-48. doi: 10.1007/s10462-011-9295-x.
Mohammadi, P., & Wang, Z. (2016). Machine learning for quality prediction in abrasion-resistant
material manufacturing process. In Proceedings of the IEEE Canadian Conference on Electrical and
Computer Engineering. May 15-18, 2016, Vancouver, Canada, (pp. 1-4).
doi:10.1109/CCECE.2016.7726783.
Moldovan, D., Cioara, T., Anghel, I., & Salomie, I. (2017). Machine learning for sensor-based
manufacturing processes. In Proceedings of the 13th IEEE International Conference on Intelligent
Computer Communication and Processing (ICCP), Cluj-Napoca, Romania, 7-9 Sept. 2017, (pp. 147-
154). doi:10.1109/ICCP.2017.8116997.
Munirathinam, S., & Ramadoss, B. (2016). Predictive models for equipment fault detection in the
semiconductor manufacturing process. IACSIT International Journal of Engineering and Technology,
8(4), August 2016. 273-285. doi:10.7763/IJET.2016.V8.898.
Mutlu, N. & Altuntas, S. (2019). Assessment of occupational risks in Turkish manufacturing systems
with data-driven models. Journal of Manufacturing Systems, 53, 169-182.
doi:10.1016/j.jmsy.2019.09.008.
Nagorny, K., Lima-Monteiro, P., Barata, J., & Colombo, A. (2017). Big data analysis in smart
manufacturing: A review. International Journal of Communications, Network and System Sciences,
10(3), 31-58. doi:10.4236/ijcns.2017.103003.
Nakata, K., Orihara, R., Mizuoka, Y., & Takagi, K. (2017). A comprehensive big-data-based monitoring
system for yield enhancement in semiconductor manufacturing. IEEE Transactions on Semiconductor
Nedelkoski, S., & Stojanovski, G. (2017). Machine learning for large scale manufacturing data with
limited information. In Proceedings of the 13th IEEE International Conference on Control and
Automation, 3-6 July 2017, Ohrid, Macedonia, (pp. 70-75). doi:10.1109/ICCA.2017.8003037.
Ong, P-L., Choo, Y-H., & Muda, A. K. (2015). A manufacturing failure root cause analysis in imbalance
data set using PCA weighted association rule mining. Jurnal Teknologi, 77(18), 103-111.
doi:10.11113/jt.v77.6496.
Packianather, M., Davies, A., Harraden, S., Soman, S., & White, J. (2017). Data mining techniques
applied to a manufacturing SME. Procedia CIRP, 62, 123-128. doi:10.1016/j.procir.2016.06.120.
Park, H. & Jung, J. (2020). SAX-ARM: Deviant event pattern discovery from multivariate time series
using symbolic aggregate approximation and association rule mining. Expert Systems with
Applications, 141, Article Number 112950. doi:10.1016/j.eswa.2019.112950.
Parvin, H., Alinejad-Rokny, H., Minaei-Bidgoli, B., & Parvin, S. (2013). A new classifier ensemble
methodology based on subspace learning. Journal of Experimental & Theoretical Artificial
Intelligence, 25(2), 227-250. doi:10.1080/0952813X.2012.715683.
Parvin, H., Alinejad-Rokny, H., & Parvin, S. (2013). A classifier ensemble of binary classifier
ensembles. International Journal of Learning Management Systems, 1(2), 37-47.
doi:10.12785/ijlms/010204.
Parvin, H., Helmi, H., Minaei-Bidgoli, B., Alinejad-Rokny, H., & Shirgahi, H. (2011). Linkage learning
based on differences in local optimums of building blocks with one optima. International Journal of
Physical Sciences, 6(14), 3419-3425. doi:10.5897/IJPS11.798.
Parvin, H., Minaei-Bidgoli, B., & Alinejad-Rokny, H. (2013). A new imbalanced learning and dictions
tree method for breast cancer diagnosis. Journal of Bionanoscience, 7(6), 673-679. doi:
10.1166/jbns.2013.1162.
Parvin, H., Minaei-Bidgoli, B., Alinejad-Rokny, H., & Punch, W. F. (2013). Data weighing mechanisms
for clustering ensembles. Computers & Electrical Engineering, 39(5), 1433-1450.
doi:10.1016/j.compeleceng.2013.02.004.
Parvin, H., MirnabiBaboli, M., & Alinejad-Rokny, H. (2015). Proposing a classifier ensemble framework
based on classifier selection and decision tree. Engineering Applications of Artificial Intelligence, 37,
34-42. doi:10.1016/j.engappai.2014.08.005.
Pavlyshenko, B. (2016). Machine learning, linear and bayesian models for logistic regression in failure
detection problems. In Proceedings of the 2016 IEEE International Conference on Big Data, 5-8 Dec.
2016, Washington, DC, USA, (pp. 2046-2050). doi:10.1109/BigData.2016.7840828.
Pham, D., & Afify, A. (2005). Machine-learning techniques and their applications in manufacturing. In
the Proceedings of the Institution of Mechanical Engineers Part B, Journal of Engineering
Manufacture, 219(5), 395-412. doi:10.1243/095440505X32274.
Pospisil, M., Bartik, V., & Hruska, T. (2016). Analyzing machine performance using data mining. In
Proceedings of the IEEE Symposium Series on Computational Intelligence (SSCI), Athens, Greece, 6-
9 Dec. 2016, (pp. 1-7). doi:10.1109/SSCI.2016.7849923.
Priore, P., Ponte, B., Puente, J., & Gómez, A. (2018). Learning-based scheduling of flexible
manufacturing systems using ensemble methods. Computers & Industrial Engineering, 126, 282-291.
doi:10.1016/j.cie.2018.09.034.
Purnama, Y., Abdullah, Z., Rokhmat, R., & Herawan, T. (2015). Mining interesting least association
rules in manufacturing industry: A case study in MODENAS. International Journal of Control and
Automation, 8(12), 47-64. doi:10.14257/ijca.2015.8.12.05.
Quatrini, E., Costantino, F., Di Gravio, G., & Patriarca, R. (2020). Machine learning for anomaly
detection and process phase classification to improve safety and maintenance activities. Journal of
Manufacturing Systems, 56, 117-132. doi:10.1016/j.jmsy.2020.05.013.
Raktham, T., & Piromsopa, K. (2011). Development of workload models for CNC machines from
3-phase current consumption using ensemble method. In Proceedings of the International Conference
on System Science, Engineering Design and Manufacturing Informatization, ICSEM 2011, Guiyang,
China, 22-23 October 2011, (pp. 102-105). doi:10.1109/ICSSEM.2011.6081155.
Ren, L., Sun, Y., Cui, J., & Zhang, L. (2018). Bearing remaining useful life prediction based on deep
autoencoder and deep neural networks. Journal of Manufacturing Systems, 48(Part C), July 2018, 71-
77. doi:10.1016/j.jmsy.2018.04.008.
Rivetti, N., Busnel, Y., & Gal, A. (2017). FlinkMan: Anomaly Detection in manufacturing equipment
with Apache Flink. In Proceedings of the 11th ACM International Conference on Distributed and
Event-based Systems (DEBS’17), Barcelona, Spain, 19-23 June 2017, (pp. 274-279).
doi:10.1145/3093742.3095099.
Rodriguez-Martin, M., Fueyo, J., Gonzalez-Aguilera, D., Madruga, F., Garcia-Martin, R., Munoz, A., &
Pisonero, J. (2020). Predictive models for the characterization of internal defects in additive materials
from active thermography sequences supported by machine learning methods. Sensors, 20(14), Article number 3982.
doi:10.3390/s20143982.
Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33, 1-39.
doi:10.1007/s10462-009-9124-7.
Rostami, H., Blue, J., & Yugma, C. (2016). Equipment condition diagnosis and fault fingerprint
extraction in semiconductor manufacturing. In Proceedings of the 15th IEEE International
Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 18-20 December
2016, (pp. 534-539). doi:10.1109/ICMLA.2016.0094.
Rostami, H., Dantan, J., & Homri, L. (2015). Review of data mining applications for quality assessment
in manufacturing industry: Support Vector Machines. International Journal of Metrology and Quality
Engineering, 6, 1-18. doi:10.1051/ijmqe/2015023.
Odabasi, Ç., & Yildirim, R. (2020). Machine learning analysis on stability of perovskite solar cells. Solar
Energy Materials and Solar Cells, 205, Article Number 110284. doi:10.1016/j.solmat.2019.110284.
Sand, C., Kunz, S., Hubbert, H., & Franke, J. (2016). Towards an inline quick reaction system for
actuator manufacturing using data mining. In Proceedings of the 6th International Electric Drives
Production Conference (EDPC), Nuremberg, Germany, 30 November - 1 December 2016.
doi:10.1109/EDPC.2016.7851317.
Seyedaghaee, N., Rahati, S., Alinejad-Rokny, H., & Rouhi, F. (2013). An optimized model for the
university strategic planning. International Journal of Basic Sciences & Applied Research, 2(5), 500-
505.
Shao, S-Y., Sun, W-J., Yan, R-Q., Wang, P., & Gao, R.X. (2017). A deep learning approach for fault
diagnosis of induction motors in manufacturing. Chinese Journal of Mechanical Engineering, 30(6),
1347-1356. doi:10.1007/s10033-017-0189-y.
Shotorbani, P., Ameri, F., Kulvatunyou, B., & Ivezic, N. (2016). A hybrid method for manufacturing text
mining based on document clustering and topic modeling techniques. In Proceedings of the
International Conference on Advances in Production Management System (APMS 2016), (pp. 777-
786). doi:10.1007/978-3-319-51133-7_91.
Stanisavljevic, D., & Spitzer, M. (2016). A review of related work on machine learning in semiconductor
manufacturing and assembly lines. In Proceedings of the 16th International Conference on
Knowledge Technologies and Data Driven Business, Graz, Austria, 18-19 October 2016.
Susto, G., Terzi, M., & Beghi, A. (2017). Anomaly detection approaches for semiconductor
manufacturing. Procedia Manufacturing, 11, 2018-2024. doi:10.1016/j.promfg.2017.07.353.
Syafrudin, M., Alfian, G., Fitriyani, N., & Rhee, J. (2018). Performance analysis of IoT-based sensor, big
data processing, and machine learning model for real-time monitoring system in automotive
manufacturing. Sensors, 18(9), Article number 2946. doi:10.3390/s18092946.
Tian, Y., Fu, M., & Wu, F. (2015). Steel plates fault diagnosis on the basis of support vector machines.
Neurocomputing, 151(1), 296-303. doi:10.1016/j.neucom.2014.09.036.
Tootooni, M.S., Dsouza, A., Donovan, R., Rao, P.K., Kong, Z., & Borgesen, P. (2017). Classifying
the dimensional variation in additive manufactured parts from laser-scanned three-dimensional point
cloud data using machine learning approaches. Journal of Manufacturing Science and Engineering,
Transactions of the ASME, 139(9), Article number 091005. doi:10.1115/1.4036641.
Traini, E., Bruno, G., D’Antonio, G., & Lombardi, F. (2019). Machine learning framework for predictive
maintenance in milling. IFAC-PapersOnLine, 52(13), 177-182. doi:10.1016/j.ifacol.2019.11.172.
Wang, J., Wang, K., Wang, Y., Huang, Z., & Xue, R. (2018). Deep Boltzmann machine based condition
prediction for smart manufacturing. Journal of Ambient Intelligence and Humanized Computing, 21
April 2018, 1-11. doi:10.1007/s12652-018-0794-3.
Wang, K. (2007). Applying data mining to manufacturing: the nature and implications. Journal of
Intelligent Manufacturing, 18(4), 487-495. doi:10.1007/s10845-007-0053-5.
Wang, K. (2013). Towards zero-defect manufacturing (ZDM)—a data mining approach. Advances in
Manufacturing, 1(1), 62-74. doi:10.1007/s40436-013-0010-9.
Wang, K., Tong, S., & Eynard, B. (2007). Review on application of data mining in product design and
manufacturing. In Proceedings of the Fourth International Conference on Fuzzy Systems and
Knowledge Discovery (FSKD 2007), Haikou, China, 24-27 August 2007.
doi:10.1109/FSKD.2007.482.
Wang, Q., Jiao, W., Yu, R., Johnson, M., & Zhang, Y. (2019). Modeling of human welders’ operations
in virtual reality human-robot interaction. IEEE Robotics and Automation Letters, 4(3), 195775490.
doi:10.1109/LRA.2019.2921928.
Wang, Y., Li, K., & Gan, S. (2018). A kernel connectivity-based outlier factor algorithm for rare data
detection in a baking process. IFAC-PapersOnLine, 51(18), 1 January 2018, 297-302.
doi:10.1016/j.ifacol.2018.09.316.
Waschneck, B., Reichstaller, A., Belzner, L., Altenmuller, T., Bauernhansl, T., Knapp, A., & Kyek, A.
(2018). Deep reinforcement learning for semiconductor production scheduling. In Proceedings of the
29th Annual SEMI Advanced Semiconductor Manufacturing Conference, ASMC 2018, Saratoga
Springs, United States, 30 April-3 May, 2018, (pp. 301-306). doi:10.1109/ASMC.2018.8373191.
Weichert, D., Link, P., Stoll, A., Ruping, S., Ihlenfeldt, S., & Wrobel, S. (2019). A review of machine
learning for the optimization of production processes. The International Journal of Advanced
Manufacturing Technology, 104, 1889-1902. doi:10.1007/s00170-019-03988-5.
Wen, L., Gao, L., & Li, X. (2019). A new deep transfer learning based on sparse auto-encoder for fault
diagnosis. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 49(1), 136-144.
doi:10.1109/TSMC.2017.2754287.
Wu, C., Zhou, F., Tsai, C., Yu, C., & Dauzere-Peres, S. (2020). A deep learning approach for the
dynamic dispatching of unreliable machines in re-entrant production systems. International Journal of
Production Research, 58(9), 2822-2840. doi:10.1080/00207543.2020.1727041.
Wuest, T., Irgens, C., & Thoben, K. (2014). An approach to monitoring quality in manufacturing using
supervised machine learning on product state data. Journal of Intelligent Manufacturing, 25(5), 1167-
1180. doi:10.1007/s10845-013-0761-y.
Wuest, T., Weimer, D., Irgens, C., & Thoben, K. (2016). Machine learning in manufacturing:
advantages, challenges, and applications. Production & Manufacturing Research, 4(1), 23-45.
doi:10.1080/21693277.2016.1192517.
Yasutomi, A., & Enoki, H. (2020). Localization of inspection device along belt conveyors with multiple
branches using deep neural networks. IEEE Robotics and Automation Letters, 5(2), 2921-2928.
doi:10.1109/LRA.2020.2974709.
Zhang, X., Kano, M., Tani, M., Mori, J., Ise, J., & Harada, K. (2020). Prediction and causal analysis of
defects in steel products: handling nonnegative and highly overdispersed count data. Control
Engineering Practice, 95, Article Number 104258. doi:10.1016/j.conengprac.2019.104258.
Zhang, Y., Ren, S., Liu, Y., & Si, S. (2017). A big data analytics architecture for cleaner manufacturing
and maintenance processes of complex products. Journal of Cleaner Production, 142(2), 626-641.
doi:10.1016/j.jclepro.2016.07.123.
Zhao, R., Yan, R., Chen, Z., Mao, K., Wang, P., & Gao, R. (2019). Deep learning and its applications
to machine health monitoring. Mechanical Systems and Signal Processing, 115, 213-237.
doi:10.1016/j.ymssp.2018.05.050.
Zhou, J., Li, X., Wang, M., Niu, R., & Xu, Q. (2017). Thinking process rules extraction for
manufacturing process design. Advances in Manufacturing, 5(4), 321-334. doi:10.1007/s40436-017-
0205-6.
Zidek, K., Maxim, V., Pitel, J., & Hosovsky, A. (2016). Embedded vision equipment of industrial robot
for inline detection of product errors by clustering–classification algorithms. International Journal of
Advanced Robotic Systems, 13(5), 1-10. doi:10.1177/1729881416664901.
Declaration of interests
☒ The authors declare that they have no known competing financial interests or personal relationships
that could have appeared to influence the work reported in this paper.
☐ The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests:
Highlights
• Providing an extensive overview about the use of machine learning in manufacturing.
• Reviewing state-of-the-art studies relevant to data mining in manufacturing.
• Presenting manufacturing tasks grouped under supervised and unsupervised learning.
• Addressing a number of research questions that are unanswered in literature.
• Discussing benefits, challenges and possible further research directions in the area.

