
Computers in Industry 129 (2021) 103451

Contents lists available at ScienceDirect

Computers in Industry
journal homepage: www.elsevier.com/locate/compind

Data-driven failure mode and effect analysis (FMEA) to enhance maintenance planning

Marc-André Filz ∗, Jonas Ernst Bernhard Langner, Christoph Herrmann, Sebastian Thiede
TU Braunschweig, Institute of Machine Tools and Production Technology, TU Braunschweig Institut für Werkzeugmaschinen und Fertigungstechnik, Langer Kamp 19 b, 38106, Braunschweig, Germany

Article info

Article history:
Received 12 November 2020
Received in revised form 11 February 2021
Accepted 22 March 2021
Available online 3 April 2021

Keywords:
Failure mode and effect analysis (FMEA)
Maintenance
Fault prediction
Decision support
Data analytics
Deep learning
Artificial intelligence

Abstract

Nowadays, the availability of data from the manufacturing environment, such as process- and operation-related data or records of past maintenance activities, enables new possibilities for advanced data analytics like the prediction of failure behavior. Possible predictions could consider faults of specific components or even the current product and component properties.
The paper presents a data-driven Failure Mode and Effect Analysis (FMEA) methodology that applies deep learning models to historical and operational data from the use stage of industrial investment goods. The developed methodology is intended to support the maintenance planning of industrial investment goods by enhancing transparency and providing decision support.
The developed framework is applied to and validated by a case study from the aviation sector. The results show that the accuracy of the fault prediction is around 95 %. By integrating these results into a data-driven FMEA framework, risk and failure occurrence estimations are no longer subjective. In particular, the estimation of failure probabilities no longer depends solely on the experience and knowledge of employees.
© 2021 Elsevier B.V. All rights reserved.

∗ Corresponding author.
E-mail addresses: m.filz@tu-braunschweig.de (M.-A. Filz), jonas.langner@tu-braunschweig.de (J.E.B. Langner), c.herrmann@tu-bbraunschweig.de (C. Herrmann), s.thiede@utwente.nl (S. Thiede).

https://doi.org/10.1016/j.compind.2021.103451
0166-3615/© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Maintenance processes account for a large share of the total cost over the life cycle of industrial equipment. Studies underline that energy and maintenance are typically major cost drivers over an industrial product life cycle. Therefore, it is a major task for production companies to minimize maintenance costs and optimise related processes like spare part supply. Next to this economic perspective, maintenance processes also have an impact on environmental variables over the life cycle of a product or equipment. This is especially important when logistical or operational processes are included in addition to the basic maintenance activities (Herrmann et al., 2007a; Hinow and Mevissen, 2011; Xie et al., 2018; Herrmann et al., 2007b).

Digitalization is a major technological development trend in industrial applications, often summarized under the term Industry 4.0. The core of this concept is the communication between machines, between machines and humans, or between machines and workpieces. Within these procedures, huge amounts of data can be acquired. The gathered data usually comes from different sources (Kagermann et al., 2013; Kang et al., 2016; Monostori, 2014). Utilizing these data is of high importance for industry in order to stay competitive. From a research perspective, combining and analyzing the data to derive usable knowledge and decision support is a major goal (Kang et al., 2016; Monostori, 2014; Thiede, 2018). The acquired data from the physical world can be modelled by using data mining methods. These methods are specialized in extracting knowledge hidden in vast amounts of data (Fayyad et al., 1996).

With increasing digitalization and the associated growth in data volume and derived knowledge, different challenges within the field of maintenance can be overcome. Value loss through wear and unused equipment can be reduced by monitoring equipment status and applying the associated data analytics. In addition, the behavior of components and parts can be predicted by means of modeling and simulation. In this way, the lack of transparency regarding machine and equipment states can be overcome and a basis for improving maintenance planning can be provided.

Furthermore, the improvement of maintenance planning can be supported by component-specific prediction of failure behaviour. Thus, spare part supply and storage processes as well as the maintenance scheduling of industrial investment goods can be optimized. In addition, decision support can be made available to workers with different training levels to ensure consistent process and quality standards. This can also be used to optimize the planning of personnel resources.

Therefore, a data-based FMEA framework to enhance maintenance planning for industrial investment goods such as production equipment or airplanes has been developed and validated. Using historical and operational data as a source of knowledge, data analytics tools are applied to predict component-specific failure probabilities. The results are integrated into an FMEA methodology to ensure a dynamic risk evaluation of specific components and higher-level modules. Moreover, this framework is intended to increase the accuracy of maintenance planning by reducing the uncertainties regarding specific failure probabilities in order to achieve economic and environmental improvements of maintenance activities.

2. Theoretical background of maintenance and fault prognosis

With the focus on industrial investment goods such as production equipment or airplanes, different methodological approaches for fault prognosis related to maintenance activities exist. Traditionally, these are mainly performed manually and are integrated into a higher-level risk or quality assessment procedure. Mostly, these approaches are used for condition-based maintenance and for prognostics in component health management. Due to ongoing digitalization, data from different sources such as sensors can be used for fault prognosis in the context of maintenance processes.

2.1. Maintenance terminology

Maintenance has been an important topic in machinery and equipment research for several years. Besides standards and configuration alternatives, there are several studies and calculation models to derive optimal strategies under certain circumstances (Herrmann et al., 2007a; Wang, 2002; Nilsson and Bertling, 2007). To create a common understanding of the term maintenance, a hierarchical framework of maintenance terminology is presented in Fig. 1 based on literature findings (DIN, 31051; Wang, 2002; Takata et al., 2004).

Fig. 1. Framework of Maintenance Terminology based on (DIN, 31051; Wang, 2002; Takata et al., 2004; Herrmann et al., 2007a).

The hierarchical framework divides the maintenance terminology into four different layers: concepts, strategies, activities and operations. A maintenance activity (e.g. on an industrial investment good) can be differentiated into service, inspection and repair. Different operations are assigned to each of these activities. For example, the activity repair includes the operations replace or restore (Herrmann et al., 2007b; Takata et al., 2004; Wang, 2002).

On the next higher layer, these activities can be assigned to different maintenance strategies. Basically, three different maintenance strategies exist within this layer, which can be classified into the two main types of corrective and preventive strategies. These strategies have evolved over time. In the case of the corrective strategy, maintenance is conducted after a breakdown and only repair activities are performed. Since this strategy leads to a high risk of failure costs due to unplanned production stops, manufacturing companies pursue preventive maintenance strategies. These can be divided into condition-based and time-based strategies. Time-based strategies are triggered by the age of the components or parts or by fixed time intervals. Condition-based strategies, in contrast, depend on predefined thresholds of certain parameters like deterioration symptoms. Counted repairs or fixed time limits trigger activities like service or inspection (Herrmann et al., 2007b; Takata et al., 2004; Wang, 2002; Sheut and Krajewski, 1994).

In reality, a mixture of diverse maintenance activities can be observed, which is also discussed in the literature. For systems with multiple components or parts, the question arises whether to handle certain components or parts in isolation or to use superordinate measures that handle the system as a whole. Moreover, to define the
specific maintenance degree of executing activities is of great importance, since this has a direct influence on the success of the maintenance activity in terms of cost and availability of the equipment. Therefore, on the highest layer, different concepts aggregate the above-mentioned strategies, activities and operations. These include concepts like Reliability Centered Maintenance (RCM), Risk Based Maintenance (RBM) or Total Productive Maintenance (TPM). These concepts consider overarching trade-offs and influences and, in order to achieve certain goals, combine different maintenance strategies (Herrmann et al., 2007b; Takata et al., 2004; Gits, 1992).

RCM focusses on the systematic consideration of system functions and failure modes by optimizing preventive maintenance activities based on reliability-related criteria. Owing to its success in cost reduction and availability improvement in aviation, this concept has been transferred to other industries (Guo et al., 2016; Dekker, 1996). TPM, in contrast, is a production-driven concept that strives to improve equipment reliability and the efficient management of plant assets and equipment (Robinson and Ginder, 2007; Ahuja and Khamba, 2008).

2.2. Failure mode and effect analysis

The FMEA is a widely used and recognized method in industry to determine defect conditions of components, parts or larger investment goods preventively. FMEA is often used during the product design and development phases and aims at preventing errors and failures already during the development stage. This can increase the reliability of individual components or even an entire system (Mohanty, 2014; Linß, 2016).

In addition, when FMEA is applied during the operational phase, faults are systematically detected and their effects can be analysed in order to initiate targeted preventive or corrective measures, e.g. in the context of maintenance (Geiger and Kotte, 2008). Fig. 2 shows the main steps of the FMEA process.

Fig. 2. Main steps of FMEA (based on (Liu, 2016)).

First, a structural analysis of the system is carried out to determine the scope of the FMEA. The results contain the structure and limits of the system. Based on this, a functional analysis is performed. Moreover, within the second step, a cross-functional team of subject matter experts from various disciplines is built, since the FMEA cannot be done by one person alone.

After the function of the system has been described, various possible failure modes are determined in step three. The concrete failure analysis is carried out in this process step. Based on these results, the risk of each failure mode is quantified in the fourth step by rating the probability of occurrence (O), the probability of detection (D) and the severity (S) of the fault sequence on a scale from 1 to 10.

A risk priority number (RPN) is calculated for each failure mode in step 5:

RPN = Occurrence × Severity × Detection = O × S × D    (1)

A high value of S indicates a severe effect of the fault/error on the targeted system, a high value of D indicates a small probability of finding the fault/error, and a high value of O indicates a high probability that the fault/error occurs. On the basis of this evaluation, various measures are defined to prevent faults or, in the worst case, breakdowns (Wälder and Wälder, 2017; Mohanty, 2014).

The RPN helps to identify the highest risks and to recommend actions (step 6). To this end, the failure modes are prioritized by ranking them by their RPN value in decreasing order. Actions should be developed and recommended especially for high-risk failure modes in order to enhance the overall system performance. These actions can be divided into three categories: “eliminating failure modes”, “increasing failure detectability” and “minimizing loss in case a failure occurs” (Liu, 2016).

Although FMEA is widely used in industry, the method has major shortcomings. The traditional method is based on equally weighted factors for the three indicators occurrence probability, severity and detectability, even though equal importance rarely holds in practical use cases. In addition, different combinations of scores of the three indicators may lead to the same risk priority number although the risk impact differs completely from case to case. For example, O = 2, S = 5, D = 10 and O = 10, S = 10, D = 1 both yield an RPN of 100. In the worst case, this can lead to unnoticed high-risk failure modes (Chang and Cheng, 2010; Pillay and Wang, 2003; Vahdani et al., 2015).

Due to the methodological approach of the FMEA, the three risk factors are determined by selected team members. Therefore, the information in an FMEA is often vague or uncertain. Instead of being quantified, the risk factors are often expressed linguistically, e.g. as “very high”, “important” or “likely”. Furthermore, the FMEA is conducted by “experts” and is hence subject to subjectivity and incompleteness (Liu, 2016).

If the risk factors are analyzed more closely, it can be seen that the direct and indirect relationships between failure modes and different causes of failure are not taken into account (Liu, 2016). Furthermore, the FMEA combines three risk factors within the RPN. However, for the planning and efficient operation of industrial investment goods, the prediction of the individual failure occurrence probability is particularly crucial. Existing approaches to determining the occurrence therefore have a significant disadvantage: they are qualitative and difficult to track or verify.

2.3. State of the art and research on FMEA and maintenance activities

In the literature, different methods have been proposed to enhance maintenance processes for industrial investment goods. Roy et al. (2016) present the foundations and technologies required to offer a continuous maintenance service for high-value products such as machine tools, aircraft engines or trains. The paper covers technologies that are relevant at both the component and the whole-system level. Moreover, approaches for workshop-based and “in-situ” maintenance of large industrial goods (e.g. gas turbines) are included, whereas short life cycle products such as consumer goods are excluded. The paper concludes with a summary of key technologies for maintenance such as Industry 4.0 or big data, and discusses the major challenges in improving the availability of long-life equipment at optimum through-life cost (Roy et al., 2016).

Chen (2013) introduces an autonomous preventive maintenance approach that integrates FMEA techniques with root cause analysis to improve manufacturing and equipment efficiency and to provide decision support to workers. The method is carried out manually by teams of experts and is applied to and verified by means of a use case from the semiconductor industry (Chen, 2013).

Sharma and Sharma (2010) present an approach for system failure behaviour analysis and maintenance decision making using root cause analysis, FMEA and fuzzy methodology. They provide an integrated framework to model, analyse and predict the behaviour of industrial systems in a more realistic and consistent manner and to plan suitable maintenance strategies (Sharma and Sharma, 2010).

Tracht et al. (2013) propose an approach for failure probability prediction for spare parts supply based on condition monitoring data. They state that condition monitoring systems are installed in many companies, but that the information is barely used to predict the failure probability of machines and single units. Therefore, they developed an enhanced prediction model that considers several data sources such as SCADA data. The model itself is based on a proportional hazards model and predicts the spare part demand in order to prevent long spare part lead times and to reduce the need for stock keeping (Tracht et al., 2013).

In the publication of Lee et al. (2008), the evolution of CAD, CAM and CAE tools through product data management systems into product life cycle management (PLM) is analysed. Moreover, based on two case studies, current practices and potential applications of PLM in aviation maintenance, repair and overhaul are discussed (Lee et al., 2008).

Yu et al. (2013) introduce a preventive maintenance cycle optimization model for metro vehicles based on a genetic algorithm. To obtain optimal maintenance cycles, statistically analysed fault data and the Heinrich safety rule are integrated. The model can be used for decision support in maintenance departments (Yu et al., 2013).

With focus on specific machine tool components, Lanza et al. (2009) introduce an approach for dynamic preventive maintenance schedule optimization. To this end, a stochastic model based on the Weibull cumulative damage generalized log-linear model and Monte Carlo simulation is suggested. Furthermore, the approach considers the effect of the dynamic optimization on component selection for different maintenance strategies. The authors state that the ideal strategy can be selected from corrective, preventive and condition-based maintenance for every component of a complex system (Lanza et al., 2009).

Chukwuekwe et al. (2016) present a conceptual framework that uses data mining and smart algorithms for a closed-loop feedback, data-driven predictive maintenance system to be implemented within an Industry 4.0 environment. Based on literature findings and the developed concept, the authors expect it to have the potential to reduce maintenance frequency and increase safety around production equipment (Chukwuekwe et al., 2016).

Bukkapatnam et al. (2019) provide a long-term prognosis of machine breakdowns based on a manufacturing-system-wide non-parametric machine learning approach that can model complex dynamic dependencies. For this purpose, a Balanced Random Survival Forest algorithm is trained with data from shop floor automation and information systems. The approach is applied and validated within an automotive manufacturing use case of 20 machines (Bukkapatnam et al., 2019).

Wang et al. (2019) present a novel hybrid machine learning tool that uses heterogeneous production data for tool condition prognosis. To this end, structured process parameters, unstructured power profiles and tool wear images are integrated as the data basis. The methodology first analyses wear images with a convolutional neural network. The results are subsequently fed into a recurrent neural network to deduce the relationship between tool condition degradation and power profiles. The developed tool is applied and validated within a milling use case with different materials (Wang et al., 2019).

Ferreira et al. (2016) discuss the advantages and benefits of predictive maintenance that can be achieved through the Industry 4.0 concept by focussing on remote monitoring and self-diagnosis of the health condition of the equipment. The main focus is on the data acquisition and analysis process for predictive maintenance algorithm development. The concept is evaluated based on a frequency analysis of current signals in a machine spindle (Ferreira et al., 2016).

With focus on existing data sources, Villarini et al. (2017) introduce a new assessment of reliability centred maintenance for photovoltaic systems, carried out by using a failure mode and effect analysis approach. To this end, large amounts of data from real maintenance activities are used to derive a more realistic analysis. The data is interpreted and analysed by a team of experts from different divisions, such as maintenance, engineering or operation. The aim of this approach is to improve maintenance actions and to optimize their effectiveness by concentrating on the selected failure modes that most affect the system (Villarini et al., 2017).

Liu et al. (2016) developed a new risk priority model that uses an extended qualitative flexible multiple criteria method for handling FMEA problems with incomplete weight information. This addresses various uncertainties in the assessment information of FMEA team participants. The model is applied to and validated by a case study on healthcare risk analysis (Liu et al., 2016).

Yazdi et al. (2020) extend the classical FMEA approach by considering group decision making in a fuzzy environment that is sensitive to all inputs, including linguistic variables and the importance weights of experts. This approach, called fuzzy developed FMEA, is applied to an aircraft landing system (Yazdi et al., 2020).

Yang et al. (2013) propose a framework for data-mining-based fault isolation with FMEA rank. They use updated FMEA parameters to rank data-driven models and apply the framework to a case study from an APU, focusing on a predefined failure mode. The results show that the method is effective for fault isolation and identification in complex systems maintenance.

2.4. Research demand

To get a comprehensive overview of the reviewed papers on the state of the art and research regarding maintenance activities and FMEA in general, a matrix is used to assess and document the scope of each paper. This matrix (see Table 1) not only helps to classify each paper, but also helps to identify the research gap and therefore to justify the need for this research. The matrix covers a total of four areas: the scope, the application field, the characteristics of the methodology and the modelling approach. Within the scope, each paper is assessed by checking whether it focusses on maintenance planning or fault prognosis and whether a structured, generically applicable approach was implemented. Within the second area, each paper is assessed by its application field, i.e. whether it focusses on fault prediction or FMEA at component level, at part level or on industrial investment goods. These criteria are used to ensure applicability to various aspects in the context of maintenance. The third area covers the characteristics of the methodology used. Here, it is checked whether the approaches described in the paper consider expert knowledge and operational data, and whether some kind of decision support tool is developed and integrated into the respective results. Moreover, each paper is
checked against the modelling approach used. Here, it is checked whether any kind of data analytics, a standard FMEA or a hybrid modelling approach was used. In this context, a hybrid approach is defined as a standard FMEA that uses data analytics to calculate at least one of the required FMEA parameters based on existing operational data. Table 1 gives a graphical overview of the assessed publications, distinguishing between completely, partially and not fulfilled criteria.

Table 1
Overview of literature review.

On the one hand, the literature review shows that FMEA is a widely used risk assessment tool for industrial applications. Especially for industrial investment goods with high values and long lives, it is an important preventive management approach to identify potential faults already during the engineering and planning phase, or to prevent them from occurring during operation, in order to reduce the overall life cycle costs. However, the traditional RPN calculation has some major shortcomings that have been discussed extensively in the literature. Especially when it comes to the planning and operation of efficient maintenance activities, dealing with uncertainties is indispensable and possible faults need to be predicted and identified. On the other hand, the literature review illustrates that, due to technological development, increased amounts of data from the use stage of products can be acquired and stored. This enables new possibilities like advanced data analytics applications. Nevertheless, combining and analysing this data to derive usable knowledge is a major challenge.

However, there is a significant research gap in combining historical operating data from the use stage with a structured procedure for fault prognosis and FMEA in order to provide targeted, comprehensible and feasible data-driven decision support that can be used for dynamic maintenance planning of high-value industrial investment goods such as production equipment and airplanes.

3. Development of a data-driven FMEA methodology

With reference to the defined research gap, a data-driven FMEA methodology to support the planning of maintenance activities is developed. Fig. 3 shows the data-driven FMEA methodology based on the framework of a cyber-physical production system (CPPS). Basically, CPPS can be defined as “systems of collaborating computational entities which are in intensive connection with the surrounding physical world and its on-going processes, providing and using, at the same time, data-accessing and data-processing services” (Kang et al., 2016).

The framework is divided into four areas: physical world (I), data acquisition (II), cyber world (III) and decision support (IV). Since the process is repeated continuously, the latest operational data is always used to optimize the fault prediction algorithm and therefore the data-driven FMEA. The physical world (I) represents actual maintenance activities and KPIs of the considered technical equipment, such as machines. The data acquisition (II) acquires operational data from the observed components or parts and performs the data preprocessing. Within the cyber world (III), the preprocessed data is used for fault prediction, on the basis of which a data-driven FMEA is performed. The decision support (IV) provides recommendations for maintenance actions based on the data-driven FMEA. To standardize the data-driven FMEA, the “Cross Industry Standard Process for Data Mining” (CRISP-DM) is used. Therefore, the following sections are aligned with the process steps of CRISP-DM (see Fig. 3).

3.1. Business understanding

An essential step within the calculation of FMEA risk priority numbers is the selection of appropriate values for S, O and D. From a business perspective, new data analytics approaches could be used to select values for the occurrence (O) based on data, which will help to make the results of the FMEA more reliable and less dependent on individuals and their subjective assessment.

With the proposed framework, the powerful FMEA method and new data analytics techniques are combined. To use data analytics in this context, data regarding the components/products and their use/exposure to harmful environmental influences needs to be accessible or acquired. If enough past maintenance data is available, an algorithm can be trained to predict the occurrence of faults, which can then be used as input for the calculation of the RPN.
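To make the role of a data-driven occurrence value concrete, the following Python sketch computes Eq. (1) for a few failure modes and ranks them in decreasing RPN order as in step 6 of the FMEA. It assumes that O is obtained from a predicted fault probability by scaling it to 1–10, rounding half up and clipping; this rule reproduces the conversion examples given later in the text (84 % = 8, 27 % = 3) but is an assumption, not necessarily the authors' exact mapping. The failure-mode names, probabilities and S/D ratings are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def probability_to_occurrence(p: float) -> int:
    """Map a predicted fault probability in [0, 1] to an occurrence rating O in 1..10.

    Assumed rule (not taken from the paper): scale by 10, round half up and
    clip to 1..10. It reproduces the examples 0.84 -> 8 and 0.27 -> 3.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability must lie in [0, 1], got {p}")
    return min(10, max(1, math.floor(p * 10 + 0.5)))


@dataclass
class FailureMode:
    name: str
    occurrence: int  # O, here derived from a predicted fault probability
    severity: int    # S, rated by the FMEA team (1..10)
    detection: int   # D, rated by the FMEA team (10 = hardest to detect)

    def rpn(self) -> int:
        # Eq. (1): RPN = O x S x D, giving a value between 1 and 1000
        for rating in (self.occurrence, self.severity, self.detection):
            if not 1 <= rating <= 10:
                raise ValueError(f"{self.name}: ratings must lie in 1..10, got {rating}")
        return self.occurrence * self.severity * self.detection


def rank_by_rpn(modes: list[FailureMode]) -> list[tuple[str, int]]:
    """Step 6: prioritise failure modes by RPN in decreasing order."""
    return sorted(((m.name, m.rpn()) for m in modes), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical predicted probabilities and team ratings, for illustration only
    modes = [
        FailureMode("failure mode A", probability_to_occurrence(0.84), severity=6, detection=4),
        FailureMode("failure mode B", probability_to_occurrence(0.27), severity=9, detection=5),
        FailureMode("failure mode C", probability_to_occurrence(0.2), severity=5, detection=10),
    ]
    for name, value in rank_by_rpn(modes):
        print(f"{name}: RPN = {value}")  # A: 192, B: 135, C: 100
```

Because Python's `sorted` is stable, failure modes with equal RPN keep their input order, so this ranking still inherits the tie problem of Eq. (1); only O changes from a team estimate to a model output, while S and D remain expert ratings.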


Fig. 3. Framework of data-driven FMEA methodology.

To address targeted maintenance activities, the scope of the data-driven FMEA is determined based on a structural analysis of the system, in which the structure and limits of the considered system are defined. Moreover, within the business understanding step, a cross-functional team of subject matter experts from various disciplines is built to conduct the FMEA. Since the assessment of occurrence (O) is data-driven, only the severity (S) and the probability of detection (D) need to be determined by the FMEA team (see Fig. 6).

KPIs in terms of possible failure modes are defined by the expert team based on the FMEA scope. These KPIs are the basis for later maintenance activities, since the methodology is based on these failure modes.

Finally, this framework is a helpful guide to implementing a data-driven FMEA in any maintenance-related business. The results can be used to coordinate resources, e.g. manpower or spare parts, and as decision support to implement specific maintenance activities like service, inspection or repair.

3.2. Data understanding and preparation

Fig. 4 shows the developed procedure for data acquisition and preprocessing to generate a database for the fault prediction model. The procedure is divided into two parts: “data acquisition and database development” and “data processing and preparation”.

To generate the raw database during the data acquisition and database development step, the most important failure modes, such as damages, are selected. These failure modes are more important than others, since their occurrence has a high influence on the overall performance of the respective technical equipment. Therefore, they are selected as the desired output data of the prediction tool. In a practical example, these output data could be the failure modes with the “highest maintenance/repair cost” or the “most time consuming repair process”.

Besides the output data, input data is needed to represent the various influences that can cause the failure modes. Input parameters can be chosen from different data sources. What these parameters have in common is that they are generated during the operational phase. Relevant parameters and variables are therefore design and control parameters (e.g. machine characteristics such as age of component or useful life), state variables (e.g. power demand), performance criteria (e.g. process time) or internal/external influencing factors (e.g. environmental conditions). The input parameters will later be used to predict a failure, which makes it essential to select the parameters that most likely influence the occurrence of faults. Therefore, a correlation analysis between the input parameters and the damages (output data) should be carried out to reveal which input parameters are important for each failure mode.

The second part of the procedure focusses on the processing and preparation of the raw databases, so that they can be used by appropriate algorithms to generate the prediction model. As a first step, data samples with missing values need to be handled, since they have a huge influence on the accuracy of fault predictions and on data analysis in general: it is difficult to reveal correlations and to find patterns within such data (Hand, 2007). Missing values are often represented as “NaN” or “NULL”, which cannot be processed by computational models. Therefore, a process to handle these data samples is needed. In the literature, two approaches are proposed. The first one is to delete every data sample containing missing values. Since this approach also deletes values that could contain important information, it is not selected within this framework. As an alternative, the missing values can be replaced by the mean value of the entire column feature (Raschka and Mirjalili, 2017). This ensures that no important data is deleted and thereby excluded from the generation of the prediction model.

In addition, different input parameters can lie in different ranges of values, which can cause problems for the algorithm. Therefore, the dataset is normalised using the min-max method due to its usability and good results. The values of the dataset are
normalised to a range of 0–1 by comparing each value to the minimum and maximum value of the entire column (feature) (Raschka and Mirjalili, 2017).

M.-A. Filz, J.E.B. Langner, C. Herrmann et al. Computers in Industry 129 (2021) 103451

Fig. 4. Data acquisition and preprocessing pipeline.

Additionally, the output data need to be processed before training the prediction algorithm. Fault prediction can be seen as a binary classification task (0 = no fault, 1 = fault). All output data must therefore be set to either 0 or 1, indicating whether a specific failure mode occurred or not.

As a final preprocessing step, the database is split into training, test and validation data. Only the training and test datasets are used to train, test and generate the prediction model. The validation dataset is later used to validate the final prediction model. Since the model has never seen the data samples of the validation dataset, a reliable validation of the results is possible. A possible rule for dividing the database is the 70 : 20 : 10 rule: around 70 % of the data samples are used to train the model, 20 % to test the model and 10 % to validate the final model (Raschka and Mirjalili, 2017).

If the available dataset is small, data augmentation can be used to generate more data samples. This method uses the original dataset to generate new, slightly noisy samples. Especially for small datasets, this technique can not only generate a sufficient amount of training data but also increase the robustness of the model (Shanmugamani, 2018).

3.3. Modeling

Since the use cases of different companies are based on different datasets, there are no universal algorithm parameter settings that fit all use cases best. Therefore, this framework proposes a general process, applicable to different use cases, to optimise the accuracy of the model.

One main task of the data-driven FMEA framework is the prediction of fault probabilities. For this task, preprocessed operational data (see Section 3.2) from the observed components and parts of the technical equipment are used. Since the parameters and data types may vary widely, the choice of the data analytics model strongly depends on these characteristics.

Fig. 5 gives an overview of possible modelling approaches based on the existing data and parameter structure. Depending on the data structure at hand, three cases are distinguished: “statistical analysis of one (or few) variables”, “deep analysis of one or few variables” and “interdependencies between many variables”, and potential modelling approaches are given for each.

The data analytics modelling approaches shown in Fig. 5 are intended to provide orientation for selecting possible data analytics models. For instance, in the case of many variables that are independent, artificial neural networks (ANN) or multivariate regressions can be used. In this context, the prediction performance also strongly depends on the optimization of the specific hyper parameters, since each dataset has unique correlations between its variables, and the algorithms need to be trained for the specific correlations given in the data. There is no set of hyper parameters that fits all data perfectly.

The fault prediction is carried out separately for each failure mode. The predicted fault probabilities for a failure mode are then converted to values between 1 and 10 (e.g. 84 % = 8 or 27 % = 3) to determine the occurrence value for the FMEA. The FMEA team determines the values for severity and detection based on its knowledge and experience.

The calculation of the RPN and the respective input sources are shown in Fig. 6.

The RPN, with values between 1 and 1000, is calculated by multiplying the values (1–10) for severity (S), occurrence (O) and detection (D), i.e. RPN = S × O × D, for each failure mode of the corresponding components and parts (see Section 2.2). Moreover, the total RPN of a whole system or inspected area (e.g. engine) can be calculated by adding all individual RPNs.

3.4. Evaluation and deployment

With the help of the data-driven FMEA framework, the calculated RPN can be used as decision support for recommendations related to maintenance activities as well as for defining maintenance strategies. For this purpose, the failure modes can be prioritized by ranking their RPNs in descending order.

In addition, a user-centric visualization of the FMEA results is necessary due to the complexity of the data-driven FMEA methodology and the requirement that it be usable by different people, such as colleagues with different experience levels and skills, and by different user groups, such as managers, engineers or shop floor workers.

Fig. 7 shows the visualization of RPNs for different failures in two different areas; the left wing and the tail of an airplane are chosen as example areas. The RPN values for failure modes A to D are printed out. In addition, to provide decision support and an initial interpretation of the results, the individual RPN values are colour-coded following the principle of a heat map. This allows a quick and easy interpretation of the results, e.g. to derive further maintenance activities such as condition-based inspection for particularly critical failure modes.

Based on these results, recommendations for high-risk failure modes are developed; in particular, condition-based maintenance strategies can be derived. For example, if the RPN of a failure mode exceeds a specific limit, appropriate measures can be taken: an inspection can be scheduled and corresponding resources such as machines, materials and workers can be provided.

By integrating data from the operational phase of the inspected systems, the degree of automation increases and the assessment becomes less subjective and less dependent on experience. Therefore, even inexperienced
employees can detect and analyse failure modes and plan the necessary maintenance activities more efficiently and accurately. Moreover, using operational data increases the comparability of FMEA results and improves the accuracy of strategies and measures.

Fig. 5. Overview of possible modelling approaches based on data structure.

Fig. 6. Calculation of Risk Priority Number.

Fig. 7. Visualization of RPN for different Failure Modes and Areas.

4. Application of data-driven FMEA for maintenance planning in the aviation industry

4.1. Business understanding

The previously developed data-driven FMEA framework was applied to a use case in the aviation industry. The use case focuses on the data-driven FMEA of individual aircraft equipment to improve and support maintenance and repair planning processes. Overall, this will also lead to improved reliability of the aircraft.

Within the airline industry, a dependency between the environmental influences on an airplane and the condition of its components is assumed. The weather situations in which an aircraft flies and its flight profiles can be merged with historical maintenance data to build a database for training a prediction algorithm. Adverse environmental influences on component condition could manifest as higher maintenance costs, higher failure frequency and intensity, or in the type of fault that appears.

The data-driven fault prediction tool is then used to support the data-driven FMEA. Several steps are necessary to enable this. First, data acquisition and preprocessing are performed for the specific use case to create the basis for analysing and optimizing the model parameters. Then, the fault prediction and FMEA risk evaluation are applied to the previously selected failure modes. A team of managers, engineers and shop floor workers only needs to determine the severity (S) and the probability of detection (D) for each failure mode to be able to calculate the RPN (see Fig. 6). A failure mode is always linked to the area in which it occurs. An area is a specific region of the technical equipment of an aircraft, such as the left wing or the elevator rudder (see Fig. 7).

With the RPN provided for each failure mode, resources (e.g. manpower) can be used in a more focused way, and maintenance activities such as service, inspection or repair can be selected accordingly.
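As a minimal sketch of the risk evaluation described in Sections 3.3 and 4.1, the Python code below converts a predicted fault probability into an occurrence value, multiplies it by the severity and detection values set by the FMEA team, sums the RPNs per area and ranks the failure modes in descending order. Rounding half up is an assumption that reproduces the examples 84 % → 8 and 27 % → 3; the decimal option follows the 84 % → 8.4 variant used in Section 4.5. The class and function names and all failure modes, areas and S/D values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str            # hypothetical label, e.g. "FM A"
    area: str            # component area, e.g. "left wing"
    probability: float   # predicted fault probability in [0, 1]
    severity: int        # S in 1..10, set by the FMEA team
    detection: int       # D in 1..10, set by the FMEA team


def occurrence(probability: float, decimal: bool = False) -> float:
    """Convert a fault probability into an occurrence value O in [1, 10].

    Integer mode rounds half up (84 % -> 8, 27 % -> 3);
    decimal mode keeps one decimal place (84 % -> 8.4).
    """
    value = probability * 10
    value = round(value, 1) if decimal else math.floor(value + 0.5)
    return min(10, max(1, value))  # clamp to the 1..10 FMEA scale


def rpn(fm: FailureMode, decimal: bool = False) -> float:
    """Risk priority number RPN = S * O * D, between 1 and 1000."""
    return fm.severity * occurrence(fm.probability, decimal) * fm.detection


def area_totals(modes):
    """Total RPN per area, obtained by adding the individual RPNs."""
    totals = {}
    for fm in modes:
        totals[fm.area] = totals.get(fm.area, 0) + rpn(fm)
    return totals


def ranked(modes):
    """Failure modes in descending order of RPN (highest risk first)."""
    return sorted(modes, key=rpn, reverse=True)


# Hypothetical example values, not taken from the case study.
modes = [
    FailureMode("FM A", "left wing", 0.84, severity=6, detection=4),
    FailureMode("FM B", "left wing", 0.27, severity=9, detection=7),
    FailureMode("FM C", "tail", 0.42, severity=3, detection=2),
]
for fm in ranked(modes):
    print(fm.area, fm.name, rpn(fm))  # prints FM A (192), then FM B (189), then FM C (24)
print(area_totals(modes))             # {'left wing': 381, 'tail': 24}
```

Although FM B has the highest severity in this toy example, its low occurrence places it below FM A, which mirrors the point that the RPN combines all three factors rather than the probability alone.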


Fig. 8. Input Data.
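Before input data such as those summarised in Fig. 8 are used for training, they pass through the preprocessing steps described in Section 3.2: mean imputation of missing values, binary output labels, min-max normalisation and a 70 : 20 : 10 split. The plain-Python sketch below illustrates these steps; marking missing values with None, the fixed shuffle seed and all function and variable names are assumptions, and in practice a library such as pandas or scikit-learn would typically be used.

```python
import random


def impute_mean(rows):
    """Replace missing values (None) by the mean of their column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if r[j] is not None]
        means.append(sum(present) / len(present))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]


def min_max(rows):
    """Scale each column to [0, 1]: (x - min) / (max - min)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(x - lo) / (hi - lo) if hi > lo else 0.0 for x, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]


def to_binary(findings):
    """Output label: 1 if the failure mode occurred at least once, else 0."""
    return [1 if f > 0 else 0 for f in findings]


def split_70_20_10(samples, seed=42):
    """Shuffle, then split into training (70 %), test (20 %) and validation (10 %) sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10
    n_test = len(shuffled) * 2 // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


# Hypothetical raw inputs (e.g. temperature, humidity); None marks a missing value.
raw = [[1.0, None], [3.0, 4.0], [None, 8.0]]
X = min_max(impute_mean(raw))   # [[0.0, 0.5], [1.0, 0.0], [0.5, 1.0]]
y = to_binary([0, 2, 1])        # number of findings per sample -> [0, 1, 1]
train, test, val = split_70_20_10(list(range(10)))
```

As in the described procedure, this sketch normalises over the whole column before splitting; fitting the minimum and maximum on the training split only is a common alternative that keeps information from the test and validation data out of training.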

4.2. Data understanding and preparation

The selection of suitable input parameters is an important task, since they are used to predict faults and influence the achieved fault prediction accuracy. Fig. 8 gives an overview of the input data used.

Environmental influences such as different weather conditions affect not merely the damage types but also the damage frequencies (Verhagen and Boer de, 2018). Based on the experience of the maintenance department staff, the available weather data, such as humidity, temperature and sand concentration, are very important factors for damages. Therefore, these data are considered within this case study and related to data from past maintenance and repair events.

The major causes of aviation accidents and incidents are environmental influences, since harsh weather conditions can physically damage the airplane and increase the risk of pilot errors (Kushwaha and Sharma, 2014). Pilot errors are difficult to predict, because pilots with different individual experience will not react in the same way. Physical damage to the airplane, on the other hand, can be predicted, since each component reacts in a similar way to specific weather conditions. For this reason, this case study focuses only on the prediction of physical damage caused by environmental influences.

Various maintenance-related data sources are linked to build a comprehensive database for this case study. Specifically, information about the environmental influences on the airplane and about its flight profiles and usage is combined with specific maintenance/shop findings. The result is a separate database for each damage area and failure mode.

Since different prediction algorithms require differently preprocessed data formats as input to perform best, it is necessary to first decide which algorithm is used for the final fault prediction. To select the prediction algorithm, different prediction models were tested with a database that includes all input parameters and outputs (failure modes).

4.2.1. Data processing and preparation

The second step of the data structure and database development framework (shown in Fig. 4) comprises the preprocessing tasks and the splitting into training, test and validation databases, which are used by the deep learning algorithm to generate the prediction model.

The following four main tasks are carried out in this part (Raschka and Mirjalili, 2017):

• Replace missing values with the mean value of the column
• Set output data to binary
• Normalise the datasets
• Divide the database into training, test and validation datasets

Several processing steps are necessary for the selected algorithm before the database can be split into training, test and validation data. Values of different input parameters can lie in different ranges and cause problems for the algorithm, since ANNs work best with normalised datasets. Two normalisation methods are proposed in the literature. This framework uses the min-max method, because it is easy to apply and showed good results in other examples. With the min-max method, all values are normalised to a range of 0–1 by comparing each value to the minimum and maximum value of the entire column (feature).

Fig. 9 shows an example of how the input parameters I1 to I3 of the raw database are processed.

After these process steps, the final database is prepared for training the prediction algorithm. However, the training process must not include the test and validation datasets, since a reliable validation of the model would then not be possible. Therefore, the database is split into a training, a test and a validation dataset.

4.3. Modeling

Before the algorithms can be modelled, a feature selection for each area and failure mode is needed to select only those features that help to predict the specific failure mode. Therefore, a correlation analysis between the input parameters and the damages (output data) is carried out to reveal which input parameters are important for each failure mode. This ensures that only the input parameters with the highest correlation are used to train the algorithms. In this case study, only input parameters with correlations higher than 0.05 are selected for training.

In a next step, the prediction model is selected and optimized for each area and failure mode. First, several prediction models are compared; the selected model is then optimized along the six steps described in Fig. 10.

4.3.1. Comparison of prediction models – algorithm selection

With the generated database, which includes input data such as flight profiles and weather data as well as output data such as information about shop findings (failure modes), a first analysis is used to select the most suitable prediction model for this specific data and use case. The following seven prediction models were compared, with basic model settings, to predict whether a failure occurred or not:

• Naive Bayes
• Generalized Linear Model
• Logistic Regression
• Deep Learning
• Decision Tree
• Random Forest
• Gradient Boosted Trees

To determine which model fits the use case best, all considered failure modes were analysed with all available models, and the two quality criteria “classification error” and “class recall” were compared. The classification error gives the percentage of data samples that are predicted wrongly in both scenarios (true = 1 and false = 0). As the class of true failure
modes is more important, the second criterion, the class recall, focuses on the case that a specific failure mode occurs.

Fig. 9. Preprocessing of input parameters.

Fig. 10. Use case specific model parameter optimization.

Table 2
Results of average classification error and class recall for the selected prediction algorithms.

| Prediction algorithm | Average classification error | Average class recall |
|---|---|---|
| Naive Bayes | 33 % | 61 % |
| Generalized Linear Model | 23 % | 43 % |
| Logistic Regression | 23 % | 42 % |
| Deep Learning | 23 % | 64 % |
| Decision Tree | 21 % | 45 % |
| Random Forest | 21 % | 42 % |
| Gradient Boosted Trees | 20 % | 59 % |

Table 2 shows the average classification error and class recall over all failure modes for each prediction model. Since a true shop finding or corrective action is more important from a business perspective, the class recall considers only the case that such a shop finding or corrective action occurs.

The three highest class recall values are achieved by deep learning, naive Bayes and gradient boosted trees. Comparing the classification errors of these three models shows that naive Bayes has the worst classification error of all compared algorithms. Taking both criteria into account, deep learning and gradient boosted trees achieved the best overall performance.

Since the class recall is the more important criterion and the deep learning algorithm achieved the best recall with an acceptable error, it was selected as the prediction model for further analysis.

To start the optimization process for a specific use case, an initial network structure and parameters must be selected. The initial neural network parameters are based on a review of several papers on prediction with deep learning (Morfidis and Kostinakis, 2018; Chollet, 2018; Ren et al., 2018a, b; Kaymak et al., 2017). The initial ANN parameters used in this paper are shown in Table 3.

In order to optimize the algorithm parameters specifically for the use case, various parameter settings must be determined. In the literature, often only some hyper parameters are considered (Sun et al., 2016; Morfidis and Kostinakis, 2018). Within this paper, however, a holistic approach to hyper parameter optimization is followed, and therefore all hyper parameters found in the literature are considered. Fig. 10 presents an overview of the central parameters to be optimized for this use case.

Table 3
Overview of initial ANN parameters.

| ANN parameters | Input layer | Hidden layer | Output layer |
|---|---|---|---|
| Number of layers | 1 | 6 | 1 |
| Neurons per layer | same as number of input parameters | 50 | 1 |
| Activation function | ReLU | ReLU | Sigmoid |

General settings: epochs 50; batch size 25; learning rate automatic; no data dropout; optimiser ADADELTA.

4.3.2. Optimizer type

To compare the three chosen optimizers (RMSPROP, ADADELTA and ADAM), the training process of the neural network is carried out five times and the average values are calculated. Table 4 shows the resulting loss and accuracy values for five selected failure modes (FM) of several component areas for all considered optimizer types.

The loss function measures the performance of the classification. Since all outputs lie between 0 and 1, the binary cross-entropy loss function is used for all neural networks. Low loss values indicate good model performance.

Since the loss function alone is not enough to evaluate a model, the accuracy is also calculated. The accuracy gives the percentage of correctly predicted data samples; high accuracy values therefore indicate good prediction performance. With both values (loss and accuracy), the training and optimization process can start.

For some failure modes, the initial network settings already achieve quite good accuracies of around 90 %. However, the fault predictions for component area 2 are lower and only reach accuracies of around 80 %. The results show that the best performance (lowest loss and highest accuracy) is achieved with the ADADELTA optimizer for all failure modes. Therefore, ADADELTA is used as the optimizer for all further improvement steps.

4.3.3. Activation function

A grid search is carried out to evaluate and select the best combination of activation functions. Fig. 11 shows the results of each grid search in two plots, one for the accuracies and one for the losses. This grid search was carried out five times for each failure mode to ensure that the result is valid. For all failure modes, a combination of TANH for the input layer and RELU for the hidden layers achieved the best performance.

Fig. 11. Grid Search for Activation Functions for Area 2 and Failure Mode D.

4.3.4. Number of layers and neurons

A grid search was carried out that tested several numbers of layers in combination with several numbers of neurons. Fig. 12 shows the grid search for failure mode D in Area 2. Here, the best combination is 6 hidden layers with 150 neurons each. For each failure mode, the grid search was carried out five times for validation accuracy and validation loss to ensure that the results are valid.

A summarised overview of the best combinations for all component areas and corresponding failure modes is given in Table 5. The results show that there is no overall optimal combination for all failure modes. The best number of layers varies between 4 and 10, and the best number of hidden neurons is either 150 or 250. These combinations are therefore used in the next optimization steps.

4.3.5. Number of epochs and batch size

Fig. 13 shows the results of the grid search for Area 2 and failure mode D in terms of validation accuracy and validation loss. Comparing both plots clearly shows that the lowest loss and the highest accuracy, and therefore the best performance, are achieved with 500 epochs and a batch size of 25.

A summarized overview of the best combinations is given in Table 6. The results show that at least 100 epochs are needed to achieve the best performance. The best batch sizes, ranging from 25 to 100, lie relatively close together. With these individual settings,
the networks are further optimized, and the settings are used during the following optimization steps.

Table 4
Comparison of optimizers for loss and accuracy for considered failure modes.

| Optimizer | Attribute | Area 1, FM A | Area 1, FM B | Area 2, FM C | Area 2, FM D | Area 3, FM E |
|---|---|---|---|---|---|---|
| RMSPROP | Loss | 0.2562 | 0.3772 | 0.5057 | 0.4304 | 0.2573 |
| RMSPROP | Accuracy | 0.8745 | 0.8303 | 0.7465 | 0.7833 | 0.8904 |
| ADADELTA | Loss | 0.1620 | 0.2621 | 0.4624 | 0.4018 | 0.1702 |
| ADADELTA | Accuracy | 0.9217 | 0.9032 | 0.7962 | 0.7845 | 0.9232 |
| ADAM | Loss | 0.1841 | 0.3107 | 0.4818 | 0.4323 | 0.2056 |
| ADAM | Accuracy | 0.9086 | 0.8827 | 0.7757 | 0.7714 | 0.9187 |

Fig. 12. Grid search for number of neurons and layer exemplarily for Area 2 and Failure Mode D.

Table 5
Performance after Selection of best Combinations of Neurons and Layers.

| Component area | Failure mode (FM) | No. of layers | No. of neurons | Loss | Accuracy |
|---|---|---|---|---|---|
| Area 1 | FM A | 10 | 250 | 0.0630 | 0.9856 |
| Area 1 | FM B | 6 | 150 | 0.1808 | 0.9378 |
| Area 2 | FM C | 8 | 250 | 0.4045 | 0.8217 |
| Area 2 | FM D | 6 | 150 | 0.2591 | 0.8869 |
| Area 3 | FM E | 4 | 250 | 0.0696 | 0.9718 |

Fig. 13. Grid search for number of epochs and batch size for Area 2 and Failure Mode D.

Table 6
Performance after selection of best combinations of epochs and batch size.

| Component area | Failure mode | Epochs | Batch size | Loss | Accuracy |
|---|---|---|---|---|---|
| Area 1 | FM A | 150 | 50 | 0.0325 | 0.9856 |
| Area 1 | FM B | 500 | 100 | 0.1285 | 0.9514 |
| Area 2 | FM C | 500 | 100 | 0.3259 | 0.8408 |
| Area 2 | FM D | 500 | 25 | 0.1337 | 0.9524 |
| Area 3 | FM E | 100 | 100 | 0.0330 | 0.9887 |

4.3.6. Learning rate

In this use case, all five selected failure modes achieved the best performance with ADADELTA as optimizer. Since ADADELTA adapts its learning rate automatically based on a moving window of gradient updates, there is no need to test manual changes of the learning rate for this optimizer (Zeiler, 2012). For other optimizers such as ADAM and RMSPROP, a parameter study might be useful, but in this use case this step can be skipped (Kingma and Ba, 2015).

4.3.7. Data dropout

Fig. 14 shows the progress of the final performance of the neural network for Area 1 and failure mode A. The left plot shows the training and validation accuracy over the number of epochs, and the right plot shows the training and validation loss over the number of epochs.


Fig. 14. Final performance for Failure Mode A in the Area 1.
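Learning curves such as those in Fig. 14 can also be screened with simple rules. The sketch below assumes that the per-epoch validation loss and accuracy are available as lists; it flags overfitting when the validation loss ends clearly above its minimum and underfitting when the validation accuracy has not settled at a steady, acceptable value, and it only suggests dropout when overfitting is flagged. The function names and the tolerance, window, spread and minimum-accuracy parameters are hypothetical choices, not values from the study.

```python
def is_overfitted(val_loss, tolerance=0.01):
    """Overfitting sign: the validation loss ends clearly above its minimum,
    i.e. it has started to increase again."""
    return val_loss[-1] > min(val_loss) + tolerance


def is_underfitted(val_acc, window=10, max_spread=0.02, min_acc=0.8):
    """Underfitting sign: over the last `window` epochs the validation accuracy
    has not settled at a steady value of at least `min_acc`."""
    tail = val_acc[-window:]
    steady = max(tail) - min(tail) <= max_spread
    return not (steady and sum(tail) / len(tail) >= min_acc)


def needs_dropout(val_loss):
    """Dropout counters overfitting, so it is only suggested if overfitting is detected."""
    return is_overfitted(val_loss)


# Hypothetical learning curves (one value per epoch).
good_loss = [0.60, 0.35, 0.20, 0.12, 0.10, 0.10]
rising_loss = [0.60, 0.35, 0.20, 0.25, 0.31]
steady_acc = [0.60, 0.75, 0.85] + [0.95] * 10
print(is_overfitted(good_loss), is_overfitted(rising_loss))  # False True
print(is_underfitted(steady_acc), needs_dropout(good_loss))  # False False
```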

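Table 7
Final algorithm parameters for specific failure modes.

| Layer / settings | Parameter | Area 1, FM A | Area 1, FM B | Area 2, FM C | Area 2, FM D | Area 3, FM E |
|---|---|---|---|---|---|---|
| Input layer | Number of layers | 1 | 1 | 1 | 1 | 1 |
| Input layer | Neurons | 7 | 7 | 7 | 5 | 10 |
| Input layer | Activation function | tanh | tanh | tanh | tanh | tanh |
| Hidden layer | Number of layers | 10 | 6 | 8 | 6 | 4 |
| Hidden layer | Neurons | 250 | 150 | 250 | 150 | 250 |
| Hidden layer | Activation function | relu | relu | relu | relu | relu |
| Output layer | Number of layers | 1 | 1 | 1 | 1 | 1 |
| Output layer | Neurons | 1 | 1 | 1 | 1 | 1 |
| Output layer | Activation function | sigmoid | sigmoid | sigmoid | sigmoid | sigmoid |
| General settings | Epochs | 150 | 500 | 500 | 500 | 100 |
| General settings | Batch size | 50 | 100 | 100 | 25 | 100 |
| General settings | Learning rate | automatic | automatic | automatic | automatic | automatic |
| General settings | Data dropout | not needed | not needed | not needed | not needed | not needed |
| General settings | Optimizer | ADADELTA | ADADELTA | ADADELTA | ADADELTA | ADADELTA |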
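The settings in Table 7 result from the successive grid searches in Sections 4.3.3 to 4.3.5, each repeated five times per failure mode. The plain-Python sketch below shows one such grid search under stated assumptions: the five repetitions are averaged, the configuration with the lowest mean validation loss is chosen (higher mean accuracy breaks ties), and toy_train_and_evaluate is a hypothetical stand-in for actually training the network, so that the example runs without a deep learning library.

```python
import itertools
import statistics


def grid_search(train_and_evaluate, grid, repeats=5):
    """Exhaustive grid search over all parameter combinations.

    Each configuration is trained `repeats` times; the configuration with the
    lowest mean validation loss wins (ties broken by higher mean accuracy).
    Returns (config, mean_loss, mean_accuracy).
    """
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[n] for n in names)):
        config = dict(zip(names, values))
        runs = [train_and_evaluate(config, seed) for seed in range(repeats)]
        mean_loss = statistics.mean(loss for loss, _ in runs)
        mean_acc = statistics.mean(acc for _, acc in runs)
        if best is None or (mean_loss, -mean_acc) < (best[1], -best[2]):
            best = (config, mean_loss, mean_acc)
    return best


def toy_train_and_evaluate(config, seed):
    """Hypothetical stand-in for training a network and returning
    (validation loss, validation accuracy); it simply favours 6 x 150."""
    loss = (0.1 + 0.01 * abs(config["layers"] - 6)
            + 0.0001 * abs(config["neurons"] - 150) + 0.001 * seed)
    return loss, 1 - loss


grid = {"layers": [4, 6, 8, 10], "neurons": [50, 150, 250]}
config, loss, acc = grid_search(toy_train_and_evaluate, grid)
print(config, round(loss, 3), round(acc, 3))  # {'layers': 6, 'neurons': 150} 0.102 0.898
```

In the step-wise procedure of Fig. 10, such a search would be run once per step (activation functions, then layers and neurons, then epochs and batch size), keeping the best settings of the earlier steps fixed.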

Table 8
Accuracies and Loss after each Optimization Step.

| Step | Attribute | Area 1, FM A | Area 1, FM B | Area 2, FM C | Area 2, FM D | Area 3, FM E |
|---|---|---|---|---|---|---|
| 1 Optimizer | Loss | 0.1620 | 0.2621 | 0.4624 | 0.4018 | 0.1702 |
| 1 Optimizer | Accuracy | 0.9217 | 0.9032 | 0.7962 | 0.7845 | 0.9232 |
| 2 Activation function | Loss | 0.1043 | 0.1932 | 0.4454 | 0.3121 | 0.1266 |
| 2 Activation function | Accuracy | 0.9591 | 0.9297 | 0.8112 | 0.8512 | 0.9548 |
| 3 Layers and neurons | Loss | 0.0630 | 0.1808 | 0.4045 | 0.2591 | 0.0696 |
| 3 Layers and neurons | Accuracy | 0.9856 | 0.9378 | 0.8217 | 0.8869 | 0.9718 |
| 4 Epochs and batch size | Loss | 0.0325 | 0.1285 | 0.3259 | 0.1326 | 0.0330 |
| 4 Epochs and batch size | Accuracy | 0.9856 | 0.9514 | 0.8408 | 0.9286 | 0.9887 |
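Tables 4 to 8 report the binary cross-entropy loss and the accuracy, while Table 2 uses the classification error and the class recall. A minimal sketch of how these four quantities can be computed from true labels and predicted fault probabilities is given below; the 0.5 decision threshold, the clipping constant eps and the function names are assumptions, not settings stated for the study.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy loss; lower is better."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def to_labels(p_pred, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if p >= threshold else 0 for p in p_pred]


def accuracy(y_true, y_pred):
    """Share of correctly predicted samples (both classes)."""
    return sum(t == q for t, q in zip(y_true, y_pred)) / len(y_true)


def classification_error(y_true, y_pred):
    """Share of wrongly predicted samples (both classes), i.e. 1 - accuracy."""
    return 1 - accuracy(y_true, y_pred)


def class_recall(y_true, y_pred, positive=1):
    """Share of actual faults (true = 1) that are predicted as faults."""
    preds_for_positives = [q for t, q in zip(y_true, y_pred) if t == positive]
    return sum(q == positive for q in preds_for_positives) / len(preds_for_positives)


# Hypothetical validation labels and predicted fault probabilities.
y_true = [1, 0, 1, 1, 0]
p_pred = [0.9, 0.2, 0.4, 0.8, 0.6]
y_pred = to_labels(p_pred)  # [1, 0, 0, 1, 1]
print(round(binary_cross_entropy(y_true, p_pred), 4))
print(accuracy(y_true, y_pred), class_recall(y_true, y_pred))
```

The class recall ignores the "no fault" samples entirely, which is why it can rank models differently from the classification error, as seen for naive Bayes in Table 2.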

The right plot clearly shows that the validation loss has not started to increase after 150 epochs. The validation loss is still at its minimum, and therefore the model is not overfitted. The left plot shows that the validation accuracy increased during the first 40 epochs and then reached a steady value, which indicates that the model is not underfitted. Since the model is neither under- nor overfitted, there is no need to add data dropout to any layer. Similar plots were generated for all five failure modes and show no indication of over- or underfitted networks. Therefore, no data dropout needs to be added to any layer or network of any failure mode, and this step can be skipped in this use case.

4.3.8. Final model parameters and performance

Table 7 gives an overview of the settings finally selected for the neural networks after the optimization process. Some parameters, such as the activation functions, the optimizer and the number of input and output layers, are the same for all failure modes. Other parameters, namely the number of neurons in the input layer, the number of hidden layers and hidden neurons, and the number of epochs and the batch size, are set individually for each failure mode.

Moreover, Table 8 shows the validation loss and accuracy after each optimization step. With each step of the framework, the loss decreased and the accuracy increased (for FM A, the accuracy remained unchanged in the last step). Therefore, the optimization process improves the overall prediction performance.

4.4. Evaluation

Before the results are used for data-driven FMEA applications, the fault prediction needs to be validated. For this purpose, 15 validation data samples per failure mode and component area are used to test the fault prediction accuracy.

Since the output probabilities lie between 0 % and 100 % and the true values are either 0 for “no fault” or 1 for “fault exists”, a method for assigning the probabilities to classes is needed. All probabilities of 45 % or lower are assigned to class 0 (no fault), and all probabilities of 55 % or higher are assigned to class 1 (fault exists). Values between 45 % and 55 % are not assigned to any class and are treated as unclear predictions.

All predicted probabilities and the true values of the validation dataset are summarised in Fig. 15. The predicted probabilities are assigned to a class with the described method and compared with the true values. A green background indicates that the assigned class and the true value match. A red background indicates that they diverge, i.e. the prediction was wrong. Unclear predictions are marked with a yellow background. The results show a high conformity with the true values, which allows the developed method to be used for further prediction applications.

Fig. 15. Results of the Fault Prediction Tool for the Validation Data.

To calculate the overall performance (prediction accuracy), the percentage of correctly predicted failures is calculated. Unclear predictions between 45 % and 55 % count as incorrect, since they do not provide enough information to be allocated to one of the classes with certainty. Table 9 gives an overview of the prediction accuracy for each failure mode and component area.

With accuracies of 80 % or higher for all failure modes and an overall accuracy of around 95 %, the developed tool achieves quite good results. Therefore, the results of the fault prediction tool are not only reproducible but also representative.

4.5. Deployment

With the tested and validated fault predictions, the final risk evaluation can now be carried out. The risk priority number, with values between 1 and 1000, is calculated by multiplying the values for severity (S), occurrence (O) and detection (D). This helps to determine which of several failure modes with similar probabilities represents the highest risk.

The fault probabilities calculated by the developed fault prediction are converted to values between 1 and 10 with decimal places (e.g. 84 % = 8.4 or 27 % = 2.7) and used as the occurrence value for the FMEA. The values for severity and detection are fixed for each failure mode and component area and are set by experienced workers involved in the use case. Using the trained fault prediction algorithm, the developed framework is tested on an example dataset. The failure probabilities are calculated, converted and finally used to calculate the FMEA risk priority numbers based on the predefined severity and detection values (see Table 10).

Besides the RPN value itself, the background colour of each risk priority number indicates whether the risk is high (red for 100 ≤ RPN ≤ 1000), medium (orange for 50 ≤ RPN ≤ 100), small (yellow for 2 ≤ RPN ≤ 50) or little (green for RPN = 1). The choice of thresholds for the different risk levels depends on the specific use case and can be adjusted individually.

Based on the calculated FMEA RPNs, a graphical user interface (GUI) was developed to visualize the results. For each component area, a small drawing and the RPN for each failure mode are shown in Fig. 16.

It is now possible to compare faults and see which failure mode represents the highest risk. The FMEA also shows that some failure modes with different probabilities (e.g. “FM B” in “Area 1” with a fault probability of 94 % and “FM D” in “Area 2” with 42 %) reach about the same RPN of around 216. Therefore, even if the probabilities differ, the risk can be the same due to different values for severity and detection.

With the FMEA results, maintenance activities can be planned more precisely and accurately. In the example above, even though the probability for “FM D” in “Area 2” is low at 42 %, the risk is quite high. Therefore, an instruction for the next maintenance activity could be to plan extra time to inspect the component for “FM D” in “Area 2” to ensure that the aircraft can be operated until the next planned maintenance.

The RPN can also be used for components in the product use phase. For example, as soon as the RPN for a specific failure mode and area exceeds a predefined threshold, the next condition-based maintenance activity can be scheduled. This allows the remaining useful life of components to be calculated and enables predictive maintenance based precisely on the current condition of the component.
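The colour coding and the threshold rule described above can be sketched as follows. As stated, the bands overlap at RPN = 50 and RPN = 100 and leave non-integer RPNs between 1 and 2 (possible with decimal occurrence values) unassigned, so the sketch assumes that boundary values belong to the higher band and that any RPN above 1 and below 50 is small. The inspection threshold of 100, the function names and the failure-mode labels are hypothetical; as noted above, thresholds depend on the use case.

```python
def risk_band(rpn):
    """Colour band of an RPN (1..1000) for the FMEA overview.

    Assumption: boundary values go to the higher band (50 -> orange,
    100 -> red) and every RPN above 1 and below 50 counts as small.
    """
    if rpn >= 100:
        return "high (red)"
    if rpn >= 50:
        return "medium (orange)"
    if rpn > 1:
        return "small (yellow)"
    return "little (green)"


def due_for_inspection(rpns, threshold=100):
    """Failure modes whose RPN exceeds the threshold, highest risk first."""
    flagged = [(name, value) for name, value in rpns.items() if value > threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical RPNs (decimal occurrence values give non-integer RPNs).
rpns = {"Area X / FM 1": 216.0, "Area X / FM 2": 75.6,
        "Area Y / FM 3": 1.8, "Area Y / FM 4": 120.0}
for name, value in rpns.items():
    print(name, value, risk_band(value))
print(due_for_inspection(rpns))  # [('Area X / FM 1', 216.0), ('Area Y / FM 4', 120.0)]
```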


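Table 9
Results of the Fault Prediction Tool for the Validation Data (0 = no fault, 1 = fault).

| Component area | Failure mode | True value | Average prediction accuracy [%] |
|---|---|---|---|
| Area 1 | FM A | 0 | 100.00 |
| Area 1 | FM A | 1 | 100.00 |
| Area 1 | FM B | 0 | 87.50 |
| Area 1 | FM B | 1 | 100.00 |
| Area 2 | FM C | 0 | 80.00 |
| Area 2 | FM C | 1 | 90.00 |
| Area 2 | FM D | 0 | 100.00 |
| Area 2 | FM D | 1 | 100.00 |
| Area 3 | FM E | 0 | 87.50 |
| Area 3 | FM E | 1 | 100.00 |

Average prediction accuracy: 94.50 %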
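The assignment rule of Section 4.4 and per-class accuracies like those in Table 9 can be sketched as follows: probabilities of 45 % or lower map to class 0, 55 % or higher to class 1, and anything in between is unclear and counted as incorrect. The function names and example probabilities are hypothetical. Taking the plain mean of the ten per-class values is an assumption about how the overall figure was obtained, although it does reproduce the 94.50 % reported in Table 9.

```python
def assign_class(probability, low=0.45, high=0.55):
    """0 = no fault, 1 = fault, None = unclear prediction (between the bounds)."""
    if probability <= low:
        return 0
    if probability >= high:
        return 1
    return None


def per_class_accuracy(y_true, probabilities):
    """Accuracy [%] per true class; unclear predictions count as incorrect."""
    result = {}
    for cls in (0, 1):
        preds = [assign_class(p) for t, p in zip(y_true, probabilities) if t == cls]
        if preds:
            result[cls] = 100 * sum(q == cls for q in preds) / len(preds)
    return result


# Hypothetical validation samples for one failure mode.
y_true = [0, 0, 1, 1, 1]
probs = [0.10, 0.50, 0.97, 0.60, 0.30]
print(per_class_accuracy(y_true, probs))  # {0: 50.0, 1: 66.66666666666667}

# Per-class accuracies [%] from Table 9; their plain mean gives the overall value.
table9 = [100.0, 100.0, 87.5, 100.0, 80.0, 90.0, 100.0, 100.0, 87.5, 100.0]
print(sum(table9) / len(table9))  # 94.5
```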

Table 10
Data-based FMEA Prediction.

Fig. 16. Data-driven FMEA Result Overview GUI.

With a focus on the organisation of maintenance activities, the data-driven FMEA combines the correlations revealed from past maintenance events with the experience of employees and supports inexperienced employees in particular when planning maintenance and repair. By using the developed framework, the FMEA risk assessment is no longer subjective, since every employee will arrive at the same results. The results are comparable because the relevant factors are determined from use-phase data.

5. Conclusion & outlook

Within this paper, a data-driven FMEA methodology to support the planning and operation of maintenance activities was developed and validated by a use case within the aviation industry. This methodology provides step-by-step instructions from data acquisition and preprocessing, through analyzing and optimizing algorithm parameters, to the final fault prediction. We took a more holistic approach to hyperparameter optimization and therefore considered all hyperparameters, where other papers only considered a few of them. With the calculated failure probabilities and the predefined parameters for severity and detection, a data-driven FMEA can be generated. The resulting risk priority numbers for each failure mode can be used for maintenance planning by enhancing transparency and providing decision support. Especially for inexperienced or new employees, these fault predictions are very helpful for estimating the costs and the time needed for the next maintenance or repair. Since the failure probability is not the only attribute linked to the expected risk, an FMEA is used to give the employees a more detailed assessment of the failure modes.

Besides time-saving advantages, one major advantage of the developed methodology is that the risk assessment is no longer subjective, since every employee will arrive at the same results by using the developed tool. Moreover, by predicting failures already during the planning phase, production-related processes and areas, such as logistics or spare part procurement, can be optimized.

In addition, the presented methodology contributes to a sustainable organisation and planning of maintenance activities. Due to the high accuracy of the data-driven fault prediction, components are only exchanged when necessary and can therefore be used longer. On the one hand, this saves resources through fewer maintenance activities. On the other hand, the components are exchanged less frequently.

Further research should address a completely data-driven FMEA. In particular, future work should investigate how the severity and detection parameters of the RPN could be calculated from data instead of being predefined by experts.

With more focus on the optimization of production processes and process chains, there is ample room for further progress in developing an automated FMEA that dynamically updates the current RPN. With the help of such a tool, failures of individual parts or components of machines could be predicted using a “Live RPN”, which could in turn be used to optimize the entire system. For example, this could allow a manufacturing system to develop into a self-controlling system with regard to maintenance activities based on current parameters.

CRediT authorship contribution statement

Marc-André Filz: Conceptualization, Methodology, Software, Writing - original draft, Visualization. Jonas Ernst Bernhard Langner: Conceptualization, Methodology, Software, Writing - review & editing. Christoph Herrmann: Project administration, Writing - review & editing. Sebastian Thiede: Conceptualization, Writing - review & editing, Supervision.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

This work was carried out within the “QU4LITY” project. The project has received funding from the European Union’s Horizon 2020 research and innovation program under grant agreement No. 825030.