
Reliability Engineering and System Safety 223 (2022) 108359

Contents lists available at ScienceDirect

Reliability Engineering and System Safety


journal homepage: www.elsevier.com/locate/ress

Dynamic Risk Assessment for CBM-based adaptation of maintenance planning

Pablo Martínez-Galán Fernández a,∗, Antonio J. Guillén López a, Adolfo Crespo Márquez a, Juan Fco. Gomez Fernández a, Jose Antonio Marcos b

a Department of Industrial Management, Universidad de Sevilla, 41092, Seville, Spain
b Patentes Talgo, Department of Smart Maintenance Engineering and New Projects, 28221 Majadahonda, Madrid, Spain

ARTICLE INFO

Keywords: DRA; CBM; Maintenance decision making; Maintenance planning; Risk-based maintenance; Risk levels

ABSTRACT

This paper proposes a practical method for dynamic maintenance planning based on Dynamic Risk Assessment (DRA). The method is founded on interpreting, in terms of the evolution of risk levels, the available information on the monitoring events and maintenance activities that make up condition-based maintenance (CBM) processes. The DRA proposal is supported by the ISO 31000 risk management framework, for better understanding and easier integration of its results with other risk management approaches. The proposed method analyzes CBM results (monitoring events and maintenance activities) with regard to their impact on the failure risk level, and shows how to program and manage maintenance decision making (maintenance planning) in line with the dynamic evolution of risk. This strategy not only helps to optimize maintenance management but also facilitates the link between intelligent maintenance and global risk management within the organization, which is aligned with modern Asset Management principles. To illustrate the method, a real use case is presented in which it is applied to the dynamic maintenance planning of a critical component of a high-speed train, integrating monitoring, predictive analytics and inspection data.

1. Introduction

This paper aims to propose and discuss a practical method for dynamic maintenance planning in highly digitalized contexts. The method allows treating in a unified way the large amount of information that characterizes these contexts (from monitoring, predictive analytics, new complex reliability models, etc.) and that can be useful for maintenance. It does so by interpreting the gathered information in terms of risk level evolution and by linking risk levels to maintenance decision-making.

Keeping failure risk under control is one of the main commitments of maintenance: not only preserving the asset function but also avoiding failure effects. But the maintenance function, and especially the maintenance planning task, is being reconsidered due to digitalization. In fact, digitalization requires more sophisticated ways of doing maintenance planning, especially because it is now necessary to manage more data, information, and knowledge [1]. This maintenance digitalization is a core part of the current evolution of asset management (AM), which is very closely connected with business digitalization strategies. Advanced AM systems, in accordance with ISO 55000, explicitly introduce risk management (RM) along the asset life cycle as a fundamental principle for AM and maintenance [2,3]. They are contributing to more mature risk management: supporting a holistic risk view (from component behavior to business performance) and integrating and developing new skills and capabilities from digitalization [4]. This is evidenced by the strong growth in recent years of AM platforms and software solutions (under different names: Enterprise Asset Management, EAM; Asset Performance Management, APM; Asset Investment Planning, AIP; Intelligent Asset Management System, IAMS) that include specific tools for risk control and risk-based decision making [5].

Improving risk management is, in fact, one of the main impact areas of Digital Transformation (DT). But, at the same time, DT reimagines risk management, since there are new tools, strategies or analytics solutions that were not possible before or are greatly enhanced by DT. In particular, DT will provide more data, information and knowledge about failure that should be translated into the continuous improvement of dynamic failure risk control for truly intelligent maintenance decision making [1,6]. In the maintenance field, advanced CBM (also named PHM/CBM [7]) enhances transparency in the control of the assets and their current conditions in order to discover hidden patterns

∗ Corresponding author.
E-mail addresses: pmartinezgalan@us.es (P. Martínez-Galán Fernández), ajguillen@us.es (A.J. Guillén López), adolfo@us.es (A.C. Márquez), jfgomez@us.es
(J.F. Gomez Fernández), jamarcos@talgo.com (J.A. Marcos).

https://doi.org/10.1016/j.ress.2022.108359
Received 6 May 2021; Received in revised form 20 January 2022; Accepted 23 January 2022
Available online 18 March 2022
0951-8320/© 2022 Elsevier Ltd. All rights reserved.
P. Martínez-Galán Fernández et al. Reliability Engineering and System Safety 223 (2022) 108359

that anticipate risky scenarios [8,9]. The extensive growth of condition-based and predictive maintenance (CBM and PdM) references, from academia and industry, has generated considerable noise around the definition and use of these terms, so a brief description of how they are interpreted in this work is needed. This paper uses as its main reference the international standard CEN-EN 13306 Maintenance Terminology [10]. Here CBM is a type of preventive maintenance (PM), and PdM is a type of CBM. Thus, when we refer to CBM in the text, PdM is included. In the same way, predictive analytics (PdA) or prognostics and health management (PHM) techniques can be considered CBM supporting techniques [10,11]. In the CBM process, data analysis allows detection, diagnosis and prognosis, and these three results can be used as basic inputs for maintenance planning, the final goal of the CBM process [12,13].

The development of reliability models, especially of complex engineering systems, continues to be a prolific field for research [14–20]. However, one of the main challenges now is how to link the development of new reliability models to risk management, promoting the transition from QRA (Quantitative Risk Assessment) to PRA (Probabilistic Risk Assessment) and, beyond this, to dynamic risk assessment (DRA), now that DT offers a real chance of effectively describing and monitoring risk level evolution [13]. How to model, represent and employ DRA, and how to integrate it into advanced intelligent maintenance solutions, is still a challenge today, both in research and in industrial application. This paper addresses how to interpret the different emerging information in terms of risk, in other words, how to connect the new information and knowledge available with effective control of risk level evolution [1], and how this can be used in advanced approaches to the digitalization of maintenance decision making and planning. But the failure risk level does not change only because of asset degradation or reliability reduction. Maintenance actions also influence risk evolution. Leaving aside poor execution, they should reduce the risk level or guarantee that the risk will remain under control during a specific time window. In this work, both factors influencing the risk level are treated within a single method.

In summary, and in accordance with this context, this work identifies and treats two main gaps:

• Further study is needed on analysis methods that increase the real capacity of industrial companies for more dynamic control of the risk level of components and systems. To this aim, how to translate the large amount of information currently available into simple ways to interpret and control risk is still a gap for many industrial practitioners. Closing it will bring important advantages for maintenance management optimization and risk control over time.
• Dynamic scheduling of maintenance activities. The massive use of CBM generates dynamic and highly complex decision-making contexts, not only because of the information available but also because of the number of events and alarms, many of them interconnected. The handling of all these complex scenarios is a barrier to the practical implementation of CBM solutions within preventive maintenance plans.

The limited industrial implementation of DRA is not a problem of data access or lack of mathematical models. It is a matter of making it easier for the maintenance planner to interpret the results and connect them with the real processes of decision-making and control of the systems, introducing key elements and suitable methods that support new planning needs.

In this paper “risk level” and “failure mode” are proposed as key elements (Fig. 1) for developing a new maintenance scheduling solution. They allow: (i) interpreting real-time data and analytics results on the system evolution in terms of risk level; (ii) triggering maintenance decision-making, which in addition and simultaneously produces effects on the risk level (reduction or containment); and (iii) defining and recording events linked to failure mode evolution, which are fundamental for risk evolution control.

Fig. 1. Schema of work’s roadmap and main component.

According to Fig. 1, the paper’s contents are structured as follows: Section 2 presents the risk concept and the risk assessment background up to dynamic risk assessment methods, and discusses the suitability of the use of risk proposed in this work; Section 3 presents the DRA-based method (CBM interpretation and decision-making) on which this paper focuses; for better understanding of the proposed method, a use case from the railway sector with real data from the company Talgo is presented in Section 4; and, finally, conclusions and future applications are summarized in Section 5.

Nomenclature and acronyms

AM     Asset Management
AHP    Analytical Hierarchy Process
CBM    Condition-Based Maintenance
CM     Condition Monitoring
DRA    Dynamic Risk Assessment
DT     Digital Transformation
FM     Failure Mode
FMEA   Failure Mode and Effects Analysis
MoEv   Monitoring Event
PdA    Predictive Analytics
PdM    Predictive Maintenance
PFM    Primary Failure Mode
PHM    Prognostics and Health Management
PM     Preventive Maintenance
PMEv   Preventive Maintenance Event
PRA    Probabilistic Risk Assessment
QRA    Quantitative Risk Assessment
RA     Risk Assessment
RAn    Risk Analysis
RCM    Reliability Centered Maintenance
REv    Risk Evaluation
RId    Risk Identification
RIF    Risk Influencing Factors
RM     Risk Management
SFM    Secondary Failure Mode
SRA    Society for Risk Analysis


Table 1
Risk definitions.
ISO 31000 [21]
risk: effect of uncertainty on objectives
NOTE 1 An effect is a deviation from the expected — positive and/or negative.
NOTE 2 Objectives can have different aspects (such as financial, health and safety, and environmental goals) and can apply at different levels (such as strategic,
organization-wide, project, product and process).
NOTE 3 Risk is often characterized by reference to potential events and consequences, or a combination of these.
NOTE 4 Risk is often expressed in terms of a combination of the consequences of an event (including changes in circumstances) and the associated likelihood of occurrence.
NOTE 5 Uncertainty is the state, even partial, of deficiency of information related to, understanding or knowledge of an event, its consequence, or likelihood.
Society for risk analysis [22]
We consider a future activity [interpreted in a wide sense to also cover, for example, natural phenomena], for example the operation of a system, and define risk in relation to
the consequences (effects, implications) of this activity with respect to something that humans value. The consequences are often seen in relation to some reference values
(planned values, objectives, etc.), and the focus is often on negative, undesirable consequences. There is always at least one outcome that is considered as negative or
undesirable.
Overall qualitative definitions:
1. Risk is the possibility of an unfortunate occurrence
2. Risk is the potential for realization of unwanted, negative consequences of an event
3. Risk is exposure to a proposition (e.g., the occurrence of a loss) of which one is uncertain
4. Risk is the consequences of the activity and associated uncertainties
5. Risk is uncertainty about and severity of the consequences of an activity with respect to something that humans value
6. Risk is the occurrences of some specified consequences of the activity and associated uncertainties
7. Risk is the deviation from a reference value and associated uncertainties
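
Note 4 of the ISO 31000 definition in Table 1 describes risk as a combination of the consequences of an event and the associated likelihood. As a minimal sketch of how such a combination can be turned into a qualitative risk level, the following Python fragment multiplies two ordinal scales and maps the score to “low”, “medium” or “high”. The 5-point scales, the product rule and the thresholds (6 and 15) are illustrative assumptions, not values taken from ISO 31000 or from the method proposed in this paper.

```python
from enum import IntEnum


class Likelihood(IntEnum):
    """Hypothetical 5-point likelihood scale (assumption, not from ISO 31000)."""
    RARE = 1
    UNLIKELY = 2
    POSSIBLE = 3
    LIKELY = 4
    ALMOST_CERTAIN = 5


class Consequence(IntEnum):
    """Hypothetical 5-point consequence scale (assumption, not from ISO 31000)."""
    MINOR = 1
    MODERATE = 2
    MAJOR = 3
    SEVERE = 4
    CATASTROPHIC = 5


def risk_level(likelihood: Likelihood, consequence: Consequence) -> str:
    """Combine likelihood and consequence into a qualitative risk level.

    The product rule and the thresholds below are illustrative assumptions.
    """
    score = int(likelihood) * int(consequence)
    if score >= 15:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


def risk_level_evolution(consequence: Consequence, likelihoods: list) -> list:
    """Risk level of one failure scenario as its likelihood estimate changes.

    The consequence is held fixed; each entry in `likelihoods` stands for a
    new assessment, for example after a monitoring event or a maintenance
    activity.
    """
    return [risk_level(lk, consequence) for lk in likelihoods]
```

Holding the consequence fixed and re-evaluating the likelihood after each new piece of information gives a simple picture of a risk level that evolves over time, which is the idea that the rest of the paper builds on.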

2. Towards DRA. Risk assessment background

The dynamic nature of risk means that risk is changing continuously. To adapt Heraclitus’ formula, “no man ever steps in the same risk twice”. But the main RA methods and approaches have a static view of risk or, more properly speaking, a quasi-static view, which addresses the need to manage the dynamic nature of risk only through periodic reviews of the risk assessment results [6]. Of course, these methods are not only totally valid but also the only tool available when management contexts do not actually require more sophisticated methods or when, despite the great potential benefits of DRA (Dynamic Risk Assessment), there is in practice not enough information to do it effectively or not enough skill and maturity in the organization to do it properly. This section presents the proposed treatment of dynamic risk, in order to justify, from a scientific point of view, the suitability of this approach.

2.1. Risk and risk assessment

On the role of risk concept

The risk concept is a very complex and rich one. With different approaches and nuances, it is key in all scientific areas, in the management of economic and productive sectors, and in political governance [23–25]. Because of this, considerable experience, or even doctrine, has been developed in each field. On the other hand, there are many different approaches and interpretations that introduce noise and misunderstanding, which in practice limit and complicate the treatment of risk and are a barrier to a suitable risk culture within an organization. In our knowledge area (maintenance and reliability engineering and engineering asset management), there are very good references that can be consulted in order to dig deeper into risk, risk management and risk assessment. We strongly recommend [23,25–27]. All these references try to give a general view of the evolution of risk management and risk assessment from different perspectives. In general terms, it is possible to conclude that the most generally accepted views and frameworks for risk management are well represented by two international sources: (i) the ISO 31000 standard series, and (ii) the scientific-technical publications of the Society for Risk Analysis (SRA). In order to illustrate the risk concept and the complexity of its interpretation, Table 1 summarizes the risk definitions of both references.

From the analysis of Table 1, it is possible to summarize four key concepts that define and describe the risk concept:

• Value. Risk is an expression of effect on objectives, that is, effects with respect to something that humans value. Organization objectives are, in fact, very concrete and explicit ways of expressing value. In conclusion, risk is an expression of the organization’s value, and risk metrics are in fact value metrics. In the approach of this work, the challenge is how to materialize and make visible the link of condition monitoring and CBM with the management of organization value.
• Consequences: deviation from objectives (negative or positive effects on value). The hope of gaining value drives us to deal with risks, accepting negative consequences in case they finally happen. In the approach of this work, consequences are failure consequences, which are always negative.
• Potential event / future activity. It can be concluded that risk is defined on the basis of potential events, future activities or a combination of both. The introduction of the term activity in the definition of risk [22] is very useful for the study of decision making in organizations, as it identifies activities (which can be designed, planned, executed, controlled and continuously improved, in a word, managed) as one of the central elements of risk management. In the approach of this work, the challenge is how to describe and use, in a simple and accurate way, the different events that emerge from the implementation of the CBM process (monitoring events and preventive maintenance activities), allowing the control of risk level evolution.
• Uncertainty. Risk management and reliability engineering are among the most powerful tools that science has developed to deal with the effects of uncertainty. There is uncertainty in the occurrence of events (when, how) and in the effects that follow from them (what effects, with what intensity or magnitude). Even planned activities can be subject to uncertainty (deviation from the expected date and/or duration, deviation from the expected activity results and/or costs). Uncertainty can be aleatory (the result of the fundamental randomness of natural phenomena) or epistemic (arising from the lack of knowledge of the systems’ processes) [28,29]. In the approach of this work, the challenge is, being aware that there is always a great level of uncertainty in the CBM solutions we manage, how to introduce a practical tool for risk control and maintenance decision making based on the experts’ interpretation of monitoring and PdA results.

Risk assessment within risk management. ISO 31000 RM framework

ISO 31000 defines RM as coordinated activities to direct and control an organization with regard to risk. Aven et al. [25] argue that RM is about balancing different concerns, profits, safety, reputation,


etc. (value centered elements); and considering a set of alternatives, evaluates their pros and cons, and makes a decision that best meets the decision-makers’ values and priorities. The ISO standard introduces a general risk management framework (Fig. 2). By going through this scheme completely (from top to bottom and from left to right) it becomes explicit how RM, as a process, begins with value, or is based on value, and ends in the practical treatment of risks. In this way RA should be interpreted as a part of RM. RA aims to provide risk information to support decision making, structuring the information and knowledge available by providing systemic modeling and understanding, from component level to system level.

Fig. 2. Risk management framework by ISO 31000.

RA, and by extension RM, is not possible without risk metrics. But RA faces considerable difficulties because the concept of risk and the risk metrics often overlap. Thus, it is possible to say that a specific risk (for example, the failure risk of a critical component) has high risk. There can be different risk metrics, and the suitability of these metrics/descriptions depends on the situation. But none of them can be viewed as risk itself, and the appropriateness of the metric/description can always be questioned [22]. So, what is “a risk” when it can be listed in a “risks list”? Notes 3 and 4 of the ISO definition try to explain how it is possible to describe or characterize “a risk” by determining or narrowing down very specific scenarios, combining events, circumstances and consequences and the associated uncertainties. In these cases, it is possible to give a standard name to each specific scenario or combination. Here the risk is countable, making it possible to list the risks we find. In a more practical way, “the risks” or “the list of risks” can be directly presented as the outcome of the first step of the ISO 31000 RA process (see Fig. 2), i.e. Risk Identification (RId): “process of finding, recognizing and recording risks”. The purpose of RId is to identify what might happen or what situations might exist that might affect the achievement of the objectives of the system or organization [21]. ISO 31010 [30] extends this, stating that RId includes identifying the causes and source of the risk (hazard in the context of physical harm), events, situations or circumstances which could have a material impact upon objectives, and the nature of that impact [29].

But the term risk is also used as an uncountable concept. This is the case when expressions like high risk, medium risk or low risk are used. No matter what metric is finally employed for a specific case, it is possible to state that what we are really measuring is the risk level. This makes it possible to compare and rank two different risks by their respective risk levels, or to monitor risk level evolution in a system using the same previously defined metric. This is a critical point, since the way we understand and describe risk strongly influences the way risk is analyzed and hence may have serious implications for risk management and decision-making (SRA, 2018).

In this work we manage the control of risk level evolution through the characterization of condition monitoring and maintenance activity results. As will be seen later in the text, the proposed DRA method is not intended to disrupt the classical RA methods with which it will have to coexist, but to be inclusive. In fact, this method follows, or can be included within, the ISO 31000 general framework.

Table 2
Selected search keywords for finding the most relevant available literature (2015–2020).

Selected search keywords           RESS   Safety science   Comp. in industry
Dynamic and Risk                   88     74               5
Dynamic Risk and CBM               5      5                0
Risk and (FMEA OR Failure Mode)    30     23               0
Dynamic Risk and Maintenance       14     3                1
Dynamic Risk and (PRA OR QRA)      10     4                0
Operational and Risk               27     49               3

2.2. Introduction to DRA for maintenance

DRA is the next frontier in the evolution of intelligent maintenance, especially in the DT or Industry 4.0 scenario, where accurate knowledge about system behavior, including risk evolution, will be provided, even in real time [5,26,31]. Over the last few years, the interest in “dynamic approaches” has grown substantially, as shown in Table 2 and


Fig. 3. A search was carried out for the keywords in the titles, abstracts, and keywords of papers in relevant journals (i.e., Reliability Engineering and System Safety, Computers in Industry, and Safety Science), with emphasis on recent literature (2015–2020). This search resulted in 262 unique papers. From the resulting set, we kept papers that dealt with complex systems and had one or more of the following elements: a system-level perspective, systems logic modeling, use of different information sources, and dynamic/online/real-time assessment.

Fig. 3. Literature available with Dynamic Risk keywords for the period (2015–2020).

Actually, there are three terms in the literature that have introduced this challenge, from different but complementary perspectives: operational risk assessment, dynamic risk assessment and real-time risk assessment. In recent years, there has been increasing interest in risk and safety research in reflecting risk fluctuations in the operational phase to prevent major accidents from occurring [32]. In the process safety area, significant improvements are needed in fault detection methods, especially in the areas of early detection and warning. There has been a significant contribution on the topic from C-RISE [33,34]. In this sense the term operational risk is becoming a relevant topic in the world of industry [35,36] and also in the world of maintenance [37]. But the term operational risk is closer to the process or process control view than to the maintenance view. In [32], after a deep discussion about the use of these three terms, it is concluded that real-time RA focuses on monitoring data (especially process data), while DRA focuses on analysis, combining the different information available to provide a single risk estimate over time that reflects the current condition of the system, and on linking operational RA with decision making.

Zeng and Zio [38] propose to use the DRA definition previously introduced by Khan et al. [39]: “RA method that updates the estimated risk of a deteriorating process according to the performance of the control system, safety barriers, inspection, and maintenance activities, the human factors, and the implementation of procedures”. DRA is suitable for all asset life cycle phases, but has special impact on the operation and maintenance phase, supporting a great improvement in risk-informed decision-making processes [40]. It is assumed that risk can be accurately quantified and updated using digital capabilities and tools. In particular, it is easy to see the link between monitoring and CBM and risk evolution control [8,25]. But CBM not only provides dynamic information on the risk level. Mainly, and even before this, it allows the failure mechanism and the failure mode evolution to be described in great detail [12]. In fact, the inclusion of the element “failure mode”, and its evolution, is a fundamental aspect in the development of advanced proposals for risk management, especially in relation to maintenance and reliability [41–43]. In this way, CBM oriented to failure mode control can be presented as a powerful tool for dealing with one of the main challenges in the evolution of risk management, as stated by SRA: “How can we describe and represent the results of risk assessment in a way useful to decision makers, which clearly presents the assumptions made and their justification with respect to the knowledge which the assessment is based upon?” [22].

But, as said above, most of the DRA-related literature does not cover decision-making. The results of the risk assessment should allow decision-makers to interpret them appropriately to meet their objectives and understand the uncertainty associated with risk. Our objective is to facilitate maintenance decision-making according to risk. This has been studied in part of the DRA literature, although in a different way from what we propose in our model later. In [44] the objective is to optimize the long-term maintenance plan and the maintenance cost by maximizing the remaining useful life (RUL). Others, for example [45], focus on maintenance strategy/policy rather than on decision-making in the operation phase. In [46], unlike conventional approaches that restrict all maintenance activities to a finite planning horizon, the focus is on activity-to-activity scheduling without specifying the horizon. In addition to decision-making, we want our model to integrate CBM and the information from maintenance activities, so that it collects all the available information and the risk calculation is as accurate as possible. This integration has been studied in a few cases in the literature. In some cases, such as [47], a distinction is made between the static view of the fault distributions derived from historical data and the dynamic view of the remaining life derived from the condition, trying to integrate PHM into DRA. On the other hand, a more recent study [48] proposes a framework to integrate CBM and inspection data in DRA. Condition monitoring data is collected online by sensors and is indirectly related to component degradation. Inspection data is recorded in physical inspections that directly measure component degradation. However, that study does not incorporate the failure modes of the components. We chose to incorporate them into the proposed model, since CBM and maintenance activities are structured at the failure mode level.

DRA can also be seen as an evolution of RA. For RA it is necessary to quantify risk. This is reinforced by the use of the QRA (Quantitative Risk Assessment) concept. As defined by NORSOK Standard Z-013 [49] and by the ISO 31000 standard [21], QRA includes establishment of the context, risk identification, performance of the risk analysis, and risk evaluation. The following step, when possible, is to introduce probability for managing the uncertainty related to these quantities [23,50]. This moves from classical deterministic RA or QRA towards PRA (Probabilistic Risk Assessment), an integrated approach that combines the insights provided by the deterministic approach and those from the probabilistic approach with any other requirements in making decisions, generating an adequate framework within which expert opinions can be combined with statistical data to provide quantitative measures of risk [38,51]. The last step in this evolution is to develop DRA, which is the most suitable approach for taking advantage of the new digitalization context and for providing RA tools that meet its requirements. Zeng and Zio [38] introduce a method to combine PRA with condition monitoring results. Villa et al. [26] analyzed the need to move forward in risk assessment, from QRA to dynamic risk assessment approaches. These authors present a specific approach called the “risk


barometer’’, which aims to provide operators with a risk-based decision support tool. This tool combines three sources of information in the representation of risk: real-time information on safety barriers, the results of risk analyses and expert opinions. The main points of the risk barometer experience (Hauge et al. 2015) [50] are: (i) a previous QRA is required; (ii) RIFs (Risk Influencing Factors) and their risk indicators support the risk level assessment, where indicators are used to determine RIF states; (iii) risk indicators are obtained from monitoring; (iv) it is presented as a real-time risk assessment that provides a basis for dynamic adjustments of inspection and maintenance, to ‘‘continuously monitor risk picture changes and support decision makers in daily operations’’. Chemweno et al. [45] analyze the importance of adopting DRA approaches for advanced maintenance planning decision making. These authors propose a method to introduce the dynamic approach in the failure mode and effects analysis (FMEA). In this sense it is important to highlight that FMEA/FMECA analysis is a main part of RCM analysis [52,53], and that it is presented as one of the most powerful techniques for risk assessment by ISO 31010 [30,54].
All these references treat DRA from the view of dynamic modification once risks have been identified, i.e. the Risk Analysis (RAn) phase of the RA schema of ISO 31000. However, dynamic risk changes can emerge from one or a combination of three possibilities:

1. New risks identification. Changes in the system or its operation context (real or potential events, activities and circumstances) appear, prompting an update of the risk identification. This drives modifications in the risk lists and prompts an RA revision. A clear example is the treatment of ‘‘significant changes’’ in the EU 402/2013, common safety method for risk evaluation and assessment for railway infrastructure [55].
2. Modification in the probability of different events, activities and circumstances included in the risk modeling. Even without new risks identification, the system could accumulate new data (new failure events modifying probability calculations) or new knowledge could emerge (verification of new analysis results, growth in the organization's expertise or new skills provided by technological development can be included here). This is the most widespread approach in the literature when DRA is addressed.
3. Modification of the consequences and/or the evaluation method of the magnitude of consequences. Not only can new consequences, not considered before, emerge, but the organization or business priorities can also change with time, which means that the weights of the criteria composing the risk impact evaluation should be re-designed [56]. The COVID-19 crisis has introduced a dramatic example of how changes in consequences evaluation impact RM. In the pre-COVID world a single death was unacceptable, whereas now (Spain, February 2021) lowering the toll to 100 deaths per day is an unreachable goal.

This work is focused on point 2, considering that monitoring and inspection results and maintenance tasks can be interpreted into different states by probability of failure. We assume in our analysis that risk identification is static (related to critical FMs), but that the degradation process increases the risk level by increasing failure mode probability. It should be highlighted that it is the degradation process, with its physical causes and/or effects, that ultimately allows monitoring and CBM. On the other hand, maintenance tasks affect the degradation evolution, including the chance of recovering the health of the components (from 0–100).

3. DRA-based method for CBM interpretation and maintenance decision making

The objective of this method is to provide support to design and apply the entire process: from data to risk levels (translation obtained from CBM and predictive analytics tools), and from risk level to maintenance decision-making, which is the basic element for preventive maintenance scheduling.
The method has 3 fundamental contributions: (i) it is limited to the handling of 3 basic concepts that are easy to understand; (ii) for its implementation, a process is proposed that is based on and coherent with the RM framework of ISO 31000; (iii) finally, it provides a model/tool for the representation of the results that facilitates the management of the information generated for the composition of complex maintenance planning solutions.

3.1. Key concepts: failure modes, events and states

This method is based on the use of only three basic concepts: failure modes, events, and states. This fundamental simplicity is key to promoting its adoption by industry. Dynamic control of system health will increase the complexity of maintenance management (more information, more events, more decisions). If the method adds complexity beyond what is strictly needed, it could lead to undesirable scenarios with new unknown risks and high operational costs. The relevant aspects of these three concepts for the use of the method are summarized below:
Failure Mode. Failure modes involved that can be fully or partially managed by CBM. Monitoring solutions and maintenance tasks are applied at failure mode level. The failure mode is proposed as the central element of the method. The handling of failure modes and their linkage to risk is fundamental in the most powerful techniques for maintenance optimization such as RCM and FMECA. However, predictive and monitoring solutions do not use this reference or are not precise when relating monitoring and failure mode (they talk about equipment, component, system, etc.), which ends up generating confusion and interpretation problems. In the authors’ experience, the use of failure modes significantly facilitates the design, development and use of CBM/PHM techniques. Within the failure modes it is useful to distinguish two types:

• Primary failure mode (PFM)
• Secondary failure mode (SFM): initiated by a PFM

This terminology (PFM and SFM) is adopted from ISO 13381 [57].
Event. Recordable, scheduled, or supervening moment in which the risk level of the affected failure modes has to be re-analyzed. The method considers two types of events:

• Monitoring Event, MoEv: Events whose occurrence is programmed from the information generated by the condition monitoring process (including direct monitoring and PdA/PHM algorithms). They can be detection events, diagnostics events or prognostics events. These can be obtained from a direct interpretation of a single variable or descriptor (according to ISO 13379 [7,58]) or from a combination of different descriptors in a single interpretation.
• Preventive Maintenance Events, PMEv: Preventive maintenance task execution events. Preventive actions are those that are executed before failure occurs. These events may be scheduled or unplanned (e.g., due to opportunistic considerations). They can be related to any preventive maintenance activity, e.g. inspections (in workshops or on board). Preventive actions are assumed to allow control of failure risk levels, either by reducing the level of risk or by ensuring that the equipment remains in a condition such that the level remains acceptable.

Both MoEv and PMEv are considered to trigger a change in the risk level in one or more of the failure modes. So, events trigger a review of the risk analysis to assess the new risk levels of the affected FMs. Note that the same event can affect different failure modes, and differently for each of them in terms of risk level.
State. Qualitative level of risk at a given time. Each event causes a possible change in the level of risk. The method proposes using


only four states or risk levels (low, medium, high and failure). This is in accordance with the three conventional risk levels (classical traffic light reference), adding a fourth level corresponding to the fault state (after the failure of the component).

• Fault: State after the failure has occurred, in which immediate replacement or repair of the item is required.
• High Risk: State of operation closest to failure. Short-term activi-
ties are scheduled to reduce the level of risk.
• Medium Risk: State in which an anomaly has been detected
but it is possible to continue operating under normal condi-
tions. Medium-term activities are planned to confirm the risk and
analyze how it evolves.
• Low Risk: Normal operating state of the item
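
As an illustration only, the three concepts can be captured in a very small data model. The following Python sketch is not part of the published method: the class names, field names and the two example events are assumptions introduced here, and the state assigned by each event is supplied by hand rather than derived from data.

```python
from dataclasses import dataclass, field
from enum import Enum, IntEnum


class RiskLevel(IntEnum):
    """The four states proposed by the method, ordered by severity."""
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    FAULT = 3


class EventType(Enum):
    MONITORING = "MoEv"              # detection, diagnostics or prognostics event
    PREVENTIVE_MAINTENANCE = "PMEv"  # e.g. an inspection or a replacement


@dataclass
class Event:
    name: str
    kind: EventType
    # New state per affected failure mode: one event may affect several
    # failure modes, and each of them differently (hypothetical structure).
    new_levels: dict[str, RiskLevel]


@dataclass
class FailureMode:
    code: str
    state: RiskLevel = RiskLevel.LOW
    history: list[tuple[str, RiskLevel]] = field(default_factory=list)

    def apply(self, event: Event) -> None:
        """Re-assess the state only if this failure mode is affected by the event."""
        if self.code in event.new_levels:
            self.state = event.new_levels[self.code]
            self.history.append((event.name, self.state))


# Hypothetical scenario: a monitoring event raises two failure modes to
# medium risk, and a later inspection confirms one and clears the other.
fm1, fm2 = FailureMode("FM1"), FailureMode("FM2")
alarm = Event("threshold exceeded", EventType.MONITORING,
              {"FM1": RiskLevel.MEDIUM, "FM2": RiskLevel.MEDIUM})
inspection = Event("inspection", EventType.PREVENTIVE_MAINTENANCE,
                   {"FM1": RiskLevel.HIGH, "FM2": RiskLevel.LOW})
for ev in (alarm, inspection):
    for fm in (fm1, fm2):
        fm.apply(ev)
```

Keeping the state on the failure mode, and not on the component or the signal, reflects the central role the method gives to failure modes; the history list is simply one convenient way to retain the sequence of events for later analysis.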

A higher number of risk levels could be considered if a more accurate definition of the states and of the degradation process itself were needed, but the complexity of using the solution could grow exponentially in real applications with complex engineering assets. Therefore, adding more risk levels would need to be justified in terms of whether it significantly improves the effectiveness and/or efficiency of maintenance.

3.2. Process to support the method

To describe the method in a more practical way, a process has been developed. It helps to design, follow and understand the different transformations and interpretations of the information that are necessary throughout the process itself: from data to risk levels (translation obtained from CBM and predictive analysis tools), and from risk level to maintenance decision making, which is the basic element for preventive maintenance scheduling. This DRA method is not intended to be disruptive with respect to the classical RA methods with which it will have to coexist, but inclusive. In fact, the process follows, or can be included within, the general RA framework. Accordingly, the steps or elements of the ISO 31000 framework (see Fig. 2) are now reinterpreted in the process.
Fig. 4 shows the process developed to support this method, which includes three main steps:

A. Risk Identification, RId. There is no difference at this point with ISO 31000, and this step is actually not dynamic.
B. Risk Analysis and Risk Evaluation, RAn & REv. In this step the process treats both ISO steps together. The dynamic approach means that both the change in the risk level (RAn) and the risk acceptance criteria (REv) have to be reviewed in response to new events, whether they are monitoring events (MoEv) or maintenance events (PMEv).
C. Finally, the DRA method causes the Risk Treatment (RTr) to be done dynamically as well. In our proposal, the risk level modification triggers the scheduling of very concrete maintenance actions. These maintenance actions or tasks are also risk-related events, occurring at a given time instant, leading to risk level reduction (lowering the risk level) or risk level containment (avoiding a rise in the risk level).

Fig. 4. General process.

A. Risk Identification, RId
Regarding RId, the maintenance function is linked with a very specific risk type: the failure risk. But a more detailed identification is possible. FMEA/FMECA techniques tell us that: (i) different FMs, even when related to the same functional failure, can generate different effects (different impact on system objectives); and (ii) even when related to the same functional failure, different FMs have different likelihoods of occurrence [53,59]. In this way, it is possible to locate RId at FM level, and to refer to RA and risk metrics by failure mode. On the other hand, CBM should only be applied to critical failure modes of the most critical assets. This is a common principle of CBM application that is reflected both in the advanced techniques for maintenance plan optimization (RCM, FMEA, FMECA) [53,60,61] and in the CBM and condition monitoring international standards (ISO 17359, ISO 13374, ADS 79D-HDBK, CBM/MIMOSA) [62]. In fact, criticality analysis is a risk-based method [63]. So a previous QRA of the engineering system being maintained should be done before the CBM solution design, as the very first step in this process. In other words, where CBM should be applied is a risk-based decision. Condition monitoring is closely related to the evolution of FMs, since FMs are, in fact, evident physical causes of failure [53,60], and it is the physics of failure that produces symptoms (related to effects and/or causes of failure) that can be measured [12,58]. In addition, PHM/PdA techniques allow diagnosing and forecasting asset failure modes, a fundamental tool for the optimization of maintenance planning in the DT era. In this sense, what we propose is to focus not on signal or variable monitoring but on direct failure mode monitoring, a more practical approach from a maintenance management view. It is possible to dynamically assign a risk level to each monitored failure mode of an asset and, after that, to associate maintenance actions with the interpretation of these risk levels, according to other considerations regarding the dynamic evolution of the operation and maintenance context.


B. Risk Analysis and Risk Evaluation, RAn and REv


The second block of the risk management process deals with the
RAn and the REv. When continuously receiving new information from
monitoring or maintenance activities, a risk analysis must be carried
out each time the information is updated. This characteristic makes it
a dynamic sub-process that allows defining the global risk management
process as a DRA. In Fig. 5 we define the steps to carry out the
RAn, which will vary depending on whether the new information
comes from monitoring events or from a maintenance activity. These
two data inputs must be differentiated, since monitoring is continuous
and usually requires predictive analytics, whereas preventive maintenance
activities are discrete and do not require these techniques. The RAn
sub-process requires in turn three sub-processes to transform the events
that have occurred into new levels of risk associated with FMs. These
three sub-processes are defined in Fig. 6.
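
A minimal sketch of how the three sub-processes could be chained is given below. It is written under several assumptions that are not taken from the paper: the descriptor is simply the mean of the valid samples, a single threshold is used, and the event names and the event-to-risk-level rules are hypothetical placeholders for what would be fixed at the design stage.

```python
from typing import Optional

DESCRIPTOR_THRESHOLD = 10.0  # hypothetical limit fixed at the event design stage

# Hypothetical rules assigning a new risk level to each event.
RISK_RULES = {
    "descriptor_above_threshold": "medium",
    "inspection_confirms_degradation": "high",
    "inspection_finds_no_degradation": "low",
}


def predictive_analytics(raw: list[Optional[float]]) -> Optional[float]:
    """Stand-in for PdA: drop missing samples and reduce the rest to one descriptor."""
    valid = [x for x in raw if x is not None]
    return sum(valid) / len(valid) if valid else None


def interpret_event(descriptor: Optional[float], pm_result: Optional[str]) -> Optional[str]:
    """Turn a descriptor or a maintenance result into an event, or None if nothing triggers."""
    if pm_result is not None:
        return pm_result  # maintenance events are already well defined
    if descriptor is not None and descriptor > DESCRIPTOR_THRESHOLD:
        return "descriptor_above_threshold"
    return None


def assign_risk_level(event: Optional[str], current: str) -> str:
    """Return the new risk level, keeping the current one when no rule applies."""
    if event is None:
        return current
    return RISK_RULES.get(event, current)


def risk_analysis(current: str,
                  raw: Optional[list[Optional[float]]] = None,
                  pm_result: Optional[str] = None) -> str:
    # Monitoring data go through predictive analytics; maintenance results skip it.
    descriptor = predictive_analytics(raw) if raw is not None else None
    return assign_risk_level(interpret_event(descriptor, pm_result), current)
```

For instance, `risk_analysis("low", raw=[12.0, None, 14.0])` would return `"medium"` under these placeholder rules, whereas a descriptor below the threshold leaves the level unchanged and the loop simply continues with the next batch of data.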

• Predictive Analytics (PdA): The use of predictive analytics is not
always required at the entry of the RAn. In the simplest cases
in which the events that trigger the RAn are preventive mainte-
nance events, this sub-process is not necessary, since the event
is well defined and can be redirected to the second sub-process.
However, in cases where the event that triggers the RAn is a
monitoring event, predictive analytics is required to transform the
information that arrives from the IoT sensor network into useful
information that can be used in the subsequent sub-process of
event interpretation rules. This sub-process is composed of several
steps. Logically, depending on the quality of the information
received from the sensors, we will have to perform only some of
these steps or all of them in the worst case. First, the cleaning
and transformation of the input data from the monitoring is
required to transform raw data into useful information. Second,
we carry out a quality analysis to ensure the homogeneity and
consistency of the data from its sources to the final exploitation
repository. Third, we develop the predictive model from trained
and validated algorithms that are put into operation. Finally, the
model results are analyzed to produce a predictive descriptor that
will be used as input to the event interpretation rules sub-process.
• Rules of Interpretation of Events: Predictive descriptors or maintenance activities performed are received in the event interpretation rules sub-process. In the case of predictive descriptors, they are compared with thresholds that have been previously defined in the event design. In this design stage, the limits of the descriptors and the maintenance activities that will involve a change in the risk level of the item are established. Exceeding these thresholds will trigger an event that will be analyzed later in the third sub-process of assigning risk levels to events. It is very important to differentiate the design stage of events and thresholds from the exploitation phase of the model. In the design stage, we analyze in depth the evolution of the failure modes and the concrete moments of this evolution that imply a change in the level of risk. At that time, we define certain events and thresholds that identify this change in the level of risk in order to compare them with the predictive descriptors or preventive maintenance events received during the exploitation phase of the model. Table 6 in the case study shows an example of the event interpretation rules for this particular use case. Notice that if an event is not triggered, the model will be in an iterative loop of monitoring data analysis and maintenance activities until the event is detected and the process advances to the next step.
• Rules for assigning risk levels to events: Up to this point, the process was in the RAn phase; however, once the risk levels have been quantified in the design phase, the REv is carried out together with the RAn. The REv is based in this case on the assignment of colors to the risk levels, which will identify the admissibility or not of the risk. The objective of this sub-process is to evaluate the new risk level of the item. The input of this sub-process is an event that will be analyzed to define a new level of risk for the item. For this, it is necessary to previously define the different risk levels and analyze the event from the previous step together with other possible considerations that take place at the time of study. Table 7 in the case study shows an example of the rules for assigning new risk levels to the events for the case of bearings in high-speed trains. Once an event has occurred and the corresponding new risk level has been assigned, we proceed to the last point of the methodology, risk treatment, whose main input is the new risk level associated with the failure modes analyzed.

Fig. 5. Risk analysis.

C. Risk Treatment
The objective of the last step of the methodology is to determine the preventive maintenance actions to be carried out and, in general, the maintenance schedule over time. Once an event has occurred and the new risk level has been assigned, we try to mitigate or reduce the risk through various actions. These actions are the result of the decision-making sub-process associated with each risk level shown in Fig. 7. In this sub-process the risk level is received and the maintenance action corresponding to said risk level is ordered, taking also into account previous risk levels as explained above. All the possible actions that can be carried out are set at a design stage in which, depending on the event that has occurred and the level of risk, a specific action is associated to try to avoid a certain risk. Table 7 in the


case study shows an example of the actions associated with each of the risk levels that may appear in the bearings of high-speed trains. If the corresponding action is to replace the item, the process ends and the new item that is installed will be managed from the beginning. Otherwise, if it is any other action, we will have to measure the effectiveness of the action by performing the dynamic risk analysis stage again.

Fig. 6. Subprocesses of Predictive Analytics, Rules for interpreting events and Rules for assigning risk levels.

Fig. 7. Subprocess for risk treatment, RTr.

3.3. Graphic results representation tool

The method is completed with a practical tool for the representation of the results. This representation shows in a simple way the relationship between events and states and their interpretation through risk levels. It is very useful in the design of interfaces for integration in the dashboard of maintenance management applications.
In the example of Fig. 8, a monitoring solution is represented that provides information that can be linked with FM1 and FM2 risk level control. So, monitoring events (Event 1 and 2) can be related to one of three situations: FM1 is activated, FM2 is activated or FM1+FM2 are activated. Event 1 (first threshold over monitoring variable or PdA algorithm) is programmed to detect potential initial problems early, but the maintainer's interpretation tells us that the risk level does not actually change or, more accurately, does not change enough to require raising the risk level of FM1 or FM2. In terms of maintenance planning, no new maintenance tasks and no changes to the maintenance plan have to be planned from Event 1. Even so, Event 1 gives the maintainer useful information in order to pay attention to FM1 and FM2 from this moment, and also lets the maintenance analyst draw conclusions about the time taken by the system to get from Event 1 to Event 2. At Event 2, the maintainer's interpretation moves FM1 and FM2 to the Medium Risk level. Since this monitoring solution cannot distinguish FM1 from FM2, the maintainer introduces an inspection in the PM program, which is Event 3. So Event 3 is a PM event, specifically an inspection task. Event 3 confirms FM1, so the FM2 risk level turns back to green. At the same Event 3, the maintainer, from the inspection report, decides to change the FM1 risk level to red and to program the replacement of the affected component (Event 4). Here we have tried to illustrate the concepts and their interpretation and use. How to quantify the risk levels by interpreting the available data and information is an engineering problem in which PdA can be employed to build more sophisticated solutions where needed. In any case, this method can also be used to represent complex PdA results in a common interface directly linked with the main management objects of maintenance decision making: failure modes and components.

4. Use case. DRA for high-speed train fleet maintenance decision making

4.1. Use case presentation

This analysis focuses on the application of a CBM strategy to the train axle bearings of a fleet of high-speed passenger trains. All the data used have been provided by the railway company Patentes Talgo. Each train axle, in the train model of this case study, is equipped with four axle bearings, two inner and two outer. The temperature of each bearing is controlled by the train control monitoring system (TCMS). The


Fig. 8. Graphic results representation tool.

TCMS captures bearing temperature data at a sampling rate of one sample per minute, together with the associated variables of date, time, outdoor temperature and train speed. In parallel, the train electronic control unit (ECU) receives the signals on-line and processes them according to certain control/safety rules (that will be described later), in order to send notices to the train supervision post, to limit speed or to open the loop for an immediate train stop. Bearing temperature is the only on-board signal available that can be related to bearing behavior. In this case study there is no possibility of vibration monitoring, which is a more suitable and accurate solution for predictive maintenance of train axle bearings. The fleet under consideration has a total of 16 trains of the same OEM train model, grouped in 5 different train configurations (i.e., with more or fewer cars of a given car type) and using two bearing models from different OEMs (BOEM1 and BOEM2), which are used interchangeably at any bearing functional location of any train. Also, axle bearings normally undergo several major maintenance activities at the workshop, restoring their functionality, and they reach a life span of around 4 to 5 million km, typically according to wheel replacement. After each of these interventions, they can be placed at a different functional location of the train, in a different train of the fleet. The current maintenance strategy is depicted in Fig. 9. It is a basic preventive maintenance strategy based on the accumulated km of the bearings (UBM strategy). Maintenance decision making is simply based on the km limit set by the bearing manufacturer. When the accumulated kilometers approach those indicated by the manufacturer, the maintenance task (bearing replacement) is planned within the next programmed visit
to the workshop of the train. However, if evident symptoms such as noise or very high temperatures appear before the set km, an inspection by experts will be programmed. The decision about disassembling the bearing will be made according to the experts’ judgment. Removed bearings are sent to the central workshop for analysis. A significant percentage of them actually present no degradation. Inspection, removal and analysis have a high cost, and the decision-making is based solely on the appearance of symptoms that may or may not be indicators of the actual degradation of the bearing. Once the bearing is sent to the workshop, a series of tests are carried out to verify whether the degradation is real. In the event of degradation, if the bearing has not reached the end of its useful life, it is overhauled so that it can be used again. If there is no degradation, the bearing remains at the disposal of the company to be incorporated into another train when required. In this case, the company considers that the symptom was a false alarm and that the expense incurred was unnecessary, since there was no degradation in the bearing. An important point was to realize that the company did not carry out any risk-based maintenance management.

Fig. 9. Current strategy.

4.2. Application of DRA method to the use case

The DRA method allows introducing monitoring and PdA for maintenance planning. Fig. 8 in the previous point shows the process developed for the risk management of bearings. As detailed in the previous point, the process on which the methodology is based is composed of three fundamental steps. Following the CBM design structure


proposed by Guillen et al. [12], who identify five blocks and propose accurate elements that take part in the CBM process:
A. Risk identification: Bearings are critical components of the train. Failure effects can affect passenger safety and can also produce important economic penalties. In this case we study the risk of bearing failure and the risk of failure of the guide system, mainly due to two failure modes: on the one hand, the wear of the bearings in the running area and, on the other hand, a misalignment in the guidance that increases the risk of the train derailing. Both failure modes share the symptom of an abnormal rise in temperature, and in some cases Guiding System - Unsetting can be the PFM of Bearing - Rolling contact wearing.

• Block 1: Physical description. An accurate description of the TE (Technical Structure) of systems is key for intelligent treatment. Bearings are included in wheelsets (Equipment unit level at TE), which belong to wagon rolling systems (System level at TE).
• Block 2: Functional description. The output of this block is failure mode elicitation. Train axle bearings are key elements in the integrity of railway wheelsets; ensuring their good condition is fundamental for train safety and this can be achieved using condition monitoring. The most common failure in axle bearings is due to rolling contact fatigue (RCF) of the outer race. Subsurface networks of cracks develop in the rolling direction by shear stresses and then deviate towards the surface, causing macro pits that represent a measurable surface damage. Two FM types are considered within the CBM solution: FM1 ‘‘Wheels toe-in - Misalignment - Unsetting’’ and FM2 ‘‘Bearing - Rolling contact wearing’’. Overheating could be a symptom of both, and FM2 could be a secondary failure mode with respect to FM1 (following the ISO 13381 definition of secondary, FM2 could be an effect of FM1). The results of Blocks 1 and 2 are summarized in Table 3.

Table 3
CBM solution description: Block 1 and Block 2 summarized results.
Equipment unit        Maintainable item      Failure mode
1 Rodal (wheelsets)   1.1 Wheels toe-in      1.1.1 Wheels toe-in - Misalignment
                      1.2 Lef-Int Bearing    1.2.1 Bearing - Rolling contact wearing
                      1.3 Lef-Ext Bearing    Bearing - Rolling contact wearing
                      1.4 Rig-Int Bearing    Bearing - Rolling contact wearing
                      1.5 Rig-Ext Bearing    Bearing - Rolling contact wearing

B. Risk Analysis and Risk Evaluation
Condition Monitoring and PdA

• Block 3: Information sources. Monitoring variables from on-board monitoring systems are employed as the main information sources. Real-time monitoring is available. In addition, data from wheelset inspections (preventive maintenance actions) and quality analysis results of removed bearings are also used. Both types of data are located in different information systems of the organization (Table 4).
• Block 4: Symptom Analysis. Overheating is the only symptom, but different descriptors emerge which can be connected with monitoring results and maintenance decision making (Table 5).

Table 4
CBM solution description: Block 3. Monitoring Variables.
Information source            Monitoring variable Cod   Monitoring variable description
On-board monitoring system    T0                        Temperature (°C) Lef-Int Bearing
                              T1                        Temperature (°C) Lef-Ext Bearing
                              T2                        Temperature (°C) Rig-Int Bearing
                              T3                        Temperature (°C) Rig-Ext Bearing
                              TExt                      Exterior Temperature (°C)
                              S                         Train Speed (Km/h)
                              RK                        Run distance (Km)

Table 5
CBM solution description: Block 4. Symptom descriptors analysis.

In the case study, the sub-process developed in the dynamic risk analysis methodology (Fig. 6) is applied to the failure modes defined in the previous point. At this point we have to highlight the great work that has been carried out with predictive analytics. Starting from the millions of raw data from monitoring, hard work has been done to clean up all the unnecessary or faulty data that do not add quality to the analysis. In addition, variables that may be useful are calculated to be used as input for the predictive models. Finally, we have developed, trained and validated three predictive algorithms for detection, prognosis and diagnosis or classification that have been applied in the industry with positive results. Some of these algorithms are detailed in the paper by Crespo et al. [64] and are applied to real use cases. In this manuscript we detail in more depth the process of analyzing the results of these algorithms, which is used to obtain the descriptors that we will use in the next step of the model. Below we detail for the case study the analysis carried out after obtaining the results of the predictive model. First, we will study the information that comes from the bearings and how this information is analyzed to turn it into useful information for the process. Second, following the methodology proposed in the previous section, we will transform this information into risk levels for certain bearing failure modes. Finally, we will establish the decision making for each previously defined risk level. The work of collecting all the available information, filtering it and analyzing it to incorporate it into a CBM requires useful variables that can be monitored and experts who know how to process this information so that it can be used correctly as input for the CBM or for artificial intelligence algorithms. Without the necessary data or the correct treatment of this data, it is impossible to improve maintenance management. As we discussed in the first part of this section, the best variable to measure bearing degradation was vibration. However, placing accelerometers and monitoring all this information was a large investment that was not fully justified. Since a large investment had already been made to monitor bearing temperatures for safety reasons, it was decided to analyze this information and study bearing degradation from those temperatures. In this way we can take advantage of data that was not being used for maintenance. Each bearing has 4 temperature sensors that collect and send the information every minute; therefore, each trip generates a huge amount of data that must be filtered and managed. In the work developed by Crespo et al. [64], in which a predictive analysis is applied to the train bearings, it is explained in detail how all the information that comes from the bearings is filtered and how it is transformed to use it as input data of an artificial neural network (ANN)


Table 6
List of events and their interpretation rules.

Table 7
Event risk-level assignation and advised actions.

that identifies the deterioration of bearings through temperatures. In
addition, they include rules of interpretation that allow early detection
of bearing failure. To develop the decision-making process based on
risk levels, we will use the output of the artificial neural networks
generated by Crespo et al. This information contains all the moments
called ‘‘positives’’ in which the temperature of a bearing has been that occur during the operation of the asset are recorded, as these will
10 ◦ C or more above the expected one and how long it has been have an impact on future decision-making. For example, if 30k km
in these circumstances, in addition to the kilometer in which it was of operation are reached since D3 was produced and there has been
produced, the date, the outside temperature in that moment, etc. When no inspection, the risk level will rise and an on-board or guidance
analyzing the information of a bearing throughout its life cycle, the system inspection will be launched. However, if an on-board inspection
actual temperature of the bearing is greater than 10 ◦ C above what has been carried out before these 30k Km and it has been concluded
would be expected or is estimated by the artificial neural network that there is no degradation, we will restart the count of variable D4,
at around 20,000 or 30,000 times. If we analyze the information of thus delaying the appearance of Event 2. All these decisions will have
20 or 30 bearings, we are already talking about working with orders to be taken in the process of design and analysis of events, a very
of magnitude of a million data. At this point it is very important to important preliminary work for the model to be efficient and effective.
simplify these data in order to make good use of the information. After Once an event has occurred and the corresponding new risk level has
a long study on how to simplify the information without losing value, it been assigned, we proceed to the last point of the methodology, risk
was decided to group the positives by cycles in which two consecutive treatment, the main input of which is the new risk level associated with
positives did not exceed 5 km traveled. It is shown that the distribution the failure modes analyzed.
of the cycles with respect to the km traveled is very similar to that C. Risk Treatment
of the positive ones and therefore it is equally valid to represent the It is the last step of the proposed methodology. Once an event
degradation of the bearings. By working with cycles instead of posi- has occurred and the new risk level has been assigned, this risk is
tives, we have about 200 cycles per bearing, which reduces the input treated to mitigate or reduce it through various actions. These actions
information by two orders of magnitude while maintaining the quality are the result of the decision-making sub-process associated with each
of the information. The study to simplify the information involved risk level shown in Fig Z. In this sub-process the risk level is received
numerous analyses with different distances between positives (5, 10, and the maintenance action corresponding to said risk level is ordered,
25, 50 km) and with different time intervals between positives, and in taking also into account previous risk levels as explained above. All
all cases, it was shown that the grouping of positives by cycles followed the possible actions that can be carried out are set at a design stage in
the same growth pattern as the positives themselves. An important note which, depending on the event that has occurred and the level of risk, a
when dealing with high-speed trains is that the temperature can reach specific action is associated to try to avoid a certain risk. Table 7 shows
high peaks at a certain moment due to high speed or atypical braking, the actions associated with each of the risk levels that may appear after
however this positive is not indicating a degradation of the bearing the events that have been defined. If the corresponding action is to
but a sudden change in driving conditions, operation or just sudden replace the item, the process is ended and the new item that is installed
sensor failure. When these positives are prolonged over time, we can will be managed from scratch. Otherwise, if it is any other action, we
no longer speak of these cases and therefore they can already show will have to measure its effectiveness again by performing the dynamic
a real degradation of the bearing. This fact reinforces the decision to risk analysis stage again.
work with cycles rather than positives. Once we have obtained useful
information, we use it as input information for the event interpretation • Block 5: Decision Making. Descriptor interpretation rules allow
rules sub-process, in this case from monitoring. In contrast, when the programming monitoring results (detection, diagnosis and/or
information comes from a maintenance activity, predictive analytics prognosis). From each result it is able to programming mainte-
is not required. In this case, depending on the maintenance activity nance decisions, preventive maintenance activities to program-
that has been carried out, a new level of risk is associated in the ming
event interpretation rules sub-process. Events are received in the event
interpretation rules sub-process. Table 6 shows the event interpretation 5. Conclusions
rules for the case study.
After knowing the conditions to trigger the events, we are in a Maintenance planning task is being re-thought owing to digitaliza-
position to analyze the change in the risk level of the failure modes tion. The new scenario is a more complex one. Indeed, digitalization
that these events will involve. The following table lists all the possible requires more sophisticated ways of doing maintenance planning, espe-
cases for the two failure modes for each event. cially to suitable managing of more data, information and knowledge,
In view of Table 7 there may be a multitude of combinations in a high dynamic data input, and that have to be integrated in a rising
between the risk levels in which the failure modes are found and the number of decision making processes. Given this complexity, the need
events that may occur. Therefore, it is very important that all events arises to seek new methodologies to manage the maintenance of assets.
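As a concrete illustration of the event programming that such methodologies rely on, two rules from the case study above can be sketched in Python: the grouping of positives into cycles (consecutive positives no more than 5 km apart) and the mileage-based rule that raises the risk level after 30,000 km without inspection. This is only a minimal sketch under stated assumptions, not the implementation used in the study: the names `group_into_cycles`, `FailureModeState`, `run` and `clean_inspection` are hypothetical, the risk level is assumed to be an integer scale where higher means riskier, and restarting the counter once the event fires is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_GAP_KM = 5.0              # from the text: consecutive positives at most 5 km apart
EVENT2_THRESHOLD_KM = 30_000  # from the text: 30k km of operation without inspection


def group_into_cycles(positive_km: List[float],
                      max_gap_km: float = MAX_GAP_KM) -> List[List[float]]:
    """Group the odometer readings (km) of positives into cycles.

    A positive joins the current cycle when it lies at most max_gap_km
    after the previous positive; otherwise it starts a new cycle.
    """
    cycles: List[List[float]] = []
    for km in sorted(positive_km):
        if cycles and km - cycles[-1][-1] <= max_gap_km:
            cycles[-1].append(km)
        else:
            cycles.append([km])
    return cycles


@dataclass
class FailureModeState:
    """Hypothetical state of one failure mode."""
    risk_level: int = 1   # assumed ordinal scale, higher = riskier
    d4_km: float = 0.0    # km run since the reference event (plays the role of D4)

    def run(self, km: float) -> Optional[str]:
        """Accumulate mileage; raise the risk level when the threshold is reached."""
        self.d4_km += km
        if self.d4_km >= EVENT2_THRESHOLD_KM:
            self.risk_level += 1
            self.d4_km = 0.0  # assumption: counting restarts once the event fires
            return "inspection"
        return None

    def clean_inspection(self) -> None:
        """An inspection found no degradation: restart the D4 count."""
        self.d4_km = 0.0


if __name__ == "__main__":
    print(group_into_cycles([100.0, 103.0, 107.0, 120.0, 124.0, 200.0]))
    state = FailureModeState()
    state.run(20_000)
    state.clean_inspection()
    print(state.run(20_000), state.risk_level)
```

Keeping the counter and the risk level in one small state object per failure mode mirrors the idea of following each failure mode through a single indicator; a real rule set would also need the event definitions of Table 6 and the level-to-action mapping of Table 7, which are not reproduced here.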


In this work, a risk-based management methodology is proposed, an area that has been enhanced by digital transformation and that allows a risk level to be dynamically assigned to each failure mode of an asset. These risk levels are associated with specific maintenance actions that are ordered automatically.

This paper proposes a practical solution to carry out dynamic CBM planning based on risk levels. A structure is proposed to manage the monitoring results and decision making with the same event definition scheme. For this purpose, we use the risk level control approach as the main reference as follows:

• Build the planning proposal as a DRA tool.
• Design the DRA approach in a way consistent with ISO 31000, that is, risk management based on three steps: Risk Identification, Risk Analysis and Risk Treatment.
• Connect the results of monitoring and prognosis, on the one hand, and the decision-making events, on the other, with the re-analysis of the risk level.

The generated methodology has a significant level of complexity. As a consequence, an environment for programming events and states appears that is not easy to handle, especially taking into account scenarios where a whole network of "decisions + descriptors + equipment + systems + time evolution" is generated. Even so, the results obtained and presented here have been validated by technicians of different profiles, from academia and industry, generating a consensus on their generality and usability. This model is especially suitable for companies with high digitalization, where the main benefits provided could be:

• simplification of the interpretation, being able to control the status of the failure modes attending only to a single indicator linked to the level of risk at all times, the capacity to record events,
• the ease of scheduling maintenance actions and the large increase in "perceived" control over managed risk,
• transparency, for the operator, of the information transformation process that composes the level of risk.

Finally, the future lines of research derived from the manuscript could be, in the short term, the extrapolation of the risk control process to higher levels of intervention (equipment, system, installation, fleet, etc.) and the use of this methodology in the design and interpretation of AHI models (risk level envelope, risk level by AHI, decisions involved with AHI). In the longer term they would be the integration of greater intelligence (which generates more complex data that can raise interpretation problems not considered until now), the integration of this event programming language with IoT technologies and the validation in real environments with a greater number of monitored failure modes.

CRediT authorship contribution statement

Pablo Martínez-Galán Fernández: Conception and design of study, Writing – original draft, Writing – review & editing. Antonio J. Guillén López: Conception and design of study, Writing – original draft, Writing – review & editing. Adolfo Crespo Márquez: Acquisition of data, Analysis and/or interpretation of data. Juan Fco. Gomez Fernández: Analysis and/or interpretation of data. Jose Antonio Marcos: Acquisition of data, Analysis and/or interpretation of data.

Acknowledgments

This paper has been written within the framework of the projects "Methodology for industrial application of intelligent maintenance solutions. Integration of Predictive Analytics and Machine Learning techniques in IoT platforms" (Grant CEI-19-TEP134) and "INMA, Asset Digitalization for INtelligent MAintenance" (Grant PY20 RE 014 AICIA), and with the collaboration of Patentes Talgo and Ingeman, sharing data of a project to enhance the Patentes Talgo CBM strategy and platform. All authors approved the version of the manuscript to be published.

References

[1] Zio E. The future of risk assessment. Reliab Eng Syst Saf 2018;177(April):176–90. http://dx.doi.org/10.1016/j.ress.2018.04.020.
[2] ISO. ISO 55000-1:2014 Asset management - Overview, principles and terminology, Vol. 7. 2014.
[3] Sola Rosique A, Crespo Marquez A. AENOR. Principios y Marcos de referencia de la gestion de activos. AENOR; 2016.
[4] Márquez AC, Macchi M, Parlikad AK. Value based and intelligent asset management. Seville, Spain: Springer; 2019, http://dx.doi.org/10.1007/978-3-030-20704-5.
[5] Crespo Marquez A, Gomez Fernandez JF, Martínez-Galán Fernández P, Guillen Lopez A. Maintenance management through intelligent asset management platforms (IAMP). Emerging factors, key impact areas and data models. Energies 2020;13(15):3762.
[6] Paltrinieri N, Khan F. Dynamic risk analysis in the chemical and petroleum industry: Evolution and interaction with parallel disciplines in the perspective of industrial application. Butterworth-Heinemann; 2016.
[7] Guillén AJ, Crespo A, Gómez JF, Sanz MD. A framework for effective management of condition based maintenance programs in the context of industrial development of E-maintenance strategies. Comput Ind 2016. http://dx.doi.org/10.1016/j.compind.2016.07.003.
[8] Fumagalli L, Cattaneo L, Roda I, Macchi M, Rondi M. Data-driven CBM tool for risk-informed decision-making in an electric arc furnace. Int J Adv Manuf Technol 2019;105(1):595–608.
[9] Lee J, Bagheri B, Kao H-A. A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manuf Lett 2015;3:18–23.
[10] BSI. BS EN 13306:2010:Maintenance-maintenance terminology. 2010.
[11] Vachtsevanos G, Lewis F, Roemer M, Hess A, Wu B. Intelligent fault diagnosis and prognosis for engineering systems. Intelligent fault diagnosis and prognosis for engineering systems. 2007, p. 1–434. http://dx.doi.org/10.1002/9780470117842.
[12] Guillén AJ, Crespo A, Macchi M, Gómez J. On the role of Prognostics and Health Management in advanced maintenance systems. Prod Plan Control 2016. http://dx.doi.org/10.1080/09537287.2016.1171920.
[13] Zeng Z, Zio E. Dynamic risk assessment based on statistical failure data and condition-monitoring degradation data. IEEE Trans Reliab 2018;67(2):609–22.
[14] Guo Y, Zhong M, Gao C, Wang H, Liang X, Yi H. A discrete-time Bayesian network approach for reliability analysis of dynamic systems with common cause failures. Reliab Eng Syst Saf 2021;216(September):108028. http://dx.doi.org/10.1016/j.ress.2021.108028.
[15] Rocchetta R, Crespo LG. A scenario optimization approach to reliability-based and risk-based design: Soft-constrained modulation of failure probability bounds. Reliab Eng Syst Saf 2021;216(July):107900. http://dx.doi.org/10.1016/j.ress.2021.107900.
[16] Yuan R, Tang M, Wang H, Li H. A reliability analysis method of accelerated performance degradation based on Bayesian strategy. IEEE Access 2019;7:169047–54. http://dx.doi.org/10.1109/ACCESS.2019.2952337.
[17] Li H, Yuan R, Fu J. A reliability modeling for multi-component systems considering random shocks and multi-state degradation. IEEE Access 2019;7:168805–14. http://dx.doi.org/10.1109/ACCESS.2019.2953483.
[18] Li H, Li R, Li H, Yuan R. Reliability modeling of multiple performance based on degradation values distribution. Adv Mech Eng 2016;8(10):1–10. http://dx.doi.org/10.1177/1687814016673755.
[19] Yuan R, Li H. A multidisciplinary coupling relationship coordination algorithm using the hierarchical control methods of complex systems and its application in multidisciplinary design optimization. Adv Mech Eng 2016;9(1):1–11. http://dx.doi.org/10.1177/1687814016685222.
[20] Yuan R, Li H, Gong Z, Tang M, Li W. An enhanced Monte Carlo simulation-based design and optimization method and its application in the speed reducer design. Adv Mech Eng 2017;9(9):1–7. http://dx.doi.org/10.1177/1687814017728648.
[21] ISO 31000. International organization for standardization ISO 31000: Risk management - Principles and guidelines 2009. 2009, p. 36.
[22] Society for Risk Analysis (SRA). Society for risk analysis fundamental principles. 2018, p. 5, (August). URL http://www.sra.org/resources.
[23] Aven T, Zio E. Foundational issues in risk assessment and risk management. Risk Anal 2014;34(7):1164–72.
[24] Amalberti R. Piloter la sécurité: Théories et pratiques sur les compromis et les arbitrages nécessaires. Springer Paris; 2013.
[25] Aven T. Risk assessment and risk management: Review of recent advances on their foundation. European J Oper Res 2016;253(1):1–13. http://dx.doi.org/10.1016/j.ejor.2015.12.023.
[26] Villa V, Paltrinieri N, Khan F, Cozzani V. Towards dynamic risk analysis: A review of the risk assessment approach and its limitations in the chemical process industry. Saf Sci 2016;89:77–93. http://dx.doi.org/10.1016/j.ssci.2016.06.002.
[27] Zio E, Pedroni N. Risk-informed decision-making processes. Tech. rep., FonCSI; 2012.
[28] Shortridge J, Aven T, Guikema S. Risk assessment under deep uncertainty: A methodological comparison. Reliab Eng Syst Saf 2017. http://dx.doi.org/10.1016/j.ress.2016.10.017.


[29] Nilsen T, Aven T. Models and model uncertainty in the context of risk analysis. Reliab Eng Syst Saf 2003. http://dx.doi.org/10.1016/S0951-8320(02)00239-9.
[30] ISO. ISO 31010 Risk management — Risk assessment techniques. 2009.
[31] Negri E, Fumagalli L, Macchi M. A review of the roles of digital twin in CPS-based production systems. Procedia Manuf 2017;11:939–48.
[32] Yang X, Haugen S, Paltrinieri N. Clarifying the concept of operational risk assessment in the oil and gas industry. Saf Sci 2018;108:259–68.
[33] Zadakbar O, Imtiaz S, Khan F. Dynamic risk assessment and fault detection using a multivariate technique. Process Saf Prog 2013;32(4):365–75.
[34] Zadakbar O, Imtiaz S, Khan F. Dynamic risk assessment and fault detection using principal component analysis. Ind Eng Chem Res 2013;52(2):809–16.
[35] Huang W, Zhang Y, Yin D, Zuo B, Liu Z. Urban bus accident analysis: based on a tropos goal risk-accident framework considering learning from incidents process. Reliab Eng Syst Saf 2021;216:107918.
[36] Ayoub A, Stankovski A, Kröger W, Sornette D. Precursors and startling lessons: Statistical analysis of 1250 events with safety significance from the civil nuclear sector. Reliab Eng Syst Saf 2021;107820.
[37] Redutskiy Y, Camitz-Leidland CM, Vysochyna A, Anderson KT, Balycheva M. Safety systems for the oil and gas industrial facilities: Design, maintenance policy choice, and crew scheduling. Reliab Eng Syst Saf 2021;210:107545.
[38] Zeng Z, Zio E. Dynamic risk assessment based on statistical failure data and condition-monitoring degradation data. IEEE Trans Reliab 2018;67(2):609–22. http://dx.doi.org/10.1109/TR.2017.2778804.
[39] Khan F, Hashemi SJ, Paltrinieri N, Amyotte P, Cozzani V, Reniers G. Dynamic risk management: a contemporary approach to process safety management. Curr Opin Chem Eng 2016;14:9–17.
[40] Aven T, Zio E. Foundational issues in risk assessment and risk management. Risk Anal 2014. http://dx.doi.org/10.1111/risa.12132.
[41] Liu P, Li Y. An improved failure mode and effect analysis method for multi-criteria group decision-making in green logistics risk assessment. Reliab Eng Syst Saf 2021;215(May):107826. http://dx.doi.org/10.1016/j.ress.2021.107826.
[42] Wang Q, Jia G, Jia Y, Song W. A new approach for risk assessment of failure modes considering risk interaction and propagation effects. Reliab Eng Syst Saf 2021;216(September):108044. http://dx.doi.org/10.1016/j.ress.2021.108044.
[43] Li W, Liu G. Dynamic failure mode analysis approach based on an improved taguchi process capability index. Reliab Eng Syst Saf 2022;218(PB):108152. http://dx.doi.org/10.1016/j.ress.2021.108152.
[44] Petchrompo S, Li H, Erguido A, Riches C, Parlikad AK. A value-based approach to optimizing long-term maintenance plans for a multi-asset k-out-of-N system. Reliab Eng Syst Saf 2020;200:106924.
[45] Chemweno P, Pintelon L, De Meyer AM, Muchiri PN, Van Horenbeek A, Wakiru J. A dynamic risk assessment methodology for maintenance decision support. Qual Reliab Eng Int 2017. http://dx.doi.org/10.1002/qre.2040.
[46] Wu T, Yang L, Ma X, Zhang Z, Zhao Y. Dynamic maintenance strategy with iteratively updated group information. Reliab Eng Syst Saf 2020;197:106820.
[47] Engel SJ, Gilmartin BJ, Bongort K, Hess A. Prognostics, the real issues involved with predicting life remaining. In: 2000 IEEE aerospace conference. proceedings (Cat. No. 00th8484), Vol. 6. IEEE; 2000, p. 457–69.
[48] Xing J, Zeng Z, Zio E. A framework for dynamic risk assessment with condition monitoring data and inspection data. Reliab Eng Syst Saf 2019;191:106552.
[49] NORSOK. NORSOK Z-013 risk and emergency preparedness analysis. NORSOKI; 2010.
[50] Haugen S, Vinnem JE. Perspectives on risk and the unforeseen. Reliab Eng Syst Saf 2015. http://dx.doi.org/10.1016/j.ress.2014.12.009.
[51] Parhizkar T, Utne IB, Vinnem JE, Mosleh A. Supervised dynamic probabilistic risk assessment of complex systems, part 2: Application to risk-informed decision making, practice and results. Reliab Eng Syst Saf 2021;208(October 2020):107392. http://dx.doi.org/10.1016/j.ress.2020.107392.
[52] Izquierdo J, Márquez AC, Uribetxebarria J. Dynamic artificial neural network-based reliability considering operational context of assets. Reliab Eng Syst Saf 2019;188:483–93.
[53] Parra C, Crespo A. Ingeniería de mantenimiento y fiabilidad aplicada a la gestión de activos. INGECON; 2012.
[54] Erguido A, Márquez AC, Castellano E, Fernández JG. A dynamic opportunistic maintenance model to maximize energy-based availability while reducing the life cycle cost of wind farms. Renew Energy 2017;114:843–56.
[55] Stewart C. The common safety method for risk evaluation and assessment. IET; 2014.
[56] Crespo A, Sola A, Moreu P, Gómez J, de la Fuente A, Guillén A, González-Prida V. Criticality analysis for improving maintenance, felling and pruning cycles in power lines. IFAC-PapersOnLine 2018;51(11):211–6.
[57] ISO 13381:2015. International organization for standardization ISO 13381-1:2015 Condition monitoring and diagnostics of machines - Prognostics - Part 1: General guidelines 2015. 2015, p. 36.
[58] ISO 13379. Condition monitoring and diagnostics of machines — Data interpretation and diagnostics techniques which use information and data related to the condition of machines — General guidelines 2002. 2002, p. 30, (50).
[59] International Electrotechnical Commission, et al. IEC 60812: 2018–Failure modes and effects analysis (FMEA and FMECA). 3rd ed. Geneva, Switzerland: International Standard. IEC; 2018.
[60] Moubray J. Reliability-centered maintenance. Industrial Press Inc.; 2001.
[61] Jardine AK, Lin D, Banjevic D. A review on machinery diagnostics and prognostics implementing condition-based maintenance. Mech Syst Signal Process 2006;20(7):1483–510.
[62] Guillén AJ, González-Prida V, Gómez JF, Crespo A. Standards as reference to build a PHM-based solution. In: Proceedings of the 10th world congress on engineering asset management (WCEAM 2015). Springer; 2016, p. 207–14.
[63] Gómez JF, Fernández PM-G, Guillén AJ, Márquez AC. Risk-based criticality for network utilities asset management. IEEE Trans Netw Serv Manag 2019;16(2):755–68.
[64] Crespo Márquez A, de la Fuente Carmona A, Marcos JA, Navarro J. Designing CBM plans, based on predictive analytics and big data tools, for train wheel bearings. Comput Ind 2020;122. http://dx.doi.org/10.1016/j.compind.2020.103292.
