
Università degli Studi di Cassino e del Lazio Meridionale

Master Degree in Mechanical Engineering

Course on:

System and human reliability

(prof. Alessandro Silvestri)

Final report on:

FMECA ANALYSIS FOR


WIND TURBINE

1. INTRODUCTION

2. CHARACTERISTIC OF FMECA KNOWLEDGE STRUCTURE

3. FMECA ANALYSIS FOR WIND TURBINE

3.1 - BLUVBAND METHOD

3.2 - ZHAO METHOD

3.3 - 80/20 PARETO METHOD

3.4 - ALARP METHOD

3.5 - UNIVERSAL GENERATING FUNCTION (UGF) METHOD

3.5.1 - PRIORITIZATION

1. INTRODUCTION
In recent years, more and more countries have turned their attention to clean energy, and wind power
has developed rapidly. The worldwide installed capacity of wind turbine generators is growing at 20%
per year. This rapid development brings more new wind farms and new wind turbines into operation.
At new wind farms, newly hired maintenance personnel lack fault diagnosis experience and often
struggle to acquire diagnostic skills because of the complex structure of wind turbines, which can
make it difficult to complete fault diagnosis tasks in time. The investment in a wind farm is huge,
so high wind turbine availability is needed to recoup it as soon as possible. Finding reliable
sources of fault diagnosis knowledge is therefore an urgent task for new wind farms and new wind
turbines. System evaluation is important because it gives engineers a systematic view for
identifying problems and correcting them. Reliability is increasingly considered in system analysis
to reduce system failures. Many researchers have developed system reliability models to assess and
optimize system behavior and safety from the working probability and performance of the components.
The exact value of these probabilities is not easily accessible: estimating them with statistical
models requires a large amount of data, and the estimate is itself subject to uncertainty.
Subjective estimates based on the judgment of experts and engineers can therefore be useful when
sufficient data are lacking.
The failure rate of wind turbines over their lifetime follows a bathtub curve:

A failure mode is a condition in which an asset or a component cannot work properly. To rank
failure modes, each possible failure mode is rated on three parameters, severity (S), occurrence
likelihood (O), and detection difficulty (D), which are combined into a risk priority number (RPN).
The RPN is the product of the three parameters:

RPN=S*O*D
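
As an illustration of the ranking step, the following minimal Python sketch computes the RPN of a few failure modes and sorts them from highest to lowest. The failure mode names and the S, O, D scores are hypothetical values chosen only for the example; they are not taken from the case study.

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int      # S
    occurrence: int    # O
    detection: int     # D

    @property
    def rpn(self) -> int:
        # RPN = S * O * D
        return self.severity * self.occurrence * self.detection


# Hypothetical failure modes and scores, for illustration only.
modes = [
    FailureMode("gearbox bearing wear", 7, 4, 5),
    FailureMode("pitch sensor drift", 4, 6, 3),
    FailureMode("converter short circuit", 8, 2, 6),
]

# Rank from the highest RPN to the lowest.
ranked = sorted(modes, key=lambda m: m.rpn, reverse=True)
for m in ranked:
    print(f"{m.name}: RPN = {m.rpn}")
```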

2. CHARACTERISTIC OF FMECA KNOWLEDGE STRUCTURE
FMECA is a bottom-up approach: it uses a breakdown structure to examine the reliability of the
final item of the system starting from the failure causes. System definition and modelling are
usually used to understand how the system operates and to determine the failure modes with their
causes and effects. One component may have many failure modes, and these failure modes may have
different effects on components at different levels. The local effect is the impact of the failure
mode on the component itself or on other components at the same level. The next higher-level effect
is the impact of the failure mode on the components of the next higher level. The end effect is the
impact on the entire equipment. One failure mode may have many failure causes, and each failure
cause corresponds to a certain criticality. The investigation should start at the lowest taxonomic
level and continue up to the equipment unit level.

The failure mode ratio 𝛼𝑗 is the ratio of the number of occurrences of the j-th failure mode to the
number of occurrences of all possible failure modes of one component. It can be obtained from
statistics, experiments, prediction or other methods, and its value lies between 0 and 1. The
failure effect probability 𝛽𝑗 is the conditional probability that the j-th failure mode leads the
entire equipment to a given criticality. It can be estimated from expert experience, and its value
also lies between 0 and 1.
If 𝛽𝑗 is 1, the failure mode is certain to impact the entire equipment, whereas if 𝛽𝑗 is 0 the
failure mode has no impact on the entire equipment. The failure rate 𝜆𝑝 is the expected number of
failures of the component per unit of working time, so that 𝜆𝑝 𝑡 is the expected number of failures
over the working time t. The criticality 𝐶𝑚𝑗 of the j-th failure mode expresses the degree of risk
when the j-th failure mode occurs at a given severity level during the working time t, and it is
given by:

𝐶𝑚𝑗 = 𝛼𝑗 𝛽𝑗 𝜆𝑝 𝑡

The criticality 𝐶𝑟 of a component is the sum of the criticalities of all its failure modes:

$$C_r = \sum_{j=1}^{N} C_{mj} = \sum_{j=1}^{N} \alpha_j \beta_j \lambda_p t$$
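
The two criticality formulas can be checked numerically with a short sketch. The values of 𝛼𝑗, 𝛽𝑗, 𝜆𝑝 and t below are hypothetical and serve only to show the arithmetic; the function names are likewise illustrative.

```python
def mode_criticality(alpha: float, beta: float, lam_p: float, t: float) -> float:
    """Criticality of one failure mode: C_mj = alpha_j * beta_j * lambda_p * t."""
    return alpha * beta * lam_p * t


def component_criticality(modes, lam_p: float, t: float) -> float:
    """Criticality of a component: C_r = sum of C_mj over its failure modes."""
    return sum(mode_criticality(a, b, lam_p, t) for a, b in modes)


# Hypothetical (alpha_j, beta_j) pairs for three failure modes of one component.
modes_ab = [(0.5, 1.0), (0.3, 0.5), (0.2, 0.1)]
lam_p = 2e-5   # assumed failure rate, failures per hour
t = 8760.0     # assumed working time, hours (one year)

c_r = component_criticality(modes_ab, lam_p, t)
print(f"C_r = {c_r:.6f}")
```

Note that the 𝛼𝑗 values of one component sum to 1 in this example, as expected when every failure of the component falls into one of the listed modes.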
From a general point of view, a qualitative analysis of the system is carried out by combining a
hierarchical algorithm with a block diagram to provide a sketch of the system based on FMECA.

The characteristics of component faults differ from one another, but some common features can be
summarized as follows:

(1) Hierarchy. A piece of equipment is built from components at different levels, and each
component has its own function at its level. A component fault is defined as the loss of the
component's function, so faults also belong to different levels.

(2) Propagation. Faults propagate along certain directions. When a component at some level of the
equipment breaks down, the abnormal effect propagates vertically from lower levels to higher
levels, and it can also propagate horizontally among components at the same level.

(3) Radiation. Faults may propagate indirectly among components at the same level. For
example, if a bearing malfunctions, the vibration amplitudes of the other bearings on the same
shaft line may increase, whereas the vibration amplitude of the failed bearing itself may not
change noticeably.

(4) Correlation. One fault may generate many fault symptoms, and one fault symptom may
indicate many kinds of faults. The relation between faults and symptoms is not one-to-one but
more complicated.

In addition, two further parameters can be included to evaluate the risk level:
(1) Turbine functionality. This parameter gives the operational status of the turbine after the
failure.

• No impact. The turbine continues to work although the failure mode has occurred.
• No impact in the short term. Initially the turbine continues to work with full functionality, but
a maintenance action is needed.
• Reduced. Redundancy and auxiliary systems preserve the essential functionality of the turbine; the
turbine continues to provide electricity, but some operations are not available.
• Strongly reduced. Most operations are not available; the turbine continues to provide
electricity at low efficiency.
• Doesn't work. The turbine cannot produce electricity.

(2) Safety loss. This parameter indicates whether the failure mode could reduce the safety level,
with a consequent risk for the environment, the operator, or the turbine itself.

This solution is used to mitigate one of the RPN drawbacks pointed out in many papers, namely that
O, S and D are given the same relative importance. FMECA has been applied in various industries,
including automotive, aeronautical, military, nuclear and electro-technical engineering.

3. FMECA ANALYSIS FOR WIND TURBINE
The first phase of the FMECA analysis for a wind turbine focuses on identifying all the
failure modes and their respective causes for each of the electrical and electronic components
inside the turbine, so that all possible scenarios are considered in the risk assessment. This is
important because a neglected cause could produce an unstudied situation that poses a risk for
the environment, the operator and the system itself, with a consequent loss of availability and
safety.
A generic wind turbine is composed of several components, as shown in the following scheme:

The next step is the failure rate evaluation, because the failure rate is a useful parameter
linked to the failure probability and can be used to rank occurrence. The failure mode
probability α represents the fraction of failures in which the equipment fails in a given mode.
Thus, if λ is the failure rate of the component, the mode failure rate λ(M) is given by:

𝜆(𝑀) = 𝛼 ∗ 𝜆

The next table shows the criteria proposed to assess occurrence based on the mode failure rate.
As the table shows, occurrence is ranked on a scale from 1 (best case) to 10 (worst case); this
scale appears on standard FMECA forms.

To determine the mode failure rate intervals, data provided by the owner of the wind turbine under
test are used. In particular, the minimum and the maximum mode failure rates were used to set the
ranges for occurrence O = 1 and occurrence O = 10 respectively. The intermediate ranges are set so
that they all have the same length.
The consequences of each failure mode on the operation, function, or status of a system element
need to be identified, evaluated and recorded. Failure effects are classified as local and global
effects. The local effects describe the consequences of a failure mode on the operation, function,
or status of the specific item under consideration, while the global effects describe the
consequences on the operation, function, or status of the next higher level of the taxonomy.
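
The occurrence ranking based on equal-length intervals of the mode failure rate can be sketched as follows. This is only one possible reading of the criterion: it assumes that the whole range between the minimum and the maximum observed mode failure rate is split into ten intervals of equal width. The function names and the numerical rates are hypothetical.

```python
def mode_failure_rate(alpha: float, lam: float) -> float:
    """Mode failure rate: lambda(M) = alpha * lambda."""
    return alpha * lam


def occurrence_rank(mode_rate: float, rate_min: float, rate_max: float,
                    levels: int = 10) -> int:
    """Map a mode failure rate to an occurrence score from 1 to `levels`.

    Assumption: [rate_min, rate_max] is divided into `levels` intervals
    of equal width; values outside the range are clamped to 1 or `levels`.
    """
    if rate_max <= rate_min:
        raise ValueError("rate_max must be greater than rate_min")
    width = (rate_max - rate_min) / levels
    index = int((mode_rate - rate_min) / width)
    return min(max(index, 0), levels - 1) + 1


# Hypothetical rates, expressed in failures per 10^6 hours.
rate_min, rate_max = 0.5, 20.5
print(occurrence_rank(rate_min, rate_min, rate_max))   # lowest rate
print(occurrence_rank(rate_max, rate_min, rate_max))   # highest rate
print(occurrence_rank(mode_failure_rate(0.35, 20.0), rate_min, rate_max))
```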

Given the failure modes, the proposed procedure for the wind turbine is as follows:
• Determine the probability of occurrence O of each failure mode given that the wind turbine
has failed, based on historical data.
• Determine the probability of not detecting the failure, 𝑃𝑁𝐷.
• Calculate the cost consequence of the failure, 𝐶𝐹.
• Evaluate the risk of each failure mode, called the Cost Priority Number (CPN), by multiplying
the probabilities and the cost calculated in the previous steps.

𝐶𝑃𝑁(𝑖) = 𝑂(𝑖) ∗ 𝑃𝑁𝐷 (𝑖) ∗ 𝐶𝐹 (𝑖)

𝑃𝑁𝐷 is calculated by dividing the number of actual failures, 𝑁𝐹(𝑖), by the total number of failure
vulnerabilities, 𝑁𝐹𝑉(𝑖):

$$P_{ND}(i) = \frac{N_F(i)}{N_{FV}(i)}$$

The number of failure vulnerabilities is defined as the sum of the number of actual failures and
the number of possible failures detected before they occurred, over any given period of time. These
potential failures may be detected during online monitoring, inspection, or maintenance.
The cost of failure, 𝐶𝐹, is incurred because of the severity of the failure consequence. Here, only
the consequences that affect the wind turbine itself are considered. It should be noted, however,
that failures may have other consequences that endanger the safety of the site crew and neighboring
residents; these are specific to a given site and are not included in this study. The cost of
failure is therefore defined as:
𝐶𝐹 (𝑖) = 𝐶𝑃 (𝑖) + 𝐶𝑆 (𝑖) + 𝐶0 (𝑖) + 𝐶𝐿 (𝑖)

• 𝐶𝑃 is the cost of the parts that need to be replaced because of the failure;
• 𝐶𝑆 is the cost of service, which includes all costs of the facilities and devices required
because of the failure, such as crane rental or transportation;
• 𝐶0 is the opportunity cost, that is, the revenue the wind farm owner would have received from
selling the generated power had the failure not occurred. It can be expressed as:

$$C_0(i) = D_F(i) * \overline{WP_{OUT}} * \overline{EPR}$$

where 𝐷𝐹 is the duration of the failure, $\overline{WP_{OUT}}$ is the average output wind power of
the turbine and $\overline{EPR}$ is the average energy purchase rate within this duration;
• 𝐶𝐿 is the total cost of the extra labor required for the repair, and can be expressed as:

$$C_L(i) = D_F(i) * N_C * MHR$$


In the above equation, 𝑁𝐶 and MHR are the number of repair crew members and the man-hour rate,
respectively.
Since CPN is a cost-based risk factor, it can easily be used to calculate the total failure cost
of the system for any specific duration of interest (𝐷𝑖𝑛𝑡). The total failure cost is given by:

$$TFC = \sum_{i=1}^{m} N_{FV}(i, D_{int}) * CPN(i)$$

where m is the total number of failure modes, and 𝑁𝐹𝑉(𝑖, 𝐷𝑖𝑛𝑡) is the number of failure
vulnerabilities of failure mode i over the duration of interest. In this report, the total failure
cost over a duration of one year is called the annual failure cost (AFC).
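
The chain of formulas from 𝑃𝑁𝐷 to the total failure cost can be sketched in a few lines of Python. The record fields, the function names and all the numbers are hypothetical; the failure duration is assumed to be expressed in hours so that it can be multiplied by a power in kW, a price per kWh and a man-hour rate.

```python
from dataclasses import dataclass


@dataclass
class ModeRecord:
    occurrence: float    # O(i), probability of the mode given a turbine failure
    n_failures: int      # N_F(i), actual failures
    n_detected: int      # possible failures detected before they occurred
    cost_parts: float    # C_P(i)
    cost_service: float  # C_S(i)
    duration_h: float    # D_F(i), assumed to be in hours

    @property
    def n_vulnerabilities(self) -> int:
        # N_FV(i) = actual failures + failures detected beforehand
        return self.n_failures + self.n_detected


def p_not_detected(rec: ModeRecord) -> float:
    return rec.n_failures / rec.n_vulnerabilities


def cost_of_failure(rec, wp_out_kw, epr_per_kwh, n_crew, mhr) -> float:
    c_o = rec.duration_h * wp_out_kw * epr_per_kwh   # opportunity cost
    c_l = rec.duration_h * n_crew * mhr              # extra labor cost
    return rec.cost_parts + rec.cost_service + c_o + c_l


def cpn(rec, **site) -> float:
    return rec.occurrence * p_not_detected(rec) * cost_of_failure(rec, **site)


def total_failure_cost(records, **site) -> float:
    return sum(r.n_vulnerabilities * cpn(r, **site) for r in records)


# Hypothetical site parameters and a single failure mode record.
site = dict(wp_out_kw=400.0, epr_per_kwh=0.05, n_crew=2, mhr=50.0)
rec = ModeRecord(0.25, 2, 6, 1000.0, 500.0, 10.0)
print(cpn(rec, **site), total_failure_cost([rec], **site))
```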

Other methods can be used to evaluate the risk threshold for FMECA analysis. The international
standard IEC 60812 (2018), which defines and standardizes the FMECA procedure, does not provide a
method to evaluate the RPN threshold, and neither does the recent literature. Companies usually
define this threshold through questionnaires that collect the qualitative judgement of multiple
experts.

3.1 BLUVBAND METHOD


Bluvband and Grabov recommend a simple but effective graphical tool for RPN analysis. This tool
creates a graph of ordered RPN values, much like the Scree Plot used in principal component
analysis. The Scree Plot requires the RPN values to be sorted by size first, from smallest to
largest; the values are then plotted in that order. The calculated RPNs usually form a
right-skewed distribution, with a first tail on the left (negligible risk values) and a second
tail on the right (critical risk values, which are “outliers” from the point of view of
distribution analysis). The long lower part of the plot is characterized by a gradual increase of
the RPN values, usually along a straight line f1(x) with a slight slope. The RPN values scattered
around this line can be regarded as a kind of “information noise”, since they do not require
immediate attention. The short uppermost part of the Scree Plot is characterized by a very steep
increase of the RPN values (RPN jumps), along a straight line f2(x) with a very steep slope. The
RPN values scattered around this line correspond to the most critical issues of the FMECA and must
be dealt with promptly. The Risk Priority Numbers evaluated in the previous section were subjected
to the Bluvband method to determine the most hazardous failures. The MATLAB “Curve Fitting Tool”
can be used to perform the linear regression that gives the algebraic description of the straight
lines f1(x) and f2(x). The coefficients in the following equations are evaluated at a 95%
confidence level:

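$$f_1(x) = p_1 * x + p_2 \quad \text{where} \quad \begin{cases} p_1 = 1.101 \\ p_2 = -0.748 \end{cases}$$

$$f_2(x) = p_1' * x + p_2' \quad \text{where} \quad \begin{cases} p_1' = 7.391 \\ p_2' = -572.3 \end{cases}$$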

Note that the slopes of the two straight lines f1(x) and f2(x) differ considerably. In particular,
the slope of the line that fits the uppermost part of the plot is almost seven times that of the
other line:

$$\Delta_{slope} = \frac{p_1'}{p_1} = \frac{7.391}{1.101} = 6.7130$$
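
The two-line description of the Scree Plot can be reproduced outside MATLAB with a short NumPy sketch. The split point between the two segments is not prescribed by the method, so the sketch makes an explicit assumption: it tries every admissible split and keeps the one that minimizes the total squared error of the two least-squares lines. The function name and the RPN data are hypothetical.

```python
import numpy as np


def two_line_fit(rpn_values):
    """Fit two straight lines to the ordered RPN values (Scree Plot).

    Assumption: the split index k is chosen by exhaustive search so that
    the summed squared error of the two linear fits is minimal.
    Returns (k, (slope1, intercept1), (slope2, intercept2)).
    """
    y = np.sort(np.asarray(rpn_values, dtype=float))
    if len(y) < 4:
        raise ValueError("at least four RPN values are needed")
    x = np.arange(1, len(y) + 1)
    best = None
    for k in range(2, len(y) - 1):          # at least two points per segment
        low = np.polyfit(x[:k], y[:k], 1)
        high = np.polyfit(x[k:], y[k:], 1)
        sse = (np.sum((np.polyval(low, x[:k]) - y[:k]) ** 2)
               + np.sum((np.polyval(high, x[k:]) - y[k:]) ** 2))
        if best is None or sse < best[0]:
            best = (sse, k, low, high)
    _, k, low, high = best
    return k, tuple(low), tuple(high)


# Hypothetical RPN set: a long flat part followed by a short steep part.
rpn = list(range(1, 21)) + [40, 50, 60, 70, 80]
k, f1, f2 = two_line_fit(rpn)
threshold = sorted(rpn)[k]                  # first RPN on the steep line
print(k, f1[0], f2[0], f2[0] / f1[0], threshold)
```

With this choice the threshold is simply the first RPN value assigned to the steep segment; a designer may prefer a different splitting rule.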

3.2 ZHAO METHOD

Zhao proposes an alternative method to evaluate the RPN threshold value, as follows:
• Create the Scree Plot, following the rules explained for the Bluvband method.
• Fix the turning point of the linear growth trend of the RPN plot using linear regression:
fit the RPN values to a straight line and obtain the turning point from the confidence
interval.
• Determine the RPN threshold value from the turning point.

The results of the procedure applied to the E/E/PE components of the WT under test, considering a
95% confidence level, are illustrated in the Scree Plot. The RPN threshold provided by the Zhao
procedure using linear regression with a 95% confidence level is approximately 140.
The first-degree polynomial fitting curve is the following:
$$f_{Zhao}(x) = p_1^z * x + p_2^z \quad \text{where} \quad \begin{cases} p_1^z = 1.262 \\ p_2^z = -6.024 \end{cases}$$
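
A rough sketch of the turning-point idea is given below. The exact construction of the confidence interval is not detailed here, so the sketch relies on a simplifying assumption: the upper bound is the fitted line plus 1.96 times the residual standard deviation, and the turning point is the start of the run of ordered RPN values at the top of the plot that lie above this bound. The function name and the data are hypothetical.

```python
import numpy as np


def zhao_threshold(rpn_values, z=1.96):
    """Return (threshold, number of critical modes), or (None, 0).

    Assumption: upper bound = linear fit + z * residual standard deviation
    (a simplified 95% band, not necessarily the band used by Zhao).
    """
    y = np.sort(np.asarray(rpn_values, dtype=float))
    n = len(y)
    x = np.arange(1, n + 1)
    slope, intercept = np.polyfit(x, y, 1)
    fit = slope * x + intercept
    s = np.sqrt(np.sum((y - fit) ** 2) / (n - 2))
    upper = fit + z * s
    i = n
    # Walk down from the largest RPN while values stay above the bound.
    while i > 0 and y[i - 1] > upper[i - 1]:
        i -= 1
    if i == n:
        return None, 0
    return float(y[i]), n - i


# Hypothetical RPN set with a single jump at the top.
rpn = list(range(1, 20)) + [100]
threshold, n_critical = zhao_threshold(rpn)
print(threshold, n_critical)
```

Because only values beyond the band are flagged, this kind of rule tends to select very few failure modes.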

3.3 80/20 PARETO METHOD

The 80:20 Pareto principle is the most established approach in reliability analysis to rank
failure modes according to their RPN value and to optimize corrective actions for critical
components. The Pareto diagram helps to visualize the differences between the rankings of the
failures and effects. The 80:20 principle can be stated as follows: 80% of the total of the Risk
Priority Numbers calculated during the FMECA procedure comes from only 20% of the potential
failure modes. Pareto analysis starts with the prioritization of failure modes by ranking them
from the highest risk priority number to the lowest.
The Pareto chart combines a bar graph with a cumulative line graph; the bars are placed from left
to right in descending order, while the cumulative line shows the percent contribution of all
preceding failures. The combined chart uses the 80:20 rule to indicate where the engineering
effort should be focused.
Each blue bar stands for the RPN of the corresponding failure mode (y-scale on the left
side of the chart), while the red curve represents the cumulative percentage distribution of the
RPN (y-scale on the right side of the chart).
According to the 80:20 rule, the RPN threshold provided by the Pareto chart is approximately 48.
The first step is to identify the 80% point of the cumulative distribution of the Risk Priority
Numbers; the RPN threshold is then the RPN of the failure mode at which the cumulative percentage
reaches 80%.
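
The threshold selection on the Pareto chart can be sketched as follows. The function name and the RPN values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def pareto_threshold(rpn_values, share=0.80):
    """Return (threshold RPN, number of modes) at the cumulative `share`.

    The RPNs are sorted in descending order and accumulated until the
    running sum reaches `share` of the total.
    """
    ordered = sorted(rpn_values, reverse=True)
    if not ordered:
        raise ValueError("no RPN values given")
    total = sum(ordered)
    running = 0.0
    for count, rpn in enumerate(ordered, start=1):
        running += rpn
        if running >= share * total:
            return rpn, count
    return ordered[-1], len(ordered)


# Hypothetical RPN set: the total is 200, so the 80% point is 160.
rpn = [100, 50, 30, 10, 5, 5]
threshold, n_modes = pareto_threshold(rpn)
print(threshold, n_modes, n_modes / len(rpn))
```

In this toy data set three of the six modes are needed to reach 80% of the total, which already shows that the share of failure modes selected by the rule depends on how the RPNs are distributed.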

3.4 ALARP METHOD

The three procedures analyzed above give quite different results. The Zhao technique places only
four failure modes in the group of the most critical failure modes (threshold equal to 140),
whereas the Bluvband approach places 11 failure modes in this group (threshold equal to 100), and
the Pareto chart indicates that 55 failure modes are critical (threshold equal to 48). A detailed
analysis of the results shows that all these techniques have critical drawbacks. For instance,
according to the 80:20 rule of the Pareto method, 80% of the criticality should arise from 20% of
the causes.
The results of the study suggest that this principle does not fit this kind of application well.
In fact, 80% of the cumulative RPN of the E/E/PE components in the wind turbine comes from 55% of
the failure modes. The Pareto chart cannot be considered a powerful technique to identify the RPN
threshold of a system; the principle used to select the numerical value of the threshold should
be reviewed and specifically defined for each kind of application. In this case, it is not
reasonable to select a threshold of 48, which would indicate that more than half of the failure
modes are critical.

By contrast, the Zhao method suggests that only four failure modes of the system under test are
critical.
More generally, this technique provides untrustworthy results for many applications because of the
way in which the threshold is evaluated: with this procedure, very few risk priority numbers
exceed the 95% confidence bound and fall into the critical modes group.
The Bluvband method provides interesting results: both the threshold value and the number of modes
considered critical are reasonable. However, the procedure for the threshold evaluation is vague
and extremely subjective. According to the authors, the calculated RPNs form a right-skewed
distribution, with a first tail on the left and a second tail on the right with very different
slopes, but no information is given on how to divide the distribution into two sections. As a
consequence, the identification of the threshold depends on the judgment of the designer who
carries out the procedure.

Therefore, a new approach has been introduced to overcome the limits of the previous
methodologies. The proposed procedure consists of the following steps:

1) Calculation of the Risk Priority Numbers according to the guidelines provided in section II;

2) Identification of the main statistical parameters of the RPN set (25th percentile, mean value,
median value, 75th percentile, outliers, minimum and maximum value);

3) Generation of the boxplot of all the assessed RPNs;

4) The negligible modes are all the failure modes with RPNs below the median value;

5) The critical modes are all the failure modes with RPNs above the 75th percentile;

6) The interval between the median value and the 75th percentile is considered the ALARP (“as low
as reasonably practicable”) region.

As the acronym suggests, the ALARP region refers to reducing risk to a level that is as low as
reasonably practicable.
In practice, this means that the operator has to show through reasoned and supported arguments
that there are no other practicable options that could reasonably be adopted to reduce risks
further. If the RPN of a failure mode falls inside the ALARP zone, designers have to analyze
possible countermeasures to reduce the risk, weighing the benefits of accepting the risk against
the costs of any further reduction, and can then choose whether or not to apply countermeasures.
RPN values equal to the upper or lower limit of the ALARP region are also treated as ALARP.
If instead the RPN is above the 75th percentile, the risk is regarded as intolerable and cannot be
justified in any ordinary circumstance, so corrective actions must be implemented. The proposed
approach was applied to the case study described in the previous sections, and the results of the
statistical analysis are the following:

• Range of admissible values [1;300]


• Minimum: 8
• Maximum: 180
• 25th percentile: 24
• Median: 54
• 75th percentile: 87

• Outliers: none (outliers being all the RPNs more than three standard deviations away from the
median)

The green zone (below the median) stands for the negligible failures, the yellow region represents
the ALARP region and the red region (above the 75th percentile) indicates the critical failure
modes. In particular, the proposed method places 25 failure modes in the critical group (RPN
higher than 87), 27 failure modes in the ALARP region and 48 in the negligible group.
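
The boxplot-based classification can be sketched with NumPy percentiles. The function name is hypothetical, and the five input values are simply the summary statistics listed above reused as a toy data set; they are not the full set of RPNs of the case study. Values equal to the median or to the 75th percentile are assigned to the ALARP region.

```python
import numpy as np


def classify_rpn(rpn_values):
    """Split RPNs into negligible / ALARP / critical using median and Q3."""
    rpn = np.asarray(rpn_values, dtype=float)
    median = float(np.percentile(rpn, 50))
    q3 = float(np.percentile(rpn, 75))
    labels = np.where(rpn > q3, "critical",
                      np.where(rpn < median, "negligible", "ALARP"))
    return median, q3, labels.tolist()


# Toy input: the summary statistics of the case study used as five data points.
rpn = [8, 24, 54, 87, 180]
median, q3, labels = classify_rpn(rpn)
print(median, q3)
for value, label in zip(rpn, labels):
    print(value, label)
```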

The threshold that identifies the critical modes in the proposed approach falls between those of
the Bluvband and Pareto methods, as does the number of critical modes. Considering only the red
zone, the boxplot method is more conservative than the one proposed by Bluvband. Designers must
always choose the best solution in terms of cost and risk level. It is generally advisable to
select the worst-case scenario, that is, the procedure providing the lowest RPN threshold and
hence the largest number of failure modes in the critical area. In this application, the
worst-case scenario is the 80:20 rule applied in the Pareto chart, but it gives unreasonable
results in terms of the cost of the corrective actions: it is not feasible to apply
countermeasures to 55% of the failure modes. The optimal trade-off between cost and threshold
level is therefore provided by the proposed method. Moreover, the new technique also introduces an
ALARP zone in which each mode can be considered critical or negligible, depending on the scenario.

3.5 UNIVERSAL GENERATING FUNCTION (UGF) METHOD
In this section, the FMECA information on the failure modes is aggregated across the system using
the Universal Generating Function (UGF) approach. This model can hold the data of failure modes,
components, and subsystems, and the interactions among them, in a large-scale system. The approach
also helps decision makers to prioritize each improvement action according to its effect on the
whole system, and it propagates the uncertainties across the whole system.
In UGF, the probability distribution of a variable (Xi) is represented in discrete form by a
u-function, in which pij is the probability that the variable is in state j, xij is the
corresponding value, and ki is the number of different states of the variable. The u-function
expresses a probability distribution in polynomial form.
$$u_{X_i}(z) = \sum_{j=1}^{k_i} p_{ij} * z^{x_{ij}}$$
Each failure mode is treated as a binary phenomenon with two possible reliability levels, where
the occurrence rate is the probability that the failure mode occurs with a specified degree of
severity. Given this property, the probability distribution of each failure mode can be
represented by the u-function
$$u_{F_{ih}}(z) = \sum_{j=0}^{1} Or_{ij} * z^{S_{ij}}$$
where 𝑢𝐹𝑖ℎ(𝑧) is the u-function of failure mode number i in subsystem h, and 𝑂𝑟𝑖𝑗 and 𝑆𝑖𝑗 are
respectively the occurrence rate and the severity of the failure mode in state j. There are just
two states: 0 stands for failure and 1 stands for the working state (no failure), so
𝑂𝑟𝑖1 = 1 − 𝑂𝑟𝑖0. The severity of the state with no failure (𝑆𝑖1) is zero. Consequently, the
u-function of failure mode number 1 in subsystem 1, 𝑢𝐹11(𝑧), is 0.443*z^4 + 0.557*z^0.
This transformation makes it possible to use the properties of polynomial expressions to find the
probability distribution of a function of variables. It also makes it easier to model a system
with multi-level components.
Based on the u-functions of the failure modes, the behavior distribution of the higher levels,
such as components, subsystems, and the system, can be obtained. To this end, all possible
combinations must be considered to generate the u-function of the higher-level item via the ⊗𝑓
operator. For example, the u-function of each subsystem can be computed as:

$$u_{SS_h}(z) = \otimes_{max}\left(u_{F_{1h}}(z), u_{F_{2h}}(z), \ldots, u_{F_{n_h h}}(z)\right)$$

where 𝑢𝑆𝑆ℎ(𝑧) is the u-function of subsystem h, nh is the number of failures in the subsystem, and
⊗𝑚𝑎𝑥 denotes that the maximum severity of the failures that occur in each combination is taken. A
MATLAB program was developed to obtain the probability distribution of the severity levels of the
subsystems. The results are shown below:

u-function of failure mode number 1 in subsystem 1:

$$u_{F_{11}}(z) = 0.443 * z^4 + 0.557$$

u-function of subsystem 1:

$$u_{SS_1}(z) = 0.663 * z^4 + 0.0525 * z^3 + 0.2844 * z^0$$
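
The ⊗𝑚𝑎𝑥 composition can be sketched in Python by storing each u-function as a dictionary that maps a severity level to its probability. The sketch assumes that the failure modes are statistically independent, so that the probabilities of a combination multiply. The first failure mode uses the values of 𝑢𝐹11(𝑧) given above; the second failure mode is hypothetical, so the result is not the 𝑢𝑆𝑆1(𝑧) of the case study.

```python
from collections import defaultdict
from functools import reduce


def failure_mode_u(occurrence: float, severity: int) -> dict:
    """u-function of a binary failure mode as {severity: probability}."""
    if severity == 0:
        return {0: 1.0}
    return {severity: occurrence, 0: 1.0 - occurrence}


def compose_max(u_a: dict, u_b: dict) -> dict:
    """Combine two u-functions, keeping the maximum severity of each pair.

    Assumption: the two items are independent, so probabilities multiply.
    """
    out = defaultdict(float)
    for sev_a, p_a in u_a.items():
        for sev_b, p_b in u_b.items():
            out[max(sev_a, sev_b)] += p_a * p_b
    return dict(out)


def subsystem_u(mode_u_functions) -> dict:
    return reduce(compose_max, mode_u_functions)


u_f11 = failure_mode_u(0.443, 4)   # from the text: 0.443*z^4 + 0.557*z^0
u_f21 = failure_mode_u(0.2, 3)     # hypothetical second failure mode
u_ss = subsystem_u([u_f11, u_f21])
print(sorted(u_ss.items(), reverse=True))
```

The same `compose_max` function can be applied again to the subsystem dictionaries to obtain the u-function of a series system.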

Similarly, the u-function of the system is obtained from the u-functions of the subsystems by
considering all possible combinations of the severity levels of the subsystems. Since the
subsystems in the example are connected in series, the u-function of the whole system is obtained
as:

$$U(z) = \otimes_{max}\left(u_{SS_1}(z), u_{SS_2}(z), \ldots, u_{SS_n}(z)\right)$$

where U(z) is the u-function of the system and n is the number of subsystems. The severity levels
of the system, calculated with MATLAB, are shown in the next table:

Severity level 0 1 2 3 4
Probability 0.0019 0.0043 0.0004 0.0161 0.9773

According to the previous table, the u-function of the system is:

$$U(z) = 0.9773 * z^4 + 0.0161 * z^3 + 0.0004 * z^2 + 0.0043 * z^1 + 0.0019 * z^0$$

The expected value of a u-function can be extracted as its first derivative evaluated at z = 1.
Differentiating the u-function of the system therefore gives the expected value of the system in
terms of failure, denoted OFI:

$$OFI = \left.\frac{d}{dz} U(z)\right|_{z=1}$$

The OFI computed for the example is 3.9627. The u-function represents how the system tends to
fail, and the OFI provides a scale to measure this failure behavior.
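
Since the derivative of p*z^s at z = 1 is p*s, the OFI is the probability-weighted sum of the severity levels. The short check below applies this to the tabulated system probabilities; with these rounded coefficients the sum is about 3.9626, which agrees with the reported 3.9627 up to the rounding of the table. The function name is illustrative.

```python
def expected_severity(u: dict) -> float:
    """First derivative of U(z) = sum(p * z**s) at z = 1, i.e. sum(p * s)."""
    return sum(p * s for s, p in u.items())


# System u-function from the table: {severity level: probability}.
system_u = {4: 0.9773, 3: 0.0161, 2: 0.0004, 1: 0.0043, 0: 0.0019}

ofi = expected_severity(system_u)
print(f"OFI = {ofi:.4f}")
```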

3.5.1 PRIORITIZATION

The components covered by the FMECA procedure usually differ widely in terms of risk. The most
important failure modes, characterized by a high RPN, should be separated from those characterized
by a significantly lower RPN. The selection of “high priority” failure modes is a critical issue
in the development of corrective action plans.
Corrective actions are usually applied to decrease the occurrence rate or to increase the
detectability of the causes. A corrective action therefore controls the related cause in terms of
occurrence or detection, which mitigates the risk of the corresponding failure mode and, in turn,
affects the items at the higher levels: component, subsystem, and system. As a result, the OFI is
reduced when the corrective action is implemented. This reduction can be used as a criterion to
prioritize the corrective actions:

∆𝑂𝐹𝐼𝑎 = 𝑂𝐹𝐼𝑐𝑢𝑟𝑟𝑒𝑛𝑡 − 𝑂𝐹𝐼𝑎

where 𝑂𝐹𝐼𝑎 is the estimated OFI once corrective action a is applied to the corresponding failure
cause, and ∆𝑂𝐹𝐼𝑎 measures the reduction achieved by action a with respect to the current situation
of the system, 𝑂𝐹𝐼𝑐𝑢𝑟𝑟𝑒𝑛𝑡. The higher the ∆𝑂𝐹𝐼𝑎, the higher the priority of the corrective
action. In this way, the corrective actions that most improve the system behavior can be
identified.

Because the budget is limited, corrective actions cannot be applied to all failure causes. A
balance must therefore be struck between the costs of the actions and the improvement they bring.
To this end, an algorithm has been developed to select the optimal corrective actions using an
optimization model. The algorithm aims to maximize the reduction in OFI achieved by the corrective
actions subject to the budget constraint. The linear mathematical formulation of the optimization
model is:

$$\max \sum_{a=1}^{N} \Delta OFI_a * x_a \quad \text{subject to} \quad \sum_{a=1}^{N} C_a * x_a \le B$$

where 𝑥𝑎 is a Boolean variable that is 1 if action a is applied and 0 otherwise, N is the number
of failure causes, equal to the sum of 𝑛ℎ over all subsystems h, 𝐶𝑎 is the cost associated with
action a, and B is the available budget.
In the above model, the effect of several corrective actions on OFI is assumed to be linear.
However, the relationship between the OFI reduction and the number of applied actions is
nonlinear. The optimization model is therefore changed to one with a nonlinear objective function
and, as before, a linear constraint:

$$\max \Delta OFI(X) \quad \text{subject to} \quad \sum_{a=1}^{N} C_a * x_a \le B$$

where X is a vector of Boolean variables, X = (𝑥1, 𝑥2, …, 𝑥𝑁). Solving this model indicates how
many actions, and which ones, are optimal to apply.
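
For a small number of candidate actions, the budget-constrained selection can be sketched by exhaustive enumeration of the Boolean vectors X. This is not the solution algorithm of the study, only an illustration: the objective is passed as a function, so it can be the linear sum of the individual ∆𝑂𝐹𝐼𝑎 or any nonlinear evaluation of ∆𝑂𝐹𝐼(𝑋). The function name, gains, costs and budget are hypothetical, and the search grows as 2^N, so it is practical only for small N.

```python
from itertools import product


def best_actions(costs, budget, delta_ofi):
    """Enumerate all Boolean vectors X and keep the best feasible one.

    costs      : cost C_a of each corrective action
    budget     : available budget B
    delta_ofi  : function mapping a tuple X of 0/1 values to the OFI reduction
    """
    best_x, best_gain = None, float("-inf")
    for x in product((0, 1), repeat=len(costs)):
        total_cost = sum(c * x_a for c, x_a in zip(costs, x))
        if total_cost > budget:
            continue                      # violates the budget constraint
        gain = delta_ofi(x)
        if gain > best_gain:
            best_x, best_gain = x, gain
    return best_x, best_gain


# Hypothetical data: three actions, linear objective.
gains = [0.30, 0.25, 0.10]
costs = [5.0, 4.0, 2.0]
budget = 6.0


def linear_delta_ofi(x):
    return sum(g * x_a for g, x_a in zip(gains, x))


x_opt, gain_opt = best_actions(costs, budget, linear_delta_ofi)
print(x_opt, gain_opt)
```

In this example the single most effective action is not selected, because the two cheaper actions together give a larger reduction within the same budget.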
