
Failure Code Hierarchy > Is it the Best Design?

Published on July 3, 2018

John Reeve
Author, CRL, CMMS Champion

Validated data is essential to asset management failure analysis. Technicians might
provide a narrative as to what they discovered, repaired, or replaced, but actionable
failure data is still required -- inside the EAM.

The “problem” with the Problem-Cause Hierarchy

1. Many administrators, when confronted with the task of building out a
problem-cause-remedy (P-C-R) codeset, never complete it because of its enormity. A lot
depends on how granular you make the Level-1 failure class. In addition, the larger the
hierarchy, the more there will be to maintain going forward. Conversely, without failure
coding, you simply have a work order ticket system.

2. The terminology used in the EAM model (P-C-R design) does not exactly line up with
the RCM standard (SAE JA1011), which defines “failure mode”. As an example, the EAM
usually refers to the "problem code" as the asset problem, whereas the RCM standard
requires the component problem.

3. Be aware that if the component is placed inside the problem-cause hierarchy, the size
of the hierarchy grows multiplicatively, because every component added under a failure
class carries its own set of problem and cause rows. Question: If that is the case,
where should the failed component best be stored?

4. The P-C-R hierarchical design itself often hamstrings the user communities and
introduces confusion, resulting in mixed values within the levels. This is evident when I
review organizations' CMMS applications, as well as in the "recommended models" found
on the internet.

5. Quite often you will hear that stakeholders are not happy with their failure data, or
their analytical reports. They may blame the software when they really should be
re-emphasizing the surrounding process and procedure. There is nothing stopping the
Reliability Team from designing their own failure analytic to start managing by exception
and drilling down on failure modes.
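The growth described in item 3 can be made concrete with a little arithmetic. The sketch below counts the rows in a hierarchy with and without a component level. Every count in it is a hypothetical assumption chosen for illustration, not a figure from any real EAM.

```python
# Illustrative assumptions only: none of these counts come from a real EAM.
FAILURE_CLASSES = 50        # Level-1 failure classes
COMPONENTS_PER_CLASS = 10   # failed components placed under each class
PROBLEMS_PER_PARENT = 6     # problem codes under each parent node
CAUSES_PER_PROBLEM = 4      # cause codes under each problem
REMEDIES_PER_CAUSE = 8      # remedy codes repeated under each cause


def hierarchy_rows(level_sizes):
    """Count every node in a tree whose levels have the given fan-out."""
    total = 0
    nodes_at_level = 1
    for fan_out in level_sizes:
        nodes_at_level *= fan_out   # nodes on this level
        total += nodes_at_level     # running total across levels
    return total


# Class > problem > cause > remedy
pcr_only = hierarchy_rows(
    [FAILURE_CLASSES, PROBLEMS_PER_PARENT, CAUSES_PER_PROBLEM, REMEDIES_PER_CAUSE]
)

# Class > component > problem > cause > remedy
with_component = hierarchy_rows(
    [FAILURE_CLASSES, COMPONENTS_PER_CLASS, PROBLEMS_PER_PARENT,
     CAUSES_PER_PROBLEM, REMEDIES_PER_CAUSE]
)

print(f"P-C-R only:           {pcr_only:>7} rows")
print(f"With component level: {with_component:>7} rows")
```

With these made-up numbers, inserting a level of 10 components per class multiplies the row count roughly tenfold, which is the maintenance burden the question in item 3 is pointing at.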

Solutions to the Problem

1. When speaking of failure mode, it is important to capture all three elements as
actionable data. After all, maintenance is performed at the component level. A failure
mode is therefore described as the failed component, component problem, and cause code.
This definition is also described in a book by Douglas Plucknette called RCM Blitz.

2. Before we can capture the component problem, we need to capture the failed
component. The Failed Component should therefore be placed on the work order screen.

3. Since the failed components could be numerous, it would be a good idea to add a
second field titled "Component Suggested Add" which, upon record save, is automatically
routed to the reliability leader for review/approval.

4. Whatever failure data structure you use should be flexible, because the codeset will
never be perfect. It should be managed as a “living entity”, with an easy way for the
technician to submit (or recommend) a new value for any level of the failure code
structure. Question: So, in terms of flexibility, is the failure code hierarchy ideal?

5. Careful thought is also needed in the cause coding. Without proper capture of the
cause code, the failure event could repeat. The trick is to design a cause code hierarchy
that is easy for the technician, supervisor, and reliability leader to select from. But to
be clear, we are seeking a "middle ground" design, meaning we want some indication as to
basic cause but not a formal root cause analysis (RCA).

6. There are really just 8-10 standard remedy codes, which means these can be applied
to every possible failure class. A standalone field is therefore acceptable, as opposed
to repeating this codeset hundreds of times at the bottom of the hierarchy.

7. Failure coding should be designed to support failure analytics. The Reliability Team
should help design this output so that they can properly leverage the EAM knowledge
base using a Pareto-style failure analytic. Senior management does not have time to be
rifling through narrative text.

8. It is the author's opinion that the failure mode is best captured as three standalone
fields. You would still capture the asset problem code -- making it a total of four fields.
I know this might be shocking to some who have used the out-of-the-box (o.o.b.) failure
hierarchy for years or decades, but ask yourself whether the stakeholders are happy with
the failure reporting. With individual fields you have greater flexibility -- and even
more mathematical combinations.

In the end, you do not want to create a monster in terms of size. Nor do you require a
formal failure analysis by the maintenance technician. But we do want the technician's
opinion as to (a) aging, (b) wear-and-tear, (c) force majeure, (d) power failure/surge, or
(e) something-else (which is usually human-factor). If the cause code of “something-else”
is chosen, then the Supervisor or Reliability Leader would provide additional
categorization using a cause code hierarchy.
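The two-step cause capture described above could be sketched as follows. The five basic causes come from the text. The second-level categories, the class name, and the function names are hypothetical placeholders, since the article does not list them.

```python
from dataclasses import dataclass
from typing import Optional

# The five basic causes the technician chooses from (taken from the text).
BASIC_CAUSES = {
    "AGING",
    "WEAR_AND_TEAR",
    "FORCE_MAJEURE",
    "POWER_FAILURE_SURGE",
    "SOMETHING_ELSE",
}

# Hypothetical second-level categories for "something-else"; assumed values.
DETAILED_CAUSES = {
    "OPERATOR_ERROR",
    "INSTALLATION_ERROR",
    "MAINTENANCE_ERROR",
    "DESIGN_DEFICIENCY",
}


@dataclass
class CauseEntry:
    basic_cause: str
    detailed_cause: Optional[str] = None

    def needs_review(self) -> bool:
        """True while a "something-else" entry awaits categorization."""
        return self.basic_cause == "SOMETHING_ELSE" and self.detailed_cause is None


def record_basic_cause(code: str) -> CauseEntry:
    """Step 1: the technician picks one validated basic cause."""
    if code not in BASIC_CAUSES:
        raise ValueError(f"Unknown basic cause: {code}")
    return CauseEntry(basic_cause=code)


def categorize(entry: CauseEntry, detailed: str) -> CauseEntry:
    """Step 2: the supervisor or reliability leader refines "something-else"."""
    if entry.basic_cause != "SOMETHING_ELSE":
        raise ValueError("Only 'SOMETHING_ELSE' entries take a detailed cause")
    if detailed not in DETAILED_CAUSES:
        raise ValueError(f"Unknown detailed cause: {detailed}")
    return CauseEntry(entry.basic_cause, detailed)


entry = record_basic_cause("SOMETHING_ELSE")
reviewed = categorize(entry, "INSTALLATION_ERROR")
```

The technician only ever sees five choices, and the longer list is deferred to the reviewer, which is one way to keep the "middle ground" the article asks for.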

Lastly, a generic codeset can be created for both asset problem codes and component
problem codes, consisting of roughly 25 values each. As to the failed components, all
asset components can be placed in one table and linked to a standalone field through a
table-domain, using a type-ahead buffer. I can find any component I need in roughly three
keystrokes even though the table domain may have 500 choices. This design is
explained in the book, Failure Modes to Failure Codes.
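A type-ahead lookup over a single component table might behave like the sketch below. This is not how any particular EAM implements its table-domain. The component names and the `type_ahead` function are assumptions for illustration.

```python
import bisect

# Hypothetical component table; a real one might hold around 500 values.
COMPONENTS = sorted([
    "BEARING", "BELT", "BREAKER", "COUPLING", "IMPELLER",
    "MOTOR", "SEAL", "SHAFT", "VALVE SEAT",
])


def type_ahead(prefix, table=COMPONENTS, limit=10):
    """Return up to `limit` entries of a sorted table that start with `prefix`."""
    prefix = prefix.upper()
    # Binary search for the first entry that is >= the prefix.
    start = bisect.bisect_left(table, prefix)
    matches = []
    for name in table[start:]:
        if not name.startswith(prefix):
            break  # sorted table: no later entry can match
        matches.append(name)
        if len(matches) == limit:
            break
    return matches


# Each keystroke narrows the list of candidates.
for typed in ("B", "BE", "BEA"):
    print(typed, "->", type_ahead(typed))
```

Because the table is sorted, each keystroke is a binary search plus a short scan, so the list stays responsive even with several hundred components.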

To summarize, every day that goes by without capturing validated data is lost failure
data, never to be recovered. Failure data, if properly captured, can help all industries
around the world make dramatic improvements to the bottom line. If we can accurately
capture the failed component, component problem, and cause code in validated fields,
against an asset, then we have a true failure mode on the work order record which can be
quickly compared to a Failure Modes and Effects Analysis (FMEA) library (which, by the
way, could also be stored inside the CMMS as a new application).
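A minimal sketch of that idea: the failure mode is held as three validated fields on the work order, next to the asset problem code and a standalone remedy field, and then checked against an FMEA library. The class names, field names, code values, and the library contents are all hypothetical, and keying the library by asset class is an assumption rather than something the article prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)  # frozen makes the failure mode hashable
class FailureMode:
    failed_component: str
    component_problem: str
    cause_code: str


@dataclass
class WorkOrder:
    number: str
    asset_class: str
    asset_problem: str           # fourth standalone field
    failure_mode: FailureMode    # the three failure-mode fields
    remedy: Optional[str] = None # standalone remedy field


# Hypothetical FMEA library: anticipated failure modes per asset class.
FMEA_LIBRARY = {
    "CENTRIFUGAL_PUMP": {
        FailureMode("SEAL", "LEAKING", "WEAR_AND_TEAR"),
        FailureMode("BEARING", "OVERHEATING", "AGING"),
    },
}


def anticipated_by_fmea(work_order: WorkOrder) -> bool:
    """True if the recorded failure mode already exists in the FMEA library."""
    known_modes = FMEA_LIBRARY.get(work_order.asset_class, set())
    return work_order.failure_mode in known_modes


wo_known = WorkOrder(
    "WO-1001", "CENTRIFUGAL_PUMP", "LOW_FLOW",
    FailureMode("SEAL", "LEAKING", "WEAR_AND_TEAR"), remedy="REPLACED",
)
wo_new = WorkOrder(
    "WO-1002", "CENTRIFUGAL_PUMP", "VIBRATION",
    FailureMode("IMPELLER", "CRACKED", "FORCE_MAJEURE"),
)

print(anticipated_by_fmea(wo_known), anticipated_by_fmea(wo_new))
```

A failure mode that is missing from the library, like the second work order here, is a natural candidate for reliability review.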

Related topics: What defines an asset? And when is a component a component?

Good failure data can be used to manage risk, improve asset performance, and optimize
costs. However, despite these benefits, many organizations still struggle with the
capture of failure data and, more importantly, with failure analysis.

EAM user communities generally end up with two outcomes: (1) a flawed failure code
hierarchy, and (2) the inability to leverage failure data. A simpler design is needed,
one that still supports failure analysis but is not so detailed that it requires
significant effort to manage. In the end, the collected failure data needs to support
decision making.

Many of the leading EAM products utilize a failure code hierarchy. This means they align
failure codes to each asset classification. Below each classification, the administrator
sets up an (asset) problem, cause, and remedy (called PCR) hierarchy. Unfortunately, this
design lacks proper emphasis on the failed component, which is a key element of the
failure mode. In some cases, organizations try to incorporate the failed component into
the PCR hierarchy, which further balloons the number of failure records into the
thousands. With so much clicking, working-level staff quickly lose interest in
failure data accuracy. Further, the size of these PCR hierarchies can overwhelm the
mobile solution from an I/O perspective. And sometimes the EAM administrator will
simply give up and never complete the failure code library. In conclusion, the
management of this type of architecture results in (1) overall frustration due to size, (2)
failure data inaccuracies, and (3) an inability to make data-based decisions. And now
everyone loses.
