Mean Time to Failure (MTTF) is an essential maintenance metric that estimates how long
non-repairable assets can work before they fail. Tracking MTTF helps minimize operational
disruptions, maximize asset lifespans, and enable more effective O&M decision-making.
MTTF helps maintenance teams develop effective maintenance strategies, reduce dependence
on reactive maintenance, and decrease unplanned downtimes.
How to Calculate Mean Time to Failure
To calculate MTTF, divide the total time of operation by the total number of items you are
tracking:
MTTF = Total Hours of Operation ÷ Total Number of Assets in Use
Only calculate MTTF for identical assets and parts: their manufacturer, size, and even usage
should match.
For example, let’s say you’re calculating the MTTF of lightbulbs in your inventory.
You would want to make sure that the lightbulbs are the same wattage, built by the same
manufacturer, and used in the same way to ensure a more accurate MTTF value.
MTTF Example
You want to calculate the MTTF for conveyor belt rollers in your manufacturing facility.
You have 125 rollers that have been used for a total of 60,000 operational hours in the past
year. The MTTF is:
MTTF = Total Hours of Operation ÷ Total Number of Assets in Use
MTTF = 60,000 hours ÷ 125 assets
MTTF = 480 hours
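The roller calculation can also be expressed as a small helper. This is a minimal sketch; the function name `mean_time_to_failure` and its parameters are illustrative and not part of any particular maintenance tool.

```python
def mean_time_to_failure(total_operating_hours: float, asset_count: int) -> float:
    """Return MTTF in hours: total operating hours divided by assets tracked."""
    if asset_count <= 0:
        raise ValueError("asset_count must be a positive number of assets")
    return total_operating_hours / asset_count


# Figures from the conveyor roller example: 60,000 hours across 125 rollers.
roller_mttf = mean_time_to_failure(60_000, 125)
print(f"MTTF: {roller_mttf:.0f} hours")  # MTTF: 480 hours
```

The guard against a zero or negative count matters in practice, because an empty asset list would otherwise cause a division error.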
So, how can you improve MTTF to get more out of your assets before they fail? Here are a
few tips to get started:
Purchase quality materials and parts: Always ensure that you buy your assets and parts
from quality producers. Purchase materials produced in strict adherence to quality
standards. You’ll have durable materials that will serve you for a longer period of time.
Use assets only for intended functions: Purchasing quality materials isn’t enough. You also
should use the assets and parts only for the functions they are intended to perform.
Additionally, ensure that the conditions such as voltage, pressure, heat, and humidity are
right. Always have the assets installed by qualified professionals.
Implement an effective preventive maintenance program: There’s little that maintenance
scheduling can do for assets that are running to failure. However, preventive maintenance
activities such as cleaning and lubrication can help to extend their lifespan. Developing and
implementing an effective PM program can help to improve your MTTF.
Mean Time Between Failure (MTBF) is a measure of the reliability of a system or
component. It’s a crucial element of maintenance management, representing the average time
that a system or component will operate before it fails.
The MTBF formula is often used in the context of industrial or electronic system
maintainability, where failure of a component can lead to significant downtime or even safety
risks, but MTBF is used across many types of repairable systems and diverse industries. It
can help measure the overall reliability of manufacturing plants, energy grids, informational
networks and countless other use cases.
MTBF is calculated by dividing the total time of operation by the number of failures that
occur during that time. The result is an average value that can be used to estimate the
expected service life of the system or component.
It's important to note that MTBF is an average time, and does not guarantee that a particular
system or component will last for the full MTBF period without failing. The actual time
between failures can vary widely, and it is not uncommon for failures to occur well before or
after the MTBF. Additionally, MTBF does not take into account the severity of the failures or
the impact they may have on operations or safety.
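To see why an average is not a guarantee, consider a common simplifying assumption that is not stated in the text above: a constant failure rate, which gives an exponential reliability model. Under that assumption only about 37% of units are expected to run for the full MTBF without failing. The function name and the 520-hour figure below are illustrative.

```python
import math


def survival_probability(t_hours: float, mtbf_hours: float) -> float:
    """Probability of running t_hours without failure.

    Assumes a constant failure rate (exponential model): R(t) = exp(-t / MTBF).
    """
    if mtbf_hours <= 0:
        raise ValueError("mtbf_hours must be positive")
    return math.exp(-t_hours / mtbf_hours)


# Chance that a unit with a 520-hour MTBF runs the full 520 hours failure-free.
p = survival_probability(520, 520)
print(f"{p:.1%}")  # 36.8%
```

Real assets often deviate from this model, so the result is best read as an illustration of the principle rather than a prediction.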
A high MTBF doesn’t mean that breakdowns will never occur, only that they are less likely
to occur. All systems and components have a finite lifecycle, and failures can occur due to a
variety of factors, including wear and tear, environmental conditions and manufacturing
defects.
Reliability engineers can use MTBF to compare the reliability of similar systems or
components, but it cannot be directly compared between different systems or components.
This is because the MTBF is highly dependent on the operating conditions, usage patterns
and other factors specific to the system or component being measured. It is difficult and
possibly inadvisable to seek a meaningful definition of a “good” MTBF across different use
cases. A “good” MTBF for one system might look very different from a “good” MTBF in
another use case, even a very similar one.
How is mean time between failure calculated?
First, let’s define the scope: We must define the system or component in question, along with
operating conditions, including environmental factors and usage patterns. Then, we collect
data on the operating time of the system or component, including the start and end times of
each operation cycle. Then, we record the number of failures that occurred during the
operating time. Finally, we can calculate the MTBF: Divide the total operating time by the
number of failures. The result is usually expressed in hours, but can be any unit of time.
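The steps above can be sketched in code. This example assumes each operation cycle is logged as a (start, end) pair of timestamps; the function name `mtbf_hours` and the sample log are hypothetical.

```python
from datetime import datetime


def mtbf_hours(cycles, failure_count: int) -> float:
    """Return MTBF in hours from logged operation cycles.

    cycles: iterable of (start, end) datetime pairs, one per operation cycle.
    failure_count: number of failures recorded during those cycles.
    """
    if failure_count <= 0:
        raise ValueError("MTBF is undefined when no failures were recorded")
    total_seconds = sum((end - start).total_seconds() for start, end in cycles)
    return total_seconds / 3600 / failure_count


# Hypothetical log: three operation cycles totalling 20 hours, with 2 failures.
cycles = [
    (datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 16)),  # 8 hours
    (datetime(2024, 1, 2, 8), datetime(2024, 1, 2, 16)),  # 8 hours
    (datetime(2024, 1, 3, 8), datetime(2024, 1, 3, 12)),  # 4 hours
]
example_mtbf = mtbf_hours(cycles, failure_count=2)
print(f"MTBF: {example_mtbf:.1f} hours")  # MTBF: 10.0 hours
```

Only operating time is summed here; downtime between cycles is excluded because it never appears in the log.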
For example, let's say you want to calculate the MTBF of a motor that operates for 8 hours
per day, 5 days a week, for a total of 1 year. During this time, the motor fails 4 times. To
calculate the MTBF:
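The arithmetic can be sketched as follows, assuming a 52-week year with no holidays or planned shutdowns (the example does not say how many weeks the motor actually ran). The constant names are illustrative.

```python
HOURS_PER_DAY = 8
DAYS_PER_WEEK = 5
WEEKS_PER_YEAR = 52  # assumption: the motor runs every working week of the year
FAILURES = 4

# Total operating time over the year, then divide by the number of failures.
total_operating_hours = HOURS_PER_DAY * DAYS_PER_WEEK * WEEKS_PER_YEAR
motor_mtbf = total_operating_hours / FAILURES

print(f"Total operating time: {total_operating_hours} hours")  # 2080 hours
print(f"MTBF: {motor_mtbf:.0f} hours")  # MTBF: 520 hours
```

Under these assumptions the motor runs for 2,080 hours and its MTBF is 520 hours. A different number of working weeks would change both figures.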
Improving MTBF reduces the number of failures over a given period of time, providing a
range of benefits to businesses and industries.
Improving MTBF often involves identifying and addressing the root causes of failures.
Aerospace and defense
MTBF is critical in the aerospace and defense industry, where the breakdown of a component
can have serious safety implications. When human lives are on the line, it is essential to
maximize total uptime of critical systems like fuel and oxygen supply systems. MTBF is used
to ensure that components and systems meet reliability requirements and to identify potential
issues before they become safety risks.
Automotive
MTBF is used in the automotive industry to measure the reliability of components such as
engines, transmissions and electronic systems. By tracking MTBF, manufacturers can
identify design or manufacturing issues and take corrective action before a failure occurs.
Medical devices
In the medical device industry, MTBF is used to ensure that devices such as pacemakers,
insulin pumps and MRI machines meet reliability requirements and do not pose a risk to
patient safety.
What is the bathtub curve?
The bathtub curve is a graph that represents the failure rate of an asset over time. It is used as
a very basic measure to help understand why failures occur on certain assets and how to
predict and prevent them. It is called the “bathtub curve” because it resembles the cross-
section of a bathtub: steep sides with a flat bottom.
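One common way to model this shape, offered here as an illustrative assumption rather than something the text prescribes, is to add three Weibull hazard terms: a decreasing one for early failures, a constant one for random failures, and an increasing one for wear-out. All parameter values below are made up for demonstration.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Weibull hazard (instantaneous failure rate) at time t > 0."""
    if t <= 0:
        raise ValueError("t must be positive")
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_failure_rate(t: float) -> float:
    """Sum of three hazard terms that together form a bathtub shape."""
    early_failures = weibull_hazard(t, shape=0.5, scale=2_000.0)    # decreasing
    random_failures = weibull_hazard(t, shape=1.0, scale=1_000.0)   # constant
    wear_out = weibull_hazard(t, shape=5.0, scale=10_000.0)         # increasing
    return early_failures + random_failures + wear_out


# Failure rate is high early, flattens out, then climbs again late in life.
for hours in (10, 100, 1_000, 3_000, 9_000, 12_000):
    print(f"{hours:>6} h: {bathtub_failure_rate(hours):.6f} failures/hour")
```

A shape parameter below 1 yields a falling failure rate, exactly 1 yields a constant rate, and above 1 yields a rising rate, which is why the sum traces steep sides with a flat bottom.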
What are the three phases of the bathtub curve?
Infant mortality period
The infant mortality period, also known as the early failure period, occurs at the
very beginning of an asset’s lifecycle: the moment you install and begin using the asset. It’s
important to remember that this doesn’t begin the moment you acquire it, just the moment
you install it and begin using it. For example, if you purchase an asset and it sits in your
warehouse for six months while you finish building your new facility, those six months are
not counted in the bathtub curve because you haven’t begun using it yet.
The reason it’s called the infant mortality period is that there tends to be a high rate of
failures during this short amount of time. At this early stage of the asset’s lifecycle,
failures are most often the result of manufacturer defects, incorrect installation,
or improper usage by operators who either received improper training or are still learning
how to use the asset.
Discovering failures during this period can be helpful because defective parts can often be
replaced under manufacturer warranty. It also helps to expose other human errors in the
asset lifecycle so they can be corrected to prevent future failures.
Normal life period
During the normal life or constant failure rate period, an asset maintains a relatively constant
failure rate. In this period, manufacturer defects and improper usage are far less common and
most of the failures are caused by normal wear and usage of the asset. The rate of these
failures over time typically maintains a constant ratio.
For example, suppose an asset’s normal life period is 10 years. If you make six repairs in the
first three years and only four repairs in the last seven years, then over the whole normal life
period those repairs average out to a constant failure rate of one repair per year.
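The averaging in this example can be checked with a few lines of code. This is a minimal sketch using the repair counts from the paragraph above; the variable names are illustrative.

```python
# (years in interval, repairs in interval) from the 10-year example.
repairs_by_interval = [(3, 6), (7, 4)]

total_years = sum(years for years, _ in repairs_by_interval)
total_repairs = sum(repairs for _, repairs in repairs_by_interval)

# Average failure rate over the whole normal life period.
average_rate = total_repairs / total_years
print(f"Average: {average_rate:.1f} repairs per year")  # Average: 1.0 repairs per year

# The rate inside each interval differs even though the long-run average is steady.
for years, repairs in repairs_by_interval:
    print(f"{years} years: {repairs / years:.2f} repairs per year")
```

The per-interval rates differ (2.00 and 0.57 repairs per year), but the average across the full period is one repair per year.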
There are two main reasons that the failure rate appears constant: 1) some of the failures are
truly random due to external environmental factors or simply chance occurrences, and 2) there
are so many different failures and failure modes contributing to the total that the overall rate
appears to be random.
While some of these failure modes truly are random, the good news is that most of them can
be accurately predicted and repaired before they become true failures with the help of
a preventive maintenance (PM) program.
A good example is a work truck. You know that the tires on the truck will wear out after a set
number of miles, and if you know how many miles the truck drives over a set period of time,
you can accurately predict when the tires will fail and can change them before that happens.
What you can’t predict, however, is driving over a nail and puncturing a tire while on the
road.
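The predictable part of the truck example comes down to simple arithmetic. The sketch below assumes a rated tire life and a steady weekly mileage; the function name and all figures are hypothetical, not manufacturer data.

```python
def weeks_until_replacement(rated_life_miles: float,
                            miles_on_tires: float,
                            miles_per_week: float) -> float:
    """Estimate the weeks left before tires reach their rated life."""
    if miles_per_week <= 0:
        raise ValueError("miles_per_week must be positive")
    remaining_miles = max(rated_life_miles - miles_on_tires, 0)
    return remaining_miles / miles_per_week


# Hypothetical truck: tires rated for 50,000 miles, 38,000 driven, 600 miles/week.
weeks_left = weeks_until_replacement(50_000, 38_000, 600)
print(f"Schedule tire replacement within {weeks_left:.0f} weeks")  # within 20 weeks
```

An estimate like this covers wear only; it cannot account for random events such as a puncture.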
Wear out period
Also known as the end-of-life period, this period is the final stage of the life cycle of your
assets where failures occur due to assets and parts reaching the end of their designed useful
life. These end-of-life failures are generally predictable and are often even specified in the
manufacturer’s documentation about the asset. During this period, failure rates increase
sharply, forming the other side of the bathtub on the curve.
Depending on the asset, sometimes the parts that are designed to reach this wear-out period
can be replaced entirely to reset the curve. This doesn’t apply to every asset and there are still
other components of the asset that will need to be considered, but sometimes performing
these repairs can significantly increase the