Thesis Domingos Massala

RAM Analysis applied to centrifugal gas compressors
"Case study of an Oil and Gas Company"
Domingos Cúnua Massala
Thesis to obtain the Master of Science Degree in
Mechanical Engineering
Supervisor: Prof. Virgínia Isabel Monteiro Nabais Infante
Examination Committee
Chairperson: Prof. Luís Filipe Garlão dos Reis
Supervisor: Prof. Virgínia Isabel Monteiro Nabais Infante
Member of the Committee: Prof. Elsa Maria Pires Henriques
July 2018
This work is entirely dedicated to my mother, Helena Domingos.
Acknowledgments
First, I would like to thank GOD for the gift of life. Second, I would like to express my gratitude to all
the people and entities who contributed to the development of this work, as well as to all those who
played an important role throughout my academic career at Instituto Superior Técnico. In particular, I
would like to express my gratitude to:
• My supervisor, Prof. Dr. Virgínia Infante, for giving me the opportunity to carry out this study with her
and for all her support during its development, especially for the attention and availability she always
showed in clarifying my doubts;
• Prof. Dr. Beatriz Silva, for continuously following my progress at Instituto Superior Técnico from
my first year until the end of this journey;
• Total E&P Angola, for giving me the opportunity to intern as a Mechanical Maintenance Engineer so
that this work could become a reality, as well as for all the financial support provided through the
scholarship that was awarded to me;
• Engineer Rui ERASMO from Total E&P Angola, for all his support during the internship period at Total;
• All my family, especially my parents, brothers and sister, for all the emotional and
financial support they have given me throughout my academic life;
• My friend Nguinamau Cedrick Mbele, for all his support in reviewing this work;
• Engineer Domingos Mbomba Massala (my brother), for being one of the greatest mentors of my
journey at Instituto Superior Técnico;
• Engineers Venilton Machado, Agnelo Cardoso, Ngola Cusseiala, Adilson Moreira and Nkadi
Sebastião, for their support during my first steps at Instituto Superior Técnico;
• Miguel Massala and Domingos Marcos, for the good moments we shared together in Lisbon;
• All those I could not name above, but who surely made my journey better than it would otherwise have been;
• Last but not least, Marta, for your support and love.
Resumo
Over the last 4 years, the Oil and Gas industry has faced a crisis resulting from an excessive supply of
oil, which led to an abrupt drop in the trading price of a barrel of oil. This trend has forced many
companies in the sector to look for solutions that ensure the viability of their operations, so as to keep
production costs within acceptable limits. Since the efficiency of a production process is in some way
also related to the performance of the machines involved in it, operating these machines as rationally as
possible, avoiding breakdowns and unexpected stoppages, is a sound principle for reducing costs. It is
in this context that maintenance has been gaining importance in the strategic objectives of any
company in the industrial sector.
Thus, this work applies RAM (Reliability, Availability, Maintainability) analysis to the centrifugal
compressors installed in an offshore Oil and Gas plant, with the aim of determining, from the failure
history of the compressors, the critical components and their reliability (time to failure) and
maintainability (mean time to repair) indicators, as well as the availability of the compressors.
The results of this analysis identified the Dry Gas Seal as the most critical component, since its failures
had the greatest impact on the several compressors analyzed. It was also possible to verify that this
component has a mean time to failure lower than expected, which translates into unexpected costs.
Through this analysis, it was possible to identify the reasons for the excessive down time of the
compressors due to Dry Gas Seal failures, this component being responsible for the reduced
availability of the compressors. Finally, some improvement actions were recommended to improve the
reliability of the critical component and reduce compressor down time, as well as to improve
compressor availability.
Keywords: RAM Analysis, Centrifugal Compressors, Reliability, Maintenance, Oil and Gas.
Abstract
In the last four years, the Oil and Gas industry has faced a crisis caused by an excess supply of oil,
which resulted in an abrupt drop in the market price of a barrel of oil. This trend has forced many
companies in the sector to look for solutions that keep their operations feasible, so that production
costs remain within acceptable limits. Since the efficiency of any production process depends in some
way on the performance of its machines, operating those machines in a more rational way, avoiding
breakdowns and unexpected stops, is a good way of reducing costs. It is in this context that
maintenance has continuously gained importance in the strategic objectives of any company in the
industrial sector.
This work applies Reliability, Availability and Maintainability (RAM) analysis to centrifugal gas
compressors installed in an offshore Oil and Gas plant in order to determine, from the failure history,
the critical components of the compressors, their mean time to failure and mean time to repair, and the
availability of the compressors.
The results identified the Dry Gas Seal as the most critical component, since its failures had the
greatest impact on the compressors analyzed. The analysis also showed that the critical component
has a mean time to failure lower than expected, which means that failures occur unexpectedly and
often translate into unexpected costs. The analysis likewise made it possible to identify the reasons for
the excessive down time of the machines due to Dry Gas Seal failures (although the mean time to
repair was lower than expected), which was the main reason for the low availability of some
compressors. Finally, some improvement actions were recommended in order to improve the reliability
of the critical component, reduce compressor down time and improve compressor availability.
Keywords: RAM Analysis, Centrifugal Compressors, Reliability, Maintenance, Oil and Gas.
Contents
Acknowledgments
Resumo
Abstract
List of Tables
List of Figures
Abbreviations
1 Introduction
1.1 The Oil and Gas Upstream Industry
1.2 Literature Review
1.2.1 Centrifugal Compressors
1.2.2 RAMS (Reliability, Availability, Maintainability and Safety) Analysis
1.3 Topic Relevance and Motivation
1.4 Objectives
1.5 Thesis Outline
2 RAM Analysis theoretical background
2.1 Introduction to RAM Analysis
2.2 RAM Analysis characterization
2.2.1 Reliability
2.2.2 Maintainability
2.2.3 Availability
2.3 Pareto Analysis
2.4 FMECA Analysis
2.5 Reliability of systems: The RBDs
2.5.1 Series System
2.5.2 Parallel or redundancy system
2.5.3 M-out-of-N System
3 Maintenance
3.1 Introduction to maintenance
3.2 An overview of maintenance history
3.3 Types of maintenance
3.3.1 Corrective maintenance
3.3.2 Preventive maintenance
3.4 Maintenance Management Models
3.4.1 Risk Based Maintenance
3.4.2 Total Productive Maintenance
4 Case Study
4.1 Introduction into the Case Study
4.2 Total E & P Angola Company
4.2.1 Total’s Block 17: “The Golden Block”
4.2.2 The Maintenance Department
4.3 Centrifugal Gas Compressors
4.3.1 Centrifugal compressors overview
4.3.2 Compressor package main systems
4.3.3 Actual Centrifugal Compressors Maintenance Plan
4.4 RAM Analysis
4.4.1 Methodology to classify the components
5 Results
5.1 FPSO Dalia
5.1.1 Failure Data Collection
5.1.2 Pareto Analysis
5.1.3 Consequence Analysis
5.1.4 Likelihood Analysis
5.1.5 Criticality Matrix
5.2 RAM Analysis of Critical Component: Dry Gas Seal
5.2.1 Dry Gas Seals
5.2.2 Reliability
5.2.3 Estimation of parameters
5.3 Maintainability
5.4 Availability
6 Conclusions
6.1 Achievements
6.2 Difficulties
6.3 Future Work
References
A FMECA Severity vs Occurrence Procedure
B Compressor’s Systems: Components
C Compressor Major Overhaul activities
D Failure History
E Check list
F Dry Gas Seals FMECA
List of Tables
4.1 Consequence level (A): Effect on Health Safety and Environment
4.2 Consequence level (B): Effect on Production
4.3 Consequence level (C): Effect on Maintenance cost
4.4 Likelihood level (D): Redundancy
4.5 Likelihood level (E): Equipment/Component technology
4.6 Likelihood level (F): Failure rate
4.7 Consequence Index Table
4.8 Likelihood Index evaluation
5.1 DALIA Compressors failure by System
5.2 Consequence final assessment
5.3 Likelihood final assessment
5.4 Criticality Matrix final assessment
5.5 Dry Gas Seals failure history
5.6 Values for regression analysis
5.7 Reliability and Failures Probabilities values
5.8 Data to estimate the MTTR
5.9 Estimated MTTR
5.10 Availability values of gas compressors
A.1 FMECA criteria for severity, likelihood and detection ratings
List of Figures
1.1 Offshore platforms used in Oil and Gas
1.2 Typical Oil and Gas reservoir [6]
1.3 Oil and Gas Production Overview [8]
1.4 Illustrative gas compression PID [11]
1.5 Compressor classification (Adapted from [17])
1.6 Types of compressors
2.1 Representation of the steps which comprehend RAMS and life cycle [29]
2.2 Survival and Failure probability [1]
2.3 Mortality curve [36]
2.4 Exponential distribution [37]
2.5 Effect of the parameter β [37]
2.6 States of a repairable component [33]
2.7 Difference between the time measures
2.8 Example of an RBD
2.9 Series systems
2.10 Active parallel systems
2.11 Standby system
2.12 RBD in M-out-of-N configuration for 2-out-of-3
4.1 The Block 17
4.2 Block 17 FPSOs
4.3 Topsides organization top view
4.4 Total E & P Angola Maintenance Department Organization
4.5 LP-MP Compressor Package FPSO Pazflor
4.6 Gas Compression Package with Electric Motor Driver
4.7 Compressor main system components [53]
4.8 Compressor Unit
4.9 Anti Surge Control System
4.10 Methodology to perform the case study analysis
4.11 Criticality matrix Components
4.12 Final Matrix of Occurrence Versus Severity
5.1 Pareto Analysis: Total down time
5.2 Pareto Analysis: Number of failures
5.3 Pareto Analysis: Mean Down Time
5.4 Failure of Dry Gas Seal and its consequence
5.5 Dry Gas Seals
5.6 Dry Gas Seal in tandem arrangement
5.7 Regression graphic
5.8 Reliability function R(t)
5.9 Failure function F(t)
5.10 CLOV FPSO Dry Gas Seal PI-CoreSight display
5.11 Normal expected Dry Gas Seal replacement operation
Abbreviations
API- American Petroleum Institute
ASV- Anti Surge Valve
CBA- Cost Benefit Analysis
DE- Drive End
DGS- Dry Gas Seal
DT- Down Time
ESD- Emergency Safety Device
FMEA- Failure Modes & Effects Analysis
FMECA- Failure Modes Effects & Criticality Analysis
FPSO- Floating Production Storage & Offloading
HH- High High
HSE- Health Safety & Environment
LL- Low Low
LNG- Liquefied Natural Gas
LP- Low Pressure
MDT- Mean Down Time
MM- Medium Medium
MP- Medium Pressure
MTBF- Mean Time Between Failures
MTN- Maintenance
MTTF- Mean Time To Failure
MUT- Mean Up Time
NDE- Non Drive End
OREDA- Offshore & Onshore Reliability Data
PID- Piping & Instrument Diagram
RAM- Reliability Availability & Maintainability
RAMS- Reliability Availability Maintainability & Safety
RBD- Reliability Block Diagram
RBM- Risk Based Maintenance
RCM- Reliability Centered Maintenance
SCADA- Supervisory Control & Data Acquisition
TBF- Time Between Failures
TPM- Total Productive Maintenance
TTR- Time To Repair
Chapter 1
Introduction
With the globalization of the economy and increasing competition, companies have for some years
been compelled to integrate into their objectives not only economic goals that aim to maximize profit
but also others of a more strategic nature that are vital for their survival. Some of these non-economic
objectives concern the social, safety and comfort needs of employees, environmental ethics, product
quality, relations with suppliers and distributors, efficient management of the assets needed for
production, etc. [1]. In the specific case of maintenance, when drawing up purchasing specifications
for equipment, companies are increasingly concerned with including, in addition to the usual technical
specifications, requirements that express the acceptable limits of reliability, availability, maintainability
and safety (of people and the environment).
In recent years, the Oil and Gas Industry has faced a major crisis driven by an oversupply of oil, causing
the market price of a barrel of oil to go into free fall, from around 100 dollars to 20 dollars per barrel.
This trend has forced several companies operating in the industry to search for means and policies
that make production operations justifiable within the current context. With the drop in the price of a
barrel of oil, the ongoing challenge is to keep production operations viable and to optimize production
costs and, consequently, operating costs¹. It is known, however, that operating losses due to
equipment failures reduce operational efficiency and can translate into high economic losses for the
company.
A recent study of the Oil and Gas sector recognizes that the fall in the price of a barrel of oil
increasingly requires companies to sharpen their focus and accuracy in the performance of operations,
and that achieving good asset reliability and integrity is a good starting point [2]. In this scenario, the
asset maintenance sector gains strategic and financial importance, since it is the instrument that seeks
to ensure that equipment functions fully, within technical standards, throughout its lifetime [3]. It is for
this and other reasons that, in the late 20th century, it was established that maintenance should be an
integral part of the production strategy for the overall success of a company [4]. Reducing production
or operating costs also means making rational use of the resources and equipment needed for
production. Ensuring the proper functioning of the machinery while maintaining a high level of
reliability, avoiding failures and unscheduled maintenance actions, and optimizing machine repair
times, thereby increasing availability, are all measures that lead to a reduction in operating costs.
¹ Operating costs are expenses associated with the maintenance and administration of a business on a
day-to-day basis. Production costs are part of operating costs and refer to the costs incurred by a
company to produce a good or a service. Traditionally, production costs include a variety of expenses
including, but not limited to, labor, raw materials and general overhead.
1.1 The Oil and Gas Upstream Industry

The Oil and Gas industry can be broadly divided into three sectors: upstream, midstream and
downstream. The upstream sector is the part of the industry involved with finding oil fields and bringing
oil up to the surface. Upstream activities include exploratory work, such as the search for underground
(or underwater) Oil and Gas reservoirs and the initial drilling, followed by the production phase, which
is the actual extraction of oil from the ground. A whole range of different structures (platforms) is used
offshore for drilling and production operations, depending on size and water depth. For depths of more
than 2000 meters, the most used platforms are semi-submersible platforms (for drilling and
production), drill rig vessels (for drilling only) and FPSOs (Floating Production, Storage and Offloading
units) for the production, storage and transfer of oil [5]. Figure 1.1 illustrates these three types of
platform.
Figure 1.1: Offshore platforms used in Oil and Gas: (a) Drill Rig [5], (b) FPSO [5], (c) Semi-submersible [5]
In addition to oil, the reservoir also contains water, gas and sediments, meaning that when the
petroleum comes up to the surface it is a mixture of three fluids: oil, gas and water. Figure 1.2
illustrates a typical Oil and Gas reservoir, in which the presence of oil, gas and water can be observed.
Figure 1.2: Typical Oil and Gas reservoir [6]
Processing these three fluids (oil, water and gas) therefore requires a dedicated set of equipment. The
extent of this processing usually depends on economic viability criteria. It may involve simpler
processing units, based on decantation, separating vessels and other physico-chemical processes, or
more complex processes including oil treatment, gas treatment and treatment of water for disposal or
re-injection into the wells. Recent Oil and Gas platforms normally use the more complex process
configuration [7]. Although there is a wide range of sizes and layouts, most production facilities share
many of the processing systems illustrated in Figure 1.3.
Figure 1.3: Oil and Gas Production Overview [8]
In Figure 1.3, the mixture of oil, gas and water coming from the production manifolds enters
successive separation stages, whose main objective is to separate the mixture into its components,
giving rise to three different processes: water treatment, oil treatment and gas treatment.
Until a few years ago, it was common practice in oil fields to flare the associated petroleum gas
resulting from the initial treatment of crude oil. To minimize this practice, the Oil and Gas Industry
decided to process the gas and has historically employed two options to use the associated gas:
re-injection and monetization. These options are extensively used only in a few economically
developed oil-producing countries [9].
When the gas leaves the separators, it has generally lost so much pressure that it must be
recompressed to be transported. The compression process involves a large set of associated
equipment, such as compressors, scrubbers (to remove liquid droplets), heat exchangers and lube oil
treatment. The compressors are the most important equipment in this process [10].
Figure 1.4 illustrates part of a typical gas compression Piping and Instrument Diagram (PID), showing
a typical compression process with some of the instruments traditionally used in gas compression².
Incoming gas is first cooled in a heat exchanger. It then passes through the scrubber to remove liquids
and goes into the compressor. The anti-surge loop and the anti-surge valve (ASV 101) allow the gas to
recirculate.
Figure 1.4: Illustrative gas compression PID [11]
1.2 Literature Review

The literature review of this work is divided into two subsections. The first reviews work on the
centrifugal compressor, with special emphasis on its relation to the Oil and Gas Industry. The second
explores RAMS (Reliability, Availability, Maintainability and Safety) analysis by reviewing its application
in industry, with special attention to the Oil and Gas Industry and some attention to work related to gas
compressors.
² Note that several instruments are not shown in Figure 1.4, for simplicity.
1.2.1 Centrifugal Compressors
It is often necessary to increase the pressure of a gas for processing, storage or transport. Two
fundamentally different principles are used to compress gases: intermittent (or discontinuous) flow and
continuous flow. Intermittent flow is associated with positive displacement compressors, and
continuous flow with dynamic and ejector compressors [12][13]. Positive displacement compressors
are discontinuous flow machines: they induce a fixed volume of gas into a pocket, chamber or cylinder
for compression. The size of this pocket is then reduced mechanically, compressing the gas. At the end
of the compression cycle the pocket opens, discharging the high-pressure gas. Often only one or two
stages of this compression process are required. There is never an open gas passage from delivery to
suction (except for leakage through the clearances between the moving parts). Dynamic compressors
are continuous flow machines: they use rotating vanes or bladed discs to sequentially accelerate the
gas (increasing its energy) and then decelerate it (trading kinetic energy for increased pressure). This
normally requires a number of stages, often within the same casing. Dynamic compressors always
have an open gas route through the machine. In summary, the reciprocating compressor increases the
pressure of the gas by positive displacement, employing linear movement of the driveshaft; the
centrifugal compressor³, on the other hand, does so by means of mechanically rotating vanes or
impellers [13]. Figure 1.5 illustrates the classification of compressors.
In the oil and natural gas sector, the most prevalent types of compressor are reciprocating (positive
displacement) and centrifugal (dynamic) compressors. The centrifugal compressor, also known as the
radial compressor, is considered critical equipment in a wide variety of applications in the process
industry [13][14]. Figure 1.6 illustrates the two types of gas compressor most used in the Oil and Gas
Industry.
According to the literature, the centrifugal compressor became popular between 1950 and 1960
because its efficiency was comparable to that of the reciprocating compressor (cited as the most used
compressor until the 1960s) and because of its much lower maintenance costs. Nowadays, the
centrifugal compressor is regarded as the main compressor in the process and pipeline industries and,
because it is compact and lightweight and has low energy consumption, it is used extensively in the
offshore industry [12][17].
Figure 1.5: Compressor classification (Adapted from [17])
Figure 1.6: Types of compressors: (a) Centrifugal Compressor [15], (b) Reciprocating Compressor [16]
Most centrifugal compressors in the Oil and Gas industry are driven by gas turbines or electric
motors. Often, several stages in the same train are driven by the same motor or turbine. The main
operating parameters for a compressor are the flow and pressure differentials. Larger Oil and Gas
installations use centrifugal compressors with 3-10 radial wheels, 6000–20000 rpm (highest for small
size), up to 80 MW load at discharge pressures of up to 50 bar, inlet volumes of up to 500000
m³/hour and a pressure differential of up to 10 [10].
³ The American Petroleum Institute (API) has produced an industry standard, API Standard 617, which is
frequently used for the design and manufacture of centrifugal compressors.
The main purposes of gas compression offshore are gas export, gas re-injection into wells, gas lift
and fuel gas.
1.2.2 RAMS (Reliability, Availability, Maintainability and Safety) Analysis
According to standard NP EN 50126 [18], “The safety and availability objectives of a system in
operation can only be achieved if all reliability and maintainability requirements are met and
maintenance and operation activities are monitored throughout the system’s life cycle, as well as the
environment in which it operates”. However, owing to design problems and poor product support,
manufactured equipment and systems may not be able to meet these requirements. With proper
consideration of reliability, availability and maintainability (RAM) in the design, manufacturing,
installation and operation phases, the number of failures can be reduced and their consequences
minimized [19]. This integrated approach to the reliability, availability, maintainability and safety
characteristics of equipment is known as RAMS (Reliability, Availability, Maintainability and Safety)
[1].
There are many reasons why a product might fail. Knowing, as far as is practicable, the potential
causes of failures is fundamental to preventing them. It is rarely practicable to anticipate all of the
causes, so it is also necessary to take account of the uncertainty involved. The reliability engineering
effort, during design, development and in manufacturing and service should address all of the
anticipated and possible causes of failure, to ensure that their occurrence is prevented or minimized
[20]. O’Connor & Kleyner [20] also discuss some of the main reasons why failures occur.
Because of its potential, RAM is one of the risk evaluation models applied in Maintenance and
Safety Integrity Management Systems [21]. With a RAM analysis of the system, key performance metrics
such as Mean Time to Failure (MTTF), Mean Time to Repair (MTTR) and system availability (A)
can be ascertained. The information obtained from the analysis helps management assess
the RAM needs of the system [22].
RAM is considered to be one of the two most significant areas for profitability improvement [23].
Moreover, RAM modeling will contribute to an increased safety and environmental performance, which
is an important factor in maintaining the license to operate, by providing real and up-to-date data
concerning the actual state of the plant [21].
RAM analysis has long been applied in many industries, among them the railway industry, the
aeronautical industry, and production plants, especially process plants such as chemical,
sugar, beverage, thermal, paper, nuclear and fertilizer plants. Kumar [24] presents an extensive study
on the state of the art of RAMS analysis applied in a wide range of industries.
In the case of the Oil and Gas industry, RAMS analysis has been studied for several reasons, among
them optimizing production and minimizing costs, measuring the availability and efficiency of upstream
Oil and Gas production facilities, and supporting decision-making on actual productivity and financial
figures. Centinkaya [25] assessed the reliability and availability of Supervisory Control And Data
Acquisition (SCADA)⁴ systems used in offshore petroleum facilities by developing a fault tree and
failure rate analysis. Corvaro et al. [21] performed a Reliability, Availability and Maintainability
(RAM) analysis to propose a new maintenance strategy for a reciprocating compressor. The
RAM study was based on failure rate and model data developed and compiled from a number
of sources, including the Enterprise Risk Management (ERM) experience and publicly available
process equipment failure rate databases such as OREDA⁵ (The Offshore and Onshore Reliability
Data). The effect of the sensitive environment of the Arctic offshore on the Reliability, Availability
and Maintainability of offshore Oil and Gas production facilities is studied by Naseri [26], who
developed expert-based models for RAM performance analysis of such facilities; the results of the
study show that the expected number of failures and expected downtimes in Arctic offshore operations
are higher than those of normal-climate areas.
1.3 Topic Relevance and Motivation
Nowadays, maintenance plays a significantly growing role within the strategy of companies, owing to
the increasing complexity and the extensive automation of the most diverse production
systems. Modern Oil and Gas platforms are more complex, both in terms of structure and equipment,
and ensuring the reliability and availability of the facility remains a challenge for companies.
This is why it has become indispensable to constantly seek the best ways to combine technologies and
applications, and to perfect the tools needed to make consistently good decisions that balance
maintenance and operating rates against equipment availability [27].
This scenario has continuously forced companies to pay more attention to the health of their
machines, triggering a search for tools that can continuously improve the efficiency of the
maintenance service by preventing failures and optimizing maintenance costs. Nowadays, considerable
attention is also given to Human, Safety and Environment (HSE) aspects, since a failure can
sometimes lead to a catastrophic situation affecting the integrity of the facility
and the people in it, and in many cases results in major environmental pollution.
In an Oil and Gas plant, the unavailability of the compression system means that the gas has to be
burned, and it can also cause losses in Oil and Gas production. On the other hand, there is an
obligation to preserve the environment; the burning of gas must be done in the most controlled way
possible, within the limits imposed by the standards that nowadays regulate this aspect.
Thus, it is undoubtedly important to ensure good compressor health, since the compressor is the main
agent in the compression process. Vinnem [28] explores major offshore accidents in the Oil and Gas industry, one
4 SCADA systems are used in production monitoring and control, well monitoring and control, process monitoring and control,
unmanned platform monitoring and control, pipeline systems, and drilling for offshore Oil and Gas in the Oil and Gas Industry
5 OREDA is a project organization sponsored by some Oil and Gas companies with worldwide operations
of which was Piper Alpha in 1988⁶, where a gas leak in the compression area started the accident that
claimed 167 lives.
Thus, this dissertation studies the centrifugal compressors used in the process of natural gas
production by applying Reliability, Availability and Maintainability (RAM) analysis (without the
Safety factor S). The aim is to evaluate and propose actions to improve reliability, maintainability
and availability, based on past and present machine data, so that the compressors can be used in the
most cost-effective way throughout their operation phase, avoiding unnecessary costs and poor
operation of the machines. Ensuring the integrity of the compressors also means ensuring the
integrity of the installation and of the people on the platform.
1.4 Objectives
This dissertation focuses on the application of the RAM (Reliability, Availability, Maintainability)
methodology to the centrifugal compressors used in an offshore Oil and Gas plant. The general
objective is to propose a methodology to identify the critical components or systems within the
compressors and then to perform a RAM analysis on those critical components. In summary, starting
from the failure history of the compressors, this work intends to:
• Identify the critical components in compressors;
• Estimate the actual Mean Time To Failure (MTTF) and the Mean Time to Repair (MTTR) of critical
components;
• Determine the reliability values;
• Estimate the availability values of gas compressors;
• Develop a Failure Modes, Effects and Criticality Analysis (FMECA) of the critical components
determined previously;
• Propose actions to improve the RAM (Reliability, Availability and Maintainability) parameters.
1.5 Thesis Outline

This work is structured in six chapters as follows:
• Chapter 1: The first and current chapter explains the importance and scope of this work, gives
a brief introduction to the Oil and Gas industry, and presents a literature review on
the application of RAM analysis and on centrifugal compressors in the Oil and Gas
industry;
6 Piper Alpha was a North Sea oil rig operated by Occidental Petroleum Ltd.; Texaco owned 22 per cent of the shares. An
explosion and the resulting fire destroyed it on July 6, 1988, killing 167 people. Only 62 crew members survived.
• Chapter 2: The second chapter presents the theory behind RAM analysis, introducing the
mathematical theory of reliability, maintainability and availability. Pareto analysis is
introduced in this chapter, together with some tools associated with RAM analysis, such as
Failure Modes, Effects and Criticality Analysis (FMECA) and Reliability Block Diagrams;
• Chapter 3: The objective of the third chapter is to give an overview of some maintenance concepts.
The concept of maintenance as seen by several authors is presented succinctly, together with
its evolution; then the most common maintenance types or philosophies are presented
and, finally, some maintenance management models;
• Chapter 4: The fourth chapter presents the case study developed in this work. The company in
which the study was carried out is presented, and the methodology used to perform the study is
also introduced;
• Chapter 5: The fifth chapter presents the results obtained in this work. The reliability and
maintainability of the critical components are compared and discussed, and all the
recommended actions are presented;
• Chapter 6: Finally, chapter six presents the main conclusions of this study, the difficulties
faced during its development and suggestions for future work.
Chapter 2
RAM Analysis theoretical background
"Reliability is still seen as a focus for just a few large pieces of
equipment. Expanding the focus of reliability efforts can, in the
simplest terms, lead to less downtime and more profit".
Steve Sonnenberg, the Chairman of Emerson Automation
Solutions, in "2015 Emerson Exchange Conference"
This chapter covers the core concepts of Reliability, Availability and Maintainability (RAM)
analysis. The theoretical and mathematical fundamentals of reliability, availability and
maintainability are presented, which will later serve as the basis for the case study. It is
important to emphasize that this work focuses only on RAM analysis, leaving out the S (Safety)
portion; nevertheless, some parts of the text refer to RAMS in order to keep the original
meaning of the cited source. Some methods from the literature for the treatment and classification
of the data needed for a RAM analysis are also introduced, namely Pareto analysis, Failure Modes,
Effects and Criticality Analysis (FMECA) and Reliability Block Diagrams (RBDs).
2.1 Introduction to RAM Analysis

The goal of RAMS is to create input data for the assessment of the suitability of a system over its
life cycle¹, that is, to provide data on failure rates of the system, possible failure modes, Mean
Time Between Failures (MTBF), Mean Down Time (MDT), maintenance operations, hazards and their
consequences, etc. The output of RAMS analysis enables life cycle specialists to calculate costs
and to perform Cost-Benefit Analysis (CBA) [29]. Before the RAMS analysis at the system level,
however, the known component failure data must be provided as input. There are thus three different
interlinked steps in the application of RAMS [29], as illustrated in figure 2.1.
1 According to Assis (2014), the life cycle of a component or system consists of several phases that begin with its conception and
evolve through design, manufacture, installation and testing, operation and maintenance, until its disposal (deactivation and dismantling)
Figure 2.1: Representation of the steps which comprehend RAMS and life cycle [29]
Each of the steps shown in figure 2.1 comprises different methods. Special focus is
given to steps 1 and 2, because these are the steps directly connected with the main
goals of this study.
[Step 1] RAMS data compilation (component level): Failure data compilation is the
foundation of any RAMS simulation or process. Whether it is obtained through testing or through
observed field operation and maintenance feedback, the study of individual component failures
provides data on failure rate and all other reliability parameters. This data is used as input for
step 2, RAMS simulation.
[Step 2] RAMS simulation (system level): In this step, the goal is to model the system in terms of
its reliability and availability. That is, starting from the failure rate, Mean Time Between Failures
(MTBF), Mean Down Time (MDT) and other values of the individual components, the goal is to
compute the same quantities for those components while they interact with each other in a system
and in a stated environment.
In the study of the steps outlined above, several analytical methodologies can be applied to achieve
the desired purposes. Reliability analysis can accordingly be divided into qualitative and
quantitative analysis. Qualitative analysis is intended to identify the various failure modes and
causes that contribute to the unreliability of a component or system, whereas quantitative analysis
uses real failure data, which can for example be obtained from operation or test programs, together
with suitable mathematical models to obtain quantitative estimates of component or system
reliability, maintainability and availability [30].
There are several qualitative and quantitative methods described in the literature, but this work
essentially covers the following:
• Qualitative Methods
– Pareto analysis to identify the most frequent causes of failure and to rank failure modes on
a cost basis;
– Failure modes and effects analysis (FMEA) is a technique of identifying failures and the
consequences of failures within systems or components. When this is done together with a
criticality analysis the combined method is then called FMECA.
• Quantitative Methods
– Reliability Block Diagram (RBD) provides a very convenient description of how various
sub-systems interact to deliver the performance of the equipment as far as reliability is
concerned. An RBD is a graphical representation of a system that describes its function
and shows the logical interconnections of the components needed to fulfill this function.
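As an illustration of the Pareto analysis listed above, the following minimal Python sketch ranks failure causes by frequency and accumulates their share of the total. The function name `pareto_ranking` and the failure records are hypothetical, chosen only for illustration; they are not data from the case study.

```python
from collections import Counter


def pareto_ranking(failure_records):
    """Rank failure causes by frequency, with cumulative percentage of all failures."""
    counts = Counter(failure_records)
    total = sum(counts.values())
    ranking = []
    cumulative = 0
    # most_common() returns the causes sorted from most to least frequent
    for cause, count in counts.most_common():
        cumulative += count
        ranking.append((cause, count, 100.0 * cumulative / total))
    return ranking


# Hypothetical failure log (one entry per recorded failure), for illustration only
records = ["seal", "bearing", "seal", "instrumentation", "seal",
           "bearing", "lube oil", "seal", "bearing", "seal"]

for cause, count, cumulative_pct in pareto_ranking(records):
    print(f"{cause:<16} {count:>2}  {cumulative_pct:5.1f} %")
```

In this made-up log the two most frequent causes account for 80 % of the failures, which is the kind of concentration a Pareto analysis is meant to reveal.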
The notion of a quantitative analysis of reliability is relatively recent, dating back to the 1940s,
when mathematical techniques, some of which were quite new, were applied to many operational and
strategic problems in World War II. Prior to this period, the concept of reliability was primarily qualitative
and subjective, based on intuitive notions. The needs of modern technology, especially the complex
systems used in the military and in space programs, led to the quantitative approach, based on
mathematical modeling and analysis [30].
Since reliability is defined in probabilistic terms, the basic quantitative models are mathematical
and based on the theory of probability, and their implementation requires the use of data. The
qualitative and quantitative methods presented here are covered more fully in later sections of this work.
2.2 RAM Analysis characterization
As Kaplan [31] states: "When the words are used sloppily, concepts become fuzzy, thinking is muddled,
communication is ambiguous, and decisions and actions are suboptimal, to say the least." So far,
the terms reliability, availability and maintainability have been used without
much detail. Therefore, the purpose of this subsection is to introduce these concepts more
comprehensively, as they form the basis for the
application of RAM analysis.
2.2.1 Reliability
2.2.1.1 Reliability concept and mathematical theory
The standard NP EN 2007 [32] defines reliability as the "Ability of an item to perform a required function
under given conditions for a given time interval". Assis [1] includes the term probability in the concept of
reliability by defining it as "the probability of an organ functioning satisfactorily (or fulfilling the required
function) for a certain time (or mission) under specified conditions". In fact, several authors, such as
Rausand & Høyland [33] and Amaral [34], also define reliability in terms of probability.
From an engineering point of view, reliability is often more accurately defined as the measure of a
product’s ability to operate without failure [1]. It is important to emphasize the close connection
between the concepts of reliability and quality. Indeed, in order for a given product to be reliable, it is
necessary that the upstream design and manufacture of that product should occur with adequate
quality levels [34]. One can look at reliability as the ability of a product to continue to present quality
over time. Reliability is therefore an extension of quality in the time domain [33].
From now on, reliability, or the probability of survival, will be denoted by R(t). To introduce the
mathematics of reliability, Assis [1] considers that a very large number N0 of identical components
are tested under the same conditions for a long period². Under these
conditions, all components are equally likely to fail. At each moment t throughout the test, each
component presents an increasing probability of failure F(t) and a decreasing probability of survival R(t).
Figure 2.2 illustrates the behavior of the functions R(t) and F(t) in the time domain.
(a) Survival probability R(t) (b) Failure probability F (t)
Figure 2.2: Survival and Failure probability [1]
At a certain moment t, if there are Ns surviving components and Nf failed components, the following
2 This way of starting the reliability study is also presented by several sources in the bibliography
expressions give an intuitive way to calculate the survival and failure probabilities, respectively:
$$R(t) = \frac{N_s(t)}{N_0} \tag{2.1}$$
$$F(t) = \frac{N_f(t)}{N_0} \tag{2.2}$$
It is important to note that, since survival and failure are two mutually exclusive events³, the two
probabilities are complementary.
As figure 2.2 shows, as t increases, R(t) decreases towards 0 and F(t) increases towards 1. Taking
into account that:
$$N_0 = N_s + N_f, \tag{2.3}$$
and combining this with equations (2.1) and (2.2), it follows that:
$$R(t) + F(t) = 1 \tag{2.4}$$
Equation (2.4) is a very important relation in the study of reliability, since it expresses the
complementarity between the probabilities of survival and failure of a component.
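The counting argument behind equations (2.1) to (2.4) can be sketched in a few lines of Python. The function name `empirical_reliability` and the test figures (population size and cumulative failure counts) are assumptions made up for illustration, not data from this work.

```python
def empirical_reliability(n0, n_failed):
    """Return (R, F) at a given moment from the initial population and failures so far."""
    if n0 <= 0 or not 0 <= n_failed <= n0:
        raise ValueError("need n0 > 0 and 0 <= n_failed <= n0")
    n_surviving = n0 - n_failed  # equation (2.3): N0 = Ns + Nf
    r = n_surviving / n0         # equation (2.1)
    f = n_failed / n0            # equation (2.2)
    return r, f


# Hypothetical test: 1000 identical components, cumulative failures at three instants
n0 = 1000
cumulative_failures = {100: 50, 500: 220, 1000: 400}  # hours -> failed components

for t, n_failed in cumulative_failures.items():
    r, f = empirical_reliability(n0, n_failed)
    # R + F stays equal to 1 at every instant, as stated by equation (2.4)
    print(f"t = {t:>4} h  R = {r:.3f}  F = {f:.3f}  R + F = {r + f:.3f}")
```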
Another important function in the study of reliability is the function f(t), an instantaneous
probability function commonly known in statistics as the probability density function.
Applied to reliability, f(t) is called the failure probability density function and expresses the
fraction of the initial population N0 that fails per unit time at time t. In other words, f(t)
indicates the likelihood of failure around any t, and it describes the shape of the failure
distribution.
Assuming that T is the time to failure (of a generic component, for example) and is a random variable,
mathematically the probability density function f (t) is defined as [33]:
$$f(t) = \frac{dF(t)}{dt} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t)}{\Delta t} \tag{2.5}$$
By the inverse reasoning (and considering the initial hypothesis of Assis [1]), if the function f (t) is
integrated between the moment the elements are placed into operation until a generic moment t, the
3 In logic and probability theory, two propositions (or events) are mutually exclusive or disjoint if they cannot both be true (occur)
result obtained is the failure cumulative distribution function F (t), which represents the probability that a
component fails before the time t, and is mathematically represented by:
$$F(t) = P(T \le t) = \int_0^t f(t)\,dt, \quad t > 0 \tag{2.6}$$
In practice, engineers are often interested in the failure rate, or conditional failure rate function,
which describes how likely a system or component is to fail once it has been in use for a given time.
It is built from the probability that an item will fail in the time interval (t, t + ∆t) given that it
is known to be functioning at time t, which is expressed mathematically by [33]:
$$P(t < T \le t + \Delta t \mid T > t) = \frac{P(t < T \le t + \Delta t)}{P(T > t)} = \frac{F(t + \Delta t) - F(t)}{R(t)} \tag{2.7}$$
By dividing this probability by the length of the time interval, ∆t, and letting ∆t −→ 0, it is possible to get
the failure rate function λ(t) of the component:
$$\lambda(t) = \lim_{\Delta t \to 0} \frac{P(t < T \le t + \Delta t \mid T > t)}{\Delta t} = \lim_{\Delta t \to 0} \frac{F(t + \Delta t) - F(t)}{\Delta t}\,\frac{1}{R(t)} = \frac{f(t)}{R(t)} \tag{2.8}$$
Since
$$f(t) = \frac{d}{dt}F(t) = \frac{d}{dt}\bigl(1 - R(t)\bigr) = -R'(t) \tag{2.9}$$
then
$$\lambda(t) = \frac{-R'(t)}{R(t)} = -\frac{d}{dt}\ln R(t) \tag{2.10}$$
Integrating expression (2.10) from 0 to t and taking into account that R(0) = 1 (from figure 2.2(a)),
then:
$$\int_0^t \lambda(t)\,dt = -\ln R(t) \tag{2.11}$$
and
$$R(t) = e^{-\int_0^t \lambda(t)\,dt} \tag{2.12}$$
Expression (2.12), which allows the reliability of a given item to be calculated from its failure
rate function, is considered to be the most general mathematical description of the reliability function.
It is applicable whatever the probability distribution of failure might be [35]. Figure 2.2(a)
illustrates the behavior of the reliability function, from which the following important conclusions can be drawn:
• R(0) = 1, meaning that the item is (or must be) assumed not to be malfunctioning at the start of
its operation (t = 0);
• R(∞) = 0, reflecting the logical assumption that no item can operate indefinitely (t = ∞) without
failing.
Between these two limits, R(t) is a continuous function in t, usually decreasing from the time of the
beginning of the operation.
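Expression (2.12) can be evaluated numerically for any failure rate function. The sketch below is one possible way to do it, using the trapezoidal rule; the function name `reliability_from_hazard`, the step count and the linearly increasing failure rate are assumptions chosen for illustration and do not come from the case study.

```python
import math


def reliability_from_hazard(hazard, t, steps=10_000):
    """Approximate R(t) = exp(-integral of hazard from 0 to t) with the trapezoidal rule."""
    if t < 0:
        raise ValueError("t must be non-negative")
    if t == 0:
        return 1.0  # R(0) = 1: the item is assumed to work at the start
    h = t / steps
    area = 0.5 * (hazard(0.0) + hazard(t))
    for i in range(1, steps):
        area += hazard(i * h)
    return math.exp(-area * h)


# Hypothetical failure rate increasing linearly with time: lambda(t) = a + b * t
a, b = 1.0e-4, 2.0e-8  # assumed values, per hour and per hour squared


def linear_hazard(u):
    return a + b * u


t = 5000.0
r_numeric = reliability_from_hazard(linear_hazard, t)
# For this particular hazard the integral has a closed form, used here as a cross-check
r_closed_form = math.exp(-(a * t + 0.5 * b * t ** 2))
print(f"numeric R = {r_numeric:.6f}, closed form R = {r_closed_form:.6f}")
```

The closed-form comparison is only available because the assumed failure rate is simple; for a failure rate estimated from field data, the numerical route is the one that would normally be used.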
The most usual measure of reliability is the Mean Time To Failure (MTTF). MTTF describes the
expected time to failure of a component. This indicator is quite popular because it gives a simple idea
of the reliability of any component, and mathematically is expressed by [1]:
$$MTTF = \int_0^\infty t\,f(t)\,dt = \int_0^\infty R(t)\,dt \tag{2.13}$$
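The second equality in (2.13) is stated without derivation. One standard way to obtain it, sketched here under the assumption that t R(t) tends to 0 as t tends to infinity (an assumption not stated in the text, although it holds for the exponential and Weibull distributions used in this work), is integration by parts with f(t) = −R'(t) from (2.9):

```latex
\int_0^\infty t\,f(t)\,dt
  = -\int_0^\infty t\,R'(t)\,dt
  = \bigl[-t\,R(t)\bigr]_0^\infty + \int_0^\infty R(t)\,dt
  = \int_0^\infty R(t)\,dt
```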
Another measure in reliability analysis is the Mean Time Between Failures (MTBF), which represents
the mean time between two consecutive failures. MTBF and MTTF are two distinct terms, which in practice
are often used interchangeably.
Rausand & Høyland [33] state that when the time required to repair or replace a failed item is very short
compared to the MTTF, the MTTF also represents the mean time between failures (MTBF). If the repair time
cannot be neglected, the MTBF also includes the Mean Time To Repair (MTTR)⁴. Building on this point of
view, Assis [1] adds that the MTTF can be used as a measure of the actual operating time between two
consecutive failures (it does not include the recovery time that follows), which
in this context takes the meaning of Mean Up-Time (MUT)⁵.
Thus, the MTTF is an indicator of reliability, while the MTBF is an indicator that combines the actual
operating time, MTTF (related to reliability), and the time stopped for repair, MTTR (related to
maintainability).
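The relation between the three indicators can be illustrated with a short Python sketch. It assumes that MTTF and MTTR are estimated as simple sample means of complete (uncensored) records, which is a simplification; the function name `mean_time_indicators` and the figures are hypothetical.

```python
def mean_time_indicators(up_times_h, repair_times_h):
    """Estimate MTTF, MTTR and MTBF (in hours) from recorded up-times and repair times."""
    if not up_times_h or not repair_times_h:
        raise ValueError("both records must contain at least one value")
    mttf = sum(up_times_h) / len(up_times_h)          # mean operating time between failures
    mttr = sum(repair_times_h) / len(repair_times_h)  # mean time stopped for repair
    # When repair time is not negligible, MTBF combines both contributions
    return {"MTTF": mttf, "MTTR": mttr, "MTBF": mttf + mttr}


# Hypothetical maintenance history of one component (hours), for illustration only
up_times = [1200, 950, 1430, 1020]
repair_times = [24, 36, 12, 48]

print(mean_time_indicators(up_times, repair_times))
```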
2.2.1.2 The Mortality Curve
The functions described above (f(t), F(t), R(t) and λ(t)) can be used as representations of the life of a
component; in fact, they are laws of life, since they can be generalized to all living beings [1]. The function
λ(t), as illustrated in figure 2.3, represents the mortality (or survival) curve and is also commonly
known as the bathtub curve. This curve has three characteristic periods, namely:
• Infant Mortality Period: This period refers to the component when it is new, when it tends to
show a high failure rate that nevertheless decreases more or less quickly.
This is due to design defects, manufacturing defects, improper installation, etc. The
4 This concept is introduced in more detail in subsection 2.2.2
5 The differences between the various times used in reliability theory are covered in more detail when the concept of availability is
introduced in subsection 2.2.3
Figure 2.3: Mortality curve [36]
way to minimize this problem is to establish rigorous policies at the design and manufacturing level,
complemented by tests prior to effective entry into service;
• Useful Life Period: In this period of maturity, failures are of extrinsic origin, often caused by
operating demands higher than those designed for; they occur at random and may lead to serious
accidents. During the useful life of a component, the instantaneous failure rate λ(t) is
approximately constant, i.e. independent of time, and is simply referred to as the failure rate;
• Wear out Period: This period is characterized by an increasing failure rate. The main interest
in analyzing this period lies in estimating its probable beginning, so that entry into the
wear-out regime can be avoided. The onset of the wear-out period leads to
a progressive increase in the instantaneous failure rate resulting from
degradation, with the undesirable economic and safety consequences that may arise. For this
reason, it is justified to implement a preventive maintenance policy, which carries out a repair
or a replacement of the component with the aim of extending its nominal life.
2.2.1.3 Life Distributions (Reliability Models)
Statistical distributions are used to study each phase of the mortality curve. In reliability engineering,
the statistical distributions most commonly used to perform a reliability analysis are the Exponential
distribution, the Normal distribution and the Weibull distribution. This work focuses mainly on
the exponential and Weibull distributions.
Exponential distribution
The exponential distribution is one of the most commonly used distributions in reliability analysis due to
its simplicity [33]. The exponential distribution is used to model the behavior of components that have a
constant failure rate (or components that do not degrade with time or wear out). The probability density
function for the exponential distribution is defined as:
$$f(t) = \lambda e^{-\lambda t}, \quad t \ge 0 \tag{2.14}$$
One property of this distribution is that the failure rate function is independent of time, i.e. it is
a constant. To illustrate this, consider the following:
$$F(t) = \int_0^t f(t)\,dt = \int_0^t \lambda e^{-\lambda t}\,dt = 1 - e^{-\lambda t}, \tag{2.15}$$
and by applying equation (2.4), then:
$$R(t) = e^{-\lambda t} \tag{2.16}$$
Once R(t) is known, the failure rate function λ(t) is given by:
$$\lambda(t) = \frac{f(t)}{R(t)} = \frac{\lambda e^{-\lambda t}}{e^{-\lambda t}} = \lambda, \tag{2.17}$$
which confirms that it is a time-independent constant, represented by the parameter λ. It
is worth noting that in the modeling of many complex systems it is assumed that only random
component failures are important. This justifies the use of the exponential distribution, since a
’running-in’ process removes initial failures⁶ and the time to ultimate failure is usually long.
Figure 2.4 presents the probability density function f(t) (figure 2.4(a)), the failure rate function
(figure 2.4(b)) and the reliability function R(t) (figure 2.4(c)) of the exponential distribution.
(a) Probability density function f (t) (b) Constant failure rate (c) Reliability function R(t)
Figure 2.4: Exponential distribution [37]
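Equations (2.14), (2.16) and (2.17) translate directly into code. The following Python sketch is illustrative only: the function names and the value of the failure rate are assumptions, not values from the case study.

```python
import math


def exp_pdf(t, lam):
    """Failure probability density f(t), equation (2.14)."""
    return lam * math.exp(-lam * t)


def exp_reliability(t, lam):
    """Reliability R(t), equation (2.16)."""
    return math.exp(-lam * t)


def exp_failure_rate(t, lam):
    """Failure rate f(t) / R(t), equation (2.17); it equals lam for every t."""
    return exp_pdf(t, lam) / exp_reliability(t, lam)


lam = 2.0e-4  # hypothetical failure rate, in failures per hour

for t in (0.0, 1000.0, 5000.0):
    print(f"t = {t:6.0f} h  R = {exp_reliability(t, lam):.4f}  "
          f"failure rate = {exp_failure_rate(t, lam):.2e} per hour")
```

Whatever the operating time, the computed failure rate stays at the assumed value of λ, while the reliability decays exponentially.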
Weibull Distribution
The Weibull distribution was proposed in 1937 by the Swede Waloddi Weibull as a basis for his
research on material fatigue, and was presented scientifically in the United States in 1951. Since then
hundreds of scientific articles have been published on the Weibull distribution and its
numerous applications. It is a distribution that allows the analysis of reliability in the three phases of the
6 Running-in is the procedure of conditioning a new piece of equipment by giving it an initial period of running, usually under light
load, but sometimes under heavy load or normal load
bathtub curve, namely the infant mortality, useful life and wear-out phases.
Assuming that no failures occur, or more realistically that no failures are observed in a component,
before a given instant γ, the most general form of the Weibull distribution is the three-parameter
one, which takes the following form when adapted to reliability analysis notation [35]:
$$f(t) = \frac{\beta}{\eta}\left(\frac{t - \gamma}{\eta}\right)^{\beta - 1} e^{-\left(\frac{t - \gamma}{\eta}\right)^{\beta}}, \quad t \ge \gamma \ge 0,\ \beta > 0,\ \eta > 0 \tag{2.18}$$
where the parameters have the following meanings:
• γ: Location parameter, or life without malfunctions. It is the lower limit of the domain of t considered;
• η: Scale parameter, or characteristic life. It is a measure of the central tendency of the distribution;
• β: Shape parameter, indicates the failure rate characteristic:
– β < 1 : Infant mortality;
– β = 1: Useful life (random failures);
– β > 1 : Wear out failures.
From figure 2.5 it can be seen that the Weibull distribution can be used along the entire length of the
bathtub curve, depending only on the value of the parameter β. For example, for β = 1 it reduces to
the exponential distribution, which applies when the failure rate is
constant.
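By applying equations (2.4), (2.6) and (2.10), the expressions for reliability and failure rate are,
respectively:
$$R(t) = e^{-\left(\frac{t - \gamma}{\eta}\right)^{\beta}} \tag{2.19}$$
$$\lambda(t) = \frac{\beta}{\eta}\left(\frac{t - \gamma}{\eta}\right)^{\beta - 1} \tag{2.20}$$
The Mean Time to Failure (MTTF) can be calculated by:
$$MTTF = \gamma + \eta\,\Gamma\!\left(1 + \frac{1}{\beta}\right) \tag{2.21}$$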
The effect of the parameter β on f(t), λ(t) and R(t) is shown in figures 2.5(a), 2.5(b) and 2.5(c),
respectively.
(a) Effect of β in f (t) (b) Effect of β in λ(t) (c) Effect of β in R(t)
Figure 2.5: Effect of the parameter β [37]
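A minimal Python sketch of equations (2.19) to (2.21) is given below, using the gamma function from the standard library. The function names and the parameter values (η = 1000 h, γ = 0, three values of β) are assumptions for illustration; they are not the parameters estimated in the case study.

```python
import math


def weibull_reliability(t, beta, eta, gamma=0.0):
    """Reliability R(t), equation (2.19); no failures are assumed before gamma."""
    if t <= gamma:
        return 1.0
    return math.exp(-(((t - gamma) / eta) ** beta))


def weibull_failure_rate(t, beta, eta, gamma=0.0):
    """Failure rate lambda(t), equation (2.20), evaluated for t > gamma."""
    if t <= gamma:
        raise ValueError("this sketch evaluates the failure rate only for t > gamma")
    return (beta / eta) * ((t - gamma) / eta) ** (beta - 1)


def weibull_mttf(beta, eta, gamma=0.0):
    """Mean Time To Failure, equation (2.21)."""
    return gamma + eta * math.gamma(1.0 + 1.0 / beta)


eta = 1000.0  # hypothetical characteristic life, in hours

for beta in (0.5, 1.0, 3.0):
    early = weibull_failure_rate(100.0, beta, eta)
    late = weibull_failure_rate(2000.0, beta, eta)
    print(f"beta = {beta}: rate at 100 h = {early:.2e}, rate at 2000 h = {late:.2e}, "
          f"MTTF = {weibull_mttf(beta, eta):.1f} h")
```

With β < 1 the computed failure rate falls with time, with β = 1 it stays at 1/η (and the MTTF equals γ + η, as in the exponential case), and with β > 1 it rises, reproducing the three regions of the bathtub curve.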
2.2.2 Maintainability
When a system fails to perform satisfactorily, a repair is normally carried out to locate and correct the
failure. The system is restored to operational effectiveness by making an adjustment or by replacing a
component [38].
Maintainability is a characteristic of an item, expressed by the probability that a preventive maintenance
action or a repair of the item will be performed within a stated time interval for given procedures and resources
(skill level of personnel, spare parts, test facilities, etc.) [39]. In other words, maintainability is the
probability of isolating and repairing a failure in a system within a given time. Maintainability engineers
must work with system designers to ensure that the product can be maintained by the customer
efficiently and cost-effectively. This requires the analysis of part removal, replacement,
tear-down and build-up of the product in order to determine the time required to carry out the
operation, the necessary skill, the type of support equipment and the documentation [38].
An important measure often used in maintenance studies is the Mean Time To Repair (MTTR). MTTR
is sometimes called the Mean Down Time (MDT) of the component/equipment [33]. The differences
between them is that the total Mean Downtime, MDT, or mean forced outage time, is the mean time the
item is in a nonfunctioning state. The MDT is usually significantly longer than the MTTR, and normally
will include time to detect and diagnose the failure, logistic time, and time to test and startup of the item,
on the other hand, the MTTR is sometimes used only to denote the mean active repair time.
To express mathematically the MTTR, let D denote the down time (or repair time) after a failure of an
item. Let fD (d) denote the probability density of D, and let FD (d) denote the distribution function of D.
The mean time to repair is the mean (expected) value of D which is given by [33]:
$$ MTTR = \int_{0}^{\infty} t \, f_D(t) \, dt = \int_{0}^{\infty} \left(1 - F_D(t)\right) dt \tag{2.22} $$
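The equality of the two integrals in equation (2.22) can be checked numerically. The sketch below assumes, purely for illustration, an exponentially distributed down time with a hypothetical repair rate of 0.25 per hour, for which the exact MTTR is 4 hours; the infinite upper limit is replaced by a large truncation point.

```python
import math

def trapezoid(g, upper, steps=20_000):
    """Trapezoidal estimate of the integral of g over [0, upper]."""
    h = upper / steps
    total = 0.5 * (g(0.0) + g(upper))
    for i in range(1, steps):
        total += g(i * h)
    return total * h

mu = 0.25      # hypothetical constant repair rate [1/h]
upper = 200.0  # truncation point standing in for infinity

def f_D(t):
    """Probability density of the down time D (exponential assumption)."""
    return mu * math.exp(-mu * t)

def F_D(t):
    """Distribution function of the down time D (exponential assumption)."""
    return 1.0 - math.exp(-mu * t)

mttr_density = trapezoid(lambda t: t * f_D(t), upper)      # first integral
mttr_survival = trapezoid(lambda t: 1.0 - F_D(t), upper)   # second integral
print(mttr_density, mttr_survival, 1.0 / mu)
```

Both estimates should agree with 1/µ up to the numerical error of the integration rule.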
2.2.3 Availability
Availability is commonly used as the term for the combination of reliability and maintainability, and it is
defined as the ability of an item (under combined aspects of its reliability, maintainability and
maintenance support) to perform its required function at a stated instant of time or over a stated period
of time [33]. Increasing the availability of equipment is one of the main objectives of any maintenance
model or strategy [34].
It is important to distinguish between the availability A(t) at time t and the average
availability Aav. The availability at time t is defined as the probability that an item is functioning⁷ at
time t (A(t) = Pr(item is functioning at time t)). The average availability Aav denotes the mean proportion
of time the item is functioning when the item is repaired after a failure.
Considering that µ is the (constant) repair rate, so that the Mean Time to Repair can be calculated as
MTTR = 1/µ, and that λ is the (constant) failure rate, so that MTTF = 1/λ, according to Rausand &
Høyland [33] the availability at a certain time t can be expressed by:
$$ A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} \, e^{-(\mu + \lambda) t} \tag{2.23} $$
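Equation (2.23) can be evaluated directly. The following sketch uses hypothetical rates (one failure per 1000 hours and one repair per 10 hours) to show that the availability starts at one and decays towards the steady-state value µ/(λ + µ).

```python
import math

def availability(t: float, failure_rate: float, repair_rate: float) -> float:
    """Point availability A(t) of equation (2.23) for constant rates."""
    lam, mu = failure_rate, repair_rate
    return mu / (lam + mu) + lam / (lam + mu) * math.exp(-(lam + mu) * t)

lam = 0.001  # hypothetical failure rate [1/h], i.e. MTTF = 1000 h
mu = 0.1     # hypothetical repair rate [1/h], i.e. MTTR = 10 h

for t in (0.0, 10.0, 100.0, 1000.0):
    print(t, availability(t, lam, mu))

# Steady-state value approached for large t
print(mu / (lam + mu))
```

With these assumed values the transient term has practically vanished after a few hundred hours, since it decays at the rate λ + µ.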
Now consider a repairable component that is placed into operation and is functioning at time t = 0.
Whenever the component fails, it is replaced by a new component of the same type or repaired to an “as
good as new” condition. The sequence of lifetimes or up-times of the component can be written as T1,
T2, ..., Tn. It is assumed that T1, T2, ..., Tn are independent and identically distributed, with distribution
function F (t) = Pr(Ti ≤ t), for i = 1, 2, ..., n, and mean time to failure MTTF. The corresponding down
times D1, D2, ..., Dn have mean down time MDT, as illustrated in figure 2.6.
Figure 2.6: States of a repairable component [33]
Suppose that the component is observed until repair n is completed, as shown in figure 2.6. Then, for
large n, it can be written:
$$ \frac{1}{n} \sum_{i=1}^{n} T_i \rightarrow E(T) = MTTF \tag{2.24} $$
7 In this context, the term “functioning” means that the component is either in active operation or that it is able to operate if required.
$$ \frac{1}{n} \sum_{i=1}^{n} D_i \rightarrow E(D) = MDT \tag{2.25} $$
Since the average availability has been defined as the proportion of time in which the item is
functioning, it can be written as:
$$ A_{av} = \frac{\sum_{i=1}^{n} T_i}{\sum_{i=1}^{n} T_i + \sum_{i=1}^{n} D_i} = \frac{\frac{1}{n}\sum_{i=1}^{n} T_i}{\frac{1}{n}\sum_{i=1}^{n} T_i + \frac{1}{n}\sum_{i=1}^{n} D_i} \tag{2.26} $$
Then, the average availability can be written as follows:
$$ A_{av} = \frac{E(T)}{E(T) + E(D)} = \frac{MTTF}{MTTF + MDT} \tag{2.27} $$
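In practice, equations (2.24) to (2.27) are applied to a finite record of up times and down times. The sketch below estimates MTTF, MDT and the average availability from such a record; the three observed cycles are hypothetical and serve only to illustrate the calculation.

```python
def average_availability(up_times, down_times):
    """Estimate MTTF, MDT and A_av from n observed up/down cycles."""
    if not up_times or len(up_times) != len(down_times):
        raise ValueError("need the same non-zero number of up and down times")
    n = len(up_times)
    mttf = sum(up_times) / n    # sample version of equation (2.24)
    mdt = sum(down_times) / n   # sample version of equation (2.25)
    a_av = mttf / (mttf + mdt)  # equation (2.27)
    return mttf, mdt, a_av

# Hypothetical record of three cycles, in hours
up = [900.0, 1100.0, 1000.0]
down = [20.0, 30.0, 10.0]

mttf, mdt, a_av = average_availability(up, down)
print(mttf, mdt, round(a_av, 4))
```

Because the factor 1/n cancels in equation (2.26), the same result is obtained by dividing the total up time by the total observation time.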
From equation (2.27) it can be concluded that an increase in reliability (MTTF) leads to an
increase in availability, since the relative influence of the down time decreases. The larger the time to
failure in comparison with the down time, the closer the availability gets to one, which would mean that
the plant is always available. This is of course desirable, but not realistic. The challenge is to assign the
maintenance resources in the plant in such a way that reliability is maximized and availability is as
close to one as possible at an optimized cost.
In equation (2.27) MDT (mean down time) is used instead of MTTR to show that it is the total mean
down time that should be used and not only the mean active repair time. MTTR can also be used to
express the availability, taking into account only the time spent in active repair. This availability
excludes preventive maintenance downtime, logistic delays, supply delays and administrative delays,
considers only the corrective downtime (the active repair time), and is called
inherent or intrinsic availability [40]:
$$ A_{intrinsic} = \frac{MTTF}{MTTF + MTTR} \tag{2.28} $$
In addition, it is important to clarify the difference between the measures of time referenced
throughout the text. Authors in the literature interpret some of the time measures used in reliability
theory differently; to remain consistent with the preceding text, this work uses the convention
illustrated in figure 2.7.
Figure 2.7: Difference between the time measures
2.3 Pareto Analysis
It is common in a RAM analysis to begin by defining which failures or components deserve the
most attention. For components already in operation, their failure history makes it possible to
prioritize the most critical components, as well as the most prevalent failures. Pareto analysis is one
method for reaching such conclusions.
In the nineteenth century, the Italian economist Vilfredo Pareto observed that about 80% of the
country’s wealth was controlled by about 20% of the population. This observation led to what is now
known as the Pareto principle, also known as the ”80-20” rule [41].
Pareto analysis is a statistical technique used in decision making to select the limited
number of tasks that produce the most significant overall effect. It is based on identifying the top
20% of causes that need to be addressed in order to resolve 80% of the problems.
Pareto analysis has many applications, in areas such as manufacturing, quality, supply chain and
reliability. In reliability, it is normally applied during a failure analysis as a criterion to look for the
potential events with the greatest implications for maintenance activities [42].
A Pareto histogram is a graphical representation of the results of a Pareto analysis. It lists data in
descending order of value, and displays a cumulative percentage curve through the right side of the
first bar. The Pareto histogram chart can be used successfully for identification of the events with major
downtime or maintenance costs; however, the deficiencies of this method are [43]:
• The Pareto histogram analysis is based on downtime (cost, or failure frequency) alone and cannot
identify which factors are dominant;
• Frequently occurring failures impact productivity, and are key reliability improvement tasks;
however, a Pareto histogram may miss identifying the events with low downtime or maintenance
cost and high failure frequency.
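The ranking behind a Pareto histogram can be sketched in a few lines. The example below sorts failure causes by downtime, accumulates their share of the total and returns the causes needed to reach a chosen threshold (80% by default). The failure causes and downtime figures are hypothetical, and the ranking uses downtime alone, so it carries the limitations listed above.

```python
def pareto_table(downtime_by_cause):
    """Return (cause, downtime, cumulative share) rows in descending order."""
    total = sum(downtime_by_cause.values())
    ranked = sorted(downtime_by_cause.items(), key=lambda kv: kv[1], reverse=True)
    rows, cumulative = [], 0
    for cause, downtime in ranked:
        cumulative += downtime
        rows.append((cause, downtime, cumulative / total))
    return rows

def vital_few(rows, threshold=0.80):
    """Smallest leading group of causes whose cumulative share reaches the threshold."""
    selected = []
    for cause, _, share in rows:
        selected.append(cause)
        if share >= threshold:
            break
    return selected

# Hypothetical downtime per failure cause, in hours
downtime = {
    "seal leakage": 120,
    "bearing failure": 60,
    "lube oil system": 15,
    "instrumentation": 5,
}

rows = pareto_table(downtime)
for cause, hours, share in rows:
    print(f"{cause:18s} {hours:5d} h  {share:6.1%}")
print(vital_few(rows))
```

Repeating the same ranking with failure frequency or maintenance cost as the input is one simple way to see whether the three criteria point to the same components.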
Pareto analysis has been widely used to solve practical problems in many branches of industry.
For example, Hossen et al. [44] applied Pareto analysis to examine the stoppage losses in a textile
industry and concluded that idling and minor stoppage losses and breakdown losses are responsible
for 89.3% of total stoppage losses. Hanif & Agha [45] performed a failure analysis of a centrifugal
pump; by applying Pareto analysis, they ascertained that 80% of the
failures were due to mechanical failures, seal leakages and material problems.
2.4 FMECA Analysis
Once the critical components and their most frequent failures are identified, there is a need to study
these failures in greater depth, in such a way that actions can be taken in order to mitigate their risks
and their consequences. One of the tools widely used for this purpose is the Failure Modes Effects and
Criticality Analysis (FMECA).
Failure Modes, Effects and Criticality Analysis (FMECA) is an extension of the Failure Modes and Effects
Analysis (FMEA), which was one of the first systematic techniques for failure analysis. It was developed
by reliability engineers in the 1950s to study problems that might arise from malfunctions of military
systems [33]. An FMEA becomes an FMECA if criticalities or priorities are assigned to the failure mode
effects [33]. The FMECA result reveals the failure modes
with the greatest probability of occurrence and the greatest severity, allowing the implementation of
countermeasures that eliminate or attenuate these effects.
In the FMECA (in the operation phase of a component, equipment or system) all the failure
modes are identified and evaluated according to their criticality. This criticality is then translated into
risk and, if the risk is not acceptable, corrective actions must be taken to mitigate the failure modes
and thus reduce the risk.
Regarding the FMECA implementation methodology, the literature shows different approaches; that
is, the steps for applying the methodology vary from author to author. There are several
guidelines, standards and procedures for the application of the FMECA methodology. The methodology
applied in this work is developed in chapter 4.
2.5 Reliability of systems: The RBDs
From the point of view of reliability engineering, there is sometimes more interest in the reliability and
availability of a system as a whole than in those of its individual components. A technical system will
normally comprise a number of subsystems and components that are interconnected in such a way that the
system is able to perform a set of required functions [33].
There are several quantitative methods to evaluate the reliability of a system from its constituent
elements, and the block diagram is one of them. Block diagrams are widely
used in engineering and science and exist in many different forms. They can also be used to describe
the interrelation between the components and to define the system. When used in reliability studies,
the block diagram is referred to as a reliability block diagram (RBD). A reliability block diagram is a
graphical representation of the components of the system and how they are reliability-wise related
(connected)⁸. Figure 2.8 illustrates an example of an RBD consisting of series and parallel elements
(mixed).
Figure 2.8: Example of an RBD
There are different reliability-wise configurations possible and the RBD may be constructed from one
configuration or a combination, depending on the system. Here the series and parallel configurations
will be analyzed. Another interesting configuration is the M out of N configuration which will be also
highlighted.
2.5.1 Series System
In a series system, the components are associated as schematically illustrated in the figure 2.9. Briefly,
in a serial system, all components must work well for the system to work; it is enough for one
component to fail so that the system also fails.
8 It should be noted that this may differ from how the components are physically connected
(a) Logical diagram of blocks (b) Logic gate diagram (”and” gate)
Figure 2.9: Series systems
To derive the expression for the reliability of n components connected in series, let R1 (t), R2 (t), ...,
Rn (t) be the reliabilities of the respective components, and let E1 , E2 , ..., En represent the events that
components 1, 2, ..., n do not fail, respectively. Since the probability that a component will operate
(without failing) for a period of time t is its own reliability, then:
$$ P(E_i) = R_i(t), \quad i = 1, 2, \dots, n \tag{2.29} $$
In a series system, the success of the system requires the success of all the
components, so:
$$ R_s(t) = P(E_1 \cap E_2 \cap \dots \cap E_n), \tag{2.30} $$
and assuming that the components are independent⁹, the reliability of the system is simply the
product of the individual probabilities of accomplishing the mission:
$$ R_s(t) = P(E_1) \times P(E_2) \times \dots \times P(E_n), \tag{2.31} $$
the general form of the reliability of a series system is then:
$$ R_s(t) = \prod_{i=1}^{n} R_i(t) = R_1(t) \times R_2(t) \times \dots \times R_n(t) \tag{2.32} $$
From equation (2.32) it is important to note that the reliability of a series system is never greater than
the lowest reliability among its constituent components, thus:
$$ R_s(t) \le \min[R_1(t), R_2(t), \dots, R_n(t)] \tag{2.33} $$
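Equations (2.32) and (2.33) can be illustrated with a short sketch. The component reliabilities below are hypothetical values at some fixed mission time, and the components are assumed independent as in the derivation.

```python
import math

def series_reliability(reliabilities):
    """Reliability of independent components in series, equation (2.32)."""
    values = list(reliabilities)
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must lie in [0, 1]")
    return math.prod(values)

# Hypothetical component reliabilities at a fixed mission time t
components = [0.99, 0.95, 0.90]
r_series = series_reliability(components)
print(r_series)
print(r_series <= min(components))  # bound of equation (2.33)
```

Even with individually reliable components, the product falls below the weakest element, which is why long series chains are sensitive to every component.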
9 The failure of one of them does not change the reliability of the other
2.5.2 Parallel or redundancy system
Two or more components are in parallel, or redundant (in case they are equal), when all components
must fail for the system to fail; that is, if at least one of the components works, the system continues
to run (does not fail). Parallel systems are normally divided into active parallel and passive parallel (or
standby) systems.
2.5.2.1 Active parallel system
Active means that all components are operating during the system’s mission period. Figure 2.10
illustrates this type of system. Following the same procedure used for components in series, and
according to the definition of a parallel system¹⁰, it can be
mathematically written as:
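$$ R_s(t) = P(E_1 \cup E_2 \cup \dots \cup E_n) = 1 - P(\overline{E_1 \cup E_2 \cup \dots \cup E_n}) = 1 - P(\overline{E}_1 \cap \overline{E}_2 \cap \dots \cap \overline{E}_n), \tag{2.34} $$
which, for independent components, results in:
$$ R_s(t) = 1 - P(\overline{E}_1) \times P(\overline{E}_2) \times \dots \times P(\overline{E}_n) \tag{2.35} $$
Thus, the reliability of the system will be:
$$ R_s(t) = 1 - \prod_{i=1}^{n} [1 - R_i(t)] = 1 - [1 - R_1(t)][1 - R_2(t)] \dots [1 - R_n(t)] \tag{2.36} $$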
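Equation (2.36) can be sketched in the same way as the series case. The reliabilities below are hypothetical, and the components are again assumed independent.

```python
import math

def parallel_reliability(reliabilities):
    """Reliability of independent components in active parallel, equation (2.36)."""
    values = list(reliabilities)
    if any(not 0.0 <= r <= 1.0 for r in values):
        raise ValueError("each reliability must lie in [0, 1]")
    # The system fails only if every component fails.
    return 1.0 - math.prod(1.0 - r for r in values)

# Two hypothetical redundant components with reliability 0.90 each
r_parallel = parallel_reliability([0.90, 0.90])
print(round(r_parallel, 6))
```

In contrast with the series case, the result is never lower than the reliability of the best component, which is the motivation for redundancy.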
(a) Logical diagram of blocks (b) Logic gate diagram (”or” gate)
Figure 2.10: Active parallel systems
10 If just one component functions, the system will function
2.5.2.2 Passive parallel system or standby
So far only active redundancy has been considered, i.e. situations where all components remain
energized during the duty cycle. In practice, however, it is often preferable to use passive
redundancy, also referred to as standby.
Passive redundancy consists of the application of one or more alternative components that go into
operation in case the main component (or components) fails, avoiding the consequences of failure.
Figure 2.11(a) illustrates this type of system with a generic number n of components, and figure
2.11(b) illustrates the case of 3 components represented with logic gates.
(a) Block diagram (b) Logic gates (”or exclusive” and ”and” gates)
Figure 2.11: Standby system
To describe the reliability of this type of system with standby components, the Poisson discrete function
is adequate, from which it is possible to determine the probability of a given number of failures k in a
certain time interval t and the probability of the number of failures in that interval to be lower or equal to
a certain limiting number K. For a system of K redundant components (or tolerated faults), the
reliability of this standby system is given by [1]:
$$ R_K = \sum_{k=0}^{K} P(k) = e^{-\lambda t} \sum_{k=0}^{K} \frac{(\lambda t)^k}{k!} \tag{2.37} $$
Equation (2.37) can be understood as the probability that no failure occurs during the period t, P (0),
plus the probability that one failure occurs, P (1), two failures, P (2), and so on up to K failures
(K being the number of redundant components). It should also be noted that this expression is valid
only under the following conditions [1]:
• The Switcher component SC always works (reliability close to 1);
• The various components (main and standby) are identical;
• The failure rate functions are constant;
• The failures of each component are statistically independent;
• The components are independent, the failure of one, does not cause harm to the other.
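Under the conditions listed above, equation (2.37) can be evaluated as a cumulative Poisson sum. The sketch below uses a hypothetical failure rate and mission time; the function and parameter names are illustrative only.

```python
import math

def standby_reliability(failure_rate: float, t: float, spares: int) -> float:
    """Reliability of a standby system tolerating up to `spares` failures, equation (2.37)."""
    if failure_rate < 0 or t < 0 or spares < 0:
        raise ValueError("failure_rate, t and spares must be non-negative")
    x = failure_rate * t
    return math.exp(-x) * sum(x**k / math.factorial(k) for k in range(spares + 1))

lam = 1.0e-3  # hypothetical constant failure rate [1/h]
t = 1000.0    # hypothetical mission time [h]

for K in (0, 1, 2):
    print(K, round(standby_reliability(lam, t, K), 4))
```

With K = 0 the expression reduces to the reliability of a single component with constant failure rate, exp(−λt), and each additional standby unit adds one more Poisson term.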
2.5.3 M-out-of-N System
When a system is in an M -out-of-N configuration, at least M out of the total N components must be
operational for the system to be operational. A series configuration is therefore an N -out-of-N
configuration and a parallel configuration is a 1-out-of-N configuration [33]. According to the existing
literature, this system is also known as a partially redundant system [1][34]. An example of a system in
a 2-out-of-3 configuration can be seen in figure 2.12, where at least two components must function for
the system to function.
Figure 2.12: RBD in M -out-of-N configuration for 2-out-of-3
The reliability function Rs for a system with N equal components (with reliability R(t)) in M -out-of-N
configuration is:
$$ R_s(t) = \sum_{n=M}^{N} \frac{N!}{n! \, (N - n)!} \, [R(t)]^{n} \, [1 - R(t)]^{N - n} \tag{2.38} $$
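Equation (2.38) is a binomial sum and can be sketched with the standard library. The component reliability below is hypothetical, and all N components are assumed identical and independent.

```python
import math

def m_out_of_n_reliability(m: int, n: int, r: float) -> float:
    """Reliability of an M-out-of-N system of identical components, equation (2.38)."""
    if not 1 <= m <= n:
        raise ValueError("require 1 <= m <= n")
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    return sum(math.comb(n, k) * r**k * (1.0 - r) ** (n - k) for k in range(m, n + 1))

r = 0.90  # hypothetical component reliability at a fixed mission time
print(round(m_out_of_n_reliability(2, 3, r), 6))  # 2-out-of-3
print(round(m_out_of_n_reliability(3, 3, r), 6))  # series limit, r**3
print(round(m_out_of_n_reliability(1, 3, r), 6))  # parallel limit, 1 - (1 - r)**3
```

The two limiting cases reproduce the series and active parallel results, consistent with the statement that these are the N-out-of-N and 1-out-of-N configurations.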
In addition to the configurations seen above, there are other configurations, more or less complex,
that typically consist of joining two or more of the configurations previously
presented. For a better understanding of such configurations, [1] and [34] are recommended for
further information.
Chapter 3
Maintenance
Maintenance strategies are a must in Oil and Gas and to ensure
that the business model remains sustainable for the long haul, it’s
important to follow up and check on the integrity of the assets.
Smartware Group, 2016.
3.1 Introduction to maintenance

The performance of a product or system depends not only on its design and operation, but also on the
servicing and maintenance of the item during its operational lifetime: servicing on a regular basis (e.g.,
changing the oil in a piece of equipment), adequate repair or replacement of failed parts or components,
proper storage when not in service, and so forth. Moreover, the fact that equipment, machinery and other
industrial systems are subject to failures for various reasons requires a number of actions to be taken in
order to avoid or minimize the occurrence and consequences of failure. These actions are part of
maintenance activities [30].
According to the European standard EN 13306 [46], ”maintenance is the combination of all the
technical, administrative and management actions applied during the life cycle of a good, intended to
maintain or restore it to a state in which it can perform the required function ”.
The concept of maintenance has evolved over time. The literature presents several points of view
about the maintenance concept; however, it is possible to identify common elements (keywords) that
elucidate the concept and function of maintenance:
• Ferreira [47] defines maintenance as: ”The act or effect of maintaining the necessary measures
for the conservation or permanence of something or situation” ;
• Dhillon [4] introduces in the concept of maintenance the satisfactory condition factor of
equipment, defining maintenance as ”all the actions necessary to maintain an asset or restore it
to a satisfactory condition”. The main objective is to quickly restore the equipment to its
operational readiness state using available resources;
• Kardec and Nasfic [48] affirm that, in addition to performing its function, maintenance must
guarantee the reliability and availability of the physical item or installation, serving the
process safely, preserving the environment and keeping costs adequate; this is the mission of
maintenance.
3.2 An overview of maintenance history

The literature shows that the history of maintenance is described differently by different authors.
However, as with the concept of maintenance, there are key points common to all of them, and most
divide the history of maintenance into three periods. A summary of the
history of maintenance with reference to these three periods is presented below [49]:
• First period: Prior to World War II, called first-generation maintenance, in which the
availability of equipment and the prevention of failures were not a priority. The
equipment was oversized, the designs were simple and therefore more reliable, and repairs were
easy to execute. Cleaning and lubrication were sufficient and there was no need to carry them out
systematically;
• Second period: Called second-generation maintenance, it began in the 1950s, when post-war
conditions generated an increasing demand for products, boosting the mechanization of industries
with numerous and complex machines. Preventive maintenance plans were developed and there was
a concern with the downtime of production equipment. The concept of preventive maintenance
then arose, along with the view that equipment failures could and should be foreseen.
Maintenance costs increased and more control was required;
• Third period: Started in the mid-1970s, was called maintenance of the 3rd generation. During
this period, new ways of maximizing the useful life of the production equipment were considered,
with a high availability and reliability, without any damage to the environment, greater safety, better
quality of the product and costs under control.
Maintenance has undergone great changes throughout its evolution, seeking to make processes
increasingly efficient, in a conscious and sustainable way. This explains its concern with cost,
availability, reliability, safety and the environment. For the same reason, some authors suggest
that there is nowadays probably another maintenance generation, the fourth generation, whose
focus is to maximize the effectiveness of an asset, minimize failures, reduce losses and maximize
gains. For this, new challenges should be part of the daily routine: risk management, human reliability,
and accuracy in measurement and profitability analysis.
3.3 Types of maintenance
In this work, the classification of the types of maintenance follows the classification applied
by Total E&P Angola¹. Generally, maintenance is divided into two main groups: corrective
maintenance and preventive maintenance.
3.3.1 Corrective maintenance
Corrective maintenance is divided into two types of maintenance: Unplanned corrective maintenance
and planned corrective maintenance.
3.3.1.1 Unplanned Corrective Maintenance
Unplanned corrective maintenance is the maintenance carried out after failure recognition and intended
to put an item into a state in which it can perform a required function. This type of corrective
maintenance normally involves:
• Diagnosis of the failure (detection, location, analysis);
• Immediate corrective or palliative action (restoring to perfect or to only partial working order);
• Deferred corrective action, with or without modifications;
• A functional test.
3.3.1.2 Planned Corrective Maintenance
Planned corrective maintenance consists of ensuring the availability of spare parts, tooling and any
required resources (including competent personnel) to restore as early as possible the availability of
’Important’ equipment that failed as a result of a run-to-failure maintenance plan, hence minimizing
the consequence of the failure. It implies that the failure is anticipated and prepared for, which in
turn implies that:
• The failure modes and causes have been identified, i.e. a Failure Modes and Effects Analysis (FMEA)
has been conducted;
• The corrective actions to restore the equipment to an operational condition can be performed by
in-house maintenance team;
• All required spare parts and tools are readily available.

1 It should be noted that the types of maintenance may vary according to the various sources in the literature.
3.3.2 Preventive maintenance
Preventive maintenance is the maintenance carried out at predetermined intervals or according to
prescribed criteria and intended to reduce the probability of failure or the degradation of the functioning
of an item. The preventive element relies upon the execution of maintenance tasks at periodic intervals,
based on either calendar time or running hours. Preventive maintenance may require the machine to be
shut down or taken out of service. In the Oil and Gas field, preventive maintenance aims at the following:
• To maintain the required fitness for purpose of production installations over the field expected life;
• To contribute to the prevention of incidents which may impact the safety of personnel or equipment,
or which may have serious environmental consequences;
• To optimize the maintenance regimes so that operational costs are minimized without
compromising the level of safety and integrity.
The following activities are given as examples of preventive maintenance in the Oil and Gas field:
• Measurement/checking, cleaning and greasing;
• Mechanical tests;
• Systematic replacement of parts;
• Functional tests (Emergency Safety Device (ESD) system, fire and gas, etc.).
Preventive maintenance is normally divided into two groups: Systematic preventive maintenance and
condition-based maintenance or predictive maintenance.
3.3.2.1 Systematic Preventive Maintenance
Systematic Preventive maintenance is carried out based on a predetermined interval of time, number of
operations, running hours, etc.
3.3.2.2 Condition-Based or Predictive Maintenance
The deterioration of material and equipment depends on real operating conditions and cannot be
determined in advance. Periodic or continuous measurements of observable and significant
parameters permit a better appreciation of the real state of deterioration of a piece of equipment as well
as the spacing or elimination of repetitive, costly and sometimes unjustified tasks.
Condition-based or predictive maintenance is based on the idea of not carrying out any maintenance
on a piece of equipment until it has reached the plant defined equipment operating thresholds i.e. close
to the point at which it will no longer be able to perform its required function. This type of maintenance
is a way of optimizing the preventive maintenance.
The concept of condition-based maintenance assumes that the three-stage procedure described below
is followed thoroughly and systematically:
• Measurements and observations: These are collected over a period of time, either periodically or
continuously;
• Processing of the measurements: These are reviewed, validated and represented in a form ready
for analysis;
• Analysis: This involves the close examination of the data.
Condition-based preventive maintenance in the Oil and Gas industry mainly involves:
• On-site sampling and oil analyses (analysis in the on-site laboratory or external);
• Vibration monitoring and analysis;
• Thermography;
• Electrical measurement;
• Equipment performance parameters monitoring.
3.4 Maintenance Management Models
The maintenance practice obviously requires time and money to be effective. However, since these
resources are always scarce, it is necessary to analyze which equipment justifies this expenditure. At
present, and as a result of various technological innovations over the last years, there are new Industrial
Maintenance Management Models, which are used to define the maintenance philosophy, such as:
• RCM – Reliability Centered Maintenance;
• TPM – Total Productive Maintenance;
• RBM – Risk Based Maintenance.
In this work the philosophy used for the case study is Risk Based Maintenance, which is
approached taking into account the objectives of Total E&P Angola, since it is the recommended
maintenance strategy within the firm.
Some pillars of the theory of TPM are also presented succinctly, since it is a method of
great relevance that is used in several organizations.
3.4.1 Risk Based Maintenance
Risk Based Maintenance (RBM) is the type of philosophy that prioritizes maintenance resources toward
assets that carry the most risk if they were to fail. It is a methodology for determining the most economical
use of maintenance resources. This is done so that the maintenance effort across a facility/equipment
is optimized to minimize any risk of a failure. A risk-based maintenance strategy is based on two main
phases [50]:
• Risk assessment;
• Maintenance planning based on the risk.
The maintenance type and frequency are prioritized based on the risk of failure.
Components/equipment that have a greater risk and consequence of failure are maintained and
monitored more frequently. Components/equipment that carry a lower risk are subjected to less
stringent maintenance programs. Implementing a Risk Based Maintenance process means that the
total risk of failure is minimized across the facility in the most economical way. The monitoring and
maintenance programs for high-risk components/equipment are typically condition-based maintenance
programs.
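The prioritization described above can be sketched as a simple ranking. The sketch assumes that risk is scored as the probability of failure multiplied by the consequence of failure, a common convention that is not stated explicitly in this section; the equipment names, probabilities and consequence values are hypothetical.

```python
def rank_by_risk(items):
    """Rank (name, probability_of_failure, consequence) tuples by descending risk.

    Assumption: risk = probability_of_failure * consequence.
    """
    scored = [(name, probability * consequence) for name, probability, consequence in items]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical equipment list: (name, yearly failure probability, consequence in cost units)
equipment = [
    ("dry gas seal", 0.10, 500.0),
    ("lube oil pump", 0.30, 50.0),
    ("anti-surge valve", 0.05, 800.0),
]

ranking = rank_by_risk(equipment)
for name, risk in ranking:
    print(f"{name:18s} risk = {risk:6.1f}")
```

In this hypothetical list, the item that fails most often is not the one with the highest risk, which is the situation a risk-based strategy is meant to reveal.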
RBM has also been defined as a specific maintenance program developed using FMEA/FMECA or
RCM. Since the FMEA/FMECA technique has already been introduced in the
previous chapter, RCM is presented next as an input to risk analysis.
3.4.1.1 Reliability Centered Maintenance
The idea behind Reliability Centered Maintenance (RCM) is to establish a logical process to design
appropriate maintenance activities to support complex systems, with optimal frequency, reduced
maintenance shutdowns and consequently decrease costs.
RCM identifies maintenance activities and their frequencies based on functional analysis of an
operational context. According to Smith & Hinchcliffe [51], the four unique features of the RCM
methodology are:
• Preserve functions;
• Identify failure modes that can defeat the functions;
• Prioritize function need;
• Select only applicable and effective preventive maintenance tasks.
These four features can be presented also as the answer to seven questions:
• Feature one:
– (1) What are the functions and associated performance standards of the asset in its present
operating context?
– (2) In what way does it fail to fulfill its functions?
• Feature two:
– (3) What causes each functional failure?
– (4) What happens when each failure occurs?
• Feature three:
– (5) In what way does each failure matter?
• Feature four:
– (6) What can be done to predict or prevent each failure?
– (7) What should be done if a suitable proactive task cannot be found?
In RCM, a failure mode effects and criticality analysis (FMECA) is normally used to answer questions 1
to 5.
3.4.2 Total Productive Maintenance
Total Productive Maintenance (TPM) is a system of maintaining and improving the integrity of production
and quality systems through the machines, equipment, processes, and employees that add business
value to an organization. TPM is an approach to maintenance management that was developed in
Japan in 1988 to support the implementation of just-in-time manufacturing² and associated efforts to
improve product quality. TPM activities focus on eliminating the six major losses [33]:
• Availability losses
– 1. Equipment failure (breakdown) losses. Associated costs include down-time, labor, and
spare part cost;
– 2. Setup and adjustment losses that occur during product changeovers, shift change, or other
changes in operating conditions.
• Performance (speed) losses
– 3. Idling and minor stoppages that typically last up to 10 minutes. These include machine
jams and other brief stoppages that are difficult to record and consequently usually are hidden
from efficiency reports. When combined, they can represent substantial equipment downtime;
2 Just-in-time is a production management system that dictates that everything must be produced, transported or purchased at the
right time. It can be applied in any organization, to reduce inventories and costs.
– 4. Reduced speed losses that occur when equipment must be slowed down to prevent quality
defects or minor stoppages. In most cases, this loss is not recorded because the equipment
continues to operate, albeit at a lower speed. Speed losses obviously have a negative effect
on productivity and asset utilization.
• Quality losses
– 5. Defects in process and reworking losses that are caused by manufacture of defective or
substandard products that must be reworked or scrapped. These losses include the labor
and material costs (if scrapped) associated with off-specification production;
– 6. Yield losses reflect the wasted raw materials associated with the quantity of rejects and
scrap that result from startups, changeovers, equipment limitations, poor product design, and
so on. It excludes the category 5 defect losses that result during normal production.
In this way, TPM can be described as a new attitude on the part of managers and operators that aims to
maximize overall efficiency by eliminating all losses, which means zero machine failures and zero
product defects. In general, this requires the participation of many sectors, namely Maintenance,
Operations, Facilities, Design Engineering, Project Engineering, Construction Engineering, Inventory
and Stores, Purchasing, Accounting and Finance, Plant and Site Management, etc. [34][52].
A question raised by many companies is whether a specific methodology exists for approaching TPM.
Wireman [52] answers that the steps to implement TPM must be determined for each company
individually. These steps must be adjusted to fit individual requirements because the type of
industry/service/facility, production methods, service activities, equipment condition, special
needs, problems, techniques, and level of maintenance sophistication vary dramatically from
organization to organization.
Chapter 4
Case Study
4.1 Introduction into the Case Study
The case study focuses on the centrifugal gas compressors of Block 17 of Total E & P Angola.
Centrifugal compressors are the most common machines installed on the Block 17 FPSOs to compress
the gas coming from the wells. Gas compression devices are always part of a more or less complex
system, in which a failure can lead to severe consequences, as already seen in the introduction of
this work.
The objective is therefore to study the reliability, maintainability and availability of the
centrifugal gas compressors in Block 17 of Total E & P Angola by applying the methods studied in the
previous chapters, so that the analysis can support new decisions aimed at improving the current
state of operation and continuously ensuring the integrity of the machinery.
This work was developed in two parts: the first, on which the case study focuses, was carried out
through an internship program at Total E & P Angola, and the second was carried out at Instituto
Superior Técnico. The data used in the study were therefore collected during the internship period
in the company.
This chapter first gives a brief introduction to Total E & P Angola and then introduces the
centrifugal gas compressors most used by the company, as well as the various systems of which they
are composed. Finally, the methodology used to obtain the study results is presented.
4.2 Total E & P Angola Company
Total E & P Angola is one of the main branches of the French company TOTAL S.A., one of the biggest
energy producers in the world, present on five continents, with operations in more than 130
countries and around 100,000 workers. In Angola, Total started its activities in 1952-1953, when it
was granted its first concession, covering onshore and offshore areas in the Kwanza Basin and the
Lower Congo Basin. Today, Total E & P Angola is active in the most productive areas of the country,
with more than 1,700 workers¹, and its main activities are related to the Upstream industry.
Total E & P Angola (TEPA) operates Blocks 17, 17/06, 32 and 33, and in 2011 signed with Sonangol the
contract to operate Blocks 25 and 40 in the pre-salt of the Kwanza Basin. TEPA is also a partner in
Blocks 0, 14 and 39 and holds about 13.6% of the Angola Liquefied Natural Gas (LNG) Project.
Block 17 is the most productive block and is for this reason known as the Golden Block.
4.2.1 Total’s Block 17: “The Golden Block”
Block 17 is a world-class oil field block that covers nearly 4,000 square kilometers (km²), located
from 150 to 270 kilometers off the coast of Angola. This acreage has become the stage for a unique
industrial adventure, with 15 discoveries, developments that have set global benchmarks for the
industry, and a spectacular production outlook. By May 25, 2010, less than a decade after first oil
from the block, its cumulative production had reached 1 billion barrels. The map of Block 17 is
shown in figure 4.1(a) and the shareholding of the block in figure 4.1(b).
(a) Block 17 map (b) Block 17 shares
Figure 4.1: The Block 17
Block 17 comprises four FPSOs (Girassol, Dalia, Pazflor and CLOV), which were gradually brought on
stream between 2001 and 2014. The FPSOs of Block 17 are illustrated in figure 4.2.
FPSOs may differ in one aspect or another, but modern FPSOs generally include the units indicated
in figure 4.3, which presents the particular case of FPSO Pazflor. An important aspect to note is
that safety precautions increase as one moves from left to right, and that the gas compression area
is one of the most dangerous areas.
1 Data as of December 2017
(a) FPSO Girassol (b) FPSO Dalia (c) FPSO Pazflor
(d) FPSO CLOV
Figure 4.2: Block 17 FPSOs
Figure 4.3: Topsides organization top view
4.2.2 The Maintenance Department
No company can survive in today’s competitive world without an ongoing effort to create new products,
develop new processes and meet new expectations, and this is only possible if the company is well
organized so that its goals can be achieved. This section therefore presents the Maintenance
Department of Block 17, which is in charge of ensuring the full functioning of the machines
assigned to it, among them the gas compressors. Figure 4.4 presents the organization of the
Maintenance Department at Total E & P Angola.
Figure 4.4: Total E & P Angola Maintenance Department Organization
An important aspect to note in the organization of the Maintenance Department is that Turbines and
Compressors receive dedicated attention within the Mechanical service because of their importance
in the production process.
4.3 Centrifugal Gas Compressors
4.3.1 Centrifugal compressors overview
A compressor is a device used to increase the pressure of a compressible fluid. The fluid can be
either a gas or a vapor and can have a wide molecular weight range. Knowledge of the basic
properties of the gas or gases being compressed is therefore fundamental in the selection and design
of any compressor². In a centrifugal compressor, energy is transferred from a set of rotating
impeller blades to the gas. The designation “centrifugal” implies that the gas flow is radial, and
the energy transfer is caused by a change in the centrifugal forces acting on the gas.
Most compressors at Total E & P Angola are supplied by the vendor as a package, which in almost all
cases includes a driver (an electric motor or a gas turbine), a power transmission system, the
compressor unit proper, a lubrication system, a control and monitoring system, etc. Figure 4.5
illustrates the package of an FPSO Pazflor compressor driven by a gas turbine, and figure 4.6 a
package driven by an electric motor.
In most cases the compressor, gearbox and driver are assembled on a common single-lift baseplate.
The baseplate is fabricated from structural steel and contains mounting pedestals for each piece of
equipment. In some cases, all of the auxiliary equipment needed to support the compressor and its
driver (such as a lubricating oil system, a dry gas seal system, instrumentation, and a local
control panel) is also mounted on or within the baseplate.
2 This influence of the gas properties on the machine function will be seen later, when the failures
in compressors are analyzed
Figure 4.5: LP-MP Compressor Package FPSO Pazflor
Figure 4.5 is the top view of a centrifugal gas compressor baseplate on which the gas turbine, the
gearbox and the compressor are installed. The compressor side is divided into two parts: Drive End
(DE) and Non Drive End (NDE). Figure 4.6 illustrates a simplified Piping and Instrument Diagram
(PID) of a centrifugal gas compressor in which the driver is an electric motor.
The compressors, as already mentioned, are among the main players in the gas compression process.
In general terms, the compression processes on the FPSOs of Block 17 serve the following purposes:
• Provide gas injection to the reservoir;
• Supply gas to the LNG (Liquefied Natural Gas) plant;
• Supply gas lift for artificial lifting of the risers;
• Supply fuel gas to power turbines;
• Supply blanketing gas for storage tanks.
Figure 4.6: Gas Compression Package with Electric Motor Driver
4.3.2 Compressor package main systems
As seen before, the term compressor here refers to the whole package and not simply to the
compressor itself. It is therefore important to define the systems that constitute the compressor
package, so that a boundary can be drawn with respect to the other systems involved in the
compression process. The boundary followed here is based on the model defined by OREDA, as shown in
figure 4.7, where the main systems of a centrifugal compressor are indicated. An important note on
this OREDA division is that the driver (turbine or motor) is not included in the compressor package:
it is treated separately because of its complexity and its numerous subsystems. For this reason,
the driver is not included within the boundary of the compressor package in this study.
From figure 4.7 it can be seen that the main systems in the package of a centrifugal compressor are
the following:
• Power Transmission System (Gearbox);
• Compressor Unit System;
• Recycle Valve (Antisurge valve);
• Lubricating System;
• Shaft Seal System;
• Control and Monitoring System.
Figure 4.7: Compressor main system components [53]
The following subsections detail some aspects of the systems listed above. These systems are in
turn divided into subsystems or components, which are presented in Appendix B.
4.3.2.1 Power Transmission System (Gearbox and couplings shafts)
The gearbox is installed on the common baseplate, between the driver (an electric motor or a gas
turbine) and the compressor, to increase the driver output speed to match the required compressor
speed. In almost all cases the gearbox design consists of the following main components: main input
shaft, torque converter, fixed planetary gear, and revolving planetary gear with output shaft³.
The gearbox is driven by the driver through a low speed coupling and drives the compressor through
a high speed coupling.
3 All above mentioned components are installed in the housing
4.3.2.2 Compressor Unit
The “Compressor unit” normally consists of a case (or casing), an internal bundle assembly and a
rotor.
The “Compressor Case” is a one-piece cast or forged steel barrel which contains the bundle and rotor
assembly. The case also contains nozzles with inlet and discharge flange connections to introduce
flow into and extract flow from the compressor. The compressor rotor is fundamentally an assembly of
impellers mounted on a steel shaft. Additional rotor components include miscellaneous hardware, such
as a thrust balance drum (balance piston), impeller spacers, a thrust disc, one or two couplings,
etc. Figure 4.8 illustrates a compressor unit: figure 4.8(a) shows a horizontally split casing and
figure 4.8(b) a bundle assembly.
(a) A Compressor casing (b) Bundle Assembly
Figure 4.8: Compressor Unit
4.3.2.3 Recycle Valve (Antisurge valve)
The compressor needs minimum gas conditions (flow and pressure) at suction for correct and stable
operation, avoiding catastrophic situations such as surge. The anti-surge system protects the
compressor from surge by continuously calculating the distance between the compressor’s operating
point and its surge limit line. Normally, a controller modulates a recycle (anti-surge) valve to
prevent the compressor’s operating point from reaching the surge limit while maintaining other
process variables within safe or acceptable limits. Figure 4.9 illustrates a typical anti-surge
control system.
Figure 4.9: Anti Surge Control System
4.3.2.4 Lubrication System or Lube Oil System
The Lube Oil System is intended to supply oil to the compressor and driver bearings and to the gears
and couplings. As can be observed in the figure 4.6, the Lube Oil is drawn from the reservoir by the
pumps and is fed under pressure through coolers and filters to the bearings. Upon leaving the bearings,
the oil drains back to the reservoir.
4.3.2.5 Shaft Seal System
The function of the sealing system is to prevent gas leakage. Leakage is normally divided into
external leakage, in which process gas escapes to the atmosphere, and internal leakage, in which
process gas leaks between the compression stages. In most centrifugal compressors, this system uses
a dynamic seal called a Dry Gas Seal, mounted on the compressor shaft.
4.3.2.6 Control and Monitoring System
The compressor is equipped with a set of instruments for reading, monitoring and controlling several
parameters inherent in its operation. The monitoring system analyzes the compressor’s operation
24 hours a day, 7 days a week, and is connected to other systems that display all the information
collected by the instrumentation. These indications make it possible to identify symptoms of
failure, such as a problem in a certain component, and thus to check whether something in the
compressor is not operating as it should.
4.3.3 Actual Centrifugal Compressors Maintenance Plan
The maintenance of compressors is fundamental to guaranteeing the performance of the machine. The
company therefore applies a philosophy of systematic preventive maintenance and condition monitoring
to the compressors, so that the availability and reliability objectives can be fulfilled.
After start-up, compressors generally follow this systematic preventive maintenance philosophy⁴:
• Preventive Maintenance after running 2,000 hours;
• Minor Overhaul after running 30,000 hours;
• Major Overhaul after running 50,000 hours.
Note that these running hours refer to the compressor and not to a particular system or component.
Appendix C lists the activities performed in the Major Overhaul, which is the main maintenance
activity performed on the compressor.
Regarding condition-based maintenance, vibration monitoring is applied to the driver (in the case of
gas turbines, only to the axial compressor) and to the bearings of the compressor itself.
4.4 RAM Analysis
4.4.1 Methodology to classify the components
The purpose of this subsection is to describe the methodology used to classify the components by
level of importance or criticality and, depending on the level of each component, to propose the
actions that must be taken.
According to figure 4.10, which summarizes the methodology followed, the components are assessed and
then classified into 3 classes, namely:
• Vital or critical components;
• Important components;
• Secondary components.
Vital or critical components: components (classified with the letter V, as will be seen later) whose
Consequence versus Likelihood classification is Medium-High (MH), High-High (HH) or High-Medium
(HM), as shown in the criticality matrix (figure 4.11);
Important components: components (classified with the letter I) whose Consequence versus Likelihood
classification is Medium-Medium (MM), Medium-Low (ML) or High-Low (HL), as shown in the criticality
matrix (figure 4.11);
Secondary components: components (classified with the letter S) whose Consequence versus Likelihood
classification is Low-High (LH), Low-Medium (LM) or Low-Low (LL), as shown in the criticality
matrix (figure 4.11).
4 It is important to highlight that daily routine maintenance, such as cleaning some parts of the
machine, checking the oil level, etc., is also performed
This methodology comprises 5 phases, namely:
1. Failure Data Collection (integrated with Pareto Analysis);
2. Consequence Assessment;
3. Likelihood Assessment;
4. Criticality Matrix;
5. Actions.
4.4.1.1 Compressor Failure Data Collection
To carry out the case study, data must be available for analysis. The data used in this case study
were obtained from several sources at Total E&P Angola, such as TOTAL’s operation and maintenance
registration system, the Daily Hours files and some maintenance reports available in the company.
The events considered as failures are those resulting in the unavailability of the machine.
The period analyzed is from 01/01/2010 to 30/09/2017. To limit the study, only failures whose
details were properly identified were taken into account, namely those for which it was possible to
know exactly the time to failure and the repair time or downtime of the machine caused by the
failure.
4.4.1.2 Consequence Assessment
The Consequence Assessment classifies the analyzed component by assessing the consequences of its
failure against three criteria: Health, Safety and Environment (HSE), Production effect and
Maintenance cost. Each criterion has three levels, High (H), Medium (M) and Low (L), assigned the
values 100, 10 and 1, respectively.
Health, Safety and Environment (HSE) criterion
The aim of this criterion is to take into account the impact of the loss of operational or process
function on Health, Safety and Environment. The considerations about this criterion can be found in
table 4.1.
Figure 4.10: Methodology to perform the case study analysis (flowchart). First phase: collect failure
data. Second phase: Consequence Assessment, asking whether the failure of the component has a direct
impact rated High (H) or Medium (M) on HSE, Production or MTN Costs; the Yes branches lead to
Pre-Critical Components. Third phase: Likelihood Assessment. Fourth phase: Criticality Matrix, with
the classes Vital/Critical, Important and Secondary. Fifth phase: Actions (Preventive MTN, RAM
Analysis, Planned Corrective MTN, Corrective MTN).
Table 4.1: Consequence level (A): Effect on Health Safety and Environment
Consequence | High | Medium | Low
Effect on HSE | Failure which immediately decreases the level of safety or affects the environment | Failure which might affect the environment or is necessary for the health and general welfare of the personnel | Failure which does not affect Safety and/or Environment
Consequence Level (A) | 100 | 10 | 1
Production Criteria
The purpose of this criterion is to consider the impact loss of operational or process function could have
on production. The considerations about this criterion can be found in the table 4.2.
Table 4.2: Consequence level (B): Effect on Production
Consequence | High | Medium | Low
Effect on Production | Failure which immediately causes a loss of production | Failure which increases the risk of production loss | Failure which does not affect the production
Consequence Level (B) | 100 | 10 | 1
Maintenance Cost Criteria

Regarding the maintenance cost, in this case it is defined taking into account the corrective
maintenance cost of the component due to the failure considered. The maintenance costs include
manpower (maintenance hours), spare parts, logistics and charges. The considerations about this
criterion can be found in the table 4.3.
Table 4.3: Consequence level (C): Effect on Maintenance cost
Consequence | High | Medium | Low
Maintenance cost/year | Cost > 50 k€ | 10 k€ < cost < 50 k€ | Cost < 10 k€
Consequence Level (C) | 100 | 10 | 1
4.4.1.3 Likelihood Assessment
The Likelihood Assessment classifies the analyzed component based on an assessment of its
redundancy, failure rate and component technology. As in the consequence analysis, each criterion
has three levels, High (H), Medium (M) and Low (L), assigned the values 100, 10 and 1, respectively.
Redundancy Criteria
The redundancy rate⁵ relates the number of components (of the same capacity) installed in the system
to the number necessary to ensure the associated function. For example, 2×100% denotes two installed
components, each able to deliver the full duty. The three likelihood levels for this criterion are
described in Table 4.4.
Table 4.4: Likelihood level (D): Redundancy
Likelihood High Medium Low
1×100% 2×100% 4×50%

Redundancy
2×50% 3×50% 6×25%
3×33% 4×33% 6×33%
4×25% 8×25%
Likelihood Level 100 10 1

(D)
Technology of Component Criteria

Another aspect to be considered is the technology of the component. The more complex a piece of
equipment is, the more attention its reliability requires. The three levels for the classification
of the different types of equipment are described in Table 4.5.
Failure Rate Criteria

The failure rate is the number of equipment failures (loss of the ability to perform the required
function) during a fixed period. The considerations about this criterion can be found in
table 4.6.
5 Normally the redundancy rate is indicated in different process documents such as: process/operating description, PIDs (Piping
and Instrument Diagrams), PFDs (Process Flow Diagrams), etc.
Table 4.5: Likelihood level (E): Equipment/Component technology
Likelihood | High | Medium | Low
Equipment technology | High speed rotating equipment (rpm > 3000); All compressors; Multistage pumps; Single stage pumps with P > 500 kW; Vertical submerged pumps; Gas turbines | Low speed rotating equipment (rpm < 3000) with classical technology; Vertical pumps P < 500 kW; Manual and electrical handling devices; HVAC classical equipment; Control valves | Check valves; Static equipment without instrumentation (e.g. vessels, pipes, tanks, hydrocyclones, etc.); Manual valves
Likelihood Level (E) | 100 | 10 | 1
Table 4.6: Likelihood level (F): Failure rate
Likelihood | High | Medium | Low
Failure rate (failures/year) | λ ≥ 6 | 0.5 < λ < 6 | λ ≤ 0.5
Likelihood Level (F) | 100 | 10 | 1
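The failure rate criterion can be illustrated with a minimal sketch. It assumes that the rate is simply the number of recorded failures divided by the length of the observation window in years; the function names are hypothetical, and the count of 12 failures is only an illustrative input. The window is the period analyzed in this study (01/01/2010 to 30/09/2017).

```python
from datetime import date


def failure_rate_per_year(n_failures, start, end):
    """Average number of failures per year over the observation window.

    Assumption: a simple count divided by elapsed calendar years.
    """
    years = (end - start).days / 365.25
    return n_failures / years


def failure_rate_level(rate):
    """Map a failure rate (failures/year) to the level and value of Table 4.6."""
    if rate >= 6:
        return "H", 100
    if rate > 0.5:
        return "M", 10
    return "L", 1


# Illustrative input: 12 failures over the analyzed period.
rate = failure_rate_per_year(12, date(2010, 1, 1), date(2017, 9, 30))
level, value = failure_rate_level(rate)
print(f"rate = {rate:.2f} failures/year -> level {level} (value {value})")
```

The boundary values follow Table 4.6 as written: a rate of exactly 6 is High and a rate of exactly 0.5 is Low.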
4.4.1.4 Criticality Matrix
In order to evaluate the final criticality, it will be necessary first to define a variable that integrates the
effect of all parameters analyzed on the consequence and likelihood. The variable representing the
consequence will be named as Consequence Index (H) and the variable representing Likelihood will be
named as Likelihood Index (K).
Consequence Index evaluation: Index (H)
Each of the three consequence criteria has been assigned a numerical coefficient (consequence
level 100, 10 or 1). The Consequence Index is calculated as follows:
• The Consequence Rating is obtained by multiplying the three numerical coefficients A, B and C:
$G = A \times B \times C$. The Consequence Index is then the base-10 logarithm of the rating,
$H = \log_{10}(G)$.
Consequence classification based on the 0 to 6 range: see Table 4.7.
Table 4.7: Consequence Index Table
Consequence | High | Medium | Low
Rating (G) | $10^4$ to $10^6$ | $10^2$ to $10^3$ | $1$ to $10^1$
Index (H) = log(G) | 4 to 6 | 2 to 3 | 0 to 1
Likelihood Index evaluation: Index (K)
Each of the three likelihood criteria has been assigned a numerical coefficient (likelihood level
100, 10 or 1). The Likelihood Index is calculated as follows: the Likelihood Rating is obtained by
multiplying the three numerical coefficients D, E and F, $J = D \times E \times F$. The equipment
or component Likelihood Index is then the base-10 logarithm of the rating, $K = \log_{10}(J)$.
Likelihood classification based on the 0 to 6 range: see Table 4.8.
Table 4.8: Likelihood Index evaluation
Likelihood | High | Medium | Low
Rating (J) | $10^4$ to $10^6$ | $10^2$ to $10^3$ | $1$ to $10^1$
Index (K) = log(J) | 4 to 6 | 2 to 3 | 0 to 1
Final classification
After determining the probability of failure of the component and the consequence of its failure,
the results can be presented in a matrix of Probability versus Consequence (figure 4.11(a)). In the
criticality matrix, the probability and consequence categories are arranged so that the lower
criticality components are in the lower left corner and the higher criticality components are in the
upper right corner, as shown in figure 4.11(b).
The Criticality Matrix in figure 4.11 shows the final classification of the equipment into Vital (V),
Important (I) and Secondary (S) based on the results of the two assessments above.
According to the company requirements, whatever the final classification resulting from the
criticality study, equipment whose HSE consequence is assessed as “High” shall be classified as
“Vital”. Looking at figure 4.11(b), all components in zone 1, i.e. classified as vital/critical,
deserve special attention, and measures must be developed that allow (at least) a zone 1 (vital)
component to be moved to zone 2 (important), or even to secondary. According to the methodology
proposed here, it is imperative for these components to perform a RAM analysis and reevaluate the
component maintenance plan.
(a) Criticality Matrix (b) Criticality Matrix Components
Figure 4.11: Criticality matrix Components
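The criticality classification described in this section can be sketched in a few lines. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the logarithm is taken in base 10 (which matches the 0 to 6 index range), and each two-letter pair is read as consequence class first and likelihood class second, following the "Consequence Versus Likelihood" wording used for the Vital, Important and Secondary classes.

```python
import math

# Numerical value of each level, as used for all six criteria.
LEVEL_VALUE = {"H": 100, "M": 10, "L": 1}

# Consequence-versus-likelihood pairs and their class (V, I, S).
# Assumption: first letter = consequence class, second = likelihood class.
MATRIX = {
    "MH": "V", "HH": "V", "HM": "V",
    "MM": "I", "ML": "I", "HL": "I",
    "LH": "S", "LM": "S", "LL": "S",
}


def index(levels):
    """Index = log10 of the product of the three level values (0 to 6)."""
    rating = math.prod(LEVEL_VALUE[level] for level in levels)
    return round(math.log10(rating))


def index_class(idx):
    """Class of an index: 4-6 High, 2-3 Medium, 0-1 Low."""
    if idx >= 4:
        return "H"
    if idx >= 2:
        return "M"
    return "L"


def classify(hse, production, cost, redundancy, technology, failure_rate):
    """Return 'V', 'I' or 'S' for one component."""
    if hse == "H":
        # Company requirement: a High HSE consequence is always Vital.
        return "V"
    h = index((hse, production, cost))                   # Consequence Index (H)
    k = index((redundancy, technology, failure_rate))    # Likelihood Index (K)
    return MATRIX[index_class(h) + index_class(k)]


# Hypothetical component: G = 10 * 100 * 10 = 10^4, J = 100 * 100 * 10 = 10^5.
print(classify("M", "H", "M", "H", "H", "M"))
```

Because every level value is a power of ten, the indices are always integers, so the three index bands cover all possible results without gaps.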
4.4.1.5 Failure Modes Effects and Criticality Analysis
Once the critical components are identified, an FMEA/FMECA has to be conducted. As already said in
the background of this work, this technique helps to identify potential failure modes, their causes,
failure indicators, failure criticalities, failure probabilities and their effects. In addition,
mitigation tasks are considered so that the occurrence and severity levels of all possible failures
can be reduced.
In this work, the methodology followed to perform this analysis is divided into four phases as described
below:
1. Identification of Equipment Functions & Functional Failures
• Equipment Function: Identification of the function(s) of the components being studied;
• Potential Failure Mode(s): All the failure modes already observed and all the ways in which a
possible failure can occur;
• Potential Effect(s) of Failure: Description of what will happen if the failure mode occurs;
• Potential Cause(s) of failure: All identified and potential causes of the failure mode
described.
2. Initial Risk Assessment (see the table A.1 in Appendix A)
• Initial Severity assessment: For each possible failure identified, its impact on the
functionality of the equipment/component is assessed and a score between 1 and 10 is assigned,
with 1 meaning “no impact” and 10 meaning “extreme impact”;
• Initial Likelihood assessment: This assessment answers the question of how likely it is that the
failure mode will occur. A score between 1 and 10 is assigned, with 1 meaning “very unlikely to
occur” and 10 meaning “very likely to occur”;
• Initial Risk Rating: Severity × Likelihood;
• Detection Mode: How can the cause(s) of failure be detected? (At the start of its Potential
Failure);
• Detectability: If this failure mode occurs, how likely is it that the failure will be detected? For
this reason, a score between 1 and 10 will be assigned, with 1 meaning “very likely to be
detected” and 10 meaning “very unlikely to be detected”;
• Initial Risk Priority Number - RPN: Severity × Likelihood × Detectability. The letters and
numbers inside the figure 4.12 indicate whether a corrective action is required for each case:
’N’ = No corrective action needed, ’C’ = Corrective action needed and ’Number’ = Corrective
action needed if the Detection rating is equal to or greater than the given number. For
example, according to the risk ranking table in Figure 4.12, if Severity = 6 and Occurrence =
5, then corrective action is required if Detection = 4 or higher. If Severity = 9 or 10, then
corrective action is always required. If Occurrence = 1 and Severity = 8 or lower, then
corrective action is never required, and so on.
Figure 4.12: Final Matrix of Occurrence Versus Severity
3. Mitigation Tasks identification & Frequencies
• Recommended Maintenance Actions: Ways of reducing the severity or the likelihood of occurrence
of the failure, or of improving its detectability;
• Recommended Frequency: Define the frequency for each of the recommended maintenance
actions.
4. Final Risk Assessment
• In this phase the objective is to recalculate the RPN according to the new severity, likelihood
and detectability values suggested in phase 3.
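The risk scoring used in the four phases above can be sketched as follows. This is a minimal illustration, not the tool used in the study: the function names and the example scores are hypothetical, and only the two limiting rules quoted in the text for figure 4.12 are encoded, since the remaining thresholds have to be read from the figure itself.

```python
def check_score(name, score):
    """Scores are integers on the 1 to 10 scale described in phase 2."""
    if not 1 <= score <= 10:
        raise ValueError(f"{name} must be between 1 and 10, got {score}")


def risk_rating(severity, likelihood):
    """Initial Risk Rating = Severity x Likelihood."""
    check_score("severity", severity)
    check_score("likelihood", likelihood)
    return severity * likelihood


def rpn(severity, likelihood, detectability):
    """Risk Priority Number = Severity x Likelihood x Detectability."""
    check_score("detectability", detectability)
    return risk_rating(severity, likelihood) * detectability


def action_required(severity, occurrence):
    """Limiting rules quoted in the text for figure 4.12.

    Returns True (always required), False (never required), or None when
    the answer depends on the detection threshold given in figure 4.12.
    """
    if severity >= 9:
        return True
    if occurrence == 1 and severity <= 8:
        return False
    return None


# Hypothetical failure mode before and after the mitigation tasks of phase 3.
initial_rpn = rpn(severity=6, likelihood=5, detectability=4)
final_rpn = rpn(severity=6, likelihood=3, detectability=2)
print(initial_rpn, final_rpn)
```

Comparing the initial and final RPN of each failure mode gives a simple measure of how much the recommended maintenance actions are expected to reduce the risk.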
Chapter 5
Results
This chapter presents the results obtained by applying the proposed methodology to the case study.
It was decided to apply the methodology only to the gas compressors of FPSO Dalia, since it was the
FPSO with the largest available data records. The methodology can nevertheless be applied to any
compressor of any FPSO, provided that sufficient data are available.
5.1 FPSO Dalia
5.1.1 Failure Data Collection
As already seen, the case study focuses on the failures of the FPSO Dalia compressors, namely the
LP, TCA and Export compressors. The data were collected for the time window from 01/01/2010 to
30/09/2017. The survey was done in such a way that each failure collected could be associated with
the respective system of the compressor package considered in this study, the day the failure
occurred, the operating time until the failure of the system/component, and how long the machine was
down due to the component or system failure. The complete history of failures is attached in
Appendix D, and the following aspects should be noted:
• The Time To Failure ($TTF_i$) refers to the system or to a component (if the failed component is
identified), that is, the time the system/component operated until failure, counted from the
reference date 01/01/2010 (or from the date the machine was placed back into operation after the
last failure). For this case study, the time during which the compressors were not running (but
able to operate), for example due to production issues, was not estimated; for this reason the
Mean Time To Failure is taken as equal to the Mean Up Time;
• Down-Time, in this case, refers to the total time the compressor was down (unavailable) because
of the failure associated with a certain component or system;
• The Mean Time to Repair (MTTR), as defined in previous sections, was considered as the actual
maintenance or restoration time of the failed component, without taking into account logistics
and administrative times.
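The indicators defined in the list above can be computed from a failure history with a short sketch. The time values below are hypothetical placeholders, not data from Appendix D, and the availability expression (mean up time divided by mean up time plus mean down time) is the usual steady-state ratio, assumed here for illustration only.

```python
from statistics import mean


def ram_indicators(ttf_hours, down_hours):
    """Return (MTTF, MDT, availability) for one system or component.

    ttf_hours:  operating time before each failure (TTF_i), in hours.
    down_hours: compressor down time caused by each failure, in hours.
    Assumption: MTTF is taken as the Mean Up Time, as stated in the text.
    """
    if len(ttf_hours) != len(down_hours) or not ttf_hours:
        raise ValueError("one TTF and one down time are needed per failure")
    mttf = mean(ttf_hours)
    mdt = mean(down_hours)
    availability = mttf / (mttf + mdt)
    return mttf, mdt, availability


# Hypothetical failure history with three failures.
mttf, mdt, availability = ram_indicators(
    ttf_hours=[1200.0, 800.0, 1000.0],
    down_hours=[10.0, 30.0, 20.0],
)
print(f"MTTF = {mttf:.0f} h, MDT = {mdt:.0f} h, availability = {availability:.4f}")
```

Using down time rather than repair time in the ratio reflects the fact that logistics and administrative delays also keep the compressor unavailable.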
The analysis of the available information showed that, over the time window considered, some of the
systems constituting the compressor were affected by failures, some with considerable frequency and
others less so. Table 5.1 displays the number of random failures ($n$), the total down time
($\sum t$) and the mean down time (MDT, $\frac{1}{n}\sum t$).
Table 5.1: DALIA Compressors failure by System
Reference | System | Number of failures ($n$) | Total down time in hours ($\sum t$) | MDT in hours ($\frac{1}{n}\sum t$)
A | Compressor Unit | 4 | 54 | 13.5
B | Power Transmission | 1 | 16 | 16
C | Shaft Seal System | 7 | 4,381 | 625.85
D | Lubrication System | 2 | 14 | 7
E | Anti-Surge System | 12 | 536 | 44.66
Total | | 26 | 5,001 |
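The figures in Table 5.1 can be turned into Pareto rankings with a short sketch. The data are the values of Table 5.1; the function and variable names are hypothetical, and the cumulative share is the usual Pareto quantity, computed here as an illustration.

```python
# (number of failures n, total down time in hours) per system, from Table 5.1.
failures = {
    "Compressor Unit": (4, 54),
    "Power Transmission": (1, 16),
    "Shaft Seal System": (7, 4381),
    "Lubrication System": (2, 14),
    "Anti-Surge System": (12, 536),
}


def pareto(data, key):
    """Return rows (system, value, cumulative share), largest value first."""
    ranked = sorted(data.items(), key=lambda item: key(*item[1]), reverse=True)
    total = sum(key(*values) for values in data.values())
    rows, running = [], 0
    for system, values in ranked:
        value = key(*values)
        running += value
        rows.append((system, value, running / total))
    return rows


by_downtime = pareto(failures, key=lambda n, t: t)
by_count = pareto(failures, key=lambda n, t: n)
# Mean down time per failure, MDT = (1/n) * sum(t).
mdt = {system: t / n for system, (n, t) in failures.items()}

for system, hours, share in by_downtime:
    print(f"{system:20s} {hours:6d} h  cumulative {share:6.1%}")
```

With these inputs the Shaft Seal System ranks first by total down time and by mean down time, and the Anti-Surge System ranks first by number of failures.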
5.1.2 Pareto Analysis
Pareto diagrams allow better visualization of the priority analysis of the components that have
suffered malfunctions. Therefore, from the values in table 5.1, Pareto diagrams were constructed in
order of the total down time ($\sum t$), the number of random failures ($n$) and the mean down time
($\frac{1}{n}\sum t$), with the aim of identifying the systems causing the most unavailability
(down time) and the systems which are least reliable (number of failures).
Pareto analysis in order of the total down time ($\sum t$)
This analysis made it possible to identify the system that most penalized the availability of the
machine, that is, the one responsible for the longest immobilization of the machine. The graph in
figure 5.1 shows that the Shaft Seal System, annotated with letter C, was the one that most
penalized the compressor during the period under analysis, with a total down time of 4,381 hours
(table 5.1).
(a) (b)
Figure 5.1: Pareto Analysis: Total down time
Pareto analysis in order of the number of failures (n)
The analysis in order of the number of failures indicates that the Anti-Surge system (annotated
with letter E) is the most critical. As can be seen in the graph in figure 5.2, the greatest number
of malfunctions, 12, occurred in the Anti-Surge system.
Figure 5.2: Pareto Analysis: Number of failures
Pareto analysis in order of the mean down time ($\frac{1}{n}\sum t$)
This analysis ranks the systems according to mean down time, that is, the mean time the compressor was down per failure associated with a given system. It indicates that the Shaft Seal System (annotated with letter C) had the highest mean down time, as can be seen in the graph of figure 5.3.
Figure 5.3: Pareto Analysis: Mean Down Time
Comparative analysis
Comparing the three previous analyses makes clear that the Shaft Seal System is the most critical. It contributed most to the unavailability of the compressors, although it does not have the largest number of failures in the time interval considered: according to table 5.1 and figure 5.1, close to 88% of the machine down time in that interval (4,381 of 5,001 hours) was caused by the Shaft Seal System. The Anti-Surge System, on the other hand, shows the highest number of malfunctions in the interval, but with far fewer hours of down time than the Shaft Seal System. Together, the two systems account for about 73% of the total number of failures (19 of 26). For this reason, it was decided to begin the analysis with these two systems. First, a consequence analysis is performed to assess the influence that the failure of each system has on the different aspects introduced previously.
5.1.3 Consequence Analysis
As can be seen from the detailed failure history attached in Appendix D, all failures registered in the Shaft Seal System were caused by Dry Gas Seal failures. For this reason, the Dry Gas Seal is taken directly as the component to be analyzed in the Shaft Seal System. Regarding the Anti-Surge System, the registered failures are directly related to the anti-surge valve, whose failure in turn causes the failure of the system. The failure consequence analysis is therefore performed for these two components, as the second step of the methodology presented before.
Impact on HSE
Failures involving Dry Gas Seals are normally hazardous and can result in an uncontrolled or excessive release of hydrocarbon gas. In an extreme case, this might lead to fire or explosion (as shown in figure 5.4(c), a case that occurred at Total E&P Nigeria), although no such cases were found in the survey done for this work. For this reason it is ranked as High, with a value of 100, for the HSE aspect.
Failures of the anti-surge valve do not normally have such an impact on the environment and safety, but they can do so if the situation is not brought under control. For this reason, the valve is ranked as Medium, with a value of 10, according to table 4.1.
(a) Failed Dry Gas Seal (b) Failed Dry Gas Seal (c) Explosion caused by DGS failure
Figure 5.4: Failure of Dry Gas Seal and its consequence
Production Aspect
Depending on the type of compressor in which the Dry Gas Seal is installed, its failure does not always represent a direct production loss. In any case, a Dry Gas Seal failure can lead to the total failure of the compressor, which increases the probability of losing production. For this reason, it was ranked as Medium (M), with a value of 10, for the production aspect, according to table 4.2.
The same holds for the anti-surge valve. Its failure does not directly impact the compressor, but if it is not controlled it can escalate and expose the compressor to surge, which can lead to catastrophic failure of the compressor and a loss of production. It was therefore also ranked as Medium, with a value of 10, according to table 4.2.
Maintenance Costs
Operational problems with Dry Gas Seal systems are costly, both because of the expense of replacing or refurbishing components (up to 100,000 Euros per seal) and because most Dry Gas Seal repair activities are done by specialist technicians rather than by the on-board maintenance team. For this reason, the maintenance cost aspect was rated as High, with a value of 100, according to table 4.3.
For the anti-surge valve, maintenance and repair costs in case of failure are usually not as high as for the Dry Gas Seals, because it is rare to replace the valve completely and the repair is sometimes performed by the on-board maintenance team. In most cases, the costs do not exceed 10,000 Euros. For this reason, it was rated as Low, with a value of 1, according to table 4.3.
The HSE consequence is represented by the letter A, the production consequence by B and the consequence on maintenance costs by C. Recall that the Consequence Rating is given by $G = A \times B \times C$, and the Consequence Index by $H = \log_{10} G$. From the analysis above, table 5.2 presents the result of the consequence analysis for the two systems (components).
Table 5.2: Consequence final assessment
S/N   System/Component                        (A)   (B)   (C)   (G)      (H)   Consequence Classification
1     Shaft Seal System / Dry Gas Seal        100   10    100   $10^5$   5     High (H)
2     Anti-Surge System / Anti-Surge Valve    10    10    1     $10^2$   2     Medium (M)
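The rating arithmetic behind table 5.2 can be checked with a short sketch. It assumes a base-10 logarithm, which is what the tabulated values of G and H imply; the function name `consequence_index` is a hypothetical helper, and the mapping from index to class (High, Medium, Low) is left out because it depends on the thresholds of the methodology chapter.

```python
import math

# Ratings copied from table 5.2: (A) HSE, (B) production, (C) maintenance cost.
ratings = {
    "Dry Gas Seal": (100, 10, 100),
    "Anti-Surge Valve": (10, 10, 1),
}

def consequence_index(a, b, c):
    """Return (G, H) with G = A * B * C and H = log10(G) (base 10 is an assumption)."""
    g = a * b * c
    return g, math.log10(g)

results = {name: consequence_index(*abc) for name, abc in ratings.items()}

for name, (g, h) in results.items():
    print(f"{name:18s} G = {g:>7d}  H = {h:.0f}")
```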
According to table 5.2, the Shaft Seal System has a final consequence classification of High (H) and the Anti-Surge System of Medium (M). For this reason, the likelihood analysis is carried out for both systems, according to the methodology proposed in figure 4.10.
5.1.4 Likelihood Analysis
Redundancy Rate
As the redundancy rate of the Dry Gas Seal is 1 × 100%, its Likelihood Level was considered High, with a value of 100, according to table 4.4.
Regarding the anti-surge valve, the P&ID (Piping and Instrumentation Diagram) of the compressors, shown in figure 1.4, indicates one anti-surge valve per compressor, which according to table 4.4 corresponds to a redundancy of 1 × 100%. Its Likelihood Level was therefore also considered High, with a value of 100.
Equipment Technology
Regarding the equipment technology, table 4.5 considers all types of compressors to be critical. Here, however, the interest is in this criterion at the component level rather than the equipment level, so that conclusions can be drawn about the Dry Gas Seals and the Anti-Surge Valve. The Dry Gas Seal was considered a critical component because its operating mode is quite complex and because its failure directly compromises the integrity of the compressor, so that its repair requires proper specialists. It was thus ranked as High, with a value of 100.
The Anti-Surge Valve is a control valve and, according to the classification in table 4.5, is considered Medium and was ranked with a value of 10.
Failure Rate Criteria

Regarding the Dry Gas Seal, the failure rate is less than 0.5 failures per year (λ ≤ 0.5), so according to table 4.6 it is Low (L) and is ranked with a value of 1.
For the Anti-Surge System, the failure rate is greater than 0.5 failures/year but less than 1 failure/year, which falls in the range 0.5 < λ < 6, so it was considered Medium, with a value of 10, according to table 4.6.
The redundancy rate is represented by the letter D, the equipment technology by E and the failure rate by F. Recall that the Likelihood Rating is given by $J = D \times E \times F$, and the Likelihood Index by $K = \log_{10} J$. From the analysis above, table 5.3 presents the result of the likelihood analysis for the two systems.
Table 5.3: Likelihood final assessment
S/N   System/Component                        (D)   (E)   (F)   (J)      (K)   Likelihood Classification
1     Shaft Seal System / Dry Gas Seal        100   10    1     $10^3$   3     Medium (M)
2     Anti-Surge System / Anti-Surge Valve    10    10    10    $10^3$   3     Medium (M)
According to table 5.3, the Shaft Seal System has a final likelihood classification of Medium (M), and so does the Anti-Surge System. With the likelihood analysis complete, it is already possible to know in which zone of the criticality matrix the two analyzed systems lie, as shown next.
5.1.5 Criticality Matrix
From the previous analysis, it is now possible to present the result of the criticality analysis, as shown in table 5.4.
Table 5.4: Criticality Matrix final assessment
S/N   Unit/Subsystem                          Consequence Classification   Likelihood Classification   Criticality Classification
1     Shaft Seal System                       H                            M                           HM
2     Anti-Surge System / Anti-Surge Valve    M                            M                           MM
From table 5.4, the Shaft Seal System, represented by the Dry Gas Seal component, is in the red zone of the criticality matrix: it is a High-Medium component, classified as vital or critical. The Anti-Surge System, represented by the anti-surge valve, is in the yellow zone: it is a Medium-Medium component, classified as important. According to the methodology, the Dry Gas Seal of the Shaft Seal System is taken forward to the reliability, maintainability and availability study, since it was classified as a critical component.
5.2 RAM Analysis of Critical Component: Dry Gas Seal
5.2.1 Dry Gas Seals
Dry Gas Seals (DGSs) are commonly used to seal the shaft ends of centrifugal compressors from the process side. The technology appeared in the mid-1980s as a replacement for oil seals. The compressor shaft ends, including the lubricated journal and thrust bearings, need to be sealed in normal operating conditions as well as in standby and static conditions at the Settle Out Pressure (SOP). Dry gas seals are essentially non-contacting mechanical face seals consisting of a mating ring, which rotates, and a primary ring, which is stationary; the idea is to leave a very small gap between the faces in order to control the flow of gas. The running gap between the primary and mating rings is typically around 3 µm. Figure 5.5 shows Dry Gas Seals in good condition.
(a) Dry Gas Seal (b) Dry Gas Seal 2
Figure 5.5: Dry Gas Seals
Before studying the failures of Dry Gas Seals in detail, it is worth briefly describing how they operate. There are several types of Dry Gas Seals, but the type used and recommended by Total S.A. is the seal in tandem arrangement, which is the one addressed here, as shown in figure 5.6.
Figure 5.6: Dry Gas Seal in tandem arrangement
As illustrated in figure 5.6, clean and filtered seal gas, coming either from the process (compressor) or from an available external source, is injected at the primary (or inboard) seal at a pressure slightly above the process-side pressure. Most of this flow is routed to the process side, while the small leakage passing through the seal interface is routed to the flare system.
A clean inert buffer gas, usually nitrogen, is injected at the secondary (back-up) seal a few bars above the atmospheric flare back pressure. The main flow passes through the intermediate labyrinth, reaches the primary cavity and is routed to the flare system together with the leakage coming from the primary seal. The secondary seal leakage is routed to an atmospheric vent.
When the Dry Gas Seal is contaminated, the small gap opens, allowing gas to escape uncontrollably and causing the seal to fail. Ingress of any type of solids or liquids into this very narrow running gap can degrade seal performance, creating excessive gas leakage to the vent and, eventually, failure of the seal. As can be seen in table 5.5, most of the recorded Dry Gas Seal failures were due to contamination. The contamination can come, for example, from the process gas, from the bearing lubrication oil or from the seal gas supply¹.
¹ These failure modes are addressed in more detail in the FMECA produced for the Dry Gas Seal, which is attached in Appendix F.
5.2.2 Reliability
In the present analysis, according to the maintenance reports, the failure of a Dry Gas Seal normally implies its replacement, and only very rarely was it possible to recover a Dry Gas Seal on site after its failure. Since repair is carried out only in a specialized workshop of the manufacturer, the parameters for the reliability analysis are estimated empirically from the historical replacement data of the component under study.
Table 5.5: Dry Gas Seals failure history
Date       Failure Mode        Failure Cause          Time to Failure (hours)   Down Time (hours)   Time To Repair (hours)
10/02/11   Contamination       Process Gas            9584                      2472                384
24/05/11   Contamination gas   Process Gas            18                        493                 240
25/06/13   Condensation        Due to machine stop    30582                     312                 244
04/02/16   Contamination       Lube Oil migration     54762                     370                 312
14/05/16   Condensation        Process Gas            33252                     250                 228
30/06/16   Condensation        Due to machine stop    43431                     272                 216
28/06/17   Condensation        Due to machine stop    11198                     212                 168
5.2.3 Estimation of parameters
According to Assis [1], the Weibull distribution can be used together with Benard's median rank approximation and a linear regression (as will be seen below) to estimate the Weibull parameters, since it is not known in which phase of the bathtub curve the component is.
The Weibull parameters are estimated from the Times To Failure ($TTF_i$) recorded for the Dry Gas Seals considered, as shown in table 5.5. With these values available and ordered from shortest to longest, the first step is to estimate the theoretical cumulative failure frequencies $F(t_i)$. One of the most common methods, Benard's approximation, was used for this purpose. According to Amaral [34], when the number of failures $N$ registered in a time window is less than 20, the cumulative failure frequency is calculated by:
\[ F(t_i) = \frac{i - 0.3}{N + 0.4} \tag{5.1} \]
where $i$ is the rank of a given time to failure and $N$ is the total number of recorded times to failure, in other words, the number of failures.
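As a quick illustration of equation 5.1, the sketch below evaluates the median ranks for the N = 7 recorded failures. The function name `benard_median_rank` is a hypothetical helper introduced only for this example.

```python
def benard_median_rank(i, n):
    """Benard's approximation of the median rank for the i-th of n ordered failures."""
    return (i - 0.3) / (n + 0.4)

n_failures = 7  # number of Dry Gas Seal failures in table 5.5
ranks = [benard_median_rank(i, n_failures) for i in range(1, n_failures + 1)]

for i, f in enumerate(ranks, start=1):
    # F(t_i) is the estimated cumulative failure frequency, R(t_i) = 1 - F(t_i).
    print(f"i = {i}  F(t_i) = {f:.4f}  R(t_i) = {1.0 - f:.4f}")
```

The ranks depend only on the position i and on the sample size, not on the failure times themselves, which is why the times must be sorted before the ranks are assigned.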
5.2.3.1 Weibull Distribution parameters
In the present case study, when a failure occurs in a particular Dry Gas Seal, the component is replaced by another Dry Gas Seal, either new or repaired and approved. The newly installed Dry Gas Seal comes directly from a warehouse or from the manufacturer and has therefore not been subjected to any use since its last repair. The occurrence of failures can thus be counted from the moment the installed Dry Gas Seal is placed into operation, which is why the lower limit of life is taken as zero. For this reason, the two-parameter Weibull distribution (β, η) was used throughout this analysis, with the location parameter γ = 0. Equation (2.19) then becomes:
\[ R(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}} \tag{5.2} \]
The objective is then to rewrite the Weibull reliability function in the linear form $y = Ax + B$, so that its parameters can be obtained by linear regression. To arrive at this linear form, the following manipulations were used:
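\[ R(t) = e^{-\left(\frac{t}{\eta}\right)^{\beta}} \tag{5.3} \]
\[ \Leftrightarrow \ln(R(t)) = -\left(\frac{t}{\eta}\right)^{\beta} \tag{5.4} \]
\[ \Leftrightarrow \ln(-\ln(R(t))) = \ln\left[\left(\frac{t}{\eta}\right)^{\beta}\right] \tag{5.5} \]
\[ \Leftrightarrow \ln\left(\ln\left(\frac{1}{R(t)}\right)\right) = \beta \times \ln(t) - \beta \times \ln(\eta) \tag{5.6} \]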
From equation 5.1 it was possible to determine the cumulative failure frequencies $F(t_i)$, and from them the reliability $R(t_i) = 1 - F(t_i)$, according to the relation in equation (2.4). Comparing equation 5.6 with the linear form $y = Ax + B$ then gives:
\[ y = \ln\left(\ln\left(\frac{1}{R(t)}\right)\right) \tag{5.7} \]
\[ x = \ln(t) \tag{5.8} \]
\[ A = \beta \tag{5.9} \]
\[ \eta = e^{-\frac{B}{\beta}} \tag{5.10} \]
Considering the available data, it was possible to obtain table 5.6 and the graph in figure 5.7.
Table 5.6: Values for regression analysis
i   Median rank (Benard) $F(t_i)$   $R(t_i)$   Time to Failure $t_i$ (hours)   $\ln(t_i)$   $\ln(\ln(1/R(t_i)))$
1   0.09                            0.90       18                              2.89         -2.31
2   0.22                            0.77       9584                            9.17         -1.34
3   0.36                            0.635      11198                           9.32         -0.79
4   0.50                            0.50       30582                           10.33        -0.37
5   0.63                            0.36       33252                           10.41        0.01
6   0.77                            0.23       43431                           10.68        0.39
7   0.90                            0.09       54762                           10.91        0.86
Figure 5.7: Regression graphic
From figure 5.7, the shape parameter is β = 1.0551 and the intercept is B = −10.904, so that
\[ \eta = e^{\frac{10.904}{1.0551}} = 30{,}778 \text{ hours} \]
The mean time to failure of the studied component, using the Weibull distribution, was then estimated by:
\[ MTTF = \eta \times \Gamma\left(1 + \frac{1}{\beta}\right) = 30{,}778 \times \Gamma\left(1 + \frac{1}{1.0551}\right) = 30{,}133 \text{ hours} \]
Since the Weibull estimation gives β approximately equal to 1, the Dry Gas Seals are failing during their useful life. The failure rate can therefore be taken as constant, which corresponds to the second phase of the bathtub curve (figure 2.3).
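The regression can be sketched in a few lines of plain Python. One assumption is made explicit here: the reported values (β ≈ 1.055, B ≈ −10.904) are approached when the line is fitted to records i = 2 to 7 of table 5.6, with the 18-hour record left out of the fit while the median ranks still use N = 7. The text does not state this exclusion, so it should be read as an assumption of the sketch, not as the documented procedure. The helper `least_squares` is likewise illustrative.

```python
import math

# Times to failure (hours) from table 5.5, sorted in ascending order.
ttf = [18, 9584, 11198, 30582, 33252, 43431, 54762]
n = len(ttf)

# Build (x, y) = (ln t, ln ln(1/R)) using Benard median ranks with N = 7 (equation 5.1).
points = []
for i, t in enumerate(ttf, start=1):
    f = (i - 0.3) / (n + 0.4)
    points.append((math.log(t), math.log(math.log(1.0 / (1.0 - f)))))

# ASSUMPTION: the 18-hour record (i = 1) is excluded from the fit.
fit_points = points[1:]

def least_squares(pts):
    """Ordinary least squares fit of y = slope * x + intercept."""
    m = len(pts)
    mean_x = sum(x for x, _ in pts) / m
    mean_y = sum(y for _, y in pts) / m
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

beta, b = least_squares(fit_points)          # A = beta (eq. 5.9), B = intercept
eta = math.exp(-b / beta)                    # eq. 5.10
mttf = eta * math.gamma(1.0 + 1.0 / beta)    # mean of the two-parameter Weibull

print(f"beta = {beta:.4f}, B = {b:.3f}, eta = {eta:.0f} h, MTTF = {mttf:.0f} h")
```

Fitting all seven points instead gives a much flatter line, because the 18-hour record sits far from the others on the logarithmic axis; the choice of which records enter the fit therefore matters a great deal for β.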
Another way of approaching the problem is to assume from the start that the failure rate is constant (the case of the exponential distribution). The Dry Gas Seals of 3 compressors are under study, and 7 failures were registered during the analyzed time window, which runs from 01/01/2010 to 30/09/2017 (66,960 hours). According to Carinhas [35], when the total down time cannot be neglected, it must be deducted from the total time in order to obtain the effective operating time of the components. The sum of the effective operating times of the 3 compressors is thus 195,879 hours (according to the failure history in Appendix D). According to the same author, expression 5.11 can be used to estimate the failure rate of each component:
\[ \lambda = \frac{N_f}{N_0 \times \Delta t} \tag{5.11} \]
where $N_f$ is the number of failures in the time interval considered, $N_0$ is the number of components in operation and $\Delta t$ is the effective operating time of each component. The failure rate is then:
\[ \lambda = \frac{7}{195{,}879} = 3.57 \times 10^{-5} \text{ failures/hour} \tag{5.12} \]
which gives a Mean Time To Failure of $MTTF = 1/\lambda = 27{,}983$ hours.
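The constant failure rate estimate can be reproduced as follows. This is a minimal sketch under one assumption: the 195,879 effective hours are taken to be three times the 66,960-hour window minus the 5,001 hours of total down time listed in table 5.1, which matches the figure quoted from Appendix D. The variable names are illustrative.

```python
n_compressors = 3          # N0, number of components in operation
window_hours = 66960       # analysis window per compressor
total_down_time = 5001     # hours, total of table 5.1 (assumed to be the deducted down time)
n_failures = 7             # Nf, Dry Gas Seal failures in the window

# N0 * delta_t: summed effective operating time of the three compressors.
effective_hours = n_compressors * window_hours - total_down_time

failure_rate = n_failures / effective_hours   # equation 5.11, failures per hour
mttf = 1.0 / failure_rate                     # valid for a constant failure rate

print(f"effective hours = {effective_hours}")
print(f"lambda = {failure_rate:.3e} failures/hour, MTTF = {mttf:.0f} h")
```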
The MTTF calculated by assuming a constant failure rate from the start is quite similar to the MTTF determined via the Weibull distribution, which indicates that assuming the useful life period is in fact a good approach. Moreover, the values of table 5.6 suggest that these figures are realistic, since the arithmetic mean of the recorded Times To Failure is around 26,000 hours.
For the rest of the work, only the MTTF calculated via the Weibull distribution is considered. Assuming the useful life period, the failure rate is then:
\[ \lambda = \frac{1}{MTTF} = \frac{1}{30{,}133} = 3.31 \times 10^{-5} \text{ failures/hour} \]
Traditionally, preventive maintenance of the Dry Gas Seals in the company is carried out after 50,000 hours of operation, since the expected time to failure of a Dry Gas Seal according to the manufacturer's instructions is 51,000 hours. This maintenance is normally scheduled together with a preventive maintenance of the entire compressor. One of the objectives of this dissertation, however, is to study the failures from the historical record of each machine and to suggest the most suitable type of maintenance for the critical component. On that basis, each Dry Gas Seal runs on average 30,133 hours without failure, which is the actual Mean Time To Failure estimated by the Weibull distribution. With this information, and given that the operating hours of each Dry Gas Seal can be tracked, it is proposed, within a systematic preventive maintenance logic and in order to reduce the number of unexpected and highly damaging Dry Gas Seal failures, that an inspection be planned as the seal approaches 30,133 hours of operation, and that the seal be replaced if it is found not to be in good condition to continue operating.
5.2.3.2 Exponential distribution: Reliability
Taking into account the failure rate previously determined, the probability density function of the exponential distribution can be written as a function of time and the constant failure rate:
\[ f(t) = \lambda e^{-\lambda t} = 3.31 \times 10^{-5}\, e^{-3.31 \times 10^{-5}\, t} \]
The reliability (equation 2.16) and the failure probability function are then, respectively:
\[ R(t) = e^{-\lambda t} = e^{-3.31 \times 10^{-5}\, t} \tag{5.13} \]
\[ F(t) = 1 - R(t) = 1 - e^{-3.31 \times 10^{-5}\, t} \tag{5.14} \]
Equation 5.13 gives the probability that a Dry Gas Seal is still operating correctly at time t, measured since the last Dry Gas Seal replacement. Equation 5.14 gives the probability that a Dry Gas Seal has failed by time t, measured since the last replacement.
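Equations 5.13 and 5.14 are straightforward to evaluate numerically. The sketch below is only an illustration: the function names are hypothetical, and the failure rate is the rounded value of 3.31 × 10⁻⁵ failures/hour used in the equations.

```python
import math

FAILURE_RATE = 3.31e-5  # failures per hour, constant failure rate from the Weibull MTTF

def reliability(t_hours, lam=FAILURE_RATE):
    """Equation 5.13: probability of surviving t_hours since the last replacement."""
    return math.exp(-lam * t_hours)

def failure_probability(t_hours, lam=FAILURE_RATE):
    """Equation 5.14: probability of having failed within t_hours."""
    return 1.0 - reliability(t_hours, lam)

for t in (0, 1000, 14000, 30000, 50000):
    print(f"t = {t:6d} h  R = {reliability(t):6.1%}  F = {failure_probability(t):6.1%}")
```

Because the two functions always sum to one, any tabulated pair of reliability and failure probability can be cross-checked against this relation.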
Figures 5.8 and 5.9 represent these two functions graphically as a function of the operating time of the Dry Gas Seal, and table 5.7 presents reliability values for selected times.
Figure 5.8: Reliability function R(t)
Figure 5.9: Failure function F(t)
Table 5.7: Reliability and Failure Probability values
Time (hours)   Reliability   Failure Probability
0              100.00%       0.00%
1000           97%           3%
14000          63%           37%
18000          55%           45%
22000          48%           52%
26000          42%           58%
30000          37%           63%
34000          32%           68%
38000          28%           72%
42000          25%           75%
46000          22%           78%
50000          19%           81%
54000          17%           83%
58000          14%           86%
62000          13%           87%
66000          11%           89%
70000          10%           90%
74000          9%            91%
The reliability of the compressor is a very important aspect, since it ensures that the compressor will be able to fulfill its mission within an expected period of time. Increasing or guaranteeing the reliability of the components therefore contributes to guaranteeing the reliability of the compressor. The company expects a time to failure of 51,000 hours for the Dry Gas Seals, yet according to the reliability values displayed above the seals have a probability of only about 19% of reaching 51,000 hours, which is considerably low. In order to improve the reliability of the seal, a detailed analysis was carried out on the different types of failure registered in the seals, as well as on other possible failures, with the purpose of proposing measures to prevent them so that the reliability objectives are met. Appendix F presents the summary of this analysis, the Failure Modes, Effects and Criticality Analysis (FMECA).
Still on the reliability of the Dry Gas Seal, one of the actions proposed to the company was to include the Dry Gas Seal in a monitoring system that the company is developing, called PI-Core Sight, whose objective is to allow all critical components to be monitored by the engineers responsible for the associated machine. This action consisted of collecting all information about the instruments installed in the field to control the Dry Gas Seal performance parameters, in order to connect them to PI-Core Sight. This will allow the engineers to monitor in real time the trends of several parameters inherent to the functioning of the seal. Once acceptable limit values are defined for these parameters, immediate action must be taken to control the situation whenever they fall outside the limits, or, failing that, a safe shutdown of the machine must be prepared to fix the problem. Figure 5.10 presents, as an example, a display under construction for one of the compressors of the FPSO CLOV², with curves indicating the amount of gas leaking from the Dry Gas Seal.
² This action is still in development; the image is merely illustrative and does not yet represent the final display.
Figure 5.10: CLOV FPSO Dry Gas Seal PI-CoreSight display
5.3 Maintainability
In order to estimate the Mean Time To Repair (MTTR) statistically from the available data, the methodology proposed by NASA [54] was used in this work. The log-normal distribution is used for the MTTR calculation since, according to the same source, it is the distribution most used for this purpose. The MTTR of the component is thus estimated from the data of table 5.5 using the log-normal distribution.
Let
\[ t'_i = \ln(t_i) \tag{5.15} \]
Using statistical methods, the Maximum Likelihood Estimator (MLE), or best estimate, of the mean is:
\[ \bar{t'} = \frac{1}{n}\sum_{i=1}^{n} t'_i \tag{5.16} \]
The variance is then estimated by:
\[ S'^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left(t'_i - \bar{t'}\right)^2 \tag{5.17} \]
The mean of the log-normal distribution is then estimated by:
\[ MTTR = e^{\left(\bar{t'} + \frac{S'^2}{2}\right)} \tag{5.18} \]
and the variability of the time to repair is:
\[ \sigma = MTTR\,\sqrt{e^{S'^2} - 1} \tag{5.19} \]
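Equations 5.15 to 5.19 translate directly into a few lines of code. The sketch below is a minimal illustration, assuming the inputs are the seven repair times recorded in table 5.5; the function name `lognormal_mttr` is hypothetical. The log-mean it returns can be checked against the tabulated value of 5.51429, while the variance and the resulting MTTR are sensitive to how $S'^2$ is evaluated and are best recomputed from the raw repair times in the same run.

```python
import math

# Times to repair in hours, copied from table 5.5 and sorted in ascending order.
times_to_repair = [168, 216, 228, 240, 244, 312, 384]

def lognormal_mttr(times):
    """Return (log-mean, log-variance, MTTR, sigma) following equations 5.15 to 5.19."""
    logs = [math.log(t) for t in times]                            # eq. 5.15
    n = len(logs)
    log_mean = sum(logs) / n                                       # eq. 5.16
    log_var = sum((x - log_mean) ** 2 for x in logs) / (n - 1)     # eq. 5.17
    mttr = math.exp(log_mean + log_var / 2.0)                      # eq. 5.18
    sigma = mttr * math.sqrt(math.exp(log_var) - 1.0)              # eq. 5.19
    return log_mean, log_var, mttr, sigma

log_mean, log_var, mttr, sigma = lognormal_mttr(times_to_repair)
print(f"log-mean = {log_mean:.5f}, log-variance = {log_var:.4f}")
print(f"MTTR = {mttr:.1f} h, sigma = {sigma:.1f} h")
```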
Thus, using the data from table 5.5 and the expressions shown above, it was possible to estimate the
values as shown in tables 5.8 and 5.9.
Table 5.8: Data to estimate the MTTR
n     Time To Repair $t_i$ (hours)   $t'_i = \ln(t_i)$
1     168                            5.12396
2     216                            5.37528
3     228                            5.42935
4     240                            5.48064
5     244                            5.49717
6     312                            5.743
7     384                            5.95064
Sum                                  38.6
Table 5.9: Estimated MTTR
$\bar{t'}$   $S'^2$   MTTR (hours)   σ (hours)
5.51429      0.184    226            102
From table 5.9, the Mean Time To Repair is 226 hours, with a variability of 102 hours. According to figure 5.11, which shows the standard planning of a Dry Gas Seal replacement as defined by the manufacturer and the company, the expected time to repair is 11 days (264 hours). On average, therefore, the expected repair time has been met, even though the down time caused by seal failures has been much greater. The down time is covered in more detail with the availability results in section 5.4.
Figure 5.11: Normal expected Dry Gas Seal replacement operation
5.4 Availability
Since failure data are available for each compressor, it is more practical in this analysis to present the availability of each compressor. As it is not known how long the machines were stopped for preventive maintenance (or for other reasons, such as process stops), the up time is taken as the time that the machine was actually working. The availability is thus calculated by:
\[ Availability = \frac{UpTime}{UpTime + DownTime} \tag{5.20} \]
The company expects an acceptable availability level of 98% for its compressors, according to its operational needs. Using the failure history in Appendix D, which contains the down time data of the machines, and taking into account that the period analyzed is 66,960 hours (from 01/01/2010 to 30/09/2017):
• For the TCA Compressor, the down time was 636 hours, so the up time was 66,960 − 636 = 66,324 hours, hence:
\[ A = \frac{66324}{66324 + 636} \approx 0.991, \text{ that is, } A \approx 99.1\% \]
• For the Export Compressor, the down time was 838 hours, so the up time was 66,122 hours, hence:
\[ A = \frac{66122}{66122 + 838} \approx 0.987, \text{ that is, } A \approx 98.7\% \]
• For the LP Compressor, the down time was 3,527 hours, so the up time was 63,433 hours, hence:
\[ A = \frac{63433}{63433 + 3527} \approx 0.947, \text{ that is, } A \approx 94.7\% \]
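The three calculations above follow the same pattern and can be sketched as a small loop. The dictionary layout and names are illustrative assumptions; the down times, the 66,960-hour window and the 98% target are those quoted in the text.

```python
WINDOW_HOURS = 66960   # analysis period per compressor
TARGET = 0.98          # acceptable availability level stated by the company

# Down time in hours per compressor, as quoted from the failure history.
down_time = {
    "TCA Compressor": 636,
    "Export Compressor": 838,
    "LP Compressor": 3527,
}

def availability(up_time, down):
    """Equation 5.20: up time divided by the sum of up time and down time."""
    return up_time / (up_time + down)

report = {}
for name, down in down_time.items():
    up_time = WINDOW_HOURS - down
    a = availability(up_time, down)
    report[name] = (up_time, a, a >= TARGET)

for name, (up_time, a, ok) in report.items():
    status = "meets target" if ok else "below target"
    print(f"{name:18s} up = {up_time} h  A = {a:.1%}  U = {1 - a:.1%}  ({status})")
```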
Table 5.10 summarizes the availability and unavailability of the analyzed compressors. The compressors marked in green are within the company's operational requirements in terms of availability, while the compressor marked in red, in this case the LP Compressor, is below them.
Table 5.10: Availability values of gas compressors
Compressor          Up Time (hours)   Down Time (hours)   Availability (%)   Unavailability (%)
TCA Compressor      66324             636                 99.1               0.9
Export Compressor   66122             838                 98.7               1.3
LP Compressor       63433             3527                94.7               5.3
For the LP Compressor, the only one that did not meet the availability requirement, 91.7% of the unavailability is due to Dry Gas Seal failures, since the time the compressor was down because of Dry Gas Seal failures totals 3,237 hours.
The operations reports show that the long down times associated with Dry Gas Seal failures were due to logistical constraints. Field operations took much longer than the expected 11 days: not all the parts needed to carry out the replacement were always available offshore, since replacing a Dry Gas Seal in practice sometimes requires replacing other components outside the Shaft Seal System, normally the bearings (according to the manufacturer's instructions). As a result, operations were often left on standby waiting for the parts to arrive from the logistics base, and when the parts were not available there either, they had to be ordered urgently.
In order to minimize the impact of this problem, a list of all the parts that may be needed for the replacement of Dry Gas Seals, called the check list, was produced (based on the manufacturer of the machines and on the company's experience with past replacement operations). The company must ensure that it always has in stock the minimum spare quantities indicated in the check list, so that the replacement can be performed in case of seal failure.
Another suggestion was to stock all the spare parts of the check list in the offshore warehouse instead of onshore, thus also avoiding time lost to possible transportation constraints.
Chapter 6
Conclusions
Following the methodology proposed in this study, it was possible, based on the failure history of the gas compressors, to identify the components to be prioritized through a Pareto analysis based on three parameters: the number of recorded failures, the total down time and the mean down time. Once the prioritized components were identified, a consequence analysis of their failures and a likelihood assessment were carried out, with the objective of placing each component (according to its consequence and likelihood indices) in a criticality matrix, in one of the following classes: vital/critical, important or secondary. As the proposed methodology suggests, a RAM analysis was then performed for the components classified as vital/critical, to better understand the aspects that influence the reliability, maintainability and availability of the component.
6.1 Achievements
The results obtained in this study allowed the Dry Gas Seal of the Shaft Seal System to be classified as the vital/critical component in the gas compressors analyzed. The RAM analysis was therefore performed for the Dry Gas Seals. It showed that their actual Mean Time To Failure (MTTF) is lower than the expected MTTF indicated by the manufacturer: the calculated MTTF was 30,133 hours against an expected 51,000 hours, meaning that the seals are failing about 20,000 hours earlier than expected. By means of the Weibull distribution it was also possible to conclude that the seals fail during their useful life. Knowing this result will allow the company to make technical decisions about its intervention program and to assess its maintenance, as well as to trigger actions that seek to prevent failures and reduce their consequences, supported by a Failure Modes, Effects and Criticality Analysis (FMECA).
Concerning maintainability, the Mean Time to Repair (MTTR) of the Dry Gas Seals was found to be
216 hours, below the expected Mean Time to Repair of 260 hours. Nevertheless, the compressors have a
very high Mean Down Time (MDT) due to Dry Gas Seal failures, which compromises their availability.
Knowing the up time and down time of each of the compressors studied, it was possible to calculate
the availability of each one. The LP compressor was the only one with an availability below the
acceptable limit established by the company (98%): its availability was 94.7%, which means an
unavailability of 5.3%. It was found that 91.7% of the LP compressor's unavailability was due to Dry
Gas Seal failures, and that the high down time was mainly due to logistical constraints. Knowing
these parameters made it possible to develop recommended actions so that the required availability
levels can be achieved.
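The availability figures quoted for the LP compressor can be checked with a short calculation. The sketch below is only illustrative: the variable names are hypothetical, and the last line is a simple what-if that assumes the remaining causes of down time would be unchanged if the Dry Gas Seal failures were removed.

```python
def unavailability(availability: float) -> float:
    """Unavailability is the complement of availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return 1.0 - availability


ACCEPTABLE_AVAILABILITY = 0.98       # company limit quoted in the text
lp_availability = 0.947              # LP compressor availability
dgs_share_of_unavailability = 0.917  # share attributed to Dry Gas Seal failures

lp_unavailability = unavailability(lp_availability)  # about 0.053
dgs_unavailability = dgs_share_of_unavailability * lp_unavailability
meets_target = lp_availability >= ACCEPTABLE_AVAILABILITY

# What-if: availability with the Dry Gas Seal contribution removed.
availability_without_dgs = 1.0 - (lp_unavailability - dgs_unavailability)
```

Under these assumptions the Dry Gas Seal failures alone account for roughly 4.9 percentage points of unavailability, which is why the recommendations concentrate on this component.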
6.2 Difficulties
One of the major difficulties of the reliability study was access to the failure data of the
equipment. It is important to develop a database intended for future reliability studies, i.e. the
failure history of the machines must be recorded in such a way that the number of operating hours
until each failure and the time taken to repair it are readily accessible, so that this information
can be used to analyze RAM parameters. The company is therefore recommended to pay attention to this
point. For the same reason, this work used only the failure records for which the time to failure,
the down time and the time to repair were available. All the people involved in maintenance
activities should be responsible for contributing to the efficiency of the maintenance work, creating
a culture of accurately recording the data on all operation of the machines.
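A minimal sketch of what one record of such a failure database could contain is given below, together with the RAM indicators that can be derived from it. The field names, the record layout and the example values are hypothetical; they only illustrate the kind of information that was difficult to obtain.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FailureRecord:
    """One failure event; every field name here is a hypothetical choice."""
    date: str
    compressor: str
    system: str
    description: str
    time_to_failure_h: float  # operating hours until the failure occurred
    down_time_h: float        # total hours the machine was unavailable
    repair_time_h: float      # hours of active repair work


def ram_summary(records, system):
    """Mean time to failure, mean down time and mean time to repair of a system."""
    selected = [r for r in records if r.system == system]
    if not selected:
        raise ValueError(f"no records for system {system!r}")
    return {
        "MTTF": mean(r.time_to_failure_h for r in selected),
        "MDT": mean(r.down_time_h for r in selected),
        "MTTR": mean(r.repair_time_h for r in selected),
    }


# Made-up records, for illustration only.
example_records = [
    FailureRecord("01/01/2015", "Compressor A", "Shaft Seal System", "Seal failure", 1000.0, 100.0, 50.0),
    FailureRecord("01/06/2016", "Compressor A", "Shaft Seal System", "Seal failure", 3000.0, 300.0, 150.0),
    FailureRecord("01/09/2016", "Compressor A", "Lubrication System", "Low oil pressure", 500.0, 4.0, 2.0),
]
summary = ram_summary(example_records, "Shaft Seal System")
```

Keeping down time and repair time as separate fields is what makes it possible to distinguish logistical delays from the repair work itself.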
6.3 Future Work

First of all, it is important to note that the methodology proposed in this work can be applied to
any type of machine in an offshore Oil and Gas plant; the only requirement is that the necessary data
be available.
A possible next step would be to include the S factor (Safety) in the RAM analysis. Since safety is a
very important factor in an offshore platform environment, it would be interesting to understand its
implications for the operation of the machines.
Another interesting approach would be a detailed analysis of the costs associated with unexpected
compressor failures and of their implications for the production process, together with an economic
feasibility study of all the improvement actions proposed in this work.
References
[1] Assis, R. (2014). Support to the decision in Maintenance in the Management of physical assets (2nd
ed.) (in portuguese). Lisbon: LIDEL-Technical Editions, Lda.
[2] Karev, A. et al. (2015). Driving operational performance in oil and gas. Ernst & Young publications
[3] Junior, J., Ribeiro, M. & Franco, B. (2015). Cost of Asset Maintenance in an operational and strategic
optical in the industry environment (in portuguese). Optimization of resources and development in
Symposium of excellence in management and technology, S. Paulo, Brasil.
[4] Dhillon, B. S. (2002). Engineering Maintenance: A modern approach (1st ed.). Washington: CRC
Press LLC.
[5] Petrobras (2018). Petrobras site, http://www.petrobras.com.br/infograficos/tipos-de-plataformas/desktop/index.html,
accessed on 6th March 2018.
[6] Chemistry glossary (2018). Website, https://glossary.periodni.com/glossary.php?en=petroleum,
accessed on 18th February 2018.
[7] Almeida, J. (2006). Introduction to Oil and Gas Industry (in portuguese). Rio Grande Sul: Petrobras
[8] Revolvy (2018). Revolvy’s website, https://www.revolvy.com/main/index.php?s=Well+test+(oil+and+gas),
accessed on 16th February 2018.
[9] GGFR (2009). Guidance on Upstream Flaring and Venting: Policy and Regulation. Global Gas
Flaring Reduction Partnership.
[10] Devold, H. (2013). Oil and gas production handbook: An introduction to oil and gas production,
transport, refining and petrochemical industry (2nd ed.). Oslo: ABB Oil and Gas.
[11] http://15926.org/topics/mapping-pid/index.htm, accessed on 16th March 2018.
[12] Brown, R.N. (1997). Compressors: Selection and sizing (2nd ed.). Houston, TX: Gulf Publishing
Company.
[13] Lewis, J. & Stark, B. (2003). Machinery and rotating equipment integrity inspection guidance notes
(1st ed.). Cleveland: Crown.
[14] Sorokes, J. (2013). Selecting a centrifugal compressor. American Institute of Chemical Engineers
(AIChE) and Dresser-Rand.
[15] https://www.researchgate.net/figure/Dresser-Rand-DATUM-multi-stage-centrifugal-compressor_fig1_237261761,
accessed on 17th March 2018.
[16] http://www.editiontruth.com/reciprocating-compressors-market/, accessed on 17th March 2018.
[17] Boyce, P. (2003). Centrifugal Compressor: A basic guide (1st ed.). Oklahoma: PennWell
Corporation.
[18] NP standard (2000). Portuguese Standard: Maintenance terminology.
[19] Saraswat, S. & Yedava, S. (2007). An overview on reliability, availability, maintainability and
supportability (RAMS) engineering. International Journal of Quality & Reliability Management, Vol.
14, pag. 330-344.
[20] O’Connor, P. & Kleyner, A. (2012). Practical Reliability Engineering (1st ed.). United Kingdom:
Wiley Publication.
[21] Corvaro, F. et al. (2016). Reliability, Availability, Maintainability (RAM) study, on reciprocating
compressors API 618. KeAi-Advanced Research Evolving Science, 3, 366-272.
[22] Sharma, R.K. & Kumar, S. (2008). Performance modelling in critical engineering systems using
RAM analysis. Reliab. Eng. Syst. Saf., Vol. 93, pag. 891-897
[23] Williams, J.P. (2001). Predicting process systems. Hydrocarbon engineering
[24] Kumar, S. (2014). Reliability, Availability and Maintainability Analysis of a process industry: A state
of art review. International Journal of Mathematical Sciences, Technology and Humanities, Vol. 117,
pag. 1253 – 1267.
[25] Cetinkaya, E.K. (2001). Reliability Analysis of SCADA systems used in the offshore oil and gas
industry. MSc. Thesis in Electrical Engineering, Faculty of the Graduate School, University of
Missouri-Rolla.
[26] Naseri, M. (2016). RAM Analysis of Oil and Gas Production Facilities Operating in the Arctic
Offshore: Expert Judgements and Operating Conditions. PhD. Thesis in Electrical Engineering,
Faculty of Science and Technology, UiT-The Arctic University of Norway.
[27] Djeddi, A. Z., Alaifa, A. & Salam, A. (2015). Operational reliability analysis applied to a gas turbine
based on three parameter Weibull distribution. MECHANIKA, Vol. 21, pag. 187-192.
[28] Vinnem, J.E. (2014). Series in Reliability Engineering: Offshore Risk Assessment (3rd ed.) London:
Springer-Verlag.
[29] Simões, G.M. (2008). RAMS analysis of railway track infrastructure (Reliability, Availability,
Maintainability, Safety). MSc. Thesis in Civil Engineering, Instituto Superior Técnico, University of
Lisbon.
[30] Blischke, W.R. & Murthy, D.N. (2003). Case studies in Reliability and Maintenance (1st ed.). New
Jersey: John Wiley & Sons.
[31] Kaplan, S. (1990). Bayes is for eagles. IEEE Transactions on Reliability 39:130-131.
[32] NP standard (2007). Portuguese Standard: Maintenance terminology.
[33] Rausand, M. & Høyland, A. (2004). System Reliability Theory: Models, Statistical Methods, and
Applications (2nd ed.). New Jersey: John Wiley & Sons.
[34] Amaral, F.D. (2016). Maintenance Management in Industry (1st ed.) (in portuguese). Lisbon:
LIDEL-Technical Editions, Lda.
[35] Carinhas, H. (2009). “Reliability”, support texts of Tribology and Maintenance subject (in
portuguese). Instituto Superior Técnico, Universidade de Lisboa.
[36] https://en.wikipedia.org/wiki/Bathtub_curve, accessed on 24th April 2018.
[37] Mira, P.J. (2014). Need and Availability of Maintenance Materials: A case study at Brisa. MSc.
Thesis in Mechanical Engineering, Instituto Superior Técnico, University of Lisbon.
[38] Hongzhou, W. & Pham, H. (2006). Reliability and optimal maintenance (1st ed.). New Jersey:
Springer-Verlag
[39] Lakner, A.A. & R.T. Anderson (1985). Reliability Engineering for Nuclear and Other High Technology
Systems. Elsevier Applied Science Publishers.
[40] Weibull (2018). Weibull’s site, http://www.weibull.com/hotwire/issue79/relbasics79.htm,
accessed on 8th May 2018.
[41] Roush, M. & Webb, W. (2006). Applied Reliability Engineering (5th ed.). New York: Center of
Reliability Engineering
[42] Manutenção em foco (2018). Site, https://www.manutencaoemfoco.com.br/diagrama-de-pareto-como-ferramenta-de-analise-de-falhas/,
accessed on 8th May 2018.
[43] Hall, R.A., Knights, P.F. & Daneshmend, L.K. (2000). Pareto analysis and condition-based
maintenance of underground mining equipment. Mining Technology, 109:1, 14-22, DOI:
10.1179/mnt.2000.109.1.14
[44] Hossen, J., Ahmad, N. & Ali, S. M. (2000). An application of Pareto analysis and cause-and-effect
diagram (CED) to examine stoppage losses: a textile case from Bangladesh. The Journal of The
Textile Institute, DOI: 10.1080/00405000.2017.1308786
[45] Hanif, A. & Agha, M.H. (2006). Utilizing Quality Tools: A Predictive Maintenance Perspective.
International Journal of Performability Engineering Vol. 8, No. 6, November 2012, pp. 699-704.
[46] European Standard (2010). Maintenance - Maintenance terminology.
[47] FERREIRA, A.B. (1999). New Dictionary of Portuguese Language (3rd ed.) (in portuguese). Rio de
Janeiro: Nova Fronteira.
[48] Kardec, A. & Nascif, J.A (2009). Maintenance - strategic function (3rd ed.) (in portuguese). Rio de
Janeiro: Qualitymark Editora, Lda.
[49] Moubray, J. (1997). Reliability-centered Maintenance (2nd ed.). New York: Industrial Press Inc.
[50] Fiix (2018). Fiix’s site, https://www.fiixsoftware.com/maintenance-strategies/risk-based-maintenance/,
accessed on 4th June 2018.
[51] Smith, A. & Hinchcliffe, G. (2004). Reliability-Centered Maintenance: A gateway to world class
maintenance. Elsevier Butterworth-Heinemann, New York.
[52] Wireman, T. (2005). Developing Performance Indicators for Managing Maintenance. New York:
Industrial Press, Inc.
[53] OREDA (2005). Offshore Reliability Data (4th ed.). Norway: SINTEF industrial management.
[54] NASA (2018). Mean Time to Repair Predictions. https://engineer.jpl.nasa.gov/practices/at2.pdf,
accessed on 1st July 2018.
Appendix A
FMECA Severity vs Occurrence Procedure
Table A.1: FMECA criteria for severity, likelihood and detection ratings
Rating | Severity | Likelihood | Detection
1 | Failure not noticeable | Failure occurs once in more than five years | Current control almost certain to detect cause/mechanism of failure or the failure mode
2 | Very minor effect on equipment function | Failure occurs once every three to five years | Very high likelihood current control will detect cause/mechanism of failure or the failure mode
3 | Minor effect on equipment function | Failure occurs once every one to three years | High likelihood current control will detect cause/mechanism of failure or the failure mode
4 | Equipment function slightly impaired | Failure occurs once per year | Moderately high likelihood current control will detect cause/mechanism of failure or the failure mode
5 | Non-critical aspects of equipment impaired | Failure occurs once every six months to one year | Moderate likelihood current control will detect cause/mechanism of failure or the failure mode
6 | Non-critical elements of equipment inoperable | Failure occurs once every three months | Low likelihood current control will detect cause/mechanism of failure or the failure mode
7 | Partial failure of critical elements of equipment | Failure occurs once per month | Very low likelihood current control will detect cause/mechanism of failure or the failure mode
8 | Equipment inoperable but safe | Failure occurs once per week | Remote likelihood current control will detect cause/mechanism of failure or the failure mode
9 | Safety or regulatory compliance endangered, with warning | Failure occurs every three to four days | Very remote likelihood current control will detect cause/mechanism of failure or the failure mode
10 | Safety or regulatory compliance endangered, without warning | Failure occurs more than once per day | No known control available to detect cause/mechanism of failure or the failure mode
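The three ratings of Table A.1 are usually combined into a Risk Priority Number (RPN). The sketch below assumes the conventional product RPN = severity × likelihood × detection; the function name and the example ratings are hypothetical and are not taken from the FMECA of this work.

```python
def risk_priority_number(severity: int, likelihood: int, detection: int) -> int:
    """Conventional RPN: product of three ratings, each on the 1 to 10 scale."""
    for name, value in (("severity", severity),
                        ("likelihood", likelihood),
                        ("detection", detection)):
        if not 1 <= value <= 10:
            raise ValueError(f"{name} rating must be between 1 and 10")
    return severity * likelihood * detection


# Hypothetical ratings: partial failure of critical elements (7), occurring
# once every one to three years (3), almost certain to be detected (1).
example_rpn = risk_priority_number(severity=7, likelihood=3, detection=1)
```

Because the three scales are multiplied, the RPN ranges from 1 to 1000, and a reduction of any single rating lowers the priority of the failure mode proportionally.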
Appendix B
Compressor’s Systems: Components
Appendix C
Compressor Major Overhaul activities
Dresser Rand SA
Product Services
Le Havre - France
SCOPE OF WORK: Compressor Major Overhaul
Criteria/Periodicity: Dependent on Machine Condition. Check with D-R Field Support Service
Manpower Skills: 1 Supervisor, 3-4 Technicians, 1 Helper
Duration: 3-4 weeks
DESIGNATIONS | DURATION | SPARE PARTS | SPECIAL TOOLS
1. Site preparation | | |
2. Tools preparation | | |
3. Isolate and purge compressor | 5.0 hr | |
4. Disassemble piping | 3.0 hr | |
5. Disassemble Coupling Guard | 1.0 hr | |
6. Disassemble Coupling Spacer (1 hr / coupling spool) | 1.0 hr | | Shaft Alignment Tool
   (a) Check alignment
   (b) Check axial float
7. Disassemble Coupling Hub (except special case) | 0.5 hr | | Coupling tool / Ring & Plug Gauges
   (a) Inspect Shim Pack
8. Disconnect instrumentation (half hr / compressor end) | 6.0 hr | |
9. Disassemble Puller Blocks, Retaining and Shear Rings | 1 hr | | Shear Ring Filler Block / Shear Ring Lifter / Head Pusher-Puller Block
10. Remove Bundle from Casing (may take 2 days) | 3 hr | | Bundle Stand Assembly / Bundle Lifter / Bundle Cradle / Clamping Tool Assembly
11. Clean compressor casing bore | 2 hr | |
12. Disassemble Bundle - upper half removal | 3 hr | | Bench Top Bundle support / Bundle Lifter
   (a) Overlap Inspection
   (b) Labyrinth/Rotor clearance inspection
13. Disassemble Thrust Bearing / Thrust Disc | 1 hr | | Pusher Tool / Plug & Ring Gauges
   (a) Inspect/check axial clearances
   (b) Disassemble Thrust Bearing
   (c) Remove Hydraulic Fit Thrust Disc
14. Disassemble Journal Bearings | 3 hr | | Bearing Housing Lifter / Bearing Entering Sleeve / Rotor support
   (a) Inspect/check clearances
   (b) Disassemble Journal bearing Intake End
   (c) Disassemble Journal bearing Discharge End
   (d) Bearing Inspection
   (e) Check Journal Bearing Clearance
15. Remove Inner Gas-Seal (1 hr / cartridge) | 3 hr | | Seal Pusher/Puller Assy / Head Alignment Tool
   (a) Withdraw Barrier Seal
   (b) Withdraw Gas-Seal Cartridge
   (c) Inspect/check Clearances
16. Disassemble Bundle | 5 hr | | Head Lifter
   (a) Remove Intake Head
   (b) Remove Discharge Head
   (c) Remove Rotor
       Inspect/check clearances
       Check geometry (run-out) and dimensions
   (d) Disassemble Internal Bundle
17. Balance rotor (if shipping is required then several days) | 5 hr | Rotor |
18. Disassemble I.G.V./Diaphragms | 4 hr | |
   (a) Check clearances
19. Clean Bundle | 12 hr | pearlglass |
20. Assemble I.G.V./Diaphragms | 4 hr | |
Ref. doc. :
Dresser Rand SA
Product Services
Le Havre - France
SCOPE OF WORK: Compressor Intermediate Overhaul (Complete)
Criteria/Periodicity: Dependent on Machine Condition. Check with D-R Field Support Service
Manpower Skills: 1 Supervisor, 3-4 Technicians, 1 Helper
Duration: 3-4 weeks
DESIGNATIONS | DURATION | SPARE PARTS | SPECIAL TOOLS
21. Assemble Labyrinth Seals | 6.0 hr | Labyrinths |
22. Check valuation and dimension, repair | | |
23. Renew all o-rings, gaskets, etc. | 4 hr | O-rings/Gaskets |
24. Assemble Bundle | 6 hr | | Head Lifter
   (a) Assemble Internal
   (b) Install Rotor
   (c) Install Intake Head
   (d) Install Discharge Head
25. Install Inner Gas-Seal (1 hr / cartridge) | 4 hr | Gas-Seals | Seal Pusher/Puller Assy
   (a) Check Clearances
   (b) Install Gas-Seal Cartridge
   (c) Install Barrier Seal
26. Assemble and adjust Journal Bearings | 3 hr | Journal Bearings | Bearing Housing Lifter / Bearing Guide Studs / Rotor support
   (a) Assemble Journal Bearing Intake End
   (b) Assemble Journal Bearing Discharge End
   (c) Check clearances
27. Assemble Thrust Bearing / Thrust Disc | 2.0 hr | Thrust Bearing | Pusher Tool / Collar Tool
   (a) Check 1/2 Rotor Float Clearances
   (b) Check clearances
   (c) Assemble Hydraulic Fit Thrust Disc
   (d) Assemble Thrust Bearing
28. Assemble Bundle - upper half installation | 2.0 hr | | Bundle Lifter
29. Cartridge installation in Case | 5 hr | | Bundle Cradle / Adjustable Roller Assy / Bundle Guide Stud / Clamping Tool Assy / Cradle Grease
30. Assemble Puller Blocks, Retaining and Shear Rings | 1 hr | | Shear Ring Filler Block / Shear Ring Lifter / Head Pusher-Puller Block
31. Assemble Instrumentations | 8 hr | Probes |
   (a) Adjust instrumentations
32. Assemble Coupling Hub | 1.0 hr | | Shaft Alignment Tool
33. Check alignment | 3 hr | | Coupling tool
34. Assemble Coupling Guard and Spacer | 2 hr | |
35. Assemble Piping | 3.0 hr | |
36. Flush Oil Systems (if required), until satisfactorily clean | 5 hr | |
37. Security test | 1 hr | |
38. Test, putting into service and check parameters | | |
Note: for activities on Rotor 3 lines have to be added
16b. Disassemble Rotor | 48-96 hr | |
16c. Assemble Rotor (recommended replacing impellers) | 96-144 hr | Impellers |
16d. Balance rotor (if shipping is required then several days) | 24 hr | |
Ref. doc. :
Appendix D
Failure History
DALIA COMPRESSORS FAILURE HISTORY
TCA Compressor (Start Date: January 2010)
Date | Compressor | Failure | System Related | Time to Failure (Hours) | Down Time (Hours)
21/01/2011 | TCA | Trip due to vibration Sensor VSHH 52651 | Compressor Unit | 9144 | 12
22/01/201 | TCA | Trip due to Vibration Sensor VSHH 52651 | Compressor Unit | 18 | 12
06/06/2012 | TCA | Trip due to Low Low Lube Oil header pressure | Lubrication System | 21720 | 10
06/03/2013 | TCA | Trip due to Anti-Surge Valve failure | Anti-Surge System | 28190 | 8
02/04/2014 | TCA | Trip due to high vibration HP1 Bearing driven side | Compressor Unit | 19650 | 12
04/02/2016 | TCA | Trip due to Dry Gas Seal failure | Shaft Seal System | 54762 | 370
28/06/2017 | TCA | Gas Seal Failure | System | 11198 | 212
Running: September 2017
Total Hours | | | | Total effective running hours: 66324 hours | 636
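As a hedged cross-check of the TCA figures listed above, the sketch below sums the listed down times and derives the Mean Down Time and the availability. Reading 66324 hours as the up time and 636 hours as the total down time is an assumption about how the totals row should be interpreted; the variable names are hypothetical.

```python
# Down times (hours) of the seven TCA events listed in the failure history.
tca_down_times_h = [12, 12, 10, 8, 12, 370, 212]

# Assumed to be the up time of the TCA compressor over the period analyzed.
tca_effective_running_h = 66324

total_down_time_h = sum(tca_down_times_h)
mean_down_time_h = total_down_time_h / len(tca_down_times_h)

# Availability as up time divided by up time plus down time.
tca_availability = tca_effective_running_h / (
    tca_effective_running_h + total_down_time_h
)

ACCEPTABLE_AVAILABILITY = 0.98  # company limit quoted in the conclusions
tca_meets_target = tca_availability >= ACCEPTABLE_AVAILABILITY
```

Under this reading the sum of the listed down times equals the 636 hours shown in the totals row, and the resulting availability of about 99% is above the 98% limit.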
LP Compressor (Start Date: January 2010)
Date | Compressor | Failure | System Related | Time to Failure (Hours) | Down Time (Hours)
08/02/2011 | LP Compressor | Trip due to Gearbox positioner | Power Transmission System | 9552 | 16
10/02/2011 | LP Compressor | | | 9584 | 2472
24/05/2011 | LP Compressor | | | 18 | 493
05/05/2013 | LP Compressor | Trip due to Anti Surge Valve failure | Anti-Surge System | 26659 | 136
 | | Trip due to Anti failure | Anti-Surge System | |
 | | Trip due to Anti failure | Anti-Surge System | |
19/07/2013 | LP Compressor | Trip due to Anti Surge Valve | Anti-Surge System | 45 | 4
 | | Trip due to Anti Surge Valve | Anti-Surge System | |
12/09/2013 | LP Compressor | FSLL on 1st Anti Surge Valve | Anti Surge System | 477 | 88
12/12/2013 | LP Compressor | Trip due Bad behaviour of ASV | Anti Surge System | 2072 | 3
15/11/2014 | LP Compressor | Antisurge valve feedback issue. | Anti Surge System | 7989 | 4
10/01/2015 | LP Compressor | Trip due to ASV | Anti Surge System | 1316 | 10
 | | Trip due to Anti Surge Valve | Anti Surge System | |
30/06/2016 | LP Compressor | | | 43431 | 272
Total effective: 63433 hours
EXPORT Compressor (Start Date: January 2010)
Date | Compressor | Failure | System Related | Time to Failure (Hours) | Down Time (Hours)
18/07/2012 | LP Compressor | Trip due to Low Low lube oil header pressure | Lubrication System | 22752 | 4
 | | Trip due to Anti failure | Anti Surge System | |
25/06/2013 | LP Compressor | | | 30582 | 312
29/04/2015 | LP Compressor | Tripped by active thrust bearing Temperature sensor (TE54653) faulty. | Compressor Unit | 54846 | 18
14/05/2016 | LP Compressor | | | 33252 | 250
Total effective: 66122 hours
Appendix E
Check list
CHECK LIST FOR DGS/BEARINGS EXCHANGE OUT
CENTRIFUGAL GAS COMPRESSORS
ITEM N°  PART NUMBER LIST  SAP CODE  PART DESCRIPTION  QUANTITY
ASSEMBLY DESCRIPTION: THRUST & JOURNAL BEARING ASSEMBLY-INTAKE END
1 1000155190 1388124 KINGSBURY THRUST BEARING ASSEMBLY 1
2 1000155079 1388117 DRESSER-RAND JOURNAL BEARING 1
3 1000147424 1388040 JOURNAL BEARING HOUSING 1
NS 528849014 1388234 SCREW 4
NS 000 004 993 NA PIN 2
4 526-244-001 1388222 COVER 1
5 1000147425 1388041 THRUST BEARING HOUSING 1
NS 528.849.014 1388234 SCREW 10
NS 000 004 979 NA PIN 4
NS 528.849.065 NA SCREW 2
6 002-272-357 1388002 O’RING (Cover / Thrust Bearing Housing) 1
PART NUMBER:1000155078
7 528.849.016 1388235 SCREW (Cover / Thrust Bearing Housing) 6
8 539.819.001 1388261 LUBE OIL MANIFOLD 1
9 528.849.053 1388238 SCREW (Lube Oil Manifold) 4
10 002-272-217 1387991 O’RING (Lube Oil Manifold) 2
11 539-148-005 1388259 ORIFICE SCREW (Lube Oil Manifold / Thrust Bearing Housing) 2
12 528-849-077 1388242 SCREW (Thrust Housing / Journal Housing) 8
13 000 007 634 1387968 SCREW (Journal Bearing / Head) 8
14 099-010-021 1388038 LOCTITE 1 box
15 527-454-203 1388226 LUBE OIL DRAIN ADAPTER 1
16 528-849-012 1388232 SCREW (Lube Oil Drain) 4
17 002-272-246 1387999 O’RING (Lube Oil Drain) 1
18 1000152910 1388047 LUBE OIL SUPPLY ADAPTER 1
19 528-849-014 1388234 SCREW (Lube Oil Supply) 4
20 002-272-212 1387989 O’RING (Lube Oil Supply) 1
21 595-450-001 1388284 ORIFICE FLANGE 1
22 528-849-013 1388233 SCREW (Lube Oil Vent) 4
23 1000151537 1388045 LUBE OIL VENT ADAPTER 1
24 002-272-228 1387997 O’RING (Lube Oil Vent) 1
25 099-010-023 1388039 HYLOMAR 1 box
26 000 007 432 1387967 SCREW (Obturated Hole Thrust Bearing housing) 1
ASSEMBLY DESCRIPTION: JOURNAL BEARING DISCHARGE END
1 1000155088 1388120 DRESSER-RAND JOURNAL BEARING 1
2 1000155153 1388121 JOURNAL BEARING HOUSING 1
NS 528-849-014 1388234 SCREW 1
NS 000-004-993 NA PIN 6
NS 528-849-021 NA SCREW 2
NS 001-333-062 NA DOWEL 4
3 1000155085 1388118 COUPLING LABYRINTH 12
4 1.000.155.086 1388119 BAFFLE 1
5 1000151207 1388044 LUBE OIL DRAIN ADAPTER 1
6 528-849-028 1388237 SCREW (Drain Adapter & Vent Adapter) 8
7 002-272-225 1387994 O’RING (Drain Adapter) 1
8 1.000.152.935 1388048 LUBE OIL VENT ADAPTER 1
9 002-272-228 1387997 O’RING (Drain Adapter) 1
10 1000-151-539 1388046 LUBE OIL SUPPLY ADAPTER 1
11 528-849-016 1388235 SCREW (Supply Adapter) 4
12 002-272-214 1387990 O’RING (Supply Adapter) 1
13 000-207-327 1387986 SCREW (Baffle / Coupling Labyrinth) 12

14 000-006-243 1387966 SCREW (Coupling Labyrinth / Housing) 8
15 528-849-098 1388243 SCREW (Housing / Head – Discharge End) 8
16 528-849-194 1388245 PLUG 1
17 099-010-021 1388038 LOCTITE 1 BOX
18 099-010-023 1388039 HYLOMAR 1 BOX
19 595-450-001 1388284 ORIFICE FLANGE 1
ASSEMBLY DESCRIPTION: GAS SEAL ASSEMBLY INTAKE END
1 1000154748 1388088 DRESSER-RAND GAS SEAL 1
2 1000159340 1388164 INNER SEAL LABYRINTH 1
3 1000159372 1388166 RETAINING RING 1
4 000-107-361 1387978 SCREW (Retaining Ring / D-R Gas Seal) 1
5 002-302-270 1388016 O’RING (D-R Gas Seal) 2
6 002-302-269 1388015 O’RING (D-R Gas Seal) 3
7 002-302-260 1388012 O’RING (Inner Seal Labyrinth) 1
8 002-302-246 1388010 O’RING (D-R Gas Seal) 1
ASSEMBLY DESCRIPTION: GAS SEAL ASSEMBLY DISCHARGE END
1 1000154744 1388087 GAS SEAL 1
2 1000159341 1388165 INNER SEAL LABYRINTH 1
3 1000159372 1388166 RETAINING RING 1
4 000-107-361 1387978 SCREW (Retaining Ring / D-R Gas Seal) 1
5 002-302-270 1388016 O’RING (D-R Gas Seal) 2
6 002-302-269 1388015 O’RING (D-R Gas Seal) 3
7 002-302-174 1388008 O’RING (Inner Seal Labyrinth) 1

8 002-302-246 1388010 O’RING (D-R Gas Seal) 1
Appendix F
Dry Gas Seals FMECA
Failure Modes Effects and Criticality Analysis (FMECA)
Equipment: Gas Compressor
Technology: Centrifugal Compressor
Issued by: Domingos Cúnua MASSALA
Subsystem/Component: DRY GAS SEAL
Equipment function: Avoid gas leakage from the shaft of compressor
Phases: Phase 1, Phase 2, Phase 3, Phase 4
Columns: Equipment Function; Potential Failure Modes; Potential Effects of failures; Potential causes
of failure; Detection Mode; Severity; Likelihood; Detectability; Risk rating (RPN); Recommended
actions; Recommended Frequency; Severity; Likelihood; Detectability; Risk rating (RPN)

Failure mode: Contamination
Foreign materials, either solid or liquid, passing into the thin seal running gap between the
rotating and stationary rings.
Potential effects: Contamination can damage the seal and create a big leakage.
Cause 1. Contamination from process gas: Contamination from process gas can occur when there is
insufficient sealing gas pressure, allowing process gas to come into contact with seal ring face.
  Detection mode: High increase of pressure in vent line.
  Recommended action: Dry Gas Seal PI-Core Sight monitoring. Recommended frequency: Ever
  Rating figures listed: 21, 8, 1
Cause 2. Contamination from gas seal supply: The supply of sealing gas which is inserted into the
seal is a main source of contamination of the seal. This type of contamination takes place when the
seal gas is not correctly treated upstream of the dry gas seal.
  Detection mode: High increase of pressure in vent line.
  Recommended action: 1. To eliminate or decrease the contamination failure 2, seal gas conditioning
  systems could be added which would lead to improved dry gas seal reliability and availability.
  Dry Gas Seal PI-Core Sight monitoring.
  Rating figures listed: 21, 8, 1
Cause 3. Incorrectly fitted filters, which causes gas bypassing of elements.
  Detection mode: High increase of pressure in vent line.
  Recommended action: 3. For the failure cause number 3, it is recommended to change the compressor
  gas seal filter elements to less micron filters if they are presently of a greater particle size.
  Rating figures listed: 21, 8, 1
Cause 4. Contamination from Lube oil migration towards the seals.
  Detection mode: High increase of pressure in vent line. Lube Oil inside the Seal.
  Recommended action: 4. To prevent the contamination failure cause 4, separation gas must be
  injected between the seal and bearing for those compressors that do not use this technique. Also,
  using carbon bushes and/or an oil slinger could aid in stopping the oil movement. In addition,
  increasing the size of the bearing lube oil return line and tank breather decreases backpressure,
  which in turn would help against this type of contamination by reducing the migration of oil.
  Recommended frequency: Ever
  Rating figures listed: 21, 8, 9, 1

Failure mode: Condensation
Potential effects: Condensed liquids (water or C6+ hydrocarbon) formed in the sealing gas can produce
a sticky substance that adversely affects elasticity of O-rings and springs, resulting in increased
leakage rates.
Cause 1. Condensation during operation: Decrease of Gas Seal Pressure. Components of the gas seal
system such as filters, valves, orifices, and the seal faces themselves, will cause seal gas pressure
drops during operation. As the seal expands across these components, the Joule-Thomson effect will
result in a consequent decrease in gas temperature.
  Detection mode: Machine gets block
  Recommended action: To avoid this type of failure, the temperature-pressure relationship of the
  seal gas must be studied. Consequently, simulating the seal gas temperature and pressure drops
  expected throughout the different components within the gas seal system is required. The resulting
  data can then be plotted on a phase diagram of the seal gas. If the sealing gas goes through a
  liquid phase, particular filtration and liquid separation equipment, and possibly heating of the
  sealing gas may be necessary.
  Rating figures listed: 36, 8 1 8 1 8, 9
Cause 2. Condensation when machine is stopped: If machine is stopped for more than few minutes, the
gas in compressor and DGS system can cool, increasing the risk of condensation in the seal on
re-starting. In particular if compressor is not depressurized.
  Detection mode: Machine gets block
  Recommended action: Depressurize the compressor immediately after a certain stop.
  Rating figures listed: 36, 8 1 8 1 8, 9

Failure mode: Misalignment
Potential effects: Misalignment can cause repeated balance diameter o-ring failures.
Cause: This failure can be a result of an excessive misalignment, which can be due to the unusual
axial motions of compressor.
  Detection mode: Excessive Vibration
  Recommended action: Analyze the trends from vibration condition monitoring to follow the
  performance of the compressor seal; Dry Gas Seal PI-Core Sight monitoring.
  Recommended frequency: Every Weekend
  Rating figures listed: 27, 8 1 8 1 8, 9

Failure mode: Reverse Pressurization
Potential effects: Reverse pressure in the seal can disrupt the normal seal face forces. If these
forces are not correctly balanced then the axial gap between the back surface of the seal face and
L-sleeve can open up, thus allowing the O-ring to escape from its groove. Then when the normal
pressure is returned, the O-ring may remain out of its groove, thus providing a leak path. When face
contact occurs on reverse pressurization immediate seal failure can take place.
  Detection mode: High increase of pressure in vent line
  Recommended actions: 1. Suitable instrumentation to avoid such incidents, or a check valve could be
  installed in the leakage line to flare; 2. Check for which pressure values the seals are designed
  for reverse pressure. If there is a need to change, changes must be done according to process
  requirements; Dry Gas Seal PI-Core Sight monitoring.
  Rating figures listed: 21, 6 1 6 1 6, 7