
X -2011

AVAILABILITY AND SAFETY OF INDUSTRIAL SYSTEMS
Konstantin Dimitrov
Abstract: The present paper describes the development and application of
enhanced methods, as well as some particular approaches, for the analysis, achievement
and monitoring of overall availability in industrial systems. Specific approaches for
increasing the availability of computer-controlled industrial systems are
also defined. A particular analysis, focused on the correlations between
system availability, system safety and the system's ability for modification, was
also performed in the present study.
Keywords: availability, safety, ability for modification, industrial systems.
1. Introduction
The majority of industrial processes and systems should be able to carry out
their assigned functions for very long periods of time without even short-term
faults and failures (which could degrade the safety of operations and
generate considerable economic losses), i.e., they should possess enhanced
system and/or process availability [1], [2], [7], [10].
In fact, system availability represents the degree to which a system, a
subsystem and/or any other type of industrial equipment is in a specified
operable and committable state at the start of its mission, when the mission is
called for at an unknown, i.e., a random, time [2], [3], [10]. In other words, the
system and/or process availability is the proportion of time during which a system (a
process) is in a functioning condition [2], [3]. This feature can also be expressed as
a mission capable rate [2], [3], [7].
The system safety concept calls for a risk management strategy based on the
identification and analysis of hazards and the application of remedial controls using a
systems-based approach [6], [8], [11]. This concept differs from the
traditional safety strategies, which rely on the control of conditions and causes of
accidents, based either on the particular, or as a result of the investigation of individual
(past) accidents [4], [5], [9]. The concept of system safety is useful in demonstrating
the adequacy of technologies when difficulties are faced with probabilistic risk analysis
[6], [7], [8].
A systems-based approach to system safety requires the application of
scientific, technical and managerial skills to hazard identification, hazard analysis and
the elimination, control, or management of hazards throughout the life cycle of an
industrial system [6], [8].
Some enhanced methods, as well as a few particular approaches for analysis and
monitoring of the overall systems availability are proposed in the present paper.
Specific approaches for increasing the availability of computer-controlled
industrial systems are also defined. A particular analysis, focused on the
correlations between system availability, system safety and the
system's ability for modification, was also performed in the present study.
2. Monitoring and enhancement of the overall systems availability.
2.1. Definition of a two-state model for a reparable system.
The general scheme of a two-state model of a reparable system is presented in
Fig.1.
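[Figure 1: timeline alternating between the Up and Down states. The Uptime
periods (Uptime 1, Uptime 2, Uptime 3) correspond to TTF1, TTF2 and TTF3; the
failure events F1, F2 and F3 start the periods Downtime 1, Downtime 2 and
Downtime 3; TBF1,2 and TBF2,3 mark the intervals between consecutive failures.]

Fig.1. Two-state model of reparable system.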

The analysis of the model shows that the system can be in either one of two states:
- Uptime, which determines the failure-free operation period(s) of the system;
- Downtime, which determines the repair period(s) of the system.
The length of an Uptime period can also be expressed via the Time
To Failure (TTF) variable, while the sum of an Uptime period and a Downtime period
represents the Time Between Failures (TBF) variable. The beginning of every
Downtime period is marked by a failure event Fi. In general, all these
variables, as well as the occurrence of the failure events, are stochastic by nature.
The measure of system reliability can then be expressed as the mean Uptime,
which is called the Mean Time To Failure, or MTTF [3], i.e.,
$$\mathrm{MTTF} = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathrm{Uptime}_i \tag{1}$$

The mean Downtime can correspondingly be called the Mean Time To Repair,
or MTTR [3], and can be determined via the following relation,

$$\mathrm{MTTR} = \lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} \mathrm{Downtime}_i \tag{2}$$

Thus, the Mean Time Between Failures (MTBF) can be expressed as follows [3],

$$\mathrm{MTBF} = \mathrm{MTTF} + \mathrm{MTTR} \tag{3}$$
The availability of a system, As, can then be expressed as the ratio between the
Uptime and the total time that the system is intended to be available, i.e.,

$$A_s = \frac{\text{Mean Uptime}}{\text{Mean Uptime} + \text{Mean Downtime}} = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}} = \frac{1}{1 + \dfrac{\mathrm{MTTR}}{\mathrm{MTTF}}} \tag{4}$$

Then, the total system availability can be close to 1 under the following
condition,

$$\frac{\mathrm{MTTR}}{\mathrm{MTTF}} \ll 1. \tag{5}$$

In fact, the ratio MTTR / MTTF can be small enough if
a) the system's MTTF is extremely large, i.e., high system
availability is achieved via enhanced reliability;
b) the system's MTTR is extremely small, i.e., high system
availability is achieved via short Downtime periods.
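
Relations (1) to (5) can be illustrated with a short numerical sketch. The limits in (1) and (2) are replaced here by finite-sample means, and the log values (in hours) as well as the function name `availability_from_log` are hypothetical, chosen only for illustration.

```python
from statistics import mean


def availability_from_log(uptimes, downtimes):
    """Estimate MTTF, MTTR, MTBF and As from observed period lengths.

    The infinite limits of relations (1) and (2) are approximated by
    the arithmetic means of the recorded samples.
    """
    if not uptimes or not downtimes:
        raise ValueError("need at least one Uptime and one Downtime sample")
    mttf = mean(uptimes)      # finite-sample estimate of relation (1)
    mttr = mean(downtimes)    # finite-sample estimate of relation (2)
    mtbf = mttf + mttr        # relation (3)
    a_s = mttf / mtbf         # relation (4)
    return {"MTTF": mttf, "MTTR": mttr, "MTBF": mtbf, "As": a_s}


# Hypothetical operating log, all values in hours (not taken from the paper).
uptimes = [950.0, 1020.0, 1030.0]
downtimes = [8.0, 12.0, 10.0]

result = availability_from_log(uptimes, downtimes)
ratio = result["MTTR"] / result["MTTF"]   # the quantity in condition (5)
alt_form = 1.0 / (1.0 + ratio)            # right-hand form of relation (4)

print(result)
print("MTTR/MTTF =", ratio, " 1/(1 + MTTR/MTTF) =", alt_form)
```

With these sample values the ratio MTTR / MTTF is 0.01, so condition (5) holds and As is close to 1. Halving the MTTR or doubling the MTTF would reduce the ratio by the same factor, which is the basis of the two options a) and b).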
2.2. Enhanced methods and approaches for analysis, achievement and
monitoring of the overall systems availability.
In general, the requirements for achieving high-level system availability
As are not easy to develop and apply in the structure and operation of industrial
systems. An enhanced analysis of the main types of failures and
faults reveals that there exist six main types of faults and errors, which could
endanger the availability of industrial systems and processes. These fault
types are presented in Fig.2 and can be grouped as follows:
- Physical failures and faults of the industrial systems structural components;
- Energy failures - caused by faults in the power supplies;
- External faults caused by disturbances and changes in the environmental
conditions;
- Human errors;
- Faults and errors caused by wrong maintenance and/or repair techniques;
- Control systems failures.
The characteristics of relation (5) can now be utilized for the development of
two general methods for achieving high-level system availability, which
can be named Method AHR and Method AFR, respectively (please see
Fig.3).

[Figure 2: diagram in which six groups of failures and errors act on the systems
availability As: Components Failures, Energy Supply Failures, Control Systems
Failures, Human Errors, Maintenance and Repair Errors, External Faults.]

Fig.2. General types of failures and errors, which can endanger the systems availability.

The core of Method AHR is to achieve high-level system availability via the
development of enhanced system reliability (i.e., high MTTF). The enhanced
system reliability can be obtained via two general approaches:
- Approach AHR1: the application of highly reliable components in the
system structure. It can be called a Perfect Reliability
approach, since a perfect reliability structure of the system is created;
- Approach AHR2: the application of high-level redundancy in the system
structure and/or diversity in the system design, i.e., the application of a Design for
Reliability (DFR) approach. Such an approach can be called a Fault Tolerant
approach, since it tolerates the occurrence of failures, but at the same time prevents their
dominance (their impact) over the system operation.
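
The effect of redundancy (Approach AHR2) on availability can be sketched with the standard series and parallel combination rules. The sketch assumes statistically independent component failures, which is an assumption made here and not a statement of the paper; the component availability of 0.99 and the function names are hypothetical.

```python
from math import prod


def series_availability(avails):
    """All components are required: the structure is up only if each one is up."""
    return prod(avails)


def parallel_availability(avails):
    """Redundant components: the structure is down only if all of them are down."""
    return 1.0 - prod(1.0 - a for a in avails)


# Hypothetical component availability (assumption, not from the paper).
single = 0.99

# AHR2: duplicate the component so that one healthy unit is sufficient.
duplicated = parallel_availability([single, single])

# For contrast: two such components that are both required (no redundancy).
chained = series_availability([single, single])

print("single    :", single)
print("duplicated:", duplicated)
print("chained   :", chained)
```

Under the independence assumption, duplication reduces the unavailability from 0.01 to about 0.0001, whereas chaining two required components roughly doubles it. Common-cause failures would weaken this gain, which is one reason why diversity in the design is mentioned alongside redundancy.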
The essence of Method AFR is to achieve high-level system availability
through fast recovery of the failed components (i.e., low MTTR). The lower
MTTR can be obtained via two general approaches:
- Approach AFR1: continuous (on-line) fault diagnosis, i.e., identification
and location of possible system faults and failures via built-in self-diagnosing
capacities, such as a diagnosing expert system, built-in test equipment with continuous
operation, etc. Such an approach can be called a Self Diagnosis approach, since it
considerably shortens the periods for fault identification and fault location.
- Approach AFR2: development and application of technical and organizational
means for achieving short time periods for the elimination of the faults
and failures that have occurred. Such an approach can be called a Fast Repair approach, since it provides
options for system recovery via repair, reconfiguration and restart of the system.
Of course, particular combinations of these approaches (e.g., a combination of
the fault tolerant approach with fast recovery techniques for the failed system
components) will generate higher system availability.
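
The contribution of the two AFR approaches can be sketched by splitting the repair time into phases. The decomposition of MTTR into identification, location and repair times is an assumption made for this illustration (the paper only states which periods each approach shortens), and all numerical values, in hours, are hypothetical.

```python
def availability(mttf, mttr):
    """Relation (4): As = MTTF / (MTTF + MTTR)."""
    return mttf / (mttf + mttr)


def mttr_from_phases(t_identify, t_locate, t_repair):
    """Assumed decomposition: MTTR as the sum of three recovery phases."""
    return t_identify + t_locate + t_repair


MTTF = 1000.0  # hypothetical value, hours

# Baseline: slow manual diagnosis and slow repair.
baseline = availability(MTTF, mttr_from_phases(4.0, 4.0, 12.0))

# AFR1 (Self Diagnosis): identification and location times shrink.
with_self_diagnosis = availability(MTTF, mttr_from_phases(0.5, 0.5, 12.0))

# AFR2 (Fast Repair): the elimination of the fault is shortened.
with_fast_repair = availability(MTTF, mttr_from_phases(4.0, 4.0, 2.0))

# Both AFR approaches applied together.
combined = availability(MTTF, mttr_from_phases(0.5, 0.5, 2.0))

for name, value in [
    ("baseline", baseline),
    ("AFR1 only", with_self_diagnosis),
    ("AFR2 only", with_fast_repair),
    ("AFR1 + AFR2", combined),
]:
    print(f"{name:12s} As = {value:.5f}")
```

Which approach gives the larger gain depends entirely on which phase dominates the recovery time; in this hypothetical data set the repair phase dominates, so AFR2 alone helps more than AFR1 alone, and the combination helps most.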
The four basic approaches defined above (AHR1, AHR2, AFR1 and AFR2) can be
applied to any type of industrial system. They can also be applied successfully in
computer-controlled industrial systems, but with slightly different aspects
of the following kind:
a) the simplest (classical) way would be to apply these approaches directly to all
process characteristics and to all system components, including the
computer control system. The drawback here is that such an approach could be
rather expensive;
b) the application of more cost-effective techniques, resulting in the development
of system structures with different functional levels (Fig.4), and
taking into account the fact that the availability requirements could be different for
the different functional levels of the industrial system (please see again Fig.4). In
this technique, the overall system availability is obtained by
providing a specific availability to each functional level of the computer-controlled
industrial system;
c) the substitution of hardware redundancy by software modeling when
providing self diagnosis and fault tolerance. In this technique, particular
software models of the monitored process can be utilized as a substitute for
redundant sensors (in the control systems), or an existing expert system can
be applied for fault diagnosis, thus achieving short time periods for self diagnosis as
well as fast repair (a possible complication here is the increase in the complexity of
the software system).
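
Technique c) can be sketched as a residual check in which a software model of the process plays the role of a second, redundant sensor. The static heater model, its gain, the threshold and all names below are hypothetical and serve only to illustrate the idea; a real application would need a validated process model and a tuned threshold.

```python
def model_prediction(heater_power, ambient_temp, gain=0.5):
    """Hypothetical static process model: expected temperature for a heater power."""
    return ambient_temp + gain * heater_power


def check_sensor(measured_temp, heater_power, ambient_temp, threshold=5.0):
    """Compare the sensor reading with the model output.

    Returns (fault_suspected, residual). A residual larger than the
    threshold suggests a sensor or process fault, in the same way as a
    disagreement between two redundant hardware sensors would.
    """
    residual = measured_temp - model_prediction(heater_power, ambient_temp)
    return abs(residual) > threshold, residual


# Reading consistent with the model (prediction is 20 + 0.5 * 100 = 70).
ok_flag, ok_residual = check_sensor(
    measured_temp=71.0, heater_power=100.0, ambient_temp=20.0
)

# Reading far from the model prediction: a fault is suspected.
bad_flag, bad_residual = check_sensor(
    measured_temp=95.0, heater_power=100.0, ambient_temp=20.0
)

print("consistent reading:", ok_flag, ok_residual)
print("suspect reading   :", bad_flag, bad_residual)
```

The check removes the cost of a second sensor, but, as noted in c), it moves complexity into the software: the model must be maintained together with the process, and modeling errors can cause false alarms.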
[Figure 3: tree diagram. Methods AHR (Enhanced Reliability) branch into AHR 1
(Perfect Reliability) and AHR 2 (Fault Tolerance); Methods AFR (Fast recovery of
failed components) branch into AFR 1 (Self Diagnosis) and AFR 2 (Fast Repair).
All four approaches lead to a High Level of Systems Availability, As.]

Fig.3. Methods for achievement of high level systems availability.


[Figure 4: functional levels of a computer-controlled industrial system:
Level 4: Emergency Control, Direct Control of Process Operation;
Level 3: Coordination Control and Supervisory Control;
Level 2: Quality Control and Optimization;
Level 1: Scheduling and Management Control.]

Fig.4. Different functional levels of the availability requirements in computer-controlled
industrial system.

3. Safety analysis, availability and ability to modify of industrial systems.


In general, the requirements for system availability and system safety are not
identical, but they are closely related, i.e., the availability methods can have
a direct impact on safety.
In many safety-related systems which cannot be transferred to a so-called safe
state, the system availability must be guaranteed even in the case of failure event(s). In
such cases, a high level of system safety necessarily implies a high level of system
availability.
In cases when high availability is achieved via redundancy techniques (in
order to create a fault-tolerant system), the safety techniques are also (quite often)
based on system/process redundancy (in order to avoid the consequences of fault
events). Thus, the fault-tolerance approach (with easily recovered system
components) can also be regarded as a particular safety technique, as far as
it preserves the correct operation of the processes (even during fault occurrences).
Only in those cases when high risks of system degradation are involved, and
when multiple failures occur (during the system operation), fault tolerance
cannot be considered an adequate and satisfactory safety technique, which means
that special safety techniques must be developed and applied to the system
structures.
In order to classify the safety requirements in accordance
with the risks involved, a specific risk graph has been proposed in the
German Industry Standard DIN 19250 87. This standard identifies the hazards by
their likelihood and potential severity, so that the safety requirements can be grouped into
different classes (from 1 to 8).
In general, different faults and failures can occur during the system operation,
but in addition, unpredictable changes in the environment (i.e., in the
operating conditions) might also occur and endanger the system
availability.

This is quite a common issue, since the majority of industrial
processes are subject to necessary expansions and/or modifications in order
to meet modified requirements for the features of the final products. The
consequence is that, during the implementation and testing of the
new and/or modified functions, the system itself might not be available, a situation
that could cause safety problems, economic losses, etc.
Therefore, in addition to a high level of availability, a particular ability for
modification should also be required and developed in the system
structure (especially when the monitored industrial processes are continuous). This
means that, in addition to the Design for Reliability (DFR) approach to the system
structure, a Design for Modifiability (DFM) approach must also be developed
and included in the industrial system structures (even at a very early stage of their
life cycle). The DFM approach must also provide options for the implementation of new
control software (and its testing) while keeping the industrial process fully
operational.
4. Conclusions:
4.1. Some enhanced methods, as well as some particular approaches for the
achievement, analysis and monitoring of overall availability in industrial systems, are
developed and proposed in this study.
4.2. Specific approaches for increasing the availability of computer-controlled
industrial systems are also defined.
4.3. A particular analysis, focused on the correlations between
system availability, system safety and the system's ability for modification, was also performed
in the present study.
References:
1. Askin, R.G., et al., Modeling and Analysis of Manufacturing Systems, John
Wiley and Sons, 1998.
2. Barton, H.R., Availability analysis of series-parallel systems, Proc. Ann.
Reliability and Maintainability Symposium, Atlanta, 1989, pp. 516-521.
3. , .., ., , ,
, 1994, 1999.
4. Dimitrov, K.D., Fault diagnosis and reliability enhancement of XPS
thermoforming and sensors control system, Scientific Symposium HIPNEF 2009,
October 14, pp 124-129, Republic of Serbia.
5. Isermann, R., et al., Model-based fault diagnosis and supervision of machines
and drives, 11-th IFAC World Congress, Tallinn, 1990, pp. 1-12.
6. Kretsovalis, A., R.S.H. Mah, Effect of redundancy on estimation accuracy
in process data reconciliation, Chem. E. S., Vol. 42, pp. 2115-2121, 1997.
7. Lauber, R., Impact of computer-aided development support systems on quality
and reliability, Proc. COMPSAC92, 1992.
8. Mellefest, K.A., Classification system for industrial process control with
safety functions, IFAC Workshop, Sept.28-30, Bruges, 1998.
9. Musa, J.D., et al., Software reliability, New York, McGraw-Hill, 1987.
10. Saglietti, F., Strategies for achievement and assessment of fault-tolerance,
11-th IFAC World Congress, Tallinn, 1990, vol. 7, pp. 273-278.
11. Shatz, S.M., J.P. Wang, Models and algorithms for reliability-oriented
task allocation in redundant distributed systems, IEEE Trans. Reliability, vol. 38,
No. 1, April 1989, pp. 16-27.
Author Data:
Konstantin Dimitrov Dimitrov, Assoc.Prof. D-r Eng., Vice Dean of the
Department of Mechanical Engineering, Chair of Engineering Logistics, Technical
University Sofia, Bulgaria, Tel. 965 3895, e-mail : kosidim@abv.bg.
