Safety Critical Computer Systems: Failure Independence and Software Diversity Effects On Reliability of Dual Channel Structures

Print ISSN: 1312-2622; Online ISSN: 2367-5357
DOI: 10.1515/itc-2015-0011
H. Hristov, W. Bo
Key Words: Safety Critical Systems; reliability; failure independence; diversity; dual channel structures.

Abstract. The paper examines Safety Critical real-time Systems (SCS), in particular their dual-channel structures. The analytical study of computer-based SCS that recognize failures by comparing the results of both channels is based on the theory of reliability. The aim is to establish reliability models that include the degree of independence between the failures of both channels and their diversity. The derived formulas are used to calculate the reliability function and the probabilities of safety failures and hazard failures. The intensity of failures of the system, the intensity of its recovery, and the ratio between Common Mode Failures (CMF) and failures of individual channels are explicitly presented in the models. Metrics for the independence of hardware and software Faults are introduced. The analytical models obtained allow evaluating the effects of independence of channels and the depth of their diversity. A formula has been worked out for calculating the improvement of safety due to the dual-channel architecture in comparison to a single-channel structure with the same reliability parameters. The analytical models can be used to calculate indicators of specific systems in order to determine their compliance with safety standards. The results make it possible to find technical solutions with better reliability and safety features.

The authors gratefully acknowledge the support of Ningbo University of Technology K. C. Wong Education Fund.

1. Introduction
1.1. Safety Critical Systems
Safety Critical real-time Systems (SCS) control special critical technology processes or operations (SCTP) whose failures could lead to loss of lives, loss of great human and material values and/or inadmissible damage to the environment. There are many examples of SCTP and of the systems that govern them in transport [1], aviation [2], medicine, the energy sector, the military sphere, etc. The hardware, software and transmission path of systems that process and transmit information are subject to increased reliability and safety requirements.

According to the nature of the SCTP, technical solutions are divided into two main groups.

The first group includes SCTP for which it is possible to define a criterion of safe behavior after the failure of their control systems: a criterion to limit functionality or to stop the controlled process [3]. These are undesirable but not hazardous outages of the SCTP, which create pressure to remove the failure and allow the process to continue. Such systems are said to have fail-safe behavior and their failures are safety failures [1].

The second group includes systems where fail-safe behaviour is inappropriate. Due to the nature of the process in aviation, space transport, air traffic control, life-supporting systems, etc., in most cases it is not possible to define a criterion of safe post-failure condition or behaviour. Any stopping of the process is inadmissible. Such SCTP have to meet requirements for availability or continuity. These cases also belong to SCS but they are no longer fail-safe. There is a general criterion according to which systems fall into this class: the admissibility of the risk arising from a possible failure. The risk is regulated by safety standards that are universal, independent of the technical solution, and concern both groups examined.

1.2. Safety Standards
Safety is the absence of inadmissible risk. Safety cannot be absolutely guaranteed, whatever measures are taken. There is always hazard. The problem is whether the residual risk $Q_d$ is acceptable (admissible). Admissibility is determined by regulations [4]. The admissible value of hazard (Tolerable Hazard Rate) can be different and depends on the application of the SCS. Therefore one speaks of levels of admissible hazard: Safety Integrity Levels (SIL) [6,7]. Standard EN 50126 has legally established safety requirements for railway signalling and interlocking. The quantitative standards are defined according to the application of the SCS:
SIL1: $Q_d = 1\cdot10^{-6}$, SIL2: $Q_d = 1\cdot10^{-7}$, SIL3: $Q_d = 1\cdot10^{-8}$, SIL4: $Q_d = 1\cdot10^{-9}$.
Standards can also be identified for the aviation (FAA), automotive (ISO 26262), medical (IEC 62304), nuclear (IEC 61513) industries, etc.

1.3. Technical Solutions
Once the quantitative standards of safety have been established, it is not relevant for the user how they have been achieved. Although it is not mandatory, in the control of RTP of the first group it may be appropriate to apply the fail-safe approach. Modern computer-based systems use a modification of this principle (quasi fail-safe), as in figure 1: a computer device F is assigned to operate as functional
information technologies
and control
2 2014 9
and another specific device K checks whether it performs its functions correctly and without failures [8]. If K detects a failure, it switches off the controlled object CO, which is the defined safe condition.

When controlling processes of the second group, safety is achieved through high reliability. For this purpose redundancy in different forms is used: dual channel structures, N-version programming, homogeneous or diverse redundancy, TMR or more complex M ∨ N schemes, ring network structures, etc. Special requirements are also placed on the telecommunication systems, which must contain all necessary safety-related mechanisms [5].

[Figure 1: F, K, CO (quasi fail-safe)]
Figure 1. Generalized architecture of computer-based fail-safe systems

1.4. Why Not «1 ∨ 1»
A simple example will demonstrate that for the needs of SCS it is inadmissible to apply the usual structure «1 of 1» [8].

With input vector $X_i(x_1, x_2, \dots, x_w)$, a combination of logic signals $Y(y_1, y_2, \dots, y_v)$ appears at the outputs of microprocessor μP (figure 2), and these signals form a vector of length $v$ bits. When the device is serviceable, the output vector is determined by the algorithm, the internal memory and the input data. The expected functional vector is $Y_i$, a result of the transformation $X_i \to Y_i$.

With failures of the microprocessor, incorrect vectors can be obtained, which differ from the real (functional) vector $Y_i$ in one, two or more digits ($d = 1, 2, 3, \dots$). Information in telecommunications is transferred with independent bits of the telegram, and the code distance between vectors is a powerful tool to detect distortion and correct errors in the communication channel. Microprocessor processing is quite different. The Hamming distance $d$ is not a means of protection. The probability of appearance of incorrect vectors does not depend on their Hamming distances to the correct vector.

[Figure 2: microprocessor μP with input vector X and output vector Y]
Figure 2. Summarized scheme of microprocessor

It is not possible to determine the probability distribution $p_i(i)$ of incorrect vectors for every possible Fault and Error, which, moreover, cannot be foreseen and given in advance. Without great loss of accuracy for the model, an equal probability distribution can be assumed, in which the probability of obtaining the $i$-th vector from the incorrect vectors of length $v$ is $q_i = 2^{-v}$. It is the same for every other vector. This also applies to the functional vector $Y_i$, which is one of all $2^v$ vectors and arises after a failure with the same probability. All other (incorrect) vectors have the summed probability that, after a failure, any incorrect vector appears:
(1) $p_\Sigma = 1 - \dfrac{1}{2^v} = \dfrac{2^v - 1}{2^v}$.
Let the probability of any failure in the microprocessor be $Q$. Then the probability $Q_i$ of post-failure appearance of the $i$-th vector ($i = 1, \dots, 2^v$) will be
(2) $Q_i = q_i Q = \dfrac{1}{2^v} Q$,
and the probability $Q_{hazard}$ of appearance of any incorrect vector is
(3) $Q_{hazard} = \dfrac{2^v - 1}{2^v} Q$.
The probability of an incorrect output signal with vector length $v = 8$ will be
(4) $Q_{hazard} = 0.996\,Q$.
It can be seen that the hazard $Q_{hazard}$ is approximately equal to the probability of failure, i.e. commensurable with unreliability. In a complex system containing thousands of components, even when they have an extremely low intensity of failures of the order of $\lambda = 1\cdot10^{-9}$ 1/h, the best mean time between failures (MTBF) that can be achieved is 10 000 ÷ 100 000 h [10], or a probability of failure $Q = 1\cdot10^{-5} \div 1\cdot10^{-6}$. These values are several orders of magnitude greater than the admissible one. The conclusion is that the concept «1 of 1» is inapplicable in SCS.

1.5. Dual-channel Solutions
Among the variety of conceptual and specific technical solutions mentioned above, there is a class of systems that has become the most widespread: dual-channel structures [1,7,8]. Either both structural units are constantly switched on ("2 ∨ 2" scheme) or the reserve one is switched on as standby ("1+1"). These systems are used in all areas of SCS application by world-famous companies such as Siemens, Bombardier, Thales, etc.

The dual-channel principle is a possible solution to the problems created by the computer-based Safety Critical F-K structure in figure 1. The most significant of them is the recognition of failures, which control device K must be capable of in order to identify the failure and promptly switch to the defined post-failure state. With absolute tests, the failure is identified slowly and technically inefficiently.

One of the most widely used tests is a relative test, wherein vector X applied to the input is processed in two
[Figure 3: input X feeds CHANNEL 1 (output Y1) and CHANNEL 2 (output Y2); COMPARATOR K issues OK towards CO]
Figure 3. Dual-channel structure

channels, 1 and 2, and output vectors $Y_1$ and $Y_2$ of the same length are compared using the principle "is – is". Their agreement is a measure of serviceability, upon which the comparator gives OK to implement control of the object (figure 3). The control device K conditionally includes the comparator and the second channel.

1.6. Subject and Purpose of the Study
Dual-channel computer-based structures are well known and widely used, but there are reliability properties and features that are not completely studied. They are the subject of study in this paper. The examination focuses in particular on dual-channel computer-based systems with comparison, of the type given in figure 3.

The aim of the study is to establish analytical models for quantitative determination of the reliability and safety of dual-channel SCS considering the effect of two factors: independence of failures in the channels and diversity of their software. Similar theoretical results can be found in [11,12,13] but they have been obtained with another formulation of the problem, relate only to some reliability indicators and do not take into account the effect of both factors. There is no quantitative assessment of the effect of "2 ∨ 2" on safety or of the structural parameters influencing it.

2. Independence of Channels
2.1. Two Groups of Reasons for Undetected Failures
It is known that if two events A and B are independent, then the conditional probability of each, provided that the other one has occurred, is equal to its unconditional probability: the probability that event A happens does not depend on whether event B has or has not occurred, i.e. the equality $P(AB) = P(A)\,P(B)$ holds.

The flow of failures and recoveries in systems of electronics, computers and telecommunications is recognized to be a Poisson flow [13,15]. It has the property of lack of after-effect, which means the independence of successively occurring failures. A failure does not cause the next one and is not correlated with it. This independence holds for the failures of an individual channel and it extends to both channels.

The hazard in Safety Critical dual-channel systems, consisting in the release of a non-functional and possibly hazardous signal, can be created by two groups of faults and errors:
1. Causes for failures common to both channels: Common-Mode Failures (CMF) [11] that:
• are due to errors in the general specification, in manufacturing and operation, or to environment effects such as electromagnetic interference, weather conditions, etc.;
• are contained in components common to both channels: power supply, input-output organization, common software, comparison, etc.
2. Simultaneous independent failures in both channels that cause erroneous but accidentally identical vectors, due to which the failures remain unidentified (accidentally non-identifiable, ANI).

The first group. CMF causes are common to the two channels and affect both end results in the same way. Let them be called η-Faults, and let the intensity of failures caused by them be $\lambda_\eta$. They lead to failures unrecognisable by comparison.

The second group includes Faults that are generally recognisable. Let them be called α-Faults, and let the intensity of failures caused by them be $\lambda_\alpha$. The safety of a dual channel system is based on the assumption that the channels are independent, CMF causes are minimized and the probability of an unrecognized α-Fault is small enough.

Of course, the relationship between the main flow of failures $\lambda_{2v2}$ of both kinds and the reasons causing them is mediated by the input data and the operation algorithms of the channels.

2.2. Probability of Accidentally Undetected Failures
For erroneous but equal output vectors to appear, they must be distorted in the same manner in both channels and become one and the same post-failure vector $Y_j$. With an equally probable distribution of incorrect vectors, the probability of this is
(5) $q_{i1}\, q_{i2}\big|_{q_1 = q_2} = \dfrac{1}{2^v}\cdot\dfrac{1}{2^v} = \left(\dfrac{1}{2^v}\right)^2$.
Example. Let $v = 3$, and let the vectors be labeled in ascending order of their decimal values. 8 vectors are obtained:
000  1    100  5
001  2    101  6
010  3    110  7
011  4    111  8

After a failure in the dual-channel system "2 ∨ 2", the combinations of vectors in both channels can be formed according to matrix (6), where the first digit of each entry is the vector number in channel 1 and the second digit is the vector number in channel 2.

(6)
                        Channel 1
             11 21 31 41 51 61 71 81
             12 22 32 42 52 62 72 82
             13 23 33 43 53 63 73 83
Channel 2    14 24 34 44 54 64 74 84
             15 25 35 45 55 65 75 85
             16 26 36 46 56 66 76 86
             17 27 37 47 57 67 77 87
             18 28 38 48 58 68 78 88

At each moment only one vector is correct: the functional vector $Y_i$. Let it be No 3. In the matrix shown above combination 33 is marked, which means that both channels have generated this vector. With reliable operation the controlling effect is validated (OK) by this combination. However, OK at the output of the comparator will be obtained not only in the 3rd position. The comparator will also be "misled" in the $j$-th, $k$-th positions, etc., i.e. in all other $2^v - 1$ positions with equal vector numbers. The probability of obtaining any one of them is $q \cdot q = \left(\dfrac{1}{2^v}\right)^2$. In this particular case there are 7 of them. The probability of a false OK will be $(2^v - 1)q^2 = (1 - q)q$, and the probability of accidental non-identification $Q_{ani}$ will be
(7) $Q_{ani} = (1 - q)\,q\,Q_{\alpha 1} Q_{\alpha 2} = (2^v - 1)\,\dfrac{Q_{\alpha 1}}{2^v}\cdot\dfrac{Q_{\alpha 2}}{2^v}$,
where $Q_{\alpha 1}$ and $Q_{\alpha 2}$ are the probabilities of appearance of failures in channel 1 and channel 2.

Equation (7) can be used to quantitatively evaluate the probability of accidental non-identification $Q_{ani}$ of simultaneous independent failures in both channels.

3. Influence of Independence of Failures on Reliability-Safety Features of a Dual-Channel System
3.1. Equivalent Reliability Scheme
System «2 ∨ 2» is serial in reliability. If any of its elements fails, the system also fails. If channel 1 generates a flow with intensity $\lambda_{\alpha 1}$ and channel 2 generates a flow with intensity $\lambda_{\alpha 2}$, then the overall intensity of α-Faults is (figure 4a, 4b)
(8) $\lambda_\alpha = \lambda_{\alpha 1} + \lambda_{\alpha 2}$.
With channels of equal reliability, the intensity of failures in each one is
(9) $\lambda_{\alpha 1} = \lambda_{\alpha 2} = 0.5\,\lambda_\alpha$.
The intensity of all failures is the sum of the intensities of all the components of the scheme (figure 4b, 4c)
(10) $\lambda_{2v2} = \lambda_{\alpha 1} + \lambda_{\alpha 2} + \lambda_\eta = \lambda_\alpha + \lambda_\eta = \lambda_{\alpha f} + \lambda_{\alpha\varepsilon} + \lambda_{\eta f} + \lambda_{\eta\varepsilon}$.

3.2. Function of Reliability R(t)
The dual channel computer-based system is reliable in two cases:
1. When there is no CMF and in neither channel is there a recognisable failure caused by an α-Fault;
2. When failures recognisable by comparison have accidentally produced functional instead of incorrect vectors.

In the first case the system is serviceable: there is no CMF ($R_\eta$) and each of the two channels, with reliabilities $R_{\alpha 1}$ and $R_{\alpha 2}$, is serviceable: $R' = R_\eta R_{\alpha 1} R_{\alpha 2}$.

In the second case there is an α-Fault in one of the two channels or in both channels, but by chance they have generated the functional vector $Y_i$. In the example mentioned above (6) this means that the functional combination 33 can also be obtained with failures in one of the channels and even in both channels. Since the events are incompatible, these components of reliability have to be summed to obtain the total reliability for the second case
(11) $R'' = R_\eta\left(R_{\alpha 1}\dfrac{Q_{\alpha 2}}{2^v} + R_{\alpha 2}\dfrac{Q_{\alpha 1}}{2^v} + \dfrac{Q_{\alpha 1} Q_{\alpha 2}}{2^{2v}}\right)$.

[Figure 4: series schemes (a) α1, α2, η; (b) α, η; (c) αf, αε, ηf, ηε]
Figure 4. Equivalent circuit of a dual channel structure

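The counting argument behind matrix (6) can be checked by brute force. The Python sketch below enumerates every pair of channel states under the equal-probability assumption and compares the totals with the closed forms (7) and (11), the latter taken without the factor $R_\eta$. The function names and the use of exact `Fraction` arithmetic are illustrative assumptions, not part of the original study.

```python
from fractions import Fraction
from itertools import product


def channel_outcomes(q_fail, n, functional):
    """Yield (failed, vector, probability) for one channel.

    Assumed model (taken from the text): a serviceable channel outputs the
    functional vector; a failed channel outputs any of the n = 2**v vectors
    with equal probability 1/n.
    """
    yield False, functional, 1 - q_fail
    for vector in range(n):
        yield True, vector, q_fail / n


def enumerate_two_channels(v, q1, q2, functional=2):
    """Return (Q_ani, R''/R_eta) by summing over all pairs of channel states.

    functional=2 is the 0-based index of vector No 3 from the example.
    """
    n = 2 ** v
    q_ani = Fraction(0)     # equal but incorrect vectors: false OK
    r_second = Fraction(0)  # functional vectors despite at least one failure
    pairs = product(channel_outcomes(q1, n, functional),
                    channel_outcomes(q2, n, functional))
    for (f1, y1, p1), (f2, y2, p2) in pairs:
        if y1 != y2:
            continue  # the comparator sees a mismatch and withholds OK
        if y1 != functional:
            q_ani += p1 * p2
        elif f1 or f2:
            r_second += p1 * p2
    return q_ani, r_second


def closed_forms(v, q1, q2):
    """Closed forms (7) and (11), the latter without R_eta."""
    n = 2 ** v
    q_ani = (n - 1) * (q1 / n) * (q2 / n)
    r_second = (1 - q1) * q2 / n + (1 - q2) * q1 / n + q1 * q2 / n ** 2
    return q_ani, r_second


if __name__ == "__main__":
    args = (3, Fraction(1, 10), Fraction(1, 5))
    print(enumerate_two_channels(*args) == closed_forms(*args))
```

Exact fractions are used so that the enumeration and the closed forms can be compared for equality rather than within a floating-point tolerance.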
Furthermore, the first case and the second one are incompatible, and therefore the sought probability $R_{2v2}$ that the dual channel system "2 ∨ 2" is serviceable is modelled by their sum, which after processing reduces to
(12) $R_{2v2} = R' + R'' = R_\eta\left[\left(R_{\alpha 1} + \dfrac{Q_{\alpha 1}}{2^v}\right)\left(R_{\alpha 2} + \dfrac{Q_{\alpha 2}}{2^v}\right)\right]$.
With equal reliability of the two channels and a homogeneous Poisson process with constant event rate ($\lambda = \mathrm{const}$), the reliability of the dual channel structure is obtained in the form
(13) $R_{2v2}\big|_{\lambda = \mathrm{const}} = e^{-\lambda_\eta t}\left(e^{-0.5\lambda_\alpha t} + \dfrac{1 - e^{-0.5\lambda_\alpha t}}{2^v}\right)^2$.

3.3. Probability of Hazard Failure Qh(t)
The logic connected with the two reasons for an unidentified and possibly hazardous failure (p. 2.1) is shown graphically in figure 5.

[Figure 5: logical OR of η (CMF) and α (ani)]
Figure 5. Logical circuit of non-identification of failures

CMF-failure events and accidentally unrecognised α-failures, unlike the events in p. 3.2, are compatible and their total intensity is not the sum of their partial intensities. The Boolean function of non-identification $F_{ni}$, when a failure is not logically detected, is
(14) $F_{ni} = z_\eta \vee z_{ani}$,
where the logical variable $z_\eta^{0/1}$ (1 when there is a failure, 0 when there is no failure) is the logical variable of failures of CMF origin (p. 2.2) and $z_{ani}^{0/1}$ corresponds to accidentally unrecognised simultaneous failures in both channels.

To model the probability of non-identification of failures, a logical-probabilistic transition from (14) has to be made. Applying De Morgan's theorem gives a non-repeating Boolean function in the "conjunction-negation" basis, $\overline{F_{ni}} = \overline{z_{ani}} \cdot \overline{z_\eta}$, which is suitable for direct substitution. Applying the rules of logical-probabilistic transitions, the probability of not detecting failures is obtained as
(15) $Q_{ni\,2v2} = 1 - (1 - Q_\eta)(1 - Q_{ani})$,
where $Q_{ni\,2v2}$ is the sought probability of an unidentified failure, $Q_\eta$ is the probability of an unrecognisable CMF failure, and $Q_{ani}$ is the probability of an accidentally unidentified recognisable failure. Substituting (7) into (15), the probability of not recognizing a failure is
(16) $Q_{ni\,2v2} = 1 - R_\eta\left[1 - (2^v - 1)\,\dfrac{Q_{\alpha 1}}{2^v}\cdot\dfrac{Q_{\alpha 2}}{2^v}\right]$.
With exponential distribution and equally reliable channels, the probability of an unidentified failure is
(17) $Q_{ni\,2v2}\big|_{Q_{\alpha 1} = Q_{\alpha 2}} = 1 - e^{-\lambda_\eta t}\left[1 - (2^v - 1)\left(\dfrac{1 - e^{-0.5\lambda_\alpha t}}{2^v}\right)^2\right]$.

3.4. Probability of a Safety Failure Qs(t)
Safety failures are created only by α-Faults; CMF failures do not affect the Safety status. The failure is recognized, and the OK signal is withheld, in two cases.

In the first case, one of the channels operates and the other has failed, or vice versa, but the output vector of the failed one differs from the correct vector of the operating channel. Taking into account that the events are incompatible and that the failed channel can generate any of the $2^v - 1$ incorrect vectors, the probability of this is a sum of probabilities, i.e.
(18) $Q'_{s\,2v2} = \dfrac{2^v - 1}{2^v} Q_{\alpha 1} R_{\alpha 2} + \dfrac{2^v - 1}{2^v} Q_{\alpha 2} R_{\alpha 1}$.
In the second case both channels have failed and generate different output signals (with different numbers, see (6)). The probability of this is the sum of the probabilities of all such cases
(19) $Q''_{s\,2v2} = \sum_{\forall j \neq k} q_j Q_{\alpha 1}\, q_k Q_{\alpha 2}\Big|_{q_k = q_j = q} = \left(\dfrac{2^v - 1}{2^v}\right)^2 Q_{\alpha 1} Q_{\alpha 2}$.
Since the two cases are mutually exclusive, the probability of a safety failure of the dual-channel structure is the sum of their probabilities
(20) $Q_{s\,2v2} = Q'_{s\,2v2} + Q''_{s\,2v2} = \dfrac{2^v - 1}{2^v}\left(Q_{\alpha 1} R_{\alpha 2} + Q_{\alpha 2} R_{\alpha 1}\right) + \left(\dfrac{2^v - 1}{2^v}\right)^2 Q_{\alpha 1} Q_{\alpha 2}$.
With exponential distribution and equal reliabilities of both channels, the probability of a safety failure is
(21) $Q_{s\,2v2}\big|_{Q_{\alpha 1} = Q_{\alpha 2}} = \dfrac{2^v - 1}{2^v}\left(1 - e^{-0.5\lambda_\alpha t}\right)\left[2e^{-0.5\lambda_\alpha t} + \dfrac{2^v - 1}{2^v}\left(1 - e^{-0.5\lambda_\alpha t}\right)\right]$.

4. Channel Diversity Effect on Dual-Channel System Reliability and Safety Features
4.1. Diversity
The channels of a dual-channel structure can be homogeneous or diverse.

Diversity is a method of solving a problem (logical, technical, etc.) in two different ways (A and B) based on the same input data. As is known, the difference may consist in a divergence of approach and method of problem solving, or in the implementation of various principles or various company technologies.
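The closed-form expressions (13), (17) and (21) of Section 3 are straightforward to evaluate numerically. The following Python sketch does so for equally reliable channels with constant failure rates; the function and parameter names (`lam_alpha`, `lam_eta`, `t`, `v`) are chosen here for illustration and are not taken from the original study.

```python
import math


def channel_unreliability(lam_alpha, t):
    """Q_alpha1 = Q_alpha2 for equal channels, each with rate 0.5 * lam_alpha (eq. 9)."""
    return 1.0 - math.exp(-0.5 * lam_alpha * t)


def reliability_2v2(lam_alpha, lam_eta, t, v):
    """Reliability function of the 2-of-2 structure, equation (13)."""
    n = 2 ** v
    q = channel_unreliability(lam_alpha, t)
    return math.exp(-lam_eta * t) * ((1.0 - q) + q / n) ** 2


def q_ni_2v2(lam_alpha, lam_eta, t, v):
    """Probability of an unidentified failure, equation (17)."""
    n = 2 ** v
    q = channel_unreliability(lam_alpha, t)
    return 1.0 - math.exp(-lam_eta * t) * (1.0 - (n - 1) * (q / n) ** 2)


def q_s_2v2(lam_alpha, t, v):
    """Probability of a safety (recognised) failure, equation (21)."""
    n = 2 ** v
    q = channel_unreliability(lam_alpha, t)
    k = (n - 1) / n
    return k * q * (2.0 * (1.0 - q) + k * q)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    for name, value in (("R", reliability_2v2(1e-4, 1e-6, 1e3, 8)),
                        ("Q_ni", q_ni_2v2(1e-4, 1e-6, 1e3, 8)),
                        ("Q_s", q_s_2v2(1e-4, 1e3, 8))):
        print(name, value)
```

Setting `lam_alpha = 0` in `q_ni_2v2` reduces (17) to the single-channel unreliability $1 - e^{-\lambda_\eta t}$, which is a convenient sanity check.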
Software diversity is the most widespread. The difference may lie in algorithms, programming languages, data presentation (inverse, reverse), etc. It is usually achieved through diversity and independence of the programming teams solving versions A and B of the problem. If the difference is in methods and algorithms, it is said to be artificial (forced) diversity, which in coding can be achieved by one and the same team.

Diversity is the most effective tool for detecting errors. Its effectiveness is due to the properties of errors as opposed to the properties of faults. When the causes of failures are hardware faults, the failures do not depend on whether the channels are homogeneous or diverse. Each α-Fault has its own specific effect. The situation is different with errors. Errors (in design, construction, programming, documentation, technology, etc.) are systematic, «by birth», one and the same for the whole produced series. If both channels run copies of one and the same program, A ≡ B, the errors of the single software lead to one and the same incorrect results and the failures remain unrecognisable. When the channels run different programs, A ≠ B, errors are detected because they are not one and the same: they are of accidental nature and at random locations in the software, and therefore lead to non-matching results. In channels with deep diversity there is no dependence between errors ($\lambda_\eta \approx 0$) and they can be treated as α-Faults, like hardware faults.

4.2. Schemes of Diversity Implementations
The scheme used to implement diversity can be different, e.g.:
1. 2H+2S: two channels 1 and 2 operate in parallel or in sequence over time under different programs. A and B can be separate and independent in processing, transfers, records, etc., but are supplied with the same input information. Hardware faults, pulse interference and software errors are recognisable. This scheme is the most effective but very costly in terms of resources.
2. 2H+1S: two hardware channels operate in parallel on one and the same program, A = B. Hardware failures are recognisable but software errors are CMF-Faults and cannot be identified. When the channels operate synchronously and in phase, pulse interference is not recognisable either.
3. 1H+2S: two different programs are executed by one computer in sequence. Although created by independent teams, if the programs prove to be very close in the way they use the hardware, a failure can have the same effect on processing and on the resulting output vectors. As a result, matching but incorrect results can be obtained and failures will remain unrecognisable by comparison.

4.3. Equivalent Scheme of Diversity System Reliability
Failures can be due both to hardware faults $f$ and to software errors $\varepsilon$. Hence the reliability of any 2H+2S system depends on both α-Faults and η-Faults. In the present context the system is serial in reliability (figure 4c). Each of the α, η, f, ε failures is independent of the others. No matter where failures occur, in channel 1 or 2, they result in non-serviceability of the entire system. The total intensity of failures is the sum of the intensities of the different kinds of failures:
(22) $\lambda_{2v2} = \lambda_\alpha + \lambda_\eta = \underbrace{\lambda_{\alpha f} + \lambda_{\alpha\varepsilon}}_{\lambda_\alpha} + \underbrace{\lambda_{\eta f} + \lambda_{\eta\varepsilon}}_{\lambda_\eta} = \underbrace{\lambda_{\alpha f} + \lambda_{\eta f}}_{\lambda_f} + \underbrace{\lambda_{\alpha\varepsilon} + \lambda_{\eta\varepsilon}}_{\lambda_\varepsilon}$,
where
$\lambda_\alpha$: intensity of α-failures recognisable by comparison of output results;
$\lambda_{\alpha f}$: intensity of α-failures due to faults;
$\lambda_{\alpha\varepsilon}$: intensity of α-failures due to errors;
$\lambda_\eta$: intensity of η-failures unrecognisable by comparison;
$\lambda_{\eta f}$: intensity of η-failures due to general faults;
$\lambda_{\eta\varepsilon}$: intensity of η-failures due to general errors.

Two separate, partial metrics for independence of failures are introduced, one for each of the two causes:
• For hardware faults $f$
(23) $\varphi = \dfrac{\lambda_{\alpha f}}{\lambda_{\alpha f} + \lambda_{\eta f}} = \dfrac{\lambda_{\alpha f}}{\lambda_f}$, $\quad \lambda_{\alpha f} = \varphi\lambda_f$, $\quad \lambda_{\eta f} = (1 - \varphi)\lambda_f$.
• For software errors $\varepsilon$
(24) $\Delta = \dfrac{\lambda_{\alpha\varepsilon}}{\lambda_{\alpha\varepsilon} + \lambda_{\eta\varepsilon}} = \dfrac{\lambda_{\alpha\varepsilon}}{\lambda_\varepsilon}$, $\quad \lambda_{\alpha\varepsilon} = \Delta\lambda_\varepsilon$, $\quad \lambda_{\eta\varepsilon} = (1 - \Delta)\lambda_\varepsilon$.
With this substitution the probability of an unidentified failure (17) takes the form
(25) $Q_{ni\,2v2}\big|_{Q_{\alpha 1} = Q_{\alpha 2}} = 1 - e^{-(\lambda_{\eta f} + \lambda_{\eta\varepsilon})t}\left[1 - (2^v - 1)\left(\dfrac{1 - e^{-0.5(\lambda_{\alpha f} + \lambda_{\alpha\varepsilon})t}}{2^v}\right)^2\right]$,
and with the partial metrics $\varphi$ and $\Delta$ included
(26) $Q_{ni\,2v2} = 1 - e^{-[(1 - \varphi)\lambda_f + (1 - \Delta)\lambda_\varepsilon]t}\left[1 - (2^v - 1)\left(\dfrac{1 - e^{-0.5(\varphi\lambda_f + \Delta\lambda_\varepsilon)t}}{2^v}\right)^2\right]$.
From the general formula (26) it follows that:
• When all failures are due to CMF causes, $\varphi = 0$ and $\Delta = 0$, the probability of non-identification is maximal
(27) $Q_{ni\,2v2\,max}\big|_{\varphi = \Delta = 0} = 1 - e^{-(\lambda_f + \lambda_\varepsilon)t} = 1 - e^{-\lambda_{2v2}t} = Q_{2v2}$,
i.e. all system failures remain unidentified, and only one of them, the one that leads to a functional vector, is not potentially hazardous.
• When all failures are independent, $\varphi \to 1$ and $\Delta \to 1$, the probability of non-identification is minimal
(28) $Q_{ni\,2v2\,min} \approx (2^v - 1)\left(\dfrac{1 - e^{-0.5(\lambda_f + \lambda_\varepsilon)t}}{2^v}\right)^2$.

5. Improvement of SCS Safety as a Function of Independence and Diversity
5.1. Formal Model
Ratio $\xi$ between the probability of non-identification $Q_{ni\,2v2\,max}$ (27) when the system is virtually reduced to a
single-channel one and its current value for the general case (26) is introduced
(29) $\xi = \dfrac{1 - e^{-(\lambda_f + \lambda_\varepsilon)t}}{1 - \left[1 - (2^v - 1)\left(\dfrac{1 - e^{-0.5(\varphi\lambda_f + \Delta\lambda_\varepsilon)t}}{2^v}\right)^2\right]e^{-[(1 - \varphi)\lambda_f + (1 - \Delta)\lambda_\varepsilon]t}}$.
Equation (29) can be used to calculate the improvement (in times) of safety due to greater independence of failures in the channels. The maximum improvement of safety is measured by the ratio $\xi_{max}$ of the maximum (27) and minimum (28) values of the probability of non-identification of failures
(30) $\xi_{max} = \dfrac{1 - e^{-(\lambda_f + \lambda_\varepsilon)t}}{(2^v - 1)\left(\dfrac{1 - e^{-0.5(\lambda_f + \lambda_\varepsilon)t}}{2^v}\right)^2}$ [times].
With realistic values $\lambda_{2v2} = \lambda_f + \lambda_\varepsilon = 1\cdot10^{-4}$ 1/h, $t = 1\cdot10^4$ h and $v = 8$ bits, the probability of hazard failure in the dual-channel system will be reduced $\xi_{max} = 1\,045$ times.

5.2. Case Studies
System 2H+2S. In the individual hardware channels $\varphi \approx 1$, and in the two parallel channels different, independent programs are used. That is why software diversity is deep and error recognition is very high, $\Delta \approx 1$:
(31) $Q_{ni\,2v2} \approx 1 - \left[1 - (2^v - 1)\left(\dfrac{1 - e^{-0.5(\lambda_f + \lambda_\varepsilon)t}}{2^v}\right)^2\right]$.
The improvement of safety is greatest and may be calculated by (30).

System 2H+1S. Most often one and the same software is used in both synchronously working channels, and errors lead to unrecognisable incorrect output results. There is no software diversity, failures due to hardware faults are both η and α, and recognition of errors is practically zero, $\Delta \approx 0$:
(32) $Q_{ni\,2v2} = 1 - e^{-[(1 - \varphi)\lambda_f + \lambda_\varepsilon]t}\left[1 - (2^v - 1)\left(\dfrac{1 - e^{-0.5\varphi\lambda_f t}}{2^v}\right)^2\right]$.

System 1H+2S. A dual-channel system of one hardware channel with two different programs has a metric for hardware independence in the interval $\varphi = 0 \div 1$, which is closer to the desired value 1 the more differently the two separate programs use the hardware. The identification of failures whose cause is errors $\varepsilon$ lies in the same interval. The formula for calculation is (26), the same as for 2H+2S, but with the corresponding data of the channels.

5.3. Models for Recoverable Systems
The reliability function $R(t)$ of non-recoverable systems (or recoverable ones up to the first failure), which includes the operation time until failure $t$, and the coefficient of availability $K_a = \dfrac{\mu}{\mu + \lambda}$ of recoverable systems, which depends on the service restoration rate $\mu$, are similar probabilistic quantities. Both are used to measure the probability that the object is available. Using this analogy and summing up the results, formal models of the reliability and safety indicators are given in the table.

Formulas for determining reliability and safety indicators

Reliability and availability
  Unrecoverable systems: $R_{2v2} = R_\eta\left(R_{k\alpha} + \dfrac{Q_{k\alpha}}{2^v}\right)^2$
  Recoverable systems: $K_{a\,2v2} = K_\eta\left(K_{k\alpha} + \dfrac{1 - K_{k\alpha}}{2^v}\right)^2$

Probability of unidentified failures
  Unrecoverable systems: $Q_{ni\,2v2} = 1 - R_\eta\left[1 - (2^v - 1)\left(\dfrac{Q_{k\alpha}}{2^v}\right)^2\right]$
  Recoverable systems: $K_{ni\,2v2} = 1 - K_\eta\left[1 - (2^v - 1)\left(\dfrac{1 - K_{k\alpha}}{2^v}\right)^2\right]$

Probability of safety failure
  Unrecoverable systems: $Q_{s\,2v2} = \dfrac{2^v - 1}{2^v} Q_{k\alpha}\left[2R_{k\alpha} + \dfrac{2^v - 1}{2^v} Q_{k\alpha}\right]$
  Recoverable systems: $K_{s\,2v2} = \dfrac{2^v - 1}{2^v} K_{k\alpha}\left[2K_{k\alpha} + \dfrac{2^v - 1}{2^v}\left(1 - K_{k\alpha}\right)\right]$

With channels of equal reliability and $\lambda = \mathrm{const}$:
  $R_\eta = e^{-[(1 - \varphi)\lambda_f + (1 - \Delta)\lambda_\varepsilon]t}$, $\quad K_\eta = \dfrac{\mu}{\mu + (1 - \varphi)\lambda_f + (1 - \Delta)\lambda_\varepsilon}$,
  $R_{k\alpha} = e^{-0.5(\varphi\lambda_f + \Delta\lambda_\varepsilon)t}$, $\quad K_{k\alpha} = \dfrac{\mu}{\mu + 0.5(\varphi\lambda_f + \Delta\lambda_\varepsilon)}$.
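The recoverable-system column of the table can be evaluated in the same way as the unrecoverable one. The sketch below, in Python, implements only the availability and the probability of unidentified failures, using the definitions of $K_\eta$ and $K_{k\alpha}$ given in the table; the function names and argument order are assumptions made for this illustration.

```python
def k_eta(mu, lam_f, lam_eps, phi, delta):
    """Availability with respect to CMF (eta) failures, from the table."""
    return mu / (mu + (1.0 - phi) * lam_f + (1.0 - delta) * lam_eps)


def k_channel(mu, lam_f, lam_eps, phi, delta):
    """Availability of one channel with respect to alpha failures, from the table."""
    return mu / (mu + 0.5 * (phi * lam_f + delta * lam_eps))


def availability_2v2(mu, lam_f, lam_eps, phi, delta, v):
    """K_a of the dual-channel structure (row 'Reliability and availability')."""
    n = 2 ** v
    k = k_channel(mu, lam_f, lam_eps, phi, delta)
    return k_eta(mu, lam_f, lam_eps, phi, delta) * (k + (1.0 - k) / n) ** 2


def k_ni_2v2(mu, lam_f, lam_eps, phi, delta, v):
    """K_ni of the dual-channel structure (row on unidentified failures)."""
    n = 2 ** v
    k = k_channel(mu, lam_f, lam_eps, phi, delta)
    unavailable = (1.0 - k) / n
    return 1.0 - k_eta(mu, lam_f, lam_eps, phi, delta) * (
        1.0 - (n - 1) * unavailable ** 2)


if __name__ == "__main__":
    # Hypothetical values: mu = 1 1/h, lam_f = 0.1 1/h, no software errors.
    for phi in (0.0, 0.5, 0.9, 1.0):
        print(phi, k_ni_2v2(1.0, 0.1, 0.0, phi, 0.0, 8))
```

With $\varphi = \Delta = 0$ the channel term vanishes and `k_ni_2v2` reduces to $\lambda/(\mu + \lambda)$, the unavailability of a single-channel system.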
Figure 6. Non-identification of failures $K_{ni\,2v2}(\lambda, \varphi)$
Figure 7. Improvement of safety $\xi$ as a function of independence $\varphi$
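The improvement ratio of equations (29) and (30) in Section 5.1 can be computed directly. The Python sketch below is one possible transcription; the function names are illustrative assumptions, and the parameter values in the usage example are the ones quoted in Section 5.1.

```python
import math


def xi(lam_f, lam_eps, t, phi, delta, v):
    """Improvement ratio of equation (29).

    Numerator: single-channel probability (27).
    Denominator: general dual-channel probability of non-identification (26).
    Requires t > 0 and a non-zero total failure rate.
    """
    n = 2 ** v
    q_single = 1.0 - math.exp(-(lam_f + lam_eps) * t)
    q_k = 1.0 - math.exp(-0.5 * (phi * lam_f + delta * lam_eps) * t)
    r_eta = math.exp(-((1.0 - phi) * lam_f + (1.0 - delta) * lam_eps) * t)
    q_ni = 1.0 - r_eta * (1.0 - (n - 1) * (q_k / n) ** 2)
    return q_single / q_ni


def xi_max(lam_total, t, v):
    """Maximum improvement ratio of equation (30), i.e. (27) divided by (28)."""
    n = 2 ** v
    q_single = 1.0 - math.exp(-lam_total * t)
    q_k = 1.0 - math.exp(-0.5 * lam_total * t)
    return q_single / ((n - 1) * (q_k / n) ** 2)


if __name__ == "__main__":
    # Values from Section 5.1: lambda = 1e-4 1/h, t = 1e4 h, v = 8 bits.
    print(xi_max(1e-4, 1e4, 8))
```

For these inputs the sketch gives a ratio of roughly a thousand. Evaluating `xi` with `phi = delta = 0` returns 1, which reflects the reduction of the structure to a single channel.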
6. Examination on Independence and Diversity Effects

To establish the effect of failure independence and diversity of channels, calculations for the schemes examined in p. 5.2 have been carried out with different values of the parameters involved. Here, because of limited space, the results are given only for one of the most common schemes: 2H+1S. Considering that perfection of the software can be achieved and demonstrated for relatively simple problems using the methods of error-free programming [2], it is assumed that software CMF causes have been reduced to zero ($\lambda_\varepsilon = 0$). Only failures due to faults with intensity $\lambda_f$ remain. Under these constraints, according to (32), the probability of non-identification of failures in the dual-channel

Figure 8. Dependence of unrecognised failures on the vector lengths

system with equal reliability of channels is
(33) $Q_{ni\,2v2} = 1 - e^{-(1 - \varphi)\lambda_f t}\left[1 - \dfrac{2^v - 1}{2^{2v}}\left(1 - e^{-0.5\varphi\lambda_f t}\right)^2\right]$,
and for recoverable systems with total intensity of recovery $\mu$
(34) $K_{ni\,2v2} = 1 - \dfrac{\mu}{\mu + 0.5(1 - \varphi)\lambda_f}\left[1 - \dfrac{2^v - 1}{2^{2v}}\left(\dfrac{0.5\varphi\lambda_f}{\mu + 0.5\varphi\lambda_f}\right)^2\right]$.
The results for function $K_{ni\,2v2}(\lambda, \varphi)$ are shown in figure 6. It can be seen that the probability of non-identification of failures grows with their intensity, reaching the highest values at $\varphi = 0$, when the structure is reduced to a single-channel one. The probability $K_{ni\,2v2}$ decreases sharply with increasing independence of the channels. This process is particularly sensitive as $\varphi \to 1$. This sensitivity is even better illustrated in figure 7.

In particular, attention should be paid to the comparison of a dual-channel system with a single-channel one «1 ∨ 1» for various values of the influencing factors. Using equation (33) and applying it to this case, the effect on safety is found to be
(35) $\xi = \dfrac{1 - e^{-\lambda_f t}}{1 - e^{-(1 - \varphi)\lambda_f t}\left[1 - \dfrac{2^v - 1}{2^{2v}}\left(1 - e^{-0.5\varphi\lambda_f t}\right)^2\right]}$,
and when $\varphi = 1$: $\xi_{max} = \dfrac{1 - e^{-\lambda_f t}}{\dfrac{2^v - 1}{2^{2v}}\left(1 - e^{-0.5\lambda_f t}\right)^2}$.
The graphs of this formula are shown in figure 7. It is seen that independence of failures has a particularly strong effect on highly reliable systems ($\lambda_f t \to 0$). The greater the intensity of failures and/or the older the system becomes, the smaller is the effect of independence of the channels. Thus with 8 bits, $\varphi \to 1$ and $\lambda_f \to 0$, a 154 202 times smaller probability of potentially hazardous failure is reached, and after aging the improvement declines sharply. But even with values of practical interest, $\lambda_f = 0.1$ [1/h] and $\varphi = 0.9$, the ratio $\xi = 4942$ times remains impressive.

Figure 8 shows the dependency $K_{ni\,2v2}(v, \varphi)$ with $\mu = 1$ [1/h]. From the graphs it is seen that with increasing vector length the probability of non-identification decreases sharply. The greater the independence of the channels, the stronger this effect.

From the results shown above it can be concluded that all design and technological measures have to be taken to reduce the CMF component to zero. As for the hardware solutions, this is a largely achievable task. The situation with software is different. Errors $\varepsilon$ in complex software systems are a source of CMF that can be overcome by only a few means, one of which is diversity $\Delta$.

Conclusion
This study is an attempt to model the reliability and safety performance of a widespread class of Safety Critical Computer Systems. It has become clear on what, and how, the reliability and the probabilities of recognised and unrecognised failures depend. It has been confirmed that independence of the two channels in dual-channel structures is crucial for iden-
and control
2 2014 17
Unauthenticated
tification of failures and hence for safety operation of sys- Level (SIL). Evaluation Techniques, 2002.
tems. The contribution of this paper is the quantitative 7. EN 50126 EN50126. The Specification and Demonstration of
models used to evaluate these features. They show that the Reliability, Availability, Maintainability and Safety – RAMS.
positive effect on safety of dual-channel nature is extremely 8. Christov, Chr., N. Stoytcheva, M. Christova. Diversity as a
Mean for Reliability and Safety. © Springer-Verlag, Berlin Heidel-
strong, near the absolute independence of channels. When berg (2010), Transport Systems Telematics, Communications in
reliability parameters of specific systems have been studied Computer and Information Science, 104, 2010, 308-319.
by established formulas, their indicators can be calculated 9. Kantz, H. The ELEKTRA Railway Signalling System: Field
to determine compliance with safety standards (i.e. 1.3). Experience with an Actively Replicated System with Diversity.
Using these models and results of examinations carried out, Alcatel Austria AG, Wien, Austria, 1995.
it is possible to improve technical solutions or propose new 10. Thomson, Jim. Common-Mode Failure Considerations in High-
Integrity C&I Systems. Safety in Engineering. Retrieved November
ones with higher reliability and safety. 21, 2012.
11. Littlewood, B., V. Stankovic, L. Strigini. Introduction to
References Ñommon-mode Failure Probability and Diversity. City University
London, www.csr.city.ac.uk. 2009.
12. Martin, Sh. Reliability Computer Systems and Networks.
1. Teeg, G., S. Vlasenko, etc. Railway Signaling and Interlocking.
Wiley-Interscience, 2008.
Eurailpress. 2009.
13. Epstein, B., I. Weissman. Mathematical Models for Systems
2. Rierson, L. Developing Safety-Critical Software. CRC Press, Reliability. CRS HRESS, 2008.
2013.
14. Marcos Mainar Lalmolda. Testing Safety-critical Software Sys-
3. Hristov, H., V. Trifonov. Reliability and Security of Communi-
tems. The University Nottingham. 2009.
cations. Novi znania, 2007 (in Bulgarian).
15. Gindev, E. Introduction to the Theory and Practice of Reliabil-
4. Bowen, J., Y. Stavridon. Safety-critical Systems, Formal Methods ity (in Bulgarian). Academic Publishing House Marin Drinov, Sofia,
and Standards. – Software Engineering Journal, 1993.
.. 2000.
5. Franekova, M., P. Luley. Modelling of Failure Effects within
16. Hristova, M. Models and Algorithms for Use Redundancy in
Safety-Related Communications with Safety Code for Railway the Fault-Tolerance Systems (in Bulgarian). – Mechanics, Trans-
Applications. Mechanic, Transport, Communication, Sofia, 2015.
port, Communications, 2008, No. 1.
6. Elsevier, B. V. Safety Instrumented Functions – Safety Integrity
Manuscript received on 05.08.2015
Hristo Hristov, DSc, is a Professor at the Technical University of Sofia. He graduated from the Mechanical and Electrical Institute of Sofia, MSc programme in Telecommunications, in 1962. He defended a PhD thesis in Moscow (1972) and a thesis for the degree of “Doctor of Sciences” at the Technical University of Sofia (1988). He has developed and is the principal lecturer of various courses related mainly to safety-critical systems, which is also the subject of his textbooks. Eleven dissertations have been defended under his scientific supervision. Prof. Hristov is the author of over 330 scientific papers including 32 books (textbooks, manuals, and monographs), 31 inventions and more than 50 scientific projects. He was awarded the Honorary Gold Medal of the Technical University of Sofia and was elected a member of the Transport Academy of Russia and “Doctor Honoris Causa” of the St. Petersburg State University of Railways. He has been cooperating with the University of Technology of Ningbo, China, which commissioned this paper.

Contacts:
Ningbo University of Technology
201 Fenghua Road, Ningbo 315211
China
tel: +86574 89526502
and Technical University – Sofia
8 St. Kliment Ohridski Boulevard
1756 Sofia
Bulgaria
tel: +35989581612
e-mail: cac@tu-sofia.bg

Wang Bo, born in 1980, holds a doctoral degree in Economics and is now an associate professor at Ningbo University of Technology, China, a doctoral tutor at Sofia Transportation University of Bulgaria, and an honorary associate professor at Moscow National Aviation University of Technology. In academic research he has presided over five high-end projects launched by the Chinese Bureau of Foreign Experts; presided over and completed a longitudinal project launched by the provincial education department; presided over an 800-thousand-yuan horizontal project launched by Ningbo Traffic Detachment; taken a major part in two international cooperation projects launched by the Ministry of Science and Technology of China (China-Russia (3/6), China-Ukraine (2/6)); taken part in many municipal projects; and published nearly 20 papers as first author in core journals. His scientific research funds amount to nearly one million a year.

Contacts:
School of Xiangshan Research Institute
Ningbo University of Technology
201 Fenghua Road, Ningbo 315211
China
tel: +86574 89526502
e-mail: bo305@hotmail.com
