
Automotive meets Electronics · 12.–13.03.2019 · Dortmund

Reliability Assessment of a Redundant 12V On-board Power Supply Using Solid-state Safety Relays
Fabian Schipperges, M.Sc., Dr. Ing. h.c. F. Porsche AG, Weissach, Germany, fabian.schipperges{at}porsche.de
Felix Jialei Luo, M.Sc., Universität Stuttgart, Stuttgart, Germany
József Gábor Pázmány, M.Sc., Dr. Ing. h.c. F. Porsche AG, Weissach, Germany
Prof. Dr.-Ing. Bernard Bäker, Institut für Automobiltechnik Dresden - IAD, Dresden, Germany

Abstract
The first part of this paper presents a new fault-tolerant and redundant on-board power supply concept for supplying safety-relevant electronic control units (ECUs). Solid-state MOSFET-based safety switches ensure the protection of the battery cells. These switches are also used to isolate and tolerate electrical system faults. Fault-tolerant and redundant power supply systems are necessary to meet the functional safety requirements of automated driving. ISO 26262 provides methods and metrics to ensure the functional safety of electronic systems in vehicles. These include different fault metrics for a quantitative evaluation, which are usually determined with FMEDAs and FTAs. Additionally, ISO 26262 mentions Markov models as a further, but less widespread, quantitative evaluation method. The Markovian approach allows the calculation of probabilistic measures and the evaluation of any system state. Therefore, in the second part of this paper, a Markov model is developed for a safety function of the presented power supply concept. Using this Markov model, we assess the impact of the failure rates and the diagnostic coverages of the safety function on the system reliability.

1 Introduction
The development of driver assistance systems and increasing degrees of automated driving (AD) requires fault-tolerant on-board power supplies in order to fulfill functional safety requirements. The on-board power supply must be reliable to ensure the supply of safety-relevant functions and systems, e.g. electric power steering or monitoring of the driving environment. Malfunctions of these safety-relevant functions due to a supply blackout must be avoided. The related safety goals are of utmost importance and, according to ISO 26262, they are rated up to the highest level. To meet these safety goals, redundancy concepts are required.
ISO 26262 provides different fault metrics for the evaluation of these safety goals [1]. The first part of section 3 of this paper recapitulates these metrics. However, this work focuses on the evaluation method using Markov models, so the second part of section 3 explains the principles of this approach. This inductive evaluation method is less common in practice than Fault Tree Analysis (FTA) and Failure Modes, Effects and Diagnostic Analysis (FMEDA). Nevertheless, ISO 26262 mentions it as a quantitative analysis method as well.
In section 4, we apply this approach to the example of a safety mechanism. The safety function must detect overcurrents and interrupt them by triggering a safety switch. If this safety mechanism fails during a short circuit, a dangerous undervoltage can occur, so the proper supply voltage of the safety-relevant systems is no longer guaranteed.
The safety function is part of the power supply concept introduced in section 2. This concept provides a redundant power supply for safety-relevant electronic control units (ECUs) and is built from two 12V lithium-ion battery storages with the same technological characteristics and MOSFET-based safety switches. According to ISO 26262, this system is capable of achieving relevant ASIL D safety goals and supports the implementation of at least Level 3 functions [2].

2 System Description

2.1 Redundant Architectures
Redundant supply structures are common in aviation, energy supply systems and elsewhere. Generally, a distinction must be made between active and passive redundancy. For example, a diesel generator is a passive redundancy: it guarantees an emergency supply but is not active until the main power supply fails. The design of a redundancy concept depends on the reliability, availability and safety goals. Furthermore, the choice of technologies is determined by the operating location, the operating conditions and the scope of maintenance.
Figure 1: Reliability functions of one-item (singular) and active redundancies (1-out-of-2, 1-out-of-3).
To achieve a trade-off between these requirements, active parallel redundancies are to be expected in a vehicle. Redundancy of parallel active systems can be generalized as k-out-of-n redundancy. Birolini calculates the reliability

ISBN 978-3-8007-4877-8 · 59 · © VDE VERLAG GMBH · Berlin · Offenbach


Authorized licensed use limited to: KIT Library. Downloaded on December 05,2023 at 22:11:54 UTC from IEEE Xplore. Restrictions apply.

[Diagram: HV/48V DC/DC and 12V distribution/protection; safety switches S1–S4; batteries B1 and B2, each with safety μC and BMS; safety-relevant functions fct 1, fct 2 in ASIL compliant power net 1 and fct 1 red., fct 2 red. in ASIL compliant power net 2]
Figure 2: Redundant and fault-tolerant 12V on-board power supply. The safety switches ensure fault isolation and battery protection. fct1/fct2 and fct1_red/fct2_red represent equivalent and redundant safety-relevant ECUs.

functions of ideal items (nonrepairable, constant failure rates, identical and independent elements, ideal failure detection and switching) [3]. Even if the assumptions made in such a theoretical calculation are not attainable in a real system, the principal influence of active parallel redundant elements on reliability is clearly visible in Figure 1.

2.2 Fault-tolerant and Redundant 12V Power Supply Concept Using Safety Switches
The on-board power supply concept in this work provides an active parallel 1-out-of-2 redundancy of the energy and power supply and takes ISO 26262 as well as legal regulations [4; 5] into account. Safety-relevant functions can accordingly be implemented with 1-out-of-2 redundant ECUs. Such automotive power supply topologies have already been discussed in a wide variety of designs [6–9].
In comparison to other concepts, the one presented in Figure 2 is characterized by a lean design using two equal battery storages protected and connected by four MOSFET-based switches. The design of the system is based on the following premises:
• Minimization of construction space, weight and costs by redundant lithium iron phosphate-based battery storages at the same operating voltage. Thus, no converter system (12V DC/DC) is required.
• MOSFET-based safety switches provide combined protection of the batteries and the electrical system, including the wiring harness.
• The system is suitable for BEV, HEV, PHEV and ICE vehicles.
The dotted lines in Figure 2 denote system boundaries and imply two independent battery storage systems. However, given the premises above, maximum synergy effects and savings are achieved by merging them into an integrated system.
For the development of system functionalities, the top-level vehicle operation modes driving, idle and charging (vehicle with electrified powertrain) must be considered. However, as the system is particularly characterized by its fault tolerance for AD applications, we focus the explanations on the functions for driving mode only:
• Normal operation mode: switches S1–S4 are closed/conducting, and the batteries are connected and working at the same voltage level. fct1/fct2 and fct1_red/fct2_red represent equivalent and redundant safety-relevant ECUs. A DC/DC converter or alternator supplies all loads and charges the batteries.
• Fail-operational mode with two battery storages, failure outside of ASIL compliant power nets 1/2: switches S2 and S3 are non-conducting, and the two batteries supply the related ASIL compliant power nets 1/2.
• Fail-operational mode with two battery storages, failure inside one of the ASIL compliant power nets 1/2: opening S1 or S4, respectively, isolates the faulty branch; the supply of one set of the safety-relevant ECUs is guaranteed by at least one battery storage or by both battery storages.
• Fail-operational mode with one battery storage, failure inside one of the ASIL compliant power nets 1/2: opening S1/S2 or S3/S4, respectively, protects the battery of the faulty ASIL compliant power net; the supply of one set of the safety-relevant ECUs is guaranteed at least by the related battery storage.
As a simplification, only the first moment of fault occurrence is considered for the fail-operational modes. Depending on the system states and diagnostic information, the fail-operational modes described above can be extended.
For a more transparent representation, Figure 3 shows the structural analysis of the presented concept. The blocks already represent top gates of subsystems. As will be shown in section 4, this is a helpful approach to reduce the state space of the Markov model. Additionally, it increases transparency during the analysis process. The analyst freely determines the definition of the blocks and the structure, depending on the focus of the analysis. This structural analysis is the first step of the analysis process carried out in section 4.
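The ideal k-out-of-n reliability referred to in section 2.1 (Figure 1, after Birolini [3]) can be sketched as follows. The sketch assumes exactly the idealizations listed there: identical, independent, nonrepairable items with a constant failure rate λ and ideal failure detection and switching. Then R_k-out-of-n(t) = Σ_{i=k}^{n} C(n,i)·R(t)^i·(1 − R(t))^(n−i) with R(t) = e^(−λt), and, for this idealized case, MTTF = (1/λ)·Σ_{i=k}^{n} 1/i. The failure rate and evaluation time in the usage part are hypothetical values chosen only for illustration, not data from the paper.

```python
import math


def reliability_single(lam: float, t: float) -> float:
    """R(t) = exp(-lam * t) for one item with constant failure rate lam [1/h]."""
    return math.exp(-lam * t)


def reliability_k_out_of_n(k: int, n: int, lam: float, t: float) -> float:
    """Probability that at least k of n identical, independent items work at time t."""
    r = reliability_single(lam, t)
    return sum(math.comb(n, i) * r**i * (1.0 - r) ** (n - i) for i in range(k, n + 1))


def mttf_k_out_of_n(k: int, n: int, lam: float) -> float:
    """MTTF = (1/lam) * sum_{i=k}^{n} 1/i for ideal active k-out-of-n redundancy."""
    return sum(1.0 / i for i in range(k, n + 1)) / lam


if __name__ == "__main__":
    lam = 1e-5  # hypothetical item failure rate [1/h], for illustration only
    t = 1e5     # hypothetical evaluation time [h]
    for n in (1, 2, 3):  # one-item, 1-out-of-2 and 1-out-of-3 as in Figure 1
        print(f"1-out-of-{n}: R(t) = {reliability_k_out_of_n(1, n, lam, t):.4f}, "
              f"MTTF = {mttf_k_out_of_n(1, n, lam):.0f} h")
```

The sketch shows the qualitative effect described in section 2.1: each additional active parallel element increases R(t) and the MTTF, but with diminishing returns, since the MTTF only grows with the harmonic sum 1 + 1/2 + … + 1/n.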


[Diagram: redundant and fault-tolerant power supply with Powernet 1 and Powernet 2; per power net: BMS + monitoring + clamp, battery (Batt. 1/Batt. 2), safety μC 1/2, hardware protection (HW Prtct. 1/2) and 30_1/30_2; Switches 1–4 (MOSFETs + driver), each with a voltage/current sensor (Sens. 1–4)]
Figure 3: Structural analysis of the proposed power supply concept. The blocks represent top gates of subsystems.

3 Quantitative Evaluation Metrics According to ISO 26262
A hazard and risk analysis of functions for highly or fully automated driving results in safety requirements up to ASIL D, the highest integrity level according to ISO 26262. These safety goals must also be achieved by the electrical power supply and the power net, since they ensure the supply of safety-relevant ECUs. For the electrical power supply, this means a high degree of reliability and availability. ISO 26262 provides three fault metrics to quantify safety objectives with respect to the ASIL safety goals [1].

3.1 SPFM, PMHF and LFM
The Single Point Fault Metric (SPFM) incorporates single-point faults (SF) and residual faults (RF) that lead directly to a violation of safety objectives. These faults are not covered by a safety mechanism because the diagnostic coverage is missing, insufficient or below 100%. A high percentage implies a robust system regarding these faults. The Latent Fault Metric (LFM) describes the influence of latent multiple-point faults that are neither detected by a safety mechanism nor perceived by the driver. A low percentage implies that the proportion of latent faults is high and vice versa.
The Probabilistic Metric of random Hardware Faults (PMHF) is an absolute measure for the evaluation of safety goal violations due to random hardware faults. It consists of two parts. One part is related to the SPFM and contains the sum of single-point and residual failure rates. The second part is related to the LFM and contains the sum of latent multiple-point failure rates.

Table 1: Target values according to ISO 26262 [1]
ASIL   PMHF          SPFM    LFM     Details
A      -             -       -       -
B      < 10⁻⁷ h⁻¹    ≥ 90%   ≥ 60%   recommended
C      < 10⁻⁷ h⁻¹    ≥ 97%   ≥ 80%   required
D      < 10⁻⁸ h⁻¹    ≥ 99%   ≥ 90%   required

In order to determine the necessary numerical values, inductive and deductive analysis methods are applied, typically FTA and FMEDA. Additionally, all failure rates have to be determined. Table 1 summarizes possible target values for these fault metrics.

3.2 Theory of Markov Models
Besides FTA and FMEDA, it is recommended to use an additional quantitative Markov analysis. With Markov models, it is possible to model and probabilistically evaluate states of subsystems and degraded system states. Moreover, dynamic structural changes, repairs and sequential multiple faults can be modeled. Below, the most important characteristics of homogeneous continuous-time Markov models are stated. Norris provides a comprehensive and detailed description [10].
The basic requirement for the Markov model is a stochastic process X(t) with exponentially distributed sojourn times. We assume homogeneous, i.e. time-constant, failure rates λ(t):
$$\lambda(t) = \lambda = \text{const.} \quad \forall\, t > 0$$
A Markov process is a stochastic process characterized by its memoryless property: future states depend only on the present state.
In this paper, we use a continuous-time Markov model with a finite state space S. The fundamental state information is contained in the transition rate matrix Q:
$$Q = (q_{ij})_{i,j \in S}$$
The matrix Q has the following properties:
• $q_{ii} \le 0 \quad \forall\, i \in S$
• $q_{ij} \ge 0 \quad \forall\, i, j \in S$ with $i \ne j$
• $\sum_{j \in S} q_{ij} = 0 \quad \forall\, i \in S$
With the state probabilities P(t) written as a row vector, a first-order differential equation represents the Markov model:
$$\dot{P}(t) = P(t) \cdot Q$$
The solution to the differential equation can be obtained with the Laplace transformation. The transformation into the Laplace domain, with the Laplace variable s and the identity matrix I, is conducted as follows:
$$\mathcal{L}:\quad (s \cdot I - Q)^{-1}$$


By multiplying with the initial condition and applying the inverse transformation, we obtain the state probabilities P(t) at a given time t:
$$P(t) = \mathcal{L}^{-1}\left\{ P(0) \cdot (s \cdot I - Q)^{-1} \right\}$$
To obtain the reliability R(t) of a defined state class, we accumulate the probabilities of all states $S_R \subseteq S$ that represent this state class:
$$R(t) = \sum_{i \in S_R} P_i(t)$$
The failure distribution is calculated by subtracting the reliability from 1:
$$F(t) = 1 - R(t)$$
Hence, both functions are complementary. The mean time to failure is calculated via the integral of the reliability:
$$\mathrm{MTTF} = \int_0^{\infty} R(t)\,\mathrm{d}t$$

4 Probabilistic Analysis Using Markov Models
In this section, we demonstrate the application of a Markov analysis. The results of the calculations are presented graphically.
Several works in the literature use Markov models to assess automotive on-board power supplies. Abele presents a comparison of different reliability analysis methods and a comprehensive description of how to model systems using Markov chains. He develops and introduces a combined method for systematic fault identification and deduced modelling [11]. Dominguez-Garcia et al. also examine a dual-battery power net architecture. They apply a so-called dependability rate adopted from the Federal Aviation Administration (FAA) regulations [9]. Münzing et al. combine the probabilistic evaluation of the different system states with related results of a physical behavior model [12]. Cherfi et al. consider a safety-relevant system and analyze the variation of failure rates and diagnostic coverage with respect to the probability of failure of the system [13].
The presented work builds on this literature. The developed and employed model allows for a fully symbolic calculation of the state probabilities. Thus, parameter variations and dependencies can be studied efficiently. This work uses homogeneous Markov models; the failure rates are assumed to be constant over time. They are estimated using a comparable, well-trusted database and expert knowledge.

4.1 Case Study: Safety Mechanism for Overcurrent Protection (OCP)
The safety function overcurrent protection (OCP) of ASIL compliant power net I, as shown in Figure 2, is considered as an example. A failure outside this partial power net can cause a dangerous overcurrent and, as a result, an unacceptable undervoltage inside this partial power net. The MOSFET-based safety switch, Switch 2, prevents this fault propagation. The related safety mechanism SM1 consists of two components: one component provides the overcurrent detection and the second one ensures the switch-off. The subcomponents in this example originate from the structural analysis shown in Figure 3 and form a subsystem. The safety function OCP consists of a first-order (SM1), second-order (SM2) and third-order (SM3) safety mechanism. Each of them has its own diagnostic mechanism with the related diagnostic coverage DC1, DC2 and DC3, respectively. SM3 and DC3 are mentioned for completeness; they are also part of the Markov model. Figure 4 shows the subcomponents of the safety function OCP. Table 2 explains the safety mechanisms and the diagnostic coverages.
Figure 4: The safety function Overcurrent Protection consists of different safety mechanisms (SM1–SM3).

Table 2: Description of the safety mechanisms (SM1–SM3) and the diagnostic coverages (DC1–DC3).
SM1   Detection of overcurrent and switch-off
DC1   Diagnostic coverage of the overcurrent detection; the quality is determined by the SW (μC) and HW (sensors, HWP) implementations
SM2   Monitoring (SW) of the state of health of MOSFETs and drivers; escalation in case of impermissible conditions
DC2   Diagnostic coverage of the functional capability of the MOSFET devices; the quality is determined by the SW (μC) and HW (sensors, HWP) implementations
SM3   μC watchdog, μC reset
DC3   Determined by the implementation

First, for the Markov analysis, a simplified FMEDA at system level was performed. We assumed six states for the initial failure occurrence and considered the DCs and an accumulated failure rate for the occurrence of a short circuit. For steps two and three, we correlated the initial failures with themselves twice. The model consists of 39


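[Diagram, right part: from the fault-free state Z0, the rates λ1, λ2 and λ3 lead to the transition states 0.1.1, 0.1.2 and 0.1.3; with DC1/1−DC1 these branch into 1.L_e.1/1.A.1, with DC2/1−DC2 into 1.L_e.2/1.L_u.1, and with DC3/1−DC3 into 1.L_e.3/1.L_u.2]
Figure 5: Markov graph of the safety function Overcurrent Protection (left) and a more detailed representation of states 1–10 (right).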

states after excluding inadmissible states, i.e. correlations of the same failures or combinations that are meaningless from a technical point of view. Figure 5 shows a visual representation of the Markov graph. The dashed portion of the left image is illustrated in more detail on the right. This part represents the assumed initial failures: λ1 represents the accumulated failure rate for the occurrence of a short circuit, λ2 represents the failure rate of SM1 and λ3 represents the failure rate of SM2. The states 0.1.1, 0.1.2 and 0.1.3 serve as transition states and have no relevance in the following evaluation. The corresponding diagnostic coverage levels convert them into valid system states: DCx indicates a detected failure, hence 1−DCx indicates the complementary case. Figure 6 shows the calculated reliability functions, i.e. the impact of the mentioned safety mechanism on the accumulated state classes fault-free operation, detected/undetected latent fault, fail-operational and system failed. While the probability of fault-free operation drops rapidly, the probability mass shifts mainly to the classes fail-operational and detected/undetected latent fault, so a system breakdown is avoided. Even after 10⁶ hours, the probability of the class system failed remains low. Note that the calculations refer to ASIL compliant power net I. Since ASIL compliant power net II behaves very similarly, the overall system is robust against system breakdowns related to overcurrent failures.
Figure 6: Reliability functions of important state classes. Latent fault represents detected and undetected faults.

4.2 Influence of Failure Rates and Diagnostic Coverage
The safety function OCP is both a safety mechanism and an intended function of the described system. The behavior of SM1 and DC1 has a significant impact on the probability that ASIL compliant power net I fails due to a relevant failure. The symbolic Markov model helps to examine, evaluate and visualize the impact of failure rates and diagnostic coverages. Figure 7 shows the impact of λSM1 for 50 FIT, 100 FIT and 500 FIT. The reliability functions shown accumulate the state probabilities that lead to a system breakdown if OCP fails. At t = 10⁶ hours, the probability functions differ significantly. With λSM1 = 500 FIT, the probability that ASIL compliant power net I has failed already amounts to approximately 6.8%. For λSM1 = 50 FIT, it is only approximately 0.9%.
Figure 7: Failure probabilities as a function of different failure rates λSM1.
Finally, the Markov model helps to observe the impact of the diagnostic coverage levels. Figure 8 shows the impact of DC1 and DC2 on the reliability of the state classes system failed (left) and fail-operational (right) at t = 10⁶ hours. If the safety function OCP fails due to an insufficient DC1 in the case of an overcurrent event, there is a high probability that the system will also fail. DC1 has a greater impact than DC2. For DC1 = 0.6 and DC2 = 0.6, the probability that the system fails already amounts to approximately 4%. Figure 8 (left) illustrates this relationship through the influence of DC1.
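To make the workflow of sections 3.2 and 4 concrete (build Q, solve for P(t), accumulate state classes, compute F(t) = 1 − R(t)), the following Python sketch uses a deliberately tiny, hypothetical model. It is not the authors' 39-state OCP model: the five states and their meaning, the transitions (in particular the assumption that an SM1 fault detected by SM2 leads to a controlled fail-operational reaction at the next short circuit) and all numbers (λsc = 1000 FIT, λSM1 = 500 FIT, the DC values) are illustrative assumptions, and its results are not comparable with Figures 6 to 8. Instead of a symbolic inverse Laplace transformation, it evaluates P(t) = P(0)·e^(Q·t), the time-domain form of ℒ⁻¹{P(0)·(s·I − Q)⁻¹}, numerically by uniformization, a standard technique for continuous-time Markov chains.

```python
import math

# Hypothetical five-state toy model (NOT the authors' 39-state OCP model).
STATES = ["OK", "FO", "LD", "LU", "FAIL"]
#   OK   fault-free operation
#   FO   fail-operational: overcurrent detected and switched off
#   LD   latent fault in SM1, detected by SM2
#   LU   latent fault in SM1, undetected
#   FAIL power net failed (undervoltage after an unhandled short circuit)

FIT = 1e-9  # 1 FIT = one failure per 1e9 device hours


def generator(lam_sc, lam_sm1, dc1, dc2):
    """Build the transition rate matrix Q (rows sum to zero) of the toy model."""
    idx = {s: i for i, s in enumerate(STATES)}
    q = [[0.0] * len(STATES) for _ in STATES]

    def add(src, dst, rate):
        q[idx[src]][idx[dst]] += rate
        q[idx[src]][idx[src]] -= rate  # keep the row sum at zero

    add("OK", "FO", lam_sc * dc1)          # short circuit detected by SM1
    add("OK", "FAIL", lam_sc * (1 - dc1))  # short circuit missed by SM1
    add("OK", "LD", lam_sm1 * dc2)         # SM1 fails, SM2 notices it
    add("OK", "LU", lam_sm1 * (1 - dc2))   # SM1 fails unnoticed
    add("LD", "FO", lam_sc)                # known fault: controlled reaction (assumed)
    add("LU", "FAIL", lam_sc)              # unknown fault: next short circuit is fatal
    return q


def transient(q, p0, t, tol=1e-12, max_terms=10_000):
    """P(t) = P(0) * exp(Q t) for a row vector P(0), computed by uniformization."""
    n = len(q)
    rate = max(-q[i][i] for i in range(n))
    if rate == 0.0 or t == 0.0:
        return list(p0)
    if rate * t > 700:
        raise ValueError("rate * t too large for this simple uniformization sketch")
    # U = I + Q / rate is a stochastic matrix; exp(Q t) = sum_k Pois(k; rate*t) * U^k
    u = [[(1.0 if i == j else 0.0) + q[i][j] / rate for j in range(n)] for i in range(n)]
    weight = math.exp(-rate * t)  # Poisson probability for k = 0
    vec = list(p0)                # P(0) * U^k, starting with k = 0
    result = [weight * v for v in vec]
    covered, k = weight, 0
    while 1.0 - covered > tol and k < max_terms:
        k += 1
        vec = [sum(vec[i] * u[i][j] for i in range(n)) for j in range(n)]
        weight *= rate * t / k
        covered += weight
        result = [r + weight * v for r, v in zip(result, vec)]
    return result


def class_probability(p, members):
    """Accumulate the probabilities of all states that form one state class."""
    return sum(p[STATES.index(s)] for s in members)


if __name__ == "__main__":
    p0 = [1.0, 0.0, 0.0, 0.0, 0.0]  # start fault-free
    for dc1 in (0.6, 0.9, 0.99):
        q = generator(lam_sc=1000 * FIT, lam_sm1=500 * FIT, dc1=dc1, dc2=0.9)
        p = transient(q, p0, t=1e6)
        r = class_probability(p, ["OK", "FO", "LD", "LU"])  # R(t): not failed
        print(f"DC1={dc1}: F(1e6 h) = {1 - r:.4f}, fail-operational = {p[1]:.4f}")
```

Within this toy model, the sketch reproduces the qualitative trends discussed above: raising DC1 lowers F(t) because fewer short circuits bypass SM1, and raising DC2 increases the fail-operational probability because more SM1 faults are known before the next short circuit. Uniformization was chosen because it needs only the standard library and keeps every intermediate vector P(0)·U^k a valid probability distribution; for large λ·t or stiff models, a library matrix exponential would be the more robust choice.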


Figure 8: Influence of the diagnostic coverages DC1 and DC2 on the reliability functions of the state classes breakdown (left) and fail-operational (right), for t = 10⁶ h.

For a controlled fail-operational mode, failures have to be known and detected. Therefore, a higher DCx leads to an increased probability of being in the state fail-operational in the case of relevant failures. Figure 8 (right) illustrates this relationship.

5 Conclusion
In the first part of this work, we presented a new fault-tolerant and redundant on-board power supply concept for the supply of safety-relevant ECUs. The system has a simple design and, with the chosen technologies, counteracts the trend towards increasing construction space, weight and system costs in future redundant power supply systems. A quantitative and probabilistic evaluation of a subsystem was performed in the second part. The probabilistic reliability evaluation of electronic systems using Markov models offers an extension to the quantitative fault metrics of ISO 26262. Markov models help to evaluate the reliability of degraded system states as well as accumulated state classes. We analyzed the safety function Overcurrent Protection to present the advantages of the Markovian approach. For this purpose, we developed and calculated a symbolic Markov model. Using this model, we examined the impact of failure rates and diagnostic coverage levels on the probability of different state classes. The quality of the evaluation depends on the chosen failure rates, assumptions and abstractions; however, this also applies to other quantitative methods. The Markov approach requires considerable effort. Nevertheless, much of the necessary information is already available from the methods for evaluating the quantitative fault metrics according to ISO 26262. The presented method supports comprehensive studies of the failure behavior of electronic systems.

6 Literature
[1] International Organization for Standardization, 2018. ISO 26262-5:2018 Road vehicles - Functional safety - Part 5: Product development at the hardware level.
[2] SAE International, 2014. J3016: Taxonomy and Definitions for Terms Related to On-Road Motor Vehicle Automated Driving Systems.
[3] BIROLINI, A., 2017. Reliability Engineering: Theory and Practice. 8th Edition. Berlin: Springer. ISBN 978-3-662-54209-5 (eBook)
[4] Economic Commission for Europe of the United Nations (UN/ECE), 2006. Regulation No 79: Uniform provisions concerning the approval of vehicles with regard to steering equipment.
[5] Economic Commission for Europe of the United Nations (UN/ECE), 2015. Regulation No 13-H: Uniform provisions concerning the approval of passenger cars with regard to braking.
[6] SCHUMI, S., GRAF, A., 2018. Energy and Supply Concepts for Automated Driving. In: 9th GMM-Symposium AmE. Dortmund, 07.–08.03.2018. Berlin: VDE Verlag. ISBN 978-3-8007-4524-1
[7] AUGIER, J.-L. et al., 2016. Efficient, Safe and Reliable Powernet for AD. In: EEHE 2016. Wiesloch, 08.–09.06.2016. Renningen: Expert Verlag. ISBN 978-3-8169-3346-5
[8] HORN, M. et al., 2015. Development of safe and reliable Powernets for new vehicle functions - using the example Start-Stop-Coasting. In: EEHE 2015. Bad Boll, 22.–23.04.2015. Renningen: Expert Verlag.
[9] DOMINGUEZ-GARCIA, A. D. et al., 2006. Reliability evaluation of the power supply of an electrical power net for safety-relevant applications. In: Reliability Engineering & System Safety, 91(5):505–514. ISSN 0951-8320
[10] NORRIS, J. R., 1997. Markov Chains. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511810633
[11] ABELE, M., 2008. Modellierung und Bewertung hochzuverlässiger Energiebordnetz-Architekturen für sicherheitsrelevante Verbraucher in Kraftfahrzeugen. Kassel: kassel university press. ISBN 978-3-89958-388-5
[12] MÜNZING, P. et al., 2017. Sichere Energieversorgung für autonome Fahrzeuge. Bewertung funktionaler Sicherheit für automatisierte Fahrfunktionen. In: QZ Qualität und Zuverlässigkeit, 2017/12:28–32.
[13] CHERFI, A. et al., 2014. Modeling automotive safety mechanisms: A Markovian approach. In: Reliability Engineering & System Safety, 130:42–49. ISSN 0951-8320
