2000-01-1052
Delphi Secured Microcontroller Architecture
Terry L. Fruehling
Delphi Delco Electronics Systems
Reprinted From: Design and Technologies for Automotive Safety-Critical Systems
(SP–1507)
SAE 2000 World Congress
Detroit, Michigan
March 6-9, 2000
ISSN 0148-7191
Copyright © 2000 Society of Automotive Engineers, Inc.
ABSTRACT
As electronics take on ever-increasing roles in
automotive systems, greater scrutiny will be placed on
those electronics that are employed in control systems.
X-By-Wire systems, that is, steer- and/or brake-by-wire
systems, will control chassis functions without the need
for mechanical backup. These systems will have
distributed fault-tolerant and fail-safe architectures and
may require new standards in communication protocols
between nodes (nodes can be considered as
communication relay points). At the nodes, the "host"
application Electronic Controller Unit (ECU) will play a
pivotal role in assessing its own viability. The
microcontroller architecture proposed in this paper
focuses on ensuring thorough detection of hardware
faults in the Central Processing Unit (CPU) and related
circuits, thus providing a generic fail-silent building block
for embedded systems. Embedded controllers that
implement the Delphi Secured Microcontroller
Architecture will provide high deterministic fault coverage
with relatively low complexity.
INTRODUCTION
Many techniques to validate the node or host ECU are
presently employed, ranging from software intensive
schemes such as diverse (redundant) algorithms with
time redundancy, to hardware intensive schemes such as
Asymmetrical Microcontrollers, or completely redundant
multiple microcontroller architectures.
An alternative to these schemes is the Dual Central
Processing Unit (DCPU). This strategy has proven
successful for Delphi and has reduced hardware and
software design complexity. Additional payoffs include
reduced software validation requirements, increased
system reliability, and decreased EMI/EMC at the ECU.
In the future the Delphi Dual CPU architecture will include
a new data surveillance module called the Data Stream
Monitor (DSM). The function of this module will be to
cover faults in the data streams processed by the Dual
CPU, keep fault detection latency to a minimum and
maintain a high level of independence between the
application control algorithm and the fail-silent
implementation.
Delphi has been a leader in implementing secured data
processing techniques in automotive embedded
microcontrollers. The dual CPU made its debut in
production ABS programs in 1996. As this paper is
written, the system has a proven track record with over
10 million units fielded.
BACKGROUND - THE MCU/CPU SYSTEM
The problem of fault detection in an embedded
microcontroller will be discussed in three broad
categories: CPU, memory and MCU peripherals.
To facilitate discussion and gain insight into how the
Delphi Secured Microcontroller Architecture evolved, the
microcontroller will be defined as follows: A
Microcontroller Unit (MCU) comprises a central
processing unit (CPU) and associated peripheral
devices. The peripheral devices may be general or
customized to the controller application. These can
include communication devices such as serial peripheral
interfaces, as well as timers, auxiliary power supplies, A/
D converters and other devices, built on the same
integrated circuit. The core of the MCU (CPU core) is the
CPU together with memory it immediately acts on, such
as RAM, ROM/FLASH, EEPROM, and the
communication bus that links these elements.
An MCU dedicated to the control of one vehicle
subsystem, such as anti-lock brakes (ABS), is considered
to be embedded in that subsystem. Further, when the
MCU is part of an application Electronic Control Unit
(such as an ABS ECU) which contains interface circuits
supporting specialized I/O requirements, the combination
may be referred to as an embedded controller.
These distinctions are made to help organize the way the
ECU is partitioned and to clarify the proper fault
detection methods for components in the subsystem,
including external sensors and electromechanical drive
elements or internal ECU integrated circuits. It is also
done to develop a layered fault detection model for the
vehicle system. The layered fault detection organization
facilitates a “bootstrap” sequencing that helps to solve
the problem of “who checks the checker”.
Overall, the method minimizes redundancy between
vehicle system software diagnostics and ECU system
built-in self-test.
The layered fault detection model is similar in spirit to
layered models found in network communications
systems. The bootstrap technique discussed in this
paper is continued throughout the ECU interface
subsystem circuits.
The correct operation of the CPU, memory and MCU's
peripherals (such as timer modules, A/D converters,
communication, and output driver modules, etc.) must be
established not only during the initialization phase
following power on, but also during repetitive execution of
the control program. Normally bootstrap test schemes
are only run at power-up. Delphi’s unique architecture
facilitates continuous fault detection using this method
without choking the throughput capacity of the CPU.
The generic modules that make up the Delphi Secured
Microcontroller are considered fail-silent because the
outputs are disabled in the event of a fault. However,
backup communication output signals will be produced to
flag the vehicle system when a fault has occurred.
Figure 1 shows a typical layered model used for fault
detection analysis in a simple ABS system.
Figure 1. Layered model used for fault detection analysis in a simple ABS system.
Supplemental software techniques are also used to
process critical redundant inputs/outputs (I/O) and the
specialized BIST circuits of the ICs. In order to limit the
scope of this paper, these topics will not be discussed in
detail. Further, the fault detection techniques of the
peripheral modules built onto the microcontroller,
although briefly summarized, cannot be adequately
covered in this paper. The focus will be limited to the task
of continuously validating the state of health of the CPU
Core system.
Systems that employ embedded MCUs typically include
self-tests to verify the proper operation of the CPU and
associated peripheral devices. Typically these tests
include illegal memory access decoding, illegal opcode
execution or a simple Watchdog/Computer Operating
Properly (COP) test.
These strategies alone do not provide complete fault
coverage of key MCU components such as the CPU.
While other, redundant fault-detection methods
frequently employed in automotive systems can detect
the presence of hazards caused by faults in these
components; it is much easier to guarantee safety when
coverage of the CPU is provably complete.
It is very difficult to achieve provably complete fault
detection in a device as complex as a CPU without
duplication and comparison.
Test methods that are implemented so that execution
occurs as the algorithm is running will be referred to as
"on-line" or "concurrent" testing. Further, "off-line" testing
will reference the condition when the device is placed in a
special mode in which the execution of the application
algorithm is inhibited.
Off-line testing is generally reserved for manufacturing
test or for special purpose, diagnostic test tools used by
the field technician.
Tens of thousands of test vectors are generated for
manufacturing tests to establish a 99% fault detection
level for complex microcontrollers. Designing routines
that test a CPU's ability to execute the instructions used
in the application, with sample data, is impractical. Even
a separate "test ROM" [2] included in the system would
face difficulties, whether used to:
1. Generate a special set of inputs and monitor the
capability of the CPU and the application algorithm (or
a test algorithm) to respond properly; or
2. Generate and inject test vectors derived from
manufacturing fault detection testing, and then
evaluate the capability of the CPU to process them
properly and produce the correct resultant data at
circuit-specific observation points.
Figure 2 illustrates the concept of a "test" or "stimulus"
ROM and its integration into a fail-silent system.
Figure 2. Integration of a test/stimulus ROM into a fail-silent system.
Implementing the first technique encounters the
following technical hurdles. In a complex system, a test
ROM becomes inordinately large in order to adequately
guide the CPU through even a limited number of paths,
or "threads", of the application algorithm. The test
vectors must be carefully selected, which requires
intimate and detailed knowledge of the control algorithm
software.
Even if the "application systems" fail-silent test designer
could manage the task of ensuring that every module
was effectively tested, the end result would be of limited
utility when considering the range of parameters that can
be involved for any given software module. Thus the first
test ROM method would be contrived and limited in its
ability to simulate an actual operating environment.
If the second technique were employed, unless all of the
manufacturing test vectors were used or exhaustive
testing with pseudorandom patterns were performed, the
resulting coverage would be partial and the tests lengthy.
Any attempt made to identify only the used portion of the
MCU in order to target the subset with the proper vectors
(to reduce the overall vector quantity) would require
detailed scrutiny and modification every time the
algorithm changed so that the appropriate changes are
made to the test vector set. This approach requires
detailed knowledge of the MCU and can only be
accomplished with the active participation of the MCU
manufacturer. The technique, although useful for an
initial start-up verification, would have implementation
difficulties for continuous validation of the system in a
dynamic run mode of operation.
Neither of the above techniques considers the concept of
monitoring a system based on execution "dwell time" in
any particular software module or application "run time
mode" condition.
Modifying a CPU to include built-in self-test of its
subcomponents, such as parity to cover the instruction
set look-up table, duplication or Total Self Check (TSC)
circuit designs, current drain testing, and level and
threshold testing, may result in a significant modification
to a basic cell design [7].
It was Delphi's goal to develop a fail-silent method that
could be implemented on a variety of architectures from
different MCU silicon suppliers with a minimum of
interaction.
Although Delphi uses some of the above techniques,
they are employed in "stand-alone" modules, such as
wheel speed interfaces, steering drive controls or the
functional compare modules. These modules are Delphi
commissioned, and vehicle system specific. They are
connected to the MCU bus but do not modify the silicon
supplier's core modules.
For good reason, CPU designers are reluctant to modify
or make even minor changes to proven designs, since
experience is the key to confidence in a CPU
implementation. Duplication of the CPU in the Delphi
Secured Architecture preserves the knowledge and
reliable performance of the basic CPU core. This
approach was considered less invasive and would
require less testing and validation, than designs that
attempted to modify the CPU core with an assortment of
BIST techniques [7]. Further, this technique minimized
sharing with the CPU manufacturer the Delphi
responsibility for overall system safety.
The Dual CPU concept has facilitated implementations
on multiple manufacturers’ CPU cores and architectures.
At this printing Delphi presently has Dual CPU
implementations on nine different microcontrollers in
production.
Software techniques that involve time redundancy, such
as calculating the same parameter twice via different
(diverse) algorithms, also require that multiple copies of
variables be assigned to different RAM locations and
internal CPU special function registers; i.e., time
redundancy requires hardware resource redundancy to
be effective. Because of the substantial amount of CPU
execution time needed for redundancy, the CPU
processing capacity must be doubled to accomplish the
redundant calculations in a real time control application.
Because of the added complexity necessary for this
implementation of redundancy, the verification process is
commonly lengthy, costly, and prone to human errors
and omissions. Software diagnostics should be
devoted to identifying improper behavior in the overall
system, not to testing microcontroller hardware.
SELF-TESTING APPROACHES
The following sections examine four broad categories of
self-testing concepts: Dual Microcontroller, Asymmetric
Microcontroller, Memory verification and the Dual CPU
implementation.
DUAL MICROCONTROLLER CONCEPTS – Having a
logic function, or any device, test itself is a questionable
practice. In the Delphi system the CPUs are duplicated
and hence test each other.
As noted, the process of requiring the CPU to perform its
own self-testing on all MCU supporting peripherals is
inefficient. This is especially true in applications having a
relatively large memory and with many complex
peripheral devices.
To date, the most direct way to solve this problem has
been to simply place two microcontrollers into the ECU
system. In such systems, each microcontroller is the
complement of the other, and each memory and peripheral
module is duplicated. Both devices execute the same
code in near lock step.
Figure 3 shows a mechanization of a dual
microcontroller. The illustration is provided to show the
quantity of parallel input signal/output signals required for
its implementation.
Figure 3. Mechanization of a dual microcontroller, showing the parallel input/output signals required.
Dual microcontrollers are effective because they check
the operation of independent microcontrollers against
each other. Although the system tests are performed
with varied threads through the algorithm, and the
technique accounts for variable dwell in any portion of the
application with the random-like data that occurs in the
actual application environment, the following must be
considered:
1. Data corrupted by data or hardware faults will be
used to calculate system parameters. In a dual
microcontroller system these parameters will be
processed, and may be filtered, before they are
compared by the second microcontroller. Since the
direct data is not compared, as in the Dual CPU
system, the comparisons are considered "second
order", and the source and nature of the fault could
be masked, delayed by communication, or missed
altogether.
2. Many parameters will have to be checked at different
rates. Also, the tolerance ranges used to check
parameters between the two microcontrollers will be
looser than in a direct data comparison, "first time
fail" system of the Delphi architecture type.
3. The number of miscompares between the two MCUs
that must occur before a fault is logged and
responded to must be established. Further, the
conditions under which the fault counter is restarted
must also be defined.
4. The fail-safe software is not independent of the
application algorithm. Whenever added parameters
modify the application algorithm, the corresponding
fail-safe software alterations must also be evaluated.
5. Some parameters are used to calculate other
dependent parameters. This can lead the system
architect to make value judgements about which
parameters are critical. Such judgements increase
the subjectivity of the system and obscure its true
error detection capability and latency.
6. This technique is not an efficient form of resource
allocation [4,5,8]. Two identical, fully equipped,
microcontrollers doing the same task is costly.
Further, extensive communication software is also
used to synchronize the data exchange between the
two microcontrollers.
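Points 2 and 3 above can be made concrete with a small C sketch: each exchanged parameter is compared within a tolerance band, and a fault is latched only after several consecutive miscompares, with a good compare restarting the counter. The tolerance and threshold values below are made-up examples, not Delphi calibrations.

```c
#include <stdint.h>
#include <stdlib.h>

#define TOLERANCE        4   /* allowed parameter difference          */
#define FAULT_THRESHOLD  3   /* consecutive miscompares before fault  */

typedef struct {
    int miscompares;     /* consecutive miscompare count */
    int fault_latched;   /* 1 once the fault is logged   */
} xcheck_t;

/* Compare one parameter pair exchanged between the two MCUs;
 * returns the latched fault state. */
int cross_check(xcheck_t *m, int32_t local, int32_t remote)
{
    if (abs(local - remote) > TOLERANCE) {
        if (++m->miscompares >= FAULT_THRESHOLD)
            m->fault_latched = 1;    /* respond: fail silent */
    } else {
        m->miscompares = 0;          /* restart condition: good compare */
    }
    return m->fault_latched;
}
```

The debounce counter is exactly what makes this scheme "second order": a real fault can persist for several comparison cycles before any response is taken.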
ASYMMETRIC MICROCONTROLLER ARCHITECTURES –
There are many hybrid schemes in this category. The
secondary processor size and speed requirements can
vary dramatically depending on the extensive nature and
the variety of the validation tasks it is assigned to perform
on the main processor. Much of the appeal of this
implementation lies in the ability to use standard "off the
shelf" components, which, if optimized, can gain a cost
advantage over dual microcontroller architectures.
The secondary processor can be used to do an intensive
check of a few portions of the algorithm, or to employ
“check-point and audit” schemes of many modules within
the control algorithm (or some of both).
Control flow analysis of the main controller is also a
popular use of the secondary processor. Control flow
designs can also have a great diversity in the complexity
of the final implementation. The basic concept is to
validate that the main processor executes code from one
module to the next in a logical manner. By sending
Software Module Identifiers (SMIDs) to the secondary
processor the overall program flow of the main processor
can be monitored.
The module SMIDs give the secondary processor the
capability to determine variations in loop or software
module execution time. The ability of the main processor
to transfer to, and return from subroutine properly can
also be ascertained. These schemes can give an
indication of the “state of health” of the main processor,
but the actual deterministic fault coverage is deceptive
and difficult to test and measure (the author has personal
experience with placing these schemes in production).
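A minimal control-flow monitor of the kind described can be sketched as a table of legal module-to-module transitions, against which the secondary processor checks each received SMID. The module graph below is a hypothetical four-module control loop, invented for illustration.

```c
#include <stdint.h>

#define NUM_MODULES 4

/* legal[from][to] == 1 if the main processor may legally pass from
 * module 'from' to module 'to'.  The graph is a made-up example:
 * 0: init, 1: input scan, 2: control calc, 3: output update. */
static const uint8_t legal[NUM_MODULES][NUM_MODULES] = {
    /* to:  0  1  2  3 */
    {       0, 1, 0, 0 },  /* from 0: init         -> input scan    */
    {       0, 0, 1, 0 },  /* from 1: input scan   -> control calc  */
    {       0, 0, 0, 1 },  /* from 2: control calc -> output update */
    {       0, 1, 0, 0 },  /* from 3: output update-> input scan    */
};

static uint8_t last_smid = 0;

/* Called by the secondary processor for each SMID received from the
 * main processor; returns 1 on an illegal transition or unknown SMID. */
int smid_check(uint8_t smid)
{
    int bad = (smid >= NUM_MODULES) || !legal[last_smid][smid];
    if (smid < NUM_MODULES)
        last_smid = smid;
    return bad;
}
```

As the text notes, such a monitor confirms only that modules execute in a legal order and on time; it provides no evidence that the computations inside each module are correct.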
The asymmetrical approach also finds use in conjunction
with the software technique that employs diverse
programming with time redundancy. In this
implementation the main processor is large and fast
enough to accommodate two algorithms which process
critical input and output variables in two separate ways.
An attempt is made to create as much of a “dual channel”
on the same main processor as possible. This is
accomplished by using as many different resources of
the microprocessor hardware as possible. Data to be
processed will be held in different RAM locations. If
possible, the processor will use different internal registers
for data manipulation. The system may also use
complementary RAM variables to perform and check
RAM parameter calculations.
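The complementary-RAM-variable technique can be sketched as follows: each critical variable is stored alongside its one's complement, and every read verifies that the pair still match. The names and word width here are illustrative only.

```c
#include <stdint.h>

/* A critical variable stored with its one's complement copy. */
typedef struct {
    uint16_t value;
    uint16_t value_n;   /* one's complement of value */
} safe_u16;

void safe_write(safe_u16 *v, uint16_t x)
{
    v->value   = x;
    v->value_n = (uint16_t)~x;
}

/* Returns 0 and stores the value on success; returns nonzero if the
 * pair no longer match, i.e. a RAM bit has flipped in either copy. */
int safe_read(const safe_u16 *v, uint16_t *out)
{
    if ((uint16_t)(v->value ^ v->value_n) != 0xFFFFu)
        return 1;        /* corruption detected */
    *out = v->value;
    return 0;
}
```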
A mechanization showing the implementation of two
redundant (diverse) programs on the Main
microcontroller is presented in Figure 4. The illustration
also depicts the secondary processor that tests the
capability of the Main CPU to execute the control program
with periodic contrived input data.
Depending on the system, the data acted on by the two
algorithms may be exactly the same, in which case the
results should closely match. Some schemes may allow
the data to be slightly different, in which case the
compared results would have to be bounded to create
reasonable limits on this data differential. These
requirements of software and hardware will increase the
complexities of the final design.
Finally, both algorithms have to eventually be processed
by the same logic unit of the microcontroller. If the logic
unit is corrupt then both diverse algorithms could
calculate the same corrupt result. To circumvent this, a
second smaller processor is used to send data to be
processed by the main controller.
Figure 4. Two redundant (diverse) programs on the main microcontroller, with a secondary test processor.
The main microcontroller processes this data and sends
the result back to the second processor for comparison.
This is an attempt to test a part of the main processor
that cannot be duplicated.
In a runtime application, the control algorithm can run
many times without executing certain software module
functions. The special test injected by the secondary
processor also serves to ensure that all modules can be
executed and tested on a scheduled basis.
A similarity can be drawn between the second processor
in this implementation and the test ROM technique
mentioned earlier. Hence this process suffers from the
same flaws when contrived and limited data is used to
test a microcontroller.
MEMORY VERIFICATION
Checksums – A common technique for verifying the
operation of an MCU memory peripheral is to use a
checksum, where a process arithmetically sums the
contents of a block of memory. The checksum is then
compared to a reference value. A miscompare indicates
a memory failure.
One disadvantage of checksums is that if two bits of the
memory flip in opposite, compensating directions, the
checksum will continue to be valid. The failure to
correctly detect such data faults is known as aliasing.
This fault type is rare, since it requires that two faults
occur simultaneously or nearly so.
Checksums are also slow, since they are usually
computed by the CPU during time slices allocated for the
purpose. Due to increasingly large memory arrays and
heavy demands on CPU resources, the validation may
not occur within the response times of the system.
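The time-sliced checksum described above can be sketched in C: the CPU sums one small chunk per control-loop pass, so the full array is eventually covered without a long blocking scan. The chunk size, array, and reference value are illustrative.

```c
#include <stdint.h>
#include <stddef.h>

#define CHUNK 16u   /* bytes summed per control-loop pass (illustrative) */

typedef struct {
    const uint8_t *base;     /* memory block under test         */
    size_t         size;
    size_t         pos;      /* next byte to sum                */
    uint16_t       running;  /* partial sum so far              */
    uint16_t       expected; /* reference value fixed at build  */
} cksum_job;

/* Call once per control loop.  Returns -1 while a pass is in progress,
 * 0 when a pass completes and matches, 1 on a miscompare. */
int cksum_step(cksum_job *j)
{
    size_t n = j->size - j->pos;
    if (n > CHUNK) n = CHUNK;
    for (size_t i = 0; i < n; ++i)
        j->running = (uint16_t)(j->running + j->base[j->pos + i]);
    j->pos += n;
    if (j->pos < j->size)
        return -1;                       /* more chunks remain */
    int ok = (j->running == j->expected);
    j->pos = 0;
    j->running = 0;                      /* restart on next call */
    return ok ? 0 : 1;
}
```

The latency problem the text raises is visible here: a flipped bit is reported only when the pass that covers it completes, not when the corrupt location is actually read by the application.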
Another technique for verifying the operation of MCU
memory peripherals is to use parity. Single bit parity is
faster than the checksum method described above, and
it synchronizes the memory validation with its use in the
execution of the application algorithm. It will, however,
require the memory array design to be modified and
decoding by special hardware. This modification,
although not difficult, must be agreed upon with the
silicon supplier.
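For reference, even parity over one byte can be sketched as below; in hardware this is an XOR tree alongside the memory array rather than CPU code, so the C form is illustrative only.

```c
#include <stdint.h>

/* Even parity: the parity bit is chosen so data + parity together
 * contain an even number of 1 bits. */
uint8_t parity_bit(uint8_t data)
{
    data ^= data >> 4;
    data ^= data >> 2;
    data ^= data >> 1;
    return data & 1u;      /* 1 if data has an odd number of 1s */
}

/* Returns nonzero if the stored parity no longer matches the data,
 * i.e. a single bit (data or parity) has flipped. */
int parity_check(uint8_t data, uint8_t stored_parity)
{
    return parity_bit(data) != (stored_parity & 1u);
}
```

Note the aliasing limitation discussed below: any even number of bit flips in the same word leaves the parity check silent.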
The Delphi system takes advantage of parity circuits on
small memory arrays such as RAM. At the present time
this drives the development of a custom module. With
large memory arrays parity can become a significant cost
burden.
The CPU and specific software must process the
consequences of a parity fault. Since single bit parity is
insensitive to double bit flips, multiple bit schemes and
implementations have been developed to avoid aliasing.
Delphi Memory Fail-Silent implementation – The Delphi
System exploits a concept of minimal redesign of the
silicon provider's core element cells such as the CPU and
memory. The surveillance module described in this text
attaches to any CPU system bus (Harvard or von
Neumann, with cache or superscalar implementations)
and performs the same validation functions on any
architecture. This gives Delphi the advantage of
development of equivalent fail-silent systems with a
variety of manufacturers.
Another advantage of the Delphi System described in
this text is the capability to automatically capture and
store the location of the fault and what the CPU was
executing at the time of the fault.
Error Detect and Correct Modules – To circumvent the
requirement of adding special hardware to the CPU or
software to the application, multiple bit parity schemes
and standalone Error Detect and Correct (EDC)
processor modules have been developed. The problem
of modifying the memory array or adding another
memory array to include the extra parity bits still exists.
In a typical application, six bits are added to a 16-bit word
or four bits on eight [10]. Consequently, and depending
on the implementation, up to 50% of the memory along
with the extra internal module interconnections may be
devoted to the problem of capturing flipped bits. Silicon
providers have typically incorporated these circuits
because of the problems existing with the reliability of
FLASH.
Using syndrome testing and Hamming Codes [9,10],
EDC can detect and correct single bit errors, detect all
two-bit errors, and detect some triple bit errors. Although
EDC is adequate to detect flip bit errors, the module is
intrusive and must be placed in series between the
memory module and the CPU. All data is channeled
through this device for processing before it is sent to the
CPU, potentially adding an execution delay to the system
on every memory read.
While this may be acceptable to some silicon suppliers
that need this module to cover for inherent processing
problems, EDC approaches must be modified to meet
Delphi's future fail-silent goals.
Extended Delphi Memory Fail-Silent Goals – The Delphi
System goals are to recapture these memory resources
for the application program and to provide the system
with more automatic fault response independent of the
state of the CPU. The Delphi System also includes
special registers to automatically capture diagnostic
information, and continuous safeguards ensuring the
correct operation of the surveillance module itself.
Although the differences may appear subtle, EDC
modules require configuration and driver software. When
a fault occurs, a flag or interrupt is generated and the
CPU must respond. A fault is detected in the Delphi
System and responded to by the surveillance module
itself, requiring no input from the CPU.
If the CPU is healthy it can then read the flags and
process the interrupts. In certain conditions the CPU will
be prohibited from clearing the fault condition until a reset
occurs and a complete diagnostic routine is run.
DEVELOPMENT OF THE DUAL CPU – Providing a
second microcontroller operating in parallel with the first
is not software and hardware resource efficient [6,8].
The risk of human error in software verification, or in
ensuring that all critical parameters are included and
checked, was deemed unacceptable.
This led Delphi to develop a dual CPU system
incorporated into a single microcontroller unit (MCU). In
such a system each CPU receives the same data stream
from a common memory.
The purpose of the secondary CPU is to provide a clock
cycle by clock cycle check of the primary CPU in a
functional comparison module. If the data from the
memory is corrupt, it will be discovered at a later step in
the validation process.
Figure 5 is included to show the simplification in the
system mechanization for an ABS controller when a Dual
CPU is employed.
Figure 5. Simplified system mechanization for an ABS controller employing a Dual CPU.
To ensure that the CPUs are healthy, both CPUs must
respond to the same data in the same way. The Dual
CPU system employs continuous cross-functional testing
of the two CPUs as multiple paths are taken through the
application algorithm. It should be noted, that if the
system dwells in one software module or mode
disproportionately to others, the testing by the Dual CPU
is similarly proportionate.
Further, the random-like parameter data inherent in real
world applications is “operated on” by the algorithm and
any inappropriate interaction with the current instruction
data stream is detected. This technique has proved
effective for all environmental conditions such as
temperature, voltage or electromagnetic interference
(EMI).
In essence the actual algorithm and data execution
become the test vectors used to ensure “critical
functionality” of the system. This is a corollary to
common test methods that are designed to detect “critical
faults”. The system tests only those hardware resources
the software application algorithm utilizes, and does not
spend any time testing unused portions of the CPU
system.
Figure 6 illustrates an expanded view of the Dual CPU.
Although both CPUs receive the same inputs from the
MCU peripherals, the second CPU's only output is to the
Functional Compare Module.
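The comparison performed by the Functional Compare Module can be sketched as follows: each clock cycle the checker CPU's bus outputs are compared with the master's, and any mismatch latches a fault on its first occurrence. The bus widths and field names are illustrative, not those of the actual Delphi module.

```c
#include <stdint.h>

/* One cycle's worth of CPU bus outputs (widths are illustrative). */
typedef struct {
    uint32_t addr;
    uint16_t data;
    uint8_t  ctrl;
} bus_state;

static int compare_fault = 0;

/* Called once per clock cycle with both CPUs' outputs; returns the
 * latched fault state.  A single mismatch latches the fault, modeling
 * the first-time-fail behavior described in the text. */
int functional_compare(const bus_state *master, const bus_state *checker)
{
    if (master->addr != checker->addr ||
        master->data != checker->data ||
        master->ctrl != checker->ctrl)
        compare_fault = 1;       /* first occurrence: disable outputs */
    return compare_fault;
}
```

In hardware this is a set of XOR comparators and a latch rather than code, which is what allows the check to run every cycle without consuming CPU throughput.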
If the algorithm is modified to include a previously unused
set of available instructions (such as a possible fuzzy
logic instruction set), or new operational modes are
added (such as Adaptive Braking or Vehicle Yaw
Control), modification of the self-check system is not
required.
Figure 6. Expanded view of the Dual CPU.
The Dual CPU fail-silent system architecture is inherently
independent of the application algorithm. Also, the
primary design intent of a Dual CPU system is to respond
to a fault on its first occurrence.
The Delphi Secured Architecture - Dual CPU Data
Comparisons versus Dual Micro Controller – Both the
Dual CPU and the Dual Microcontroller architectures
compare similar data values to determine the existence
of a fault. There is, however, a significant difference in
the quality of the values compared.
For example, in the Dual CPU the data compared will be
the actual input collected from a pair of similar or
redundant peripheral interface modules.
Consider the present application, where two similar or
redundant peripheral modules capture the same event:
for example, two different input capture timer modules
(i.e., the MCU core timer versus the timers in the Delphi
application-specific Wheel Speed Input Module) with
different time bases, or two different A/D converters.
The data will only deviate by the quantization, linearity, or
offset error differences between the two inputs. Once
these limits are set the Dual CPU can detect and respond
to a first time fail.
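This first-time-fail bound on redundant captures reduces to a single comparison; in the C sketch below the error bound is a made-up value standing in for the combined quantization, linearity, and offset errors of the two channels.

```c
#include <stdint.h>
#include <stdlib.h>

/* Illustrative bound covering quantization, linearity and offset
 * error differences between the two redundant capture channels. */
#define CAPTURE_ERROR_BOUND 3

/* Compare two redundant captures of the same event (e.g. core timer
 * vs. wheel-speed-module timer); returns 1 (fault) on the first pair
 * whose difference exceeds the error bound, 0 otherwise. */
int redundant_capture_check(int32_t capture_a, int32_t capture_b)
{
    return abs(capture_a - capture_b) > CAPTURE_ERROR_BOUND;
}
```

Because the two values come from the same event and are read by the same CPU, no debounce counter is needed: the first out-of-bound pair is itself the fault.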
The quality of the data is improved because it is local and
directly processed by the same CPU. This enables
confidence in the first fault error detection.
Conversely, a Dual Microcontroller algorithmically
processes or filters the data by software, and also has to
account for communication delay before the data is
compared. This means that the Dual CPU provides
immediate, first order detection of data discrepancies,
whereas the dual MCU suffers from second order, less
accurate error detection.
The Dual CPU Summary of Key Points – The
architectural design simplifications achieved by the Dual
CPU, together with support modules that are either
locally duplicated or have their own uniquely designed
BIST circuits, yield several advantages over the
competing alternatives:
• Increased hardware reliability, due to the reduced
component and interconnect count.
• Decreased EMI susceptibility and radiated
emissions, owing to the smaller and less complex
board layout.
• Improved diagnostics, because:
• The fault is detected at its source, without
processing
• The system fails on the first occurrence
• Increased software reliability, through:
• Elimination of communication and data
synchronization software
• Significant reduction in parametric comparisons
and the inherent software decision logic
• Reduced complexity of software validation
DELPHI SECURED ARCHITECTURE DESIGN
PHILOSOPHY – The development of the Delphi Secured
Microcontroller Architecture is based on the following
axioms:
• The Microcontroller’s single CPU is insufficient to
adequately determine its own functional integrity.
• Once the CPU’s functional integrity is verified, the CPU is then sufficient to verify the functionality of the microcontroller’s peripherals, provided:
• Appropriate diagnostics are incorporated in the
peripheral modules.
• Redundant input and output signals exist for
plausibility checks of critical signals.
• Appropriate feedback signals exist for plausibility
checks of critical signals.
All peripheral modules will be verified at startup and as
the control algorithm executes.
Hardware implementations are preferred for fail-silent
architectures for the following reasons:
• Promotes fail-silent independence from the
application software.
• Hardware implementations place the diagnostic at the point where the fault can be detected earliest.
• Hardware fail-silent schemes can be implemented to minimize or eliminate competition for limited CPU and memory resources.
• More complete testing and validation can be
achieved using manufacturing processes.
The Delphi system is a bootstrap process that is
dependent on verifying the CPU first and then the MCU
peripheral modules. The process is run during the
initialization phase and during repetitive execution of the
control program. It is therefore advantageous to the
execution speed of this method to incorporate peripheral
BIST circuits that are independent of, and require
minimal interaction with the CPU.
The Delphi architecture accomplishes this in the following
manner:
• The secondary processor / functional compare module runs concurrently with the main control processor and consumes no system resources until a fault is detected. A software module performs initial CPU configuration (such as clearing internal registers and testing the functional compare module) and handles fault diagnostics; however, this code does not execute concurrently with the application software.
• The DSM runs concurrently and autonomously in background mode and in the start-up initialization mode. Configuration and test software checks the DSM and handles faults, but this code does not execute concurrently with the application software. When the foreground DSM operates for dynamic verification, there is a slight impact on CPU resources.
The Delphi system takes advantage of continuously
varying execution threads through the application code
and the random-like data that occurs in actual use, to
detect faults. A benefit of the Delphi Architecture is that
the real time CPU and software execution testing is
automatically proportionate to the time the system dwells
in any mode.
In actual use, the control program can run many times
without going through every possible code path. When a
particular thread through the algorithm inevitably does
execute, the Delphi Architecture provides the following
safeguards:
• The Dual CPU serves as a runtime functional check
on the processing of code, data and output controls
as it executes.
• The Data Stream Monitor (DSM) ensures that the
code and data signatures, presented to the Dual
CPU at runtime, match the code and data signatures
that were generated when the code was compiled.
The objective of the Delphi architecture regarding MCU
peripherals is to:
• Ensure correct initial and continuous MCU system
peripheral configuration.
• Incorporate sufficient diagnostics (HW or SW) to
“adequately cover” both the CPU and MCU
peripherals to ensure proper “critical functionality” of
the system.
• Adequate coverage means the CPU and support
peripherals will be diagnosed for failure, within
the time response of the vehicle to ensure
occupant safety.
• Critical functionality means that the Delphi
architecture focuses on the MCU resources that
are used (at runtime and when the system was
developed). Those resources that are not used,
or were never used when the system was
developed, are not tested.
DETAILED COMPONENTS OF THE DELPHI SECURED MICROCONTROLLER – The following is a summary of the system that supports the mission of the Dual CPU in a mission-critical embedded controller.
• One main control CPU – controls the MCU system peripherals.
• One secondary CPU – lockstep operation with Main
CPU, receives all the same inputs as the main CPU
but its only output is to the Functional Compare
Module.
• One Functional Compare Module – compares the address, data, and control outputs of the main and secondary CPUs. If a fault occurs, the ECU system outputs are disabled. In the Delphi system the CPU stays active to aid in diagnostics.
• One Data Stream Monitor (DSM) – a memory-mapped module designed for autonomous and concurrent background testing of memory. In addition, this module is capable of signaturing data streams while the CPU is on the bus.
• Parity on RAM
• Duplication of selected peripheral modules
• Secondary Clock oscillator and error detection
circuits.
Note: The mechanization that follows assumes the usual
complement of self-test functions included with modern
microcontrollers. Typically these functions include illegal
memory access decoding, illegal opcode execution and
simple Watchdog/Computer Operating Properly (COP)
circuits.
Figure 7 shows the enhanced diagnostic capabilities of the Dual CPU, which can detect faults and latch the status of the MCU at the time of the fault event.
Figure 7.
General System Objectives –
• To ensure that the microcontroller is operating as
intended. "Operating as intended" is defined as the
ability of the CPU and associated support
peripherals of the MCU, to correctly process input
data and output controls as required by the
application algorithm.
• To ensure data execution coherency. In this context, coherency is defined as stable data: the absence of flipped bits, stuck bits, transients and noise, or any intermittent inconsistencies in the data stream.
• To increase the deterministic fault coverage of the
Delphi fail-silent system architecture. [1,4,5,7]
• To detect and respond to faults within the time
response of the system.
• To minimize the fail-silent implementation
dependency on the application algorithm. The fail-
silent system is intended to be independent of the
application control algorithm. The health of the MCU
system is verified before the application algorithm is
started. The MCU system is then verified
concurrently as the algorithm executes.
• To reduce sensitivity and ensure integrity of the
complete MCU system during all forms of
environmental stress (Electromagnetic fields, RFI
transient noise, thermal cycling/shock, etc.).
• Increase system reliability by decreasing component
count, interconnections and simplifying fail-silent
software.
Development of the Data Stream Monitor – In the Dual
CPU concept, successful testing of peripheral modules
by the main CPU is predicated on its own correct state of
health (i.e. the ability of the CPU to execute the algorithm
as intended), and the “Built in Self Test” (BIST) circuits
incorporated into the MCU peripheral modules.
The job of the secondary CPU/Functional Compare Module is to guarantee the correct state of health of the main CPU. Then, as a secondary step, the main CPU methodically tests its subordinate peripherals by exercising or polling their unique BIST circuits or by comparing data from redundant modules.
The Delphi Layered Model for the Bootstrap Fail-silent Method – This sequential scheme, which first validates the CPU and then validates the MCU peripheral modules in a prescribed order, can be considered a “bootstrap” validation system.
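The bootstrap order can be sketched as the following C skeleton. The function names are hypothetical stand-ins for the functional compare module self-test and the peripheral BIST polls; the paper defines no software API:

```c
#include <stdbool.h>

/* Stubbed hooks for the hardware self-test facilities; in a real
 * part these would read BIST status registers. */
static bool functional_compare_self_test(void) { return true; }       /* stub */
static bool peripheral_bist_pass(int id)       { (void)id; return true; } /* stub */

#define NUM_PERIPHERALS 4

/* Bootstrap validation: prove the CPU healthy first, then let the
 * now-trusted CPU check each subordinate peripheral in order. */
bool bootstrap_verify(void)
{
    if (!functional_compare_self_test())
        return false;                 /* CPU health unproven: stay fail-silent */

    for (int id = 0; id < NUM_PERIPHERALS; id++)
        if (!peripheral_bist_pass(id))
            return false;             /* peripheral fault detected */

    return true;                      /* safe to start the control algorithm */
}
```

The hierarchical order of the calls mirrors the layered model: no peripheral is trusted until the layer beneath it has been validated.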
Figure 8 shows a layered model in a simplified steering
assist system. It is the same technique as used for ABS
systems (Figure 1).
Figure 8.
The order and priority in which MCU peripheral modules or ECU subsystem circuits / ICs are validated depend on their hierarchical location within the layered model for the system.
Because of the sequential nature of the bootstrap
method and since this scheme is run at the initialization
phase and during repetitive execution of the control
program, the speed at which the CPU can detect faults in
the MCU support peripherals is essential.
It is advantageous to the execution speed of this method
to incorporate peripheral BIST circuits that are
independent of and require minimal interaction with the
CPU.
The Data Stream Monitor Mission – The Data Stream
Monitor (DSM) was devised to comply with the above
goals. It is a stand-alone memory mapped module and
requires no redesign of the CPU or memory peripheral
modules to incorporate it onto the MCU system bus.
Figure 9 shows the subsystem of the DSM that is responsible for validating memory concurrently as the application executes, by using CPU idle bus cycles or by stealing a cycle if needed. All memory blocks are automatically clocked into the system. The system is independent of the state of health of the Dual CPU system. If a fault occurs, all internal registers are latched for enhanced diagnostics.
Figure 9.
The module is an adaptation of a Linear Feedback Shift Register (LFSR) designed to accept and accumulate parallel inputs from the data bus. This implementation is commonly referred to as a Parallel Signature Analyzer (PSA). When implemented properly [1,3,8] for the application, the PSA is capable of accomplishing a form of data compression on extremely long data streams. The result of the data compression, referred to as the “signature”, is held in a register where comparison to a reference value can be made for fault determination.
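A software model of such a PSA may make the mechanism concrete. This sketch folds 16-bit data words into a CRC-style LFSR; the polynomial (CCITT, x^16 + x^12 + x^5 + 1), word width, and seed are illustrative assumptions, not the polynomial of the Delphi hardware:

```c
#include <stdint.h>
#include <stddef.h>

/* One PSA step: XOR a parallel data word into the LFSR, then run the
 * register through 16 shifts of the feedback polynomial 0x1021. */
static uint16_t psa_step(uint16_t sig, uint16_t word)
{
    sig ^= word;                              /* parallel data input */
    for (int i = 0; i < 16; i++)
        sig = (sig & 0x8000u) ? (uint16_t)((sig << 1) ^ 0x1021u)
                              : (uint16_t)(sig << 1);
    return sig;
}

/* Accumulate a signature over a memory block, one word per "bus
 * cycle", as the DSM does when it downloads memory in background mode. */
uint16_t psa_signature(const uint16_t *mem, size_t nwords)
{
    uint16_t sig = 0xFFFFu;                   /* seed value (assumed) */
    for (size_t i = 0; i < nwords; i++)
        sig = psa_step(sig, mem[i]);
    return sig;                               /* compare to stored reference */
}
```

A flipped or stuck bit anywhere in the block changes the final signature, so a single comparison against the compile-time reference detects the fault without word-by-word checking.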
In one mode the DSM can take advantage of, or steal,
idle bus cycles from the CPU. This is referred to as the
background mode because the CPU is not driving the
bus. During these cycles the DSM has the capability of
autonomously downloading the contents of memory onto
the system data bus. Each word of memory can be
accumulated in the PSA in one clock cycle enabling high-
speed signaturing of memory.
Unlike EDC processors that are inserted between the
memory module and the CPU, the DSM is a bus
“listening” device and is therefore non-intrusive and
easier to implement.
As a result of the polynomial divisions that generate the
final signature, the probability of aliasing is virtually
eliminated [1,3,8]. In the Autonomous mode the DSM
can verify memory at startup and concurrently as the
algorithm executes, independent of the CPU or the
CPU’s “state of health”.
The DSM in this mode represents a complete hardware
implementation, and software support is not required.
When a fault does occur, ECU output drivers (relays,
solenoids etc.) are automatically disabled via a fault pin
connected to the DSM fault logic (Figure 10).
The Data Stream Monitor – Foreground and Background Mode Applications – The DSM will process and detect faults in any data stream, as long as that stream is deterministic. There are examples of code and data streams that presently meet this criterion (in addition to the data stream derived from memory). One example is the hardware configuration software routines that are executed on each ignition cycle of the ECU/MCU system.
To accomplish this task, the DSM is composed of two PSA circuit subsystems. One is dedicated to the background mode as described; the other is dedicated to performing the signature operation when the CPU is driving the bus (foreground mode). The two circuit subsystems are joined by a common Mode Control Module (Figure 10).
The device as described can ensure that the hardware configuration modules are processed by the CPU the same way each time they are executed. This is equivalent to a Dual Microcontroller system verifying that both micros have initialized the same way on each and every ignition cycle.
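This foreground check amounts to comparing a run-time signature of the configuration sequence against a reference fixed at build time. A minimal sketch, with a simple XOR stand-in for the hardware signature and hypothetical names and values throughout:

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

/* Words written during hardware configuration (illustrative values). */
static const uint16_t config_writes[] = { 0x00A5, 0x0140, 0x0003 };

/* Reference signature captured when the code was built:
 * 0x00A5 ^ 0x0140 ^ 0x0003 = 0x01E6. */
#define CONFIG_REF_SIG 0x01E6u

/* Foreground-mode check: fold the configuration data stream into a
 * signature and confirm initialization ran identically this cycle.
 * (A real DSM signatures the bus traffic in hardware with a PSA;
 * the XOR here only keeps the sketch short.) */
bool config_verified(void)
{
    uint16_t sig = 0;
    for (size_t i = 0; i < sizeof config_writes / sizeof config_writes[0]; i++)
        sig ^= config_writes[i];
    return sig == CONFIG_REF_SIG;
}
```

If any configuration write is skipped, repeated, or corrupted, the run-time signature no longer matches the reference and the fault is flagged before the control algorithm starts.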
Common Mode Bus Errors – Since a Dual CPU system
operates from a common memory, it is susceptible to
common mode bus errors (from either bus or data
transients). Further, the Dual CPU system depends on
the fact that the two CPUs are in lock step.
Determining the health of the Main CPU is predicated on
the condition that both the main CPU and the secondary
CPU react identically to the same information or data
(independent of the quality).
Even though the DSM monitors only the data bus, it does offer protection against this condition. If the address or control bus is corrupted, the result will manifest itself on the data bus as a corrupted signature.
SUMMARY
This article describes the evolution of a self-testing architecture. It examined the need for self-test of the CPU functions as well as the memory, and surveyed the various current implementations. It also examined the ability of the Dual CPU / DSM to achieve these goals.
Figure 10.
CONCLUSION
The Delphi Secured Microcontroller Architecture as
presented is intended for use in stand-alone fail-silent
systems. In the presence of a detected fault, the system
output drivers will be disabled automatically; fault
information will be latched and stored; however the CPU
will stay active to enable online diagnostics. Current
production applications default to a baseline mechanical
backup system.
The Delphi architecture is based on the premise that the closer to its source a fault can be detected, the faster and more accurate the ultimate response will be. The concurrent coverage offered by the Dual CPU and DSM, along with integrated redundant interface modules, will provide the capability of deterministic fault coverage of the MCU / ECU system.
At present, X-By-Wire systems are being proposed with
multiple “host” (any application ECU is a host)
redundancy at a communication node, or complex
distributed redundancy, integrated with a master
controller to manage the system.
The ability of the ECU to accurately determine its own “state of health” utilizing the Delphi Architecture will simplify system implementation, reduce communication complexity, and improve fault response and diagnostic latency in present Fault Tolerant or Fail Silent X-By-Wire systems.
ACKNOWLEDGMENTS
The author would like to thank his colleagues for their enduring patience, for their gracious technical input and editorial support, and for their encouragement in the preparation of this manuscript.
John Waidner - Chassis System Start Center,
Sr. Systems Engineer.
Troy Helm - Chassis Systems, Software Engineer
Rob A. Perisho Jr. - Advanced Chassis Systems, Sr.
Project Engineer
Charles Duncan - Chassis Systems Competency Leader
James Spall - Chassis Systems Start Center Team
Leader
Brian T. Murray Ph.D. - Chassis Systems Research Ctr.
REFERENCES
1. R. A. Frowerk, “Signature Analysis: A New Digital Field Service Method,” Hewlett-Packard Journal, pp. 2-8, May 1977.
2. H. J. Nadig, “Signature Analysis – Concepts, Examples, and Guidelines,” Hewlett-Packard Journal, pp. 15-21, May 1977.
3. S.W. Golomb, “Shift-Register Sequences,” Holden-
Day, Inc., San Francisco, 1967
4. J. Sosnowski, Concurrent Error Detection Using
Signature Monitors. Proc of Fault Tolerant
Computing Systems, Methods, Applications. 4th
International GI/ITG/GMA Conference, Sept 1989, pp
343 - 355
5. K. Wilken, J.P.Shen, “Continuous Signature
Monitoring: Low – Cost Concurrent Detection of
Processor Control Errors,” IEEE Transactions on
Computer-Aided Design, Vol. 9, pp. 629-641 June
1990.
6. Intel Corporation, Embedded Pentium® Processor Family Developer’s Manual, “Error Detection,” Chapter 22, pp. 393-399.
7. E. Bohl, T. Lindenkreuz, and R. Stephan, “The Fail-Stop Controller AE11,” Proc. International Test Conference, IEEE Computer Society Press, Los Alamitos, Calif., 1997, pp. 567-577.
8. P. H. Bardell, W. H. McAnney, J. Savir, “Built-In Test for VLSI: Pseudorandom Techniques,” John Wiley & Sons, 1987.
9. J. Wakerly, “Error Detecting Codes, Self-Checking Circuits and Applications,” Elsevier North-Holland, Inc., 1978, Section 2.1.6, Error Correction (Syndrome Testing).
10. Zvi Kohavi, “Switching and Finite Automata Theory,” McGraw-Hill Inc., 1978, Section 1.3, pp. 14-21.
ADDITIONAL SOURCES
1. J. Sosnowski, “Evaluation of Transient Hazards in Microprocessor Controllers,” in Proc. of 18th IEEE FTCS, 1986, pp. 364-369.
2. P.K. Lala, “Digital Circuit Testing and Testability,”
Academic Press Inc., 1997.
3. D. A. Anderson, G. Metze, “Design of Totally Self-Checking Circuits for m-Out-of-n Codes,” IEEE Transactions on Computers, Vol. C-22, No. 3, March 1973.
CONTACT
Terry L. Fruehling
Chassis Systems Start Center, Sr. Systems Engineer
765-451-5431
terry.l.fruehling@delphiauto.com