
CS384

Design of Operating Systems


Term Paper

Embedded Real-Time
System Considerations

Eric Durant
Electrical Engineering and Computer Science Department
Milwaukee School of Engineering
Milwaukee, WI, USA
<durante@msoe.edu>

Thursday 23 April 1998


Table of Contents

Abstract

Introduction

General Kernel Considerations

    Formal Kernel Development

    In-Kernel versus User Space Implementation

Security Considerations: Kernel Hypervisors

Multiprocessing in Embedded Operating Systems

Fault Tolerance and Distributed Redundancy

Guarantees, Schedulability and Probability

    Three Approaches to Time Assurance

    Types of Scheduling

An Application of a Modular Kernel

Conclusion

References

Abstract

This paper presents an overview of recent IEEE conference papers relevant to

the design of embedded, real-time systems. Focus is placed on kernels for such

systems. Formal development, tradeoffs of user versus kernel mode, security and

auditing, multiprocessing synchronization, fault tolerance and distributed

redundancy, guarantees, schedulability and probability, and implementation

issues of a real-time messaging and locking protocol are discussed.

Introduction

Special demands are placed on embedded systems. Often, these systems must

operate reliably for extended periods in harsh environments. In addition, their

nature often dictates that their task be completed in a specific time window in

contrast to general desktop computing tasks which may be completed somewhat

ahead of or behind schedule with no adverse effects.1 When such hard real-time

guarantees of completion times are needed, formal validation and deterministic

scheduling can be useful. Other embedded systems may require a high level of

confidence in (but not an absolute guarantee of) completion time.

1. For example, for general word processing tasks, assuming the normal behavior of the system is “fast enough” by the user’s criteria, the effects on the user’s productivity will be minimal if the program operates at double speed or half speed. However, if a respirator operates at either half or double the optimal speed, the effects could be catastrophic.
In other systems, the focus may be on optimizing fault tolerance and ensuring a

graceful degradation of performance despite severe damage to the system.

Tightly coupled multiprocessing systems have special concurrency and

serialization needs. Tradeoffs are made between running modules in kernel

space and in user space. In shared networking environments, special measures

may be necessary to ensure the immediate and continued security of the system.

This paper discusses all these topics in several broad sections. First, general

kernel considerations, including formal development and user versus kernel

space, are presented. Next, an approach to security is covered. Then,

multiprocessor issues are investigated. After that, fault tolerance with an

emphasis on critical military applications is considered. Then, confidence as an

alternative to guarantees and deterministic scheduling is discussed. Finally, an

architecture for a standard, real-time messaging and locking protocol is

examined.

General Kernel Considerations

Two of the fundamental questions in any kernel design are “Will it work?” and

“What is in it?” Fowler shows how the former may be answered for real-time

systems using formal methods [Fow97]. The latter is highly debated in real-time

systems practice and research, but some rules of thumb can be derived for

analyzing the tradeoffs of user and kernel mode for many applications.

Applications using each approach are reviewed for their merits [Sie97, Map97].

Formal Kernel Development

Formal proof that an operating system is valid, that is, that it meets its

specifications in all possible situations, is seldom done in practice. Perhaps this is

due to the perceived complexity of a formal proof or the perceived stability of the

kernel and embedded system based on empirical data. While both are valid

reasons to not use formal proof in many situations, Fowler has shown that the

complexity can be managed [Fow97].

Using formal methods, Fowler developed a real-time operating system (RTOS)

kernel with fixed priority preemptive multitasking. PVS (Prototype Verification

System, an automated theorem prover) and RTL (Real-Time Logic, a language

for expressing temporal behavior of a system) were the tools used to express the

requirements of the system and to refine and expand the formal description.

Fowler took an iterative approach, starting with the most abstract concepts and

refining and partitioning them in several stages. This allowed the validation of the

system to be broken into many small, manageable phases. In addition, it allowed

implementation details to be delayed until late in the development process,

making it possible to reuse much of the work across different target hardware and

to still allow platform-specific efficiency optimizations.

Fowler showed that with proper expertise, formal kernel development is feasible

for many applications. In addition, it requires a highly modular and abstract design

to be carefully specified early in the development process. In other areas of

system design, these attributes have been shown to make systems more robust,

extensible and maintainable. These desirable traits should extend to real-time

kernels as well, giving benefits beyond the assurance of formally proven stability.

In-Kernel versus User Space Implementation

The profusion of terms indicating the depth and breadth of the functionality of a

kernel is impressive: microkernel, exokernel, modular kernel, nanokernel,

monolithic kernel, etc. To complicate matters, these terms often have different

meanings for RTOS designers, for the marketing department, and for the

customer. The fundamental issue, though, is determining what should be part of

the kernel and what should not be.

The classic reason for including functionality in the kernel is efficiency, while the

classic reason for putting functionality in a user process is protection. Both

efficiency and protection are desirable, if not necessary, for cost-effective,

malleable designs. So, in one sense, there can be no final answer to where a

particular function should be implemented. Certainly, some classes of security

operations must be performed in kernel mode [Mit97] and arbitrary user code on

a time-shared system must not be run in kernel mode, but most operating system

(OS) functions are somewhere between these extremes.

A brief overview of two recent research projects, one implemented in kernel mode

and the other in user mode, will yield some general guidelines for making this

decision.

An ongoing research project at the University of Colorado - Boulder is to enhance

an interface to RTOS functionality providing quality of service scheduling [Sie97].

A demonstration module was implemented in kernel mode for two reasons. First,

kernel mode eliminated the mode-switching overhead. Second and more

important, additional complexity would have resulted from the scheduling module

being in the same scheduling space as the processes it was scheduling.

The classic reasoning that kernel mode activities can be implemented more

efficiently has led to transport protocols often being implemented in kernel mode.

Mapp justifies developing a high-speed transport protocol in user space by noting

that such an implementation is easy to test and refine [Map97]. There is also the

obvious benefit of system stability during testing in user mode. An early,

unoptimized, high-level language, user mode implementation of the A1 transport

protocol for ATM performed about 20% worse than a highly optimized Linux

kernel implementation of TCP. Mapp summarized the results:

“These results clearly show that a user-space protocol is able to achieve significant
performance and should help to debunk the idea that transport protocols must be run in the
kernel in order to achieve reasonable performance!”

So, we have an example of a kernel mode scheduler and of a user-mode

transport layer. What conclusions can be drawn from these examples? It seems

that the classical concerns deserve careful consideration, but are often

overstated. That is, mode switches have a cost, but this cost is often small

compared to other factors. In addition, using kernel mode means that exceptional

care must be taken to not compromise the system, and more fundamentally, that

the component must be trusted.2

Security Considerations: Kernel Hypervisors

As embedded real-time systems are increasingly networked, perhaps connected

to the Internet or other public networks, system security becomes important. One

layer of security is validation that the embedded system is accepting commands

and providing data (and perhaps encrypting them) only to trusted clients.

Another layer of security is the internal system security. At this layer, it is desirable

to allow different processes access to different sets of system calls and

resources. Various responses are possible when a process attempts an operation

for which it does not have privileges. A return value may indicate the privilege

failure, transferring responsibility for handling the failure to the process.

Alternatively, the process may be terminated or restarted. Another option is to

inform a trusted node outside the local system of the privilege failure. This would

aid in diagnosis of problems in the larger system, especially if the system is

designed so that such a failure is not expected in normal operation.

Mitchem discusses an approach to security at the system call level that could be

extended to implement the final option mentioned above [Mit97]. This method

uses “kernel hypervisors” which are loadable kernel modules that intercept

system calls and can be used in concert. They may implement a number of

checks, secondary actions (such as logging, replication and remote notification of

attempted security violations) and filters both before and after the requested

action is performed. Kernel hypervisors have been implemented on a Linux

kernel to provide application-specific security.

A kernel hypervisor is a metaphorical extension of a classic hypervisor.

Traditionally, a hypervisor is a software layer immediately above the hardware

that virtualizes the hardware interface for the next software layer. Thus, a

hypervisor is commonly used to implement a virtual machine. By extension,

kernel hypervisors run on top of the kernel and provide a virtual system call

2. However, depending on the system architecture, one might argue that a trusted system scheduler running in user mode could have the same detrimental effects as a kernel mode scheduler, for example.

interface. This virtualization layer enables the checks, secondary actions and

filters mentioned above.

Kernel hypervisors offer a number of advantages over other approaches (such as

traditional hypervisors, modified shared libraries and wrappers using OS debug

features) which have been used to virtualize or wrap the system call interface:

- Since the hypervisor is a kernel module, it cannot be bypassed by library calls.

- The kernel itself does not need to be modified.

- Kernel hypervisors can be stacked to provide varying and overlapping degrees of security, secondary actions and filters.

- The concept can be extended to any operating system supporting kernel loadable modules, including many Unix variants and Windows NT.

- No updates to existing code modules or shared libraries are required since the added functionality is provided at the system call level.

Thus, kernel hypervisors are an attractive solution when checking security or

performing other actions in response to system calls are desired. The kernel

hypervisor architecture allows fine- or coarse-grained action (that is, process

selectivity) and the stacking of hypervisor modules to achieve the desired result

by composition of existing modules.
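The stacking and interception described above can be sketched in miniature. The following is an illustrative user-level model, not Mitchem’s kernel-module code: each “hypervisor” here is an ordinary wrapper (the names real_open, acl_hypervisor and audit_hypervisor are invented for this sketch) that performs a check or a secondary action around the next handler in the chain.

```python
# Invented sketch of stacked system-call interception (not Mitchem's code).

audit_log = []

def real_open(path):
    # Stand-in for the kernel's actual open() handler.
    return "handle:" + path

def acl_hypervisor(next_call, allowed_prefixes):
    """Check module: deny calls outside an allowed set of path prefixes."""
    def wrapper(path):
        if not any(path.startswith(p) for p in allowed_prefixes):
            raise PermissionError(path)
        return next_call(path)
    return wrapper

def audit_hypervisor(next_call):
    """Secondary-action module: record every attempt, including denials."""
    def wrapper(path):
        try:
            result = next_call(path)
            audit_log.append(("ok", path))
            return result
        except PermissionError:
            audit_log.append(("denied", path))
            raise
    return wrapper

# Stack the modules: audit wraps ACL wraps the real call.
syscall_open = audit_hypervisor(acl_hypervisor(real_open, ["/data/"]))
```

Because the audit module wraps the ACL module, denied attempts are still logged; the composition order determines which secondary actions observe which events.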

Multiprocessing in Embedded Operating Systems

In multiprogrammed, multiprocessing operating systems, synchronization and

mutual exclusion have the potential to incur much more overhead than in

uniprocessor systems. Two common solutions beyond basic spin locks are often

considered for this problem [Tak97]. These are preemption-safe locking and wait-

free synchronization. However, both approaches have significant drawbacks.

When preemption-safe locking is employed, a process holding a lock will not be

preempted and a preempted process will not be allocated a lock. Although

Takada did not use this terminology, the system for waiting for a lock is essentially

a condition variable system. Waits on a lock are queued, and cancelled upon

preemption. When a process runs again, it must reinitiate the lock sequence,

entering the lock queue at the rear.3 The cost of scheduling locks in this manner

is at least O(n), where n is the number of processes contending for the lock.
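The queue-and-cancel behavior can be modeled in a few lines. This is a toy model, not Takada’s kernel code: process identifiers stand in for real tasks, and the class and method names are invented for illustration. It shows why a preempted waiter loses its place and must rejoin at the rear.

```python
# Toy model of preemption-safe locking: waits are queued FIFO and
# cancelled on preemption; a resumed process reinitiates at the rear.
from collections import deque

class PreemptionSafeLock:
    def __init__(self):
        self.queue = deque()   # waiting process ids, front = next owner

    def request(self, pid):
        """A process initiates a wait on the lock."""
        self.queue.append(pid)

    def preempt(self, pid):
        """Cancellation on preemption: the wait is removed entirely."""
        if pid in self.queue:
            self.queue.remove(pid)

    def grant_next(self):
        """Grant the lock to the process at the front of the queue."""
        return self.queue.popleft() if self.queue else None
```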

Wait-free synchronization is a scheduling approach in which operations requiring

mutual exclusion are planned with a priori knowledge of their durations. However,

this approach does not scale to complex data structures. Takada notes that his

group’s research uncovered no operating systems implementing this method,

although some implemented a variant known as lock-free synchronization.

However, this variant does not scale to multiprocessing systems as it cannot

guarantee order of completion.

The authors combined aspects of both wait-free and locking-based

synchronization to overcome many of their individual limitations. When a process

3. In a separate work, not reviewed for this paper, the authors suggested a variant of this approach in which a preempted process resumes its place in the queue after reentering the run state.

must wait on a lock, the operation it plans to execute on the lock is stored in a

shared queue for that lock. If process “A” is preempted (not spinning) when its

turn to acquire the lock arrives, another process, “B,” spinning on the same lock

will execute the critical section stored in the queue by “A” on behalf of “A.” Thus,

waits on the lock are serviced in a FIFO sequence.

When all tasks waiting on the lock are preempted, execution is suspended. The

first process waiting to enter the run state will process the pending wait queue.

Thus, progress will be made more quickly than if each process had to enter the

running state as the lock became available. Also, except for the overhead of

handling the enhanced lock data structures, the cost of preempting the processes

has been eliminated.
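A sketch of the delegation idea, with invented names and Python closures standing in for the stored critical-section descriptions: each waiter enqueues the operation it intends to perform, and any process still spinning drains the queue in FIFO order, executing pending operations on behalf of preempted waiters.

```python
# Invented sketch of the combined wait-free/locking scheme (not the
# authors' implementation): operations are stored per lock and executed
# in FIFO order by whichever process is still running.
from collections import deque

class DelegatingLock:
    def __init__(self):
        self.pending = deque()   # (pid, operation) in FIFO arrival order
        self.preempted = set()   # pids currently preempted

    def enqueue(self, pid, operation):
        """A waiter stores the critical section it plans to execute."""
        self.pending.append((pid, operation))

    def service(self, running_pid):
        """Called by a spinning process: drain the queue in FIFO order,
        executing each stored operation even if its owner is preempted."""
        results = []
        while self.pending:
            pid, op = self.pending.popleft()
            results.append((pid, op()))
        return results
```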

Fault Tolerance and Distributed Redundancy

Fault tolerant, redundant systems are used when failure would be catastrophic, or

when system replacement or repair cost would be prohibitively expensive.

Common examples in which fault tolerant, redundant systems are used are

enterprise database systems and military weaponry applications. Although fail-

over systems may be adequate when critical components are sufficiently

duplicated and failed components can be quickly replaced to ensure the

continued fault-tolerance of the system, a more flexible, autonomous and robust

system is needed for many applications.

With a focus on military applications subject to battle damage and long periods of

operation under harsh conditions, Kim discusses various fault-tolerant modes and

how a system may transition through these modes in different operating phases

[Kim97]. The basic principle is that tradeoffs can be made among timeliness,

efficiency and consistency. This is accomplished by allocating resources

depending on mode of operation and allowing for both catastrophic component

failure and intermittent failure.

A supervisor node oversees all functions, but even this node is failsafed. Certain

nodes are configured to monitor the consistency of the supervisor’s behavior and

automatically elect a new supervisor when its behavior becomes inconsistent.

The supervisor assigns one or more nodes to each task group and sets recovery

policies.

The most basic recovery policy is distributed recovery block, or DRB. Critical

tasks are handled by both a primary node and a shadow node4. The shadow

node uses a different method for performing the task than the primary node. Each

node performs an acceptance test (AT) on its result. If the primary’s AT fails, it

notifies the shadow, which takes over the primary role, assuming its AT was

successful. The shadow will also take over if it does not receive a state update

from the primary by a certain deadline. State data is stored in a shared log cache

so that either node can restore a known state and then synchronize with the other

node when its AT fails.
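The primary/shadow protocol can be condensed into a sketch. This simplified model (function names ours) ignores the messaging, deadlines and log cache of a real DRB pair and keeps only the acceptance-test logic: the result of whichever node first passes its AT is used.

```python
# Simplified DRB step: primary and shadow compute the result by
# different methods; the shadow takes over when the primary's AT fails.

def drb_execute(task_input, primary, shadow, acceptance_test):
    result = primary(task_input)
    if acceptance_test(result):
        return ("primary", result)
    result = shadow(task_input)      # shadow assumes the primary role
    if acceptance_test(result):
        return ("shadow", result)
    raise RuntimeError("both acceptance tests failed")
```

In the real scheme the two nodes run concurrently and the shadow also takes over on a missed state-update deadline; the sketch keeps only the AT-driven role switch.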

A more advanced recovery policy, adaptable distributed recovery block (ADRB),

introduces two additional fault-tolerant modes requiring fewer resources, but

taking more time to recover and providing less consistent performance during

recovery. The first additional mode is sequential backward recovery. In this mode,

a single node performs a task. If its AT fails, the supervisor node assigns a

4. A “node,” in this context, may be either a hardware device or a software module.

standby node from a pool to take over its function. The standby node then loads

its state from the shared log cache. If a complete node crash is detected, a similar

replacement occurs, but the time required is subject to greater variation as a

crash may be detected at various execution stages.

The other ADRB recovery policy is sequential forward recovery mode. In this

mode, an AT failure is seen as a “glitch” and exception-handling logic

compensates for the failure, which is assumed to be intermittent. The exception

handler is usually less efficient than assigning a standby node, but can be used in

more situations. When a node crash occurs in sequential forward recovery mode,

the same action is taken as in sequential backward recovery mode; the node is

replaced with a standby node.
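The two ADRB behaviors can be combined in one illustrative step function (the structure and names are ours, not Kim’s): an AT failure is compensated by exception-handling logic, while a node crash triggers replacement from a standby pool, with state restored from the shared log cache.

```python
# Invented composite of the ADRB modes described above: forward
# recovery for AT "glitches", standby replacement for node crashes.

def adrb_step(node, task_input, acceptance_test, exception_handler,
              standby_pool, log_cache):
    try:
        result = node(task_input)
    except RuntimeError:
        # Node crash: assign a standby node, which loads the last
        # known state from the shared log cache.
        replacement = standby_pool.pop(0)
        return replacement(log_cache["state"])
    if acceptance_test(result):
        return result
    # AT failure treated as an intermittent glitch: compensate and continue.
    return exception_handler(task_input)
```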

Systems designed with DRB and ADRB require high levels of modularization and

redundancy to be effective. Although the military is currently the primary market

for such systems, private sector applications that demand robustness and graceful

degradation under the harshest of operating circumstances5 are prime areas for

adopting adaptive distributed recovery schemes.

Guarantees, Schedulability and Probability

If hard real-time is to be guaranteed, tasks must be schedulable a priori. In many

applications, high confidence is sufficient and significant throughput gains can be

made compared to the limitations imposed by hard real-time.
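The paper does not name a particular a priori test, but a classic example for fixed-priority preemptive systems is the Liu and Layland rate-monotonic utilization bound: n periodic tasks are guaranteed schedulable if their total utilization, the sum of C_i/T_i, does not exceed n(2^(1/n) - 1). A minimal check:

```python
# Rate-monotonic schedulability via the Liu-Layland utilization bound.

def rm_schedulable(tasks):
    """tasks: list of (worst_case_execution_time, period) pairs.
    Returns True when the utilization bound guarantees schedulability."""
    n = len(tasks)
    utilization = sum(c / t for c, t in tasks)
    bound = n * (2 ** (1 / n) - 1)   # ~0.78 for n=3, ->ln 2 as n grows
    return utilization <= bound
```

Note the bound is sufficient, not necessary: a task set exceeding it may still be schedulable, which is exactly the slack that confidence-based approaches try to reclaim.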

Siewert’s discussion of classic time assurance and his confidence-based

approach are covered below. Then Li’s taxonomy of scheduling systems is

5. Such as medical devices and commercial aircraft.

presented as a contrast to Siewert and as a complement to scheduling methods

used in common non-real-time operating systems.

Three Approaches to Time Assurance

Siewert discusses three basic approaches to service time assurance systems.

These are best-effort, hard real-time and application-specific embedded control

[Sie97]. Best-effort systems work when it is known that there are sufficient

resources to meet all possible requests. Their performance is undefined when the

system becomes overloaded. Hard real-time systems require foreknowledge of

the worst-case execution time and implement scheduling based upon this. They

are desirable when an absolute guarantee is required, but can let a large

proportion of the system resources be unused. Application-specific controllers are

tightly coupled and optimized systems that guarantee performance, but are not

easily scaled.

All of the above solutions are in place in various production systems. However, for

the vast majority of applications in which high quality (but not absolute) service is

required6, alternate approaches should be considered since none of the above

are optimal. Siewert’s team’s work is on a confidence-based in-kernel scheduler.7

Off-line performance data and real-time profiling are used to negotiate a high

confidence of quality of service with user processes. Thus, deadline failures are

minimized (but allowed) while increasing system throughput and utility.
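As a hedged illustration of the idea (not Siewert’s actual algorithm), confidence can be estimated empirically: the fraction of profiled execution times that met the deadline approximates the probability that the next request will, and admission is granted only when that fraction reaches the negotiated level.

```python
# Illustrative confidence-based admission test using profiled data.

def deadline_confidence(samples, deadline):
    """Fraction of observed execution times that met the deadline."""
    return sum(1 for t in samples if t <= deadline) / len(samples)

def admit(samples, deadline, required_confidence):
    """Admit the request only if the estimated confidence of meeting
    the deadline reaches the negotiated level."""
    return deadline_confidence(samples, deadline) >= required_confidence
```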

6. For example, multimedia transmission, virtual reality, and even classic hard real-time domains such as telemetry and digital control.
7. Their system is also capable of providing hard real-time guarantees based on worst-case analysis.

Types of Scheduling

Li, referencing Stankovic, provided a concise summary of real-time scheduling

techniques from a different perspective than Siewert [Liy97]. These

methodologies parallel the taxonomy of non-real-time schedulers. The basic

types include:

- Cooperative - non-preemptive.

- Static priority-driven - The highest priority process in the ready state always runs, preempting lower priority processes. Priorities do not change during execution.

- Dynamic priority-driven - The scheduling of the system hinges on the priority adjustment policy. As in static priority-driven scheduling, the highest priority process in the ready state is allowed to run.
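The two priority-driven policies can be contrasted in a few lines. The sketch below is illustrative (function names ours); the dynamic case uses earliest-deadline-first as one example of a priority adjustment policy.

```python
# Contrast of static and dynamic priority-driven selection.

def pick_static(ready):
    """ready: list of (pid, fixed_priority); higher number wins.
    Priorities never change during execution."""
    return max(ready, key=lambda p: p[1])[0]

def pick_dynamic_edf(ready, now):
    """ready: list of (pid, absolute_deadline); the priority is
    recomputed at each decision, here as urgency (deadline - now)."""
    return min(ready, key=lambda p: p[1] - now)[0]
```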

An Application of a Modular Kernel

Many RTOSs do not provide full real-time services to user applications [Maa97].

In this case, an RTOS with a modular, extensible kernel may be needed so that

real-time features can be exploited. Maaref discusses an architecture for

implementing a standard real-time messaging protocol on such an RTOS,

Microware’s OS-9.

The protocol discussed is MMS (Manufacturing Message Specification), an ISO

application layer protocol providing for synchronization and communication of

industrial devices. MMS specifies semaphores, journaling, a file system and event

management, among other services. The services specified are similar to those

of many OSs, so the author argues that an MMS service is better implemented at

the OS level than as a user application.

Since OS-9 is a highly modular operating system, Maaref found its kernel mode

device modules to be an ideal avenue for implementing MMS. Prior

implementation architectures for MMS on other operating systems could not

provide hard real-time.

So, depending upon the type of real-time services a particular RTOS exposes to

user applications, it may be necessary to implement certain functions in kernel

mode. When this is necessary, the architecture provided for extending the OS is

crucial.

Conclusion

A large amount of current research material on embedded system and real-time

issues indicates that this is a rapidly growing field. Driven by decreasing hardware

costs and an established knowledge base, embedded systems continue to

expand into low end and distributed systems. Research into security and fault-

tolerance, especially, will be key areas in the next few years. Most consumer

systems will require fault tolerance for reasons quite different from the military

applications for which these systems were first developed. That is, as the cost to

service consumer devices continues to rise, the cost of the devices themselves

continues to fall. Therefore, the cost-effectiveness of robust and redundant fault-

tolerance mechanisms is inevitable for nearly all applications as the market

matures.

On the other hand, life- and safety-critical embedded systems are growing more

complex, and more is being demanded of them. Hence, they will benefit from

formal validation and enhanced multiprocessor synchronization models.

Clearly, there is much overlap between the high (life- and safety-critical domains

such as military, aviation and medical equipment) and low (consumer domains

such as personal communications and entertainment) ends of the real-time

embedded systems arena.

Historically, such cross-pollination in technical areas has spurred rapid advances

as high-end techniques became cost-effective for the consumer domain. Based

on the breadth of current research and continuing growth in both the high and low

ends, continued rapid innovation seems probable in real-time embedded system

technologies.

References

- [Fow97] Fowler, S. and Wellings, A., “Formal Development of a Real-Time Kernel”, Proc. IEEE 18th Real-Time Systems Symposium, Dec. 1997, San Francisco, pp. 220-229.

- [Kim97] Kim, K.H., et al., “The Adaptable Distributed Recovery Block Scheme and a Modular Implementation Model”, Proc. IEEE Pacific Rim International Symposium on Fault Tolerant Systems, Dec. 1997, Taipei, pp. 131-138.

- [Liy97] Li, Y., Potkonjak, M. and Wolf, W., “Real-Time Operating Systems for Embedded Computing”, Proc. IEEE International Conference on Computer Design: VLSI in Computers and Processors, Oct. 1997, Austin, pp. 388-392.

- [Maa97] Maaref, B., “MMS Implementation Based on a Real-Time Operating System Kernel”, Proc. IEEE International Symposium on Industrial Electronics (ISIE), Part 1 (of 3), July 1997, Guimaraes, pp. 29-34.

- [Map97] Mapp, G., Pope, S. and Hopper, A., “The Design and Implementation of a High-Speed User-Space Transport Protocol”, Proc. IEEE Global Telecommunications Mini-Conference, Nov. 1997, Phoenix, pp. 1958-1962.

- [Mit97] Mitchem, T., Lu, R. and O’Brien, R., “Using Kernel Hypervisors to Secure Applications”, Proc. IEEE 13th Annual Computer Security Applications Conference (ACSAC), Dec. 1997, San Diego, pp. 175-181.

- [Sie97] Siewert, S., Nutt, G. and Humphrey, M., “A Real-Time Execution Performance Agent Interface to Parametrically Controlled In-Kernel Pipelines”, Proc. IEEE 3rd Real-Time Technology and Applications Symposium, June 1997, Montreal, pp. 172-177.

- [Tak97] Takada, H. and Sakamura, K., “A Novel Approach to Multiprogrammed Multiprocessor Synchronization for Real-Time Kernels”, Proc. IEEE 18th Real-Time Systems Symposium, Dec. 1997, San Francisco, pp. 134-143.
