
J Sign Process Syst (2010) 59:33–43

DOI 10.1007/s11265-008-0272-9

SIGMA System: A Multi-OS Environment for Embedded Systems

Wataru Kanda · Yu Murata · Tatsuo Nakajima

Received: 10 March 2008 / Revised: 5 August 2008 / Accepted: 2 September 2008 / Published online: 26 September 2008
© 2008 Springer Science + Business Media, LLC. Manufactured in The United States

Abstract  Embedded systems are becoming increasingly sophisticated, and there exists a wide variety of requirements such as traditional realtime requirements, multimedia support, etc. It is hard to satisfy all of these requirements with a single OS; if a single OS were to satisfy them all, the system would become complex and this would cause new problems. A multi-OS environment is an efficient approach to deal with these problems and to satisfy complex requirements while keeping the system simple. We propose a multi-OS environment named the SIGMA system, targeted especially at multiprocessor architectures. In the SIGMA system, guest OSes correspond one-to-one with cores. As a result, in contrast to existing multi-OS environments based on virtualization techniques, the system does not degrade the performance of the guest OSes. In addition, a guest OS running on the SIGMA system requires almost no modification to its source code.

Keywords  Operating systems · Multi-OS environment · Symmetric multiprocessors · Embedded systems

W. Kanda (B) · Y. Murata · T. Nakajima
Waseda University, Tokyo, Japan
e-mail: kanda@dcl.info.waseda.ac.jp

1 Introduction

Recently, embedded systems, especially mobile phones and consumer electronics products, have become increasingly sophisticated. These systems require rich GUIs and multimedia support in addition to the traditional realtime processing capability. In addition, the number of applications that are connected to networks and provide services is increasing. To satisfy these requirements, using a general purpose OS, particularly Linux, which has a wealth of network applications, middleware, device drivers and software development tools, as a platform for embedded systems is starting to gather attention.

However, most general purpose OSes are designed and optimized for the common case rather than the rare case, in order to increase system throughput. For realtime processing requirements in embedded systems, the worst case execution time must be taken into consideration even when it is a rare case. General purpose OSes are not always designed to be preemptable even during the execution of lower priority tasks. As mentioned above, there is a conflict between realtime processing requirements and high functionality requirements, so it is hard for a single OS to satisfy both requirements simultaneously, as such systems tend to become complex and unstable.

In order to deal with these problems, multi-OS environments which run a realtime OS and a general purpose OS on a single machine have been suggested, for example Wombat [1]. For building a multi-OS environment on a single machine, virtualization technologies are popular in the server and desktop fields, for example Xen [2]. However, existing virtualization technologies have a tradeoff between guest OS modification costs and the performance of the system. To keep the performance overhead small, the guest OS needs to be optimized for the virtualization layer, and the code modification cost of the guest OS increases. On the

other hand, to keep the amount of guest OS code modifications small, the virtualization layer must emulate all hardware instructions, and this emulation cost increases the performance overhead. Emulation degrades the realtime response, which is often crucial for embedded systems. Thus, existing virtualization technologies that are designed for server and desktop environments are not suitable for embedded systems.

To cope with these problems, a system that enables a multi-OS environment with a low performance overhead and a minimal amount of guest OS modifications is needed. In this paper, we propose a system which achieves a multi-OS environment on a single machine without any specific hardware except a symmetric multiprocessor (SMP), which is generally found in any personal computer. The name of the system is the SIGMA system. The SIGMA system assigns one OS to one of the processors. This approach achieves a multi-OS environment with a minimum engineering cost and only a little performance degradation. We ran benchmark software on its guest OS to evaluate the performance of the SIGMA system.

The rest of this paper is organized as follows. The next section describes, as related work, three virtualization techniques which achieve a multi-OS environment on a single machine. Section 3 describes requirements for multi-OS environments for embedded systems. In Section 4, an overview of the SIGMA system and the details of its implementation are given. Section 5 shows the evaluation of the performance and the guest OS modification costs. Section 6 discusses the aptitude of the SIGMA system and related work against the requirements described in Section 3. Finally, Section 7 summarizes our research.

2 Related Work

Several virtualization technologies exist. The key element of virtualization techniques is the virtual machine monitor (VMM), which duplicates a single real hardware interface into multiple virtual interfaces. This interface is called a virtual machine (VM) and provides all instructions (both privileged and unprivileged) of the processor as well as hardware resources (memory, I/O devices, etc.) [3]. Basically, a program running on a VM should behave as if it were running on real hardware, except for some differences caused by the availability of system resources and timing dependencies [4]. Virtualization technologies are generally categorized into three approaches: full-virtualization, para-virtualization, and pre-virtualization.

2.1 Full-virtualization

Full-virtualization is achieved by complete emulation of all instructions and hardware. The most significant advantage of full-virtualization is that almost all OSes can run on its VM without requiring any modification. Since this approach emulates the hardware completely, the reliability of the guest OS is not degraded by the virtualization. On the other hand, emulation of the processor and hardware decreases the performance of the guest OS and makes the VMM massive. Examples of VMMs developed for full-virtualization are VMware [5–7] and Virtual PC.

2.2 Para-virtualization

Para-virtualization is achieved by modifying the guest OS for a specific VMM. The performance of para-virtualization is superior to that of full-virtualization, as the OS is optimized for the VMM. However, such modifications extinguish some advantages achieved by full-virtualization. For instance, it is impossible to run a para-virtualized OS on different VMMs because the OS is modified for only a single VMM. Examples of para-virtualization hypervisors and their guest OSes are Xen and XenoLinux [2], and the L4 microkernel and L4Linux [8]. L4Linux is a para-virtualized Linux which is ported to run on an L4 microkernel; it uses the L4 microkernel as a hypervisor and is executed concurrently with other servers in unprivileged mode. Wombat [1] is also a para-virtualized Linux, ported to run on the L4/Iguana system. L4/Iguana consists of NICTA::L4-embedded [9] and Iguana [10]. NICTA::L4-embedded is an L4 microkernel modified for embedded systems to support realtime processing. Iguana is a collection of basic software that provides OS services such as memory protection mechanisms and a device driver framework for embedded systems. NICTA::L4-embedded, Wombat and Iguana have been developed at National ICT Australia (NICTA).

2.3 Pre-virtualization

Pre-virtualization is achieved by modifying the assembler code of the guest OS [11]. The modification consists of a multi-phase process called afterburning. Since afterburning is processed automatically, the engineering cost is relatively small. Pre-virtualization has both benefits of full-virtualization and para-virtualization: the low engineering cost and the high performance. A pre-virtualized OS can support multiple hypervisors such as Xen and L4 [12]. At present, a

single compiled image of a pre-virtualized OS can be run on multiple types of hypervisors. Users can select a hypervisor for any purpose [13].

3 Requirements

In order to build a multi-OS environment for embedded systems, the following requirements are stated:

• code modifications to the guest OS should be minimal
• the virtualization layer should be light
• it should be possible to reboot the guest OSes independently from each other

General purpose OSes like the Linux kernel are massive and complicated pieces of software and are frequently updated to incorporate new features and to fix bugs; code modifications should be kept small or it becomes hard to keep up with the up-to-date kernel. Smaller code modifications are also an advantage from the viewpoint of adding a new guest OS to the environment. Thus, code modifications to the guest OS should be minimal.

The second requirement concerns the virtualization layer. To minimize the performance degradation of the guest OS, the virtualization layer must be very light. Especially in embedded systems, the guest OS is required to have realtime capabilities, so the effect of the virtualization layer on the worst case response time of the guest OS must also be minimized. In addition, a smaller system is easier to test and verify.

The third requirement is for system availability. When a problem occurs in one of the guest OSes and its reboot is needed, that guest OS should be able to reboot independently from the other guest OSes. By rebooting a guest OS independently, the other systems can continue to provide services and the availability of the whole system can be improved.

As mentioned in Section 2, both full-virtualization and para-virtualization have a trade-off between the code modification of the guest OS and the performance degradation of the guest OS. The full-virtualization approach hardly requires any modifications to the guest OS, but it degrades its performance, and the virtualization layer also becomes large. On the other hand, the para-virtualization approach minimizes the performance degradation of the guest OS while imposing quite a lot of code modifications on the guest OS. Thus, it is difficult to simultaneously satisfy the first two requirements with the existing virtualization approaches.

4 Design & Implementation

This section explains the design and implementation of the SIGMA system, which provides the multi-OS environment and is designed for multiprocessor architectures. The SIGMA system looks like a VMM in that it provides a multi-OS environment, but it does not provide virtualized hardware interfaces to the OSes running on it. An OS in the SIGMA system runs directly on the physical hardware. The principal building blocks of the SIGMA system are the interprocessor interrupt (IPI) [14] and the interOS communication (IOSC). The system is implemented on the IA-32 architecture.

4.1 Approach and Overview

We assume that for most embedded systems, the number of OSes which are required to have high functionality is limited. In most cases, we expect the number of guest OSes to be two: a realtime OS and a general purpose OS. Cases using three or more different OSes are very rare. On the other hand, multicore processors are becoming mainstream, and two or four processors in a single machine is not a rare case. In this situation, there are more processors than there are guest OSes. Therefore, the SIGMA system assigns each guest OS its own core, so that each OS can use all CPU resources exclusively and can run in privileged mode. Devices are divided among the guest OSes. In order to share a device between OSes, the device must be managed by one OS, and the other OSes can send and receive data via communication mechanisms between OSes. This approach has many advantages. First, no resource management mechanism is needed and therefore hardly any virtualization layer is required. This fulfills the second requirement from Section 3: "the virtualization layer should be light". OSes running on the SIGMA system can manage their assigned devices by themselves; they act as if they were running on a single normal machine. They can run in privileged mode and use most of the privileged instructions. Thus, the OSes hardly need to be modified and can achieve performance as high as that of the native system. This also fulfills our first requirement: "code modifications to the guest OS should be minimal".

In Intel's multicore architecture, one of the processors is selected to be the bootstrap processor (BSP) and the others become application processors (APs) during system initialization by the system hardware [15]. After configuring the hardware, the BSP starts the APs. For the SIGMA system, we define that the OS running on the BSP is the host OS, and that the OS running on an AP is a guest OS.

Figure 1 The structure of the SIGMA system: the OS manager, Wombat, Iguana and servers on NICTA::L4-embedded run on CPU0 (BSP); SIGMA Linux runs on CPU1 (AP); both share the H/W.
Especially, we call the guest Linux "SIGMA Linux"; it is slightly modified for the SIGMA system. In the SIGMA system, NICTA::L4-embedded, Iguana and Wombat run on the BSP, and SIGMA Linux runs on the AP. The architecture of the system is illustrated in Fig. 1.

4.2 Boot Sequence of the SIGMA Linux

All modules of the SIGMA system, including the SIGMA Linux kernel, are loaded into memory at initial boot time. The OS manager, one of the modules running on the host OS, takes charge of starting the guest OSes. Before the SIGMA Linux boots, the OS manager acquires kernel information from L4. L4 has information about the modules such as their location in memory, image size and other parameters. In addition, to define the memory regions available for the guest OS, the OS manager creates new e820 memory maps. The details of the memory management are described in the next section. The OS manager also writes the information that the SIGMA Linux uses, such as kernel parameters, to a specified memory address. Finally, the OS manager sends a startup IPI to the AP and the SIGMA Linux starts booting. To the SIGMA Linux, it seems as if it was booted by a general bootloader such as GRUB on a standalone system. For the guest OS, the OS manager is something like GRUB.

4.3 Memory Management

Although a normal Linux kernel acquires a memory map through a BIOS call to identify free and reserved regions in the main memory, a Linux kernel running on the SIGMA system obtains the memory map from the OS manager.

Before booting the SIGMA Linux, the OS manager executes a program that invokes the e820 function on the AP and acquires the e820 memory map. The function of e820 is to check the memory size and memory maps. Then, the OS manager modifies the memory map to create memory spaces for the SIGMA Linux. The modification is based on the pre-defined memory regions which are available for the SIGMA Linux. After the modification, the OS manager writes the modified map back to memory and boots the SIGMA Linux. Although the original native Linux includes instructions that call the e820 function, those are removed from the SIGMA Linux to prevent the modified memory map from being overwritten again. Finally, the SIGMA Linux reads the modified memory map and starts booting within the restricted memory areas. This approach, however, produces a security problem where the SIGMA Linux might ignore the memory restriction and access illegal parts of the memory. This problem is discussed in Section 6.

4.4 InterOS Communication

The IOSC is a mechanism that lets OSes running on different cores communicate with each other. The IOSC is a similar mechanism to the interprocess communication (IPC) of a general OS. The IOSC is processed by the following steps:

(a) The source OS writes a message to a shared memory region.
(b) The source OS sends an IPI to the destination core.
(c) The destination OS receives the IPI and reads the message.

In the SIGMA system, each pair of OSes that communicate shares two different memory regions, and each OS writes its messages to a different area. These memory regions are placed outside of both the L4 microkernel and Linux. Their addresses are specified statically when the system is built. However, these regions are not protected by any mechanism. Thus, it is possible for all guest OSes to access these memory regions illegally.
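The three IOSC steps above can be sketched as a single one-directional channel. This is an illustrative sketch under our own naming, not the actual SIGMA implementation: the channel struct, the 240-byte payload limit and the callback for step (b) are assumptions; in the real system the regions sit at statically fixed physical addresses and the notification is a hardware IPI.

```c
#include <stdint.h>
#include <string.h>

#define IOSC_MSG_MAX 240                /* assumed payload limit */

struct iosc_channel {
    volatile uint32_t ready;            /* set by sender, cleared by receiver */
    uint32_t len;
    char     buf[IOSC_MSG_MAX];
};

/* Steps (a) and (b): write the message into the shared region, then
 * notify the peer. The IPI is hardware-specific, so it is a callback. */
int iosc_send(struct iosc_channel *ch, const void *msg, uint32_t len,
              void (*send_ipi)(void))
{
    if (len > IOSC_MSG_MAX || ch->ready)
        return -1;                      /* channel full: previous msg unread */
    memcpy(ch->buf, msg, len);
    ch->len = len;
    ch->ready = 1;                      /* publish only after the payload */
    if (send_ipi)
        send_ipi();                     /* (b) kick the destination core */
    return 0;
}

/* Step (c): called from the destination OS's IPI handler. */
int iosc_recv(struct iosc_channel *ch, void *out, uint32_t max)
{
    uint32_t len;
    if (!ch->ready)
        return -1;
    len = ch->len < max ? ch->len : max;
    memcpy(out, ch->buf, len);
    ch->ready = 0;                      /* free the slot for the next message */
    return (int)len;
}
```

Since each direction has its own region, two such channels per communicating OS pair give full-duplex messaging without locking.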
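Similarly, the memory-map rewriting performed by the OS manager (Section 4.3) can be illustrated as clipping e820 entries against the pre-defined window reserved for the SIGMA Linux. The entry layout mirrors Linux's e820 struct; the function name and the window values used below are hypothetical.

```c
#include <stdint.h>

#define E820_RAM 1                      /* usable-RAM type code */

struct e820_entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Clip every usable-RAM entry to [win_start, win_end); drop RAM entries
 * that fall completely outside the window. Reserved entries pass through
 * unchanged. Returns the new entry count. */
int e820_restrict(struct e820_entry *map, int n,
                  uint64_t win_start, uint64_t win_end)
{
    int out = 0;
    for (int i = 0; i < n; i++) {
        struct e820_entry e = map[i];
        if (e.type == E820_RAM) {
            uint64_t lo = e.addr > win_start ? e.addr : win_start;
            uint64_t hi = e.addr + e.size < win_end ? e.addr + e.size
                                                    : win_end;
            if (lo >= hi)
                continue;               /* no overlap with the window */
            e.addr = lo;
            e.size = hi - lo;
        }
        map[out++] = e;
    }
    return out;
}
```

The OS manager would write the clipped table back to the address where the SIGMA Linux expects its boot-time memory map, so the guest sees only its assigned window as usable RAM.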

Because all the guest OSes in the SIGMA system are allowed to execute all privileged instructions, they can change the memory mapping in the MMU of their assigned processor and invade the memory regions where the shared memory for the IOSC is located.

If the IOSC is called frequently, the system performance could be affected. This resembles the IPCs of microkernel-based systems.

4.5 Device Management

Device management is mainly done by the host OS. Although both the host and the guest OSes can use devices directly, should some OSes use the same device at the same time, a collision occurs. The SIGMA system avoids these collisions by restricting device accesses to the host OS.

Device drivers are implemented as device servers in the host OS. A device access by a process in the host OS is performed by just calling the device server. When a process of a guest OS wants to use a device, the process has to send a request to the device server of the host OS by calling the IOSC service. This is performed by a stub driver in the guest OS; such drivers are called pseudo device drivers. Since the interfaces of the pseudo device drivers are the same as those of the common device drivers, the process can use devices without knowing the location of the particular device and its driver.

4.6 SIGMA Linux Modifications

The SIGMA Linux is able to execute all of the hardware instructions directly. These instructions need not be substituted with emulated code or hypervisor calls. For the SIGMA system, two kinds of modifications are needed. The first category consists of the modifications related to memory regions. The following three parts of the Linux kernel were modified:

• The address at which the kernel setup routine is located was changed
• The address at which the uncompressed kernel image is located was changed
• The instruction that calls the e820 function was omitted

The memory region where the non-modified kernel setup routine and uncompressed kernel image would be located is already used by the L4 microkernel. To avoid overwriting this memory, the first two modifications are required. The third modification is required to constrict the memory regions used by the SIGMA Linux.

The second category is interrupt configurations. In the SIGMA system, guest OSes are not allowed to share devices and interrupts. Thus, all interrupts must be delivered to the corresponding guest OS properly. In the Intel multiprocessor architecture, it is possible to distribute interrupts among processors by interrupt number. This mechanism is achieved with IO APIC [14] functions. We modified the part of the SIGMA Linux code which initializes the IO APIC, to prevent the configuration set by the host OS from being overwritten by the SIGMA Linux. Hereby, interrupts for the L4 microkernel are delivered to the L4 microkernel and interrupts for Linux are delivered to Linux. To achieve this mechanism, the following options have to be selected in the Linux kernel configuration, and Linux has to be forced to use the IO APIC:

• Local APIC support on uniprocessors
• IO-APIC support on uniprocessors

5 Evaluation

This section describes the evaluation of our prototype system. We ran a collection of benchmark programs to evaluate the performance of the guest OS. We also examined the collision on shared memory between the host OS and the guest OS. All experiments described in this paper were performed on a PC with a 2.4 GHz Intel Core2 Duo with 32 KB L1 cache per core, 4 MB shared L2 cache, 512 MB of RAM, and an ATA disk drive. We used Linux kernel version 2.6.12, and 2.6.16 for Xen.

5.1 LMbench

We used LMbench [16] to measure performance. LMbench is a micro benchmark suite that measures the performance of Linux: system calls, context switches, communication latencies and so on. Since LMbench uses regular Linux system calls, it is suitable for comparing many Linux systems.

SIGMA Linux is compared to three systems: a vanilla Linux kernel running on a single core (native Linux), a vanilla Linux kernel running on two cores (native SMP Linux), and a modified Linux kernel running on the Xen VMM (XenoLinux).

Figure 2 shows the cost of the interactions between a process and the guest kernel. The benchmark examines Linux system call and signal handling performance. Every item of SIGMA Linux is the same as that of native Linux, while XenoLinux takes longer time

Figure 2 LMbench: Processes—time in microseconds.

than native Linux in most of the cases. Some experiments on SIGMA Linux gave better results than native Linux. This can be caused by a small difference in the environment: SIGMA Linux ignores some interrupts, but native Linux doesn't.

We evaluated the cost of a process context switch in the same setting (Fig. 3). The cost of a context switch on SIGMA Linux is also almost the same as that of native Linux, and less than that of native SMP Linux and XenoLinux in every case.

The last benchmark evaluates the cost of file operations (Fig. 4). Xen is as fast as native Linux and SIGMA Linux in file create/delete operations. On the other hand, native SMP Linux is slower than the others. The results show that SIGMA Linux performs as well as native Linux; there is very little overhead in SIGMA Linux compared to native Linux.

From all the results, SIGMA Linux presented almost the same performance as native Linux. In addition, compared to native SMP Linux and XenoLinux, SIGMA Linux shows much better performance.

If multiple threads are appropriately assigned to processors and there are no dependencies between the threads, native SMP Linux can achieve efficient parallel processing. Although it may be suitable for mathematical calculation and image processing to introduce native SMP Linux, it is not always an efficient way for general purpose use. To create a program which is optimized for parallel processing, a special compiler such as an automatic parallelizer may be required, and it may also be required to use a parallel programming language such as Fortress [17]. As described above, native SMP Linux often shows slower performance than native Linux on a single processor. There are some possible reasons. The first is lock contention: if one processor has already acquired the lock of a critical section, the other processors have to wait until it is unlocked to access the region. In this case, the advantage of parallel execution cannot be fully exploited. The second is the memory access collision issue, which will be described in Section 5.2. The third is cache inconsistency. In a multiprocessor system, when one processor changes a page table or page directory entry, the change must be propagated to all the other processors to keep the caches consistent. This process is commonly referred to as TLB shootdown and is performed by IPI [15]. TLB shootdown in the IA-32 architecture must guarantee either of the following two conditions: different TLB mappings are not used on different processors during

Figure 3 LMbench: Context switching—time in microseconds.

Figure 4 LMbench: File & VM system latencies in microseconds.

updating the page table, or the OS is prepared to deal with the case where processors use a stale mapping during the table update. A processor can be stalled, or must invalidate the modified TLB entries, in order to keep the TLBs consistent. As a result, this may degrade system performance.

XenoLinux is more than two times slower than SIGMA Linux and native Linux at worst. The most substantial factor is the emulation of privileged instructions handled by the Xen hypervisor. Many of the IA-32 instructions related to memory, interrupt and I/O management must be performed in privileged mode. To perform those operations, XenoLinux invokes the hypervisor, which actually handles such instructions. Such invocations are called hypercalls. A hypercall is very expensive. Although some optimizations are implemented in Xen to reduce the cost of hypercalls (for example, some hypercalls are handled together), this is not enough to substantially mitigate the performance degradation.

On the other hand, SIGMA Linux shows performance as fast as native Linux. The most significant factor in avoiding performance degradation is that the guest OS can handle the privileged instructions directly. The configuration of page tables and interrupt vector tables, enabling/disabling interrupts, and device control require privileged instructions, and those are usually performed by emulation in general virtualization approaches. However, SIGMA Linux hardly requires such emulation, except when using device drivers that run on the other OSes. In addition, the VMM of general virtualization manages the guest OSes and usually handles their scheduling. In other words, the VMM must decide which guest OS to run, and save and restore all OS contexts. These context switches between OSes are similar to those of processes in a general OS and may degrade system performance. In the SIGMA system, multiple OSes run concurrently and scheduling of OSes is not required at all. The superiority of SIGMA Linux in performance aspects is based on its architecture, which is almost the same as that of native Linux.

5.2 Memory Access Collision

Memory accesses can conflict at the memory bus in a shared memory multiprocessor architecture. This conflict causes performance degradation even in the SIGMA system. When two or more OSes access the memory concurrently, while one OS performs memory related operations the others may have to wait for the bus to become available. We call such conditions memory access collisions, and the performance may become slow in such cases. Fortunately, many kinds of multiprocessors have cache memory in each processor, and a processor can access cache memory much faster than main memory. If a process accesses data that exists in cache memory, the read/write operation becomes faster. On the other hand, if the data isn't in cache memory, the process has to access main memory, which is expensive compared to the cache. Thus, the measured memory latencies depend on whether the target data is in the cache or not. We measured the latencies with memory access collisions in the five different conditions described in Table 1. The main purpose of the evaluation is to survey how much effect the activities of other OSes sharing the same memory bus have.

The five conditions are categorized by cache availability and memory stress. The "No Cache" and "Both No Cache" conditions mean that the caches of the respective processors are disabled, as shown in Table 1. In the "Stress" conditions, to generate load, a program which only repeats reads and writes to memory is run on the host OS, Wombat. In that case, the memory access of the guest OS in the "No Cache" condition can sometimes be affected. In the "No Cache with Stress" condition, the delay increases further because many accesses from the host OS occur. In the "Both No Cache" and "Both No Cache with Stress" conditions, the memory collision is assumed to increase even more, because all processors try to use the memory bus when reading or writing to the memory. The evaluations are performed on the guest OS, and we used LMbench as the evaluation tool. The results are illustrated in Fig. 5. Figure 5 shows that, in both cache enabled and disabled configurations, the performance of the guest OS under load degrades by ten to fifteen percent compared to the unloaded case. In addition, when the loading size is below four megabytes, the latencies of the "No Cache" and even "No Cache with Stress" conditions are the same as "Normal". That is, no performance degradation occurs below four megabytes when the caches are enabled on the target processor.

Table 1 The measurement conditions of memory access collision in the SIGMA system.

Condition                   Host OS cache   Host OS stress   Guest OS cache
Normal                      Enabled         No               Enabled
No cache                    No              No               Enabled
No cache with stress        No              Stressed         Enabled
Both no cache               No              No               No
Both no cache with stress   No              Stressed         No

Figure 5 LMbench lat_mem_rd: memory read latencies in the guest OS with memory collisions.

In the general IA-32 architecture, processors and memory are connected by the 32-bit system bus, and the bus is shared by all processors in the system. Therefore, if multiple processors try to access the memory, the memory accesses may conflict. In this architecture, memory reading/writing is performed in accordance with the memory ordering model of the processor, which is the order in which the processor issues reads and writes through the system bus to system memory. For example, regarding write ordering in a multiprocessor system, although each processor is guaranteed to perform writes in program order, writes from all processors are not guaranteed to occur in a particular order [15]. Therefore, if the system memory is under load, multiple processors may access the memory at the same time, and the actual waiting time of each processor can increase. As a result, those conflicts caused the delays shown in Fig. 5.

On the other hand, the reason why such delays didn't occur when loading four megabytes or less is related to the L2 cache size of the processor. If the loaded data is in the cache, the processor does not need to access the memory, and the access is not affected by the memory ordering. Therefore, if each processor has more cache, the possibility of a cache hit increases, which in turn minimizes the delay caused by memory access competition.

5.3 Engineering Cost

We compared the number of modified lines of each guest OS to measure the development cost quantitatively. One of the principal purposes of the SIGMA system is to keep the engineering cost of the guest OS as small as possible. The results are shown in Table 2.

Table 2 The engineering cost of the guest OSes.

System        Modified lines
SIGMA Linux   25
XenoLinux     1441
L4Linux       6500

Table 2 indicates that the modification cost of SIGMA Linux is much smaller than that of XenoLinux and L4Linux. A significant reason is that SIGMA Linux has almost the same structure as native Linux. That is, SIGMA Linux can execute the instructions that control the hardware directly, and such instructions need not be substituted with emulated code and hypervisor calls. The required modifications in SIGMA Linux are the memory related issues described in Section 4.3, the interrupt configurations, and the hardware checking parts.

6 Discussion

As mentioned in Section 1, a multi-OS environment is an effective approach for building systems that require a rich GUI and multimedia support in addition to traditional realtime processing capability requirements. Therefore, the need for multi-OS environments for embedded systems is increasing. In this section, we classify multi-OS environment architectures into the three models shown in Fig. 6. Then, we discuss their fulfillment of the six requirements we have laid out. The first three requirements are modification, performance and independent reboot, described in Section 3. The next two requirements, protection and realtime capability, are also important requirements for most embedded

Figure 6 Three types of multi OS environment (a–c).

systems. Finally, we discuss the number of guest OSes that should be able to run in parallel. The brief results are shown in Table 3, and the details are described in the rest of this section.

6.1 Multi OS Environment Models

Existing multi OS environments that are achieved by virtualization techniques are basically divided into two models. We added the model of the SIGMA system to these two models and classify the models of multi OS environments into the three varieties shown in Fig. 6.

1. Model1 This model realizes a multi OS environment by using a VMM. As shown in Fig. 6a, multiple OSes run on the VM interfaces provided by the VMM. In this model, all OSes run as guest OSes. Each OS is scheduled by the VMM, and inter OS communication and device sharing mechanisms are provided by the VMM. It also provides a memory protection mechanism between each OS and the VMM. Major examples are Xen and VMware Server.
2. Model2 This model provides a multi OS environment by executing the guest OS as a process in the host OS. The model is illustrated in Fig. 6b. There are two approaches to achieving a multi OS environment with this model. One is running a para-virtualized guest OS as a process in the host OS. L4Linux and Wombat are typical examples of this approach. There are also RT-Linux [18] and Linux on ITRON [19], which focus on realtime execution. Another approach is building a completely virtualized hardware environment on the host OS by using emulators such as QEMU [20]. In both approaches, the guest OS is scheduled as a process in the host OS. Inter OS communication and device sharing mechanisms are implemented with functions provided by the host OS. Memory protection between the guest OS and the host OS is provided by the memory protection mechanism between processes in the host OS.
3. Model3 (SIGMA system) This is the approach that we have proposed in this paper, illustrated in Fig. 6c. The SIGMA system assigns an individual CPU to each of the host OS and the guest OSes. Therefore all the guest OSes can use their CPU resources exclusively, and no virtualization layer exists under the guest OS. Thus, the processes of each OS are scheduled completely independently from the other OSes. The device sharing mechanisms are provided by a server in the host OS. However, in order to minimize the overhead and the code modification costs of the guest OS, the host OS and the guest OS run in privileged mode. Thus it is hard to provide any memory protection mechanism between the guest OSes and the host OS.

6.2 Modification and Performance

The first two requirements are a low modification cost and the performance of the guest OS. In Model1 and Model2, virtualization techniques are used to run a guest OS. However, full-virtualization and para-virtualization involve a trade-off between the modification cost of the guest OS and the performance of the guest OS. Thus, a compromise between the two has to be made. On the contrary, Model3 hardly requires any modifications to the code of the guest OS.

Table 3 Comparison of three types of multi OS environment.

Model    Modification  Performance  Reboot  Protection  Realtime  Guest OS number
Model1
Model2
Model3                                      ×                     ×

6.3 Independent Reboot

The third requirement is the capability to reboot the guest OSes independently. In Model1, guest OSes are booted and rebooted by functions provided by the VMM. Thus, all OSes can be rebooted independently. In Model2, the guest OSes can be rebooted like any process in the host OS. However, since the guest OSes strongly depend on functions of the host OS, it is impossible to reboot the host OS independently. In Model3 there is no strong dependency between the guest OS and the host OS. However, booting and rebooting the guest OSes is done by functions provided by the host OS. Thus, each OS can be rebooted only through the host OS. That is, all OSes can be rebooted independently as long as the host OS can execute the functions to reboot each OS.

6.4 Protection

The next requirement is a memory protection mechanism between the guest OSes and the host OS. Generally, in the Model1 approach, the VMM is executed in privileged mode and the guest OSes are executed in nonprivileged mode with different address spaces. Therefore, memory protection can be provided both between the guest OSes and between the guest OS and the host OS. In Model2, the memory of the guest OS is protected by the memory protection mechanism that the host OS provides to its processes. If the host OS does not provide any memory protection mechanism to its processes, it will be difficult to provide memory protection between the guest OSes. Furthermore, if a memory access violation occurs in the kernel of the host OS, it may be impossible to protect the guest OSes from that error. It is also impossible to provide any memory protection mechanism in the SIGMA system. Because all OSes in the SIGMA system run in privileged mode and are allowed to execute all privileged instructions, the guest OSes of the SIGMA system can change any memory mappings and invade the memory regions that should be managed by other OSes.

6.5 Realtime Capability

The next requirement is the realtime capability generally required of embedded systems. Existing VMMs are designed for server and desktop environments and are not meant to be used in embedded systems. Para-virtualization techniques try to minimize the virtualization overhead. However, it is difficult to guarantee certain realtime capabilities of a guest realtime OS with Model1. Similarly, with Model2 it is hard to guarantee certain realtime capabilities of a guest OS because of the virtualization. However, it is not difficult to guarantee the realtime capabilities of the host OS, since the host OS is running on physical hardware and there is thus no performance degradation. Therefore, if the host OS is a realtime OS, the system is able to run realtime tasks. All OSes in Model3 run on physical hardware and have almost the same performance as a native OS would have. That is, realtime OSes can be executed with no performance degradation in Model3.

6.6 Number of Guest OSes

The last requirement concerns the number of guest OSes that can run on the system simultaneously. Model1 and Model2 place no limitation on it. Since the SIGMA system assigns a whole CPU to one guest OS, the number of guest OSes in the SIGMA system is restricted by the number of CPUs. Model3 cannot run more guest OSes than there are CPUs.

7 Conclusion

We have proposed an asymmetric multi OS environment named the SIGMA system on SMP. Asymmetric means that there are two types of OSes running on separate cores: the L4 microkernel and its servers, including the OS manager, are used as the host OS, and a modified Linux named SIGMA Linux is used as the guest OS. The SIGMA system can avoid performance degradation of SIGMA Linux because SIGMA Linux has direct control of its hardware.
The evaluation clearly showed that the SIGMA system performs much faster than virtualization techniques such as Xen and that it achieves the performance of native Linux. The SIGMA system can be realized with minimal overhead because no emulation is required to handle privileged processing. We also measured the latency of SIGMA Linux memory loading. The results showed that SIGMA Linux was little affected even under a loaded condition. In addition, the engineering cost of implementing the SIGMA system is much smaller than that of other para-virtualization approaches.
Consequently, the experiments proved that the SIGMA system enables a multi OS environment with only a low performance overhead and only small modifications. We believe that the SIGMA system is applicable to embedded systems that require a rich GUI and multimedia support along with traditional realtime processing requirements.

References

1. Leslie, B., van Schaik, C., & Heiser, G. (2005). Wombat: A portable user-mode Linux for embedded systems. In Proceedings of the 6th Linux.conf.au, Canberra, April 2005.
2. Barham, P., Dragovic, B., Fraser, K., Hand, S., Harris, T., Ho, A., et al. (2003). Xen and the art of virtualization. In SOSP '03: Proceedings of the 19th ACM symposium on operating systems principles (pp. 164–177). Oct. 2003.
3. Goldberg, R. P. (1974). Survey of virtual machine research. IEEE Computer, 7(6), 34–45.
4. Popek, G. J., & Goldberg, R. P. (1974). Formal requirements for virtualizable third generation architectures. Communications of the ACM, 17(7), 412–421.
5. Rosenblum, M., & Garfinkel, T. (2005). Virtual machine monitors: Current technology and future trends. Computer, 38(5), 39–47.
6. Sugerman, J., Venkitachalam, G., & Lim, B. (2001). Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor. In USENIX annual technical conference (pp. 1–14).
7. Waldspurger, C. (2002). Memory resource management in VMware ESX server. ACM SIGOPS Operating Systems Review, 36(SI), 181–194.
8. Härtig, H., Hohmuth, M., Liedtke, J., Schönberg, S., & Wolter, J. (1997). The performance of microkernel-based systems. In SOSP '97: Proceedings of the 16th ACM symposium on operating systems principles (pp. 66–77). New York, NY, USA.
9. Kuz, I. (2005). L4 user manual: NICTA L4-embedded API. October 2005.
10. Heiser, G. (2005). Iguana user manual (p. 4).
11. LeVasseur, J., Uhlig, V., Chapman, M., Chubb, P., Leslie, B., & Heiser, G. (2005). Pre-virtualization: Slashing the cost of virtualization. Technical Report 2005-30, Fakultät für Informatik, Universität Karlsruhe (TH). Nov. 2005.
12. University of Karlsruhe, Germany, University of New South Wales, and National ICT Australia (2005). Afterburning and the accomplishment of virtualization. April 2005.
13. LeVasseur, J., Uhlig, V., Leslie, B., Chapman, M., & Heiser, G. (2005). Pre-virtualization: Uniting two worlds. 23–26 Oct. 2005.
14. Intel Corporation (1997). MultiProcessor specification version 1.4.
15. Intel Corporation (2006). IA-32 Intel architecture software developer's manual.
16. McVoy, L. W., & Staelin, C. (1996). lmbench: Portable tools for performance analysis. In USENIX annual technical conference (pp. 279–294). Berkeley, Jan. 1996.
17. Allen, E., Chase, D., Hallett, J., Luchangco, V., Maessen, J.-W., Ryu, S., et al. (2007). The Fortress language specification version 1.0 beta. Santa Clara: Sun Microsystems, Inc.
18. Yodaiken, V. (1999). The RTLinux manifesto. In Proc. of the 5th Linux Expo. Raleigh, NC, March 1999.
19. Takada, H., Iiyama, S., Kindaichi, T., & Hachiya, S. (2002). Linux on ITRON: A hybrid operating system architecture for embedded systems. In Proceedings of the 2002 symposium on applications and the internet (SAINT) workshops (p. 4).
20. Bellard, F. (2005). QEMU, a fast and portable dynamic translator. In Proceedings of the USENIX annual technical conference (pp. 41–46). April 2005.

Wataru Kanda received a B.S. degree in Engineering from Waseda University, Tokyo, Japan in 2007 and is currently working towards an M.S. degree at Waseda University. His research interests are embedded systems and virtualization.

Yu Murata received an M.S. degree in Engineering from Waseda University, Tokyo, Japan in 2007. His research interests are virtualization and embedded systems.

Tatsuo Nakajima is a professor in the Department of Computer Science and Engineering at Waseda University. His research interests are embedded systems, distributed systems, and ubiquitous computing. His research group is developing advanced software infrastructure for future many-core processors for embedded systems and future ubiquitous computing services.
