
Type of submission:

[X] Technical session(s)

[ ] Keynotes session(s)

Topic covered from the Call for Papers:

MMU-based software cache and swap mechanisms for smart card operating systems

Olivier Lobry
Gemplus Software Research Lab
Phone: +33 442 364 497, Fax: +33 442 365 555
Olivier.Lobry@research.gemplus.com

Abstract. Modern processor architectures have proven their benefits by providing mechanisms that significantly improve the performance of classical operating systems. Yet, it has not been shown that such processors are as relevant in the smart card context. To start answering this question, we analyze how the Memory Management Unit of the MIPS 4KSc architecture can be used to build efficient memory management techniques. Cache and swap mechanisms in particular are studied. Their principles are presented, and we show why and how they can be applied to the smart card context. We then propose and detail a smart card-oriented implementation. In conclusion, we show how this study demonstrates the benefits of introducing advanced features in smart card processors.

Introduction
Modern processor architectures have dramatically simplified the design of operating systems and provide efficient mechanisms on which to build advanced software abstractions. High-frequency CPUs, large low-latency memories, Memory Management Units, protection mechanisms, register windows, the enhanced pipelines of RISC architectures, the large address spaces of 64-bit architectures, and more complex and efficient communication devices are the features that make today's applications possible.
However, until recently, smart card-oriented processors (like the ATMEL6464, ST22 or Infineon 66) had very limited features. Memory was very scarce, especially RAM; most processors were 8-bit architectures without a pipeline; the communication interface was reduced to its simplest form; timers rarely existed; no memory management support was provided; and so on. The main explanation is that, up to now, smart card applications have been very limited: checking a PIN code, managing a small address book, keeping the last banking transactions, ciphering. To keep the system small and efficient, applications are generally integrated within a very simple operating system. Moreover, the application model is also very limited: applications are actually sets of commands that are executed sequentially on demand by the terminal. The smart card device is rarely addressed and real-time constraints are not a concern. For all these reasons, hardware requirements were not very strong and the features of a processor were mostly driven by its price.
To gain application portability, smart card actors turned to JavaCard operating systems. When changing hardware, only the operating system needs to be ported. Applications stay the same, which greatly improves the time to market. JavaCard also comes with an enhanced protection model, i.e., sandboxing, which greatly facilitates the design of multi-applicative smart cards. Hence, this new approach gives

Gemplus Developer Conference 2002

new perspectives to the utilization of smart cards. For this reason, and because the operating system is getting more complex, hardware requirements are growing steadily.
The Gemplus Research Lab goes one step further: it aims at proposing a new Java environment for smart card applications. Rather than trying to enhance today's JavaCard operating systems, it makes a real breakthrough by getting quite close to the Java specifications. Class files (instead of the very specific CAP format) could be dynamically loaded, different execution models could be defined, sockets could be used to communicate, etc. This new vision of course makes hardware requirements even more critical, which in turn makes advanced hardware support even more necessary.
In this paper, we discuss perhaps the most popular hardware support of modern processors: the Memory Management Unit (MMU). This feature has greatly simplified the management of memory and protection in modern operating systems. However, existing MMU-based software techniques generally induce a cost in terms of memory space that makes them not necessarily suited to the smart card context. The intention of the present work is to show that the MMU can nevertheless be of interest in this context, by proposing software mechanisms that take this memory space constraint into account.
The remainder of the paper is organized as follows. The next section introduces the issues that must be tackled in smart card operating systems (SCOS). Section 3 details the principles of managing a cache or a swap in this context, explains why these mechanisms are interesting, and discusses the implementation issues. Section 4 then presents the principles, benefits and issues of an MMU-based approach, before section 5 describes the SmartMips architecture and shows how its MMU can be used to implement a cache / swap mechanism that fits the memory space constraint. Finally, section 6 concludes and gives some open issues.

Memory management in advanced SCOS


The role of the memory manager of a smart card operating system (SCOS) is to give applications (or other parts of the system) access to memory regions in order to create and manipulate data. Data can either be persistent, that is, survive the termination of the application or of the smart card session, or be temporary data that do not need to be stored in non-volatile memory (NVM). The memory manager must also ensure that applications have the appropriate access rights when they access code or data, unless this is done by another system component. Transactional properties should also be ensured, particularly the atomicity of sets of updates in case of power failure.
As in usual operating systems, the management of memory is a critical part of a SCOS. The main difference is that in the smart card context, memory space is very limited. Even though this limitation is reduced in next-generation smart card processors, memory consumption is still a concern. For example, the HiperSmart processor (which implements the SmartMips architecture) comes with "only" 7 KB of RAM, 64 KB of NVM and 256 KB of ROM. There is no doubt that the memory managers of standard operating systems cannot be implemented as is under such memory constraints. Thus, not only must memory management techniques take this limitation into account, but the constraint must also be considered in their implementation.


In addition to these memory-related non-functional properties, the system must ensure other non-functional global properties. For example, it must consume as little power as possible, react quickly to external events, etc. While these properties are beyond the scope of memory management, they must be taken into account when implementing a memory manager.
Note that the memory manager of a SCOS has to deal with memory devices, like Flash and EEPROM memories, that have some specific characteristics. First, they are non-volatile memories and act as the persistent storage of the card. Second, they are directly addressable by the processor, which is not the case for a disk. However, they cannot be updated just like RAM: a long-lasting programming process must be executed (with a latency comparable to that of a magnetic disk) that makes the whole device busy. Finally, these memory devices are also subject to stress: the more a memory location is updated, the less it is able to safely retain its information.

Cache and swap management principles


Caching and swapping are two techniques that, even though they do not address the same problems, share common issues and implementation perspectives. Caching aims at bringing data closer to the processor in order to accelerate execution. In the smart card context, caching can be used for essentially three reasons. The first one is to accelerate reads by bringing data from memories accessed through the external bus into memories accessible on the internal bus, which can be either hardware caches or fast RAM like the ScratchPad RAM (SPRAM) present in the HiperSmart (see next section). However, bringing data from non-volatile memory into volatile memory is not as advantageous as in usual operating systems if both are accessed through the external bus, since non-volatile memories are addressable and their latency for read operations is similar to that of an external RAM.
The second reason is to accelerate updates to non-volatile memory. Instead of launching a programming process for each update, modifications are cached in volatile memory and grouped when they must be propagated to NVM. The benefit is a reduced number of programming operations, which both accelerates execution and minimizes NVM stress. Moreover, this remains true even when using external RAM instead of hardware caches or SPRAM.
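This grouping idea can be sketched as follows. The sketch is purely illustrative, assuming X: the page size, the cache layout and the `nvm_program` stub are inventions for the example, not the HiperSmart API. It only shows how marking pages dirty and flushing them in one pass makes each dirty page cost a single programming cycle, however many writes it received.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sizes, not taken from a real device. */
#define PAGE_SIZE   1024
#define CACHE_PAGES 4

uint8_t nvm[CACHE_PAGES][PAGE_SIZE];    /* stand-in for the NVM device */
uint8_t cache[CACHE_PAGES][PAGE_SIZE];  /* RAM copies of cached pages  */
int     dirty[CACHE_PAGES];
int     nvm_programs;                   /* counts programming cycles   */

/* Simulate one (slow) NVM programming operation for a whole page. */
static void nvm_program(int page) {
    memcpy(nvm[page], cache[page], PAGE_SIZE);
    nvm_programs++;
}

/* Writes go to the RAM copy; the page is only marked dirty. */
void cache_write(int page, int offset, uint8_t value) {
    cache[page][offset] = value;
    dirty[page] = 1;
}

/* Propagate all pending updates in one pass: one programming
 * cycle per dirty page, regardless of the number of writes. */
void cache_flush(void) {
    for (int p = 0; p < CACHE_PAGES; p++)
        if (dirty[p]) { nvm_program(p); dirty[p] = 0; }
}
```

Three writes to page 0 and one to page 2 thus cost only two programming cycles at flush time, which is the stress and speed benefit described above.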
The third reason is to minimize the possibility of being blocked by an NVM programming operation. Remember that while an NVM is being programmed, it cannot be accessed for writing, nor even for reading. By bringing the most accessed data into a non-blocking memory like RAM (internal or external) or hardware caches, one can minimize the probability of being blocked by an NVM programming operation.
Swapping, for its part, responds to the problem of limited volatile memory. Without such a mechanism, when volatile memory is running low and the demand still grows, the only solution is first to suspend some applications and, if this is not sufficient, to kill some so as to get memory back. In the smart card context, where RAM is generally very scarce, this can dramatically limit the viability of a complex operating system that is conceptually able to run multiple applications handling multiple communication sessions in parallel. The principle of swapping is simple: put some volatile data temporarily in a less limited memory space like NVM and bring them back into RAM when necessary. This technique is even more interesting when swapping to NVM (instead of to a magnetic disk), since swapped data, being directly addressable, only need to be brought back to RAM when they are updated.


Note that caching and swapping are mechanisms, not abstractions. By this we mean that an application can cache persistent data by itself, or swap volatile data into a file or a pre-allocated non-volatile memory region; but then, all the complexity must be handled by the application itself. The abstraction behind caching is a fast non-volatile memory (a bit like the coming FeRAM or MRAM memories); the one behind swapping is the illusion of a large RAM. In both cases the applications do not care about how these abstractions are implemented. They execute as if they actually had a fast NVM and a large memory. Even if this is not completely true, the way they are structured and organized is not impacted, and all the complexity is handled by the operating system.
Caching and swapping, even though they do not serve the same objectives, have the following issues in common:
- Mapping: since the same (logical) data can reside in different physical memory locations, both must be able to manage a dynamic mapping between the logical identifiers of data and the physical identifiers of memory areas;
- Access detection: since the physical location may change depending on how data are accessed, both need to know when and how data are (or are going to be) accessed;
- Copying: in both cases, data must be transferred between volatile and non-volatile memories;
- Replacement: since the cache is smaller than the persistent area, and the available RAM smaller than the emulated extended volatile memory, both techniques need to choose an in-RAM victim when bringing data in from NVM while RAM is saturated;
- Grouping: when the unit of transfer is bigger than the unit of allocation, data must be grouped according to their access properties so as to minimize the number of NVM programming processes.
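As an illustration, the concerns above could all be gathered in a single per-page descriptor. The structure, its field names and the simple second-chance victim choice below are a hypothetical sketch, not taken from an actual SCOS:

```c
#include <stdint.h>

/* One descriptor per mapped page, gathering the five shared concerns.
 * All names are assumptions made for this example. */
typedef struct {
    uint32_t logical_page;   /* Mapping: logical identifier of the data   */
    uint32_t physical_page;  /* Mapping: current physical location        */
    uint8_t  writable;       /* Access detection: trap the first write    */
    uint8_t  dirty;          /* Copying: page must be written back        */
    uint8_t  referenced;     /* Replacement: input to the victim choice   */
    uint8_t  group_id;       /* Grouping: pages flushed together          */
} page_desc;

/* A simple second-chance victim choice over an array of descriptors:
 * prefer an unreferenced clean page, clearing reference bits as we scan
 * so that every page becomes a candidate on the second pass. */
int choose_victim(page_desc *t, int n) {
    for (int pass = 0; pass < 2; pass++)
        for (int i = 0; i < n; i++) {
            if (!t[i].referenced && !t[i].dirty) return i;
            t[i].referenced = 0;   /* give the page a second chance */
        }
    return 0;                      /* all dirty: fall back to entry 0 */
}
```

Evicting a clean page first avoids an NVM programming operation on the eviction path, which matters here more than in disk-based systems.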
They may however differ on several points:
- Mapping persistency: in the case of a cache, the mapping between logical identifiers and physical memory locations in NVM must stay the same between two card sessions and be fault-tolerant, while the mapping between volatile data and their representation in NVM need not be persistent, and even less fault-tolerant;
- Number of mappings: since there are a priori more persistent than volatile data, the number of mappings to maintain should be larger for caching than for swapping;
- Replacement and grouping strategies: the way data are accessed may differ depending on whether they are persistent or not, and non-functional properties such as reactivity or anti-stress may guide these strategies differently;
- Redundancy and coherency: whereas in a cache a logical datum may have several copies whose coherency must be addressed, the swap manages only one copy of each datum.
Yet, due to the issues they have in common, most techniques used for caching may also be used for swapping. Actually, one could see the merging of both mechanisms as the implementation of the abstraction of a large, fast memory, which could be used equally for manipulating short- or long-lived data. Consequently, most of the effort and conclusions applying to one mechanism may be applied to the other. In the next section we show how their common issues, notably mapping and access detection, can be addressed using an MMU while taking into consideration the memory constraints of the smart card context.


MMU-based cache and swap mechanisms for SCOS


The first feeling one may have upon hearing of someone trying to use an MMU in a smart card context is that he or she will never meet the memory space challenge. This feeling is not totally unfounded, since most implementations that use an MMU in big systems generally need more memory than is available on a card. But first, one should notice that those implementations have generally been designed with speed in mind, not memory consumption. This means that there may be equivalent implementations, not as fast, but using less memory. Another point is that the memory specificities of smart cards may have a significant impact on why and how an MMU can be used.
We show here how the MMU of the 4KSc [1] can be used in the design of an advanced SCOS. The 4KSc designates the architecture of a core jointly designed by MIPS and Gemplus and based on the already existing 4Kc embedded-oriented core. It implements the SmartMips Instruction Set Architecture, specially designed for the smart card context. It is a 32-bit RISC architecture with a five-stage pipeline, two instruction sets (one 32-bit and one 16-bit for code compactness), several cryptography-oriented instructions, security blocks and, most interesting for our discussion, an enhanced MMU supporting 1 KB pages and an interface for a ScratchPad RAM (SPRAM). This architecture has been chosen for the Philips HiperSmart smart card processor [2], which also implements an ISO 7816 UART, 256 KB of ROM, 64 KB of NVM and a 7 KB SPRAM. Note that the SPRAM combines the advantages of both a RAM, since it is addressable, and a cache, since it provides fast accesses.
The MMU consists of a Translation Lookaside Buffer (TLB) with 16 dual entries, a 2 KB instruction cache and a 1 KB data cache. The processor manipulates virtual addresses, which are transparently translated into physical addresses by the MMU. Then, according to its physical address, an accessed datum is loaded from either the caches, the SPRAM or external non-volatile memories (through the external bus). In parallel with the translation, access rights are compared with the type of access (load, store or fetch). If the MMU cannot resolve the translation, or if the access rights are insufficient, an exception is raised and control goes directly to a software handler whose job is to update the MMU accordingly. Control then goes back to the faulting instruction, so that the interrupted code continues just as if no exception had occurred.
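The translate-or-fault dispatch just described can be modeled in a few lines of C. This is a simplified software model, not the real 4KSc TLB layout: the entry format, the entry count and the fault names are assumptions chosen to mirror the two exception cases discussed in this paper.

```c
#include <stdint.h>

#define TLB_ENTRIES 16
#define PAGE_SHIFT  10   /* 1 KB pages, as on the 4KSc */

enum fault  { OK, TLB_REFILL, TLB_MODIFY };
enum access { LOAD, STORE, FETCH };

typedef struct { uint32_t vpn, pfn; int valid, writable; } tlb_entry;

tlb_entry tlb[TLB_ENTRIES];

/* Translate one virtual address.  On failure the caller (the software
 * handler) would update the TLB and retry the faulting instruction. */
enum fault translate(uint32_t vaddr, enum access a, uint32_t *paddr) {
    uint32_t vpn = vaddr >> PAGE_SHIFT;
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (!tlb[i].valid || tlb[i].vpn != vpn) continue;
        if (a == STORE && !tlb[i].writable)
            return TLB_MODIFY;           /* write to a protected page */
        *paddr = (tlb[i].pfn << PAGE_SHIFT)
               | (vaddr & ((1u << PAGE_SHIFT) - 1));
        return OK;
    }
    return TLB_REFILL;                   /* no mapping for this page */
}
```

The two fault outcomes correspond exactly to the two hooks the cache and swap mechanisms rely on: TLB_REFILL detects the first access to a page, TLB_MODIFY detects the first write.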
Thus, the MMU can be used as hardware support for caching and swapping since (1) it is able to translate logical identifiers (virtual addresses) into physical addresses and (2) it is able to differentiate read accesses from write accesses. The benefit is that those two actions are performed transparently and, in most cases, at no cost, which would not be the case with a software implementation. Hence, the abstractions of a fast non-volatile memory and of a large volatile memory can be implemented easily (and efficiently), since programs need not be aware of the caching and swapping mechanisms. As stated before, the granularity of mapping and access protection on the 4KSc is 1 KB (instead of 4 KB on typical MMU-based processors), which means that the granularity of transfer for a cache or a swap is also 1 KB. Even though this might still be large compared with the size of the memories, it is small enough to envision the implementation of MMU-based cache and swap mechanisms.
A volatile mapping can be maintained in the TLB by the operating system. Generally, the TLB is used as a cache of mapping entries maintained in a RAM-resident page table. However, as we will now explain, the TLB will here be used as the page table of the system. Actually, while the TLB is generally seen as a cache of a page table, one can reverse the problem and see the page table as a swap of the TLB, this swap being needed only if the TLB is not big enough to hold all the memory mappings. Since memories, and consequently the number of required mappings, are relatively small in


the context of a smart card, one can legitimately ask whether this swap needs to be big, or even to exist.
The way we use the MMU for caching and swapping is illustrated by Figure 1. The principle is relatively simple. Programs are provided the two abstractions mentioned above through two different regions of the virtual address space. Pages that are not accessed are not mapped. Once virtual pages are accessed (detected by the TLB through a TLB_refill exception), and as long as they are only read, they are mapped to their corresponding persistent page in the case of the cache, or to the corresponding swapped page in the case of the swap, and write-protected. When modified (detected by the TLB through a TLB_modify exception), the page is brought into RAM and remapped read-write to the RAM copy.
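The write path just described might look as follows in a simplified C model. The page size, the mapping structure and the trivial frame allocator are illustrative assumptions; the point is only the sequence copy-remap-unprotect performed by the TLB_modify handler.

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE  1024
#define RAM_FRAMES 4

uint8_t ram[RAM_FRAMES][PAGE_SIZE];
int     ram_used[RAM_FRAMES];

/* One illustrative mapping: a read-only view of an NVM page that may be
 * replaced by a writable RAM copy on the first write. */
typedef struct {
    const uint8_t *backing;   /* NVM page backing the mapping       */
    uint8_t       *frame;     /* RAM copy once the page is modified */
    int            writable;
} mapping;

/* TLB_modify handler sketch: returns 0 on success, -1 if no RAM frame
 * is free (a victim would then be chosen by the replacement policy,
 * omitted here). */
int on_tlb_modify(mapping *m) {
    for (int f = 0; f < RAM_FRAMES; f++) {
        if (ram_used[f]) continue;
        ram_used[f] = 1;
        memcpy(ram[f], m->backing, PAGE_SIZE); /* bring page into RAM  */
        m->frame = ram[f];                     /* remap to the RAM copy */
        m->writable = 1;                       /* grant write access    */
        return 0;
    }
    return -1;
}
```

After the handler runs, the faulting store is retried and all further writes touch only the RAM copy, deferring any NVM programming to a later grouped flush.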
Several remarks can be made about this implementation. The first one is that, as shown, the MMU can be shared to implement a cache and a swap running in parallel. The second remark is that while virtual pages are accessed only for reading, they can be mapped to NVM pages with access rights set to read-only. Remember that in the case of a cache, the mapping to persistent data must be made persistent and fault-tolerant. Fortunately, this can be done very easily and efficiently by managing fixed, calculated mappings.
As shown in the figure, the physical address of a persistent page, while it is accessed only for reading, is equal to the virtual address of the page (obtained from the virtual address of the accessed data, i.e. its identifier) plus a constant offset, and this rule never changes. This way, the mappings of these pages need not be stored in volatile or non-volatile memory. Thus, the problems of persistency (durability in terms of transactional properties) and atomicity concern only the data, not their mapping.
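Such a calculated mapping reduces to a pure function of the virtual address, as the sketch below shows. Both base addresses are arbitrary values chosen for the example, not the actual HiperSmart memory map.

```c
#include <stdint.h>

/* Illustrative bases: the start of the cached-persistent virtual region
 * and the physical base of the NVM.  The difference between them is the
 * constant offset of the calculated mapping. */
#define CACHE_REGION_BASE 0x40000000u
#define NVM_BASE          0x1FC00000u

/* No table lookup, nothing to store, nothing to make fault-tolerant:
 * the physical address is derived from the virtual address alone. */
uint32_t calculated_phys(uint32_t vaddr) {
    return vaddr - CACHE_REGION_BASE + NVM_BASE;
}
```

On a refill fault for a read-only persistent page, the handler can therefore rebuild the TLB entry from scratch, which is why losing such an entry (or the whole TLB contents at power-off) costs nothing.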
[Figure 1 shows the logical space divided into a cached persistent memory region and an extended volatile memory region, mapped through virtual memory onto RAM and non-volatile memory (persistent data and swap). Pages are marked RO (read-only) or RW (read & write), and the mappings are kept in the TLB.]

Figure 1. MMU-based cache and swap mechanisms on the 4KSc


Concerning RAM-resident pages, their mappings can be kept entirely in the TLB, since the size of the RAM (7 KB, here) is smaller than what the TLB can map (32 KB with 1 KB pages). Keeping these mappings only in the TLB (by locking them) has an impact, however, since it reduces the number of entries usable for calculated mappings, which can lead to excessive TLB refills. It should be noted, though, that bigger pages (4 KB or 8 KB) could be envisioned for such mappings.
In the case of swapped pages, it is better to manage a dynamic mapping. A calculated one could also be used, as for the cache, but this would mean that, depending on the NVM available for swapping, some pages could no longer be swapped. Besides, keeping the mappings only in the TLB has the same impact as for RAM-resident pages, for the same reasons. Note however that the system can manage a small volatile page table if necessary (actually rather a software TLB, as in the L4 micro-kernel [3]). Since the number of pages to map is relatively low, this approach will not necessarily have a big impact on memory consumption.
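A software TLB of the kind mentioned above can be as small as a direct-mapped table searched by the refill handler before it gives up. The size, the slot layout and the trivial hash below are assumptions made for the sketch, not the L4 design:

```c
#include <stdint.h>

#define STLB_SIZE 32   /* entries; a power of two keeps the hash cheap */

typedef struct { uint32_t vpn; uint32_t entry_lo; int valid; } stlb_slot;

stlb_slot stlb[STLB_SIZE];

/* Remember a mapping evicted from (or too numerous for) the hardware TLB. */
void stlb_insert(uint32_t vpn, uint32_t entry_lo) {
    stlb_slot *s = &stlb[vpn & (STLB_SIZE - 1)];   /* direct-mapped slot */
    s->vpn = vpn;
    s->entry_lo = entry_lo;
    s->valid = 1;
}

/* Returns 1 and fills *entry_lo on a hit; the refill handler would then
 * write the entry into the hardware TLB and return from the exception. */
int stlb_lookup(uint32_t vpn, uint32_t *entry_lo) {
    stlb_slot *s = &stlb[vpn & (STLB_SIZE - 1)];
    if (s->valid && s->vpn == vpn) { *entry_lo = s->entry_lo; return 1; }
    return 0;
}
```

With 32 slots of a few words each, such a table costs on the order of a few hundred bytes of RAM, consistent with the claim that its memory impact is modest.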


Of course, while the MMU simplifies the work of implementing a cache or a swap and allows our abstractions to be provided at the assembly level, this does not mean that all the cost has been removed. Mappings still have to be established in the TLB, transfers of data between volatile and non-volatile memories still need to be performed, durability and atomicity of updates still need to be ensured, replacement and grouping strategies still need to be implemented, etc. However, this cost is essentially in terms of processing time, not memory consumption (except for atomicity, but this is orthogonal to whether an MMU is used or not).

Conclusion
With the emergence of Java technology in the smart card context, the requirements placed on Smart Card Operating Systems (SCOS) are growing, and we believe that this trend will continue. Fortunately, processor designers and manufacturers are on their way to providing more and more advanced processors. The new features present in processors like the HiperSmart may let SCOS designers hope to benefit from techniques and research developed for more classical operating systems.
In this presentation, we have pointed out that while the constraints specific to smart cards still hold, fears about the relevance of advanced hardware features are not necessarily founded. We focused on the use of the MMU to implement software caching and swapping, and proposed an implementation that suits the memory space constraint. One could say that caching and swapping could be implemented entirely in software. This is true, but not with the same efficiency, compactness and flexibility. One could also object that the MMU should be used for protection, which may conflict with the way we propose to use it; but this does not necessarily hold in the context of Java, where protection is ensured in another way.
It would however be interesting to see how different uses of the MMU could coexist, and how hardware such as the MMU could be enhanced to better fit the smart card context. For example, it would be interesting to see how reducing the page size could help in implementing mechanisms such as locking, logging, patching, stack overflow detection, and so on.
In any case, the work done here makes us believe that advanced hardware features in general give new opportunities to SCOS designers, and further to application designers and consequently to end consumers.
References
[1] MIPS32 4KSc Processor Core Family Software User's Manual, June 2001.
[2] Philips HiperSmart specifications and documentation, available at http://www.semiconductors.philips.com/markets/identification/products/hipersmart/
[3] Liedtke, J., Elphinstone, K.: Guarded Page Tables on Mips R4600, or, An Exercise in Architecture-Dependent Micro Optimization. ACM Operating Systems Review, 30(1):4-15, January 1996.
