
Real-time Programming in RTCore

FSMLabs, Inc. Copyright Finite State Machine Labs Inc. 2001-2003

All rights reserved.

13th January 2004


Contents

1 Introduction
  1.1 Some background
  1.2 How the book works

I RTCore Basics

2 Introductory Examples
  2.1 Introduction
  2.2 Using RTCore
    2.2.1 Hello world
    2.2.2 Multithreading
    2.2.3 Basic communication
    2.2.4 Signalling and multithreading
  2.3 Perspective

3 Real-time Concepts and RTCore
  3.1 RTOS kingdom/phylum/order
    3.1.1 Non-real-time systems
    3.1.2 Soft real-time
    3.1.3 Hard real-time
  3.2 The RTOS design dilemma
    3.2.1 Expand an RTOS
    3.2.2 Make a general purpose OS real-time capable
    3.2.3 The RTCore approach to the problem
  3.3 Interrupt emulation
    3.3.1 Flow of control on interrupt
    3.3.2 Limits of interrupt emulation
  3.4 Services Available to Real-Time Code
    3.4.1 Memory management
    3.4.2 Networking - Ethernet and FireWire
    3.4.3 Integration with other services
    3.4.4 What's next

4 The RTCore API
  4.1 POSIX compliance
    4.1.1 The POSIX PSE 51 standard
    4.1.2 Roadmap to future API development
  4.2 POSIX threading functions
    4.2.1 Thread creation
    4.2.2 Thread joining
    4.2.3 Thread destruction
    4.2.4 Thread management
    4.2.5 Thread attribute functions
  4.3 Synchronization
    4.3.1 POSIX spinlocks
    4.3.2 Comments on SMP safe/unsafe functions
    4.3.3 Asynchronously unsafe functions
    4.3.4 Cancel handlers
  4.4 Mutexes
    4.4.1 Locking and unlocking mutexes
    4.4.2 Mutex creation and destruction
    4.4.3 Mutex attributes
  4.5 Conditional variables
    4.5.1 Creation and destruction
    4.5.2 Condition waiting and signalling
    4.5.3 Condition variable attribute calls
  4.6 Semaphores
    4.6.1 Creation and destruction
    4.6.2 Semaphore usage calls
    4.6.3 Semaphores and Priority
  4.7 Clock management
  4.8 Extensions to POSIX (*_np())
    4.8.1 Advance timer
    4.8.2 CPU affinity calls
    4.8.3 Enabling FPU access
    4.8.4 CPU reservation
    4.8.5 Concept of the extensions
  4.9 "Pure POSIX" - writing code without the extensions
  4.10 The RTCore API and communication models

5 More concepts
  5.1 Copying synchronization objects
  5.2 API Namespace
  5.3 Resource cleanup
  5.4 Deadlocks
  5.5 Synchronization-induced priority inversion
  5.6 Memory management
  5.7 Synchronization
    5.7.1 Methods and safety
    5.7.2 One-way queues
    5.7.3 Atomic operations

6 Communication between RTCore and the GPOS
  6.1 printf()
  6.2 rtl_printf()
  6.3 Real-time FIFOs
    6.3.1 Using FIFOs from within RTCore
    6.3.2 Using FIFOs from the GPOS
    6.3.3 A simple example
    6.3.4 FIFO allocation
    6.3.5 Limitations
  6.4 Shared memory
    6.4.1 mmap()
    6.4.2 An Example
    6.4.3 Limitations
  6.5 Soft interrupts
    6.5.1 The API
    6.5.2 An Example

7 Debugging in RTCore
  7.1 Enabling the debugger
  7.2 An example
  7.3 Notes
    7.3.1 Overhead
    7.3.2 Remote debugging
    7.3.3 Safely stopping faulted applications
    7.3.4 GDB notes

8 Tracing in RTCore
  8.1 Introduction
  8.2 Basic Usage of the Tracer
  8.3 POSIX events

9 IRQ Control
  9.1 Interrupt handler control
    9.1.1 Requesting an IRQ
    9.1.2 Releasing an IRQ
    9.1.3 Pending an IRQ
    9.1.4 A basic example
    9.1.5 Specifics when on NetBSD
  9.2 IRQ state control
    9.2.1 Disabling and enabling all interrupts
  9.3 Spinlocks

10 Writing Device Drivers
  10.1 Real-time FIFOs
  10.2 POSIX files
    10.2.1 Error values
    10.2.2 File operations
  10.3 Reference counting
    10.3.1 Reference counting and userspace

II RTLinux Professional Components

11 Real-time Networking
  11.1 Introduction and basic concepts

12 PSDD
  12.1 Introduction
  12.2 Hello, world with PSDD
  12.3 Building and running PSDD programs
  12.4 Programming with PSDD
  12.5 Standard Initialization and Cleanup
  12.6 Input and Output
  12.7 Example: User-space PC speaker driver
  12.8 Safety Considerations
  12.9 Debugging PSDD Applications
  12.10 PSDD API
  12.11 Frame Scheduler
    12.11.1 Introduction
    12.11.2 Command-line interface to the scheduler
    12.11.3 Building Frame Scheduler Programs
    12.11.4 Running Frame Scheduler Programs
  12.12 Conclusion

13 The Controls Kit (CKit)
  13.1 Introduction
  13.2 CKit by Example
    13.2.1 Overview
    13.2.2 Coding
    13.2.3 Running and Stopping the Program
  13.3 Conclusion

III Appendices

A List of abbreviations

B Terminology

C Familiarizing with RTLinuxPro
  C.1 Layout
    C.1.1 Self and cross-hosted development
  C.2 Loading and unloading RTCore
    C.2.1 Running the examples
  C.3 Using the root filesystem
  C.4 Summary

D Important system commands

E Things to Consider
  E.0.1 System Management Interrupts (SMIs)
  E.0.2 Drivers that have hard coded cli/sti
  E.0.3 Power management (APM)
  E.0.4 Hardware platforms
  E.0.5 Floppy drives
  E.0.6 ISA devices
  E.0.7 DAQ cards

F System Testing
  F.1 Running the regression test
    F.1.1 Stress testing
  F.2 Jitter measurement

G Sample programs
  G.1 Hello world
  G.2 Multithreading
  G.3 FIFOs
    G.3.1 Real-time component
    G.3.2 Userspace component
  G.4 Semaphores
  G.5 Shared Memory
    G.5.1 Real-time component
    G.5.2 Userspace application
  G.6 Cancel Handlers
  G.7 Thread API
  G.8 One Way queues
  G.9 Soft IRQs
  G.10 PSDD sound speaker driver

H The RTLinux Whitepaper

Chapter 1

Introduction


Real-time software is needed to run multimedia systems, telescopes, machine tools, robots, communication devices, and many other kinds of systems. The RTCore hard real-time operating system has been used to control the mechanical animals in the movie Dr. Doolittle, perform jet engine testing for the Joint Strike Fighter, aim the telescope at Kitt Peak, run flight simulators, collect weather data for NASA, balance magnetic bearings, milk cows, control Fujitsu's humanoid robot, and more. The system is flexible enough that for one customer it can control an engine, while for another it just as easily mimics the human hand playing a violin.
RTCore is designed to make real-time programming more convenient and less mysterious. Real-time programming is still pretty challenging, but once you start to understand the basic ideas, if you have some C and UNIX background, programming RTCore applications should be, if not simple, at least feasible. In this book we will cover the basic principles of the OS and general real-time programming, stressing examples and practical methods as much as possible.
RTCore follows the UNIX philosophy of making it convenient to build complex applications by connecting existing pieces of software. One way to think of RTCore is as a small operating system that runs a second operating system as its lowest priority task. All the non-time-critical applications can be put in the second operating system. For most programmers it's probably more useful to think of RTCore as a special real-time process that runs within a non-real-time operating system. It schedules itself and can always pre-empt both the operating system and any applications. The non-real-time operating system is usually Linux (RTLinux), although it can also be BSD UNIX (RTCore/BSD) or even a Java VM.[1]

The real-time system is multi-threaded using the POSIX threads API.
Real-time applications are written as threads and signal handlers that can
be installed in the real-time process. These threads and signal handlers can
be scheduled with great precision and can respond to interrupts with very
low latency.
On a standard 1.2GHz Athlon, an RTCore interrupt handler runs within 9 microseconds of the assertion of a hardware interrupt, under heavy load. So if we have a device that generates an interrupt when the temperature gets too high, in the worst case the signal handler connected to that device will start running 9 microseconds after the interrupt is generated. On the same system, a periodic thread scheduled to run every millisecond will run at most 13 microseconds late after the sum of interrupt latency, scheduling overhead, and context switch. So if we have a data acquisition device, we can poll it at a regular rate and know that the polling thread starts up within 13 microseconds of the scheduled time.[2] RTCore is a hard real-time system, so these are absolute worst-case times, not average or "typical" times. Be wary of tests that demonstrate other approaches - many are done against quiescent systems, for short periods of time, or quote numbers that have no relevance to real situations, and are in no way indicative of real-world results. We will be discussing what numbers to look for in later examples.
Of course, all this speed does no good if programming the system is too
complicated. So we have designed RTCore to meet two goals. First, the time
critical software can be written in the familiar and well documented POSIX
threads/signals API. And second, it’s pretty easy to put non-time-critical
software into the application operating system. Our favorite example of how
to write a data logging program makes use of a single line of shell script on
the UNIX side:

./rtcore_app > mylogfile

This runs the real-time application and logs output to a non-real-time Linux
file. For those who have used UNIX at all, this should look very familiar.
[1] Generally, we use the term "GPOS", or General Purpose Operating System, to generically refer to the non-real-time system. The RTCore API and behavior remain the same regardless of which GPOS is being used.

[2] In later chapters, we will see how to reduce this down to 0 microseconds, bypassing hardware jitter.

This book starts with some background and simple examples and then
takes a detour for an in-depth introduction to the basic concepts of RTCore
and an overview of the API. Next, the available communication models for
exchanging data between real-time threads and the non-real-time domain are
presented. The sample programs then use these mechanisms to show how
to apply them to simple problems. These chapters are devoted to stepping
through these programs, making every step as clear as possible, and require
little prior knowledge. Following that, several chapters are devoted to the
more advanced features of RTLinux Professional, or RTLinuxPro. After
having covered the basic concepts in a few sample programs, we then provide
a basic model for writing real-time drivers.

1.1 Some background

RTLinux began as a research project in 1995, to investigate a simple method of providing hard real-time services within the context of a general purpose operating system. Soon after, it began to be used in a variety of domains. FSMLabs was formed to provide a dedicated effort to improving the technology, and to provide top tier support for commercial users of the product.

RTLinuxPro was developed out of this effort, and is licensed for commercial use. FSMLabs continues to move the technology forward via RTLinuxPro and the RTCore OS, having dedicated many man-years to providing a solid and integrated hard real-time component for commercial customers. The RTLinuxFree project, based on the GPL-released code, is community supported and developed. FSMLabs continues to provide the necessary resources to support the RTLinuxFree community.

1.2 How the book works

The main body of each chapter discusses the principles of how the software
examples work. In each chapter, side notes describe how to implement the
examples or test behavior in RTLinuxPro. There is an appendix with a basic
usage guide for RTLinuxPro. As mentioned, the RTCore OS can run different
non-real-time operating systems, but Linux and BSD UNIX will generally be
referred to by default. Ports to other operating systems are in development.
The target audience for this book is the engineer who is interested in learning how to write real-time applications using the RTCore OS. The book focuses on getting the user up to speed on each facet quickly so they can become productive, rather than leaving them to intuit facts from scattered sources.
Experience in developing real-time applications is helpful but not necessary,
as RTCore uses the standard POSIX API. Users with some knowledge of
POSIX and UNIX should feel right at home.
The full sources of the programs referenced here can be found in Appendix
G and are provided with the RTLinuxPro development kit.
Part I

RTCore Basics

Chapter 2

Introductory Examples

2.1 Introduction
The RTCore OS is a small, hard real-time operating system that can run Linux or BSD UNIX as an application server. This allows a standard operating system to be used as a component of a real-time application. In this part, we will provide an overview of RTCore capabilities, introducing basic concepts, the API, and some of the add-on components. This book starts assuming you have already installed RTLinuxPro, RTCore/BSD or RTLinuxFree - refer to the installation instructions that came with your package for details. This chapter will assume an RTLinuxPro environment, but the procedures apply equally to a BSD host.

2.2 Using RTCore

RTCore extends the UNIX “design with components” philosophy to real-time. A typical RTCore application consists of one or more real-time components that run under the direct control of the real-time kernel, and a set of non-real-time components that run as user-space programs. Let's start off with a couple of simple programs.
At this point, we assume some very basic familiarity with RTCore concepts as discussed so far. If you would like a little more information to get up to speed before continuing, please refer to the whitepaper in Appendix H. Also provided is a basic guide to RTLinuxPro (Appendix C) to help you learn your way around the system. Also, if you are working through these examples and need more grounding, skip ahead to Chapter 3 for a little more background.

For this example, you will need to have the core RTCore OS loaded as described in the appropriate appendix, and we assume that your current working directory as the root user is the rtlinuxpro directory of RTLinuxPro (or the appropriate installation point for your RTLinuxFree installation). If you don't see the referenced files in the directory, type make to ensure that everything is up to date. Now, on with the code:

2.2.1 Hello world

As with any other system, it makes sense to start things off with a simple "hello world" application; RTCore is no exception. The real-time merits of such an application are dubious, but it does serve to show how simple the API is. Without further ado, here is the standard introductory program:

#include <stdio.h>

int main(void)
{
        printf("Hello from the RTL base system\n");
        return 0;
}

Surprised? This is all that is involved - nothing more than what you would see in a normal C introduction. Running the example (./hello.rtl) causes the RTCore OS to load the application and enter the main() context. Here it prints a message out through standard I/O for the user to see. Those familiar with older RTLinux versions are used to these messages silently appearing in the kernel's ring buffer, but now they print through stdout just like any other application. Also, this is a standard printf(), rather than the rtl_printf() some users have seen. This printf() is fully capable, and can handle any format that a normal printf() can handle.

Once the message has been printed, the program exits, RTCore unloads the application, and we're done. Now, let's move on to something a little bit more useful.

2.2.2 Multithreading
If you’re familiar with POSIX threading, you’ll feel at home with RTCore.
If you’re not familiar with it, there are many solid references on the subject,
such as the O’Reilly book on Pthreads Programming. Let’s start with a
basic example of the pthread model here, with a task that operates on a 1
millisecond interval.

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <rtl_posixio.h>

pthread_t thread;

void *thread_code(void *t)
{
        struct timespec next;
        int count = 0;

        clock_gettime( CLOCK_REALTIME, &next );

        while ( 1 ) {
                timespec_add_ns( &next, 1000*1000 );
                clock_nanosleep( CLOCK_REALTIME, TIMER_ABSTIME,
                                 &next, NULL );
                count++;
                if (!(count % 1000))
                        printf("woke %d times\n", count);
        }

        return NULL;
}

int main(void)
{
        pthread_create( &thread, NULL, thread_code, (void *)0 );

        rtl_main_wait();

        pthread_cancel( thread );
        pthread_join( thread, NULL );

        return 0;
}

Again, everything starts with a normal main() function. A standard thread is spawned right away (pthread attributes will be covered later), and the code calls rtl_main_wait(). This is a blocking function that allows the application to stay suspended until it is otherwise shut down. For those of you who have ever written graphical applications with a main event loop, the same concept applies here.
If the application is killed (via CTRL-C or otherwise), the waiting call
will complete, and the rest of the function will cancel the thread, join its
resources, and return.
The thread itself is a hard real-time thread running under RTCore that executes on an exact 1 millisecond period. It samples the current time, adds 1 millisecond to that value, and sleeps until that time arrives. It counts the number of wakeups, and prints a count every 1000 iterations. (1000 printf() calls per second would clutter the terminal pretty quickly.) This thread will execute indefinitely, or until the application is actively unloaded.
Details follow later, but it is important to note that code in the main()
routine is inherently non-real-time. Any potentially non-real-time activity
should be done here, such as memory allocation and other initialization tasks.
(We’ll cover why memory allocation is a potentially non-real-time activity in
a later chapter.)

2.2.3 Basic communication

There needs to be some communication from one real-time thread to another,
and also between real-time threads and non-real-time threads, such as Linux
processes. Later chapters will discuss this in more detail, but here we’ll just
look at the simplest of mechanisms, the FIFO.
Real-time FIFOs are just like any other FIFO device - a producer (whether it is a real-time thread or a userspace application) pushes data in, and a consumer receives it in the order it was submitted. Real-time FIFOs are constructed such that real-time threads will never block on data submission - they will always perform the write() and move on as quickly as possible. This way, real-time applications can never be stalled by the FIFO. First, here is the real-time component:

#include <stdio.h>
#include <string.h>
#include <pthread.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <rtl_posixio.h>

pthread_t thread;
int fd1;

void *thread_code(void *t)
{
        struct timespec next;

        clock_gettime( CLOCK_REALTIME, &next );

        while ( 1 ) {
                timespec_add_ns( &next, 1000*1000*1000 );
                clock_nanosleep( CLOCK_REALTIME, TIMER_ABSTIME,
                                 &next, NULL );

                write( fd1, "a message\n", strlen("a message\n") );
        }

        return NULL;
}

int main(void)
{
        mkfifo( "/communicator", 0666 );

        /* open for writing; non-blocking so the real-time side never stalls */
        fd1 = open( "/communicator", O_WRONLY | O_NONBLOCK );

        ftruncate( fd1, 16<<10 );

        pthread_create( &thread, NULL, thread_code, (void *)0 );

        rtl_main_wait();

        pthread_cancel( thread );
        pthread_join( thread, NULL );

        close( fd1 );
        unlink( "/communicator" );

        return 0;
}

This code starts up and creates the FIFO with standard POSIX calls.
mkfifo() creates the FIFO with permissions such that a device will appear
in the GPOS filesystem dynamically. We then open the file normally and
call ftruncate() to size it - this sets the ’depth’ of the FIFO.
A thread is spun, we wait to be killed, and the main code is done. Once
rtl main wait() completes, we need to close/unlink the FIFO in addition
to the thread cleanup, just like any normal file. RTCore will catch dangling
devices and clean them up for a user, but good programming practice is to
do the work right in the first place.
Our thread in this instance sleeps on a one second interval and writes
to the FIFO every time it wakes up. As before, it will do this indefinitely.
There are no real surprises here, so let’s look at the userspace code:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        int fd;
        char buf[255];
        int ret;

        fd = open("/communicator", O_RDONLY);
        while (1) {
                ret = read(fd, buf, sizeof(buf) - 1);
                if (ret > 0) {
                        buf[ret] = '\0';
                        printf("%s", buf);
                }
        }
        return 0;
}

Again, there should be no surprises here - this is a normal non-real-time, userspace application. It opens the other end of the FIFO, and reads periodically, getting the message from the other end. This could have been some device data protocol from the RTOS, the userspace application could write data up to the RTOS to direct thread execution, or FIFOs could be used between real-time threads. In any case, they provide a simple file-based means of exchanging data.

2.2.4 Signalling and multithreading

Communication between threads is also done via standard POSIX mechanisms. Again, all of the different means are covered later, but here let's look at semaphores, which are a very convenient method of signalling between threads. Here's an example:

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <rtl_posixio.h>
#include <posix/semaphore.h>

pthread_t wait_thread;
pthread_t post_thread;
sem_t sema;

void *wait_code(void *t)
{
        while (1) {
                sem_wait(&sema);
                printf("Waiter woke on a post\n");
        }

        return NULL;
}

void *post_code(void *t)
{
        struct timespec next;

        clock_gettime( CLOCK_REALTIME, &next );

        while ( 1 ) {
                timespec_add_ns( &next, 1000*1000*1000 );
                clock_nanosleep( CLOCK_REALTIME, TIMER_ABSTIME,
                                 &next, NULL );

                printf("Posting to the semaphore\n");
                sem_post(&sema);
        }

        return NULL;
}

int main(void)
{
        sem_init(&sema, 1, 0);

        pthread_create( &wait_thread, NULL, wait_code, (void *)0 );
        pthread_create( &post_thread, NULL, post_code, (void *)0 );

        rtl_main_wait();

        pthread_cancel( post_thread );
        pthread_join( post_thread, NULL );

        pthread_cancel( wait_thread );
        pthread_join( wait_thread, NULL );

        sem_destroy(&sema);

        return 0;
}

Instead of a single thread, two are spun up once the semaphore is initialized. One thread waits on the semaphore, while the other sleeps and periodically performs the sem_post() operation. A message is printed before each post and after the waiter wakes, to indicate the sequence of events.

Semaphores really are that easy - we'll see later how they can be used to handle synchronization problems very easily.

2.3 Perspective
At this point, take a step back and look at what we've just covered. In a short introduction, you've seen code that performs standard output, spawns POSIX threads, communicates through real-time devices, and synchronizes through standard POSIX semaphores. None of it required much experience beyond basic knowledge of C and POSIX, and a little bit of UNIX background. In fact, these applications are no different than what you would see in a normal C environment under another UNIX. The difference here is that you get hard real-time response in your threads.

The point of this was to get you, the reader, handling useful code as quickly as possible, easing the stigma surrounding real-time programming. Now that you can see that it doesn't involve occult knowledge, we'll step back and take a broader view of RTCore, the API, and some of the grounding principles of real-time programming.
Chapter 3

Real-time Concepts and RTCore


You've now seen some basic RTCore code, and can see that real-time programming isn't as mystifying as it sounds. However, before we dive into detailed coverage of the API, some basic concepts need to be covered. For those familiar with RTOS concepts, most of this chapter should be review, but skimming is recommended as we will be explaining how RTCore handles real-time problems.

3.1 RTOS kingdom/phylum/order

The definition of real-time varies greatly based on its use. Anything from
stock quotes to stepper motors can be said to be real-time. Within the
computing industry, real-time has many different meanings depending on
the requisite service level. Here is a simple breakdown of operating systems
in relation to real-time applicability.

3.1.1 Non-real-time systems

"Non-real-time" systems are the operating systems most often used. These systems have no hard guarantees and are able to utilize optimization strategies contradictory to real-time requirements, such as caching and buffering. Non-real-time systems have the following characteristics:


• No guaranteed worst-case scheduling jitter. Under heavy system load, the system may defer scheduling of a task as long as it deems necessary.

• No theoretical limit on interrupt response times. System load may result in delayed interrupt response. Also, running with interrupts disabled for considerable periods, while considered to be bad form, is not catastrophic.

• No guarantee that an event will be handled. Varying the system load affects the number of events it intercepts, such as interrupts.

• System response is strongly load-dependent. Tasks that take x amount of time under one system load will take y amount of time under a different load. Response prediction with any surety is generally impossible.

• System timing is an unmanaged resource. This means that timing data is not considered important to system execution, and is not tracked with precision.
Non-real-time systems are unpredictable even at a statistical level, as
system reaction is highly dependent on system load. Rough predictions can
be made if the error window is opened widely, but the results cannot be
proven to fall inside the predicted range.

3.1.2 Soft real-time

In cases where missing an event is not critical, as in a video application where
a missed frame or two is not fatal, a "soft real-time" system may do. Such a
system is characterized by the following criteria:
• The system can guarantee a rough worst-case average jitter, but not
an absolute worst case scenario.
• Events may still be missed occasionally. This is better than the non
real-time system, as there is more control over response, but as the
absolute worst case is unknown, events such as interrupts may still be
lost. This may occur even when not in a worst case situation.
Soft real-time systems are statistically predictable for the average case,
but a single event cannot be predicted reliably. Soft real-time systems are
generally not suited for handling mission-critical events.

3.1.3 Hard real-time

The following list of requirements define a hard real-time system. The ab-
sence of predictability for any one of these items disqualifies a system from
being a hard real-time system.

• System time is a managed resource. Timing resources are managed
with the highest possible level of precision.

• Guaranteed worst-case scheduling jitter. If a task needs to happen
within a certain deviation, it is guaranteed to occur.

• Guaranteed maximum interrupt response time. As with scheduling
latency, interrupts are guaranteed to be acknowledged and handled
within a certain window.

• No real-time event is ever missed. This is important. Under no cir-
cumstances will a scheduled task not be run on time, an interrupt be
missed, or any other event of interest to the real-time code be lost.

• System response is load-independent. Execution of real-time tasks is
guaranteed to fall within the worst case value range, regardless of the
system load factor. A thrashing database process will not delay move-
ment of a robotic arm.

A system that can fulfill these criteria is fully deterministic and considered
to be "hard" real-time. Of course, there are varying levels of service, as some
hard real-time systems might have a worst case jitter of 2 seconds, while
others provide 25 microseconds. Both qualify according to the definition, but
only one is usable for a wide range of applications. The RTCore approach
qualifies on all of these counts, and the response time tends to be near the
limits of the underlying hardware.
Hard real-time systems will generally have slightly lower average perfor-
mance than soft real-time systems, which in turn are generally not as efficient
with resources as non-real-time systems. This is because non-real-time sys-
tems are concerned with throughput - if an Ethernet transfer is delayed a
little in order to burst out several disk transfers, this results in higher system
output, and has no significant repercussions in a non-real-time environment.
In a hard real-time system, not performing this optimization results in lower
overall throughput, but it maintains determinism. This determinism is what

makes the difference between getting your task done without fail and doing
a ”best effort” based on available system resources.

3.2 The RTOS design dilemma

The fundamental problem of an RTOS is that users have conflicting demands
with respect to system design. On one hand, an RTOS should obviously be
capable of real-time operations. On the other hand, users want access to
the same rich feature sets found in general-purpose operating systems which
run on desktop PCs and workstations. To resolve this dilemma, two general
concepts have traditionally been used.

3.2.1 Expand an RTOS

Design guidelines for an RTOS include the following: It needs to be compact,
predictable and efficient; it should not need to manage an excessive number
of resources, and it should not be dependent on any dynamically allocated
resources. If one expands a compact RTOS to incorporate the features of
typical desktop systems, it is hard (if not impossible) to fulfill the demands
of the core RTOS. Problems that arise from this approach include:

• The OS becomes very complex. This makes it difficult to ensure de-
terminism, since ALL core capabilities must be fully preemptive. This
means that all developers must now take into account every possible
real-time demand in addition to solving problems in their specific domain.

• Drivers for hardware become very complex. Since priority inversion
must not occur, drivers must be able to handle situations in which
they are not being serviced. Again, this forces all developers to deal
with additional possibilities outside their domain.

• Since the core system is an RTOS, the vast amount of available software
cannot (in most cases) be used without modification or at least signifi-
cant analysis with respect to real-time demands. It is almost impossible
to determine interactions between the software and the RTOS.

• Many mechanisms for efficiency, like caching and queuing, become
problematic. This prohibits usage of many typical optimization strate-
gies for the non-real-time applications in the system.

• Maintenance costs of such a system are considerable for both developers
and customers. Since every component of the system can influence
the entire system’s behavior, it is very hard to evaluate updates and
modifications with respect to how they will influence real-time behavior
of the rest of the system. Engineering costs skyrocket as reliability
becomes questionable.

3.2.2 Make a general purpose OS real-time capable

The seemingly natural alternative strategy would be to add real-time capa-
bilities to a general purpose OS. In practice, this approach meets constraints
similar to those noted above, as both are converging on the same idea from
different directions. Problems that arise with such an approach include:

• General purpose operating systems are event-driven, not time-triggered.

• General purpose OSs are (generally) not fully preemptive systems.
Making them fully preemptive requires modifications to all hardware
drivers and to all resource handling code. For a constantly evolving
system, tracking these modifications is prohibitive from a manpower
perspective, and becomes even more difficult as the OS is patched and
modified in the field.

• Lack of built-in high-resolution timing functions entails substantial sys-
tem modification.

• Modifying existing applications to be preemptive is very costly and
time-consuming.

• The use of modified applications would also greatly increase mainte-
nance costs.

• Optimization strategies used in general purpose OSes can contradict
the real-time requirements. For example, removing all caching and
queueing from an OS would substantially degrade performance in areas
where there are no real-time demands.

• Because such systems are very complex (and often not well-documented),
it is extremely difficult to reliably achieve full preemption, especially
without performance degradation in some usage scenarios. Add in the
fact that the system is constantly developing, and the problem worsens.

General purpose operating systems are efficient with resources. Because
they don’t manage time as an explicit resource, trying to modify the system
to do so violates many of its design goals, and causes components to be used
in ways they were never designed for. This is in principle a bad strategy,
especially when there are many developers, all with different visions of what
the exact behavior of the machine should be.

3.2.3 The RTCore approach to the problem

To resolve these conflicting demands, a simple solution has been developed.
RTCore splits the OS entirely, so that one kernel (Linux or BSD UNIX) runs
as a general purpose system (GPOS) with no hard real-time capabilities but
with a large capability set, and a second kernel (RTCore), designed around
real-time capabilities, efficiently handles real-time work. The real-time kernel
allows the GPOS to run when there are no real-time demands. This approach
allows the non-real-time side of the OS to provide all the capabilities that
desktop users are used to, while the real-time side can be kept small, fast,
and deterministic.
Three major attributes make RTCore work:

• It disables all hardware interrupts in the GPOS.

• It provides interrupts via interrupt emulation.

• It runs full featured non-real-time Linux (or BSD) as the lowest priority
task. It is the "idle task" of the RTOS, meaning that it is run whenever
the real-time system has nothing else to execute.

3.3 Interrupt emulation

The main problem in adding hard real-time capabilities to a general purpose
operating system is that the disabling of interrupts is widely used in the
kernel for synchronization purposes. The strategy of disabling interrupts in
critical code sequences (as opposed to using synchronization mechanisms like
semaphores or mutexes) is quite efficient. It also makes code simpler, since
it does not need to be designed to be reentrant. But disabling of interrupts
for long periods results in lost events.
To maintain the structure of the GPOS kernel while providing real-time
capabilities, one must provide an "interrupt interface" that gives full control
over interrupts, but at the same time appears to the rest of the system like
regular hardware interrupts. This interrupt interface is essentially an inter-
rupt emulation layer, and is one of the core concepts in RTCore. Interrupt
emulation is achieved by replacing all occurrences of sti and cli with emu-
lation code. This introduces a software layer between the hardware interrupt
controller and the GPOS kernel, allowing the real-time kernel to handle in-
terrupts as needed by real-time code, but still allowing the general purpose
OS to handle them if there is a need.
Interrupts that are not destined for a real-time task must be passed on to
the GPOS kernel for proper handling when there is time to deal with them.
In other words, RTCore has full control over the hardware, and the non-real-
time GPOS sees soft interrupts, not the "real" interrupts. Hardware interrupt
interaction is simply emulated in the GPOS. This means that there is no
need to recode GPOS drivers, provided there are no hard-coded instructions
in binary-only drivers that bypass the emulation. (See E.0.2 for details.)

3.3.1 Flow of control on interrupt

What happens when an interrupt occurs in RTCore? The following pseu-
docode shows how RTCore handles such an event.

if (there is an RT-handler for the interrupt) {
    call the RT-handler
}
if (there is a GPOS-handler for the interrupt) {
    if (the GPOS currently has interrupts enabled) {
        call the GPOS handler for this interrupt
    } else {
        mark the interrupt as pending
    }
}
This pseudocode represents the priority introduced by the emulation layer
between hardware and the GPOS kernel. If there is a real-time handler
available, it is called. After this handler is processed, the GPOS handler is
called. This calling of the GPOS handler is done indirectly: it runs as the
idle task of the RTCore kernel, so the GPOS handler will be called as soon as
there is time to do so, but a GPOS interrupt handler cannot block RTCore.
That is, the interrupt handler for the GPOS is called from within the GPOS,
not from RTCore. If the interrupt is deferred to the GPOS and its interrupt
handler is executing when a real-time task must run, the real-time kernel will
suspend its execution, and the real-time code will execute as needed.
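The dispatch logic above can be sketched as a toy model in C. The names here (dispatch_irq, run_pending, the handler tables) are hypothetical illustrations, not RTCore's internals; the real emulation layer sits below the GPOS kernel and talks to the hardware interrupt controller directly. The point the sketch makes is structural: an RT handler runs immediately, while a GPOS handler is only marked pending, to run later when the GPOS (the idle task) gets CPU time.

```c
/* Toy model of the interrupt-emulation dispatch decision (hypothetical
 * names, for illustration only). */
#define NIRQ 16

typedef void (*handler_t)(int irq);

static handler_t rt_handlers[NIRQ];
static handler_t gpos_handlers[NIRQ];
static unsigned pending;              /* one bit per emulated soft interrupt */

void dispatch_irq(int irq)
{
    if (rt_handlers[irq])
        rt_handlers[irq](irq);        /* hard real-time path: runs now */
    if (gpos_handlers[irq])
        pending |= 1u << irq;         /* deferred: GPOS sees it later */
}

/* Called when the GPOS idle task gets CPU time. */
void run_pending(void)
{
    for (int irq = 0; irq < NIRQ; irq++)
        if (pending & (1u << irq)) {
            pending &= ~(1u << irq);
            gpos_handlers[irq](irq);
        }
}

/* Small demonstration: the RT handler fires during dispatch, the GPOS
 * handler only after run_pending(). */
static int rt_hits, gpos_hits;
static void rt_h(int irq)   { (void)irq; rt_hits++; }
static void gpos_h(int irq) { (void)irq; gpos_hits++; }

int dispatch_demo(void)
{
    rt_handlers[3] = rt_h;
    gpos_handlers[3] = gpos_h;
    dispatch_irq(3);
    int deferred_ok = (rt_hits == 1 && gpos_hits == 0); /* GPOS not run yet */
    run_pending();
    return deferred_ok && rt_hits == 1 && gpos_hits == 1;
}
```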

3.3.2 Limits of interrupt emulation

Interrupt emulation does have limits. Even for non-realtime interrupts, the
system must take the time to acknowledge the interrupt controller and record
the fact that the interrupt has happened. The hardware interrupts have pri-
ority over the real-time tasks, and so a GPOS hardware interrupt may disturb
the real-time scheduling. Fortunately, the actual code is well optimized and
has very little impact even on older platforms. Also, the system works in
such a way that a particular GPOS interrupt may not preempt the real-time
system more than once per period of real-time activity. Therefore, the worst
case scheduling jitter that can be attributed to the non-realtime hardware in-
terrupts is bounded by the number of such interrupts that can be received by
the current CPU times the maximum time to acknowledge an interrupt and
record the interrupt occurrence. The CPU reservation facility (see Section
4.8.4) can eliminate this and minimize other sources of scheduling jitter,
making for excellent real-time performance on SMP systems. The RTCore
advance timer option (Section 4.8.1) may be used to improve performance
on both uniprocessor and multiprocessor systems.
Since non-realtime activity may have an effect on worst-case timings in the
system (e.g. ping flooding a system while running a critical real-time task
may shift its timings), testing a system’s worst-case scheduling jitter and
interrupt response time should be done under the worst possible load.
In later chapters, and Appendix F, we will cover some basic testing envi-
ronments you can use to stress your hardware.

3.4 Services Available to Real-Time Code

Code run in the RTCore real-time kernel does not exist in a vacuum. Services
are available to real-time code, although applicability may vary depending
on system configuration, real-time demands, and RTLinuxPro components
available. In later chapters, we will cover these components in detail, but a
few notes are in order to aid in the understanding of the examples.

3.4.1 Memory management

Strictly speaking, there is no memory management from within RTCore. The
reason for this is that memory allocation is difficult to manage determinis-
tically with respect to real-time demands. Simple memory allocators can be
written for known usage patterns, and some users have written basic systems
for their own applications. When applied to the generic case, however, the
problem becomes difficult to provably handle.
There are alternatives for real-time code:

• Allocate memory in initialization code, during execution of main(). As
this is in the startup context, not in a real-time thread, interrupt han-
dler, or otherwise, it is perfectly safe. Also, the memory could be
declared as static to the module.

• Soft-IRQs: We will discuss this in detail later, but this approach in-
volves creating a virtual IRQ that is visible to the GPOS. Using this
IRQ, real-time code can signal to the non-real-time system that it needs
memory, and a handler on the other side safely takes care of the possible
blocking when allocating the memory. When this operation completes,
a signal is sent back to the RTCore code. There may be any amount
of delay before this handler gets to do the work, though.

• Simple memory allocators/deallocators. As mentioned previously, it
is possible to write a deterministic memory allocator if you know the
usage patterns of the code that will need it.

This point is important, and should be considered carefully. Many users
need to do work involving memory allocation, and don’t understand what is
safe to do when. Whenever possible, perform allocations within main(), along
with any potentially blocking kernel calls, such as PCI device initialization.
This also includes RTCore calls such as ftruncate(fd, size); on FIFOs
and shared memory, which involves a kernel memory allocation to create
space for device data. Put simply, you cannot safely perform allocations
from within real-time code, including threads and interrupt handlers. Calls
chaining directly from main() are safe to allocate from.
Later on, in Chapter 4, we will demonstrate how to use preallocated
memory in order to spawn new threads from within real-time code. This
involves exactly what we describe here - allocating a block of memory before
you enter the real-time system, and then using it safely later on, when in the
context of real-time code.
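The preallocation pattern can be sketched as follows. This is a minimal, runnable illustration using plain malloc() in place of rtl_gpos_malloc() so it builds outside RTCore; the sampler structure and names are hypothetical. All allocation happens in the startup context, and the thread only ever touches memory it was handed.

```c
#include <pthread.h>
#include <stdlib.h>

/* Sketch: allocate in the startup context, use (but never allocate) in the
 * thread. plain malloc() stands in here for rtl_gpos_malloc(). */
#define BUF_SIZE 4096

struct sampler {
    double *buffer;   /* preallocated before the thread runs */
    size_t count;
};

static void *sample_thread(void *arg)
{
    struct sampler *s = arg;
    /* Safe: no allocation here, only use of the preallocated buffer. */
    for (s->count = 0; s->count < BUF_SIZE / sizeof(double); s->count++)
        s->buffer[s->count] = (double)s->count;
    return NULL;
}

size_t sampler_demo(void)
{
    struct sampler s;
    pthread_t t;

    s.buffer = malloc(BUF_SIZE);   /* startup context: blocking is fine */
    if (!s.buffer)
        return 0;

    pthread_create(&t, NULL, sample_thread, &s);
    pthread_join(t, NULL);

    size_t n = s.count;
    free(s.buffer);                /* teardown, also non-real-time context */
    return n;
}
```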

3.4.2 Networking - Ethernet and FireWire

RTLinuxPro offers a component called Light Net (LNet), which allows real-
time networking, from raw packets up to UDP, over Ethernet or FireWire.
This allows one to easily create and send raw packets destined for the network,
for hard real-time communication with other machines. Both transport media
are in heavy field use by FSMLabs’ customers.
Of course, you can still interact with other machines without LNet, but
the networking stacks are all dependent on the GPOS. The traditional path
is for data to be collected by real-time code, pushed over a FIFO or shared
memory to userspace, which then does any packaging work and pushes it
through the network stack via a socket. With LNet, your real-time data
can be collected and dumped to the hardware through a zero-copy interface
immediately, allowing deterministic network transfers between machines, and
saving the trouble of going to userspace and back through kernelspace. Later
chapters will cover this in detail, but for now begin to consider the idea
that individual real-time systems do not have to operate without real-time
assistance from other processing nodes.

3.4.3 Integration with other services

As we will cover in a later chapter, the Controls Kit offers a means of inte-
grating low-level real-time systems with the rest of your organization. Com-
ponents that use the Controls Kit can be directed through web interfaces,
Excel spreadsheets, and other systems. This simplifies integration with ex-
isting infrastructure - for example, now it’s easy to let your Oracle database

retain statistical information on how your machine floor devices are doing.
Later on, we’ll cover the capabilities of this package in detail.

3.4.4 What’s next

At this point, let’s shift gears and cover the RTCore programming API. As it
is POSIX-based, it provides few surprises, but it needs some coverage before
diving into more advanced topics and techniques. We recommend at least
skimming the API sections even if you are familiar with POSIX, as there are
some areas that RTCore’s API covers but POSIX does not handle. After this
chapter, there will be many more examples and techniques for more advanced
work.
Chapter 4

The RTCore API

The RTCore API is POSIX-based with some extensions. The API continues
to evolve to reflect new needs in the industry, but compatibility with previous
releases is provided. Current efforts include continued POSIX compliance,
along with some extensions to cover needs either not mentioned by POSIX,
or not sufficiently addressed in current standards.

4.1 POSIX compliance

To ease the real-time learning curve, FSMLabs long ago moved RTLinux
(and thus RTCore) to a POSIX-compliant API. When learning a real-time
system, most developers have a solid programming background, and simply
need to adjust to the specific API set provided by the RTOS. With RTCore,
this adjustment comes for free, as code under RTCore looks familiar to just
about anyone who has used a UNIX.
It should be noted that the POSIX standard has evolved and will continue
to do so, but in a controlled manner. FSMLabs will continue to maintain
POSIX compliance in light of new developments. Existing POSIX-based
systems are easily moved to RTCore, although source-code compatibility with
other RTOSs should not be expected. Source compatibility is provided when
moving between RTLinuxPro and RTCore/BSD, as both use the POSIX-
based RTCore API.
RTCore provides POSIX extensions when needed (indicated by an np
in the name) to implement features that fall outside of the POSIX domain.


These are mainly relegated to performance improvements in areas such as
SMP where POSIX does not provide full guidance. Some of these may not
be an option for those in strict development environments, but it is up to
the programmer to determine the best approach.

4.1.1 The POSIX PSE 51 standard

The guiding standard for the RTCore API was POSIX PSE 51, a minimum
set of POSIX threading functions for real-time and embedded systems. Pro-
grammers who have learned the various pthread_*() calls for normal thread-
ing and synchronization will have the same function set they are used to. The
major shift involved will be to keep in mind the constraints of timing-specific
real-time code, such as scheduling, minimalism, and other real-time-specific
demands, but the programmer will not be burdened with learning a new API.

4.1.2 Roadmap to future API development

The RTCore OS will continue to follow the POSIX standard in order to
maintain a proper model for the developer community. At the same time,
there will continue to be a need for extensions, as POSIX does not cover
all of the possible industry needs. Sometimes these ideas are moved into
later versions of the POSIX standard. Some are specific to a certain system
configuration, such as SMP systems, where CPU affinity calls are needed for
performance reasons. In some cases, extensions are added in order to simplify
work that could be done with standard calls. These extensions are presented
as an option that may facilitate development, but most work revolves around
the POSIX calls.

4.2 POSIX threading functions

Here we present the POSIX functions available from RTCore, a brief descrip-
tion of what they do, and some notes with respect to real-time usage. These
calls are used throughout the examples in the book, and you should be able
to get a good practical grasp of their usage from these. For specific notes,
refer to the man pages, provided in various forms with RTLinuxPro.¹

¹For a full description of the POSIX threading API concepts and usage, refer to the
O’Reilly book on PThreads Programming or the POSIX standard directly.

4.2.1 Thread creation

int pthread_create(pthread_t *thread, pthread_attr_t *attr,
void *(*start_routine)(void *), void *arg);

This will create a thread whose handle is stored in *thread. The thread’s
execution will begin in the start_routine() function with the argument
arg. Attributes controlling the thread are specified by attr; if this value is
NULL, the defaults are used and a stack is created internally.
Note that pthread_create() calls are generally limited to being within
the initialization context of main(). If the call is needed during normal real-
time operation, threads can be created with preallocated stack space. Other-
wise, calling pthread_create() from another real-time thread would at the
worst cause deadlock, and at best delay the first real-time thread an unknown
amount while memory is allocated for the stack.
There is an attribute function (pthread_attr_setstackaddr()) that al-
lows a thread to be prepared with a preallocated stack for operation. Let’s
look at a simple example:

#include <rtl.h>
#include <time.h>
#include <pthread.h>
#include <stdio.h>

pthread_t thread1, thread2;

void *thread_stack;

void *handler(void *arg)
{
    printf("Thread %d started\n", (int)arg);
    if (arg == 0) { /* first thread spawns the second */
        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setstacksize(&attr, 32768);
        pthread_attr_setstackaddr(&attr, thread_stack);
        pthread_create(&thread2, &attr, handler, (void *)1);
    }
    return 0;
}

int main(int argc, char **argv)
{
    thread_stack = rtl_gpos_malloc(32768);
    if (!thread_stack)
        return -1;

    pthread_create(&thread1, NULL, handler, (void *)0);

    rtl_main_wait();

    pthread_cancel(thread1);
    pthread_join(thread1, NULL);
    pthread_cancel(thread2);
    pthread_join(thread2, NULL);
    rtl_gpos_free(thread_stack);
    return 0;
}
This again demonstrates the point that anything outside of the main()
call cannot directly allocate memory. Instead, we allocate a stack with
rtl_gpos_malloc()² in main(), where it is safe to block while the system
handles any work, such as defragmentation, associated with the allocation.
This must be allocated, as on some architectures a global static value may
not be a safe place to store the stack of a running thread.
Next, a real-time thread is spawned. Within the handler function, it
initializes an attribute and configures it to use our preallocated area for the
stack. Finally, we spawn the thread and execution occurs just as you would
expect POSIX calls to behave, with the exception being that the stack is
already present. Note: A thread created with pthread_create() is not
guaranteed to be started when the call returns; it is just slated for scheduling.
Note that thread stacks in RTCore are static, and will not grow as needed
depending on call sequence. Users need to make sure that they create enough
stack space for the thread, and prevent too many large structures from being
placed on the stack. In a system that allows for dynamic memory manage-
ment and the possible delays incurred by doing so, stacks can dynamically
grow as the application needs space. Under RTCore, growing the stack would
require the program to wait while proper memory is found, possibly destroy-
ing real-time performance. Instead, the stack is allocated at thread creation
and does not grow.

²rtl_gpos_malloc() uses the correct malloc() available on the host GPOS.
This stack can generally be only a couple dozen kilobytes in size, but
users with large data structures in function contexts need to understand
that these structures can soak up available stack space very quickly, causing
an overflow. If a thread has a 20K stack, and calls a function nested 3 deep,
with a local structure of 7K per invocation, an overflow will occur (3 × 7K =
21K > 20K). Smaller structures should be used, or large structures should
be kept off the stack, or the thread’s stack should be enlarged to compensate.
Lastly, RTCore uses sizeof(struct rtl_thread_struct) bytes on the
bottom of the stack for thread information. For nearly all users, this differ-
ence is negligible, and is not likely to be noticed. But if you need to manage
your stack down to the last byte, it is recommended that you allocate your
needed stack size plus the size of the structure to be safe. So instead of:

rtl_gpos_malloc(32768);

use:

rtl_gpos_malloc(32768 + sizeof(struct rtl_thread_struct));

4.2.2 Thread joining

int pthread_join(pthread_t thread, void **arg);
This joins with a running thread, storing the return value into arg, and
has no restriction on the length of time it takes to complete. If the thread
has already completed, this call returns immediately. As with normal POSIX
calls, this frees resources associated with the thread, such as the stack, if it
was not configured by hand. If you look at our previous example, you can see
that we use this call to join both a preallocated stack thread and a normal
thread, and it cleans up the resources for both, except for the stack on the
second thread, which we explicitly have to free.
pthread_detach(pthread_t thread);
The pthread_detach() call will ’unhook’ a running thread whose status
was previously joinable. After the thread is detached, it is no longer joinable,
and needs no further management. Its resources will be cleaned up on thread
exit.
4.2.3 Thread destruction

int pthread_cancel(pthread_t thread);
This will cancel the thread specified by the given parameter. There are
many caveats to this as specified in the full man page, such as the fact that
as a cancelled thread works through its cancel handlers, it is not required to
release any mutex locks it holds at the point of cancellation. (Though this
is a good idea to do if you want a stable system.) Also, it may not cancel
immediately, depending on the state the thread is in at the point of the call.
The target thread will continue to execute until it enters a cancellation point,
when it will begin to unwind itself through its registered cancel handlers.
For most users, pthread_cancel() followed by a pthread_join() is most
effective as a means of shutting down real-time code from within the tail end
of main().

4.2.4 Thread management

pthread_t pthread_self(void);
This is a very simple function, generally used by threads to get their own
thread handle for further calls.
pthread_setcancelstate(int state, int *oldstate);
pthread_setcanceltype(int state, int *oldstate);
Threads may use pthread_setcancelstate() to disable cancella-
tion for themselves. The previous state is stored in the oldstate vari-
able. Likewise, the pthread_setcanceltype() call is used to determine
the type of cancellation used, either PTHREAD_CANCEL_DEFERRED or
PTHREAD_CANCEL_ASYNCHRONOUS. In real-time environments, how-
ever, most systems have a minimal set of simple, continuous threads, and do
not make heavy use of cancellation calls.
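As a minimal sketch of the cancellation-state calls (standard POSIX, which RTCore mirrors; the function name is hypothetical), a thread can bracket a critical region so it cannot be torn down partway through:

```c
#include <pthread.h>

/* Sketch: disable cancellation around a critical region, then restore the
 * previous state. The default state is ENABLE, so old1 reports that and
 * old2 reports the DISABLE we set. */
int cancelstate_demo(void)
{
    int old1, old2;

    pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, &old1);
    /* ... work that must not be interrupted by cancellation ... */
    pthread_setcancelstate(old1, &old2);  /* restore the previous state */

    return old1 == PTHREAD_CANCEL_ENABLE && old2 == PTHREAD_CANCEL_DISABLE;
}
```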
void pthread_testcancel(void);
This call ensures that any pending cancellation requests are delivered to
the thread. It has little use in real-time applications, as cancellation must be
a deterministic call in the first place. If there are ambiguities present in the
code, it may be better to remove them, rather than being forced to check if
the real-time thread should continue.

int pthread_kill(pthread_t thread, int signo);

pthread_kill() sends the signal specified by signo to the thread speci-
fied. This is fast and deterministic if called on a thread running on the local
CPU, but there can be a delay when signalling a thread on a remote CPU.

4.2.5 Thread attribute functions

In addition to the normal thread calls, RTCore also exposes the pthread_attr_*()
functions, which control attributes of a thread. These functions behave as
they would in any other situation, and we refer you to the standard docu-
mentation for more detail.

int pthread_attr_init(pthread_attr_t *attr);

int pthread_attr_destroy(pthread_attr_t *attr);

These two functions initialize and destroy attribute objects, respectively.
Attribute objects should be created or destroyed with these calls, not by
manipulating them by hand.

int pthread_attr_setstacksize(pthread_attr_t *attr,

size_t stacksize);
int pthread_attr_getstacksize(pthread_attr_t *attr,
size_t *stacksize);

Programmers can use these calls to manipulate the stack size of the thread
the attribute is tied to. Note that this must be done within the main()
context, where memory management is possible. Refer back to our example in
Section 4.2.1 for details, both on this and the pthread_attr_setstackaddr()
call. If these attributes are not set, the RTCore OS will handle the stack
manipulation internally.
Again, note that thread stacks under RTCore are static, and will not
grow as needed based on what functions are called. Users need to ensure
that they have enough stack space for their thread from the start. Section
4.2.1 has more details.

int pthread_attr_setschedparam(pthread_attr_t *attr,

const struct sched_param *param);
int pthread_attr_getschedparam(pthread_attr_t *attr,
struct sched_param *param);

As with normal POSIX threads, these two routines determine scheduling
parameters as driven by the contents of the param parameter. Also, as usual,
use the sched_get_priority_min() and sched_get_priority_max() calls
with the appropriate scheduling policy to get the priority ranges. SCHED_FIFO
is the default scheduling mechanism, and while it does not have to be speci-
fied, it is helpful to ensure forward compatibility.
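A minimal sketch of priority setup with these calls (standard POSIX, which RTCore mirrors; the function name is hypothetical). The attribute is built and read back rather than used to start a thread, since starting a SCHED_FIFO thread on a general purpose OS typically requires elevated privileges:

```c
#include <pthread.h>
#include <sched.h>

/* Sketch: configure a thread attribute for SCHED_FIFO at the maximum
 * priority the system reports, then read it back. */
int fifo_priority_demo(void)
{
    pthread_attr_t attr;
    struct sched_param param, check;
    int max_prio = sched_get_priority_max(SCHED_FIFO);

    pthread_attr_init(&attr);
    pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    param.sched_priority = max_prio;
    pthread_attr_setschedparam(&attr, &param);

    /* A thread created with this attribute (and explicit scheduling
     * inheritance) would start with these parameters. */
    pthread_attr_getschedparam(&attr, &check);
    pthread_attr_destroy(&attr);
    return check.sched_priority == max_prio;
}
```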

int pthread_attr_setstackaddr(pthread_attr_t *attr,

void *stackaddr);
int pthread_attr_getstackaddr(pthread_attr_t *attr,
void **stackaddr);

These calls are important when creating threads from within the real-time
kernel. As there is no memory management, threads need to be spawned us-
ing preallocated memory areas. By using these calls to manage the stack
address, one can create threads from inside the real-time kernel. We’ve al-
ready seen this used in the thread creation example, and as you can see, it
is not difficult to manage.

int pthread_attr_setdetachstate(pthread_attr_t *attr,

int detachstate);
int pthread_attr_getdetachstate(const pthread_attr_t *attr,
int *detachstate);

Use these two calls to switch a thread’s detach state between PTHREAD_
CREATE_JOINABLE and PTHREAD_CREATE_DETACHED. Once a thread is running,
the pthread_detach() call can be used to alter its state.
4.3 Synchronization
4.3.1 POSIX spinlocks
RTCore provides support for the POSIX spinlock functions too. The API is
much like other POSIX objects - there is an initialization/destruction set:

pthread_spin_init(pthread_spinlock_t *lock, int pshared);

pthread_spin_destroy(pthread_spinlock_t *lock);

As with other similar calls, these initialize or destroy a given spinlock -

no surprises there. As for the rest, the following calls are supported:

pthread_spin_lock(pthread_spinlock_t *lock);
pthread_spin_trylock(pthread_spinlock_t *lock);
pthread_spin_unlock(pthread_spinlock_t *lock);

Again, no surprises - these calls allow you to take a lock, try to take it but
return if the lock is already held, and unlock a given spinlock, respectively.
The spinlocks themselves behave just like a normal spinlock - they will spin
a given thread in a busy loop waiting for the resource, rather than putting
it on a wait queue to be woken up later.
As such, the same spinlock caveats apply - they are generally only prefer-
able to another synchronization method when the given thread will spin a
shorter amount of time waiting than the sum of the work involved in putting
it on a queue (and any associated locking), and waking it up appropriately
when the resource becomes available. In a real-time system, it is also of
course important that the resource is available quickly so the thread does
not lose determinism due to a faulty locking chain in other subsystems.

4.3.2 Comments on SMP safe/unsafe functions

The functions described here are inherently safe in SMP situations, although
there are real-time considerations. For calls that target threads running on
other CPUs, there may be a delay in getting the signal to the running code.
pthread_cancel() and pthread_kill() are two examples of this - when
sending a signal to code on the current CPU, the code is fast and determin-
istic, but may delay slightly when targeting a ’remote’ thread. While in
normal situations this is unimportant, the incurred delay may have reper-
cussions for real-time code. Keep these factors in mind when writing the
real-time component of your application - it may help to reconfigure which
CPUs run which threads.

4.3.3 Asynchronously unsafe functions

Some functions are not asynchronously safe, at least in a real-time environ-
ment. To ensure correct behavior, pthread cancel() is not recommended
for threads that use any of these functions. By ’asynchronously unsafe’ we

mean calls that may leave the system in an unknown state if the call is in-
terrupted in the middle of execution. An example would be a function that
locks several mutexes in order to do work, and installs no cleanup handlers.
If the call is halfway through and is cancelled by a remote pthread_cancel()
call, that thread will exit while holding some mutexes, potentially blocking
other threads indefinitely.
It is possible to handle mutex cleanups in a safe manner if one pushes
cleanup handlers for all shared resources, but this is complicated and dan-
gerous. Extreme care must be taken to ensure that held resources are freed
in a manner that doesn’t incur locking, and that everything is cleaned prop-
erly for every possible means of failure. Failing to get this absolutely right
will leave all waiting threads blocked forever, as the cancelled thread will
terminate with locked mutexes left behind.

4.3.4 Cancel handlers

We’ve already mentioned these a couple of times, and will continue to do so
as we cover more of the API. These calls are difficult to get right in all cases,
and many developers don’t come into contact with them too often. In the
interest of sidestepping future confusion and grounding the discussion, we
will now diverge into a short example.
Put simply, cancel handlers are hooks attached to a running thread, as
functions, and are executed in the case that a thread is cancelled while a
resource is held. The handlers are pushed on as a stack, so that if the thread
is cancelled, the handlers are executed in the reverse of the order in which
they were pushed onto that stack.
Also, a cancelled thread does not execute cleanup functions at the time the
cancel is received: rather, it continues execution until it enters a ’cancellation
point’, which is generally a blocking function. Refer to the POSIX specifica-
tion for specific cancellation points, but this generally means that code will
continue to execute until it hits a blocking call like pthread_cond_wait().
Let’s look at an example:

#include <rtl.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>

pthread_t thread;
pthread_mutex_t mutex;

void cleanup_handler(void *mutex)
{
        pthread_mutex_unlock((pthread_mutex_t *)mutex);
}

void *thread_handler(void *arg)
{
        pthread_cleanup_push(cleanup_handler, &mutex);
        pthread_mutex_lock(&mutex);
        while (1) { usleep(1000000); }
        pthread_cleanup_pop(0);
        pthread_mutex_unlock(&mutex);
        return 0;
}

int main(int argc, char **argv)
{
        pthread_mutex_init(&mutex, NULL);
        pthread_create(&thread, NULL, thread_handler, 0);

        rtl_main_wait();

        pthread_cancel(thread);
        pthread_join(thread, NULL);
        pthread_mutex_destroy(&mutex);
        return 0;
}

This code correctly handles the cancellation problem. In our initialization

code, we create a mutex and spawn a thread. This thread correctly pushes
a cleanup handler on the stack before it locks the mutex, and then enters
a useless loop. (Yes, it should do something useful and unlock, but this is
only for illustrative purposes.) Now the mutex is locked indefinitely, and
any cancellation must cause the mutex to be unlocked. If we cancel the

application with CTRL-C at the command line, it induces the cancel and
cleanup handler, causing a proper exit.
Note again the concept of a cancellation point - if the code pushes the can-
cel handler on, but the thread is cancelled asynchronously before it actually
locks the mutex, the thread will continue to run until it enters a cancellation
point. It will continue to execute, running through the code after the cleanup
handler push but before the mutex lock. Once it locks the mutex, the thread is
cancellable; the signal will be delivered, and the handler will be called from a
known point. Think of cancellation points as being places where the system
checks to see if it should stop and clean up.
Consider this case without the cleanup handler, even where the code
wasn’t infinitely blocked. Once the thread locks the mutex, and another
process asynchronously cancels the thread, the thread will still wait for a
cancellation point, but without the handler, it will exit with the mutex held,
and any other code that depends on it will be blocked indefinitely. Now
imagine what happens if you have multiple resources held at various times,
depending on the call chain. Any lockable resource that isn’t attached to a
cleanup handler properly can cause a deadlock if the holding thread
is cancelled.
As you can see, while there are mechanisms to avoid cancellation prob-
lems, extreme care must be taken to make sure that everything is handled
properly. Failure to do so in every possible cancel situation will result in
system deadlock. With a real-time system, this can be disastrous, and it is
for this reason we’ve taken this time to demonstrate how careful one must
be.
4.4 Mutexes
The POSIX-style mutexes are also available to real-time programmers as a
means of controlling access to shared resources. As timing is critical, it is
important that mutexes are handled in such a way that blocking will not
impede correct operation of the real-time application.

4.4.1 Locking and unlocking mutexes

int pthread_mutex_lock(pthread_mutex_t *mutex);

As with the standard POSIX call, this locks a mutex, allowing the caller
to know that it is safe to work on whatever resources the mutex protects. In
a real-time context, mutex hold times must be short, as long-held locks could
cause serious delays in other waiting threads.

int pthread_mutex_trylock(pthread_mutex_t *mutex);

The pthread_mutex_trylock() call will attempt to lock a mutex, and
will return immediately, whether it gets the lock or not. Based on the return
value, one can tell whether the lock is held, and take appropriate action. For
some applications that may not be able to wait for a lock indefinitely, this is
a way to avoid long delays.

int pthread_mutex_timedlock(pthread_mutex_t *mutex,

const struct timespec *abstime);

Similar to the above pthread_mutex_trylock() function,
pthread_mutex_timedlock() provides a way to attempt to grab a lock, with an upper bound
on the length of the wait. If the mutex is made available and locked by the
caller before the allotted time has passed, the mutex will be locked. If the al-
lowed time passes and the mutex cannot be locked by the caller, the function
returns with an error so that the caller can recover appropriately.

int pthread_mutex_unlock(pthread_mutex_t *mutex);

As you would guess, this unlocks a held mutex. It signals a wakeup on

those threads that are blocking on the mutex.

4.4.2 Mutex creation and destruction

int pthread_mutex_init(pthread_mutex_t *mutex,
                       const pthread_mutexattr_t *attr);
int pthread_mutex_destroy(pthread_mutex_t *mutex);

As with normal POSIX calls, the first of these initializes a given mutex. If
a pthread_mutexattr_t is provided, it will be used; otherwise a default
attribute set will be created and attached. The second call of course destroys
an existing mutex, assuming that it is in a proper state and not already
locked. Destroying a mutex that is in use will result in an error to the caller.

4.4.3 Mutex attributes

int pthread_mutexattr_init(pthread_mutexattr_t *attr);
int pthread_mutexattr_destroy(pthread_mutexattr_t *attr);

This is used to initialize a given mutex attribute with the normal values,
or destroy an already existing attribute.

int pthread_mutexattr_settype(
pthread_mutexattr_t *attr,
int type);
int pthread_mutexattr_gettype(
pthread_mutexattr_t *attr,
int *type);

This call allows you to set the type of mutex used. For example, the
type can be either PTHREAD_MUTEX_NORMAL, which implies normal
mutex blocking, or PTHREAD_MUTEX_SPINLOCK_NP, which will force
the mutex to use spinlock semantics when attempting to grab a lock. The
second call will return the type previously set, or the default value.

int pthread_mutex_setprioceiling(
pthread_mutex_t *mutex,
int prioceiling,
int *old_ceiling);
int pthread_mutex_getprioceiling(
const pthread_mutex_t *mutex,
int *prioceiling);

This call sets the priority ceiling for the given mutex, returning the old
value in old_ceiling. This call blocks until the mutex can be locked for
modification. The second call returns the current ceiling. More detail on
priority ceilings will follow later on.

4.5 Conditional variables

These calls are the same as those used in normal POSIX environments. Keep
in mind that if a thread waiting on a condition variable is cancelled while
blocked in either pthread_cond_wait() or pthread_cond_timedwait(), the
associated mutex is reacquired by the cancelled thread before any cleanup
handlers are called. To prevent deadlocks, a cleanup handler that will unlock
all acquired mutexes must be installed.

4.5.1 Creation and destruction

int pthread_cond_init(pthread_cond_t *cond,
const pthread_condattr_t *attr);
int pthread_cond_destroy(pthread_cond_t *cond);

A condition variable must be created and destroyed just like any other
object. Note that there is an attribute object that is specific to condition
variables, and can be used to drive the behavior of the variable.

4.5.2 Condition waiting and signalling

int pthread_cond_wait(pthread_cond_t *cond,
pthread_mutex_t *mutex);

This behaves as one would expect. The caller waits on a condition to

happen specified by cond, and coordinates usage with the mutex parameter.
The mutex must be held at the point of the call, at which point it is released
for other threads to cause the condition to occur, also using the mutex. When
the call returns, signalling that the condition has occurred, the mutex is again
held by the caller. The associated mutex must be released after the critical
section is complete.

int pthread_cond_timedwait(pthread_cond_t *cond,

pthread_mutex_t *mutex,
const struct timespec *abstime);

As with pthread_cond_wait(), this call waits for a condition to happen,
locked by a mutex. In this version, however, it will only wait the amount
of time specified by abstime. Based on the return value, the caller can
determine whether the call succeeded and the condition occurred, or if time
ran out.

int pthread_cond_broadcast(pthread_cond_t *cond);

int pthread_cond_signal(pthread_cond_t *cond);
These are very simple functions that wake either all threads waiting on a
condition variable, or a single waiting thread, respectively. Note that the
caller of these functions does not need to hold the mutex that waiting threads
have associated with the condition variable.

4.5.3 Condition variable attribute calls

int pthread_condattr_init(pthread_condattr_t *attr);
int pthread_condattr_destroy(pthread_condattr_t *attr);
The attribute object calls appropriate for condition variables are no differ-
ent than any other attribute calls. The same object creation and destruction
methods apply.
int pthread_condattr_getpshared(
const pthread_condattr_t *attr,
int *pshared);
int pthread_condattr_setpshared(
pthread_condattr_t *attr,
int pshared);
Relative to thread and other object types, there is not much that can be
modified for conditional variable attributes. These calls toggle the status of
a conditional variable’s shared status. No other methods apply to this type.

4.6 Semaphores
Again, RTCore semaphores look just like POSIX semaphores. As with the
conditional variables, if a thread is cancelled while blocked on a process-
shared semaphore, that semaphore will never be released, and consequently,
a deadlock situation can occur. It is the programmer’s responsibility to
ensure that semaphores are handled properly in cleanup handlers.
Signals that interrupt sem_wait() and sem_post() will terminate these
functions, so that neither acquiring nor releasing the semaphore is accom-
plished. A function call interrupted by a signal will return -1, with errno
set to EINTR.

4.6.1 Creation and destruction

int sem_init(rtl_sem_t *sem, int pshared,
unsigned int value);
int sem_destroy(rtl_sem_t *sem);

These functions operate properly on semaphores. As with the mutex

functions, these functions will detect in-use semaphores and other problems
that could cause unpredictable behavior. Refer to the examples and full
documentation for more details on their use, but in general they will behave
as they would in any other environment.

4.6.2 Semaphore usage calls

int sem_getvalue(sem_t *sem, int *sval);

This function will store the current value of the semaphore in the sval
parameter.
int sem_post(sem_t *sem);

sem_post() increases the count of the semaphore, and never blocks, al-
though it may induce an immediate switch if posting to a semaphore that a
higher priority thread is waiting for.

int sem_wait(sem_t *sem);

int sem_trywait(sem_t *sem);
int sem_timedwait(sem_t *sem,
const struct timespec *abs_timeout);

These are the calls used to force a wait until the semaphore reaches a
non-zero count, and operate in the same way the mutex wait calls do. The
sem_wait() call blocks the caller until a non-zero count is reached, and
sem_trywait() does the same without blocking, returning EAGAIN if the
count was 0. The sem_timedwait() call blocks up to the amount of time
specified by abs_timeout.

4.6.3 Semaphores and Priority

Semaphores must be handled with care in the context of real-time code. If
you have low priority code that does a sem_post(), you must keep in mind
that if a higher priority thread was waiting on that semaphore, the post will
induce an immediate transfer of control to the higher priority thread.
This comes as a surprise to some users, but you must keep in mind that in
real-time systems, speed is of course the most important factor. If this means
that your real-time thread suspends the moment it does the post, that’s all
right - the alternative is to further block the high priority thread that needs
the semaphore.
Aside from ensuring the best possible performance, semaphores are also
used in this way to simplify driver development. Interrupt handlers can
be kept very simple and succinct, with semaphore posts after the minimal
amount of work is done. This will cause a switch to the handling thread,
which can perform the rest of the work. As threads are more capable than
interrupt handlers (being able to use the FPU, the debugger, etc), the data
can be handled in a simple thread context rather than building complex
interrupt handlers.

4.7 Clock management

RTCore provides standard POSIX mechanisms for managing the clock, thread
sleeps, delays, and similar tasks. Examples include clock_nanosleep(),
clock_gettime(), and so on. For detailed information on these functions,
refer to the Single UNIX Specification provided with RTLinuxPro.
One additional piece of information worth noting here is the addition
of an advance timer to clock nanosleep(). Virtually every system has an
inherent amount of jitter, depending on hardware load. Some applications
require determinism below the threshold of this jitter. For these applications,
RTCore provides the advance timer. Threads generally sleep with:
struct timespec t;

clock_gettime(CLOCK_REALTIME, &t);
timespec_add_ns(&t, 500000);
clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &t, NULL);

This way, the thread will be woken at the absolute time specified in t,
which was the current time plus 500us. If there is inherent hardware jitter,

though, the thread may be delayed by a couple of microseconds. Using the

advance timer will allow the thread to ’preload’ just before the exact time
needed, and release at the exact start of the period. For a system with an
inherent 7us of jitter, the syntax changes to:

struct timespec t, advance;

advance.tv_sec = 0;
advance.tv_nsec = 7000;
clock_gettime(CLOCK_REALTIME, &t);
timespec_add_ns(&t, 500000);
clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &t, &advance);

With this configuration, RTCore will prepare the thread 7us in advance,
and then release it exactly when the required period hits. It will take away
from available CPU throughput as the chip will be spinning in that prepara-
tion window, but it will provide jitterless timing.
Users with very specific timing requirements may want to disable inter-
rupts during this, so that an interrupt that comes in as the thread is being
prepared does not induce even the slightest jitter to the thread.

4.8 Extensions to POSIX (* np())

There are some calls available to developers that are specific to RTCore.
These calls are not part of the POSIX specification, but do fill in some of the
gaps left by it, and may make their way in some form into future revisions
of the standard. We list them here as an option to developers. In order to
properly handle some situations, such as SMP environments, these calls may
make life much easier.

4.8.1 Advance timer

For some applications, the allowable worst case hardware jitter and schedul-
ing deviation may run very close to what RTCore and the underlying hard-
ware is capable of delivering. Suppose your application has a worst case jitter
allowance of 13 microseconds, meaning that if you schedule thread X to run
at a certain time, under a worst case load, execution of thread X cannot

deviate from that time by more than 13 microseconds. If the hardware, un-
der load, deviates by 10 microseconds, and the RTCore scheduling takes 3,
and the context switch time takes 2, you are already outside of the allowable
range.
For some users, the application might not be too cost-sensitive, and it is
just a matter of getting faster hardware. But for low cost systems, or where
there is no faster hardware, RTCore offers the advance timer. This allows
you to compensate for things like scheduling and hardware jitter.
The advance timer works as follows: when you make a call to clock_nanosleep()
to sleep until your next scheduling period, RTCore allows an extra flag and
structure to perform early scheduling. So instead of:

clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, NULL);

the code uses:

clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME, &next, &ts_advance);

The ts_advance structure is a normal struct timespec, with the tv_nsec
field used to indicate how much deviation you would like to account for, in
nanoseconds. So for our example, if we had a worst case latency of 9 mi-
croseconds on our hardware, but the application demanded 4 microseconds,
tv_nsec would be set to something like 13000.
This tells the scheduler to account for 13 microseconds of latency. RTCore
will then switch the thread in early and spin it in a busy wait until the exact
scheduling moment occurs. It then releases the thread to run, and as it
is already prepared, the problem of scheduling deviation is averted. It is
important to ensure that hard interrupts are disabled upon entering this
call. If your thread’s advance point has occurred, and it is in a busy-wait
until the real scheduling point occurs, it is possible that another interrupt
can come in and interrupt the thread’s execution just before it needs to
actually run. So in order to be entirely safe, this call must be surrounded by
rtl_stop_interrupts() and rtl_allow_interrupts() calls.
Using this method, a developer can get much better scheduling resolution
than would normally be possible, given the underlying hardware. This option
is only available in the Professional line of products.

4.8.2 CPU affinity calls

int pthread_attr_setcpu_np(pthread_attr_t *attr,
int cpu);
int pthread_attr_getcpu_np(pthread_attr_t *attr,
int *cpu);

These two functions modify a given thread attribute in order to get a

thread to run on a specific CPU. By default, the thread is run on the same
CPU as it was created on. This is a means of ensuring that work is specifically
distributed throughout the system.
When developing real-time applications, it is generally required that dif-
ferent threads operate in phase with each other, so that thread x is doing
some work so that data will be ready when thread y needs it. On an SMP
system, this may mean that the first thread must be bound to one CPU while
the second is on the second CPU, and they operate in tandem to provide the
highest throughput. Without these calls, both threads may end up on the
same CPU, and the correct phase relationship may not be possible.
Refer to the reserve CPU capabilities (pthread_attr_setreserve_np())
for more advanced calls.

4.8.3 Enabling FPU access

int pthread_setfp_np(pthread_t thread, int flag);

By default, real-time threads do not have access to the CPU’s floating
point unit, since the system’s context switch times are faster if it doesn’t
have to restore floating point registers. This call will enable or disable that
access. For threads running on another CPU, pthread_attr_setfp_np() is
the proper way of enabling FPU support for the thread.

4.8.4 CPU reservation

int pthread_attr_setreserve_np(pthread_attr_t *attr, int reserve);
int pthread_attr_getreserve_np(pthread_attr_t *attr, int *reserve);

In SMP applications, especially very high speed systems, it is benefi-

cial to reserve a CPU for only real-time applications, whether they are
threads or interrupt handlers. By using this thread attribute along with
the pthread_attr_setcpu_np() call, one can spawn a real-time thread on

a CPU such that the GPOS cannot run on that CPU. The benefit is that
the real-time code can then in many cases live entirely in cache, and achieve
more deterministic results at high speeds, as the GPOS cannot run on that
CPU and disturb the cache usage.
Tests on larger scale systems with significant bus traffic indicate that
reserve CPU capabilities can reduce jitter by an order of magnitude.

Affinity on NetBSD
RTCore on NetBSD also allows users to pin userspace processes to a specific
CPU. (Generally, they can be scheduled to run on any CPU.) To pin a
process, user programs must include the rtl_pthread.h header file and call
the CPU reservation function it provides.
A positive number should be used to bind to a specific CPU - if this
succeeds, the call returns 0. To unpin the process, pass -1 to the function
from the reserved process.

4.8.5 Concept of the extensions

The idea behind the extensions is simple - they are there to provide easy
means of handling aspects of real-time programming not covered (or not
covered well) by the POSIX standard. In some situations, they provide
an easy way to get something done that could be done with standard calls,
but would be much more work and would result in convoluted code. In other
cases, the standard doesn’t specify certain aspects of real-time operations in
detail, and the extra calls are there to work around the ambiguities.
Most of these situations relate to how certain operations are carried out
in SMP mode, and handle ambiguities associated with targeting code on
another CPU in real-time. The RTCore extensions take what could be a
non-deterministic situation and remove execution ambiguities.

4.9 ”Pure POSIX” - writing code without the extensions
There are users who don’t want to use the non-POSIX extensions. In these
cases, there is usually some need for all of the code to be POSIX-compliant,
usually based on an internal coding standard. If you are in a similar situation,
it is possible to write code without the extensions, although there may be
performance issues as a result. RTCore does not force you to deviate from
the standard, it simply offers some solutions to improve performance.

4.10 The RTCore API and communication

We’ve focused so far on demonstrating the API used in RTCore to commu-
nicate between threads and other code living in the real-time kernel. A later
chapter will focus more on the auxiliary communication models, such as FI-
FOs and shared memory, when you need to communicate with the GPOS.
Surprises are few and far between, though, as you’ll see just as many POSIX
examples there as you do here.
Chapter 5

More concepts

So far, we’ve looked at some basic real-time concepts, introduced some ex-
amples, and walked through the basics of the API. Given that the API is
POSIX, much of the learning curve is gone, and we can now hop back into
some general programming practices and concepts, and how they work in
RTCore. Let’s start off with some basic practices:

5.1 Copying synchronization objects

Do not copy any objects of type mutex, conditional variable, or semaphore,
as operations on a copied synchronization object can result in unpredictable
behavior. All of the synchronisation objects should be initialized and de-
stroyed with the appropriate function for the data type.
The same holds true for attributes associated with synchronization ob-
jects. They should never be copied, instead initialize them with the appro-
priate calls. The following is wrong:

pthread_mutex_t mutex1, mutex2;

memcpy(&mutex2, &mutex1, sizeof(pthread_mutex_t));

Instead, this should be used:

pthread_mutex_t mutex1, mutex2;

pthread_mutex_init(&mutex1, NULL);
pthread_mutex_init(&mutex2, NULL);

5.2 API Namespace

The RTCore API is POSIX, which we have reiterated enough times by now.
However, the RTCore API is also available using an rtl_ prefix. This means
that pthread_create() can also be referenced as rtl_pthread_create().
This is an added feature so that users can explicitly reference RTCore
functions when needed, if there is any ambiguity. In PSDD, as we will
see later, real-time applications exist inside of normal GPOS applications.
In these situations, an ambiguity exists - pthread_create() will by de-
fault refer to the normal userspace GPOS function, rather than the RTCore
pthread_create(). In situations such as these, the rtl_ prefix is needed.

5.3 Resource cleanup

RTCore will clean up some unfreed resources for you if your application
doesn’t explicitly catch everything on cleanup. As we saw in the first exam-
ples, devices and open file descriptors are cleaned up automatically. If you
exit your program and forget to call close(), RTCore will detect this and
make the call for you. This will allow file usage counts to remain in proper
order. This also holds true for POSIX I/O-based devices that your code may
have registered - it will do the proper deregistration.
However, some resources are not handled at this time. Threads are not
cleaned up automatically, so it is up to the programmer to make sure that
each thread belonging to an application is cancelled and joined properly. The
same goes for memory allocated through rtl_gpos_malloc() - the caller must
free these areas with rtl_gpos_free() to prevent memory leaks.

5.4 Deadlocks
When using synchronization primitives, it is the programmer’s responsibility
to ensure either that all shared resources are correctly freed if asynchronous
signals are enabled, or that those signals are blocked. Make sure to use
thread cleanup handlers to safely free resources if the thread is cancelled
while holding a lock.
5.5 Synchronization-induced priority inversion

If a high priority thread blocks on a mutex (or any other synchronisation
object) that was previously locked by a low priority task, this will lead to
priority inversion: The lower priority thread must gain a higher priority in
order to guarantee execution time. A high priority thread may come along
and block execution of the low priority task from running, preventing mutex
release and stalling both the low and high priority threads. The high priority
thread is waiting for the low priority thread to release, and the low priority
thread is waiting for execution time. The mutex will never be unlocked.
Any scenario that allows a lower priority task to block a higher priority
task is an implicit priority inversion. Theories abound on what the correct
mechanism is to handle this problem, and FSMLabs has found that analysis
of code is the best means of avoiding the problem. Based on internal and
external experience, it follows that if you don’t know what resources your
code might or might not hold at a given point, the chances of there being
dangerous potential situations are very high.
Protocols such as priority inheritance exist to solve this problem, but in
turn induce potentially unbounded priority inversion. Inheritance involves
lower priority threads being promoted to higher priority levels when
a higher priority task is waiting on the lower. This approach can lead to un-
bounded suspension, though - consider a high priority thread that is waiting
on a lower priority thread that holds a lock. The lower priority thread is
promoted so it can execute and release. However, this thread now needs a
lock held by an even lower priority thread. This third thread is then raised so
that it can execute, and so on. In the meantime, the high priority thread may
not be considered ’real-time’ anymore, as it can easily lose its deterministic
behavior.
RTCore provides optional support for the ’priority ceiling protocol’, in
which resources are given a ceiling priority they cannot exceed. This still
requires analysis and is not perfect, but does provide a middle ground for
many applications.

5.6 Memory management

As we have mentioned, general purpose memory management is not available
to real-time threads. If your application does have a need for a memory pool,

it is best to allocate it during initialization and then allocate pieces from that
pool by hand during execution.
The reason for this approach is simple - bounded time allocation in a
general purpose memory allocator is difficult to prove. For non-real-time
applications, a generic allocator is fine - the user calls malloc(), and the call
may return immediately if a chunk is available, or it may block indefinitely.
On an active system, it is entirely possible that memory may be extremely
fragmented, and the allocator might have to do a lot of work in order to
defragment existing pieces enough that the user request can be handled. In
a real-time system, this may mean that your thread is indefinitely blocked.
Users can allocate a large chunk during initialization (in main()), where
the code does not have real-time demands. This will ensure that a pool is
around for real-time use. If the usage pattern of the pool is known, a simple
allocator/deallocator could be implemented on top of this pool that would
allow for memory management calls in bounded time.
Bounded-time allocators will return with an answer in a specific time
frame, but may not use memory as efficiently as it could, while generic allo-
cators will optionally take extra time to use every last bit of memory. Some
bounded-time allocation mechanisms and algorithms do exist, and FSMLabs
is evaluating and testing some of these options. Future releases of RTCore
will likely include a bounded-time allocator of some type as a convenience to
users.

5.7 Synchronization
This is possibly the most important concept in real-time systems engineering.
While synchronization is important as a protection mechanism in normal,
non-real-time threaded applications, it can make or break a real-time system.
In a normal application, a waiting thread will do just that - wait. In a hard
real-time system, a waiting thread might mean that a fuel pump isn’t being
properly regulated, as it is waiting on a mutex that another thread has held
too long.

5.7.1 Methods and safety

Safe synchronization relies on several things: judicious use of it, code analysis,
and above all, understanding of the code at hand. No amount of software

protection will save the system from a programmer who doesn’t understand
or care about what locks are held, and when, in a real-time system. In fact, the presence
of such protection may encourage carelessness on the part of the developer.
RTCore offers the standard POSIX synchronization methods, such as
semaphores, mutexes, and spinlocks, but also focuses on other, higher per-
formance synchronization methods. In fact, much of RTCore is designed
in such a way that synchronization is not necessary, or is very lightweight.
Heavy synchronization methods such as spinlocks can disable interrupts and
interfere with other activity in the system. Lighter mechanisms such as
atomic operations create very little bus traffic and have a minimal impact.
Of course, an entirely lock-free mechanism is even better, if possible.
An example of this is the RTCore POSIX I/O subsystem. The original
Free versions were very fast but had no locking mechanisms whatsoever.
While the performance was good, it didn’t hold up to industrial use. It
needed proper locking in order to traverse the layers properly. The layer also
needed to stay as fast as it was before - users want it to be fast and safe. A
simple and effective method would be to put mutexes around each contended
piece, locking and unlocking as needed.
While simple, this would severely slow the system down, as mutexes in-
volve waiting on queues, switching threads while others complete, and so on.
Instead, FSMLabs added a light algorithm based on atomic operations
(these are covered in more detail in section 5.7.3). As requests come in to
add a device name to the pool, atomic operations such as rtl xchg are
used to grab pieces of the pool. This prevents interrupt disabling, and allows
other threads to use other pieces of the pool at the same time.
Some other restructuring was also used, resulting in a more flexible and
safe architecture that was just as fast as it had always been. Other systems
require different approaches, from heavier synchronization to none at all, but
it is very important that the correct method is chosen, not just one that
happens to work.
Now that we’ve briefly covered the topic (synchronization is a very broad
topic in real-time systems), let’s look at a specific example of a light syn-
chronization method in RTCore.

5.7.2 One-way queues

As we have said, POSIX provides several synchronization methods, but other
approaches are sometimes called for - usually, when a very light and quick

method is needed. The one-way queues provided by RTCore handle many of

these situations.

The Basic Idea

Many usage patterns require that one thread sends messages to another, in
real-time. In one form or another, this results in a queue. As queues are
simple, many users write their own, and rather than leaving it open, protect
queue operations with a lock. While the lock will rarely be contended, the act
of grabbing and releasing the lock may interfere with other system activity.
In light of this, RTCore provides a ’one-way queue’ implementation. This
allows a user to declare specific message queues, shared between a single
consumer and a single producer thread. Each queue declaration implicitly
defines a set of functions to operate specifically on that distinct queue, so
that all code using the queue interacts with it using a specific function name.
These queues are lock-free, meaning that there is no locking on sends and
receives. The API can handle concurrent enqueue and dequeue operations,
but the user must wrap the calls with a lock if there are multiple consumers
or multiple producers. The result is a very fast mechanism for exchanging
data that needs very little, if any, management overhead. Let’s look at an example:

#include <rtl.h>
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

pthread_t thread1, thread2;

DEFINE_OWQTYPE(our_queue,32,int,0,-1);
DEFINE_OWQFUNC(our_queue,32,int,0,-1);

our_queue Q;

void *queue_thread(void *arg)
{
	int count = 1;
	struct timespec next;

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 1000000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
		                &next, NULL);
		if (our_queue_enq(&Q,count)) {
			printf("warning: queue full\n");
		}
		count++;
	}
	return 0;
}

void *dequeue_thread(void *arg)
{
	int read_count;
	struct timespec next;

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 500000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
		                &next, NULL);
		read_count = our_queue_deq(&Q);
		if (read_count) {
			printf("dequeued %d\n",
			       read_count);
		} else {
			printf("queue empty\n");
		}
	}
	return 0;
}

int main(int argc, char **argv) {
	our_queue_init(&Q);

	pthread_create (&thread1, NULL,
	                queue_thread, 0);
	pthread_create (&thread2, NULL,
	                dequeue_thread, 0);

	rtl_main_wait();

	pthread_cancel (thread1);
	pthread_join (thread1, NULL);
	pthread_cancel (thread2);
	pthread_join (thread2, NULL);
	return 0;
}

This requires some explanation, as the syntax hides much of the work.
There are two threads, spawned as normal, where one enqueues data and the
other dequeues. Both are periodic, and as a quick method of preventing the
queue from overflowing, the dequeueing thread defines a period half as long
as the enqueueing thread. Half of the dequeue calls result in an empty queue
being found, but this is acceptable for our purposes.
Now let’s break down the interesting part into discrete steps, starting
with the initial declarations.

We need to define a queue for data to flow between the threads. The syntax
involves two steps. Let’s look at just step 1 first:

DEFINE_OWQTYPE(our_queue,32,int,0,-1);

This first step creates a datatype for our queue. Think of this as the
backing for the queue operations - it defines the queue, its properties, and
structure. Parameter 1 is the name that will be provided so that we can
instantiate the queue itself, and parameter 2 defines the length of the queue.
Parameter 3 defines the type of unit the queue is made of - here we use an
integer as the base element, but we could have used pointers or anything
else. As the queue operations copy data into the queue, light units such as
pointers are favored over large structures. Parameters 4 and 5 are not used
at the moment.
We now have a queue structure named our queue containing 32 elements,
each the size of an int. If you were passing characters or structures through
the queue, you would use char or struct x as parameter 3.

Now let’s look at step 2:

DEFINE_OWQFUNC(our_queue,32,int,0,-1);

This defines functions to be used explicitly on the queue type defined in

step 1. The parameters work in a similar fashion: Parameter 1 defines both
the prepending name of the new queue operations and the type of queue
structure that the functions will work on. Parameter 2 again defines the size
of each unit, and 3 determines the type.
The last two parameters are used in this case: One defines the return
value for a dequeue call on an empty queue, and the other is the return value
for an enqueue call on a full queue. Values such as 0 and -1 are generally
safe, but are configurable in light of situations where 0 is a valid value to be
pushing through the queue. If you enqueue 0, and the other end dequeues it,
there must be some means of determining that the value of 0 was intended,
and is not a result of a call on an empty queue. Select a value that is known
to be unique from your valid queue values.
Lastly, we define an instance of our queue structure to be used in our
threads with the line:

our_queue Q;

Looking on to the thread code, you can see that the actual usage of the queue
is simple: One thread calls our queue enq(&Q,count), which is the enqueue
function created in step 2 above, using our defined structure Q, and pushing
a value of count into it. The other end does a our queue deq(&Q) which
returns a correct type off of the queue for usage in the other thread. Note that
step 2 also defines a few other simple calls for the queue: our queue full()
to see if the queue is full, our queue init() to initialize a queue structure,
and also a our queue top(), which will return the current head of the queue
without removing it. (This also serves as an isempty() function.)
Queue interaction, as you can see, is very easy. It is also extremely fast,
and doesn’t require locking for most cases. The code is safe for multiple
threads that are doing enqueue and dequeue operations at the same time,
which is the common case. The user needs to add an external lock when two
or more threads are enqueueing data at the same time or a set are dequeueing
data at the same time. Otherwise, no additional locking is needed.

This is only one example of a light synchronization method. RTCore provides
this for the user’s convenience, and the user is encouraged to closely
analyze their synchronization needs to ensure that the right approach is chosen.

5.7.3 Atomic operations

We’ve mentioned atomic operations a few times now, and it’s high time we
look at them in some detail. In general, atomic operations include any type
of operation that cannot be further subdivided, and can be viewed as a single
distinct operation. (There are other definitions too.) For our purposes, we
will be looking at atomic bit operations - work that is done on a specific
memory location in a single step.
Depending on the system at hand, simple steps like setting a variable to
a specific value may appear to be atomic but can be very far from it. Code
that writes to a variable may end up with the new value left in cache but not
synchronized with main memory, which on an SMP system can wreak havoc.
Consider writing to a simple integer that another thread on a different CPU
is waiting for - the write may only make it to the cache but not to main
memory, allowing the first thread to continue on with other work, while the
second is working from flawed data. Even worse, both threads could update
the value at the same time.
Atomic operations allow you to say ’There is a value at this address.
Set bit 3 of it atomically.’ The operation will be carried out atomically,
and will report an error if someone else tried to do the same thing at the
same time, signalling that you have to try again or take another route.
RTCore provides some simple API calls to handle these problems. They
are custom, as POSIX does not define functions related to this problem,
but they are meant to be easy to understand. As each architecture handles
atomic operations differently, these functions were designed to do the right
thing depending on the architecture at hand. Let’s take a look:

rtl_a_set(int bit, volatile void *word);

rtl_a_clear(int bit, volatile void *word);

These two atomically set or clear a bit within a word, respectively.
The first parameter specifies which bit should be toggled, and the second
specifies the address base to be used for the operation.

rtl_a_test_and_set(int bit, volatile void *word);

rtl_a_test_and_clear(int bit, volatile void *word);

Implementations of the standard test and set/clear operations are pro-

vided here. The call will atomically set a specific bit within a word and
return the previous value.

rtl_a_incr(unsigned long *w);

rtl_a_decr(unsigned long *w);

Operating on a single long, these two will simply increment or decrement

the current value safely.
These are simple operations with a simple interface, and can be used
to build very elegant and high performance synchronization methods. While
other, more common mechanisms abound, most of them involve locks, queues,
and other structures. With atomic operations, synchronization can be as
simple as a single bit operation on an address.
Chapter 6

Communication between
RTCore and the GPOS

The two components of a complete RTCore system, real-time and the user-
space, generally run in two separate, protected address spaces. The real-time
component lives in the RTCore kernel, while the rest of the code lives as a
normal process within the GPOS. In order to manage each side, there has
to be some kind of communication between the two. RTCore offers several
mechanisms to facilitate this.

6.1 printf()
printf() is probably the simplest means of communicating from a real-time
thread down to non-real-time applications. When an RTCore application
starts up, it creates a ’stdout’ device to communicate to the calling envi-
ronment, usually a terminal device of some kind. Calls to printf() in the
real-time application appear in the calling terminal the same way a printf()
call would in a normal application. This allows you to log real-time output
the following way:

./rtcore_app > log_file

The printf() implementation is fully capable, and can handle any nor-
mal data type and format. It also is a lightly synchronized method compared
to some others we will present here, and very fast as a result, without im-
pacting other core activity.


6.2 rtl printf()

This can be thought of as a simple method of dropping information in the
GPOS’s kernel ring buffer. rtl printf() is a normal printf() call that
exists within the real-time kernel, and works the exact same way as printf()
or printk(), but is safe from a real-time process.
For simplicity and speed in the kernel, this call does not support all format
types that a standard printf() call does. Most notably, it does not handle
formatting of floating point types.
While the overhead of rtl printf() is minimal, it is important to note
that there are implications. In order to safely synchronize with the GPOS,
interrupts must be briefly disabled. This means that you should avoid heavy
use of it, especially in a tight loop. Any operation that affects timing must
be carefully considered with respect to real-time goals, so make sure that
your debug output isn’t causing more problems than it is helping you solve.
This call is a very useful method of logging via the kernel buffer, but most
users will probably find the normal printf() call to be more convenient and
capable.
6.3 Real-time FIFOs

Generally, there is a need for bidirectional communication between the real-
time module and the user-space code. The most straightforward mechanism
for this is the real-time FIFO. Applications can instruct RTCore to create
FIFO devices at runtime via POSIX calls, as we will see. The real-time
module reads or writes data to this device in a non-blocking manner, and on
the Linux side, a process can open it and make read()/write() calls on it
to exchange data with the real-time kernel (non-real-time applications can
be blocking or non-blocking).

6.3.1 Using FIFOs from within RTCore

For every FIFO that is used, initialization code must do:

mkfifo("/mydevice", 0777);
fd = open("/mydevice", O_NONBLOCK);
ftruncate(fd, 8192);

An important factor to remember is that the FIFO creation calls involve

memory management in the ftruncate() operation, which is not available
from within real-time threads. As such, these calls must be made from within
the main() context in order to be safe. This is unlikely to be a problem, as
in nearly all cases, you need to set up your FIFOs before starting real-time
operations. This is only for calls performing initialization, though - real-time
threads that call open("/mydevice", O_NONBLOCK) do not invoke memory
management, but instead just attach to the previously created device, and
are safe from within real-time threads.
In the example above, 0777 was used in the mkfifo() call. This indicates
to RTCore that the device should also be present in the GPOS filesystem.
In the process of the call, a device of that name and permissions (masked
with the caller’s umask) will be created. To create FIFOs that are to be used
strictly between real-time threads, specify 0 for the mask. This will register
the device so that real-time threads can use it, but it will not be visible to
the GPOS. More documentation on this can be found in the Arbitrary FIFO
device article provided in PDF form with RTCore.

6.3.2 Using FIFOs from the GPOS

On the user-space side, the FIFO appears to be a normal file. As such, any
normal file operation is usable on the FIFO. For example, the user-space
code could be a perl script, or maybe just a logging utility comprised of:

cat /mydevice > logfile

6.3.3 A simple example

FIFOs are extraordinarily simple to work with, but it might be helpful to see
some of the calls described here in a single application:

#include <rtl.h>
#include <time.h>
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <rtl_fifo.h>

pthread_t thread;

int fd0, fd1, fd2;

void *start_routine(void *arg)
{
	int ret, status = 1;
	int read_int;

	while (status) {
		ret = read(fd1,&read_int,sizeof(int));
		if (ret) {
			printf("/mydev1: %d (%d)\n",
			       read_int,ret);
			write(fd2,&read_int,ret);
		}
		ret = read(fd0,&read_int,sizeof(int));
		if (ret) status = 0;
		usleep(100000); /* poll - don't busy-wait on the FIFOs */
	}
	return 0;
}

int main(int argc, char **argv) {

	mkfifo("/mydev0", 0777);
	fd0 = open("/mydev0",O_NONBLOCK);
	ftruncate(fd0, 4096);

	mkfifo("/mydev1", 0777);
	fd1 = open("/mydev1",O_NONBLOCK);
	ftruncate(fd1, 4096);

	mkfifo("/mydev2", 0777);
	fd2 = open("/mydev2",O_NONBLOCK);
	ftruncate(fd2, 4096);

	pthread_create (&thread, NULL, start_routine, 0);

	rtl_main_wait();

	pthread_cancel (thread);
	pthread_join (thread, NULL);
	close(fd0);
	close(fd1);
	close(fd2);
	return 0;
}

We’ve already seen most of this code in other examples, but this succinctly
shows you how to use POSIX I/O from within RTCore. As usual, we spawn
a real-time thread from within main(), but we first have to explicitly create,
open, and size our FIFOs with the proper amount of preallocated space (refer
to the next section for details on determining the correct amount to preallocate).
In the thread, we read() from fd0 to see if it’s time to shut down, and
otherwise read() from fd1 and write received data to fd2. These calls are
non-blocking for a reason - if the real-time thread ended up waiting for a
GPOS application that rarely got scheduling time, it would not be determin-
istic. So in this case, we just sleep and attempt to read from the devices.
There isn’t much to look at on the user-space side, but for the sake of
completeness, here it is:

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main (int argc, char **argv) {

	int fd0, fd1, fd2, i, read_int;

	fd0 = open("/mydev0",O_WRONLY);
	fd1 = open("/mydev1",O_RDWR);
	fd2 = open("/mydev2",O_RDWR);

	for (i = 0; i < 10; i++) {
		write(fd1,&i,sizeof(int));
		read(fd2,&read_int,sizeof(int));
		printf("Received %d from RTCore\n",
		       read_int);
	}

	write(fd0,&i,sizeof(int));
	close(fd0);
	close(fd1);
	close(fd2);
	return 0;
}

Looks pretty much like any other userspace application, doesn’t it? That’s
because it is. All we do is open the FIFOs, dump data over them, and read
it back. After we’re done, we write to a third FIFO to signal the real-time
thread that it’s time to shut down, and then we close the files. One minor
difference is that on this end, we didn’t open the devices as non-blocking,
although it can easily be done that way.

6.3.4 FIFO allocation

There are some rules as to how to handle FIFO allocation. When using the
POSIX interface, it is possible to do a normal creation-style call (the
mkfifo()/open()/ftruncate() sequence shown earlier), but only in the
following situations:

1. You are running in the main() context, and not in a thread. This way,
if the call determines that there is no preallocated space to use for the
device, it is safe to block while the memory allocation work is handled.

2. You are in the real-time context, and RTCore is running with preallo-
cated buffers for your data. In this case, even if you never performed an
explicit O_CREAT, the open is safe, because RTCore has space set aside
for use by the FIFO. In this case, you will be forced to use the default
FIFO size that RTCore was built to use. This is a legacy option for the
/dev/rtf* devices and does not apply to arbitrarily named devices. It
also depends on specific compilation settings in RTCore, and as such,
using arbitrarily-named devices with proper sizing during initialization
is recommended.

6.3.5 Limitations
The real-time kernel is not bound to operate synchronously with the normal
operating system thread. If the real-time kernel is under heavy load, it may
not be able to schedule time for the GPOS to pull the data from the FIFO.
Since the FIFO is implemented as a buffer, it is feasible that the buffer
might fill from the real-time side before the user-space thread gets a chance
to catch up. In this case, it is advisable to increase the size of the buffer
(with ftruncate()) or to flush the buffer from the real-time code to prevent
the user-space application from receiving invalid data.
The inverse of this problem is that the FIFO cannot be a deterministic
means of getting command data to the real-time module. The real-time
kernel is not forced to run the GPOS thread with any regularity, as it may
have more important things to do. A command input from a graphical
interface on the OS side through the FIFO may not get across immediately,
and determinism should never be assumed.
A subtler problem that must be overcome by the programmer is that
the data passed through the FIFO is completely unstructured. This means
that if the real-time code pushes a structure into the FIFO with something
like write(fd,&x,sizeof(struct x));, the user-space code should pull it
out on the other side by reading the same amount of data into an identical
structure. There has to be some kind of coordination between the two in
order to determine a protocol for the data, as otherwise it will appear to
be a random stream of bits. For many applications, a simple structure will
suffice, possibly with a timestamp in order to determine when the data was
sampled and placed in the FIFO.

6.4 Shared memory

FIFOs provide serialized access to data, which is appropriate for applications
that operate with data in a queued manner. However, many applications
require both userspace and real-time code to work with large chunks of data,
and this is not always convenient to stream in and out of a FIFO. RTCore
provides an option for these workloads: shared memory with mmap().

6.4.1 mmap()
If you are not familiar with mmap(), please refer to the RTCore or standard
man page for full details. The basic idea is that you open a file descriptor,
call mmap() on it with a given range, and it returns a pointer to an area in
this file or device. Under RTCore, this is used with a device. As we shall see,
both the real-time module and the user-space application open the same
device, call mmap(), and can subsequently access the same area of memory.
The shared memory devices themselves are created with the POSIX
shm open(), destroyed with shm unlink(), and sized with ftruncate().
Please refer to the man pages for specific details - only an overview will be
given here.
First, the device must be created. This is done with shm open(), which
takes the name of the device, open flags, and optionally a set of permission
bits. If you are the first user and are creating the device, use RTL O CREAT.
Furthermore, if you want this device to be automatically visible in the GPOS
filesystem, specify a non-zero value for the permission bits. For example, the
following call creates a node named /dev/rtl shm region that is visible to
the GPOS with permission 0600, and returns a usable file descriptor attached
to the device:

int shm_fd = shm_open("/dev/rtl_shm_region",

RTL_O_CREAT, 0600);

Now you have a handle to a shared region - however, it doesn’t have a

default size. This must be set via a call to ftruncate(), as in:

ftruncate(shm_fd, MMAP_SIZE);

Note that this will round up the size of the shared region in order to align
it on a page boundary (page size is dependent on architecture but generally
4096 bytes). Also, as it does perform memory allocation, it must occur in
the initialization segment. Now you can use mmap() from either real-time
code or user-space code, as in:

addr = (char*)mmap(0,MMAP_SIZE,PROT_READ|PROT_WRITE,
                   MAP_SHARED,shm_fd,0);
The resulting addr can be used to address anything in that region up to

the size specified by the value passed to ftruncate().
Once the code is done with the area, it can call close() on the file
descriptor. The last user calls shm unlink() on the name of the device to
destroy the area and unlink it from the GPOS filesystem:

shm_unlink("/dev/rtl_shm_region");

It is worth noting that these need not be in order: if a thread is still using
the area and another calls shm unlink(), the region will remain valid until
the last user calls close() on the file descriptor. RTCore does reference
counting on devices like shared memory and FIFOs in order to allow this
flexibility.
6.4.2 An Example
The theory and practice are very simple, so without further discussion, let’s
look at an example. First, the real-time application:
#include <rtl.h>
#include <time.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>

#define MMAP_SIZE 5003

pthread_t rthread, wthread;
int rfd, wfd;
unsigned char *raddr, *waddr;

void *writer(void *arg)
{
	struct timespec next;
	struct sched_param p;

	p.sched_priority = 1;
	pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

	waddr = (unsigned char *)mmap(0,MMAP_SIZE,PROT_READ|PROT_WRITE,
	                              MAP_SHARED,wfd,0);
	if (waddr == MAP_FAILED) {
		printf("mmap failed for writer\n");
		return (void *)-1;
	}

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 1000000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
		                &next, NULL);
		/* update the first few bytes of the shared area */
		waddr[0]++; waddr[1]++; waddr[2]++; waddr[3]++;
	}
}

void *reader(void *arg)
{
	struct timespec next;
	struct sched_param p;

	p.sched_priority = 1;
	pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

	raddr = (unsigned char *)mmap(0,MMAP_SIZE,PROT_READ|PROT_WRITE,
	                              MAP_SHARED,rfd,0);
	if (raddr == MAP_FAILED) {
		printf("failed mmap for reader\n");
		return (void *)-1;
	}

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 1000000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
		                &next, NULL);
		printf("rtl_reader thread sees "
		       "0x%x, 0x%x, 0x%x, 0x%x\n",
		       raddr[0], raddr[1], raddr[2], raddr[3]);
	}
}

int main(int argc, char **argv) {

	wfd = shm_open("/dev/rtl_mmap_test", RTL_O_CREAT, 0600);
	if (wfd == -1) {
		printf("open failed for write on "
		       "/dev/rtl_mmap_test (%d)\n",errno);
		return -1;
	}

	rfd = shm_open("/dev/rtl_mmap_test", 0, 0);
	if (rfd == -1) {
		printf("open failed for read on "
		       "/dev/rtl_mmap_test (%d)\n",errno);
		return -1;
	}

	ftruncate(wfd,MMAP_SIZE);

	pthread_create(&wthread, NULL, writer, 0);
	pthread_create(&rthread, NULL, reader, 0);

	rtl_main_wait();

	pthread_cancel(wthread);
	pthread_join(wthread, NULL);
	pthread_cancel(rthread);
	pthread_join(rthread, NULL);
	munmap(waddr, MMAP_SIZE);
	munmap(raddr, MMAP_SIZE);
	shm_unlink("/dev/rtl_mmap_test");
	return 0;
}

First, we create and open a device twice, once for a reader thread and
once for a writer. A thread is spawned for each task, which actually performs
the mmap(). Note that the ftruncate() call is in the main() context, as it
needs to perform memory allocation to back the shared area. Further calls
such as mmap() that don’t cause allocations can happen anywhere.
The result of the mmap() call is a reference to the shared area, so once we
have the handles needed, we can reference the area freely. One thread updates
the area every second, and the other reads it. Now we have an area that
is shared between real-time threads, but what about userspace? The same
mechanism applies, as you can see here:
#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>

#define MMAP_SIZE 5003

int main(void)
{
	int fd;
	unsigned char *addr;

	if ((fd=open("/dev/rtl_mmap_test", O_RDWR))<0) {
		printf("open failed (%d)\n",errno);
		exit(-1);
	}

	addr = mmap(0, MMAP_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		printf("return was %d\n",errno);
		exit(-1);
	}

	while (1) {
		printf("userspace: the rtl shared area contains"
		       " : 0x%x, 0x%x, 0x%x, 0x%x\n",
		       addr[0], addr[1], addr[2], addr[3]);
		sleep(1);	/* poll roughly once a second */
	}

	munmap(addr, MMAP_SIZE);

	return 0;
}

There isn’t much work involved here. The code opens the device as a
normal file and calls mmap() on it just as before. This piece of code performs
the same action as the reader in the real-time space, dumping the values of
the first few bytes of data every second or so. As the writer updates the area,
both the real-time reader and the user-space program see the same changes.
As with other RTCore mechanisms, it is assumed that the real-time side
does the initial work of creating the shared area. This ensures that the real-
time code has a handle on what exists, and doesn’t have to optionally wait
for some user-space application to get around to doing the work first. If you
attempt to start the user-space code first, it will fail for multiple reasons:
First, the device isn’t there to be opened until shm_open() is called from
real-time code, and even if it is there, the mapping cannot be serviced until
the real-time side has registered hooks for the device.
6.4.3 Limitations
With shared memory, there is no inherent coordination between userspace
and real-time, as you can see in the example. Any rules governing usage of
the area must be added by your code. At any point, user code can overwrite
one area that a real-time thread needed to retain data in. In addition, one
can’t write to the area from real-time and then wait for it to be read and
cleared when Linux gets time to schedule your user-space process. This would
delay your real-time code indefinitely.
A little bit of synchronization can solve this type of problem. For example,
if you are using the area to get frames of data over to user-space, the real-time
thread could write the blocks at a given interval across the shared space, and
prepend each segment with a status byte indicating the state of the data.
The user-space program, when it is done reading or analyzing each segment,
can update that status byte to show that it is in use. This way the real-time
side can easily tell what areas are safe to overwrite.
This by-hand coordination can also easily allow you to direct real-time
code from user-space. One simple use is to allow control of real-time threads.
If both ends know that a certain area is meant to direct the actions of a real-
time thread, userspace code can easily flip a bit and indicate that a certain
thread should be suspended, resumed, or even spawned. This can be used
to (non-deterministically) direct nearly anything that the real-time code is
doing, or vice-versa.

6.5 Soft interrupts

On x86 platforms running Linux, you will normally only find interrupts
numbered from 0 to 15, plus the NMI, as in the following:

0: 75636868 XT-PIC timer
1: 6 XT-PIC keyboard
2: 0 XT-PIC cascade
4: 106 XT-PIC serial
5: 157842206 XT-PIC eth0
8: 1 XT-PIC rtc
13: 1 XT-PIC fpu
14: 13637083 XT-PIC ide0

15: 12966 XT-PIC ide1

NMI: 0

On systems running RTCore, additional interrupt numbers, ranging from 16
to 223, also show up in /proc/interrupts (on RTCoreBSD systems the limit
is 32 soft IRQs):

0: 1398262 RTLinux virtual irq timer
1: 4 RTLinux virtual irq keyboard
2: 0 RTLinux virtual irq cascade
11: 4902708 RTLinux virtual irq usb-uhci, eth0
12: 0 RTLinux virtual irq PS/2 Mouse
14: 29546 RTLinux virtual irq ide0
15: 5 RTLinux virtual irq ide1
219: 12178 RTLinux virtual irq softirq jitter test
220: 0 RTLinux virtual irq RTLinux Scheduler
221: 26 RTLinux virtual irq RTLinux FIFO
222: 1293626 RTLinux virtual irq RTLinux CLOCK_GPOS
223: 5124 RTLinux virtual irq RTLinux printf
NMI: 0
ERR: 0

The interrupts above IRQ 15 are the software interrupts provided by
RTCore, although they still appear to be real hardware interrupts as far
as Linux is concerned. The handler for these interrupts executes in the
GPOS's kernel context, permitting a real-time thread to indirectly and
safely call functions within the GPOS kernel.
This demands a little explanation: you cannot safely call GPOS kernel
functions from within the real-time kernel, as many of those calls will block
while the Linux kernel performs various tasks. This generally leads to deadlock,
and has obvious implications for code that is supposed to execute
deterministically. A safe way around this is to register a software interrupt
handler in the Linux kernel that waits for a certain interrupt. When the
real-time code requires a service to be performed asynchronously in the GPOS
space, it pends an interrupt for this handler. The handler does not execute
in real-time, so the real-time code is not blocked in any way, but there is no
guaranteed worst-case delay between pending the soft interrupt and its
actual execution. This is for the same reason as before: the real-time
kernel may prevent the GPOS kernel from running for some time, depending
on the current set of demands. For soft real-time tasks, however, this is
generally a sufficient approach. (Note that RTCoreBSD systems have a limit
of 32 soft IRQs.)
Again, it must be stressed that the GPOS only sees RTCore virtual
IRQs. The handlers the GPOS had installed before RTCore was loaded are
not affected, but are now managed by the interrupt emulation layer, and thus
have become soft interrupts. This process of insertion is handled transparently
to GPOS drivers.
This mechanism can support many kinds of inter-kernel communication.
As previously discussed, rtl_printf() uses it to pass data to
the kernel ring buffer. It could also serve as a way for real-time code to
allocate memory, by signalling a GPOS handler to safely perform the memory
management asynchronously.

6.5.1 The API

#include <rtl.h>

Only a few functions are needed to set up a software interrupt. Interrupts
are registered and deregistered with rtl_get_soft_irq() and
rtl_free_soft_irq():

int rtl_get_soft_irq(void (*handler)(int, void *, struct rtl_frame *),
                     const char *devname);
void rtl_free_soft_irq(unsigned int irq);

The string passed as the second argument to rtl_get_soft_irq() is the
name that will be associated with the IRQ, which on Linux will be
displayed in /proc/interrupts. It is a good idea to make this something
meaningful, especially if you are making heavy use of soft IRQ handlers.
The interrupt number assigned is the first free interrupt number from
the top down. As such, there is little risk that it will ever collide with a real
hardware interrupt. rtl_get_soft_irq() returns -1 on failure, and
otherwise returns the number of the IRQ registered.

void rtl_global_pend_irq(int irq);

To actually signal the interrupt to Linux, rtl_global_pend_irq() is given
the soft interrupt number. When the Linux kernel next runs, it will see this
interrupt as pending and execute your Linux handler.
The interrupt handler declaration is just like the one you would use for a
regular Linux interrupt handler:

static void my_handler(int irq, void *ignore,
                       struct rtl_frame *ignore_frame);

The same restrictions that apply to Linux-based hardware interrupt handlers
apply to soft interrupt handlers, with respect to things like synchronization
with Linux kernel resources from within an interrupt handler, and so on.

6.5.2 An Example
This section wouldn’t be complete without a simple example. The soft IRQ
API is fairly small, so let’s look at a piece of code that uses all of the calls:

#include <rtl.h>
#include <time.h>
#include <stdio.h>
#include <pthread.h>

pthread_t thread;
static int our_soft_irq;

void *start_routine(void *arg)
{
    struct sched_param p;
    struct timespec next;

    p.sched_priority = 1;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

    clock_gettime(CLOCK_REALTIME, &next);

    while (1) {
        timespec_add_ns(&next, 500000000);
        clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
                        &next, NULL);
        rtl_global_pend_irq(our_soft_irq);
    }
    return 0;
}

static int soft_irq_count;

void soft_irq_handler(int irq, void *ignore,
                      struct rtl_frame *ignore_frame)
{
    soft_irq_count++;
    printf("Received soft IRQ #%d\n", soft_irq_count);
}

int main(int argc, char **argv)
{
    soft_irq_count = 0;
    our_soft_irq = rtl_get_soft_irq(soft_irq_handler,
                                    "Simple SoftIRQ");
    if (our_soft_irq == -1)
        return -1;
    pthread_create(&thread, NULL, start_routine, 0);

    rtl_main_wait();

    pthread_cancel(thread);
    pthread_join(thread, NULL);
    rtl_free_soft_irq(our_soft_irq);
    return 0;
}

On initialization, we get a soft IRQ, providing the function that should
act as the handler, and a short name. If this call is successful, we spawn a
real-time thread. From this point on, soft_irq_handler() is registered in
the Linux kernel as an interrupt handler, and we have a real-time thread in
an infinite loop. The loop activates on half-second intervals, pending our
soft IRQ each time. These interrupts are caught by Linux, which executes
soft_irq_handler(), which in turn prints the current interrupt count. On
exit, the cleanup code cancels our real-time thread as usual, and then
deregisters the soft IRQ handler.
As you can see, it isn’t very hard to interact with the Linux kernel in this
fashion. By simply pending interrupts, you can trigger your own handlers to
do some dirty work in the GPOS kernel, without sacrificing determinism in
your real-time code.
Chapter 7

Debugging in RTCore

No one likes to admit it, but most developers spend a large chunk of time
debugging code rather than writing it. Bugs in RTCore can be even more
difficult to track down: inserting any debug traces or other mechanisms
changes the system, and all of a sudden the bug won't trigger. (Timing-dependent
bugs are of course possible in other systems, but are more prevalent
in real-time development.)
Additionally, all real-time code running inside the RTCore kernel
has the potential to halt the machine (PSDD threads live in external address
spaces). Debugging userspace applications is simpler, as a failure will simply
result in the death of the process, not the kernel. Tackling the bug
is usually just a matter of cleaning up and trying the program again. These
luxuries are harder to come by in the kernel.
Fortunately, RTCore provides a debugger that can often prevent programming
errors from bringing the system down. Loaded with the rest of
RTCore (it can be disabled through recompilation with source), the RTCore
debugger watches for exceptions of any kind, and stops the thread that caused
the problem before the system goes down.

7.1 Enabling the debugger

The debugger is enabled during configuration of RTCore, under selective
component building options.


7.2 An example
There are some important things to know about the debugger, but before
getting into the details, let's walk through a simple example to show exactly
what we are talking about. As with anything else, the first step is a
hello world application:

#include <rtl_sync.h>
#include <rtl.h>
#include <time.h>
#include <stdio.h>
#include <pthread.h>
#include <rtl_debug.h>

pthread_t thread;
pthread_t thread2;

void *start_routine(void *arg)
{
    int i;
    struct sched_param p;
    struct timespec next;
    volatile pthread_t self;

    self = pthread_self();

    p.sched_priority = 1;
    pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

    if (((long)arg) == 1) {
        /* cause a memory access error */
        *(unsigned long *)0 = 0x9;
    }

    clock_gettime(CLOCK_REALTIME, &next);

    for (i = 0; i < 20; i++) {
        timespec_add_ns(&next, 500000000);
        clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
                        &next, NULL);
        printf("I'm here; my arg is %ld\n", (long)arg);
    }
    return 0;
}

int main(int argc, char **argv)
{
    pthread_create(&thread, NULL, start_routine, (void *)1);
    pthread_create(&thread2, NULL, start_routine, (void *)2);

    rtl_main_wait();

    pthread_cancel(thread);
    pthread_join(thread, NULL);
    pthread_cancel(thread2);
    pthread_join(thread2, NULL);
    return 0;
}

As with our other examples, we have an initialization context and a
cleanup context, with real-time code run in between. In our initialization,
we spawn two real-time threads running the same function, with an
error (an access of illegal memory) that the first thread will hit, as its
argument is 1.
When this application is loaded, the first thread is spawned and causes a
memory access error. The debugger catches this and halts all real-time
threads. This means that the second thread is also halted, so there is no
stream of "I'm here" messages from the second thread, even though it
doesn't have a problem.
The debugger prints a notice of the exception to the console, so that run-
ning ’dmesg’ will produce a line detailing which thread caused the exception,
where it was and how to begin debugging. Now we can start the debugger
and analyze the running code.
RTLinuxPro provides the real-time debugger module, along with GDB for
use from userspace. Other debuggers such as DDD are also usable, but
we will assume GDB for this example. Now that our real-time code
has hit an exception, we can run GDB on the object file that was saved for
us during compilation for debugging:

# gdb hello.o.debug

The next step is to connect GDB to the real-time system. This is accom-
plished using the remote debugging facility of GDB. The real-time system
provides a real-time FIFO for debugger communication:

(gdb) target remote /dev/rtf10

Remote debugging using /dev/rtf10

The RTCore debugger uses three consecutive real-time devices: /dev/rtf10,
/dev/rtf11, and /dev/rtf12. The starting FIFO can be changed with the
source version of the kit. Future versions of RTCore may use the named
FIFO capability of RTCore rather than the older /dev/rtf devices.
Now, in our case, we expect to see a memory access violation. Once the
target remote /dev/rtf10 command is entered, we should see GDB display
the following:

Remote debugging using /dev/rtf10

[New Thread 1123450880]
start_routine (arg=0x1) at test.c:25
25 *(unsigned long *)0 = 0x9;

The above message tells us that we are indeed debugging through /dev/rtf10,
that the thread that faulted has ID 1123450880, and that the fault was in
the function start_routine, which was passed one argument named arg with
value 0x1. This is all contained in source file test.c on source line 25. GDB
also displays the actual source line in question. The error was generated
exactly where we placed it.
Now, we examine the function call history. This may be necessary in
complex applications in order to determine the source of an error. Typing
bt will cause GDB to print the stack backtrace that led to this point.

(gdb) bt
#0 start_routine (arg=0x1) at test.c:25
#1 0xd1153227 in ?? ()

Perhaps it is not clear what types of variables are being operated on. If
you wish to examine the type and value of some variables, use the following:

(gdb) whatis arg
type = void *
(gdb) print arg
$1 = (void *) 0x1

To get a better idea of what other operations are being performed in this
function, one can list source code for any function name or any set of line
numbers with:

(gdb) list start_routine

17     volatile pthread_t self;
18     self = pthread_self();
20     p.sched_priority = 1;
21     pthread_setschedparam (pthread_self(), SCHED_FIFO, &p);
23     if (((long) arg) == 1) {
24         /* cause a memory access error */
25         *(unsigned long *)0 = 0x9;

It is also possible to disassemble the executable code in any region of
memory. For example, to view the start_routine function:

(gdb) disassemble start_routine

Dump of assembler code for function start_routine:
0xd1137060 <start_routine>: push %ebp
0xd1137061 <start_routine+1>: mov %esp,%ebp
0xd1137063 <start_routine+3>: sub $0x10,%esp
0xd1137066 <start_routine+6>: lea 0xfffffff8(%ebp),%eax
0xd1137069 <start_routine+9>: push %edi
0xd113706a <start_routine+10>: push %esi
0xd113706b <start_routine+11>: push %ebx
0xd113706c <start_routine+12>: mov 0xd116f8a0,%edx
0xd1137072 <start_routine+18>: mov %edx,0xfffffffc(%ebp)
0xd1137075 <start_routine+21>: movl $0x1,0xfffffff8(%ebp)
0xd113707c <start_routine+28>: push %eax
0xd113707d <start_routine+29>: push $0x1
0xd113707f <start_routine+31>: push %edx
0xd1137080 <start_routine+32>: call 0xd11540e4
0xd1137085 <start_routine+37>: add $0xc,%esp
0xd1137088 <start_routine+40>: cmpl $0x1,0x8(%ebp)
0xd113708c <start_routine+44>: jne 0xd1137098 <start_routine+56>
0xd113708e <start_routine+46>: movl $0x9,0x0
0xd1137098 <start_routine+56>: lea 0xfffffff0(%ebp),%ebx
0xd113709b <start_routine+59>: push %ebx
0xd113709c <start_routine+60>: mov 0xd1161e8c,%eax
0xd11370a1 <start_routine+65>: push %eax

Once you are done debugging, you may exit the debugger and stop exe-
cution of the process being debugged.

(gdb) quit
The program is running. Exit anyway? (y or n) y

RTCore will resume execution of all threads but will leave the application
that was being debugged stopped. To actually remove the application or
module you must stop it through the means that you are used to - either by
sending it a signal to stop it (perhaps by typing control-c in the window) or
removing the application module.

7.3 Notes
There are a few items to keep in mind when using the RTCore debugger.
Most of these items are short but important, so keep them in mind in order
to make your debugging sessions more effective.

7.3.1 Overhead
The debugger module, when loaded, catches all exceptions raised, regardless
of whether it is related to real-time code, GPOS, or otherwise. This incurs
some overhead: Consider for example the case where a userspace program
causes several page faults as it is working through some data. These page
faults cause the debugger to do at least some minor work to see if the fault
is real-time related. This may lead to a slight degradation of the GPOS
performance, so if the GPOS really needs some extra processing, the debug-
ger module may be removed. In practice, however, the benefits of having
protection against misbehaving RT programs usually outweigh the overhead
incurred by the debugger.
For those who wish to avoid this overhead, the source version of RTCore
allows you to reconfigure the OS without the debugger for production use.

7.3.2 Remote debugging

Sometimes it is helpful to debug code remotely. This usually occurs when
the remote machine is a different architecture, and you don’t want to run
GDB on the target machine itself. (RTLinuxPro provides GDB, but there
may not be enough room on the target device, you may need some additional
tools, etc.) In this case, netcat is the preferred option.
Netcat provides the ability to pipe file data over a given port. In the
context of the RTCore debugger, this means that we can start netcat on the
target such that it essentially exports /dev/rtf10 over the network. Here is
an example of how to start netcat on the target machine:

nc -l -p 5000 >/dev/rtf10 </dev/rtf10 &

This starts netcat on the device, listening on port 5000, feeding data from
the network listener into the FIFO, and also pushing data coming out of the
FIFO out onto that same listener. In GDB running on the development
host, you can connect to the remote real-time system with target remote
targethost:5000, where targethost is the target machine name.

7.3.3 Safely stopping faulted applications

Once you are done analyzing the state of the system, the faulty application
must be stopped. This can be done with the following series of commands:

(gdb) CTRL-Z
[1]+ Stopped gdb
# killall app_name
# kill %1

Make sure not to trigger any GDB commands that would cause the real-time
code to continue, as that would just execute the faulty code again.

7.3.4 GDB notes

GDB has a problem examining data in the .bss section, so any variables
that were not explicitly initialized are not viewable from GDB. This may be
fixed in a later release, but in the meantime it is simplest to explicitly
initialize any variables that will be analyzed with GDB.
The RTCore debugger can be used to debug the user-space (PSDD) RT
threads. Debugging threads running under the userspace frame scheduler is
also supported.
Under BSD, the RTCore symbols must be explicitly loaded. This can be
done with:

gdb hello.o
(gdb) symbol-file /var/run/rtlmod/ksyms
(gdb) target remote /dev/rtf10
Chapter 8

Tracing in RTCore

8.1 Introduction
Real-time programs can be challenging to debug because traditional debugging
techniques such as setting breakpoints and step-by-step execution are
not always appropriate. This is mainly due to two reasons:

• Some errors are in the timing of the system. Stopping the program
changes the timing, so the system cannot be analyzed without modifying
its behavior.

• If the real-time program controls certain hardware, suspending the
program for analysis may cause the hardware to malfunction or even break.

RTCore implements a subset of the POSIX trace facilities. Using them, it
is possible to analyze and evaluate real-time performance while a real-time
program is running. An introduction to POSIX tracing as well as the
API definitions can be found in the Single UNIX Specification.
The tracer aims to follow the POSIX tracing API reasonably closely. One
notable difference is that most functions and constants have an RTL_TRACE_
or rtl_trace_ prefix rather than posix_trace_. The API functions are
declared in the include/rtl_trace.h file. To use the tracer, the
CONFIG_RTL_TRACER option ("RTLinux tracer support") must be enabled
during configuration of the system.
The tracing subsystem is provided in the rtl_tracer.o module.


examples/tracer/rtl_trace_default.o is a module that creates a trace
stream for each CPU and starts the tracing.
To see a quick demonstration, recompile the system with CONFIG_RTL_TRACER
on, load RTCore along with the modules/rtl_tracer.o,
examples/tracer/rtl_trace_default.o, and examples/tracer/testmod.o
modules, and run examples/tracer/tracedump 0, where 0 can be replaced
with the desired CPU number. You should see a dump of the stream of
events on the target CPU.

8.2 Basic Usage of the Tracer

There are two parties involved in tracing: the program being analyzed and
the analyzer process. When the program to be analyzed is instrumented for
tracing, it records information about events encountered during execution.
For each event, information about the current CPU, the current thread id,
a timestamp, and optional user data is recorded into an in-memory buffer.
The RTCore tracer provides built-in trace points for certain system events
such as context switches. The list of currently supported system events is
provided in the next section. In addition, an RTCore program can trace
user-defined events by invoking the rtl_trace_event function with
RTL_TRACE_UNNAMED_USEREVENT as the event id.
Before tracing can be started, a POSIX trace stream must be created.
For an example of creating a trace stream, please see the
examples/tracer/rtl_trace_default.o module.
The analyzer process is a GPOS (userspace) process that reads the event
records made by the trace subsystem. This is done with the functions
rtl_trace_trygetnext_event, rtl_trace_getnext_event, and
rtl_trace_timedgetnext_event. An example of a trace analyzer process
can be found in examples/tracer/tracedump.c.

8.3 POSIX events

For every event, the following members of struct rtl_trace_event_info
are recorded:

• posix_event_id is the event identifier.

• posix_timestamp is a struct timespec representing the time of the event;
the clock used does not necessarily correspond to any of the system clocks.

• posix_thread_id is the thread id of the current thread.

The list of currently supported events includes:

• RTL_TRACE_OVERFLOW – the system detected an overflow, and some events
have been lost. It is necessary to reset any profiling in progress to avoid
getting incorrect results.

• RTL_TRACE_RESUME – the system has recovered from an overflow condition.

• RTL_TRACE_SCHED_CTX_SWITCH – a context switch. The accompanying data
is a void * pointer to the new thread.

• RTL_TRACE_CLOCK_NANOSLEEP – the thread invoked the clock_nanosleep
function.

• RTL_TRACE_BLOCK – the thread voluntarily blocked itself (e.g., as a result
of a clock_nanosleep call).

• RTL_TRACE_UNNAMED_USEREVENT – a user-defined event. The data
can be arbitrary.

Events may be selectively enabled for tracing with the rtl_trace_set_filter
function. For best performance, it is advisable to disable unneeded event
types.
It is also possible to perform function call tracing with the help of the
tracer. To do this, the program to be analyzed must be compiled with the
-finstrument-functions option to gcc. For an example, please see
examples/tracer/testmod.c in the RTCore distribution. For modules
compiled with -finstrument-functions, two special events are generated:

• RTL_TRACE_FUNC_ENTER – function entry. event->posix_prog_address
represents the address in the program from which the function call has
been made. The data that accompanies this event is a void * pointer
to the function that has been called.

• RTL_TRACE_FUNC_EXIT – function exit. event->posix_prog_address
represents the address in the program from which the function call has
been made. The data that accompanies this event is a void * pointer
to the function that has exited.
Chapter 9

IRQ Control

Once RTCore is loaded, the GPOS does not have any direct control over
hardware IRQs; manipulation is handled through RTCore when there are
no real-time demands. However, RTCore applications can manipulate IRQs
for real-time control. We'll now cover the basic usage of the IRQ control
API.
9.1 Interrupt handler control

First, let's look at the calls needed to manage interrupt handlers. Unless
otherwise specified, only the original GPOS interrupt handlers will handle
incoming interrupts, once there are no real-time demands. Here we cover
how to set up your own real-time interrupt handlers.

9.1.1 Requesting an IRQ

An RTCore application can install an IRQ handler with the call
rtl_request_irq(irq_num, irq_handler), where the irq_handler
parameter is a function of the form:

unsigned int handler(unsigned int irq, struct rtl_frame *regs);

This hooks the function passed as the second argument to rtl_request_irq
to be called when IRQ irq_num occurs, much like any other IRQ handler.
When that function is invoked, it runs in interrupt context. This means
that some functions may not be callable from the handler, and all interrupts
will be disabled. The handler is not directly debuggable, but threads are,
so it is safe to post to a semaphore that a thread is waiting on. The thread
will be switched to immediately so that operations can be performed in a
real-time thread context. When that thread performs any operation that
causes a thread switch, control returns to the interrupt handler.

9.1.2 Releasing an IRQ

An IRQ can be released with rtl_free_irq(irq_num). This unhooks the
handler given to rtl_request_irq, and it will not be called again. However,
it is possible that the handler is still executing on the current or
another CPU, so the application programmer should take care to
ensure this is not the case.

9.1.3 Pending an IRQ

Many applications require that a GPOS interrupt handler also receive an
IRQ, in addition to the handler installed by rtl_request_irq, once that
handler is done doing any work. The RTCore application might just be
interested in keeping track of when IRQs are coming in, or some simple
statistic, before allowing the GPOS to proceed and handle the work.
In these cases, the rtl_global_pend_irq(irq_num) function should be
used. This pends the IRQ for the GPOS; once the RTOS is finished,
the GPOS will process it as a pending IRQ.

9.1.4 A basic example

Let's look at a basic example of an application that tracks incoming IRQs for
the GPOS. This grabs an IRQ with rtl_request_irq(), pends it during
operation with rtl_global_pend_irq(), and releases it with rtl_free_irq().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <rtl_sched.h>
#include <rtl_pthread.h>
#include <rtl_unistd.h>
#include <semaphore.h>

pthread_t thread;
sem_t irqsem;
int irq = -1;

void *thread_code(void *t)
{
    static int count = 0;

    while (1) {
        sem_wait(&irqsem);
        count++;
        printf("IRQ %d has occurred %d times\n",
               irq, count);
    }
    return NULL;
}

unsigned int intr_handler(unsigned int irq,
                          struct rtl_frame *regs)
{
    rtl_global_pend_irq(irq);
    sem_post(&irqsem);
    return 0;
}

int main(int argc, char **argv)
{
    int ret;

    if ((argc != 2) || strncmp(argv[1], "irq=", 4)) {
        printf("Usage: %s: irq=#\n", argv[0]);
        return -1;
    }
    irq = rtl_atoi(&argv[1][4]);

    sem_init(&irqsem, 1, 0);

    rtl_pthread_create(&thread, NULL, thread_code, (void *)0);

    if ((ret = rtl_request_irq(irq, intr_handler)) != 0) {
        printf("failed to get irq %d\n", irq);
        ret = -1;
        goto out;
    }

    rtl_main_wait();

    rtl_free_irq(irq);
out:
    rtl_pthread_cancel(thread);
    rtl_pthread_join(thread, NULL);

    return ret;
}

This code initializes and pulls the requested IRQ number for tracking from
the passed-in arguments. It then spawns a thread that waits on a semaphore;
this thread prints the IRQ count as interrupts occur. As mentioned,
the handler is invoked in interrupt context, and as such is fairly limited
in what it can do. So the handler is hooked up but does no real work
except for the rtl_global_pend_irq() for the GPOS and the sem_post() for
the thread.
As with the other examples, this can continue indefinitely. If it is hooked
to the interrupt for a hard drive, it will print a message with a count for
each interrupt triggered by the device. When the application is stopped with
a CTRL-C, it releases the IRQ handler, kills the thread, and unloads as
usual. The GPOS IRQ handler is then again the only handler for the device.

9.1.5 Specifics when on NetBSD

RTCore on BSD UNIX also requires that you call rtl_map_gpos_irq(bsd_irq)
to obtain the IRQ identifier prior to using any RTCore interrupt control
functions. This function transforms a NetBSD IRQ identifier into an RTCore
IRQ identifier.
NetBSD's interrupt scheme changed considerably with the addition of
SMP support. This call maintains compatibility with RTCore interrupt
handling, and must be called before functions like rtl_request_irq().
The IRQ can be an ISA IRQ number (e.g., IRQ7 for LPT), or the return
value from the PCI interrupt lookup function pci_intr_map(). On success,
the function returns a value that can be used with rtl_request_irq() or
other RTCore IRQ functions. Failure to map the NetBSD IRQ identifier
will result in a negative return value.

9.2 IRQ state control

Besides interrupt handlers, applications commonly need to control interrupt
states - specifically, whether interrupts are enabled or disabled. This is a
common means of synchronization for some tasks, although less intrusive
means of mutual exclusion are generally possible. Here we cover how to
enable and disable interrupts, save state, and similar tasks.

9.2.1 Disabling and enabling all interrupts

Generally, interrupts are disabled and enabled with a "hard" disable and
enable. When RTCore is running, any enable and disable calls made by the
GPOS are virtualized so that they do not actually disable real interrupts.
RTCore applications can directly disable hardware interrupts with
rtl_stop_interrupts and enable them again with rtl_allow_interrupts.
These calls enable and disable interrupts unconditionally. Sometimes it is
preferable to save the current state, disable interrupts, perform some
critical work, and then restore the saved state. This can be done with the
sequence below:

#include <rtl_sync.h>

void function(void)
{
    rtl_irqstate_t flags;

    /* save state and disable interrupts */
    rtl_no_interrupts(flags);

    /* perform some critical operation... */

    /* restore the previous interrupt state */
    rtl_restore_interrupts(flags);
}

These calls do disable the real interrupts, so they must be used with
care. Interrupts should never be disabled longer than absolutely necessary,
as events may be missed. The system may also run out of control if the
application never re-enables interrupts. However, some applications cannot
handle any kind of jitter during certain operations, even the minimal
overhead of receiving an Ethernet IRQ, and must disable all interrupts for
short periods.
While this is a simple mechanism for synchronization, it cannot be stressed
enough that lighter mechanisms that do not disable interrupts are almost
always preferable. Even if you think that the code protected with disabled
interrupts is not on an important path, it may be running on the same
hardware as another application that cannot tolerate that kind of behavior.
Please see section 5.7.3 for more details.
Specific IRQs can be enabled or disabled with rtl_hard_enable_irq(irq_num)
and rtl_hard_disable_irq(irq_num) respectively. This allows the user to
target a specific IRQ rather than the entire set.

9.3 Spinlocks
pthread_spin_lock includes an implicit save of the IRQ state and an
interrupt disable, and pthread_spin_unlock includes an implicit restore of
the interrupt state saved at the time of the corresponding pthread_spin_lock
call.
This can be a problem in cases where the locks are released in a different
order than they were taken. For example:

#include <rtl_spinlock.h>

void function(void)
{
    rtl_pthread_spinlock_t lock1, lock2;

    /* initialize the locks */
    rtl_pthread_spin_init(&lock1, 0);
    rtl_pthread_spin_init(&lock2, 0);

    /* ...assume interrupts are enabled here... */

    /* acquire lock 1 */
    rtl_pthread_spin_lock(&lock1);
    /* ...interrupts are now disabled here... */

    /* acquire lock 2 */
    rtl_pthread_spin_lock(&lock2);
    /* the state saved in lock2 is interrupts "disabled" */

    /* release lock 1 */
    rtl_pthread_spin_unlock(&lock1);
    /* interrupts are now enabled, since lock1 saved "enabled" -
       yet lock2 is still held! */

    /* release lock 2 */
    rtl_pthread_spin_unlock(&lock2);
    /* restores an interrupt-disabled state, with no lock held */
}

Note that the state restored when releasing lock1 and lock2 is incorrect,
since the locks were released in a different order than they were acquired.
Chapter 10

Writing Device Drivers

This chapter presents examples of several classes of RTCore drivers and how
they interact with user-level programs and other RTCore applications.
Writing RTCore device drivers is very similar to writing normal RTCore
applications. Since all memory, including device memory, is accessible to
RTCore applications, every RTCore program can potentially function as a
driver. Where drivers and normal RTCore applications differ is in how they
communicate with user-space (GPOS) applications and other RTCore
programs.
10.1 Real-time FIFOs

The simplest way of communicating with a driver is through a real-time FIFO.
This is the simplest type of driver and is best used when one-way
communication with the driver is needed, since FIFOs only perform read() or
write() operations. An example is a motor controller that only receives
commands (such as motor speed), or a simple data acquisition device that
sends information (such as the temperature of a probe).
FIFO operations and how to use them in RTCore applications are well
covered in previous chapters, so they will not be covered here.

10.2 POSIX files

A more advanced, and more full-featured, interface is through POSIX file
operations. Drivers can advertise their services to other RTCore applications,
and only RTCore applications, through files, just as with a standard UNIX
system. These files are managed by RTCore and are not directly accessible
from the GPOS environment. For example, a Linux application that opens
/dev/lpt0 is communicating with the Linux (non-real-time) parallel port
driver and not the RTCore driver. Conversely, an RTCore application that
opens /dev/lpt0 is communicating with the RTCore driver and not with the
Linux driver.
The example driver below provides files that can be used through POSIX
open(), read(), write(), ioctl(), mmap() and close() calls from RTCore
applications. Two files, /dev/lpt0 and /dev/lpt1, are created. When an
RTCore application performs any operation on these files, the driver
prints a message.
#include <rtl.h>
#include <stdio.h>
#include <rtl_posixio.h>

static ssize_t rtl_par_read(struct rtl_file *filp, char *buf,
                            size_t count, off_t *ppos)
{
	printf("read() called on file /dev/lpt%d\n", filp->f_priv);
	return 0;
}

static ssize_t rtl_par_write(struct rtl_file *filp, const char *buf,
                             size_t count, off_t *ppos)
{
	printf("write() called on file /dev/lpt%d\n", filp->f_priv);
	return 0;
}

static int rtl_par_ioctl(struct rtl_file *filp,
                         unsigned int request, unsigned long l)
{
	printf("ioctl() called on file /dev/lpt%d\n", filp->f_priv);
	return 0;
}

static int rtl_par_open(struct rtl_file *filp)
{
	printf("open() called on file /dev/lpt%d\n", filp->f_priv);
	return 0;
}

static int rtl_par_release(struct rtl_file *filp)
{
	printf("close() called on file /dev/lpt%d\n", filp->f_priv);
	return 0;
}

static off_t rtl_par_llseek(struct rtl_file *filp, off_t off, int flag)
{
	printf("lseek() called on file /dev/lpt%d, offset %d and flag %d\n",
	       filp->f_priv, off, flag);
	return 0;
}

int rtl_par_mmap(struct rtl_file *filp, void *a, size_t b,
                 int c, int d, off_t e, caddr_t *f)
{
	return 0;
}

int rtl_par_munmap(struct rtl_file *filp, void *a, size_t length)
{
	return 0;
}

int rtl_par_unlink(const char *filename, unsigned long i)
{
	printf("unlink() called on %s, should be /dev/lpt%d\n",
	       filename, i);
	return 0;
}

int rtl_par_poll_handler(const struct rtl_sigaction *sigact)
{
	printf("sigaction() with SIGPOLL called\n");
	return 0;
}

int rtl_par_ftruncate(struct rtl_file *filp, off_t off)
{
	printf("ftruncate() called on file /dev/lpt%d\n",
	       filp->f_priv);
	return 0;
}

void rtl_par_destroy(int minor)
{
	printf("destroy() called on minor %d, last user done\n", minor);
}

static struct rtl_file_operations rtl_par_fops = {
	open:                 rtl_par_open,
	release:              rtl_par_release,
	read:                 rtl_par_read,
	write:                rtl_par_write,
	ioctl:                rtl_par_ioctl,
	llseek:               rtl_par_llseek,
	munmap:               rtl_par_munmap,
	mmap:                 rtl_par_mmap,
	unlink:               rtl_par_unlink,
	install_poll_handler: rtl_par_poll_handler,
	ftruncate:            rtl_par_ftruncate,
	destroy:              rtl_par_destroy
};

int main(int argc, char **argv)
{
	rtl_register_dev( "/dev/lpt0", &rtl_par_fops, 0 );
	rtl_register_dev( "/dev/lpt1", &rtl_par_fops, 1 );

	rtl_main_wait();

	rtl_unregister_dev( "/dev/lpt0" );
	rtl_unregister_dev( "/dev/lpt1" );

	return 0;
}
10.2.1 Error values

Drivers should report errors to the caller through handler return values for
each operation. For example, a driver that wishes to report a failure during
a write() when there is no space remaining should return -RTL_ENOSPC. The
POSIX file layer of RTCore treats any return value less than 0 as an
error and sets errno appropriately. So, an RTCore application making
this write() call will receive a -1 return value and rtl_errno will contain
RTL_ENOSPC. The application can then print the errno value through rtl_perror().
A complete list of errno values can be found in include/rtl_errno.h.

10.2.2 File operations

Any file operation that a driver does not wish to handle can be safely set to
NULL. The RTCore POSIX file layer will check for NULL handlers and will
report the appropriate error to the caller.
A list of ioctl() flags is in include/rtl_ioctl.h. There are many flags
for specific devices and for general use. It is recommended that you not
create new flags unless none of those found in rtl_ioctl.h fits your needs.
Just as there is a list of ioctl() flags, there is also a list of mmap() flags
in sys/mman.h. Only create new flags if you absolutely need them and none
of the pre-existing flags will fit.

10.3 Reference counting

Devices registered with the RTCore kernel are reference counted. If you
have poked into include/rtl_posixio.h, you have already seen an extra callback
named destroy(). RTLinuxPro as of version 1.2 has added the capability
of internally handling reference counts for all devices.
From a developer's standpoint, this generally doesn't require any extra
work, but it is worth stepping through the rules of how RTCore handles
these operations. We'll use our previous simple example as a reference point.
First, when you register a device with rtl_register_dev(), it registers
the name and sets the usage count to one. The count drops back
to 0 when you call rtl_unregister_dev(). Also, any open() call increments
the device's usage count, while close() decrements it again.
For devices that allocate resources, it is important that when the
last user detaches from the device, any resources associated with that device
are destroyed. Let's look at an example where the device driver maintains a
pointer to a shared region of memory, initialized to NULL. When
the first user calls open(), memory is allocated for use by the threads. When
the last user detaches from the device through close(), it is important that
the area is deallocated.
This is the reason for the destroy() callback in include/rtl_posixio.h.
If the device has work that needs to be done when the last user exits a
device, this hook is called. For the shared memory example, we would have
added a destroy callback defined as:

void example_destroy(int minor) {
	/* free the shared memory region for this device here */
}
This would have been passed in our fops structure with everything else.
When the last user exits, RTCore will call this function so that our memory
is safely deallocated, and not while other threads may be using it. Otherwise,
if some code was using the area when another piece called rtl_unregister_dev(),
the memory would be freed out from under active code.
RTCore provides a couple of routines to allow you to control these counts
by hand if needed: incr_dev_usage(int minor) and decr_dev_usage(int
minor). This is helpful if you need to work with device resources and want
to make sure that the last user doesn't exit and cause destruction of all
device resources while this work is occurring. An alternative is to perform
a normal open() on the device, do the work, and then close(). This is
the simplest method, but some drivers may still derive some use from the
incr/decr routines.
There is one more factor to keep in mind when using these calls: the
rtl_namei() call performs an implicit incr_dev_usage(). This is done in
order to simplify the process of safely allocating a device. For functions
that use rtl_namei(), there must be a symmetric decr_dev_usage() call to
prevent an artificially raised usage count.

10.3.1 Reference counting and userspace

Another important note is that this reference counting extends to de-
vices available to userspace processes. Consider the API call rtl_gpos_mknod(),
which allows RTCore code to create devices visible in the GPOS filesystem.
If you create a real-time device and a userspace-visible counterpart, there
may be userspace processes bound to the device as well. With respect to
reference counting, these processes are treated the same way. Each GPOS
open() raises the device count, and a GPOS close() decrements it. Even
if all of the real-time threads close and exit while one userspace process
maintains a handle, the destruction of the resource waits until the last user
closes. When the userspace code exits, the callbacks will find that it was the
last user, and will free any resources just as if the device had been opened
by a real-time thread.
For example, if we had added an rtl_gpos_mknod() call to the creation
of a shared memory device (and an rtl_gpos_unlink() to the cleanup), let a
userspace application also access the area, and then shut down our real-time
threads, the userspace application would still be able to access the area. Once
it exits, the close() would occur and bring our usage count to 0, causing
the destroy callback to execute and clean up.
Of course, this is a fairly simple example, but it doesn't get much more
complicated in a real-world system. One difference is that most drivers en-
capsulate information on a per-device basis, so the destroy() logic needs to
use the minor parameter to determine what should be cleaned up.
However, all of the basic concepts apply, and RTCore does all of the work
for you internally. This allows for greater flexibility and simplicity in the
common driver.
Part II

RTLinux® Professional

Chapter 11

Real-time Networking

11.1 Introduction and basic concepts

For many applications, a single machine running real-time code will solve a
problem sufficiently. Common problems are generally self-contained, if not
simple, and there is usually no need to refer to external sources in real time
for information. The configuration data comes from a user application in
the general purpose OS that is interacting either with the user or with some
normal data source.
However, more complex systems are appearing on the market that need
to access real-time data that may not be contained on the local system. An
example would be a robot with multiple embedded boards connected by an
internal network, where each machine needs to transfer processed information
between components in real time. Visual information that has been processed
and converted into motion commands needs to get to the board driving the
robot's legs quickly, or it may stumble on an obstacle ahead.
RTLinux® offers real-time networking over both Ethernet and FireWire,
through a set of common UNIX network APIs. This allows users to com-
municate over FireWire links or Ethernet segments with the same calls one
would use anywhere else.
For more information on this package, please email

Chapter 12

Process Space Development Domain

12.1 Introduction
The standard RTLinux/Pro (RTCore) execution model may be described as
running multiple hard real-time threads in the context of a general purpose
OS kernel. This model is very simple and efficient. However, it also implies
no memory protection boundaries between real-time tasks and the OS kernel.
For some applications, the single name space for all processes may also be a
problem. This is where the Process Space Development Domain (PSDD) comes
into play.
In PSDD, real-time threads execute in the context of an ordinary user-
space process and thus have the benefits of memory protection, extended
libc support, and easier development and debugging. It is also possible to use
PSDD for prototyping ordinary in-kernel RTCore modules.

12.2 Hello, world with PSDD

Let's look at a PSDD "hello world" application (Figure 12.1). The main()
function locks all the process's pages in RAM, creates an RTCore thread,
and sleeps. The real-time thread prints a message to the system log every
second. This periodic mode of execution is accomplished by obtaining the
current time and using it as the base for the rtl_clock_nanosleep(3) absolute
timeout value.
There are a couple of interesting things about this program. First of all,
we need to use mlockall(2) to make sure we don't get a page fault while


#include <rtl_pthread.h>
#include <sys/mman.h>
#include <stdio.h>
#include <unistd.h>

rtl_pthread_t thread;

void *thread_code(void *param) {
	int i = 0;
	struct timespec next;

	rtl_clock_gettime(RTL_CLOCK_REALTIME, &next);
	while (1) {
		next.tv_sec++;
		rtl_clock_nanosleep(RTL_CLOCK_REALTIME, RTL_TIMER_ABSTIME,
		                    &next, NULL);
		rtl_printf("hello world %d\n", i++);
	}
	return NULL;
}

int main(void) {
	if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
		perror("mlockall");
		return -1;
	}
	rtl_pthread_create(&thread, NULL, &thread_code, NULL);
	while (1)
		sleep(1);
	return 0;
}

Figure 12.1: PSDD "hello world" program


all: psddhello

psddhello: psddhello.c
	$(USER_CC) $(USER_CFLAGS) -o psddhello psddhello.c \
	    -L$(RTL_LIBS_DIR) -lpsdd

Figure 12.2: a Makefile for building PSDD "hello world"

in real-time mode. Second, rtl_/RTL_ prefixes are added to the names of
all RTCore POSIX functions and constants to distinguish them from other
user-space POSIX threads implementations, e.g. LinuxThreads/glibc.

12.3 Building and running PSDD programs

The above example program can be built using the Makefile shown in Fig-
ure 12.2. The Makefile relies on a small makefile fragment found in the
top-level directory of the RTCore distribution, which contains assignments
of variables that are useful in building RTCore applications. We also need
to link the program against the PSDD library, libpsdd.a.
To run the program, execute ./psddhello as root. You can use dmesg(8)
to view the messages from the program.

12.4 Programming with PSDD

The RTCore paradigm of strict separation between real-time and non-real-
time application components still holds with PSDD. Typically, the main()
program performs application-specific initialization, locks down the process
pages in memory, creates some RT threads using rtl_pthread_create(), and
then proceeds to interact with them or just sleeps. Note that real-time threads
execute in the same address space as the process, so shared memory is auto-
matically available.
As with kernel-level RTCore, you are restricted in what you can do in
the real-time threads. First of all, no general purpose OS (GPOS) system
calls are allowed in RT threads. If a function that results in a system call, for
example sleep(3), is called from a real-time thread, RTCore issues a warning
message of the following form to the syslog:
Attempt to execute syscall NN from an RT-thread!
You can use reentrant functions from libc and other libraries (for example,
sprintf(3)), as well as RTCore API functions.
In GPOS context, RTCore API functions are also allowed, as long as they
are non-blocking. For example, rtl_clock_nanosleep() and rtl_sem_wait() are
not allowed in GPOS context, while rtl_sem_post() is OK.
Running hard real-time threads in user process context requires the pro-
cess memory map to be fixed while real-time threads are running. RTCore
enforces this by making all attempts to change the memory mappings fail after
the first real-time thread has been created in a process. Let's consider the ways
in which a user space process memory map may potentially change.

• Automatic stack growth. Ordinarily, the GPOS will attempt to automati-
cally map new pages onto the process stack as it grows. For a PSDD program,
a fixed amount of stack (the default is 20 KBytes; this can be changed
with rtl_growstack(int stacksize)) is allocated for the main() rou-
tine at the time the first RT thread is created. An attempt to use more
stack than the allocated amount will cause a segmentation fault.

• Dynamic memory allocation routines, e.g. malloc(), free(), rtl_gpos_malloc()
etc., can only be used before the first RT thread is started.

• Memory remapping calls: mmap(), shmat() etc. Same restrictions as
for malloc().

• fork() and exec() should never be used in PSDD programs.

An implication of the above concerns RT-thread stacks. There is an im-
plicit malloc() call made in rtl_pthread_create() if the RT thread stack
has not been provided with rtl_pthread_attr_setstackaddr(). Therefore, one
has to use rtl_pthread_attr_setstackaddr() to provide stack space for all RT
threads (with a possible exception for the first thread).
Given the above, the correct initialization sequence for a PSDD application
is as follows.
1. Make an mlockall(MCL_CURRENT|MCL_FUTURE) call to lock down the
process memory.

2. Allocate all needed memory (including memory for the RT thread
stacks), and establish shared memory and other mappings.

3. Optionally call the rtl_growstack(stacksize) function to specify the
amount of stack in bytes for the main() function.

4. Possibly perform additional application initialization.

5. Create the application's real-time threads. At the time of the creation of
the first RT thread in a program, the main() stack will be allocated
and then the process memory map will be fixed. Subsequent malloc(),
free(), mmap() etc. calls will fail.

Most RTCore API functions have the rtl_ prefix added to their names to
avoid ambiguity. This can cause confusion. For example, both
nanosleep() and rtl_nanosleep() are available in the PSDD environment.
nanosleep() should only be used in GPOS context (functions called from
main()). On the other hand, rtl_nanosleep() should only be called from
RT threads and never from GPOS context. A single program may use both
functions in different contexts.

12.5 Standard Initialization and Cleanup

PSDD can be used for prototyping in-kernel RTCore modules. With PSDD,
it is often possible to enjoy the convenience and safety of user-space develop-
ment, and then simply recompile the code for inclusion into the kernel for
improved performance. To this end, PSDD provides helper routines to
facilitate migration between user and kernel spaces. The psddmain.o object
file provides a standard main() routine that arranges locking of the process
memory, installs signal cleanup handlers, calls the module's init_module()
routine and enters an infinite sleep. On process exit, the user's cleanup_module()
is called. Depending on the way you compile the program source, you can
get either a kernel module, or a user program, or both.

12.6 Input and Output

PSDD programs have access to all of the available real-time devices. This is
accomplished with the standard POSIX IO functions. The PSDD versions
of those are rtl_open, rtl_read, rtl_write, rtl_ioctl, and rtl_close. Most
devices currently do not implement blocking IO, and thus require the O_NONBLOCK
flag to open them. The notable exception is /dev/irq. Commonly available
devices include:

• /dev/rtfN real-time FIFOs

These are FIFO channels that can be used for communication between
RT and non-RT components of the system. To create a FIFO, use
rtl_open() with the O_CREAT flag.

To set the size of an RT-FIFO to 4000 bytes, use:

rtl_ioctl(rt_fd, RTF_SETSIZE, 4000);

After that, the rtl_write() call can be used to put data into the RT-FIFO.
The user side will use the ordinary user-space open/read/write functions
to access the FIFO.

• /dev/irqN interrupt devices

These are intended for handling RT interrupts in user-space context.
A blocking read from /dev/irqN blocks the execution of the calling thread
until the next interrupt number N is received. The RTL_IRQ_ENABLE rtl_ioctl()
must be called on the irq file descriptor to enable the receiving of further
interrupts.

• /dev/ttySN RTCore serial driver

• /dev/lptN RTCore parallel driver

RTCore also provides the rtl_inb and rtl_outb functions for accessing the x86
IO space.

int init_module(void) {
	char ctemp;
	char devname[30];

	sprintf(devname, "/dev/rtf%d", FIFO_NO);
	fd_fifo = rtl_open(devname, O_WRONLY|O_CREAT|O_NONBLOCK);
	if (fd_fifo < 0) {
		rtl_printf("open of %s returned %d; errno = %d\n",
		           devname, fd_fifo, rtl_errno);
		return -1;
	}
	rtl_ioctl(fd_fifo, RTF_SETSIZE, 4000);

	fd_irq = rtl_open("/dev/irq8", O_RDONLY);
	if (fd_irq < 0) {
		rtl_printf("open of /dev/irq8 returned %d; errno = %d\n",
		           fd_irq, rtl_errno);
		rtl_close(fd_fifo);
		return -1;
	}

	rtl_pthread_create(&thread, NULL, sound_thread, NULL);

	/* program the RTC to interrupt at 8192 Hz */
	save_cmos_A = RTL_RTC_READ(RTL_RTC_A);
	save_cmos_B = RTL_RTC_READ(RTL_RTC_B);

	ctemp = save_cmos_A;
	ctemp &= 0x8f;   /* Clear */
	ctemp |= 0x23;   /* 32kHz Time Base, 8192 Hz interrupt frequency */
	RTL_RTC_WRITE(ctemp, RTL_RTC_A);

	ctemp = save_cmos_B;
	ctemp |= 0x40;   /* Periodic interrupt enable */
	RTL_RTC_WRITE(ctemp, RTL_RTC_B);

	return 0;
}

Figure 12.3: PSDD sound driver initialization


12.7 Example: User-space PC speaker driver

Let us consider a larger example: a PC speaker driver written with PSDD.
This example demonstrates interrupt handling in user space and
x86-style IO.
IBM PC compatible computers have a speaker that can be turned on
and off by switching a bit in IO port 0x61. So the idea is to convert the
incoming audio stream into a series of 1-bit samples used to turn this bit on
and off and make the speaker produce sound. Appendix G contains the full
source of the user-space PC speaker driver. Here we are going to examine the
interesting parts.
The input for our sound driver is a stream of 1-byte logarithmically en-
coded (ulaw-encoded) sound samples. The most common sampling rate for
such files is 8000 Hz. Rather than using a periodic thread, we will drive the
speaker using interrupts from the so-called Real-Time Clock (RTC) avail-
able on x86 PCs. We program the RTC to interrupt the CPU at 8192 Hz,
which is a close enough match for the sampling frequency. The example uses
RT-FIFO 3 to buffer samples.
The user module initialization function (Figure 12.3) creates and opens
RT-FIFO 3, sets the FIFO size to 4000, and opens the /dev/irq8 device.
Interrupt 8 is the RTC interrupt. Then it starts up the thread that is going
to do all the data processing, and programs the RTC to interrupt at the needed
frequency.
The real-time thread (Figure 12.4) enters an infinite loop. First it calls
rtl_read() on the /dev/irq8 file descriptor. This causes the thread to block
until the next interrupt from the RTC is received. Once this happens, an attempt
to get a sample from the RT-FIFO is performed. If successful, the data
is converted from the logarithmic encoding, and the speaker bit is set
accordingly.
A very important point here is that the interrupt processing code has to
signal the device that it can generate more interrupts ("clear the device irq").
This code is device specific. In addition, the interrupt line needs to be
reenabled in the interrupt controller. The latter is accomplished by using the
RTL_IRQ_ENABLE ioctl in the driver.
The cleanup routine (please see the listing in the Appendix) cancels and
joins the thread and closes the file descriptors to deallocate the interrupt and
FIFO resources.

void *sound_thread(void *param) {
	char data;
	char temp;
	struct rtl_siginfo info;

	while (1) {
		rtl_read(fd_irq, &info, sizeof(info));
		(void) RTL_RTC_READ(RTL_RTC_C);  /* clear IRQ */
		rtl_ioctl(fd_irq, RTL_IRQ_ENABLE);

		if (rtl_read(fd_fifo, &data, 1) > 0) {
			data = filter(data);
			temp = rtl_inb(0x61);
			temp &= 0xfc;
			if (data) {
				temp |= 3;
			}
			rtl_outb(temp, 0x61);
		}
	}
	return 0;
}

Figure 12.4: PSDD sound driver real-time thread


12.8 Safety Considerations

The PSDD environment provides a safe execution environment for hard real-time
programs. All arguments of the RTCore API functions are checked for valid-
ity, and memory protection is enforced. A hard real-time program, however,
can potentially bring the system down simply by consuming all available
CPU time. To ensure that this does not happen, RTCore provides an op-
tional software watchdog that stops all real-time tasks in case of such an
event. The watchdog may be enabled during configuration of the system.
Normally, root privilege is required to use PSDD facilities. Memory lock-
ing functions in the GPOS also require root privilege. It is possible to allow non-
root users to run PSDD applications by reconfiguring the RTCore kernel;
however, this is potentially insecure and cannot normally be recommended.

12.9 Debugging PSDD Applications

The RTCore debugger (described in the "Debugger" chapter) allows pro-
grammers to debug PSDD applications in the same way normal RTCore
applications are debugged; for details on the debugger itself, please refer to
that chapter. Here, we will discuss some PSDD-specific features of
the debugger.
One of the most common uses of the debugger in large PSDD programs is
to trace the location of an illegal system call (a non-PSDD system call). As an
example, we will walk through the program in rtlinuxpro/examples/psdd_debug.
This application creates a realtime thread that eventually executes an ille-
gal system call, readv(). Normally the RTCore system will print a message
telling the user that the thread has executed an illegal system call, disallow
the system call and then allow the thread to continue executing. Below is an
example of what is displayed on the console:

Attempt to execute syscall 145 from a RTLinux thread

PC: 0x080518c1

This allows applications that do make non-realtime system calls to continue
executing without stopping the application (and preventing it from perform-
ing its assigned duties). However, this warning message does not provide
very useful information for debugging the application and finding where the
offending system call is being made. In order to do this, one must config-
ure RTCore with the option "Put PSDD tasks in a debuggable state
when executing syscalls" enabled and recompile RTCore. With this option enabled,
a breakpoint will be inserted wherever a PSDD task executes a non-PSDD
system call. The programmer can then attach the debugger to the applica-
tion and obtain a source listing, backtrace, or any other information useful
in determining where the call was made.
When the same example is run with the above debugging option enabled,
the console output looks like:

Attempt to execute syscall 145 from a RTLinux thread

PC: 0x080518c1
Inserted breakpoint in PSDD task where system call was made.
rtl_debug: exception 0x3 in psdd_debug (pid 21276, EIP=0x80518c2), psdd
thread id 0; (re)start GDB to debug

To debug this, one runs GDB as normal and connects to the debugger:

root@host115<psdd_debug>$ gdb psdd_debug

GNU gdb (5.3)
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
(gdb) target remote /dev/rtf10
Remote debugging using /dev/rtf10
[New Thread 0]
0x080518c2 in __readv (fd=0, vector=0x0, count=0) at ../sysdeps/unix/sysv/linux/readv.c:51
51      ../sysdeps/unix/sysv/linux/readv.c: No such file or directory.
        in ../sysdeps/unix/sysv/linux/readv.c

Now you can see that the call occurred in __readv(), a part of libc. This
information by itself is not very useful, since one already knows that the error
occurred inside libc; the system printed a message notifying us that a system
call was made. To find out where that call was made from, a backtrace listing
is used:
(gdb) back
#0 0x080518c2 in __readv (fd=0, vector=0x0, count=0) at ../sysdeps/unix/sysv/linux/read
#1 0x080481e7 in start_routine ()
#2 0x0804831a in psdd_startup ()

From this one can tell that the function that called readv() was start_routine(),
which is the function that makes this call in the example.

12.10 PSDD API

Table 12.1 describes the functions provided by the PSDD API. The functions
are broken into groups according to functionality. For each function, it is
specified whether or not it can be used in RT threads and in GPOS context.
The detailed description of these functions may be found in the manpages.

Table 12.1: The PSDD API

Function                            RT     GPOS   Group
rtl_clock_gettime                   Y      Y      Clock and sleep functions
rtl_clock_nanosleep                 Y      N
rtl_nanosleep                       Y      N
rtl_usleep                          Y      N
rtl_open                            Y(2)   Y      File IO
rtl_close                           Y      Y
rtl_ioctl                           Y      Y
rtl_lseek                           Y      Y
rtl_read                            Y      Y
rtl_write                           Y      Y
rtl_cpu_exists                      Y      Y      SMP support
rtl_getcpuid                        Y      Y
rtl_pthread_attr_init               Y      Y      Thread creation attributes
rtl_pthread_attr_destroy            Y      Y
rtl_pthread_attr_setcpu_np          Y      Y
rtl_pthread_attr_getcpu_np          Y      Y
rtl_pthread_attr_setfp_np           Y      Y
rtl_pthread_attr_getfp_np           Y      Y
rtl_pthread_attr_setschedparam      Y      Y
rtl_pthread_attr_getschedparam      Y      Y
rtl_pthread_attr_setstackaddr       Y      Y
rtl_pthread_attr_getstackaddr       Y      Y
rtl_pthread_attr_setstacksize       Y      Y
rtl_pthread_attr_getstacksize       Y      Y
rtl_pthread_create                  Y(3)   Y      Thread control functions
rtl_pthread_cancel                  Y      Y
rtl_pthread_exit                    Y      N
rtl_pthread_join                    Y      Y
rtl_pthread_equal                   Y      Y
rtl_pthread_kill                    Y      Y
rtl_sched_get_priority_max          Y      Y
rtl_sched_get_priority_min          Y      Y
rtl_pthread_self                    Y      Y
rtl_pthread_idle                    Y      Y
rtl_pthread_testcancel              Y      N
rtl_sem_init                        Y      Y      Semaphore support
rtl_sem_destroy                     Y      Y
rtl_sem_getvalue                    Y      Y
rtl_sem_post                        Y      Y
rtl_sem_trywait                     Y      Y
rtl_sem_wait                        Y      Y(4)
rtl_sem_timedwait                   Y      Y
rtl_printf                          Y      Y      Message logging
rtl_growstack                       N      Y      Stack allocation for main()
rtl_virt2phys(5)                    N      Y      Address translation

Notes:
1. Some of the functions appear in the manual pages without the rtl_ prefix. For
example, one may have to look up pthread_cancel() rather than the rtl_pthread_cancel()
function. This will be addressed in the next release.
2. rtl_open cannot be called from RT context if O_CREAT is specified for a FIFO.
3. RT usage only if a preallocated stack is specified with rtl_pthread_attr_setstackaddr().
4. The whole of the GPOS on the current CPU is blocked by rtl_sem_wait and
rtl_sem_timedwait. One has to make sure that this blocking is not done for too long;
otherwise the system will freeze.
5. rtl_virt2phys is currently not supported on x86 Linux systems that have more than
4 GB of physical memory and use PAE addressing.

12.11 Frame Scheduler

12.11.1 Introduction
Many real-time tasks contain periodic loops that do not require the sophisticated
scheduling that RTCore is capable of providing. It is also often convenient
to separate scheduling details from program logic. This allows the real-
time systems developer to experiment with different scheduling parameters
without recompiling application programs. For such cases, PSDD provides a
user-space frame scheduler.
The frame scheduler supports hard real-time scheduling of user-space
tasks in terms of frames and minor cycles. There is a fixed number of minor
cycles per frame. Minor cycles can be either time-driven or interrupt-driven.
For each task, it is possible to specify the task priority, the CPU to schedule
the task on, the starting minor cycle number within the frame, and the run
frequency in terms of minor cycles. (For example, if there are 10 minor cycles
in a frame, the starting minor cycle is 2, and the run frequency is 3, the task
will run at the following minor cycles: 2, 5, 8, 2, 5, 8, ...). If there are multiple
tasks ready at the start of a minor cycle, the task with the higher priority is
run first.
The tasks running under a frame scheduler are UNIX processes of the
following structure:

void rt_thread(void *arg) {
	/* thread executed in hard real-time */
	while (1) {
		/* block the execution until the next run */
		fsched_block();
		user_code();
	}
}

int main(int argc, char **argv) {
	struct fsched_task_struct task_desc;

	application_init();

	/* initialize RT subsystem */
	fsched_init(argc, argv, &task_desc, NULL);

	/* start real-time thread */
	fsched_run(rt_thread, &task_desc);

	/* main thread sleeps forever; hard RT thread is running */
	while (1) {
		sleep(1);
	}
}

The hard real-time part of the user process is a PSDD RT thread and is
therefore subject to the same restrictions; e.g., it cannot use UNIX system
calls or non-reentrant library functions.
The task code itself does not contain any scheduling information. This
information is supplied when attaching a new task to the scheduler via the com-
mand-line interface. This approach allows the user to change schedules with-
out recompiling.
The power of PSDD can be seen from the fact that the frame scheduler
itself is implemented using the hard real-time user-space facilities of PSDD. Thus,
quite complicated real-time applications can be developed using the framework.
The source for the frame scheduler can be found in the PSDD distribution.

12.11.2 Command-line interface to the scheduler

The user manipulates the frame scheduler via the ”fsched” command. The
description of the supported formats and their meanings is provided below.

fsched create
- create and initialize the frame schedulers subsystem. This command
has to be issued before any other commands can be used. In the
current implementation, this starts the user-space scheduler process,
rtl fsched, and thus the directory containing rtl fsched must be
present in the user’s PATH variable.

fsched delete
- destroy the frame schedulers subsystem

fsched config -mpf minor cycles per frame -dt dt per minor cycle
[ -s sched id ] [ -i interrupt source ]

- configure a scheduler. Must be issued before the scheduler can be


fsched [ -s sched id] start

- starts the frame scheduler

fsched [ -s sched id ] stop

- stops the frame scheduler. If there are any user tasks attached to the
scheduler, they are detached and killed.

fsched [ -s sched_id ] pause|resume

- pauses and resumes the execution at the next minor cycle.

fsched attach [ -s sched_id ] -n program -p priority -rf run_freq
-smc starting -cpu cpu_number -args "arguments passed to user program"

- attach a program to the frame scheduler. "program" is the name or
path of the executable to start. "priority" can lie between 1 (min) and
255 (max). If the CPU is not specified, the default CPU is used. The
task starts execution at the "starting" minor cycle number of the next
frame with "run_freq" frequency.

fsched info [ -s sched_id ] [ -n average_runs ]

- display information about the schedulers and tasks. More informa-
tion is provided in Section 12.11.4.

fsched reset [ -s sched_id ] - reset scheduler statistics

fsched debug -p pid - break into the user process "pid". The break
happens at the next minor cycle, and all scheduling activity stops. Af-
ter that, it is possible to attach to the process with GDB and perform
source-level debugging. Please refer to the GDB example in the distri-
bution and to Chapter 7 for more information.

If sched_id is omitted, the default scheduler id of 1 is used. There may
be several slot schedulers running concurrently on the same machine. It is
up to the user to ensure that there are no conflicts between the schedules.

12.11.3 Building Frame Scheduler Programs

A typical Makefile structure for the frame scheduler user programs is as
follows:

all: engine

engine: engine.c
	$(USER_CC) $(FSCHED_CFLAGS) -o engine engine.c $(FSCHED_LIBS)

A small makefile fragment provided with the slot scheduler contains the
assignments of various variables that encapsulate include paths, compiler
switches and libraries.

12.11.4 Running Frame Scheduler Programs

First, it is necessary to make sure the fsched directory is in the PATH:

export PATH=$PATH:/directory_that_contains_fsched

Typically, running frame scheduler programs is accomplished with a shell
script like the following:

fsched create
sleep 1
fsched config -mpf 10 -dt 50
fsched attach -n user1 -rf 3 -smc 1 -p 1
fsched start

Here we create a frame scheduler and configure it with a 50 millisecond
minor cycle and 10 minor cycles per frame. Then we attach a user program
to execute starting at minor cycle 1 of each frame, with a run frequency of
3 minor cycles. The task runs at priority 1. Finally, the whole system is
started with the fsched start command.
It is often useful to keep a continuously updating window with the sched-
uler status display. This can be accomplished with the following command:

watch -n 1 fsched info -s 1


Every 1s: ./fsched info -s 1 Wed Aug 28 19:54:54 2002

FS: 1 baraban IRQ=0 MPF=10 DT=50ms started

CPU0 load 1%
3474 0 1 3 43.7 30.9 43.7 37.8 15 75 ./user1
3477 0 2 2 17.1 15.0 39.6 18.4 120 0 ./user2

Figure 12.5: Example of a frame scheduler monitoring window

This will run the fsched info command every second and display its output
full screen. An example of such a screen is displayed in Figure 12.5.
For each task, fsched info displays execution statistics: the last, running
average, minimum and maximum execution times in microseconds, the total
number of execution cycles, and the number of overruns. The percentage of
CPU time currently used by the RT tasks is also displayed.

12.12 Conclusion
PSDD offers a simple means of writing complex real-time code in user space,
while still allowing for the normal RTCore approach of splitting real-time
logic from management code. Users with no knowledge of GPOS kernel
programming can use it for rapid prototyping and deployment of real-time
applications. Others may use it as a testbed for code that will eventually
run in kernel mode.
Chapter 13

The Controls Kit (CKit)

13.1 Introduction
During the implementation of controllers and control algorithms, one finds
oneself needing to handle parameter updates and alarms in a well-behaved,
controlled manner. Moreover, these may sometimes need to be handled in the
context of a distributed application, as would be the case in dangerous
environments. For example, a fully automated assembly plant may need to be
centrally monitored and tuned from a remote location.
FSMLabs has addressed this problem by introducing the FSMLabs Con-
trols Kit (CKit). It is a collection of utilities for building control systems and
control interfaces using XML to describe control objects. The Controls Kit
provides software for exporting RTLinux control variables, including meth-
ods for defining composite objects, setting alarms and triggers, updating and
exporting control information to either a local or remote machine. The CKit
makes it easy to develop both localized and distributed applications via
a set of API interfaces and libraries, as well as the highly portable XML
document standard.
This document provides an overview of the CKit by working through a
simple PID example. For more in-depth documentation, please refer to the
CKit manual.


Figure 13.1: Detail of PID algorithm

13.2 CKit by Example: PID Control

The following example is designed to provide a good working overview of
the FSMLabs Controls Kit (CKit). As such, it describes a simple implemen-
tation of a SISO, proper, anti-windup, high-frequency-limited PID controller
(Figure 13.1) through the use of the demo FSMLabs controls library (FSMCL)
supplied by default with the CKit. The code assumes that a separate device
driver is already running which initializes the A/D and D/A hardware on
demand.

13.2.1 Overview
The design of our example is as follows. There are two threads of interest.
The first, and lowest-priority, thread is the Linux thread. The second, and
higher-priority, thread is our control thread. The context of the lower
priority thread (Linux) will be used to initialize the hardware, initialize
the PID parameters, and run our trigger function, which will be used to
command a RUN or STOP to our task.
The program will sit idle once loaded into kernel memory. Then, as soon
as the "Run" parameter is updated to TRUE, our program will begin executing
on a periodic basis until the "Run" parameter is updated to FALSE.

At the core of the entire CKit design is the idea of entities. For this
program, we thus need three entities. These entities are listed as:

1. Test: our toplevel "group" entity. Its job is to group all of the
underlying entities into one coherent group for ease of manipulation.

2. Run: a boolean which we additionally define as a trigger, so that
whenever this parameter changes, it runs a trigger function which
commences or terminates execution of the main periodic task.

3. myPID: an algorithm entity defined within the FSMLabs controls
library (libFSMCL.a). This entity contains all the parameters needed
for our strictly proper PID algorithm.

13.2.2 Coding
The global declarations and includes needed for this program are rather
simple. First, we include the appropriate headers, which correspond to the
core RTLinux, the core CKit, and the FSMCL library functions, respectively.
We also define simple constants for use within our program:

#include <rtl.h>
#include <ck_module.h>
#include <FSMCL_core.h>

#ifndef FALSE
#define FALSE (0)
#define TRUE (1)
#endif

Next, we declare our entities for the PID, the boolean, and the test group:

static FSMCL_PID_entity MainPID;

static CK_entity RunBool;
static CK_entity TestGroup;

Finally, we make the necessary declarations for the runtime high-priority
thread and for the trigger function, which will do the job of starting or
stopping the runtime task:

static pthread_t testThrd;

static void *threadSrc (void *arg);
static void triggerFcn(void *arg);

Initialization and Cleanup

Next come the module initialization and shutdown routines, init_module
and cleanup_module, respectively. During module initialization, we must
initialize our toplevel group, the Run boolean, and the PID entity, and
finally create the toplevel thread.
We initialize the toplevel group, registering the variable "TestGroup" as
the toplevel directory by assigning NULL for its parent. For the sake of
interacting with the user, we give it a name of "Test" and a tooltip of
"Test Group for PID Example", as follows:

"Test Group for PID Example",

Second, we initialize the PID entity and assign it to the "Test" group by
passing it a pointer to the "TestGroup" entity as its parent:

"Test PID controller",

Next, we initialize the runtime boolean, assign it to the Test group entity,
set its default value to FALSE, and associate the function triggerFcn()
with it, so that the next time this variable is updated, that function will
run in the context of the Linux thread. We pass the function a void pointer
to the boolean variable itself so that it can be used within the function.

"Set to 1 to run, 0 to stop",
(void *)&RunBool);

We now have an option. We can either initialize all of the parameters
for the PID controller here, or do it from the command line through a
script prior to running. For this example, we choose to do it within the
program as such:

/* Initialize the PID controller gains (min, max, current) */

CK_scalar_init_float_val(&(MainPID.K), 0.0, 10.0, 5.0);
CK_scalar_init_float_val(&(MainPID.Td), 10.0, 100.0, 50.0);
CK_scalar_init_float_val(&(MainPID.Tint), 1.0, 100.0, 25.0);

/* anti-windup and high frequency limit */

CK_scalar_init_float_val(&(MainPID.Tlim), 1.0, 100.0, 25.0);
CK_scalar_init_float_val(&(MainPID.N), 3.0, 20.0, 10.0);

/* Saturation model for actuator */

-10.0, 10.0, 0.0);
90.0, 110.0, 100.0);

/* And our reference point */

CK_scalar_init_float_val(&(MainPID.SetPoint), -1.0, 1.0, 0.0);

Finally, we initialize our main thread using the usual RTLinux API call:

pthread_create(&testThrd, NULL, threadSrc, 0);

which simply creates a thread, whose entry point is the function threadSrc().
We want to create a well-behaved program and must consequently clean up
after ourselves. Thus, in the cleanup routine (cleanup_module()), we must
destroy both our entities and the main thread. To do so, we simply type:

/* Cleanup our thread */

pthread_join(testThrd, NULL);

/* Cleanup our entities */


Note that in this example, all we did was clean up the toplevel entity,
which automatically cleans up all of its children.

Trigger Function
Next, we design the trigger function. As you may recall, each time the
runtime flag "RunBool" is updated from the command line, this trigger
function will execute. Each time it executes, we shall - depending on
the value of the variable - take appropriate action. Thus, the entire
trigger function is written as:

static void triggerFcn(void *arg)
{
    int val;
    CK_entity *entity;

    if (arg == NULL)
        return;

    entity = (CK_entity *)arg;

    val = CK_scalar_get_boolean(entity);

    if (val) {
        CK_message("Commencing control run");
    } else {
        CK_message("Controller shut down by user");
    }
}

In this example, we generate an alarm of level 0 (i.e. a low-priority
message) through the CKit infrastructure each time the boolean is updated.
If the program is to be started, we kick-start the main thread. Otherwise,
we simply print a message and let the main thread shut itself down after
the next iteration. Notice that we could have added additional if-then
statements here to detect whether the variable has changed since the last
time the trigger function ran.

Our final coding concern is the design of the main thread. Here, we set up
two while loops. The innermost loop is nothing more than the standard
periodic loop seen in many of the RTLinux example programs. The outer
while loop is an infinite loop which will be used to run and stop our
program. Thus, in this design, the outermost loop first checks the value of
the runtime boolean. If the value is false, it simply suspends the thread
and goes to sleep until it is kick-started by the trigger function. At that
point, it initializes the internal values of the PID controller and
immediately begins execution of the periodic component of the thread.
Within the periodic component, we sample our A/D boards, calculate the PID
control using the sensed values, and finally write out the control output
through our D/A board.
The periodic component continues to execute until the runtime boolean
becomes false. As soon as it becomes false, the code exits the innermost
while loop. Then, prior to once again suspending itself, the code converts
the thread into a non-periodic thread.
The entire code for the main thread is shown in what follows:

static void *threadSrc (void *arg)
{
    struct sched_param p;
    float Sens;    /* Sensor value */
    float Control; /* Control output */
    float Period;  /* PID controller period. Needed by
                      the PID function */

    Period = 1.0;  /* Desired Period, in seconds */

    /* set the priority, give ourselves FP permission,
       and make our task periodic */
    p.sched_priority = 1;
    pthread_setschedparam (pthread_self(), SCHED_FIFO, &p);

    /* Main Loop */
    while (1) {
        if (!CK_scalar_get_boolean(&RunBool))
            pthread_suspend_np (pthread_self());

        /* Reset the internal variables for the PID controller */

        /* now, make ourselves periodic */
        pthread_make_periodic_np (pthread_self(), gethrtime(),
                                  (hrtime_t)(Period * 1000000000));

        /* Periodic loop */
        while (CK_scalar_get_boolean(&RunBool)) {
            pthread_wait_np ();

            // sample sensors from A/D
            Sens = SampleAD();

            // Run PID controller
            FSMCL_PIDcontrol(&Sens, &Control, &Period, &MainPID);

            // write out to D/A
        }

        /* make ourselves non-periodic */
        pthread_make_periodic_np (pthread_self(), 0, 0);
    }

    return 0;
}

We thus compile the program as usual for an RTLinux program, but add
the flag "-lFSMCL" to link against the FSMCL library. Once compiled, we
simply insert the core fsm_ckit.o module and the resulting test.o binary.

13.2.3 Running and Stopping the Program

Up to this point, our program is sitting idle in RTLinux space. That is,
our entities have been registered, but the main thread is currently
suspended due to the pthread_suspend_np() call.

Thus, at this point, we must enable the CKit daemon, which will monitor
alarms and commands. This is a daemon with rich functionality, such as
alarm actions, shell execution on behalf of RTLinux programs, and XML/text
output streams. To start up the CKit daemon, we simply type:

ck_intercessor -A"mail -s’%L:%M-%R’"

at the command line. Immediately we will see a set of alarms appear showing
all the parameters registered up to that point. In this case, we are going
to trap all alarms of level 10 (hexadecimal 0xA) so that as soon as they
occur, they are emailed to our technician with the actual message appearing
as the subject line. In this example, we use the special expansion variables
%L, %M, and %R to denote the alarm level, the alarm message, and the
recommended action. For example, a critical alarm with the message "Startup
failure" and recommended action "Did you turn on the power relay?" would
appear in the subject line of the technician's email program as: "10:Startup
failure - Did you turn on the power relay?". Please refer to the CKit
documentation for complete coverage of the rich capabilities of this daemon.
We can also redirect all output into a pipe, which can then be networked to
a remote machine via netcat or FSMLabs' XML-RPC interface.
We next have two ways of proceeding. We can use the command-line programs
ck_hrt_op or ck_hrt_op_net (the XML/text utilities used for local and
networked parameter manipulation, respectively), which can be embedded
within any scripting language. Alternatively, and for simplicity's sake, we
can use the standalone graphical front end ck_hrt_op_GUI (a Perl program
written using the GtkPerl libraries) to view and update the parameters on
either local or remote machines. Thus, we start up the graphical front end
by typing:

ck_hrt_op_GUI

or alternatively, if the PID controller is somewhere on the network:

ck_hrt_op_GUI http://url_to_ckit_box

Immediately, we obtain the front end seen in Figure 13.2. We can now
proceed to view and update any parameters, and when ready, we simply set
the "Run" boolean to "TRUE".

Figure 13.2: The CK graphical user interface. Notice that this GUI parses
the XML generated by the underlying CK tools.

Alternatively, we could have used the shell commands ck_hrt_op or
ck_hrt_op_net to obtain either a human-readable tree (Figure 13.3) or an
XML tree (Figure 13.4) which can be parsed by XML-aware programs. A third
type of output (not shown) produces a colon (:) delimited table, useful for
parsing with simple scripts built on grep, cut, and sed.
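As a sketch of what such a script looks like (the sample line and its field
layout here are hypothetical; consult the CKit documentation for the real
format):

```shell
# A hypothetical colon-delimited status line:
line="Test:Run:boolean:1"

entity=$(echo "$line" | cut -d: -f2)   # second field: entity name
value=$(echo "$line" | cut -d: -f4)    # fourth field: current value

echo "$entity=$value"
```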

13.3 Conclusion
You now have the basic tools needed to write your own distributed and local
controller algorithms while still manipulating parameters and alarms in
real time. From here, we highly recommend that you look at the CKit
documentation, which describes in detail all of the features and
capabilities of each of the CKit commands.

Figure 13.3: Human readable tree output of all parameters.

Figure 13.4: XML output showing all parameters.

Part III


Appendix A

List of abbreviations

• AGP: Advanced Graphics Port

• API: Application Programming Interface

• APIC: Advanced Programmable Interrupt Controller

• APM: Advanced Power Management

• BIOS: Basic Input Output System

• CLI: CLear Interrupt flag

• CPU: Central Processing Unit

• DA/AD: Digital to Analog / Analog to Digital conversion

• DAQ: Data AcQuisition

• DMA: Direct Memory Access

• DRAM: Dynamic RAM

• EDF: Earliest Deadline First

• FAQ: Frequently Asked Questions

• FIFO: First In First Out

• FP: Floating Point


• GNU: GNU’s Not Unix (a recursive acronym)

• GPOS: General Purpose Operating System

• GUI: Graphical User Interface

• IDE: Integrated Device Electronics / Integrated Development Environment


• IP: Internet Protocol

• IPC: Inter Process Communication

• IRQ: Interrupt ReQuest

• ISA: Industry Standard Architecture / Instruction Set Architecture

• ISR: Interrupt Service Routine

• NVRAM: Non-Volatile RAM

• OS: Operating System

• PCI: Peripheral Component Interconnect

• PIC: Programmable Interrupt Controller

• PLIP: Parallel Line Internet Protocol

• POSIX: Portable Operating System Interface

• RAM: Random Access Memory

• RFC: Request For Comment

• RMS: Rate Monotonic Scheduler

• ROM: Read Only Memory

• RPM: RedHat Package Manager

• RT: Real Time

• RTOS: Real Time Operating System


• SCSI: Small Computer System Interface

• SHM: SHared Memory

• SLIP: Serial Line Internet Protocol

• SMI: System Management Interrupt

• SMM: System Management Mode

• SMP: Symmetric Multi Processor

• SRAM: Static RAM

• STI: SeT Interrupt flag

• TCP: Transmission Control Protocol

• TCP/IP: Transmission Control Protocol / Internet Protocol

• TLB: Translation Lookaside Buffer

• UDP: User Datagram Protocol

• UP: Uni Processor

• XT-PIC: Old XT (Intel 8086) Programmable Interrupt Controller

Appendix B

Glossary

• GPOS : General Purpose Operating System - The non-real-time operating
system that RTCore runs as its lowest priority thread.

• RTCore : The core technology that powers RTLinuxPro and RTCore/BSD.
Viewing the system as running two operating systems, RTCore is the
RTOS that provides the deterministic control needed for real-time ap-
plications.
• EDF scheduler : In this scheduling strategy, rather than using the
priority of a task to direct scheduling, the scheduler selects the task
with the closest deadline. In other words, it selects the task with the
least time left until it should be run. This scheduling strategy has
a "flat" priority and is optimal for systems that handle asynchronous
events and non-periodic real-time tasks.

• FIFO Scheduler (SCHED_FIFO): A First In First Out scheduler is one
in which all processes/threads at the same priority level are scheduled
in the order they arrived on the queue. When the scheduler is called,
the queue of the highest priority level is checked first. If there is no
runnable thread at the highest priority level, the next level is checked,
and so forth. A job scheduled with a policy of SCHED_FIFO can monopolize
the CPU if it is always ready to run and there is no mechanism to
preempt it.

• Frontside Bus : This is the high speed bus that exists between the CPU
and memory.


• Host Bridge : The host bridge acts as a hub between most major
subsystems in a PC. It acts as an interface between CPUs, memory,
video, and other busses, such as PCI.

• North Bridge : The north bridge of a machine is the controller responsi-

ble for high speed operations. Bus components that require high speed
access, such as CPU to memory, PCI interaction, etc., are considered
part of the north bridge.

• PCI-Bridge: A logic chip (controller) connecting PCI busses. Access to

PCI devices runs over the PCI-Bridge from another subsystem to the
PCI bus where the peripheral device is located.

• PCI-ISA-Bridge: To support legacy ISA devices most PC’s have a ISA-

bus available via the PCI bus. The connecting controller is referred to
as PCI-ISA-Bridge.

• Rate Monotonic Scheduler (RMS) : An optimal static-priority scheduling
policy that is applicable if all tasks are periodic; the schedulability
criterion is that the n tasks satisfy

sum_{i=1..n} C_i / T_i < n (2^{1/n} - 1)

with C_i being the worst case execution time and T_i the period of each
task. As the task number n increases, the utilization bound converges to
about 69%, which is not as efficient as other schedulers, but is
preferable in situations requiring static scheduling.

• South Bridge : The south bridge is the collection of controllers that

deal with slower component systems, such as serial controllers, floppy,
PCI-ISA bridges, etc.

• Asynchronous Signals : All signals that reach a thread from an external
source, meaning that a different thread of execution posts the signal via
pthread_kill(). Not all thread functions are async safe, as signals may
arrive at any time, even when the thread is not ready to be interrupted.
An asynchronous signal is delivered to the process, not to a specific
thread within a multithreaded process.

• Async Safe : A thread function that can handle asynchronous signals
without leading to race conditions or synchronisation problems (like
blocking other threads indefinitely, or leaving global variables
inconsistent) is considered async safe. Functions that are not async safe
should be used with these possible side effects in mind, meaning that the
points at which they are safe to call should be chosen appropriately. If
a thread has a cancellation state of PTHREAD_CANCEL_ENABLE and the
cancellation type set to PTHREAD_CANCEL_ASYNCHRONOUS, then only
async-safe functions should be used, or signal handlers must be
installed.

• Atomic Operation : An operation that executes as an indivisible unit:
even if a context switch occurs during it, its state is preserved. During
atomic operations it is legal to assume that condition variables,
mutexes, etc. will be unchanged, as proper locking has taken place. An
atomic operation behaves as if it were completed as a single instruction.

• Barrier : A thread synchronisation primitive based on conditional vari-

ables. A barrier is a point in the execution stream at which a set
of threads will wait until all threads requiring synchronisation have
reached it. After all threads have reached the barrier the condition
predicate is set TRUE and execution of all threads can continue.

• Busy wait loop : This is the act of waiting for an event in a running
process, using the CPU during the wait. Rather than being put to
sleep and rescheduled when the event occurs, the process spins doing
useless activity during the wait. This saves the overhead of scheduling
another process in and then having to reschedule the first.

• Cache flush : A cache flush involves writing the content of the cache to
memory or to whatever media is appropriate. This is only necessary on
hardware that does not support write through caching, or on SMP sys-
tems when a task moves between CPUs. Generally cache flushes have a
noticeable influence on performance, especially for real-time operations.
As the flushed data must be refetched, the resulting delay from a flush
may result in jitter.

• Condition Variable : A condition variable is a synchronisation
mechanism composed of the condition variable itself and its predicate,

as well as an associated mutex. A thread acquires the mutex and then

waits until the condition is signaled, then performs the task depending
on the condition, releasing the mutex afterward.

• Context Switch : Removing the currently running thread from the
processor and starting a different thread on that CPU. A context switch
in RTCore will only save the state of the integer registers unless
floating point is enabled. (See pthread_attr_setfp_np.)

• Deadlock : Deadlock occurs if synchronisation primitives are used
inconsistently, such that different threads of control are waiting for
each other to release resources. An example of such a setup is two
threads that each acquire a mutex that the other is waiting for. Since
both threads are blocked, neither will free the mutex it holds, and thus
both are blocked indefinitely.

• Detached thread : When creating a thread with pthread_create(), the
attributes passed will by default make the thread joinable, meaning that
another thread can call pthread_join() on it. This is commonly done to
catch the return status and to finish the cleanup of the joinable thread.
If the thread's detach state is set to PTHREAD_CREATE_DETACHED, then all
resources of the thread are released when it exits, so there is no return
status and no further synchronization needed.

• Embedded system : Operating systems and software for systems that
perform non-traditional tasks are referred to as embedded systems. These
systems span a wide range, but in general, embedded systems have little
memory, restrictions with respect to available mass-storage devices, and
minimal power.

• Global Variables : Global variables are those that are visible through-
out the application, rather than being restricted to a specific thread.
The variables themselves are not protected against concurrent activity,
and usually require some kind of synchronization primitives to ensure
safe handling.

• Handler : If an event should be handled in a specific way, a function
or thread is written to respond to that event (e.g. update the pixels on
the screen when the mouse moves). The association of this function or
thread with a specific event makes it the handler of that event. The
handler must be explicitly registered with whatever will detect the
event, which is generally the operating system.

• Hard Real-time : Systems capable of guaranteeing worst case jitter and

a worst case response time, regardless of system load, qualify as being
hard real-time systems.

• InterProcess Communication : Commonly referred to as IPC, this refers

to any mechanism by which multiple processes can coordinate their ac-
tion. These mechanisms include files, shared memory, semaphores, and
other shared resources.

• Interrupt : All processors have the capability to receive external signals

via dedicated interrupt lines. If an interrupt line is set the processor
will halt execution and jump to an interrupt handling routine. Interrupts
are electric signals caused by some hardware (peripherals like network
cards or IDE disk controllers) and have a software counterpart that is
part of the operating system.

• Interrupt Interception : In RTCore, no interrupt will directly reach

Linux’s interrupt handlers as every interrupt is handled by the inter-
rupt interception code first. If there is a real-time handler available
this handler will be called, otherwise it will be passed on to Linux for
handling when there is time.

• Interrupt Handler : The action to be taken when an interrupt occurs is
defined in a kernel thread that is called upon receiving that interrupt.
Interrupt handlers can be registered by the GPOS kernel as well as by a
real-time thread. This means that there can be two handlers for the same
interrupt in RTCore: in this case the real-time handler is called first,
and only if the interrupt is not destined for the real-time handler will
it be passed to the GPOS interrupt handler.

• Interrupt Mask : An interrupt mask determines which interrupts can
actually reach the system. A bit mask is used to enable or disable
individual interrupts.

• Interrupt Response Time : On assertion of a hardware interrupt, the
system calls the associated interrupt service routine. The time from the
assertion of the interrupt (the electric signal becoming active on the
interrupt pin) to the point where this interrupt service routine is
called is defined as the interrupt response time. In practice, the
interrupt response time is measured from asserting the interrupt until
the system acknowledges it or responds with a noticeable action; this
time is therefore a little longer than the "theoretical" interrupt
response time.

• Instruction Set : The set of operations used to communicate directly
with specific hardware (i.e. manipulate register contents). The
instruction set is hardware specific and maps directly to machine code.

• Jitter : Jitter values represent the time variance in completion of an

event. This can represent anything from task completion variance to
real-time scheduling variance.

• Kernel : The kernel is the core of an operating system, providing the

basic resources and controlling access to these resources.

• Kernel Module : Modules are dynamically loaded capabilities, repre-

sented as object code that is linked into the kernel as needed. Once a
kernel module is loaded it is no different from a statically compiled in
kernel function.

• Kernel Thread : A kernel thread is similar to a normal thread in that it

represents a specific execution path, although in this case it runs within
the kernel. Kernel threads can have more restrictions than normal
threads, such as stack space, but offer the advantage of access to kernel
structures and subsystems.

• Latency : The time between requesting an action and its actual
occurrence.
• Local Variables : As opposed to global variables, local variables are

only visible to a single thread or single execution scope.

• Multithreaded : A process that has more than one flow of control (In
general, there are also shared resources between these control paths).

• Mutex (Mutual Exclusion Object): A mutex is an object that allows

multiple threads to synchronize access to shared resources. A mutex

has two states: locked and unlocked. Once a mutex has been locked
by a thread all other threads that try to lock it will block until the
thread that acquired the mutex unlocks it. After this one of the blocked
threads will acquire it.

• Polling : Polling is the strategy of checking a condition or a condition

change while in a loop. Generally polling is an expensive strategy to
test conditions/condition changes.

• Priority Inversion : If a high priority thread blocks on a mutex (or
any other synchronisation object) that was previously locked by a low
priority task, priority inversion can result: unless the lower priority
thread temporarily gains a higher priority, another high priority thread
may come along and prevent the lower priority task from running, so the
mutex is never freed and both the low and high priority threads stall.
This scenario, in which a lower priority task blocks a high priority
task, is an implicit priority inversion.

• Process : An entity composed of at least one thread of execution and

a set of resources managed by the operating system that are assigned
to this entity.

• Race Condition : If two executing entities compete for a resource and there is no control ensuring safe access to the resource, unpredictable behavior can occur. Race conditions can occur with any shared resource if appropriate synchronization is not performed by all entities that access it.

• Re-entrant Function : A re-entrant function behaves predictably even if multiple threads are using it at the same time. Any synchronisation or access to global data is handled in a way that makes it safe to call the function concurrently without fear of data corruption.

• RR Scheduler : In Round Robin scheduling there are different priority levels available, and the ordering of threads/processes within a level is the same as in SCHED_FIFO. The difference is that each scheduling entity has a defined time-slice; if it does not exit or block before the time-slice expires, it is preempted by the kernel and the next runnable thread is scheduled.

• Scheduler : The component that manages the task queue of the system; it decides which process is to run next after a process gives up the CPU (either by exiting or blocking). The order in which the scheduler grants control of the CPU is described by the scheduling policy and the priority assigned to each task.

• Scheduling Jitter : The variance of the time between the point at which a process requested scheduling and the time at which it actually runs. In the common literature, scheduling jitter sometimes refers to the absolute deviation from the requested timing.

• Semaphore : In its simplest form, the binary semaphore, a semaphore is equivalent to a mutex. Associated with a semaphore is a counter that defines the number of threads that can access the protected resource via the semaphore. On access to the protected resource, a thread acquires the semaphore by decrementing the counter; once the counter reaches 0, no further threads can access the resource. When a thread releases the protected resource it increments the counter again. The underlying mechanism is a condition variable with the condition counter > 0.

• Shared Memory : Memory accessed by more than one process. Shared memory can be accessed from a real-time process as well as from a non-real-time (GPOS) process, for data exchange or for process synchronisation. RTCore offers this mechanism, although there are many types of shared memory systems.

• Signal : A numeric value delivered to a process via system call, describing an action to be taken by the process. The process may accept a signal or mask it. If a process has a signal handler installed for the signal number sent, this handler will be executed on arrival of the signal. Signals issued from a thread within a process can be posted to a specific thread (via the thread ID), while signals sent between processes are received at the process level and are not directed to a specific thread.

• Signal Handler : Signal handlers are installed to manage asynchronous signals at the process level; they are called by the thread that receives the signal. Note that signal handlers are installed at the process level and not at the thread level, so an asynchronous signal cannot be directed at a specific thread. Only signals issued from within the process can be sent to specific thread IDs that exist within that process.

• Sigaction : The sigaction call controls the actions taken upon reception
of a given set of signals. It sets up signal handlers for the action, among
other things.

• Soft Interrupts : All GPOS interrupts in RTCore are soft interrupts. These interrupts are not delivered directly by the hardware; they are hardware events that the real-time kernel has passed on to the GPOS for handling because there was no real-time interrupt handler associated with the interrupt.

• Soft Realtime : Systems that can provide guaranteed average response times to a class of events, but cannot provide a guaranteed maximum scheduling variance.

• Spinlock : Waiting on a lock can be done in an infinite loop, probing for the lock on every iteration. Spinlocks occupy the CPU while waiting, and are thus "expensive" operations unless it is ensured that the thread will spin only for a very short time. A spinlock is efficient only if it is not held longer than the amount of time it would take to perform a context switch.

• Spurious Wakeup : A thread waiting on a condition variable that receives a signal can be woken up and return from the wait even if the condition the code was waiting for has not actually occurred. To prevent race conditions due to spurious wakeups, the condition is evaluated in a loop around the condition wait; this way a spurious wakeup is caught and the code simply continues to wait for the condition.

• Stack : To pass arguments and context information for function calls, each process and thread has a stack associated with it. This stack is private to each process or thread.

• Symbol Table : A symbol table maps a mnemonic symbol to the address or location where its contents are stored. In the context of this book, this generally refers to the kernel symbol table, which maps the addresses of kernel structures. In the general case symbol tables can be used anywhere, such as in a custom application.

• Synchronous Signal : Any signal that is the result of a thread's action, and occurs in direct reaction to that action. This is opposed to asynchronous signals, which may arrive at any time and may not be related to a thread's action. An example of a synchronous signal is a thread performing a division by zero, causing an FPE_INTDIV. Synchronous signals are delivered to the process that caused them and not to a specific thread.

• Task : In the process model a task represents a process, while in the thread model a task can be a process (a single-threaded process) or a thread of execution within a process. The task concept is used in the V1 API of RTCore and was replaced by the thread-based POSIX API. Usage of the V1 API is not recommended.

• Task Priority : Every task will be called by the scheduler to execute in an order specified by its priority level. POSIX specifies a minimum of 32 priority levels. Besides the priority of a task, its scheduling policy (SCHED_FIFO, SCHED_RR, etc.) influences when it is run as well. A priority level only specifies a task's rank within its scheduling policy, relative to the other tasks in that same scheduling class.

• Thread : An independent flow of control within a process, with an execution context associated with an instruction sequence that can be executed. A thread is fully described by its context, instruction sequence, and state.

• Thread Context : Each thread exists within the context of a process, and this context is comprised of a set of resources, such as the register context, stack, private storage area, attributes, and the instructions to execute. It also includes the structures through which the thread is accessible (thread structures and other management constructs).

• Ticks : Each cycle executed by a processor counts as a tick. Processors such as the Pentium maintain a count of the number of these ticks that have occurred since boot time. This value is useful as an indicator of the duration of a task, among other things.

• Timers : Hardware components on the motherboard or integrated into the CPU that measure time, and can be the source of a periodic trigger.

• User-space Thread : Threads created, synchronized, and terminated using a threads API running in user space. They are associated with a kernel-scheduled entity (a process) but are not visible to the kernel; instead they are scheduled by a separate scheduling entity that lives within the process. The kernel only sees a single process and will not distinguish between the different threads. Note that this differs from user-space POSIX threads, in which each thread appears to the kernel as a schedulable process.

• User Mode : A mode of operation where access is restricted to the subset of functions available to normal user-space processes. Kernel-level subsystems and special processor modes are not available to user-space code. A thread executing application code does so in user mode until it issues a system call, which the kernel then executes on behalf of the process. Once the system call completes and returns, user mode is re-entered.

• User-space : The memory space a user process exists in. Execution of user code, and all resources associated with user mode operation, reside in user space. User space is left when a privileged operation (a syscall) is executed.
Appendix C

Familiarizing with RTLinuxPro

This chapter is intended to provide a simple overview of how to interact with RTLinuxPro. It is assumed that the kit is already installed as described in the instructions provided with the CD or download.
RTLinuxPro installs into a root directory determined by the version, and cannot be safely moved from that point. This is because all of the tools are built against a known location, and depend on the existence of that point for configuration, libraries, and other information. This allows the kit to be installed on any distribution, regardless of host glibc version, installed utilities, etc.
This installation directory is different in every version, so that you can keep multiple installations on the same machine. The root path is /opt/rtldk-x.y, where x is the major release number and y is the minor release number. The remainder of this chapter will assume this path as the root location of all installed components.

C.1 Layout
The installation guide should walk you through the specifics of each directory,
so we will focus only on the important ones here:

1. bin - This is where all of the tool binaries exist. Make sure that the full path to this directory is first in your $PATH variable, so that you use these tools before any others in your path. (Running gcc -v should report information including the /opt/rtldk-x.y path if this is configured properly.)


2. rtlinux_kernel_x_y - This is the prepatched kernel to be loaded on the real-time system, whether that is the development host or a target board. You'll need to take the precompiled image and install it like any other kernel, or rebuild it to suit your environment. (The x_y value corresponds to the major and minor kernel version numbers provided with the release.)

3. rtlinuxpro - All of the RTCore components (and optionally, code), scripts, examples, drivers, and other RTCore-specific tools are contained here.

There are many other directories, but these are central to our uses here.
Also of note is the docs directory, which contains API documentation and
more information on getting started with RTCore.

C.1.1 Self and cross-hosted development

There is a large divergence in the ways the kit is used, as each embedded system has its own set of specific requirements. In many cases, the development kit will be installed on an x86 machine, but built with compilers and real-time code targeting a different architecture. In this case, you will need to find the correct way of getting the kernel image, real-time modules, and filesystem to the embedded device, whether that is a flash procedure, a BOOTP configuration with an NFS root, or whatever is appropriate. For simplicity's sake, we will assume that the development kit is installed on the machine that will be used for the actual real-time execution. (Although this is not always an optimal solution.)
If you are doing cross-hosted development, refer to the installation instructions provided with RTLinuxPro. Provided in the installation manual are some example procedures for getting a kernel built and transferred to the target board. As each board varies slightly, we won't cover the specifics of this procedure here.

C.2 Loading and unloading RTCore

RTCore must be loaded in order for any real-time services to be available. The process is simple once you have compiled and installed your patched kernel. This procedure is the same as with any other kernel, and as the procedure is beyond the scope of this book, we suggest the normal Kernel-HOWTO for details. Essentially, it involves changing to the rtlinux_kernel_2_4 directory, building a kernel image suited to your device needs (or using the provided stock image), and installing that image. This may be a local LILO or GRUB update, or it might be a matter of making the image available for TFTP by an embedded board. Again, we assume a self-hosted development environment for this example.
Once the system is running the correct kernel, RTCore can be loaded
with the following commands:

cd /opt/rtldk-x.y/rtlinuxpro
./modules/rtcore &

This will load the RTCore OS found in the installation, which will vary
based on any additional components installed. Unloading the OS consists of:

killall rtcore

C.2.1 Running the examples

Now you can run some of the examples, or run the regression test from the scripts/ directory under rtlinuxpro. In order to get a feel for the steps needed to load and run real-time code, it is worth stepping through the examples provided. Each of the examples is built to be self-explanatory, and can be run by just executing the local binary.
Once the application code is running, the test will generally continue
indefinitely. After you are done running it, the application can be stopped
with a CTRL-C.

C.3 Using the root filesystem

Included with the development kit is a root filesystem, built for the intended
target of the kit. This means that if you are using the generic PowerPC
version of the kit, there is a root filesystem containing a set of binaries built
for a generic PowerPC root. This will provide a solid Linux installation for use by the development system. For a generic PowerPC version, there is a ppc6xx_root directory inside the development kit tree, but this name will
vary by architecture.
For a generic x86 system as we described in the installation section, you
will likely use the host filesystem already present. However, if you intend
to use separate systems for development and testing (as advised) or are targeting a different architecture completely, this option should help speed the
development process. For many embedded systems, it is much simpler to NFS
root mount a remote filesystem, at least for testing, rather than rebuilding
an image every time you generate new binary code for the target.
For most distributions, exporting this tree is a very simple exercise,
and is no different than exporting any other NFS mount point. Edit your
/etc/exports and run exportfs -a, and the tree will be available to the
embedded system. In many environments, it is also advisable to simply have
the device retrieve its kernel image from DHCP, and build the image such
that it automatically mounts the root filesystem from your development ma-
chine. If this is useful for your environment, the kernel build offers an option
to build the boot parameters in as automatic arguments to the bootstrap
process. For example, under a PowerPC build, under the kernel’s ’General
setup’ option, you can set the boot options to be (as one variable):

root=/dev/nfs nfsroot=${RTLDK_ROOT}/ppc6xx_root ip=bootp

The setting defined here specifies that the root filesystem is NFS-mounted and lives on the boot server under ${RTLDK_ROOT}/ppc6xx_root. Be sure to replace ${RTLDK_ROOT} with the correct location of your development kit. The ip setting configures the device to use BOOTP to configure itself, although there are many options that may be used to configure the interface. These arguments are built into the kernel image, and
are passed as normal parameters to the boot process during a TFTP-based
boot, just as if they were typed in at a LILO prompt. For more information
on these options, refer to Documentation/nfsroot.txt inside the Linux ker-
nel tree. Many users need read/write access to the root filesystem, at least
for testing. Add a rw after the root=/dev/nfs to use this, or remount your
NFS root as read/write on the target, with:

mount -o remount,rw ${RTLDK_ROOT}/ppc_root /

Once these options are configured and your remote device is using the
NFS mount as its root filesystem, you can do all development on the host

machine with the development kit, and move the resulting images under
the NFS mount point. For simplicity, it is often useful to simply copy the
rtlinuxpro directory from the kit under the NFS root mount. While some of these pieces should be removed for the final system, this simple copy will allow access to all of the targeted real-time code needed for the embedded system.

C.4 Summary
This chapter on development kit usage might come across as being rather
light, and there is a reason for that. The development kit is intended to be
simple to use, and to allow a programmer to install a stable build environment
for producing real-time code. As such, this involves installing the kit, placing
the tools in your path, and then using the various components (such as the
root filesystem and modules) as needed. The intent is that configuration and
use is as simple as possible, allowing the programmer to concentrate on the task at hand and not be distracted by development tool problems in the build environment. Specific details such as board configuration for network boot are described in more detail in the devkit_manual.pdf document provided with RTLinuxPro.
Appendix D

Important system commands

This is an overview, with some usage examples, of commands that might be helpful when working with RTCore and with most UNIX systems in general.

The bzip2 and bunzip2 command are for compressing and decompressing
.bz2 files. Bzip2 offers better compression rates than gzip, and is becoming
more popular on FTP sites and other distribution locations.

bunzip2 linux-2.4.0.tar.bz2
Decompress the compressed archive.
bzip2recover file.bz2
Recover data from a damaged archive.
bunzip2 -t file.bz2
Test if the file could be decompressed, but don’t do it.

The kernel logs important messages in a ring buffer. To view the contents
of this buffer you can use the dmesg command.

dmesg
Dump the entire ring-buffer content to the terminal.
dmesg -c
Dump it to the terminal and then clear it.
dmesg -n level
Set the level at which the kernel will print messages to the console. Setting dmesg -n 1 will only allow panic messages through to the console, but all messages are still logged via syslog.

find can be used to find a specific fileset in a directory hierarchy, and
optionally execute a command on these files.

find .
List all files in the current directory and below.
find . -name "*.[ch]"
List all files in the directory and below that end in .c or .h (c-sources and
header files)
find . -type f -exec ls -l {} \;
Find all regular files and display a long listing (ls -l) of them.
find . -name "*.[ch]" -exec grep -le min_ipl {} \;
List all files in the directory hierarchy that contain the string "min_ipl" in them.
find /usr/src/ -type f -exec grep -lie MONOTONIC {} \;
List all files below /usr/src/ that contain the string MONOTONIC, using a case-insensitive search. (MONoToniC will also match.)

grep is for searching strings using regular expressions. Regular expres-
sions are comprised of characters, wildcards, and modifiers. Refer to the grep
man page or a book on regular expression syntax for details.

grep -e STRING *.c

Display all lines in all files ending with .c that contain STRING.
grep -ie STRING *
Display all lines in all files of the local directory that contain STRING in
upper or lower case. (e.g. StrInG)
grep -ie "void pthread" *.c
Find the string "void pthread" in any .c files. The quotation marks are required to enclose the blank in the string.
grep -e "char \*msg" *.c
Find the declaration of "char *msg" in the .c files of the local directory. The "*" must be escaped so that it is not interpreted as a wildcard.

The gunzip command will decompress .gz files. You will not need any
options for decompression. For compressing files use gzip.
gunzip FILE.gz
Decompress FILE.gz which will rename it to FILE in the process.
gunzip -c FILE.gz
Decompress FILE.gz and send the decompressed output to standard out-
put, and not to a file.
gzip FILE
Compress FILE, renaming it to FILE.gz, with the default compression ratio.
gzip -9 FILE
Compress FILE with the best compression ratio. (This will be slow)

init is the master resource controller of a SysV-type UNIX system. While testing an RTCore system it is advisable to do so in runlevel 1, which is a single-user mode without networking and with a minimum set of system services.

init 1
Put the computer into runlevel 1. (No networking, single user mode)
init 2
After tests in init 1 ran successfully, bring the box back up to a multiuser
networking system. This need not be runlevel 2, and will vary depending on
which UNIX you are running. Check /etc/inittab to see which runlevel
is the system default runlevel. It should be safe to run it back up to the
runlevel set as default.
init 6
Reboot the system.
init 0
Halt the system.

Many but not all Linux systems have the locate database available, which caches all filenames on the system and makes it easier to locate a specific file.

locate irq.c

List all files on the system that have irq.c in them. (alpha irq.c and
irq.c.rej will also match.)
locate rtlinux | more
If the search is too general, output will be more than a screen. By piping the output into the "more" program, a paged listing is displayed.

GNU make is one of the primary tools of any development under modern UNIXes. Given a makefile, Makefile, or GNUmakefile, which are the default names make will look for, make will build a source tree, resolving dependencies based on the information and macros given in the makefile.

make -f my_makefile
This will run make with the provided my_makefile, if the name isn't one of the default names that GNU make will search for.
make -n
This will instruct make only to report what it would do, but will not
actually process any source files.
make -k
Normally make will terminate on the first fatal error it encounters. With the -k flag make can be forced to continue. This makes sense if multiple independent executables are to be built within a source tree, and one wants to build the rest even if the first fails.
make -p -f /dev/null
Show the database settings that make will apply by default without actu-
ally compiling anything. This will list all implied rules and variable settings.

objdump allows you to view symbol information in object files, such as
kernel modules. It also allows you to disassemble object files. This is helpful
when trying to locate what could be causing system hangs with a module.
The output is not very user friendly, but if short functions were used it should
not be too hard to read. If long functions with many flow control statements
were used, it can be close to unreadable.

Archives ending in .tar (compressed tar archives end in .tar.gz, .tar.bz2, or .tgz) can be unpacked with tar. To make this operation safe, check what is in the archive and where it will be unpacked to first!

tar -tf rtlpro_cd.tar
List the files contained in the archive.
tar -tvf rtlpro_cd.tar
Gives you more details on the files than the above command.
tar -xvf rtlpro_cd.tar
Unpack the rtlpro_cd.tar archive in verbose mode. This will list every file as it is handled.
tar -cvf mycode.tar mycode
This will pack up the content of the directory "mycode" into the archive mycode.tar, naming every file as it is processed.

To get the exact system name of the running kernel, use the uname command. A common problem is that one has the wrong kernel running and runs into "funny" problems this way, such as symbol problems on module load. Running uname should clear up any question of what kernel is active.

uname
Print the system type. (e.g. "Linux")
uname -m
Print the system hardware type. (e.g. ”i586”)
uname -r
Print the kernel release name of the running system. (e.g. 2.4.16-rtl)
uname -a
Print the full system string, dumping all known information about the
running kernel.
Appendix E

Things to Consider

There are limits to RTCore, introduced by the underlying hardware, that in principle cannot be bypassed in software. These limits need to be considered
during a project’s planning stage, or at the latest, when selecting a hardware
platform. Example code provided in the RTLinuxPro package will perform
some basic tests on your system in order to judge its appropriateness for
real-time work, but here are some common stumbling blocks that developers
run into.

E.0.1 System Management Interrupts (SMIs)

Essentially all Pentium-class systems have the capability to use SMIs, but
it has only rarely been done. Some platforms, though, make heavy usage
of SMIs to control peripheral devices like sound cards or VGA controllers.
SMIs are interrupts that can’t be intercepted from software. Consequently,
RTCore will be prevented from operating correctly during SMI execution.
Preventing SMIs from controlling hardware is generally not a problem: Sim-
ply select peripheral devices that don’t require SMIs. This is a simple choice
for almost all ISA/PCI/AGP cards, although it is not necessarily true for
onboard controllers. In rare cases, SMIs have been ”used” to correct design
bugs in the hardware, so make sure to keep away from such hardware when
selecting components for a real-time system. Check with your vendor for


E.0.2 Drivers that have hard coded cli/sti

There are drivers available for Linux which may have hard coded cli/sti (clear
interrupt flag/set interrupt flag), that will cause problems in conjunction
with RTCore. A known example of such a problem is VMWare, which can
cause a significant disruption of a real-time system, up to the point where
it is unusable. To make sure a driver is not using cli/sti, use the command
objdump to check for cli instructions. Good candidates for such hard-coded
cli/sti’s are binary released drivers for Linux. Vendors of such drivers most
likely did not take real-time requirements into account when designing their
drivers. It is very important to perform this check on binary drivers - if you
don’t see delays during normal execution, it is not safe to assume that they
are not there, as that code path may not have been triggered yet.

E.0.3 Power management (APM)

Most laptops and some desktop PCs now have power management hardware
included, which optimizes power consumption by reducing system clock fre-
quency, memory timings and bus frequencies (Probably other things as well).
This has clear implications for real-time systems; if timers change their be-
havior during operations, consequences are at best hard to predict. In gen-
eral, a system that is using power management will not be very good for
real-time operations, unless these effects have been explicitly addressed by
drivers and the core real-time system. If this is not the case, power manage-
ment must be disabled.

E.0.4 Hardware platforms

RTCore is dependent on certain hardware behavior for successful operation.
This might be most obvious for peripheral devices like data acquisition boards
or stepper motor controller boards, but ”standard” hardware dependencies
are often overlooked.
Depending on application demands, hardware platform selection can make
or break the project. It is important to find a platform that can provide the
performance and accuracy you need for your application. With RTLinuxPro, a targeted evaluation is recommended to ensure that the machine can provide appropriate accuracy, followed by a strict analysis of program demands to see if the specifications of both hardware and software can be met.

There may be a lot of flexibility here, depending on need. For a very high
performance application, the range of possible architectures may be limited
to a small handful of target systems. But for others, such as a few low
frequency sampling threads, a much slower system will likely be cheaper and
still provide ample resources.
A prime example of this is the National Semiconductor Geode processor.
While it is x86-compatible, many operations are virtualized on the chip,
meaning that performance may degrade during certain time windows. (Video
and audio are two known problem areas.) While the chip goes into System
Management Mode (SMM) to handle this activity, hardware-induced jitter
may spike as high as 5 milliseconds. For many applications, this is the kiss
of death - but others may be fine with this level of jitter. For these lower
bandwidth applications, the Geode is a cheap x86-compatible solution for
the field, and the jitter is within specification.
It is because of these situations that FSMLabs recommends evaluation
and testing with the RTLinuxPro test suite, followed by hard analysis of
application demands. If the target hardware will suit the application, it may
not matter if there is potentially 5 millisecond jitter - in this case, a Geode
is perfectly suitable. The important part is that requirements are built and understood so that the proper hardware and software configuration can be selected.

E.0.5 Floppy drives

Typical PCs include a floppy drive. For historic reasons, the floppy drive is
able to change the bus speeds, and floppy drivers do CMOS calls to select
the floppy type. The consequence for RTCore is that scheduling jitter can
substantially increase if the floppy is accessed. The simplest solution is not
to have a floppy drive on a real-time system. If a floppy drive is absolutely
necessary, these effects must be taken into account. That is, you must test
your real-time threads while accessing the floppy drive, to ensure that it is
not disturbing real-time operation in an unacceptable manner.

E.0.6 ISA devices

In a PC-based system, compatibility with older hardware is available only
at a relatively high performance penalty. A typical example of this is the
PCI-ISA bridge that can be the dominating cause of worst-case system jitter

in a system. When making the decision of which hardware to select for a

real-time system, careful consideration should be made concerning the ISA bus. If a system can be designed without an ISA bus, that is the preferable choice.
RTCore will not be able to compensate for slow hardware in all cases:
If the bus is controlled by an ISA device, RTCore will have to wait. When
an ISA DMA request occurs, everything is clocked down to the speed of
the ISA bus and waits until the transfer finishes. Thankfully, the ISA bus
is being removed entirely from many modern designs, so unless you have
specific hardware that is ISA-only, the entire issue can be avoided.

E.0.7 DAQ cards

Data acquisition is one of the more common tasks for which one would use an RTCore-based PC system. When designing such a system, it is important
to carefully consider which data acquisition peripherals should be used. De-
pending on project demands, there are a variety of cards offering varying
levels of capabilities.
Depending on the included hardware, some cards will sample data au-
tonomously, buffering into their own internal storage, and will only notify
the system when a large amount of data has been collected. For acquisition
rates that outpace the timing capabilities of the host machine, this can be
beneficial, but it usually comes with some kind of cost.
On the other hand, some cards operate in a polling manner, allowing you
to set up the sampling rate purely from real-time code. This has advantages
in that you can use simpler hardware without internal buffering, but it re-
quires the host machine to be capable of performing the requested sampling
rate. For most applications, the best choice is somewhere in the middle,
allowing some work to be done on the board, and some in RTCore, without
raising costs too much.
Appendix F

System Testing

When selecting a platform for RTCore, the only way to know whether it will
really do the job for you is to test on the actual hardware. While the test
environment does not have to mirror the target environment exactly, the
closer you get to the final system, the more reliable the results will be. In
general, the outcome of these tests provides answers to three essential
questions:

1. Can I run RTCore on this system at all or is it simply not suited?

2. What is the worst case scheduling jitter to be expected in this system?

3. What interrupt response may I expect from my peripheral devices?

This will not eliminate the requirement to evaluate the final system setup
you wish to deploy, but it will minimize the risk of running into
hardware-related problems during project development.

F.1 Running the regression test

The regression test will let you know if RTCore will operate properly on the
selected system. If the regression test fails, the system is either not installed
correctly or is simply not suitable for real-time work; in that case, please
contact FSMLabs.
After you compile and load the updated kernel, change to the rtlinuxpro
directory and issue the following command:


bash scripts/

This will then run a set of tests, which MUST all return the status [ OK
]. If any of the tests fail, contact FSMLabs. If the first run passed
without any errors, it is generally helpful to let the regression test run for a
while as well. To run the test in an infinite loop, issue the following
command, again from the rtlinuxpro directory:

bash scripts/

(This is the normal regression script run in an infinite loop, printing the
number of runs completed as it goes.)

F.1.1 Stress testing

The importance of testing under heavy load cannot be stressed enough. You
need to see how the real-time system behaves in terms of scheduling when
placed under varying loads. Some shift in jitter will occur due to hardware
load, but it should be minimal.
Running the jitter test is easy: change to the rtlinuxpro/measurement
directory, where you will find a 'jitter.rtl' binary. Run it, and it will print
the worst case timings seen so far on each CPU, printing a message only
when a new worst case value is seen.
At this point, the real-time threads are running, and it’s time to place the
machine under load. This can vary greatly by hardware, but here is a basic
start. It is important to put the machine under heavy interrupt, memory,
and CPU load. First, change to the kernel directory and run:

make dep
make clean
make -j 60

And/or on another console, log in and run several instances of:

find / > /dev/null 2>&1 &

This will add to the thrashing of the GPOS VM. Increase the number
of find processes running, preferably staggered in time so that the buffer
cache is cycled through. Add other applications until swapping is induced,

and the system is under heavy load. For SMP machines, it helps to have
more instances running, as each CPU thrashes over the PCI bus. For some
embedded boards, running make on the kernel is not feasible, but a high
number of finds is a good approximation, when done in conjunction with the
next step.
Finally, run a ping flood (ping -f machine) from another machine on
the network, at least over a 100Mbit wire. This, in addition to the disk work,
will put the machine under heavy interrupt load. Feel free to add more work,
as RTCore will handle the load. It is important that you determine what
your hardware is capable of doing with respect to real-time demands.
Many test applications from other vendors do very short tests, either in
time or number of interrupts (some as short as a minute). Due to potential
cache interactions and other factors, it is important that a test machine be
placed under load for a long time, preferably days. FSMLabs performs all
testing under heavy load for a period of at least 48 hours before releasing
any kind of performance numbers.

F.2 Jitter measurement

We just ran the jitter test, but let’s take a closer look at the mechanics of what
we’re after. Scheduling jitter is defined as the difference between the time
that code was scheduled to run and the actual point at which it executes.
Scheduling overhead and hardware latencies contribute to this value, and
while some jitter will nearly always happen, it is important to get a worst
case value for your hardware.
Note that most companies provide worst case numbers in terms of context
switch times. This number is in most cases useless except from a marketing
standpoint. Consider an absolute worst case in the real world, where a thread
needs to execute at time x. Context switch time tells only a small part of the work
that must happen here. First, the timer interrupt needs to occur indicating
that it’s time to work. Then the scheduler needs to be woken in order to
determine who gets executed next. Finally there has to be a context switch
into the context of the thread that should be run.
RTCore is well optimized for these situations. When FSMLabs quotes
worst case numbers, it is the sum total of not just context switch, but all
three factors:

interrupt latency + scheduling overhead + context switch = worst case

The previously run test involves a real-time thread scheduled on each

CPU to be run every 1000 microseconds. At each scheduling point, it cal-
culates the delta of how far off it was from the expected scheduling point.
The code will perform 1000 samples per second, and will push the results to
a handler that may dump results through to the controlling terminal.
In general, the load on the machine will not affect the running of the
real-time code, although high interrupt rates will cause a shift in the worst
case value. As we just covered in F.1.1, the machine should be placed under
heavy load in order to get an accurate worst case value. Once you have
gathered the data you need, kill the userspace application and unload the
real-time module.
Appendix G

Sample programs

Here we’ve collected the source code for all of the examples used in the book.
They are also be provided in the RTLinux r /Pro distribution.

G.1 Hello world

#include <stdio.h>

int main(void)
{
	printf("Hello from the RTL base system\n");
	return 0;
}

G.2 Multithreading
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <rtl_posixio.h>

pthread_t thread;

void *thread_code(void *t)
{
	struct timespec next;
	int count = 0;

	clock_gettime( CLOCK_REALTIME, &next );

	while ( 1 ) {
		timespec_add_ns( &next, 1000*1000 );
		clock_nanosleep( CLOCK_REALTIME, TIMER_ABSTIME,
				 &next, NULL);
		count++;
		if (!(count % 1000))
			printf("woke %d times\n",count);
	}

	return NULL;
}

int main(void)
{
	pthread_create( &thread, NULL, thread_code, (void *)0 );

	rtl_main_wait();

	pthread_cancel( thread );
	pthread_join( thread, NULL );

	return 0;
}

G.3 FIFOs

G.3.1 Real-time component

#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <sys/mman.h>
#include <rtl_posixio.h>

pthread_t thread;
int fd1;

void *thread_code(void *t)
{
	struct timespec next;

	clock_gettime( CLOCK_REALTIME, &next );

	while ( 1 ) {
		timespec_add_ns( &next, 1000*1000*1000 );
		clock_nanosleep( CLOCK_REALTIME, TIMER_ABSTIME,
				 &next, NULL);

		write( fd1, "a message\n", strlen("a message\n"));
	}

	return NULL;
}

int main(void)
{
	mkfifo( "/communicator", 0666);

	fd1 = open( "/communicator", O_WRONLY | O_NONBLOCK );

	ftruncate(fd1, 16<<10);

	pthread_create( &thread, NULL, thread_code, (void *)0 );

	rtl_main_wait();

	pthread_cancel( thread );
	pthread_join( thread, NULL );

	close( fd1 );
	unlink( "/communicator" );

	return 0;
}

G.3.2 Userspace component

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv) {

	int fd;
	char buf[255];
	int len;

	fd = open("/communicator", O_RDONLY);
	while (1) {
		/* block on the FIFO and echo whatever the
		   real-time side writes */
		len = read(fd, buf, sizeof(buf));
		if (len > 0)
			write(1, buf, len);
	}

	return 0;
}

G.4 Semaphores
#include <stdio.h>
#include <pthread.h>
#include <unistd.h>
#include <rtl_posixio.h>

#include <posix/semaphore.h>

pthread_t wait_thread;
pthread_t post_thread;
sem_t sema;

void *wait_code(void *t)
{
	while (1) {
		sem_wait(&sema);
		printf("Waiter woke on a post\n");
	}
	return NULL;
}

void *post_code(void *t)
{
	struct timespec next;

	clock_gettime( CLOCK_REALTIME, &next );

	while ( 1 ) {
		timespec_add_ns( &next, 1000*1000*1000 );
		clock_nanosleep( CLOCK_REALTIME, TIMER_ABSTIME,
				 &next, NULL);

		printf("Posting to the semaphore\n");
		sem_post(&sema);
	}

	return NULL;
}

int main(void)
{
	sem_init(&sema, 1, 0);

	pthread_create( &wait_thread, NULL, wait_code, (void *)0 );
	pthread_create( &post_thread, NULL, post_code, (void *)0 );

	rtl_main_wait();

	pthread_cancel( post_thread );
	pthread_join( post_thread, NULL );

	pthread_cancel( wait_thread );
	pthread_join( wait_thread, NULL );

	sem_destroy(&sema);

	return 0;
}

G.5 Shared Memory

G.5.1 Real-time component
#include <rtl.h>
#include <time.h>
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <sys/mman.h>

#define MMAP_SIZE 5003

pthread_t rthread, wthread;
int rfd, wfd;
unsigned char *raddr, *waddr;

void *writer(void *arg)
{
	struct timespec next;
	struct sched_param p;

	p.sched_priority = 1;
	pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

	waddr = (char*)mmap(0,MMAP_SIZE,PROT_READ|PROT_WRITE,
			    MAP_SHARED,wfd,0);
	if (waddr == MAP_FAILED) {
		printf("mmap failed for writer\n");
		return (void *)-1;
	}

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 1000000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
				&next, NULL);
		/* update the shared area so the readers see
		   changing data */
		waddr[0]++;
		waddr[1]++;
		waddr[2]++;
		waddr[3]++;
	}
}

void *reader(void *arg)
{
	struct timespec next;
	struct sched_param p;

	p.sched_priority = 1;
	pthread_setschedparam(pthread_self(), SCHED_FIFO, &p);

	raddr = (char*)mmap(0,MMAP_SIZE,PROT_READ|PROT_WRITE,
			    MAP_SHARED,rfd,0);
	if (raddr == MAP_FAILED) {
		printf("failed mmap for reader\n");
		return (void *)-1;
	}

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 1000000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
				&next, NULL);
		printf("rtl_reader thread sees "
		       "0x%x, 0x%x, 0x%x, 0x%x\n",
		       raddr[0], raddr[1], raddr[2], raddr[3]);
	}
}

int main(int argc, char **argv) {

	wfd = shm_open("/dev/rtl_mmap_test", RTL_O_CREAT, 0600);
	if (wfd == -1) {
		printf("open failed for write on "
		       "/dev/rtl_mmap_test (%d)\n",errno);
		return -1;
	}

	rfd = shm_open("/dev/rtl_mmap_test", 0, 0);
	if (rfd == -1) {
		printf("open failed for read on "
		       "/dev/rtl_mmap_test (%d)\n",errno);
		return -1;
	}

	ftruncate(wfd,MMAP_SIZE);

	pthread_create(&wthread, NULL, writer, 0);
	pthread_create(&rthread, NULL, reader, 0);

	rtl_main_wait();

	pthread_cancel(wthread);
	pthread_join(wthread, NULL);
	pthread_cancel(rthread);
	pthread_join(rthread, NULL);

	munmap(waddr, MMAP_SIZE);
	munmap(raddr, MMAP_SIZE);

	shm_unlink("/dev/rtl_mmap_test");
	return 0;
}

G.5.2 Userspace application

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>

#define MMAP_SIZE 5003

int main(void)
{
	int fd;
	unsigned char *addr;

	if ((fd=open("/dev/rtl_mmap_test", O_RDWR))<0) {
		printf("open of /dev/rtl_mmap_test failed (%d)\n",errno);
		return -1;
	}

	addr = mmap(0, MMAP_SIZE, PROT_READ, MAP_SHARED, fd, 0);
	if (addr == MAP_FAILED) {
		printf("return was %d\n",errno);
		return -1;
	}

	while (1) {
		printf("userspace: the rtl shared area contains"
		       " : 0x%x, 0x%x, 0x%x, 0x%x\n",
		       addr[0], addr[1], addr[2], addr[3]);
		sleep(1);
	}

	munmap(addr, MMAP_SIZE);

	return 0;
}

G.6 Cancel Handlers

#include <rtl.h>
#include <time.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>

pthread_t thread;
pthread_mutex_t mutex;

void cleanup_handler(void *mutex) {
	pthread_mutex_unlock((pthread_mutex_t *)mutex);
}

void *thread_handler(void *arg)
{
	pthread_cleanup_push(cleanup_handler,&mutex);
	pthread_mutex_lock(&mutex);
	while (1) { usleep(1000000); }
	pthread_cleanup_pop(0);
	pthread_mutex_unlock(&mutex);
	return 0;
}

int main(int argc, char **argv) {

	pthread_mutex_init (&mutex, NULL);
	pthread_create (&thread, NULL, thread_handler, 0);

	rtl_main_wait();

	pthread_cancel (thread);
	pthread_join (thread, NULL);
	pthread_mutex_destroy(&mutex);
	return 0;
}

G.7 Thread API

#include <rtl.h>
#include <time.h>
#include <pthread.h>
#include <stdio.h>

pthread_t thread1, thread2;

void *thread_stack;

void *handler(void *arg)
{
	printf("Thread %d started\n",(int)arg);
	if (arg == 0) { //first thread spawns the second
		pthread_attr_t attr;
		pthread_attr_init(&attr);
		pthread_attr_setstacksize(&attr, 32768);
		pthread_attr_setstackaddr(&attr,thread_stack);
		pthread_create(&thread2,&attr,handler,(void*)1);
	}

	return 0;
}

int main(int argc, char **argv) {

	thread_stack = rtl_gpos_malloc(32768);
	if (!thread_stack)
		return -1;

	pthread_create(&thread1, NULL, handler, (void*)0);

	rtl_main_wait();

	pthread_cancel(thread1);
	pthread_join(thread1, NULL);
	pthread_cancel(thread2);
	pthread_join(thread2, NULL);
	rtl_gpos_free(thread_stack);

	return 0;
}

G.8 One Way queues

#include <rtl.h>
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

pthread_t thread1, thread2;

DEFINE_OWQTYPE(our_queue,32,int,0,-1);
DEFINE_OWQFUNC(our_queue,32,int,0,-1);
our_queue Q;

void *queue_thread(void *arg)
{
	int count = 1;
	struct timespec next;

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 1000000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
				&next, NULL);
		if (our_queue_enq(&Q,count)) {
			printf("warning: queue full\n");
		}
		count++;
	}
}

void *dequeue_thread(void *arg)
{
	int read_count;
	struct timespec next;

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 500000000);
		clock_nanosleep(CLOCK_REALTIME, TIMER_ABSTIME,
				&next, NULL);
		read_count = our_queue_deq(&Q);
		if (read_count) {
			printf("dequeued %d\n",
			       read_count);
		} else {
			printf("queue empty\n");
		}
	}
}

int main(int argc, char **argv) {
	our_queue_init(&Q);
	pthread_create (&thread1, NULL,
			queue_thread, 0);
	pthread_create (&thread2, NULL,
			dequeue_thread, 0);

	rtl_main_wait();

	pthread_cancel (thread1);
	pthread_join (thread1, NULL);
	pthread_cancel (thread2);
	pthread_join (thread2, NULL);
	return 0;
}

G.9 Soft IRQs

#include <rtl.h>
#include <time.h>
#include <stdio.h>
#include <pthread.h>

pthread_t thread;
static int our_soft_irq;

void * start_routine(void *arg)
{
	struct sched_param p;
	struct timespec next;

	p.sched_priority = 1;
	pthread_setschedparam (pthread_self(),
			       SCHED_FIFO, &p);

	clock_gettime(CLOCK_REALTIME, &next);

	while (1) {
		timespec_add_ns(&next, 500000000);
		clock_nanosleep (CLOCK_REALTIME, TIMER_ABSTIME,
				 &next, NULL);
		rtl_global_pend_irq(our_soft_irq);
	}
	return 0;
}

static int soft_irq_count;

void soft_irq_handler(int irq, void *ignore,
		      struct rtl_frame *ignore_frame) {
	soft_irq_count++;
	printf("Received soft IRQ #%d\n",soft_irq_count);
}

int main(int argc, char **argv) {

	soft_irq_count = 0;
	our_soft_irq = rtl_get_soft_irq(soft_irq_handler,
					"Simple SoftIRQ\n");
	if (our_soft_irq == -1)
		return -1;
	pthread_create (&thread, NULL, start_routine, 0);

	rtl_main_wait();

	pthread_cancel (thread);
	pthread_join (thread, NULL);
	rtl_free_soft_irq(our_soft_irq);
	return 0;
}

G.10 PSDD sound speaker driver

/*
 * Interrupt handling in PSDD. Play sounds with PC speaker.
 * Written by Michael Barabanov, 1997-2001
 * Copyright (C) Finite State Machine Labs Inc., 1995-2002
 * All rights reserved
 */

#define RTL_RTC_A 10
#define RTL_RTC_B 11
#define RTL_RTC_C 12
#define RTL_RTC_D 13

#define RTL_RTC_PORT(x) (0x70 + (x))

#define RTL_RTC_WRITE(val, port) do { rtl_outb_p((port),RTL_RTC_PORT(0)); \
	rtl_outb_p((val),RTL_RTC_PORT(1)); } while(0)
#define RTL_RTC_READ(port) ({ rtl_outb_p((port),RTL_RTC_PORT(0)); \
	rtl_inb_p(RTL_RTC_PORT(1)); })

#include <rtl.h>
#include <rtl_pthread.h>
#include <rtl_unistd.h>
#include <rtl_ioctl.h>
#include <rtl_fifo.h>
#include <rtl_time.h>
#include <rtl_signal.h>
#include <rtl_errno.h>
#include <stdio.h>
#include <arch/rtl_io.h>
#include <sys/fcntl.h>

#define FIFO_NO 3
#define RTC_IRQ 8

int fd_fifo;
int fd_irq;
rtl_pthread_t thread;
char save_cmos_A;
char save_cmos_B;

static int filter(int x) {
	static int oldx;
	int ret;

	if (x & 0x80) {
		x = 382 - x;
	}

	ret = x > oldx;
	oldx = x;
	return ret;
}

void *sound_thread(void *param) {

	char data;
	char temp;
	struct rtl_siginfo info;

	while (1) {
		rtl_read(fd_irq, &info, sizeof (info));
		(void) RTL_RTC_READ(RTL_RTC_C); /* clear IRQ */
		rtl_ioctl(fd_irq, RTL_IRQ_ENABLE);

		if (rtl_read(fd_fifo, &data, 1) > 0) {
			data = filter(data);
			temp = rtl_inb(0x61);
			temp &= 0xfc;
			if (data) {
				temp |= 3;
			}
			rtl_outb(temp,0x61);
		}
	}
	return 0;
}

int init_module(void) {

	char ctemp;
	char devname[30];

	sprintf(devname, "/dev/rtf%d", FIFO_NO);
	fd_fifo = rtl_open(devname, O_WRONLY|O_CREAT|O_NONBLOCK);
	if (fd_fifo < 0) {
		rtl_printf("open of %s returned %d; errno = %d\n",
			   devname, fd_fifo, rtl_errno);
		return -1;
	}

	rtl_ioctl (fd_fifo, RTF_SETSIZE, 4000);

	fd_irq = rtl_open("/dev/irq8", O_RDONLY);
	if (fd_irq < 0) {
		rtl_printf("open of /dev/irq8 returned %d; errno = %d\n",
			   fd_irq, rtl_errno);
		rtl_close(fd_fifo);
		return -1;
	}

	rtl_pthread_create (&thread, NULL, sound_thread, NULL);

	/* program the RTC to interrupt at 8192 Hz */
	save_cmos_A = RTL_RTC_READ(RTL_RTC_A);
	save_cmos_B = RTL_RTC_READ(RTL_RTC_B);

	/* 32kHz Time Base, 8192 Hz interrupt frequency */
	RTL_RTC_WRITE(0x23, RTL_RTC_A);

	ctemp = RTL_RTC_READ(RTL_RTC_B);
	ctemp &= 0x8f; /* Clear */
	ctemp |= 0x40; /* Periodic interrupt enable */
	RTL_RTC_WRITE(ctemp, RTL_RTC_B);

	(void) RTL_RTC_READ(RTL_RTC_C);

	return 0;
}

void cleanup_module(void) {

	rtl_pthread_cancel (thread);
	rtl_pthread_join (thread, NULL);

	RTL_RTC_WRITE(save_cmos_A, RTL_RTC_A);
	RTL_RTC_WRITE(save_cmos_B, RTL_RTC_B);
	rtl_close(fd_irq);
	rtl_close(fd_fifo);
}
Appendix H

The RTLinux Whitepaper

This is a fairly old whitepaper on the concepts behind RTLinux, but it is
still fairly accurate and provides a good summary of the general principles
behind its development.
RTLinux Whitepaper
Victor Yodaiken
RTLinux is an operating system in which a small real-time kernel co-
exists with the Posix-like Linux kernel. The intention is to make use of
the sophisticated services and highly optimized average case behavior of a
standard time-shared computer system while still permitting real-time func-
tions to operate in a predictable and low-latency environment. At one time,
real-time operating systems were primitive, simple executives that did little
more than offer a library of routines. But real-time applications now rou-
tinely require access to TCP/IP, graphical display and windowing systems,
file and data base systems, and other services that are neither primitive nor
simple. One solution is to add these non-real-time services to the basic real-
time kernel, as has been done for the venerable VxWorks and, in a different
way, for the QNX microkernel. A second solution is to modify a standard
kernel to make it completely pre-emptable. This is the approach taken by
the developers of RT-IX (Modcomp). RTLinux is based on a third path in
which a simple real-time executive runs a non-real-time kernel as its lowest
priority task, using a virtual machine layer to make the standard kernel fully pre-emptable.
In RTLinux, all interrupts are initially handled by the Real-Time kernel
and are passed to the Linux task only when there are no real-time tasks


to run. To minimize changes in the Linux kernel, it is provided with an

emulation of the interrupt control hardware. Thus, when Linux has disabled
interrupts, the emulation software will queue interrupts that have been passed
on by the Real-Time kernel. Real-time and Linux user tasks communicate
through lock-free queues and shared memory in the current system. From
the application programmer's point of view, the queues look very much like
standard UNIX character devices, accessed via POSIX read/write/open/ioctl
system calls. Shared memory is currently accessed via the POSIX mmap
calls. RTLinux relies on Linux for booting, most device drivers, networking,
file-systems, Linux process control, and for the loadable kernel modules which
are used to make the real-time system extensible and easily modifiable. Real-
time applications consist of real-time tasks that are incorporated in loadable
kernel modules and Linux/UNIX processes that take care of data-logging,
display, network access, and any other functions that are not constrained by
worst case behavior.
In practice, the RTLinux approach has proven to be very successful.
Worst case interrupt latency on a 486/33MHz PC measures well under 30
microseconds, close to the hardware limit. Many applications appear to benefit
from a synergy between the real-time system and the average case optimized
standard operating system. For example, data-acquisition applications are
usually composed of a simple polling or interrupt-driven real-time task that
pipes data through a queue to a Linux process that takes care of logging
and display. In such cases, the I/O buffering and aggregation performed by
Linux provides a high level of average case performance while the real-time
task meets strict worst-case limited deadlines.
RTLinux is both spartan and extensible in accord with two, somewhat
contradictory design premises. The first design premise is that the truly time
constrained components of a real-time application are not compatible with
dynamic resource allocation, complex synchronization, or anything else that
introduces either hard to bound delays or significant overhead. The most
widely used configuration of RTLinux offers primitive tasks with only stati-
cally allocated memory, no address space protection, a simple fixed priority
scheduler with no protection against impossible schedules, hard interrupt dis-
abling and shared memory as the only synchronization primitives between
real-time tasks, and a limited range of operations on the FIFO queues con-
necting real-time tasks to Linux processes. The environment is not really as
austere as all that, however, because the rich collection of services provided
by the non-real-time kernel are easily accessed by Linux user tasks. Non-

real-time components of applications migrate to Linux. One area where we

hope to be able to make particular use of this paradigm is in QOS, where it
seems reasonable to factor applications into hard real-time components that
collect or distribute time sensitive data, and Linux processes or threads that
monitor data rates, negotiate for process time, and adjust algorithms.
The second design premise is that little is known about how real-time
systems should be organized and the operating system should allow for great
flexibility in such things as the characteristics of real-time tasks, communi-
cation, and synchronization. The kernel has been designed with replaceable
modules wherever practical and the spartan environment described in the
previous paragraph is easily improved (or cluttered, depending on one’s point
of view). There are alternative scheduling modules, some contributed by the
user community, to allow for EDF and rate-monotonic scheduling of tasks.
There is a semaphore module and there is active development of a richer set
of system services. Linux makes it possible for these services to be offered by
loadable kernel modules so that the fundamental operation of the real-time
kernel is run-time (although not real-time) reconfigurable. It is possible to
develop a set of tasks under RTLinux, test a system using a EDF schedule,
unload the EDF scheduling module, load a rate monotonic scheduling mod-
ule, and continue the test. It should eventually be possible to use a memory
protected process model, to test different implementations of IPCs, and to
otherwise tinker with the system until the right mix of services is found.[1]

[1] This capability has been introduced with PSDD. The scheduling example is further
defined with the userspace frame scheduler described in detail in the PSDD chapter.