22 May 2022
This course continues a course from the previous semester that provided a basic understanding of how algorithms are executed on computers.
The aim of this course is to show, in the first part, how the additional software layer called the operating system can manage all the known components of a computer in such a way that they can be used by one or more application programs in parallel, without each program having to deal with every possible interaction separately, and, in the second part, how autonomous computers can communicate and interact with each other through fixed or mobile networks.
After successful completion of this module students can
• explain the tasks and functions of operating systems,
• understand and use basic operating system concepts and evaluate their implementations and potential problems,
• classify existing operating systems and assess future developments.
They are also able to
• describe and classify the basic concepts of computer networks,
• name the tasks of the communication layers in a reference model and explain them using the example of local networks and the Internet,
• understand the communication protocols in use there and state their characteristics and limitations.
The content of this course covers the following topics:
• Operating systems
Definition, evolution, tasks, basic concepts
Processes, scheduling, interprocess communication, synchronization, deadlocks, threads
Memory management
File management
Input/output management
Architecture
• Computer networks
History, classification, layer model
Physical layer, transmission media, line coding
Data link layer, MAC sublayer, Ethernet, WLAN
Network layer, routing, IP addresses, IP, ICMP
Transport layer, TCP, UDP, TLS
Application layer, DHCP, DNS, SMTP, POP3, HTTP
Chapter 3
Operating systems
An operating system is a software layer that serves as an intermediary between the user of a computer and the computer hardware. On the one hand, the operating system allows the user to use the hardware resources in a convenient way without having to work at the machine level, which is easy to understand but laborious to program; on the other hand, it ensures the efficient administration of these resources and the proper operation of the computer. According to the German industrial standard for information processing (DIN 44300), the operating system comprises:
The programs of a digital computing system which, together with the characteristics
of this computing system, form the basis of the possible modes of operation of the
digital computing system and which, in particular, control and monitor the execution
of programs.
The first operating system was developed by the General Motors Research Laboratories for the IBM 701 in 1955. It was a batch operating system and was called GM OS. Its further development took place in 1956 together with the aircraft manufacturer North American Aviation for the successor model IBM 704 and was named the GM-NAA Input/Output System. Programs were written in higher programming languages like FORTRAN and ALGOL.
Several programs could be read in together with their complete input data as punch card stacks
via the card reader of the computer and were compiled and executed one after the other. Results
were output via a printer. Control instructions could be formulated with a Job Control Language
(JCL) and inserted into the punch card stack (cf. Figure 3.1).
The computer systems of the following years until 1980 were usually central mainframes with
acquisition costs sometimes in the range of several million US$. In order to utilize them
adequately, new concepts like multiprogramming and time-sharing were developed.
With multiprogramming the jobs were no longer executed strictly one after the other like in
batch mode, but their execution was interleaved. If a program had to wait for the completion of
a triggered input/output operation, for example, the operating system simply switched to another
program. In the main memory (in separate partitions) several programs were loaded at the same
time. Scheduling procedures ensured that the CPU always had a program to execute. An
important representative was the operating system OS/360 of the IBM computer architecture
System/360 from 1964.
In the mid-1960s, batch mode was increasingly replaced by dialog mode. Several users were
connected to the central computer via data display devices (terminals) and processed their jobs
interactively. To ensure a minimum response time for each user, the CPU was assigned to all
jobs in turn for only a short time slice at a time. This gave the impression that the computer was
working on all jobs simultaneously. This was referred to as time-sharing[1]. The concept was
introduced in 1961 with the operating system CTSS (Compatible Time-Sharing System) for
IBM 7094 mainframes and further developed in 1969 by the operating system MULTICS
(Multiplexed Information and Computing Service) for Honeywell 6180 machines.
Besides the mainframes, the class of so-called minicomputers developed in the late 1960s on the
basis of the new IC technology. These computers could be accommodated in a single rack frame
and required only 5% of the purchase costs of a mainframe.
[1] Today the term multitasking is more common. Historically, time-sharing describes the sharing of computing time between several users, while multitasking describes the sharing of computing time between several programs of the same user, e.g. on a personal computer.
Two of the developers of MULTICS, Ken Thompson and Dennis Ritchie, both working at
AT&T's Bell Laboratories, reimplemented a reduced version on the DEC PDP-7 minicomputer.
Since the new operating system initially allowed only two users, each of whom could execute only one job at a time, they called their system UNICS; later the name became Unix. Unix used a file system with a directory structure for the first time. Furthermore, access to local drives and network drives was mapped via the same directory structure (maxim: "Everything is a file").
As an open-source operating system, Unix spread mainly to universities and research institutions.
In the mid-1970s, advances in IC technology led to the emergence of another class of computer: the microcomputer. Microcomputers are compact single-user systems with a microprocessor as CPU. With prices sometimes below 1000 US$ they were affordable for private households, and due to their compactness they fit on a table, which is why they were also called personal computers or desktop computers. One of the first single-chip microprocessors was the 8-bit Intel 8080 from 1974, which was also used in the first commercial microcomputer, the Altair 8800. For this platform Gary Kildall developed the first microcomputer operating system, CP/M (Control Program for Microcomputers), with his newly founded company Digital Research.
Early microcomputers such as the Apple II, the Commodore PET and the Tandy TRS-80
showed the enormous market potential of this class of computer and prompted the industry
leader IBM to also offer a microcomputer, the IBM 5150, which was launched in 1981. The
processor used was the Intel 8088, a low-cost variant of Intel's 16-bit microprocessor 8086. The
IBM PC was developed in a very short time, which is why its operating system, called DOS,
was licensed from the then little-known company Microsoft. Microsoft itself had bought DOS
(Disk Operating System) from Seattle Computer Products, where it was originally developed
under the name 86-DOS by Tim Paterson as a reimplementation of CP/M for 8086 processors.
With the addition of a graphical user interface, DOS became Windows in 1985.
Graphical User Interfaces (GUI) in combination with a mouse as control device were developed
as a concept in the research center Xerox PARC and were already used in the microcomputers
Xerox Star (1981), Apple Lisa (1983) and Apple Macintosh (1984). The operating system of
the Macintosh was originally only called System (plus version number) and introduced
revolutionary concepts at that time, like the recycle bin, the desktop, navigating in the file
system with the help of icons, and the Undo function. It was renamed Mac OS in 1994 with version 7.5.1. After the return of Steve Jobs and the incorporation of his company NeXT into Apple, Mac OS was replaced in 2001 by the former NeXT operating system, which is derived from Unix and has henceforth been called Mac OS X (macOS since 2016).
Unix was meanwhile commercialized by AT&T and marketed as System V from 1983. System V thus competed with the variant BSD (Berkeley Software Distribution), which continued to be made available as open-source software by the University of California, Berkeley. Further derivatives developed from both lines, such as HP-UX (Hewlett-Packard), AIX (IBM) and Solaris (Sun/Oracle) from System V, and Ultrix (DEC) and NeXTStep (NeXT) from BSD.
A legal dispute soon developed between AT&T and Berkeley. Since it was unclear whether BSD would remain free software, Richard Stallman at MIT began developing a new open-source project in 1984 that he called GNU ("GNU's Not Unix"). Although this project produced excellent software (e.g. the GNU C compiler), the development of the central functions of an operating system, the so-called operating system kernel, progressed slowly. In parallel, the Finnish student Linus Torvalds developed a Unix-like operating system kernel, which he brought into the GNU project in 1992. GNU/Linux (commonly abbreviated to Linux) was born and developed into the preferred operating system, especially for servers.
Around the turn of the millennium, mobile operating systems for mobile devices such as PDAs, mobile phones, smartphones and tablets emerged. Early systems were EPOC[2], its successor Symbian OS[3] and BlackBerry OS[4]. With the Apple iPhone, what was initially called iPhone OS, now iOS, a macOS derivative, appeared in 2007. A year later, the HTC Dream, the first smartphone with the Linux-based operating system Android, came on the market. Basically, Android is open-source software coordinated by the Open Handset Alliance led by Google. Because Google has secured trademark rights to both the Android name and its associated logo, and because most Android installations come with pre-installed Google apps that are free but proprietary, Android is perceived as a Google operating system. Android is now the operating system with the most installations worldwide.
In order to properly understand how operating systems work, we will first introduce some basic concepts, which will be revisited and deepened later in the course.
3.1.3.1 Processes
Processes (also known as tasks in some operating systems, or jobs in early systems) are a key concept in all operating systems. A process describes a program in execution. Each process is assigned its own address space and a set of resources. The address space contains the executable program code and all data required for or during program execution. The set of resources includes the computational registers as well as the instruction counter and the stack pointer. It also includes the files opened by the process, the reserved I/O devices, other processes[5] associated with the process, events that the process is waiting for, and other information that is important during execution (process ID, priority, state[6], CPU time already consumed, etc.). All this information together is called the process context.
[2] EPOC was developed in 1989 by Psion for 8086-based organizers. The name is derived from the English word epoch, the beginning of an era.
[3] Symbian emerged in 1998 from the 32-bit version of EPOC for ARM-based smartphones and was further developed by Psion, Nokia, Sony Ericsson and Motorola.
[4] The proprietary BlackBerry OS operating system appeared in 1999 and powered the BlackBerry smartphones with QWERTY keyboards that were popular with business professionals at the time.
[5] A process can generate other processes. One then speaks of a parent process and child process(es).
[6] Possible states are, for example, ready, running, waiting, finished.
Each process needs its own individual (linear) address area, which generally starts at address 0 and ends at a highest address (at most this is address 2^N − 1, if N is the address width). This address area is called the logical address space (or virtual address space). It also includes the stack and the heap (cf. Figure 3.3).
On the other hand there is the physical address space given by the size of the main memory.
Since in general several processes exist in main memory at the same time (including the operating system itself with all its tables and data structures), a dynamic mapping between the logical addresses used in each process and the physical addresses in main memory must be performed. This is the task of memory management.
The main tasks of the operating system include the fair management of all resources and the
protection of concurrent processes from each other, e.g. preventing processes from accidentally
or intentionally accessing main memory areas of the operating system or other processes or
getting in each other's way when using input/output devices.
To perform its tasks, the operating system itself needs unrestricted access to all resources of the computer and to the entire physical address space of the main memory. However, it would be careless to grant such comprehensive permissions to every application process. Operating systems therefore implement a simple protection concept by distinguishing between two working modes:
Only in privileged mode may all machine instructions be executed and every memory address and every device be accessed. This working mode is exclusively reserved for the operating system.
Application processes always run in non-privileged mode. If they require access to operating
system resources or services, such as the creation of a process or the creation of a file, they must
inform the operating system of this by means of a system call (syscall).
It is not mandatory that all operating system services run in privileged mode. System programs
such as editors, compilers, linkers, or utilities can also be run in user mode. The part of the
operating system that is always run in privileged mode is called the operating system kernel.
Today's computer architectures support the two operating modes in hardware: a flag in the processor's status register indicates the current mode. A misused privileged instruction can thus be detected and intercepted by the hardware.
System calls allow application processes to cause the operating system to perform privileged
operations.
With a system call, a change takes place from the application process to the operating system kernel[7], i.e. the executable code, kept resident in main memory, that realizes the privileged functions of the operating system. System calls use software interrupts or traps for this purpose. When a trap is triggered, the running process is interrupted
[7] The kernel is not itself a process in the sense of the definition, since it is not managed but rather manages itself.
and the corresponding trap handler is started, which executes in kernel mode. After completion of this routine, the interrupted process is continued in user mode (see Figure 3.4).
System calls are the convenient way for an application program to send information to or read
information from the hardware, other processes, or the operating system kernel itself. Most
modern operating systems package system calls in library functions, which they make available
via an application programming interface (API).
Apart from the CPU and the main memory, all essential resources are peripheral devices such as
storage devices (hard disks, optical storage devices or flash memory) or other input/output
devices (keyboard and mouse, monitor, network adapter for wired and wireless communication
networks, camera, microphone, loudspeaker, etc.).
Due to the large variety of these devices, it is difficult to provide specific system calls for each
device via a standardized API. Today's operating systems solve this by including an abstraction
layer called the kernel I/O subsystem.
On the one hand, the I/O subsystem provides abstract I/O devices that can be accessed with a uniform set of functions. On the other hand, the special hardware characteristics of the devices are handled by precisely fitting software modules, referred to as device drivers, which operate the interfaces of the device controller as required (cf. Figure 3.5).
In addition, the I/O subsystem performs some generic management tasks, such as scheduling I/O system calls, buffering transmitted data, and handling transmission errors.
3.1.3.6 Files
A file is an abstraction that represents a logically connected collection of data. Typically, a file is stored on a secondary storage medium and is identifiable by a file name.
Files are in turn organized in directories. Directories are files themselves and can in turn be
incorporated into higher-level directories. This creates a directory tree, the root of which is the
root directory.
A root directory is closely related to the concept of a partition: there is exactly one root directory per partition. A partition is a part of a physical or logical[9] storage device and is also called a drive (volume).
The part of the operating system that specifies and handles the organization and storage of files is called the file system. The file system builds on the device management.
A user process cannot create, read, write, or delete files directly. It must request the file system to do this on its behalf by means of a system call. Corresponding calls are e.g. open(), read(), write(), close(), mkdir(), rmdir(), . . .
The services of an operating system are made available at interfaces. There is the interface to
the human user, the user interface, and the interface to the application software, the
programming interface.
User interfaces are also called shells in the context of operating systems because they lie like a shell around the kernel of the operating system. A shell is not part of the kernel and runs in user mode. Shells are therefore also easily replaceable.
User interfaces or shells are common in two basic forms:
• text-oriented or
• graphics-oriented.
[9] For example, a RAID array of multiple storage devices forms a single logical drive. RAID stands for Redundant Array of Inexpensive Disks.
Text-oriented user interfaces (CLIs) are based on the input capabilities of the classic ASCII screen terminals of the 1970s. Examples are the CMD shell and PowerShell for Windows, the Bourne shell[10] (sh) of early Unix, and the Bourne-Again shell (bash), which is used by default in GNU/Linux and macOS.
These are text line-oriented programs in which the user enters an operating system instruction as a command line. This is interpreted by the shell and converted into a system call. A shell also allows several commands to be combined into a command script, which is then read from a file. For this purpose, shells define their own scripting language. The acronym CLI is therefore often expanded as Command Line Interpreter as well as Command Line Interface.
Today, most users prefer graphical user interfaces (GUI) and mouse-driven interactions.
Programs and files are accessed by clicking on icons, and they are handled by visual controls in
dialog boxes, such as buttons, checkboxes, menus, and so on.
The further development of user interfaces is moving towards Voice User Interfaces (VUI) and
Conversational User Interfaces (CUI).
Application programs gain access to operating system functions via an application programming
interface (API). The most commonly used APIs are the Windows API (WinAPI for short) and
POSIX (Portable Operating System Interface) as a standardized API for Unix and all
derivatives.
Both APIs are written in the C programming language and make their functionality available via
numerous program libraries and corresponding header files.
Example 3.1 Three user processes A, B, C have been loaded into the main memory. They
occupy individual address areas, as indicated in Figure 3.6.
The CPU can only execute the instructions of exactly one process at any time. The scenario in
Figure 3.6 indicates that the first instruction of process B is just being started. For simplicity, we
assume that each memory address holds exactly one instruction. Thus, the next instruction to be
executed will be fetched from address 8001, and so on. The behavior of a process can thus be
characterized by recording the sequence of addresses from which the executed instructions are
fetched. The result is called a trace.
In order to be able to run all three processes quasi-parallel, the system switches from one process to the next after a short time (in the example after six instructions), i.e. the CPU is taken away from the running process and allocated to the next process. The switching is done by an operating system component called the dispatcher (process switcher).
[10] Named after its developer, Stephen Bourne, Bell Labs.
Figure 3.7 shows the nested traces of the three processes and the process switcher (grayed out)
for the first 52 instruction cycles. Of course, the process switcher runs through the same code
each time, since it must provide the same functionality each time.
Process A performs an I/O operation with the fourth instruction. This implies commanding an
I/O device and waiting for an interrupt with which the I/O device indicates the completion of the
I/O operation. To continue using the CPU while waiting, the system immediately switches to the
next process. □
Figure 3.7: Nested traces of the three processes and the process switcher [2].
Example 3.1 illustrates that the CPU can execute at most one process at any time, while all other processes have to wait. According to this simple view, processes can be in one of two different states: computing or non-computing.
A simple process model with two process states can be represented by a state diagram as in
Figure 3.8.
From a model perspective, all processes that are in the non-computing state must wait in a
queue. A corresponding queue diagram is shown in Figure 3.9. This diagram also describes the
behavior of the dispatcher: a process that is interrupted
is put back into the queue or leaves the system when it is finished. Then the next process is
fetched from the queue and its execution is started or continued.
The creation of a process, i.e. its entry into the queue, is triggered by events such as the start of a batch job, an interactive user logon, or an existing process issuing a system call to create a child process.
A process terminates and exits the system due to the following possible causes:
• the process ends regularly
Batch processes contain, for example, a HALT statement or an explicit system call for termination. Interactive processes end when the user triggers termination (e.g. by selecting a corresponding menu item or using the Close button).
• an error has occurred
– the process has detected the error itself (e.g. a missing or incorrect input) and has terminated itself as a result,
– the error is caused by the process (e.g. a time limit is exceeded, an unauthorized instruction is executed, memory limits are violated, it divides by zero, etc.) and the operating system terminates the process,
• the process is terminated by another process
All operating systems offer system calls with which a (suitably authorized) process can terminate another process. Well-known applications are the Windows Task Manager or Unix system monitors such as top, htop, atop, vtop, etc.
In some operating systems (e.g. Solaris), the termination of a parent process also
terminates all child processes.
A terminated process frees the CPU for the next process. In the simplest case this is the process
at the head of the queue. We then speak of a FIFO operation strategy. Other operation strategies
can also make sense. The decision as to which process is actually fetched next from the queue,
and at what time, is made by a separate operating system component called the scheduler. We
will discuss scheduling strategies in Section 3.2.5. The dispatcher only performs the pure
switching from one process to the next.
The dispatcher can also forcibly withdraw the CPU from a computing process, e.g. because its time slice has expired, an interrupt request has arrived, or the process has issued a system call and now has to wait for an event that indicates the completion of the initiated operation (e.g. an I/O operation or a process synchronization). In such cases it must be ensured that the interrupted
process can be continued later in the same context, i.e. with the same contents of the CPU
registers. For this purpose, the context of the interrupted process is saved in a data structure
called a process control block.
After the context of the interrupted process has been saved, the context of the next process can
be (re)established. This process is also referred to as a context switch. The terms context switch
and process switch describe the same thing.
The process control block (PCB) of a process is created dynamically and can be referenced via
the process table of the operating system. Figure 3.2 shows a process list and, as an example, a
process image of two processes A and B. The part designated as context corresponds to the process control block.
In general, a PCB contains the following information:
• information about the main memory area allocated to the process, i.e. the initial address and the length of the process image,
• a list of all opened files and all I/O devices assigned to the process,
• the priority of the process,
• if the process is in the blocked state, an identification of the event for which the process is waiting,
• the waiting time accumulated so far,
• the CPU time used so far (for accounting purposes), etc.
In the 2-state process model, it was assumed that all processes are always ready for execution.
However, it can happen that a process is still waiting for the completion of an I/O operation at
the time when it is to be switched to, i.e. it is blocked by the I/O operation that is still pending. In
this case, switching to it is pointless.
To be able to represent such situations, the previous non-computing state is divided into two new states: ready and blocked. A process in the ready state could immediately continue its execution as soon as the CPU is allocated to it. A process in the blocked (or waiting) state is still waiting for a certain event that is absolutely necessary for the further process flow.
In addition, there are two further states that are related to the existence of the PCB. For a
process in the new state, the PCB has already been created, but data and machine code have not
yet been loaded into main memory. A process in the terminated or exit state has already finished its execution, but the PCB still exists, e.g. so that administrative information can still be retrieved.
In accordance with the division of the original non-computing state into two states ready and
blocked, two queues are now also required in this model (Figure 3.12). The next computing
process is always taken from the ready queue. Processes that have to wait for an event are
temporarily placed in the blocked queue.
Alternatively, instead of a single blocked queue, several queues can be implemented, each
representing a specific event (e.g., interruption by a specific I/O device, reception of a message
from a specific communication channel). Thus, when an event occurs, the entire blocked queue
does not have to be searched to find those processes that are waiting for this event.
The ready queue can also be divided into several queues, for example for different priority classes, if this plays a role in scheduling.
In the 5-state model it was assumed that all processes that are in one of the three main states are
also fully loaded in the main memory. However, with a high job load it is possible that all
started processes together have more main memory requirements than the main memory can
provide. In this case, a method of memory management (Section 3.3) is used, which is called
swapping.
During swapping, a process is temporarily swapped out of main memory into secondary memory. As soon as the scheduler selects this process for CPU allocation, it is swapped back into main memory.
Candidates for swapping are preferably processes from the blocked queue, since these processes are dormant anyway. However, processes that are waiting for an event must be handled with care. It is possible that such a process (P1) is waiting for the completion of an I/O operation for which it has specified an address range from its own address space. If P1 is now swapped out and a process P2 is swapped in instead (cf. Figure 3.13), the I/O operation may no longer use the originally specified address range. Processes from the blocked queue may therefore only be swapped out if they are waiting for events that refer to global memory areas managed by the operating system.
For processes in systems that support swapping, two additional states (and queues) are required:
ready/suspend and blocked/suspend. Both states indicate that the respective process is currently
not in main memory, but has been temporarily swapped out.
Swapping is a very expensive operation, since a process image quickly reaches a size of several megabytes, which must be swapped out and later swapped in again by means of I/O operations.
Swapping out a process in its entirety is therefore hardly common today. Instead, parts of the
process are swapped out. Corresponding methods are discussed in Section 3.3 under the heading
Memory Management.
3.2.5 Scheduling
The CPU can only be allocated to one process at a time, while generally several processes are waiting for it at any given moment. Determining a suitable order is called scheduling. The operating system component that performs the scheduling is called the scheduler.
Derived from the 7-state process model, a distinction is made between three scheduler types:
• The long-term scheduler (also called the job scheduler) decides which new processes are loaded into main memory and thus included in the ready queue. It thereby determines the degree of multiprogramming.
Its main goal is to find a balanced mix of CPU-intensive and I/O-intensive jobs. Too high a proportion of CPU-intensive processes would lead to long residence times in the ready queue, while too high a proportion of I/O-intensive processes would fail to utilize the CPU to capacity.
In the case of real-time operating systems, it must also be ensured that new processes are only admitted if all real-time deadlines of existing processes can continue to be met.
• The medium-term scheduler is an essential part of the swapping function. It decides which processes are to be swapped out or swapped in, and when.
For example, processes that have not been active for a long time, that have a low priority, or that occupy a lot of memory are swapped out. They are swapped back in when their memory requirements can be met or the event they were waiting for has occurred.
• The short-term scheduler (also called the CPU scheduler) selects which process from the ready queue is to be executed next. This is necessary each time the process to which the CPU was last allocated (i.e., the process in the computing state) changes to the blocked state.
Furthermore, a CPU scheduler can also withdraw the CPU from the computing process. Schedulers with this capability are called preemptive schedulers. Reasons for the withdrawal are, for example, reaching a predefined time limit (indicated by a timer interrupt) or the arrival of a higher-priority process in the ready queue.
A scheduler makes its selection decisions based on certain scheduling strategies. Such strategies
should take the following criteria into account:
ˆ Fairness: Each process receives a fair share of the total available CPU time.
ˆ Utilization: The CPU and other (expensive) resources (e.g. storage devices) should be
utilized as fully and evenly as possible.
ˆ Throughput: The number of processes whose processing is completed within a specified
time period should be as large as possible.
ˆ Turnaround time, processing time: The time span from the creation of a process to its
completion should be minimal. The turnaround time is composed of the waiting time in all
queues and the service time on the CPU.
ˆ Response time, latency (for interactive processes): The time between a user input and the
response should be minimal.
ˆ Compliance with deadlines (for real-time processes): guaranteed deadlines are met.
There is no optimal scheduling strategy that satisfies all of the above criteria. For example, a
strategy that ensures minimum response times by only running background processes when no
interactive user is working on the computer fails the criterion of minimum processing time and
disadvantages background processes. When selecting a scheduling strategy, a compromise
between the criteria must always be found.
Before going into detail about different scheduling strategies, two terms should be introduced
first.
The time span during which a process is in the computing state at a stretch is called CPU burst.
Scheduling strategies assume that it is possible to estimate how long the CPU bursts of a process
last. In general, the scheduler estimates the length of the next burst based on the previous burst
lengths.
A non-preemptive (non-displacing) scheduling strategy allows a process to execute its CPU
burst in full until it blocks or terminates. A preemptive (displacing) scheduling strategy takes
the CPU away from the computing process as soon as, for example, a time slice has expired or a
higher-priority process becomes ready.
Let us first consider three non-preemptive scheduling strategies.
Example 3.2 We consider three processes that enter the ready queue one after the other at time
0. Their CPU bursts have the following lengths:

Process   Burst length
P1        24
P2        3
P3        3
If the three processes had been arranged in a different order, e.g. P1 last, the flow chart would be
as follows:
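The FCFS waiting times for Example 3.2 can be checked with a small sketch (all processes are assumed to arrive at time 0; `fcfs_waiting_times` is an illustrative helper, not from the text):

```python
def fcfs_waiting_times(bursts):
    """Waiting time of each process under FCFS, given arrival at time 0."""
    waits, elapsed = [], 0
    for burst in bursts:
        waits.append(elapsed)   # a process waits until all earlier ones finish
        elapsed += burst
    return waits

# Order P1, P2, P3 (bursts 24, 3, 3):
print(fcfs_waiting_times([24, 3, 3]))   # [0, 24, 27] -> mean 17
# Order P2, P3, P1 (long process last):
print(fcfs_waiting_times([3, 3, 24]))   # [0, 3, 6] -> mean 3
```

The comparison shows how strongly the mean waiting time under FCFS depends on the arrival order (the convoy effect behind the long burst of P1).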
Example 3.3 In this example, we consider four processes that enter at time 0 in the
sequence of their indices. Their respective next CPU bursts have the following lengths:

Process   Burst length
P1        6
P2        8
P3        7
P4        3
With the FCFS strategy, the mean waiting time for this example would be 10.25.
The SJF algorithm has been shown to be optimal in terms of average waiting time: moving a
short process before a long process reduces the waiting time of the short process more than it
increases the waiting time of the long process. Consequently, the average waiting time is
reduced.
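For Example 3.3, this optimality can be verified with a sketch (illustrative helper, all arrivals at time 0):

```python
def sjf_mean_wait(bursts):
    """Mean waiting time under non-preemptive SJF, given arrival at time 0."""
    waits, elapsed = [], 0
    for burst in sorted(bursts):    # shortest job first
        waits.append(elapsed)
        elapsed += burst
    return sum(waits) / len(waits)

# Bursts of P1..P4 from Example 3.3:
print(sjf_mean_wait([6, 8, 7, 3]))   # 7.0, compared with 10.25 under FCFS
```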
The particular difficulty is to estimate the length of the next CPU burst as accurately as possible.
As mentioned above, an exact prediction is not possible. An approximate approach is to use the
exponentially smoothed average of all previous CPU bursts of the process as an estimate: if tn is
the length of the last CPU burst, then the new estimate τn+1 is recursively defined as

τn+1 = α · tn + (1 − α) · τn

with a weight 0 < α < 1. Often α = 1/2 is chosen.
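The smoothing rule can be sketched directly (α = 1/2 as suggested above; the initial estimate of 10 is an assumption for illustration):

```python
def next_burst_estimate(prev_estimate, last_burst, alpha=0.5):
    """Exponential smoothing: tau_{n+1} = alpha * t_n + (1 - alpha) * tau_n."""
    return alpha * last_burst + (1 - alpha) * prev_estimate

tau = 10                      # assumed initial estimate tau_0
for t in [6, 4, 6]:           # observed burst lengths t_0, t_1, t_2
    tau = next_burst_estimate(tau, t)
print(tau)                    # 6.0
```

Older bursts enter the estimate with geometrically decreasing weight (1 − α)^k, so the estimator adapts to recent behavior.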
A disadvantage of this strategy becomes obvious if a new process with a short burst time is
added regularly. If in example 3.3 every 3 time units a new process with a burst time of 3 enters
the queue, the processes with longer burst times would starve.
Priority (non-preemptive)
In priority scheduling, processes are divided into different priority classes. The CPU allocation
is based on the priority class: The higher the priority, the sooner a process is selected.
Priorities can be specified internally, i.e. by the operating system itself, or externally. Internal
priorities refer, for example, to the main memory requirement, the secondary memory
requirement, the number of opened files or the time criticality. The SJF strategy can also be
interpreted as a priority strategy: The shorter the burst time of a process, the higher its priority.
External priorities tend to have commercial or political backgrounds. Thus, processes of paying
users or of key departments sometimes get higher priority.
Furthermore, priorities can be assigned statically or dynamically. A static assignment bears the
risk (as with SJF) that processes with low priority can starve. Processes with dynamic priority
change their priority over time. For example, the priority could be increased in proportion to the
accumulated waiting time or decreased in proportion to the CPU time consumed.
Example 3.4 Let there be five processes with different priorities, which occur at time 0 in the
order of their indices. A small priority number corresponds to a high priority.
The priority-driven process would be planned according to the following Gantt chart:
A general problem with non-preemptive scheduling is that a process can occupy the CPU for as
long as it likes. This could lead to a computer hanging in the event of program errors or
producing very long response times in dialog mode. Real-time operation is not possible at all.
Such problems are avoided with preemptive scheduling methods. Preemptive algorithms make it
possible to displace processes in order to achieve more fairness or to further reduce the mean
waiting time or the mean response time.
Example 3.5 Consider the same three processes as in Example 3.2 and assume a time
quantum of q = 4.
The flow chart according to the RR algorithm is as follows:
The waiting time for process P1 is (0 + 6 + 0 + 0 + 0) = 6 time units, for P2 4 time units and
for P3 7 time units. The average waiting time results in

(6 + 4 + 7) / 3 = 17/3 ≈ 5.67.
When parameterizing RR methods, the choice of the time quantum q proves to be quite
critical. If q is chosen too small, then very many context switches are necessary. If q is
chosen too large, then the system behavior approaches the FCFS procedure. A rule of
thumb says that 80% of all CPU bursts should be shorter than the time quantum.
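The RR schedule of Example 3.5 can be replayed with a small sketch (`rr_waiting_times` is an illustrative helper; all processes are assumed to arrive at time 0):

```python
from collections import deque

def rr_waiting_times(bursts, q):
    """Waiting time per process under round robin with time quantum q."""
    remaining = list(bursts)
    finish = [0] * len(bursts)
    ready, time = deque(range(len(bursts))), 0
    while ready:
        i = ready.popleft()
        slice_ = min(q, remaining[i])       # run for one quantum at most
        time += slice_
        remaining[i] -= slice_
        if remaining[i] > 0:
            ready.append(i)                 # back to the end of the ready queue
        else:
            finish[i] = time
    # waiting time = turnaround time - burst time
    return [finish[i] - bursts[i] for i in range(len(bursts))]

print(rr_waiting_times([24, 3, 3], q=4))    # [6, 4, 7] -> mean 17/3
```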
Non-preemptive SJF scheduling would result in a mean waiting time of 7.75 time units.
Priority (preemptive)
In preemptive priority scheduling, when a process enters the ready queue, its priority is
compared with the priority of the process currently being executed by the CPU. If its
priority is higher, it preempts the computing process. As long as no new process arrives, the
algorithm behaves as in non-preemptive priority scheduling.
Example 3.7 Suppose we have four processes that occur in the following manner (a lower
priority number corresponds to a higher priority):
The risk remains that with static priorities, processes with lower priority can starve if
processes with higher priority continually arrive.
A variant of priority scheduling, known as multilevel queue (MLQ) scheduling, manages the
different priority classes in separate queues instead of using different priority numbers.
In addition, there is a higher-level scheduling procedure between the queues. In the simplest
case, this could be preemptive priority scheduling: processes in lower-level queues only come
into play when all higher-level queues are empty; if a new process arrives in a higher-level
queue, the CPU is taken away from them. In order to prevent starvation in subordinate queues,
a time-slice procedure can be used instead, which allocates time slices to the different queues
whose length is proportional to the degree of priority, e.g. 50% for system processes, 40% for
foreground processes and 10% for background processes.
The priority classes are statically assigned to the processes in the MLQ method. As a final
scheduling method, a variant known as the multilevel feedback queue is considered, in which
the processes are no longer permanently assigned to a queue, but are allowed to switch
between queues.
Processes in the lowest queue will starve if new processes with short burst lengths arrive
continuously.
Most common operating systems, including Windows and Linux, use scheduling algorithms that
essentially follow preemptive priority strategies. The size of the time quantum that controls
the displacement depends on the priority class.
Processes that are executed in parallel in the operating system are rarely independent of each
other, but often interact with each other. Two basic patterns of interaction can be distinguished:
Coordination and Cooperation.
Coordination is about ensuring that two or more processes do not interfere with each other when
accessing the same data or resources. This is referred to as process synchronization. We discuss
process synchronization in Section 3.2.7.
Cooperation is the exchange of information between two or more processes. This is called
inter-process communication (IPC).
There are two basic models of interprocess communication: shared memory and message passing.
In principle, an operating system strictly ensures that processes never read or write beyond the
memory area exclusively allocated to them. In order to enable communication via shared
memory areas, processes communicating with each other must each map a section of their
address space to the same physical memory area and register its shared use with the operating
system.
When a process writes data to a position in this area, the data is immediately available to all
other processes that share this area. The data does not need to be recopied. The data is accessed
like a regular memory access, i.e. without the detour via system calls.
Shared memory areas are persistent: their names and contents are retained until all processes that
have registered their access have also deregistered.
When using shared memory areas, the processes themselves are responsible for coordinating
their access. A typical source of errors are race conditions, which are difficult to detect during
testing because they occur non-deterministically.
Example 3.8 Two concurrent processes both want to increment the value of a shared variable.
Assume that the value of the variable is initially 1. After both processes have each incremented
the value, the expected new value is 3.
If both processes are executed one after the other, this result is also achieved:
However, it is by no means certain that such a sequential execution actually takes place.
Preemptive scheduling can, for example, also lead to the following sequence:
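The lost update can be replayed deterministically by splitting the increment into its load and store steps (a Python sketch with an explicit preemption point; the names are illustrative):

```python
def increment_steps(shared):
    """Split 'shared += 1' into its machine-level steps: load, then store."""
    local = shared["value"]        # load the current value into a "register"
    yield                          # <- a preemption may happen here
    shared["value"] = local + 1    # add 1 and store the result back

shared = {"value": 1}
p1, p2 = increment_steps(shared), increment_steps(shared)
next(p1)   # P1 loads 1 and is preempted
next(p2)   # P2 also loads the old value 1
for p in (p1, p2):
    try:
        next(p)                    # both store 2: one increment is lost
    except StopIteration:
        pass
print(shared["value"])             # 2 instead of the expected 3
```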
Such conflicts can be avoided by using methods for process synchronization (see Section 3.2.7).
When working with shared memory areas, the use of such methods is mandatory.
3.2.6.2 Messaging
ˆ send(B, message) causes a message to be sent from the private memory area message to the
receiver process B,
ˆ receive(A, message) takes a message from the sender process A into the private memory area
message.
Pipes have a strict FIFO behavior: messages (or bytes) sent one after the other are also received
in the same order. Message queues are more flexible here. The kernel buffer is implemented as a
linked list of messages (cf. Figure 3.20). Several processes may send messages and several
processes may read messages. New messages are appended to the end of the message queue.
The reading can take place in a different order, depending on which message type or key is
required.
In general, the system calls send() and receive() can be executed either blocking (synchronous)
or non-blocking (asynchronous):
ˆ In the case of blocking (synchronous) sending, the sending process is blocked until the
message is received by the receiving process or by the message queue.
Various combinations of send and receive are possible. If both calls are executed blocking, one
speaks of a rendezvous of the communicating processes. Most applications choose the
combination of non-blocking send and blocking receive.
Both basic models of interprocess communication are common in operating systems and most
systems implement both. Message passing is less error-prone and easier to implement, but is
only suitable for exchanging small amounts of data. Shared memory areas allow much faster
communication because system calls are only needed to set up the shared memory area, not for
each individual communication operation. However, they require carefully programmed process
synchronization.
The model of message passing can even be extended to processes that run on different
computers connected to each other via a network. In this case12 the messages are not
exchanged via a message queue, but via sockets, the connection end points of a communication
channel. Unlike message queues, sockets do not store data: if a receiving process does not read
the data arriving via a socket, it is lost.
If concurrent processes want to access the same data or resources, this must be done in a
coordinated manner in order to avoid data inconsistencies or deadlock situations. The
temporal coordination of several concurrent processes is called process synchronization.
12
Data exchange via sockets is particularly common for network connections that use the Internet transport
protocols. In principle, however, processes that are located on the same computer may also communicate with
each other via sockets.
In Example 3.8, we already learned about a problem that can arise from a race condition when
accessing shared data. Solutions for such problem cases use the notion of critical section.
A critical section is a section of code in which shared data or resources are accessed. It must be
ensured that a process is never displaced when it is in a critical section.
In Example 3.8, it is easy to see that the block of three instructions represents a critical section
for each of the two processes. The correct result is achieved when the execution of the critical
sections is not interrupted, but the entire critical section is executed as an indivisible unit.
Ensuring this protection is one of the most important aspects of kernel and system
programming. Operating systems use various approaches for this purpose. All of them have in
common that they require a process to obtain permission before entering its critical section and
to indicate this when leaving the critical section. An entry and an exit section are responsible
for these operations.
The general structure of a typical process with a critical section can therefore be represented as
follows:
while (true) do {
    ...
    entry_section
    critical_section
    exit_section
    ...
}
When dealing with critical sections in concurrent processes, the following four criteria should
be considered13:
1. Mutual exclusion: No two processes may be in their critical sections at the same time.
2. Progress: If no process is in a critical section, but there are processes that want to enter
their critical section, at least one of these processes must be able to do so.
3. Limited waiting, also fairness: If a process has announced that it wishes to enter its critical
section, it must be able to do so within a finite time.
4. Universality: No assumptions must be made about the number of processes, number of
CPUs, scheduling methods, execution speeds, or other external factors.
In order to approach the problem of dealing with critical sections, we first consider only two
processes (in contrast to what criterion 4 requires).
13
after Edsger W. Dijkstra, 1965
Example 3.9 The two processes are called P0 and P1. To ensure mutual exclusion of their
critical sections, we provide for a strictly alternating order.
The alternation can be realized by a common variable turn, which takes either the value 0 or 1,
indicating which process is allowed to enter its critical section.
The program code for the two processes then looks like this:
// Process P0
while (true) do {
    while (turn != 0) do {}    // busy waiting
    critical_section
    turn = 1
    ...
}

// Process P1
while (true) do {
    while (turn != 1) do {}    // busy waiting
    critical_section
    turn = 0
    ...
}
In their entry sections, the processes perform a busy waiting until it is their turn. In their exit
sections, they ensure that it is the turn of the other process.
This solution obviously satisfies criterion 1. However, a problem occurs when one of the two
processes terminates. The other process can then enter its critical section at most once more.
After that, it hangs in a waiting loop. Criterion 2 is not fulfilled.
A second solution modifies the idea of strict alternation in that each process now gives way to
the other process before its own entry into the critical section and only enters itself if the other
process does not want to. In this way, each process could also enter its critical section several
times in succession.
To realize this idea, a shared Boolean vector flag[2] is set up to indicate whether process i
currently has an entry request.
The following program code describes the modified process behavior.
Initially flag[0] = flag[1] = false is set.
Now criterion 2 is fulfilled. However, it can now happen that both processes announce their
entry request at the same time, i.e. both set their flag to true. Thereupon both processes
wait endlessly. Criterion 3 is not fulfilled.
The third proposed solution fixes this problem situation, in which both processes want to give
way to each other, by introducing a tie-breaking rule that determines whose turn it is. This can
be achieved by combining the first two proposed solutions:
This proposed solution is known as Peterson's algorithm. It satisfies the first three
criteria:
The criterion of universality is not fully met by Peterson's algorithm. The algorithm is limited
to only two processes and assumes a single CPU. Otherwise, the algorithm represents a
software solution that only requires the hardware to execute the operations LOAD
and STORE (the reading and writing of variables in main memory) atomically, i.e. without
their execution being interrupted.
Implicitly, the algorithm assumes in-order execution of the operations, which modern
processors often do only on explicit instruction. The fact that an out-of-order execution can pose
problems is well illustrated by the second solution in Example 3.9, where the two instructions in
the entry section of each process are swapped in their order.
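As a sketch (not the lecture's own listing), Peterson's entry and exit sections can be tried out in Python; under CPython's interpreter the simple loads and stores used here behave effectively atomically and in order, which stands in for the LOAD/STORE atomicity assumed above:

```python
import threading

flag = [False, False]   # flag[i]: process i requests entry
turn = 0                # tie-breaker: whose turn it is to wait
counter = 0             # shared data protected by the critical section

def worker(i, rounds):
    global turn, counter
    j = 1 - i
    for _ in range(rounds):
        flag[i] = True               # announce entry request
        turn = j                     # give way to the other process
        while flag[j] and turn == j:
            pass                     # busy waiting (entry section)
        counter += 1                 # critical section
        flag[i] = False              # exit section

threads = [threading.Thread(target=worker, args=(i, 5000)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)   # 10000 - no increment is lost
```

On real multiprocessors, this only works with additional memory barriers, precisely because of the out-of-order execution discussed above.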
To simplify programming, modern computers offer special atomic hardware operations (such as
test-and-set), from which locks can be built. A lock that is acquired by busy waiting is called a
spinlock.
The spin in the name indicates that spinlocks work with busy waiting, i.e. they consume CPU
time while waiting. This is justified if the waiting time is on average shorter than a context
switch. Otherwise, it makes more sense to temporarily put waiting processes into the blocked
state. This is done by a synchronization object called a semaphore.
3.2.7.3 Semaphore
A semaphore S is a data structure that (apart from initialization) is only accessed via two
standard atomic methods: wait() and signal(). The data structure consists of an integer cnt and
a queue queue.
The definitions of wait and signal17 are as follows:
wait(S) {
    S.cnt--
    if (S.cnt < 0) then {
        block process and put in S.queue
    }
}

signal(S) {
    S.cnt++
    if (S.cnt ≤ 0) then {
        get process from S.queue and unblock
    }
}
Example 3.10 Two or more cyclic processes use a common buffer with a fixed length N. At
least one process behaves like a producer, i.e. it writes data elements to the buffer. At least one
process behaves like a consumer, i.e. it reads data elements from the buffer.
In this scenario, producer processes must block if they want to write to a full buffer. Consumer
processes must block if they want to read from an empty buffer. This is best controlled with two
semaphores: a first semaphore full counts the number of occupied buffer locations (initialized
with 0), a second semaphore empty counts the number of free buffer locations (initialized with
N).
After a producer process has produced a new data element, it first decrements empty, which
checks whether any buffer space is available at all, and puts the process into the wait state if this
is not the case. Otherwise, it writes the data element to the buffer and increments full. If processes
17
Dijkstra originally called these functions P and V, from the Dutch probeer te verlagen (try to
lower) and verhogen (increase).
were waiting for a new data element, it releases a consumer process from the wait state.
A consumer process operates accordingly.
The following process descriptions map this behavior:
producer() {
    while (true) do {
        produce(item)
        wait(empty)
        put_buffer(item)
        signal(full)
    }
}

consumer() {
    while (true) do {
        wait(full)
        get_buffer(item)
        signal(empty)
        consume(item)
    }
}
This solution works correctly when there is only one producer and one consumer.
If there are several producers, however, a race condition may occur: the function put_buffer()
must be implemented in such a way that it first determines the next free buffer
location and then writes the passed data element into it. Concurrently executing producer
processes could now each identify the same buffer location and thus overwrite each other's
data.
Accordingly, in the case of several consumers, the situation can arise in which different
consumer processes identify the same buffer location and then consume the same data
element twice.
Writing to the buffer or reading from the buffer thus represent critical sections that may only
be executed under mutual exclusion. This can be achieved by a semaphore mutex, which is
initially assigned the value 1. At the beginning of a critical section it is decremented with the
wait function. Thus, exactly one single process can enter the critical section. When leaving
the critical section, the value of mutex is incremented again with the signal function. If its
value was negative, a process from the set of blocked processes is released again.
The solution for multiple producers and consumers is as follows:
producer() {
    while (true) do {
        produce(item)
        wait(empty)
        wait(mutex)
        put_buffer(item)
        signal(mutex)
        signal(full)
    }
}

consumer() {
    while (true) do {
        wait(full)
        wait(mutex)
        get_buffer(item)
        signal(mutex)
        signal(empty)
        consume(item)
    }
}
Finally, it is important to note that the wait operations on the semaphores full
and empty are placed outside the critical sections. If a process blocked within its critical
section, because it waits for data to be consumed when the buffer is full or for new
data to be produced when the buffer is empty, it would never reach the end of its critical
section. It would thus prevent other processes from entering their critical sections and doing
exactly what it is waiting for. This situation is called a deadlock.
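Assuming Python's threading.Semaphore as a stand-in for the semaphores (acquire/release corresponding to wait/signal), the multi-producer solution can be sketched as follows; put_buffer/get_buffer from above become deque operations, and the finite loops replace the infinite ones so the sketch terminates:

```python
import threading
from collections import deque

N = 4
buffer = deque()
empty = threading.Semaphore(N)   # counts free buffer locations
full = threading.Semaphore(0)    # counts occupied buffer locations
mutex = threading.Semaphore(1)   # mutual exclusion for buffer access
consumed = []

def producer(items):
    for item in items:
        empty.acquire()                        # wait(empty)
        mutex.acquire()                        # wait(mutex)
        buffer.append(item)                    # put_buffer(item)
        mutex.release()                        # signal(mutex)
        full.release()                         # signal(full)

def consumer(n):
    for _ in range(n):
        full.acquire()                         # wait(full)
        mutex.acquire()                        # wait(mutex)
        consumed.append(buffer.popleft())      # get_buffer(item)
        mutex.release()                        # signal(mutex)
        empty.release()                        # signal(empty)

producers = [threading.Thread(target=producer, args=(range(k * 10, k * 10 + 10),))
             for k in range(2)]
cons = threading.Thread(target=consumer, args=(20,))
for t in producers + [cons]:
    t.start()
for t in producers + [cons]:
    t.join()
print(sorted(consumed) == list(range(20)))   # True: nothing lost or duplicated
```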
3.2.7.4 Deadlocks
Two or more processes are deadlocked if each of these processes is waiting for an event
that only another process in that set can cause.
A deadlock is not to be confused with starvation, where a process is continually denied the
resources necessary to do its work.
A classic problem that illustrates deadlocks is Dijkstra's dining philosophers problem (The
dining philosophers problem, 1965).
Example 3.11 Five philosophers spend their lives thinking and eating. The philosophers sit
around a round table with a bowl of rice in the middle. The table is set with five individual
chopsticks. The philosophers do not interact with their neighbors. Occasionally they try to pick
up two chopsticks (one at a time) to eat with them. They only reach the chopsticks to the left
and right of their plate. Of course, they need both chopsticks to eat. When they are full, they put
the chopsticks back on the table.
The problem is to specify a process for each philosopher that leads to a deadlock-free situation
in which no philosopher starves.
A first proposed solution is to represent each of the N= 5 chopsticks as a semaphore: semaphore
chopstick[N].
A philosopher tries to grab a chopstick by performing a wait operation on the corresponding
semaphore. He releases his chopstick by performing the signal operation.
Then the following process descriptions map the behavior of philosophers:
philosopher(i) {
while (true) do {
think()
wait(chopstick[i]) // grab left chopstick
wait(chopstick[(i+1)%N]) // grab right chopstick
eat()
signal(chopstick[i]) // release left chopstick
signal(chopstick[(i+1)%N]) // release right chopstick
}
}
In this realization, however, a problem arises: if all five philosophers become hungry at the
same time and reach for their left chopstick, then they all wait, with this one chopstick in their
hands, for the right chopstick. This creates a deadlock.
One can now discuss various solutions to this problem. For example, all even-numbered
philosophers could be made right-handed, i.e. they pick up their right chopstick first. This is
achieved by swapping the two wait calls. However, the philosophers then no longer all behave
identically.
One could change the behavior in such a way that after picking up the left chopstick, it is
checked whether the right chopstick is available. If it is not, the philosopher releases his left
chopstick, waits a certain time, and then repeats the whole procedure. If the waiting time is
fixed, this solution fails just as the original one did when all five philosophers become hungry
at the same time. No deadlock occurs, because all processes remain active, but the philosophers
still starve. This situation is called a livelock.
If the waiting time is chosen randomly, the solution would work practically. In safety-critical
applications, however, one would also want to exclude the unlikely case that several processes
generate the same random number sequence.
Finally, as in Example 3.10, one could consider eating as a critical section and use a semaphore
to ensure that a philosopher must first check that no other philosopher is eating before reaching
for a chopstick himself, and, after eating and putting back his two chopsticks, signal that
he is now done. From a theoretical point of view, this solution is free of deadlock and starvation.
However, it has a performance drawback: it allows only one philosopher to eat at a time. With
five chopsticks available, it should be possible for two philosophers to eat at the same time.
So it is better to ensure that never two neighboring philosophers eat at the same time. For this
purpose, a state vector state is introduced, which indicates for each philosopher whether he is
currently in the state THINK, HUNGRY or EAT. If a philosopher is in the state HUNGRY, it must wait if at
least one of its two neighbors is in the state EAT. Only if both neighbours leave this state, his
waiting ends and he may enter the state EAT himself. The waiting is realized by a semaphore.
For this purpose, a semaphore phil[i] is assigned to each philosopher or process.
The two subroutines acquire_chopsticks() and release_chopsticks() map the required
operations.
philosopher(i) {
while (true) do {
think()
acquire_chopsticks(i)
eat()
release_chopsticks(i)
}
}
acquire_chopsticks sets the state of philosopher i to HUNGRY and checks whether neither of
its neighbors is in state EAT. If this is the case, its own state may be set to EAT immediately.
Otherwise, further processing blocks. To avoid race conditions, each change of state is to be
treated as a critical section: at any time, only a single process may change the state vector.
acquire_chopsticks(i)
{ wait(mutex)
state[i] = HUNGRY
test(i)
signal(mutex)
wait(phil[i])
}
The checking is done in a subroutine test(). If the check condition is met, i.e. no wait is
required, the semaphore phil[i] is incremented. The wait operation in acquire_chopsticks()
decrements the semaphore again, but does not block. If, on the other hand, the check condition
is not met, no signal is sent and the process blocks at the wait operation.
test(i) {
    if (state[i] == HUNGRY && state[(i+N-1)%N] != EAT
            && state[(i+1)%N] != EAT) then {
        state[i] = EAT
        signal(phil[i])
    }
}
At some point, a blocked process must also be unblocked, i.e. a signal operation must be
performed to unblock the wait. This must happen at the latest when neither of the two neighbors
is eating anymore.
It is therefore a good idea to do this as part of the subroutine release_chopsticks by simply
executing the test function for both direct neighbors. This function checks first the left, then
the right neighbour whether it is hungry and whether the second chopstick is available (for one
chopstick each this is certainly true in this case). If these conditions are fulfilled, the release is
signalled.
release_chopsticks(i)
{ wait(mutex)
state[i] = THINK
test((i+N-1)%N)
test((i+1)%N)
signal(mutex)
}
This solution is deadlock-free and allows maximum parallelism for an arbitrary number N of
philosophers or processes.
In complex, concurrent systems, deadlocks are difficult to avoid. This difficulty is due to the fact
that wait and signal operations are often scattered throughout the processes in such a way that it
is not easy to keep track of their effect. Due to their dynamic nature, deadlocks are often not
reproducible during testing.
In the literature, numerous procedures are proposed for handling deadlocks, which can be
divided into prevention, avoidance and recovery. We do not go into these procedures in detail
here; many are very complex and therefore hardly applicable in practice.
3.2.8 Threads
In the same way that complex tasks can be subdivided into parallel processes, a single process
can be subdivided into several execution threads that can be executed in parallel. Such an
execution thread is called a thread.
The essential difference between a process and a thread is that a thread is executed in the
context of a process and uses the same resources that the kernel has allocated to the process, in
particular the same main memory area (address space). A thread is defined only by its own
register set, instruction counter, and stack area. It is therefore also called a lightweight process.
The threads themselves are responsible for synchronizing access to shared resources.
The user-level threads are implemented by the application programmer and run in the user's
memory space. The kernel has no knowledge of whether a process uses multiple threads or not.
To manage its threads, each process uses its own private thread control block, which is similar
to a process control block. For creating, terminating and scheduling its threads, the process uses
appropriate functions from thread libraries18. This makes it possible to use threads even on
operating systems that do not support multithreading on the hardware side.
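For illustration, a minimal sketch with Python's threading module (which on most systems maps each Python thread to a kernel thread, i.e. model (b) below): all threads of the process read and write the same address space, so they can fill one shared list without any copying or system-call-based communication.

```python
import threading

results = []   # a single object in the process's shared address space

def worker(name):
    # every thread of the process sees and may modify the same objects
    results.append(name)

threads = [threading.Thread(target=worker, args=(f"t{i}",)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))   # ['t0', 't1', 't2']
```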
This brings us to kernel-level threads. Such threads are processed quasi-parallel or truly parallel
by the hardware, and the operating system kernel manages them similarly to processes, except
that they run in the same process context.
Ultimately, there must be a relationship between user threads and kernel threads. The different
possibilities are shown in Figure 3.23.
Besides the purely software-based model (a) with thread libraries, model (b) is the obvious
model if hardware support is available. This model allows maximum parallelism, but the
developer must be careful not to create too many threads within an application in order not to
impair the performance of the overall system too much (some operating systems limit the
number of allowed threads for this reason). A mixed solution is model (c). On the user side, the
number of threads is not limited. A thread library is used to map to (a generally smaller number
of) kernel threads. Care is taken to ensure that the CPU can still execute at least one thread even
if other threads have to wait due to blocking system calls.
18
On Unix-based systems, Pthreads (POSIX threads) is often used. Windows offers Win32 threads.
In the introductory section 3.1.3.2 the terms logical and physical address space have already
been introduced. The logical address space of a process generally starts with an address 0 and
ends with a highest address limit.
For this address block a free, contiguous address block of the same size must be found in the
main memory. This, however, generally starts at a base address base > 0. Thus, all logical
addresses must be incremented by the value base in order to convert them into physical
addresses (cf. Figure 3.24).
For the values base and limit, special hardware registers can be available (as in Fig. 3.24), which
may only be accessed by the operating system by means of privileged instructions. Thus the
operating system can easily realize its protection function by comparing each address generated
in user mode with the registers: a valid address must lie in the (right-open) interval [base, base +
limit). Violations generally trigger a trap and are treated as serious errors.
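The base/limit check can be sketched in a few lines of Python; the register values and the function name are invented for the example, and the raised exception stands in for the hardware trap.

```python
# Relocation and protection with base/limit registers (cf. Fig. 3.24):
# every logical address is range-checked and then shifted by base.

BASE, LIMIT = 30000, 12000   # hypothetical register contents

def translate(logical):
    # valid logical addresses lie in the right-open interval [0, LIMIT)
    if not 0 <= logical < LIMIT:
        raise MemoryError("trap: address out of bounds")
    return BASE + logical     # physical address in [BASE, BASE + LIMIT)

print(translate(0))      # 30000
print(translate(11999))  # 41999
```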
Modern processors have a hardware component for address transformation, which is called a
Memory Management Unit (MMU)19. Its functional principle is illustrated by Figure
3.25. The relocation register corresponds to the base register in Fig. 3.24.
19 The MMU is placed between the CPU and the L1 cache.
The physical address is calculated in the MMU at runtime of the program and loaded into the
address register MAR of the memory device.
Since processes are swapped in and out of main memory dynamically, there are unused gaps
between processes. New processes can be swapped into these gaps as required. There are
various strategies for this, including:
ˆ First Fit: If memory is needed for a new process, the list of free gaps is searched from
the beginning until a gap is found that is large enough for the requesting process. The
unused remaining space remains as a smaller gap.
ˆ Best Fit: The list of free gaps is completely searched20 to find the smallest gap large
enough to accommodate the requesting process.
This saves large gaps for other processes, but the remaining gaps are often too small to be
of further use.
ˆ Worst Fit: New processes are stored in the largest available gap.
This is to ensure that the unused part can still be used for future requirements.
The three strategies differ in terms of time required and average degree of memory utilization.
Simulations show that Worst-Fit achieves the worst memory utilization. First-Fit and Best-Fit
are roughly equivalent in this respect, but First-Fit is faster.
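The three strategies can be sketched over a free list of (start, size) gaps; the function names and the example list are invented for illustration.

```python
# Placement strategies for a new process over a list of free gaps.

def first_fit(gaps, size):
    # take the first gap that is large enough
    for i, (start, length) in enumerate(gaps):
        if length >= size:
            return i
    return None

def best_fit(gaps, size):
    # take the smallest gap that is still large enough
    fitting = [(length, i) for i, (start, length) in enumerate(gaps) if length >= size]
    return min(fitting)[1] if fitting else None

def worst_fit(gaps, size):
    # take the largest available gap
    fitting = [(length, i) for i, (start, length) in enumerate(gaps) if length >= size]
    return max(fitting)[1] if fitting else None

gaps = [(0, 100), (300, 500), (900, 200)]
print(first_fit(gaps, 150))  # 1  (first gap with length >= 150)
print(best_fit(gaps, 150))   # 2  (200 is the smallest sufficient gap)
print(worst_fit(gaps, 150))  # 1  (500 is the largest gap)
```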
Over time, all three strategies lead to the creation of many small gaps. This phenomenon is
referred to as (external) fragmentation. Fragmentation has an unfavorable effect when a new
request can no longer be served, although there would be sufficient memory available in the
sum of all gaps (see Figure 3.26). Defragmentation helps against this: from time to time the
allocated address blocks are pushed together to a large block.
The problem of fragmentation is alleviated by memory allocation strategies that allocate
memory space in non-contiguous blocks. Depending on whether these blocks have variable or
fixed size, a distinction is made between segmentation and paging.
20 The search effort can be reduced if the free list is kept as a sorted list.
3.3.2 Segmentation
Segmentation divides the logical address space into segments of different size, which are
oriented to the logical components of a program (main program, subroutines, data areas, library
functions, runtime system, etc.).
Accordingly, a logical address is divided into two parts: a segment number and an offset within the segment.
These two-dimensional logical addresses are mapped into one-dimensional physical addresses
via segment tables. Each process has its own segment table. The segment number forms the
index for the segment table. Each entry specifies the base address of the respective segment in
the main memory as well as its segment length (Figure 3.27). This allows faulty access that
extends beyond the segment boundary to be detected and prevented.
Different segments of the same process may be stored non-contiguously in main memory. This
significantly reduces but does not completely eliminate the problem of external fragmentation.
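The segment-table lookup described above can be sketched as follows; the table contents are invented for the example, and the raised exception stands in for the hardware trap.

```python
# Mapping a two-dimensional logical address (segment, offset) to a
# one-dimensional physical address via a segment table (cf. Figure 3.27).

segment_table = {0: (1400, 1000),   # segment number -> (base, length)
                 1: (6300, 400)}

def translate(segment, offset):
    base, length = segment_table[segment]
    if offset >= length:
        # access beyond the segment boundary is detected and prevented
        raise MemoryError("trap: offset beyond segment boundary")
    return base + offset

print(translate(0, 123))  # 1523
print(translate(1, 399))  # 6699
```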
3.3.3 Paging
With the page addressing method, the logical address space is divided into uniformly sized
sections. Each individual section is called a page. The main memory is divided into
correspondingly large sections; here, the individual section is called a page frame or tile. The
page or tile size is determined by the operating system21.
Each tile can hold any page. When a process is created, its pages are loaded from the file
system into whatever tiles happen to be free. It is not mandatory that these tiles be adjacent. The
assignment of which page is in which tile is made using a page table. Figure
3.28 shows an example.
The mapping of a logical address into a physical address is simple: let the logical address space
have a size of 2^m bytes and the size of a page be 2^n bytes. Then the more significant m − n bits
of a logical address correspond to the page number, and the lower n bits give the offset within
the page. The page-table entry for this page number yields the tile number. If the page number
is now replaced by the tile number, the physical address is obtained.
Example 3.12 Assume that in Figure 3.28 the page size is 512 bytes. The logical address 1234 is
therefore in page 2.
One recognizes this by removing the lower-order n = 9 digits (since 512 = 2^9) from the binary
number 1234_10 = 100 1101 0010_2. The two most significant digits remain: 10_2 = 2_10.
The same result is obtained by integer division of the decimal number 1234 by 512.
According to the page table in Figure 3.28, page 2 is mapped into tile 3. Since 3_10 = 11_2,
the physical address is 11 | 0 1101 0010_2 = 1746_10. □
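The arithmetic of the example can be recomputed in a few lines; only the page-table entry for page 2 is taken from the example, the other entries are made up.

```python
# Recomputing Example 3.12: page size 512 = 2**9, logical address 1234,
# page table maps page 2 -> tile 3.

PAGE_SIZE = 512
page_table = {0: 5, 1: 0, 2: 3, 3: 7}   # only the entry for page 2 matters here

logical = 1234
page, offset = divmod(logical, PAGE_SIZE)          # page number and offset
physical = page_table[page] * PAGE_SIZE + offset   # replace page by tile number

print(page, offset, physical)   # 2 210 1746
```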
Paging excludes external fragmentation: Any free page frame can be assigned to any process that
needs it. However, there may be so-called internal fragmentation. If the
21 Strictly speaking, the processor architecture usually defines several possible page sizes, from which the operating system selects one. A typical value is 4 KiByte. Some systems also use page sizes up to the MiByte range.
process image is not an integer multiple of the page size, the last allocated tile will only be
partially filled. This waste is called internal fragmentation.
If these two properties are given, it is not necessary for all pages or segments of a process to be
in main memory at the same time during its execution. It is sufficient that, for the respective
next execution step, only those address blocks are loaded which contain the next instruction and,
if applicable, the data referenced by this instruction.
In this way more processes can be kept in memory at the same time, i.e. the effort for swapping
is reduced considerably. Moreover, a single process may even be larger than the entire
main memory: each process has a virtually unlimited memory space22 available.
Most systems today that manage virtual memory are based on paging.
Some earlier systems23 managed their virtual memory exclusively by segmentation. One
advantage of segmentation is that it is visible to the programmer: logically related units are
managed in one piece. The processes are effectively separated from each other by their own
segment tables. The shared use of memory areas is easier to realize. However, the problem of
waste through external fragmentation remains, since segments can only be stored contiguously
in the main memory. In today's systems, segmentation is therefore supported at most in
combination with paging24.
Figure 3.29 shows how the address conversion must take place in a corresponding system.
In this lecture, we will limit our discussion of virtual memory management to paging.
Since not all pages of a process must exist in main memory, a page table entry must be extended
by a valid bit that indicates whether the corresponding page exists in memory or not (cf. Figure
3.30).
If a process now accesses an address in a page that does not exist in main memory, i.e. whose
valid bit shows the value i (invalid), the MMU will generate a page fault when determining the
physical address, which triggers a trap. In the course of the interrupt handling, the operating
system must find a free page frame that can accommodate the requested page. If no free page
frame is available, a stored page must be displaced from main memory to make room.
So there are several tasks, for each of which there are different approaches.
22 A hard upper limit is the width of an address register or the address bus. 64-bit architectures would, for example, offer an address space of 16 EiByte. This theoretically possible size is often not used to its full extent and is internally limited to 48-bit addresses, with which a logical address space of 256 TiByte is representable.
23 e.g. the MCP operating system for Burroughs computers
24 e.g. the x86 instruction set architecture from Intel offers this combination
Figure 3.29: Address conversion when combining segmentation and paging [2].
When deciding when to load a page into main memory, two main strategies are used:
With demand paging, a page is loaded on demand only when a page fault occurs that affects that
page. Pages that are never accessed are therefore never fetched into memory.
During prepaging, pages that were not immediately requested are also loaded in order to
proactively avoid expected page faults. The selection of such pages is based on empirical
values, e.g., when a page fault is handled, the following n pages are loaded at the same time,
or, when a waiting process is continued, all pages that were swapped out when the process was
displaced are reloaded.
When storing a new page, another page may have to be displaced from the main memory to
make room for the new page. The memory management must therefore decide which page
should be displaced. If possible, this should be a page that is not accessed frequently and whose
displacement does not immediately produce the next page fault.
It should also be noted that some pages must remain resident in memory. This concerns a large
part of the operating system kernel and some time-critical data structures such as I/O buffers. An
exchange of such pages can be prevented by associating a lock bit with each page frame. If this
lock bit is set, the page loaded in the page frame must not be displaced (frame locking).
Common replacement algorithms for non-locked pages include OPT, FIFO, LRU, and CLOCK:
Example 3.13 Assume that a main memory is divided into 3 tiles. Accesses are made
sequentially to the following pages: 2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2.
The behavior of the four replacement algorithms is shown below [2]. The different performance
of the algorithms is reflected in the number of page faults (F) that occurred, where page faults
are only counted once all tiles are occupied.
For CLOCK, ∗ means that the use bit is set for the corresponding page, and the arrow indicates
the position of the pointer. □
The best-performing algorithm in Example 3.13 (apart from the non-implementable OPT
strategy) is LRU, followed by CLOCK and FIFO. This ranking is also confirmed by
measurements in practice. Unfortunately, LRU is difficult to implement exactly. One software
approximation, for example, is to keep an age counter and a use bit for each page. If a page is
referenced within an observation interval, its use bit is set. After the observation interval has
expired, the age counter of each page is shifted one position to the right and the use bit is
inserted from the left as the new most significant bit (cf. Figure 3.32); the next observation
interval then begins. If a page has to be replaced, the page whose age counter shows the lowest
value is selected. This procedure is called the aging algorithm.
To avoid such overhead, real-world operating systems usually use strategies based on the
CLOCK algorithm.
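The relative behavior of FIFO and LRU on the reference string of Example 3.13 can be checked with a small simulation; note that, unlike the example, every miss is counted here, including the initial loads into empty tiles.

```python
from collections import OrderedDict, deque

# Simulating FIFO and LRU with 3 tiles on the reference string
# of Example 3.13.

refs = [2, 3, 2, 1, 5, 2, 4, 5, 3, 2, 5, 2]

def fifo_faults(refs, frames):
    mem, queue, faults = set(), deque(), 0
    for p in refs:
        if p not in mem:
            faults += 1
            if len(mem) == frames:          # evict the longest-resident page
                mem.discard(queue.popleft())
            mem.add(p)
            queue.append(p)
    return faults

def lru_faults(refs, frames):
    mem, faults = OrderedDict(), 0          # insertion order tracks recency
    for p in refs:
        if p in mem:
            mem.move_to_end(p)              # mark as most recently used
        else:
            faults += 1
            if len(mem) == frames:
                mem.popitem(last=False)     # evict the least recently used page
            mem[p] = True
    return faults

print(fifo_faults(refs, 3), lru_faults(refs, 3))   # 9 7
```

As expected, LRU produces fewer faults than FIFO on this reference string.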
Page tables can become very large. For example, a 32-bit address space allows the addressing
of 2^32 bytes, i.e. 4 GiByte. With a page size of 4 KiByte, the page table must provide space
for about 1 million table entries. Page tables of this size can no longer be kept resident
in memory.
One solution is a two-tier (hierarchical) page table: the original page table is split into subpage
tables and relocated to virtual memory. The references to the subpage tables are combined in a
main page table, which remains resident in memory. The subpage tables can be swapped to
secondary storage like any other page, causing a page fault when accessed.
With a two-tier page table, however, two memory accesses are now necessary for address
conversion: first to the main page table, then to the subpage table. To avoid this duplicate
memory access, the MMU usually uses a special cache called a translation lookaside buffer
(TLB). It contains the most recently computed translations; typically, there are 256 to 4096
entries. If the MMU receives a request to translate a logical address, it first searches the TLB.
In the case of a TLB hit, the physical address can be returned immediately. In the case of a
TLB miss, a two-step resolution is still necessary, and the result is then copied into the TLB.
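A two-tier lookup with a TLB in front of it can be sketched as follows; the table contents and the tiny table sizes are invented for illustration.

```python
# Two-tier page-table walk with a TLB cache in front of it.

tlb = {}                                   # page number -> tile number cache
main_table = {0: {0: 8, 1: 3}, 1: {0: 5}}  # main table index -> subpage table

def lookup(page):
    if page in tlb:                        # TLB hit: no table walk needed
        return tlb[page]
    outer, inner = divmod(page, 2)         # split the page number (2 entries per subtable here)
    tile = main_table[outer][inner]        # two accesses: main table, then subpage table
    tlb[page] = tile                       # copy the result into the TLB
    return tile

print(lookup(1))  # 3  (miss: walks both levels)
print(lookup(1))  # 3  (hit: served from the TLB)
```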
Free tiles are managed in a (usually global) free frame list. When a system starts, this list
includes all available main memory tiles. Since in principle any free tile can hold any page, the
free frame list can be implemented as an unsorted, singly linked list.
As long as the free frame list still contains free tiles, no page replacements have to be
performed. It can be advantageous not to allocate the free tiles completely, but to retain a
(small) free frame pool. In the case of a page replacement, the tile of the page to be replaced is
included in the pool, and the tile that has been held back the longest in the free frame pool is
used instead for the page that caused the page fault. If the page that has just been replaced is
accessed again a short time later and its tile still exists in the pool, reloading the page from
secondary storage can be saved.
A file describes a set of logically related data that can be permanently stored on a suitable
data carrier. A file is assigned a number of attributes, e.g.:
Files can be structured in several ways. Three types are common, which are shown in Figure
3.33 are outlined. Format (a) represents data as an unstructured sequence of bytes. Its
interpretation is determined by the accessing application program. Both Unix and Windows use
this approach.
In the second format (b), a file is formed by a sequence of records of fixed length, each of which
has the same internal structure. In the third format (c), a file consists of records (possibly of
different lengths), each of which has a key field. This unique key not only identifies a data
record, but also determines its position within the file. The sorting is done by an index structure,
in which pointers to the data records are stored. In Fig. 3.33 (c) the index structure is a B-tree.
The formats (b) and especially (c) are used for very large files in which databases store their
data.
To manage data in files, the file manager provides a number of operations:
All file operations are implemented via system calls of the operating system and made available
to programmers through the API.
Files are usually organized into directories to simplify their management.
A file can be located by its position in the directory tree. If you line up the names of all
subdirectories in descending order from the root directory to the file itself25 , the result is a path
that uniquely identifies a file.
To free a user from always having to specify the absolute path name for each file reference, a
working directory can be set. File names can then be specified relative to this working directory.
Access to files as well as directories can be restricted via access rights. There are three types of
access rights:
ˆ Read permission: a file may be opened or copied, or the contents of a directory may be
displayed.
ˆ Write permission: a file may be modified, deleted, or renamed, or the contents of a directory
may be changed by adding or deleting files.
ˆ Execute permission: a file may be executed as a program, or a directory may be entered.
25 In Windows systems, the root directory is preceded by an identifier letter for the drive on which the directory tree is located.
In general, the creator of an object determines its access rights. However, these rights can be
changed (with the appropriate authorization) or transferred to other users. The file management
offers corresponding commands for this.
Files and directories represent the user's view of the locally available data of a computer.
How files and directories are physically organized on the connected storage media, so that fast
access even to very large data sets, simultaneous access by several processes, and data
modification are possible, is the task of the operating system. The part of the operating system
responsible for this is the file system.
A file system organizes the files and directories on a volume. A drive26 abstractly describes a
contiguous set of data blocks. A data block is the smallest transport unit exchanged between
the drive and the main memory during a read or write access. Traditionally, data blocks are 512
bytes in size; today's systems usually work with block clusters of 4096 bytes, or 2048 bytes on
optical media. Each data block on a medium has a unique address, which is either determined
by sequential numbering or based on the medium's design (e.g. magnetic hard disks often use
CHS addressing: Cylinder, Head, Sector).
An operating system can support different file systems on different drives, which means that the
special features of the respective data media can be taken into account (hard disk, optical
drive, flash memory, magnetic tape, etc.).
In general, a file system consists of several layers. Each layer uses the functions of the lower
layer to provide functions for higher layers. Figure 3.35 shows an example of a typical
multilayer structure:
ˆ The logical file system manages the directory structure and the assignment of file names
to file control blocks. A file control block (FCB), also known as a file descriptor, is a
data structure that an operating system uses to store
all metadata of a file,
block number information for locating the data on the data carrier, and
dynamic information for opened files, such as pointers to the position that is currently
being read.
Unix systems also call the FCB an I-node (from index node).
26 Drive here means a logical drive. Logical drives are created on physical drives, i.e. storage devices (such as hard disk drives or USB sticks), by partitioning. A logical drive thus corresponds to a partition.
When an application program needs a file, a request is made to the logical file system.
This checks whether the file exists in the specified path and whether all permissions for
file access are available. If everything is correct, the file can be accessed via the file
organization module.
ˆ The file organization module maps the logical blocks or records of a file to physical blocks
of the storage medium. The functions of this layer also include free space management,
which locates unallocated blocks and makes them available to the file organization module
on request.
Once the file organization module has determined which physical block the application
program needs, it passes this information to the primitive file system.
ˆ The primitive file system (basic file system) reads and writes physical blocks on the storage
medium. It does this using device-specific software (device drivers) that interacts with the
I/O controller of the respective storage device (cf. Chapter 2) to copy data blocks back and
forth between the main memory and the addressed areas of the I/O device.
The same primitive file system can be used by multiple file systems, each of which has its
own file organization modules and logical file systems on top of it.
In order to organize the files and directories on a drive, an appropriate organizational structure is
necessary, which implements the file system. Figure 3.36 shows a typical division of a hard disk
drive into several partitions, each containing the data structures for a file system.
The first sector of the hard disk is the boot sector (Master Boot Record , MBR). It contains a
startup program, the first-stage boot loader. When the computer is booted, this boot
program selects the partition from the partition table from which the operating system is to be
started. The partition table contains the start and end addresses of the individual partitions. Each
partition can represent a different file system.
Each partition starts with a boot block that contains the second-stage boot loader27. This boot
loader knows the respective file system and can load the file containing the operating system's
program code. The boot block is followed (in Unix-like systems) by the superblock. It contains the
essential administrative information about the file system: its size, a pointer to the list of free
blocks and their number, a pointer to the list of file control blocks or I-nodes and their number,
and so on. This is followed by areas for these lists, ending with the largest area in the file
system, the actual data area with the root directory and all its directories and files. Other file
systems such as Microsoft's NTFS combine the superblock and the following two lists into a
master file table.
When a drive is mounted in the operating system (mount a volume), the operating system gains
access to the drive and adds its file system to the existing directory hierarchy. Important
administrative information is transferred to data structures in the main memory. Data structures
held in main memory that are relevant for the file system are:
ˆ the mount table, which contains all drives that are currently mounted in the system,
ˆ the directory cache, which temporarily stores the directory information of recently accessed
directories in order to have them quickly available for repeated access, and
ˆ two tables of opened files (open-file tables):
the system-wide open file table (system-wide OFT), which contains copies of the file
control blocks of all opened files, as well as other information such as the number of
processes currently accessing this file, and
the process-specific open file table (per-process OFT), which contains a pointer to the
corresponding entry in the system-wide OFT for each of the files opened by the
individual process, as well as other information such as a pointer to the current
position in the file.
Together, all these data structures on the storage medium and in main memory record which
storage media contain which files and directories and which physical blocks are occupied by
them. For new requests, the file manager also has to decide which blocks to allocate to a file or
directory in the first place. The goal is to ensure both efficient use of disk space and fast access
to file blocks. There are different strategies for this.
27 For partitions without a bootable operating system, the boot block remains empty.
The first question when creating a new file is whether to allocate all the space needed for the file
at once or in portions. Preallocation has the advantage that an application will not fail
because the file system runs out of space. Also, an application is not slowed down by
having to find new space during execution. This can be important for applications that
require a guaranteed response time. For many applications, however, it is difficult to estimate
the final size of the file to be created in advance. This is where dynamic allocation comes in
handy, allocating memory on a per-portion basis. The question that arises now is the
appropriate portion size. Large portions increase performance, small portions and portions with
variable size minimize fragmentation, portions with fixed size simplify reallocation.
Three main allocation methods have evolved and are used in common operating systems:
contiguous allocation, concatenated allocation, and indexed allocation. Each method has its
advantages and disadvantages. Most file systems are limited to exactly one of these methods.
In contiguous allocation, each file is stored as a contiguous sequence of blocks on the medium.
The file size must be specified when it is created. If sufficient space is available, the file is
placed on the storage medium and an entry is made in the directory. This directory entry
specifies the address of the start block and the number of successively occupied blocks (cf.
Figure 3.37).
This method is particularly advantageous for sequential processing, especially on disk drives,
where it minimizes the head movements of the read/write head. The disadvantage is that the file
size is often not known in advance and after a short time a high external fragmentation of the
storage medium occurs. To counteract fragmentation, the same algorithms can be used as for
main memory management (Section 3.3.1), i.e. First Fit, Next Fit, Best Fit, etc. Nevertheless,
defragmentation of the medium will become necessary from time to time.
A special form are extent-based file systems which reserve so-called extents instead of
individual physical blocks. Extents are larger, contiguous areas of physical blocks. If it turns out
that an extent is too small for a file, the file is dynamically allocated another extent. If the extent
size is too large, the rate of internal fragmentation increases. If extents are allowed to have
different sizes, the problem of external fragmentation remains.
With linked allocation, the extent is, in a sense, reduced to a single block. Each file is a
concatenation of individual physical blocks. For each file, the directory contains a pointer to the
first and optionally the last block. Each block contains a pointer to the next block, and the last
block contains a null pointer (see Figure 3.38). This eliminates the possibility of external
fragmentation. Any free block can be used to satisfy a request. The size of a file need not be
known in advance. As long as there are free blocks, a file can grow.
However, this method also has disadvantages. It only works efficiently if a file is accessed
strictly sequentially. Direct access to a block in the middle of the file is not possible. Instead, it
is necessary to read all blocks from the beginning until the block you are looking for is reached.
In the case of disk drives, this generally requires many repositionings of the read/write head.
Another disadvantage is the space requirement for the pointers. If a pointer occupies 4 bytes of a
512-byte block, 0.78 percent of the memory volume is lost for pointer management. This ratio
can be improved by using larger clusters instead of single blocks. Most file systems today use
clusters consisting of 4 or 8 physical blocks. Of course, large clusters again increase internal
fragmentation.
Finally, there is a considerable risk of data loss if pointers are damaged by errors and thus
possibly point to a block of another file. Double concatenating or additionally storing the file
name and a running block number in each block could reduce this risk, but would lead to even
more overhead for each file.
One application is the FAT16/32 file system, which was used for two decades for hard disk
drives in DOS and Windows 9x systems. Instead of a pointer in each block, a file allocation
table (FAT) located in main memory records the chaining (cf. Fig. 3.39). This way, the two
disadvantages mentioned above can be avoided. For large volumes, however, the search effort
in the FAT is not insignificant.
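Following a cluster chain through such a table can be sketched in a few lines; the table contents and the sentinel value are invented for the example (real FAT variants use special reserved entry values to mark the end of a chain).

```python
# Following a FAT-like cluster chain: each table entry holds the number
# of the next cluster; a sentinel marks the end of the file.

EOF = -1
fat = {4: 7, 7: 2, 2: EOF}   # cluster -> next cluster (made-up contents)

def chain(start):
    blocks, cur = [], start
    while cur != EOF:         # one in-memory table lookup per cluster
        blocks.append(cur)
        cur = fat[cur]
    return blocks

print(chain(4))  # [4, 7, 2]
```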
Indexed allocation no longer scatters the pointers to the blocks of a file over the entire volume
or a table mapping it, but brings together all the pointers relevant to the file in an index block.
Each file has its own index block, which is accommodated by an additionally required physical
block. The ith entry in the index block points to the ith block of the file. The directory now only
contains the address of the index block (see Figure 3.40).
The advantages of indexed allocation are that random access to the individual blocks is
possible and, as with concatenated allocation, no external fragmentation occurs. On the
other hand, one additional block per file is needed as index block. For very small files, a large
part of the index block is wasted with null pointers. For very large files, a single index block
may not be enough to hold all pointers. In this case, the following solutions are available:
ˆ Combined approach
The pointers in the index block are divided into groups. In a first group the pointers point
directly to the first file blocks (advantageous for small files), in further groups the pointers
point to index blocks of higher levels.
This approach is followed by many Unix-like file systems that use I-nodes. In an I-node,
up to 12 entries point directly to file blocks. If these pointers are not sufficient, an entry in
the I-node points to an index node, which can hold up to 256 more pointers. Such an index
node is called a simple indirect block. If this is still not enough, the next two entries
contain pointers to a double indirect and a triple indirect block (see Figure 3.42).
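The maximum file size under this combined scheme is easy to estimate; the figure of 256 pointers per index block is taken from the text, while the block size of 1 KiByte is a hypothetical value chosen for the example.

```python
# Size limit of the combined I-node scheme: 12 direct pointers plus
# single, double and triple indirect blocks.

DIRECT, PTRS, BLOCK = 12, 256, 1024   # BLOCK = 1 KiByte is an assumption

blocks = DIRECT + PTRS + PTRS**2 + PTRS**3
print(blocks)                   # 16843020 addressable blocks
print(blocks * BLOCK // 2**30)  # 16 -> about 16 GiByte maximum file size
```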
Just like allocated space, unallocated (free) space must also be managed: all of the allocation
strategies discussed in Section 3.4.3 require knowledge of which physical blocks are available
on a disk. Their number changes each time a file is created or deleted.
Several methods are available for managing free memory blocks: bit vectors, concatenation,
index blocks, and free-area lists.
A bit vector contains one bit for each physical block on the disk. If the block is free, this bit has
the value 0; if the block is occupied, its value is 1 (or vice versa in some implementations).
The main advantage of the bit vector is its ease of use. To find the next free block or a group
of n consecutive free blocks, many machine instruction sets provide corresponding bit-pattern
instructions. However, this advantage only comes into its own if the entire vector can be kept in
main memory. This can be critical: although a bit vector looks compact, its size is
considerable. A 1 TiByte hard disk with 4 KiByte blocks requires a bit vector of 32 MiByte length!
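A minimal sketch of bit-vector free-space management, which also reproduces the size estimate above (taking the units as binary, i.e. 1 TiByte and 4 KiByte); the toy occupancy map is invented for the example.

```python
# Free-space management with a bit vector: entry i is 0 if block i is free.

bits = [1, 1, 0, 1, 0, 0, 1]          # toy occupancy map

def first_free(bits):
    # real CPUs offer bit-scan instructions for exactly this search
    for i, b in enumerate(bits):
        if b == 0:
            return i
    return None

print(first_free(bits))               # 2

# Size of the bit vector for a 1-TiByte disk with 4-KiByte blocks:
vector_bytes = (2**40 // 4096) // 8   # one bit per block, 8 bits per byte
print(vector_bytes == 32 * 2**20)     # True: 32 MiByte
```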
In the second approach, a concatenation of all free blocks is formed. A pointer to the first free
block is stored at a special location in the file system. This first free block then contains a
pointer to the next free block, and so on (cf. Figure 3.44). In this way, no storage space is
actually required for free space management: the concatenation takes place in the free blocks
themselves.
One disadvantage is that the chain must always be traversed block by block, e.g. to determine
the number of free blocks. Fortunately, however, running through the chain is the exception.
The next free block is always indicated directly by the head of the chain. However, it is
difficult to find a contiguous memory area. A significant risk here is also the loss of information
in the event of damage to a pointer.
The search effort can be reduced if the set of all free blocks is treated as a file whose drive
occupancy is indicated by index blocks: the first free block is used as an index block. If an index
block has room for n pointers, the first n − 1 pointers point to free blocks and the n-th pointer
points to the next index block.
Finally, the list of free areas is the last approach discussed here. Such a list is
especially advantageous if files are to be stored mainly in contiguous blocks. It consists of
entries, each containing the address of a first free block and the number n of free blocks
immediately following this block.
An assignment as in Figure 3.44, for example, would result in such a list.
For efficient insertion and deletion, the list is stored as a linked list or as a B-tree.
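This last approach can be sketched as follows. The illustrative Python function below inserts a freed run of blocks into a list of (first block, count) entries and coalesces it with directly adjacent free areas; a plain sorted list stands in here for the linked list or B-tree mentioned above:

```python
def free_blocks(extents, start, n):
    """Insert a freed run of n blocks starting at `start` into a sorted
    list of free areas [(first_block, count), ...] and coalesce it with
    directly adjacent free areas."""
    extents.append((start, n))
    extents.sort()
    merged = []
    for first, count in extents:
        # A run is adjacent if the previous area ends exactly where it begins.
        if merged and merged[-1][0] + merged[-1][1] == first:
            prev_first, prev_count = merged.pop()
            merged.append((prev_first, prev_count + count))
        else:
            merged.append((first, count))
    return merged

# Blocks 2-4 and 10-13 are free; freeing blocks 5-9 joins both neighbours.
print(free_blocks([(2, 3), (10, 4)], 5, 5))  # [(2, 12)]
```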
kernel provides a unified set of abstract I/O functions. This abstraction layer is called kernel I/O
subsystem.
Overall, I/O management makes up a significant portion of the operating system in quantitative
terms: mainly due to the growing diversity of I/O devices, about 60% of an operating system's
program code is attributable to I/O management. I/O devices can be classified according to
several characteristics:
ˆ Transfer direction
Input-only devices can only be read from (e.g. CD/DVD-ROM, scanner), output-only
devices can only be written to (e.g. graphics card/monitor, printer), combined I/O devices
can be both read and written to (e.g. HDD, SSD).
ˆ Transfer unit
Character devices transmit data as a stream of bytes (e.g. keyboard, mouse, serial ports).
Block devices transfer data blocks of fixed length (e.g. disk drives, USB sticks). There are
also devices that do not fit into either category, e.g. timers: they merely trigger interrupts
at previously set intervals.
ˆ Device speed
There are several measures to describe the speed of an I/O device.
The data transfer rate is calculated as the amount of data per time span. While a very
slow device such as a keyboard gets by with a data rate of just a few bytes per second,
previously common serial interfaces such as PS/2 supported up to 12 kbit/s. USB 2.0 offers
480 Mbit/s, USB 3.1 already 10 Gbit/s. Disk drives reach about 300 MB/s, whereas the
SATA interface, which is often used for them, allows up to 600 MB/s. SSD drives achieve
1 to 5 GB/s via the PCIe interface.
The access time is the second important measure. It describes the time span that elapses
between the request of information and its arrival. It is made up of various delays that
occur, for example, due to the intermediate storage of data, the conversion, the error check
or the positioning time. The positioning time plays an important role especially for devices
with sequential access.
The cycle time is also often considered. This is the shortest period of time between the
start of a cycle and the time at which the next cycle can start. In addition to the access
time, the cycle time includes a possible regeneration time of the I/O device.
ˆ Synchronicity
A synchronous device uses the same clock as the CPU registers. Synchronous transfers are
time deterministic and are preferably used for the transfer of large amounts of data.
An asynchronous device operates independently of the system clock. To indicate the time
for a data transfer special control signals have to be used. In the simplest case this is a
strobe signal that is activated to trigger a transfer. It can be set either by the source or
the destination. More common is a handshaking procedure that additionally uses a second
signal, the Ack signal, to indicate to the other side that the request indicated by the strobe
has been fulfilled and that the strobe signal can be deactivated again.
With this many variations, it is imperative to provide a uniform way for the operating system
and application programs to communicate with the various I/O devices. This is provided by
an abstraction layer called the I/O subsystem.
The number of I/O devices connected to a computer not only varies from computer to computer,
but also usually changes over time for the individual computer. It is not possible to support
every special function by every operating system for every device available on the market or
coming onto the market in the future. From the point of view of an operating system, all I/O
devices should therefore ideally look more or less the same.
This is implemented in such a way that the operating system provides a standard set of generic
I/O functions via its API. An example is compiled in Table 3.1. I/O devices are treated as virtual
files in these calls and represented by symbolic device names.
All device drivers align with this standard set to ensure a uniform API for a wide range of I/O
devices of different types. The I/O subsystem maps the symbolic device names in the function
call to functions of the corresponding driver (cf. Figure 3.45).
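This mapping can be pictured as a table that resolves a symbolic device name to the driver routines implementing the generic I/O API. The device name "/dev/null0" and the NullDriver below are invented purely for illustration, not taken from any real operating system:

```python
# Hypothetical driver table: each symbolic device name maps to an object
# whose methods implement the generic I/O API for that device.
class NullDriver:
    def read(self, n):           # this device always delivers zero bytes
        return bytes(n)

    def write(self, data):       # ...and silently discards all output
        return len(data)

driver_table = {"/dev/null0": NullDriver()}

def dev_read(name, n):
    """Generic read(): resolve the symbolic device name via the driver
    table and forward the call to the device-specific routine."""
    driver = driver_table[name]
    return driver.read(n)

print(dev_read("/dev/null0", 4))  # b'\x00\x00\x00\x00'
```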
3.5.2.2 Buffering
It often makes sense to decouple application processes from the sequences in the I/O device by
means of intermediate buffering. There are three main reasons for using I/O buffering.
First, it can be used to compensate for different device speeds. A slow device can write data
to a buffer, and only when the buffer has reached a sufficient fill level is the buffer content
transferred to the application process as a block. A simple performance analysis proves the
advantage: let T be the time needed to transfer the data of a block and C be the time needed to
process this block and then request the next block. Without buffering, the processing time per
block is T + C, but with buffering it is max{T, C} + M, where M describes the time required to
move the data block from the buffer to main memory (it should hold that M < min{T, C}).
If M is not negligible and the I/O device is a fast device, double buffering is often used. Here, the
I/O device transfers data into (or out of) one buffer while the operating system empties (or fills)
the other buffer. After each block transfer is complete, the system switches between the buffers.
This allows newly arriving data to continue to be buffered while the previously arrived data is
transferred. Figure 3.46 illustrates the different forms of I/O buffering.
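The timing model above can be made concrete with a small sketch; the numbers are invented example values:

```python
def unbuffered_time(k, T, C):
    """Total time for k blocks when each transfer (T) must complete
    before the processing (C) of that block can begin."""
    return k * (T + C)

def buffered_time(k, T, C, M):
    """Approximate total time with (double) buffering: transfer and
    processing overlap, leaving max(T, C) plus the copy time M per block."""
    return k * (max(T, C) + M)

# Invented example values: T = 5 ms, C = 3 ms, M = 1 ms, 100 blocks
print(unbuffered_time(100, 5, 3))   # 800 ms
print(buffered_time(100, 5, 3, 1))  # 600 ms
```

As required by the condition M < min{T, C}, buffering pays off here: 600 ms instead of 800 ms.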
A second reason for I/O buffering is the possibility of compensating for different transfer
block sizes. For example, a byte stream supplied by a character-oriented device can
be collected in a buffer and then written to the hard disk block by block.
Third, buffering serves to preserve copy semantics. When a data block is to be transferred
from main memory to an I/O device, its content is expected to be exactly what was
valid when the transfer command was called. To prevent the data block in main memory from
being modified before it is completely transferred, it can first be copied to a buffer.28 The
actual transfer then takes place from this buffer.
Spooling enables the shared use of devices that can actually only be used exclusively (typically
printers). Processes do not access the device directly but send their requests to a spooler, a
system program that runs in the background and places requests from application processes
in a queue (in secondary memory). From there, the spooler executes them one after the other as
soon as the corresponding device is free. Spooling is also popular when I/O data can be
generated significantly faster than the target device can process it.
An alternative to spooling is device reservation by means of the system calls open and close. If
the exclusive device is not available, the calling process is blocked and queued. Sooner or later,
the requested device becomes available, and the foremost process in the queue is allowed to
occupy it and continue its execution. However, there is a risk of deadlock here if the applications
are carelessly programmed. A very simple alternative is to have open() return an error code if
the device is not available.
I/O requests can fail in many ways. Many error possibilities are already prevented by the fact that
I/O instructions are always privileged instructions and must be processed via system calls.
However, there may be temporary reasons as well as persistent reasons why an I/O request
cannot be executed. Causes such as a read error from disk or a timeout error can often be fixed
by the operating system itself by repeating the instruction. Other causes, such as an invalid
command or a technical defect in an I/O device, must be indicated as unrecoverable errors.
28 The
same effect can be achieved efficiently by temporarily write-protecting the main memory page.
In most operating systems, unrecoverable errors are reported by returning an error code.
However, it is often difficult to localize the exact cause of the error from this error code alone.
Many devices, however, are capable of providing much more detailed error information. Such
error information is recorded by the I/O subsystem in device-specific or system-wide error
log files.
In Unix operating systems, error log files are managed by the background process syslogd, in
Windows by the Event Viewer utility eventvwr.
Device drivers are software modules that are integrated into an operating system to control a
specific device or a class of similar devices. Device drivers encapsulate device details and create
a uniform interface for the I/O subsystem. On the one hand, a device driver interacts with the
respective I/O device by controlling a data transfer in hardware via control signals and buffer
registers connected to the computer's bus system. On the other hand, the driver provides the
operating system with function calls in a uniform format so that the device functions can be
accessed without detailed knowledge of the device hardware.
The dynamic integration of device drivers into the operating system takes place via a data
structure known as a driver table. When a standard I/O function is called by the system, the
jump address of the device-specific routine can be determined via the driver table if the device
name is known. Figure 3.48 illustrates this process. The entries in the driver table are created
when a driver is installed.
On the hardware level, the device driver controls the interaction of the CPU with the respective
device controller. This interaction can take place through:
ˆ Reading/writing registers (via port-mapped I/O or memory-mapped I/O, see Section 2.3.6
on I/O in Chapter 2).
ˆ Reading/writing memory areas (e.g. in main memory or on graphics cards, via
memory-mapped I/O).
However, control registers or buffer areas do not allow a device to signal directly to the CPU that
an operation has completed or data is ready. For this purpose, device controllers use
hardware interrupts (generally via an interrupt controller, which receives interrupt signals from
various units of the computer and masks or prioritizes them if necessary). A corresponding
routine for interrupt handling (interrupt service routine, ISR, or interrupt handler) is registered
with the operating system when the driver is installed.
In order to illustrate the principle of calling a driver and how it works, a typical sequence of an
I/O operation is shown using the example of a read request. Figure 3.49 illustrates this sequence.
It can be seen from the large number of steps involved in the different components that an I/O
operation is a very time-consuming operation that consumes a large number of CPU cycles.
Since all subroutines operate in the same address space and can call each other without
having to use interprocess communication methods, monolithic cores operate very fast. One
disadvantage is that adding new functions or removing obsolete ones is very difficult.
For increasingly larger and more complex operating systems, the monolithic core has been
increasingly modularized. In addition to the actual operating system, which is loaded when the
computer is booted, there are dynamically loadable extensions. In Unix systems they are called
shared libraries, in Windows they are called dynamic link libraries (DLL).
Most unixoid systems (BSD, System V, Linux, Solaris) and Windows (up
to Windows 98) thus follow the monolithic approach.
The consistent implementation of the modularization idea leads to the micro-core approach.
As early as the late 1960s, operating systems were developed as layered systems (e.g.
MULTICS). In this type of architecture, each layer29 interacts only with the layers immediately
above and below it, and the lower layers provide services to the higher layers via an interface
that hides their implementation. Thus, the implementation of each layer could be changed
without requiring a change to the adjacent layers. However, a service request from a user
process now had to traverse several layers of system software before it was serviced, causing
performance to suffer compared to a monolithic core.
Traditionally, all layers were housed in the core. Moving the boundary between core and
user mode dates back to the pioneering work of the Danish computer scientist Per Brinch
Hansen in 1970. The result was the RC 4000 system, whose nucleus contains only primitives for
process management and communication between processes. All other operating system
functions, including all strategies of the operating system, are implemented by processes
outside the core.
29
MULTICS represents layers as so-called rings.
The Mach system developed at Carnegie Mellon University in 1985 coined the term
microkernel. The central idea of the microkernel architecture is that the core provides only the
most elementary services, such as minimal process and memory management as well as the
support of interprocess communication by means of message passing. All other functions are
implemented as separate processes (servers) that communicate with requesting processes
(clients) in user mode (Fig. 3.51).
Microkernel-based operating systems are highly modular, which makes them extensible,
portable, and scalable. Operating system components outside the core can fail without the
operating system as a whole breaking down. A disadvantage, however, is the high
communication overhead, which generally leads to a lower performance compared to monolithic
systems.
The microkernel approach is often chosen for the implementation of distributed operating
systems and real-time operating systems, such as QNX, PikeOS, Symbian OS, Google Fuchsia.
Common operating systems such as iOS, macOS (since OS X) and Windows
NT/XP/Vista/7/8/10 try to combine the advantages of the microkernel and the monolithic core.
They use a hybrid core which, for speed reasons, contains components that microkernels usually
do not contain30. It is not specified which components are additionally compiled into the core in
systems with hybrid cores, which is why it is disputed whether the hybrid core is more than a
marketing concept.
30
The higher speed results from the fact that fewer context changes and less inter-process communication are
necessary.
Chapter 4
Computer networks
Although the technological foundations of computer communication were laid more than fifty
years ago, it was not until the beginning of the 1990s that it began to take off for commercial
and private applications. Today, computer networks form a global, ubiquitous communication
platform with a formative importance for modern life at work and at leisure.
The individual computer is understood less and less as a computing machine, but much more as
the end device of a computer network. In the eyes of the user, the network as a whole forms a
single pool for all existing and available resources1 and services2. Accordingly, it is not the
individual computer but the network in which it is integrated that determines the benefit for the
user.
The most interesting and impressive network is undoubtedly the worldwide Internet, which has
evolved from a research project linking a handful of institutes to a network for everyone,
actively used today3 by 63% of all people on earth.
Remote users could transmit their data via teleprinter in the telex network at a transmission
speed of 50 bit/s, or with the help of a modem in the analogue telephone network at 300 bit/s.
This was referred to as remote data transmission (RDT).
The increasing demand for fast text and data communication led to the construction of special
data networks from the mid-1960s onwards. Thus, in January 1967, Deutsche Bundespost began
test operations of the Datex-L data network for synchronous data transmissions via switched-
through dial-up connections with an initial speed of 2400 bps.
Simultaneously with the development of terminal network technology, the pioneering
achievements of computer network technology were also made at the end of the 1960s. In
addition to the SITA network4 of the International Air Transport Association (IATA) for
information and booking services, the ARPA network is particularly noteworthy [8].
At that time, the time of the Cold War between the USA and the USSR, the American military
feared that an enemy attack could render their own communication network inoperable.
In keeping with the time, this network consisted of a powerful central computer with many
terminals connected to it. The entire data stock was stored on the central computer; if it were
destroyed, the entire network would fail.
In 1962, the U.S. Air Force commissioned the Rand Corporation, a think tank in Santa Monica,
California (Rand stands for Research and Development), to look for a way to maintain
communications after an attack. In particular, in the event of a nuclear first strike by the Soviet
Union, the launch sites for the nuclear missiles should still be able to launch a counterattack.
The solution proposed in 1964 by Rand engineer Paul Baran5 consisted of decentralization with
redundant data storage, whereby surviving computers were able to maintain connections to
each other despite the failure of other computers.
Two important features characterized his solution. First, the computers should not each be
connected only to one (central) computer, but also to each other in multiple ways. Thus there
would always be several paths to the same destination (as in the streets of a big city); if one
path were destroyed, another could simply be used. Second, the transmitted data is broken
down into small packets that independently find their way through the network to their
destination and are reassembled there. So, unlike the usual telephone connection, no fixed
path is specified. In this way, the data volume is better distributed in the network; if a line
fails, the packets can simply take another path; the capacity of a line can be shared by
several communication sessions; and if a packet is lost, only the lost packet has to be sent
again, not the whole file.
This packet technique was already developed in 1962 by Leonard Kleinrock in his dissertation and
was not without controversy. Especially telephone engineers doubted that the evaluation of the
destination address and the calculation of the further path in each node of the network would
work for each passing packet.
Nevertheless, the research organization ARPA (Advanced Research Project Agency), which
was subordinate to the American Department of Defense, decided in 1968 to set up a
corresponding network. The so-called ARPA network began at the end of 1969 as a network of
four computers at four locations (University of California in Los Angeles, SRI in Stanford,
University of California in Santa Barbara and University of Utah in Salt Lake City), which came
from three different manufacturers (SDS6, IBM and DEC) and on which four different
operating systems ran (SEX, Genie, OS/MVT and Tenex). In order for these heterogeneous
systems to be able to communicate with each other at a rate of 50 kbit/s via AT&T leased lines,
one IMP (Interface Message Processor) was installed at each of the four sites as a switching
node.
4
Societé Internationale des Télécommunications Aéronautiques
5
Baran's related report series On Distributed Communications is reproduced at
https://www.rand.org/about/history/baran.list.html.
6
Scientific Data Systems
Below is a record of the first message ever sent over the ARPANET. It took place at
22:30 hours on October 29, 1969. This record is an excerpt from the IMP Log that
we kept at UCLA. I was supervising the student/programmer Charley Kline and we
set up a message transmission to go from the UCLA SDS Sigma 7 host computer to
the SRI SDS 940 host computer. The transmission itself was simply to login to SRI
from UCLA. We succeeded in transmitting the "l" and the "o" and then the system
crashed! Hence, the first message on the Internet was "lo". We were able to do the full
login about an hour later.
Leonard Kleinrock, The Day the Infant Internet Uttered its First Words
The transmission protocol was called NCP (Network Control Protocol ), first applications were
remote login and file transfer between remote computers.
In 1973, the first two international nodes were connected to the ARPA network via relatively
slow cable links: the University College of London (Great Britain) and the Royal Radar
Establishment in Oslo (Norway). Thus, there were 37 nodes in total.
During this time, the problem of interconnecting different networks also arose. The nodes of the
ARPA network communicated via NCP, while other networks used other protocols. An
International Network Working Group (INWG) was formed to develop a new protocol that
could be used across networks, and a first draft of this was presented in September 1973 as the
Transmission Control Protocol (TCP)7. Subsequently, attempts to transmit voice led to the
realization that a simplified protocol that could also tolerate packet loss would be a useful
addition. In order not to have to duplicate the protocol part required in both variants, the until
then monolithic TCP was split into two protocols, TCP and IP.
In September 1981, the fourth and final version of the TCP/IP protocol family was published. In
1983, TCP/IP became the standard: all computers in the ARPA network were to use this
protocol only. The military part of the network was organizationally separated into MILNET;
the civilian parts (research, development and education) remained in the ARPANET, for which
the term Internet was now also used. In August 1983, 562 computers were counted.
Germany was first connected to the Internet in 1984 via the University of Karlsruhe from the
EARN (European Academic Research Network) and in 1989 via the University of Dortmund from
the EUnet (UUCP network, Unix-to-Unix-Copy) [9, 10]. The first independent TCP/IP
networks in Germany were the state university network BelWü, which had been set up by the
state of Baden-Württemberg since 1987, but whose Internet connection did not take place until
1989, and the nationwide scientific network WiN, which was operated by the DFN-Verein8
from 1990.
In the USA, the NSF (National Science Foundation) put five supercomputer centers9 into
operation in 1986. In order to make their computing power available to the scientists of all
American research institutions, the NSFNET, which is based on TCP/IP, was rebuilt in 1988
7
The main developers were Vinton G. Cerf and Robert E. Kahn. Their seminal paper A Protocol for Packet
Network Intercommunication appeared in 1974 in the IEEE Transactions on Communications, Vol. 22, No. 5,
https://www.cs.princeton.edu/courses/archive/fall06/cos561/papers/cerf74.pdf.
8
German Research Network, https://dfn.de
9
JvNC/Princeton, PSC/Pittsburgh, SDSC/UCSD, NCSA/UIUC, Theory Center/Cornell
and designed for a data rate of 1.5 Mbit/s. It was thus about 30 times faster than the
ARPANET. A gateway between the two networks was set up at Carnegie Mellon University.
Soon the NSFNET took over the main load of Internet traffic in the USA. The ARPA network
was discontinued at the end of 1989.
The NSFNET was upgraded to 45 Mbit/s in 1990 with the help of Merit Network, MCI and
IBM and renamed ANSNET (Advanced Network & Service). Finally, ANSNET was sold to
America Online for 35 million US$ at the end of 1994.
The connection of more and more networks has led to exponential growth of the Internet,
which will not slow down in the foreseeable future. Through the Internet of Things (IoT),
sensors, devices and (mechanical) machines are now networking with each other, creating the
basis for digital transformation in industry, logistics, mobility, healthcare, energy supply and
agriculture. The Internet of Everything (IoE) combines this simple data collection with the
decentralized processing and analysis of data, as well as the derivation of insights and
decisions (→ Data Science). It connects data (→ Internet), things (→ IoT), people and
processes. The IoE was described as early as 1991 by the visionary computer scientist Mark
Weiser with the term Ubiquitous Computing, i.e. computing that is present everywhere.
The definition of the term computer network is not uniform in the literature. We will use the
following definition here:
Computer networks differ with respect to various characteristics. Some characteristics that are
often used for classifications are briefly mentioned below.
A public network is a network that is accessible to everyone for a reasonable fee. A private
network (corporate network) is used exclusively for internal communication within a private
household, a company or a comparable organization.
A special form is the virtual private network (VPN), which uses a public network infrastructure
to connect a closed user group.
Depending on whether a line or free space is used as the transmission medium, a distinction is
made:
Fixed networks or line-based networks use copper or fibre optic cables. Copper cables are
usually designed as twisted pairs (shielded or unshielded) or as coaxial cables.
Radio networks or wireless networks are based on radio connections. If the radio waves are
narrowly bundled, they are referred to as directional radio; if they are transmitted largely
omnidirectionally, they are referred to as broadcasting. The radio radius in broadcasting defines
a radio cell, which is why the term cellular radio is also used here.
The term topology describes the structure of the connections between computers. Depending on
whether the transmission paths are used exclusively by exactly two computers or jointly by
several computers, a distinction is made between two basic forms: point-to-point networks and
multipoint networks. Typical point-to-point topologies are
ˆ Star,
ˆ Tree and
ˆ fully/partially meshed networks.
The star topology is historically the oldest form. The terminal networks of that time already
used this topology. There is a central node to which all other nodes are directly connected.
There are no further direct connections between the individual nodes. With this topology, the
connection costs for new nodes are low. The communication in the network can be controlled
centrally, which is relatively easy and cheap to realize. The functionality of the network is
independent of the failure of a peripheral node, but completely dependent on the stability and
performance of the central node.
The tree topology is a hierarchical continuation of the star topology in that outer nodes in turn
form the center of a (hierarchically subordinated) star. The communication between two nodes
is done by the next higher node in the tree, which is superior to both nodes, i.e. the control of the
communication is decentralized: every node that is not a leaf node is involved. The failure or
overload of a node higher in the hierarchy means the failure or impairment of all lower-level
nodes.
The general form is the meshed network . In a fully meshed network, each node is connected to
every other node by a direct link. This makes the network independent of the failure of
individual nodes. However, the connection effort is very high, which is why in practice only
partially meshed networks are usually implemented, i.e. each node is only connected to a part of
all other nodes.
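The quadratic connection effort of full meshing is easy to quantify: n nodes require n(n−1)/2 direct links, since each of the n nodes connects to the n−1 others and every link is shared by two nodes. A one-line sketch:

```python
def full_mesh_links(n):
    """Number of direct links in a fully meshed network of n nodes:
    each node connects to the n-1 others; every link is counted from
    both ends, hence the division by two."""
    return n * (n - 1) // 2

print(full_mesh_links(4))    # 6
print(full_mesh_links(100))  # 4950 - why full meshing rarely scales
```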
Multi-point networks (also known as broadcast or multiple-access networks) have only one
transmission channel, which is shared by all network nodes. Any information sent by a source
node is received by all other nodes; the desired recipient must be identified via an address field
in the transmitted data. All non-addressed network nodes also receive the information but
ignore it. Multiple addressing is easy to implement using group addresses.
Typical multipoint network topologies are
ˆ Bus and
ˆ Ring.
In a bus topology, the individual nodes are connected via control units (transceivers) to the
continuous, common transmission medium, the bus. The bus topology is relatively inexpensive
to implement and allows nodes to be added and removed without problems. The network is
independent of the failure of single nodes. The extension of a bus network is limited due to
physical boundary conditions.
In a ring topology, each node is connected to exactly two neighbors via a similar transmission
medium. There is a specified transmission direction, i.e. each node has a specified predecessor
and a specified successor. Transmitted data usually passes through the entire ring and is only
taken off the ring again by the transmitter. The extension of a ring network is not limited by
the total size of the network, but by the length of the individual links between the nodes, so
that wide-area networks are possible. If one node fails, however, the whole network fails.
A central goal of computer networks is the interconnection of computers that are located at
different places. Thus, the maximum spatial distance that these computers may have from each
other is an important classification feature. A distinction is made according to ascending range:
ˆ Personal area networks (PAN) Smallest networks that are often set up in a close range of
max. 10 m around a person. They connect small personal devices (typically mobile phones,
laptops) for ad hoc communication or link these devices to a larger network. Common
transmission technologies are both wired (USB, Thunderbolt) and wireless (Bluetooth,
IR). Usually the number of network nodes is very limited, e.g. to a maximum of 8 devices.
ˆ Local area networks (LAN) Private networks that extend over several 100 m and connect
rooms, floors or buildings of a (company) site with each other. LANs are typically
implemented using multipoint topologies and use a reliable, fast transmission medium.
Examples are the LAN standards of the IEEE 802.3 (Ethernet) and IEEE 802.11 (WLAN)
families.
ˆ Metropolitan area networks (MAN) cover an area with a diameter of 50 ... 100 km
(e.g. an urban area or conurbation) and are essentially larger LANs. They are
mostly used as distribution networks between long-distance and local networks and are
generally public networks. MAN originally existed as a separate term due to the DQDB10
standard. Today, Carrier Ethernet (CE) is used in the fixed-network sector. In the radio
sector, the WiMAX11 standard used to play a role; today, the 4G mobile radio standard
LTE is used.
ˆ Wide area networks (WAN) are networks that cover an area of several 100 ... 1000 km. As a
rule, this is a country. Long-distance networks are typically public networks. They are
designed for the transmission of information over long distances. Long-distance networks
are therefore characterized by a point-to-point topology in which the communicating
nodes (also called end systems) are connected to each other via internal nodes that serve
only as switches. These switching nodes or transit nodes are often also referred to as
routers.
In order to be able to reach a node of any other network from a node of a network, these
(sometimes very different) networks must be coupled together. The coupling is done by means
of special nodes, so-called gateways, in which all hardware and software implementations are
carried out. A set of such interconnected networks is called an interconnected network
(internet12 as short form for interconnected networks). A typical form of an interconnected
network is the coupling of several LANs by a WAN. Interconnected networks can be global.
Occasionally the term global area network (GAN) is used. This is a worldwide network that
connects several WANs (and sometimes LANs) by including satellites.
10
Distributed Queue Dual Bus, IEEE 802.6
11
Worldwide Interoperability for Microwave Access, IEEE 802.16
12
The spelling internet, beginning with a lower-case letter, indicates that the term is used in a generic sense. In
contrast to that, the Internet is a specific, worldwide network.
The data transmission rate (also data rate or transmission speed) describes the amount of data
per unit of time that is transmitted via a transmission channel. It has become common to
roughly distinguish between
ˆ narrow-band and
ˆ broadband transmission.
Figure 4.3: Line switching (switching nodes are drawn round, end nodes are drawn
square); connections 1 - 3 and 2 - 4 exist.
In store-and-forward switching, the information is transmitted from the source node via transit
nodes to the destination node on the basis of its destination address (analogy: letter post).
Depending on the procedure, the information to be sent is temporarily stored in the transit
nodes. If it comprises a complete message, this is also referred to as message switching.
Another variant is packet switching, in which the information to be transmitted is divided into
fragments of a certain length, so-called packets. Each packet is provided with control and
address information and sent individually. Packets that belong to the same message can reach
the recipient via different paths (see Figure 4.4).
Figure 4.4: Packet switching; four packets are routed on different paths from the
source node (Q) to the destination node (Z).
The last important classification feature is the protocols agreed upon in a network.
A network protocol or communication protocol is a precisely specified set of rules that defines
how information must be exchanged between communication partners that are on the same
network.
For a long time now, a communication protocol has no longer been designed monolithically, but
as a set of coordinated protocols, each of which describes only one aspect of the comprehensive
communication task. The totality of these cooperative protocols is called a protocol suite.
As with a construction kit, some of these protocols represent alternatives, others offer options,
and some protocols build on each other, i.e. to solve their own tasks they use services provided
by other protocols.
The structure of the interaction of all protocols of a protocol family is called the network
architecture. Examples of well-known network architectures are the OSI architecture (Open
Systems Interconnection), which was standardized by the ISO in 1984 as a reference model for
existing and future communication protocols, and the TCP/IP architecture of the Internet, which
will be the subject of this course.
The structuring concept for network architectures that has prevailed since the 1970s is layering.
The following analogy based on [5] is intended to illustrate this concept:
A German and a Frenchman want to communicate. Since they do not speak the same
language, they each hire an interpreter. The German wants to tell his partner about
his preference for the genus oryctolagus cuniculus. To do this, he uses the services of
his interpreter and gives him the message "I like rabbits".
The interpreter chooses English as the intermediate language, translates into I like
rabbits and uses the service of his secretary to send this message, for example by fax.
The secretary in France passes the received fax message on to her interpreter, who
recognises from the header that it is an English text, translates it into French and
passes it on to the actual target addressee as J'aime les lapins (see also Figure 4.5).
Figure 4.5: Communication analogy (Germany / France): on layer 2 the interpreters
exchange the English message "I like rabbits"; on layer 1 the secretariats exchange
fax messages carrying address headers (To: 00 33 . . . , From: +49 . . . ).

Figure 4.6: Service elements (request, indication, response, confirmation) exchanged
between the service users and the service provider at the service access points.
In general, a service is characterized by an interaction between the service users and the service
provider. This interaction is described abstractly by a sequence of interface events. These events
are also called service elements or service primitives. Four types of service elements are
distinguished, which are summarized in Table 4.1.
request       triggered by the service user to request a service and to pass the
              parameters that specify this service
indication    triggered by the service provider,
              1. to indicate that a corresponding service has been requested by
                 the service user's partner instance and to pass the specific
                 parameters, or
              2. to inform the service user of an action initiated by the service
                 provider itself
response      triggered by the service user in order to answer or serve a service
              request indicated by an indication
confirmation  triggered by the service provider in order to confirm or terminate
              a service request that has been triggered by the service user
Table 4.1: Service elements
The service elements are triggered in a specific order that is characteristic of the type of service
to be formed. For example, a confirmed service is described with all four service elements in the
order request, indication, response, confirmation, which is already indicated by the arrow
sequence in Figure 4.6.
Other types of service require different sequences. The unconfirmed service (also called
datagram service), for example, uses only the request by one user and the indication to the other
user. The service triggered by the service provider is realized by only an indication at the user.
The request with indication by the service provider consists of a request by a user, which is
directly answered by the service provider with an indication to that same user.
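The service types above differ only in which primitives occur and at which of the two service users they occur. The following Python sketch makes this explicit; the type names, the "A"/"B" user labels and the classify() helper are illustrative assumptions, not part of any standardized API:

```python
# Sketch: service types as characteristic sequences of (user, primitive) pairs.
# "A" and "B" denote the two service users at their service access points.

SERVICE_TYPES = {
    "confirmed": [("A", "request"), ("B", "indication"),
                  ("B", "response"), ("A", "confirmation")],
    "unconfirmed": [("A", "request"), ("B", "indication")],   # datagram service
    "provider-initiated": [("B", "indication")],
    "request-with-indication": [("A", "request"), ("A", "indication")],
}

def classify(primitives):
    """Return the service type whose primitive sequence matches exactly, or None."""
    for name, sequence in SERVICE_TYPES.items():
        if primitives == sequence:
            return name
    return None

print(classify([("A", "request"), ("B", "indication"),
                ("B", "response"), ("A", "confirmation")]))  # confirmed
```

Note how the unconfirmed service and the request-with-indication service use the same two primitives; they are distinguished only by whether the indication occurs at the partner or at the requesting user itself.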
Services can be combined with other services to form a new service. A connection-oriented
service, for example, is composed of the three sub-services
ˆ connection setup,
ˆ data transmission and
ˆ connection release.
A service is provided by the cooperation of active objects within the service provider. These
objects are called entities. Their cooperation is based on the exchange of information, which
includes the transmission of user information as well as the exchange of control information.
Corresponding entities of the same layer are called peer entities. The rule for the cooperation of
the peer entities is the protocol (cf. Figure 4.7; for the term protocol see also Section 4.1.2.7).
Figure 4.7: Two peer entities within the service provider cooperate according to a
protocol; the service users access the service at the service access points.
The information units exchanged between the partner instances for protocol processing are
called protocol data units (PDU).
A PDU consists of the Service Data Unit (SDU), which is provided by the service user, and the
Protocol Control Information (PCI), which is required to control the communication with the
partner instance (see Figure 4.8).
The protocol control information is either specified (via the parameter list of a service element)
by the service user (e.g. destination address, transmission quality requirements) or generated by
the peer entities themselves (e.g. assignment of sequence numbers when splitting into smaller
units, addition of redundancy for error detection). It is attached to the SDU as a header and/or
trailer and forms the PDU together with it. The exact structure of a PDU is defined by the
respective protocol.
13 A logical connection between end systems can also be established in a store-and-forward switched network. It is called a virtual connection.
Figure 4.8: The (N+1)-PDU becomes the N-SDU; PCI and SDU together form the PDU.
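The encapsulation of an SDU between header and trailer can be sketched in a few lines of Python; the layer numbers follow the example in the text, while the "|" separator and the header/trailer strings are illustrative placeholders:

```python
# Sketch: building a PDU from protocol control information (PCI) and an SDU,
# layer by layer. Each layer treats the PDU handed down from above as its own SDU.

def encapsulate(sdu: str, layer: int, trailer: bool = False) -> str:
    """Attach this layer's PCI as a header (and optionally a trailer) to the SDU."""
    pdu = f"H{layer}|{sdu}"
    if trailer:
        pdu += f"|T{layer}"
    return pdu

# A layer-5 message M travels down the stack.
message = "M"
pdu4 = encapsulate(message, 4)             # 'H4|M'
pdu3 = encapsulate(pdu4, 3)                # 'H3|H4|M'
pdu2 = encapsulate(pdu3, 2, trailer=True)  # 'H2|H3|H4|M|T2'
print(pdu2)
```

The receiver reverses the process layer by layer: each layer strips its own header (and trailer) and passes the remaining SDU upwards.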
With the exception of the lowest layer, the exchange of protocol data units is exclusively
virtual, i.e. it takes place by using services of the respective lower layer. Each instance of any
layer N > 1 is therefore at the same time a service user of a service provided by layer N - 1.
Only on the lowest layer (N = 1) does physical communication take place.
The result is a model of several layers stacked on top of each other. Each layer describes the
projection of the communication onto a certain abstraction level N, on which a well-defined
part of the entire communication process is handled. The degree of abstraction of the layers,
and thus their number, designation, content and function, differs from one network architecture
to another.
The information flow through a layer architecture is shown as an example in Figure 4.9. A PDU
M is to be transmitted from a protocol instance on layer 5 to its counterpart. The PCI at
layer i is denoted Hi. The SDU at layer 3 is decomposed into two parts: H4 and the beginning
M1 of M form the first part, and the end M2 of M forms the second part. At layer 2, a part of
the PCI is transmitted as the trailer T2.
Figure 4.9: Information flow through a layered architecture: layer 5 exchanges M;
layer 4 prepends H4; layer 3 splits its SDU into H3 H4 M1 and H3 M2; layer 2 adds
H2 and the trailer T2; on layer 1 the physical bit stream ( . . . 010010111011011100 . . . )
is transmitted.
A network architecture describes all layers of a computer network as well as the protocols used
on these layers.
All of today's network architectures are based on the OSI reference model, which was
standardized by the ISO (International Organization for Standardization) in 1984. The OSI
model (Open Systems Interconnection) defines seven logical layers, each with clearly defined
tasks, but independent of concrete technical implementations.
Although the Internet network architecture or TCP/IP architecture already existed when the
OSI model was created, it also fits into this model.
Strictly speaking, there is no official TCP/IP architecture. However, on the basis of the Internet
standards that have been developed, it is possible to identify four layers that can be largely
distinguished from one another:
The network access layer is not described in detail in the TCP/IP architecture because it is
hardware-dependent. Nevertheless, we want to take a look at this layer (host-to-network layer).
In the OSI reference model, its task is represented by two layers: the physical layer and the
data link layer.
The transport layer, like the data link layer but at the level of end-to-end communication,
implements secure data transport. In addition, it realizes the adaptation of the transmission
quality (Quality of Service, QoS) required by the superordinate layer to the conditions of the
network layer or transit system (e.g., throughput, delay time, delay fluctuation, loss rate). The
transport layer provides its superordinate layer with the abstraction of a transport of arbitrarily
long data streams between end partners with selectable quality.
Finally, the application layer, as the uppermost layer14 and at the same time the interface for
application programming, provides all services of the network in an application-supporting way.
Thus, the application layer is the only layer in the TCP/IP model that not only takes care of the
pure transport of information, but also considers the character of the information. Typical
application-related services are e.g. file transfer, interaction with a remote computer, exchange
of electronic mail, access to networked multimedia documents and others. The communicating
application processes can abstract from all transport details of the underlying network. They
themselves do not belong to the application layer.
The overall representation of the architecture is shown in Figure 4.10. Here, end-to-end
communication takes place via a switching node. The usual names of the protocol data units for
the different layers, such as message, segment, packet, frame and bits, are also shown.
Figure 4.10: Overall TCP/IP architecture; end-to-end communication between
Network 1 and Network 2 takes place via a switching node.
14 The OSI reference model differs from the TCP/IP architecture model mainly in that the application layer is divided into three layers: the session layer (OSI layer 5), the presentation layer (OSI layer 6) and OSI layer 7, which is also called the application layer.
The function of the session layer is to manage the logical connection of application processes, referred to as a session. This includes, for example, the multiplexing of several logical connections into a common transport connection, the monitoring of possible rights restrictions of the communicating partners and, if necessary, the protection of the session by cryptographic measures.
The presentation layer ensures that the meaning of the transmitted data is preserved on the different systems. For example, it may be necessary to transform binary number representations, character sets or data compressions.
Several protocols can be used per layer, which either complement each other or can be used
alternatively to meet specific transmission requirements.
The totality of all protocols of a network architecture is called a protocol suite. The most
important protocols of the TCP/IP protocol family are listed in Figure 4.11 and are discussed in
the following sections. All protocols are standardized by the Internet Engineering Task Force
(IETF) and published as Request for Comments (RFC)15. The name is derived from the review
phase of each standard, in which a draft is publicly submitted for discussion.
Figure 4.11: The most important protocols of the TCP/IP protocol suite:
layer 5: DNS, DHCP, BGP, RIP; layer 4: TCP, UDP; layer 3: IP, ICMP, OSPF,
IGMP, ARP, RARP; layers 2/1: IEEE 802.x LAN protocols, PPP.
A digital signal is a rectangular pulse train over the time axis, which typically distinguishes two
amplitudes (signal levels). An amplitude change takes place only at discrete points in time,
each an integer multiple of a constant time interval. The two signal levels each describe a bit
value; the time interval is called the bit period. The rectangular pulse sequence thus represents
a string of individual bits that all have the same bit period (see Figure 4.12).
Figure 4.12: Digital signal with the levels H and L representing the bit
sequence 0 1 0 1 1 0.
However, this representation is an abstraction. Let us first consider the case where, as is usual
within a computer, a digital signal level is represented by a voltage. Let us further assume that
there is a square-wave pulse generator which can generate an ideal digital signal (which real
generators can only approximate). If this signal is now transmitted via an electrical double line,
unavoidable effects occur which change the signal shape.
A double line not only has a resistance, but also an inductance and a capacitance (see the
equivalent circuit diagram for a homogeneous line in Figure 4.13 b). At the other end of the
line, one will observe an oscillation as shown qualitatively in Figure 4.13 c. The exact shape
depends on the pulse frequency and the actual physical line properties.
Figure 4.13: Transmission of a square wave pulse on a double line (input (a),
equivalent circuit (b) and output (c))
The theoretical background of this effect was shown by Jean-Baptiste Fourier16. He proved that
every signal can be represented as a superposition of several (generally infinitely many) simple
sinusoidal signals with different frequencies and amplitudes.
A typical property of any physical transmission medium is to filter certain frequencies. This is
the reason why signals are distorted in a certain way depending on the transmission medium:
certain frequencies are simply suppressed by the medium.
In addition to this frequency-dependent degradation of a signal, other interferences in the
transmission channel can change a signal. These include the length-dependent (ohmic)
attenuation of the medium, the noise in the transmission channel and the statistical coupling
of interference signals.
16 French mathematician and physicist, 1768 - 1830
However, by digitally interpreting a signal that has been disturbed (at most to a certain degree),
the original representation can be recovered: the signal amplitude is sampled approximately in
the middle of a bit period. The sampled value (from a continuous spectrum of values) is mapped
to a permissible value by rounding (quantization), i.e. a distorted value can still be interpreted
correctly within certain tolerances. The particular advantage of digital transmission technology
lies in this immunity to minor interference.
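The recovery step just described amounts to a simple threshold decision per sample. A minimal sketch, assuming two nominal levels of 0.0 and 1.0 and a decision threshold halfway between them (both values are illustrative, not prescribed by any standard):

```python
# Sketch: recovering bits from disturbed samples by quantization.
# Nominal levels and the decision threshold are illustrative assumptions.

LEVEL_0, LEVEL_1 = 0.0, 1.0
THRESHOLD = (LEVEL_0 + LEVEL_1) / 2  # 0.5

def quantize(samples):
    """Map each sample (taken mid bit-period) to the nearest nominal bit value."""
    return [1 if s > THRESHOLD else 0 for s in samples]

# Disturbed samples are still interpreted correctly as long as the disturbance
# keeps each sample on the correct side of the threshold.
disturbed = [0.12, 0.93, -0.05, 0.78, 1.10, 0.31]
print(quantize(disturbed))  # [0, 1, 0, 1, 1, 0]
```

As long as no sample crosses the threshold, the recovered bit string is identical to the transmitted one; this is exactly the immunity to minor interference mentioned above.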
The representation of bits by physical signals can be done in different ways. This mapping is
called line coding.
A square pulse method has already been used in Figure 4.12. There a two-valued coding was
used, which associates a 1 with the higher and a 0 with the lower signal level. The respective
level remains unchanged for the entire bit period. Such an encoding is called NRZ encoding
(non-return to zero).
There is a variety of other square-pulse techniques used in practice, with different strengths
and weaknesses. The NRZ code is easy to generate, but there is a danger that a receiver whose
clock has drifted will not be able to tell how many consecutive equal bits have actually been
sent.
Figure 4.14: Line codings of the bit sequence 1 0 1 1 0 0: NRZ, RZ (ternary),
Manchester and Differential Manchester.
Codings that avoid this problem are called self-synchronizing or clock-recovering codings. An
example is the RZ coding, which returns to the zero level in the middle of every bit period.
However, this encoding must distinguish a total of three signal levels; it is therefore a ternary
code17. Figure 4.14 shows that even with successive bits of the same value, the signal level
changes in the middle of the bit period. However, compared to NRZ-encoded data, twice the
clock rate is now required.
17 In the literature, one can also find two-valued RZ codes, which, however, are not self-clocking.
A self-synchronizing code does not have to be ternary. Two-valued examples are
ˆ the Manchester coding and
ˆ the Differential Manchester coding.
Manchester coding uses level changes for coding: a 0 corresponds to a change from the higher
to the lower signal level, a 1 conversely to a change from the lower to the higher signal level18
(see Figure 4.14).
Differential Manchester coding also uses a signal change in the middle of each bit period.
The difference is the significance of the start of the bit period for the logical interpretation: a 0
requires a signal change at the start of the bit period, a 1 requires no signal change.
The advantage of differential coding is, on the one hand, that it is often easier to detect changes
than to detect levels. On the other hand, the polarity of the lines no longer plays a role in this
coding: only changes are considered and no levels are interpreted. Thus, the logical
interpretation does not change if the levels are reversed.
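The three two-valued codings can be compared in a small sketch. As a simplifying assumption, each bit period is represented by two half-period levels, written 'H' and 'L', so that mid-period transitions become visible; the bit-to-level conventions follow the text above:

```python
# Sketch: NRZ, Manchester and Differential Manchester coding of a bit string.
# Each bit period is split into two half-periods; 'H'/'L' are the two levels.

def nrz(bits):
    # Level constant over the whole bit period: 1 -> HH, 0 -> LL.
    return "".join("HH" if b == "1" else "LL" for b in bits)

def manchester(bits):
    # 1: change low -> high; 0: change high -> low (mid-period transition).
    return "".join("LH" if b == "1" else "HL" for b in bits)

def diff_manchester(bits, start_level="L"):
    # Always a mid-period transition; a 0 additionally changes the level
    # at the start of the bit period, a 1 does not.
    out, level = [], start_level
    flip = {"H": "L", "L": "H"}
    for b in bits:
        if b == "0":
            level = flip[level]      # transition at start of bit period
        out.append(level)
        level = flip[level]          # mandatory mid-period transition
        out.append(level)
    return "".join(out)

for name, fn in [("NRZ", nrz), ("Manchester", manchester)]:
    print(f"{name:13s}", fn("101100"))
print("Diff. Manch. ", diff_manchester("101100"))
```

Running this on the bit sequence 1 0 1 1 0 0 from Figure 4.14 shows the long constant runs of NRZ (e.g. for the repeated bits), while both Manchester variants change level in every bit period and thus carry the clock along with the data.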
Square-wave pulse methods are used on electrical lines. They are based on the distinguishability
of amplitudes. Due to unavoidable line attenuation, they are suitable at most for short
transmission distances (LAN range). Over longer distances, or in cases where only transmission
channels with limited frequency bands are available (this includes optical waveguides and radio
channels), modulation methods are used.
In modulation methods, a (high-frequency) wave serves as a so-called carrier signal. Since a
simple wave does not yet transport any information, the digital information is imprinted by a
targeted, time-dependent change of this wave (Latin: modulatio)19.
This, in turn, can be done in different ways. A distinction is made between three basic procedures:
ˆ the amplitude modulation,
ˆ the frequency modulation and
ˆ the phase modulation.
Amplitude modulation (also known as amplitude shift keying, ASK) changes the amplitude
of the carrier. In the simplest case, two amplitudes A(1) and A(0) are used. In principle,
A(0) = 0 is possible. However, since it is technically more advantageous if the receiver can
receive the carrier wave at any time, one usually uses A(0) > 0 (see Figure 4.15).
Amplitude modulation depends on the distinguishability of amplitudes, i.e. if the signal
attenuation is too large, there is a risk of misinterpretation of the amplitudes. This risk can be
avoided by always transmitting the same amplitude but varying the frequency of the carrier
wave instead (see Figure 4.16). This procedure is called frequency modulation (also frequency
shift keying, FSK).
18 Some authors define the logical values exactly the other way round.
19 Digital modulation is also called shift keying.
Figure 4.15: Amplitude modulation with the two amplitudes A(1) and A(0).
Figure 4.16: Frequency modulation with the two frequencies f(1) and f(0).
In phase modulation (also known as phase shift keying, PSK), information is transmitted by
changing the phase of the signal. In the simplest case, the phase of the carrier wave is shifted
by either 0° or 180° (see Figure 4.17).
Figure 4.17: Phase modulation; the carrier phase is shifted by 0° or 180°.
By combining these basic methods, derived modulation methods can be developed. Quadrature
amplitude modulation (QAM), which combines amplitude and phase modulation, is widely
used. For example, by distinguishing four phase positions and two amplitudes, eight
distinguishable carrier modulations (so-called symbols) can be generated. In this case, one
speaks of 8-QAM.
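The symbol count of a combined modulation follows directly from multiplying the numbers of distinguishable parameter values. A minimal sketch of an 8-QAM constellation; the concrete amplitude and phase values are illustrative assumptions:

```python
# Sketch: an 8-QAM constellation from 4 phase positions and 2 amplitudes.
# Each (amplitude, phase) pair is one distinguishable carrier modulation (symbol).
import cmath
import math

amplitudes = [0.5, 1.0]
phases = [0, 90, 180, 270]  # degrees

symbols = [a * cmath.exp(1j * math.radians(p))
           for a in amplitudes for p in phases]

print(len(symbols))             # 8 symbols -> 8-QAM
print(math.log2(len(symbols)))  # 3.0 bits transported per symbol
```

With 8 distinguishable symbols, each transmitted symbol carries log2(8) = 3 bits, which is why QAM variants raise the data rate without raising the symbol rate.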
In data transmission one is mostly interested in the theoretical upper limit of the transmission
speed, i.e. the maximum data transmission rate of a transmission channel. The key parameter
here is the bandwidth.
4.2.3 Bandwidth
According to the (continuous) Fourier transform, every signal can be represented as an integral
of spectral oscillations over the continuum. In practice, however, a signal representation with
spectral frequencies from -∞ to +∞ is impossible, either because of physical properties of the
transmission medium, or because the use of certain frequencies is not allowed or technically
not accessible.
If the frequency range usable for the signal representation is characterized by a lower limit fmin
and an upper limit fmax, the difference between the two is called the bandwidth of the signal.
If fmin = 0, this is referred to as the baseband.
The (digital) reading of a transmitted signal is done by sampling and quantization.
The sampling theorem of Nyquist20 states that in an ideal (undisturbed) medium any baseband
signal with a maximum frequency component fmax can be completely reconstructed from its
samples if the sampling rate fsample satisfies
fsample > 2 · fmax.
will continue...
20 Harry Nyquist, Swedish-American electrical engineer, 1889 - 1976
Bibliography
[1] Andrew S. Tanenbaum, Herbert Bos: Modern Operating Systems, Pearson, 4th edition (2016).
[2] William Stallings: Operating Systems: Internals and Design Principles, Pearson Education, 9th edition (2018).
[3] Abraham Silberschatz, Peter B. Galvin, Greg Gagne: Operating System Concepts, John Wiley & Sons, 10th edition (2019).
[4] TechnologyUK: Operating System Architecture, https://www.technologyuk.net/computing/computer-software/operating-systems/operating-system-architecture.shtml.
[5] Andrew S. Tanenbaum, David J. Wetherall: Computer Networks, Pearson Education, 5th edition (2011).
[6] James F. Kurose, Keith Ross: Computer Networking: A Top-Down Approach, Pearson, 7th edition (2017).
[7] Anatol Badach, Erwin Hoffmann: Technik der IP-Netze, Hanser, 4th edition (2019).
[8] Barry M. Leiner et al.: A Brief History of the Internet, Version 3.32, http://www.isoc.org/internet/history/brief.shtml, (2003).
[9] Claus Kalle: Das Internet in Deutschland - Ein alter Hut?, KOMPASS Nr. 64, Mitteilungen des Regionalen Rechenzentrums der Universität zu Köln, http://www.uni-koeln.de/rrzk/kompass/64/wmwork/www/k64_15.html, (1995).
[10] eco Electronic Commerce Forum - Verband der deutschen Internetwirtschaft e.V.: Deutsches Internet vor 1995, http://www.10-jahre-internet.de/2005/index.htm.
[11] IETF Internet Engineering Task Force: RFC Repository, http://www.ietf.org/rfc.html.
[12] William Stallings: Data and Computer Communications, Pearson Education, 10th edition (2014).