by MEHMED DUHOVIĆ
August 2017
APPROVAL PAGE
MEHMED DUHOVIĆ
Graduation Committee:
DECLARATION
I hereby declare that all information in this document has been obtained and
presented in accordance with academic rules and ethical conduct. I also declare that,
as required by these rules and conduct, I have fully cited and referenced all material
and results that are not original to this work.
INTERNATIONAL UNIVERSITY OF SARAJEVO
Signature Date
ABSTRACT
operating system and to explain the process of building the software and the
dependencies and elements that the system consists of. In addition, the thesis briefly
explains the reasons for utilizing C++ as the programming language of choice for the
project and describes the process of booting the computer. Furthermore, computer
organization and structure are briefly mentioned and labeled. Finally, the components
of the system including the structure of the base of the system, and the more
Contents
APPROVAL PAGE
DECLARATION
LIST OF FIGURES
LIST OF ABBREVIATIONS
ACKNOWLEDGEMENTS
1. INTRODUCTION
1.1. OBJECTIVE
1.2. THESIS STRUCTURE
2. BACKGROUND
2.1. THE PROJECT
2.2. C AND C++ PROGRAMMING LANGUAGES
2.2.1. OS RELATED CONCEPTS
2.2.2. SPECIFICATIONS
2.3. RELATED WORK
3. THEORETICAL BASIS
3.1. COMPUTER ORGANIZATION
3.2. BOOTPROCESS
3.3. STRUCTURE
4. REQUIREMENTS
5. DESIGN AND IMPLEMENTATION
5.1. DESIGNING THE BASE OF THE SYSTEM
5.1.1. LINKER
5.1.2. KERNEL
5.1.3. MAKEFILE
5.1.4. LOADER
5.1.5. PRINTF
5.2. ENVIRONMENT
5.3. MEMORY SEGMENTS AND PORTS
5.4. GLOBAL DESCRIPTOR TABLE
5.5. HARDWARE COMMUNICATION
5.6. INTERRUPTS
5.6.1. KEYBOARD
5.6.2. MOUSE
6. CONCLUSION
REFERENCES
LIST OF FIGURES
Figure 8 - Displaying the mouse and setting borders on the window
LIST OF ABBREVIATIONS
AX Primary accumulator
BX Base register
IP Instruction Pointer
SP Stack Pointer
ACKNOWLEDGEMENTS
Firstly, I need to say that this thesis would not exist without the help and support
of a number of great and important people — my thanks and gratitude to all of them
for being part of this journey and making this dissertation possible.
I want to start with thanking my family for their continuous and unparalleled
support, love and encouragement. I want to express gratitude to my parents for giving
I would like to express my gratitude towards my mentor - Assoc. Prof. Dr. Kanita
Karađuzović - Hadziabdić for the help, support and encouragement given during the
writing process of this dissertation. I also want to mention all the other professors
Finally, I want to thank all my friends and colleagues for the wonderful times we
Mehmed Duhović
1. INTRODUCTION
1.1. OBJECTIVE
system written in C++ and assembly, constructing the kernel image from scratch,
building and linking the kernel image and making a bootable file for virtualization of
the operating system. In addition, the thesis presents descriptions and explanations of
hardware, interrupts and interrupts manager, and adding and managing drivers
The goal is to build a stand-alone working (albeit simple) operating system written
Apart from the project that accompanies it, the thesis describes the reasons why
C++ is used as the language for the project and the specifications of the program,
gives a comprehensive guide to the architecture and to the computer system itself,
and ends with a short conclusion regarding the possible uses of the system and the
future of this project. In addition, during the dissertation
mentioned.
1.2. THESIS STRUCTURE
and the various functionalities, and challenges posed by implementing such a piece of
software are comprehensively addressed. The field of Operating Systems is vast, and
in this dissertation only a part of the most important concepts is mentioned along with
introduces the concepts and technologies used for the project and provides reasons
why C++ is used as the programming language for this task, Chapter 3 (Theoretical
Basis) discusses the theoretical basis and the computer organization, Chapter 4
(System Requirements) lists the packages and dependencies needed to build the
system, Chapter 5 (Design and Implementation) presents the idea behind the project
and the process of both designing and implementing this kind of program and its
dependencies, and finally Chapter 6 (Conclusion) concludes the thesis and suggests
possible uses of the project.
2. BACKGROUND
The myOS, which is the innovative name of the project, is built mostly in C++,
with the assembler code being added only when necessary. C++ is an OO (object-
Functional languages have grown out of favor lately because they are considered
inefficient and time-consuming and because they put more emphasis on constructing
the compilers than on the language and the writing process. The language that is most
commonly used for writing Operating Systems is C. The Windows and Mac kernels
and basically all of Linux are written in C. A brief description of both C and C++ can
development. Not only is C the language of operating systems, it has also inspired
almost all of the most popular high-level languages in use today. In fact, during
the development of the programming language itself, Dennis Ritchie (its creator)
envisioned C as the language to write embedded systems and application software in. C
code, but still advanced enough to be easy and understandable to use. C has a lot of
features that make it the ultimate programming language for OS development
it has become the most used language in the field of Operating System
development and system programming. Also the early operating systems were
written in C and changing them now would cause more trouble than it is
worth.
RAM. It is useful for writing data into Memory mapped I/O, DMA
pointers.
addresses C can copy data to different locations, change and flip bits around
and in small amounts it can process data. This is really beneficial to the
because it can directly be translated to machine code some commonalities and partials
outside the realm of the high-level language the programmer chooses for writing the
OS. Another reason C is overlooked for this project is that procedural programming is
going out of fashion. Apart from those drawbacks C doesn't support object oriented
encapsulation and other advanced concepts. Another unpopular aspect of C is that
it has no support for code reuse, provides no mechanisms for data security, and
performs no run-time checking, which makes it very difficult to fix bugs.
[2].
The reason C++ is chosen as the language to write the OS in this project is
that assembly on its own is quite tedious and can lead to many errors which the
compiler cannot catch. While the programmer would need to write at least a
environment for the OS written in C, in C++ only the first-stage bootloader, the
interrupt assembly file, interrupt file that has high importance and a couple of inline
instructions are necessary to write for the system to function properly. Most of the
functionalities that assembly takes care of can be managed by C++, and when that is
not possible, problems can be solved by writing simple inline assembly instructions.
Basic functions, such as writing the Global Descriptor Table, require special
opcodes not available inside the C language, and the Interrupt Service
Routines require special handling by the CPU that the C language cannot express,
which involves an extensive use of assembly language. In addition, C++ is used because C++
needed to use them. The programmer can use virtual functions, templates, classes,
important it is compatible with the C language. But in a sense the languages are quite
Systems (albeit less well known) are also written in C++: BeOS, Chorus, Haiku,
Amoeba.
Below are mentioned some of the more important concepts and principles that this
the kernel into the memory, provide the accurate information to the kernel,
transfer control to the kernel when it requires it, and provide an adequate
environment for the kernel to run. It is important to note that the term
functionalities, and for the CPU, the bootloader is just a piece of software
Virtualization Program for the OS, because it loads the OS from a storage
device, or more than one if the multi-boot feature is enabled, and then it
runs the Operating System's startup procedure. In this project the GRUB
bootloader is used. GRUB brings the system into 32-bit protected mode
The Magic Number – Identifies a certain protocol or file format and gives
strongly typed data which the controlling programs can read and use. In
either from a software running on the computer telling the CPU to stop
what it is doing and immediately do another task. They are divided into
Interrupts (generated by the software, signaling the kernel that it needs its
attention).
manage all the important device functions such as starting the device,
closing the device, implementing the interrupt table for the device and also
reading from and writing to the device. Some devices are both readable and
writable, some can only be written to or only read from. For example the console can
only be written to, and the keyboard can only be read from.
computer finds itself right after turning on the PC. It is a rudimentary phase
ability to keep control of programs, there can only be one task at a time
compared to Real Mode. Basically, it fixes all problems the Real Mode
has, including the ability to keep programs from crashing each other
(called Protection, this is how Protected Mode got its name), there are
ways to access memory like paging, there are 4GB of address space. If it
2.2.2. SPECIFICATIONS
The myOS project was built on a computer running Linux Mint (an Ubuntu-based
distribution) as a virtual machine inside the Windows 10 operating system. The host
machine has an i3 processor, 4 GB of RAM, and both integrated and dedicated
graphics. The virtual machine was given 2 GB of RAM (more than necessary) and 8
GB of disk space, in case the packages require more room. More about packages
different types of operating systems, such as single- and multi-tasking systems,
running either one program at a time or several, and single- and multi-user systems,
which identify resources or processes as belonging either to one user or to multiple
users and can give multiple users access to interact with the system. In addition, there are
distributed, templated and embedded operating systems together with real-time and
library operating systems. Two main architectures are monolithic kernels and
operating systems, but they never gained the required fanbase to take over the market
of operating systems.
One of the most popular monolithic kernels is Linux, first written in 1991 by Linus
Torvalds while he was a university student. He did not write it alone; he received a
lot of help and assistance from volunteers who, together with him, created a complete
and functional kernel. Projects closer to this one include ToaruOS 1.0 [4], which is a
hobby operating system with a kernel, userspace and graphical user interface that
managed to make its full release. Another good example is HaikuOS [5], which is
currently still in development. Haiku is also an open source operating system focusing
3. THEORETICAL BASIS
system is a software which controls the computer resources, and provides an adequate
environment for the user to execute programs. The operating system is vital to both
the User and the Hardware. The user sees the operating system as an intermediary
between himself and the computer, designed to maximize the work that the user is
performing. From the hardware‘s point of view, the OS is a program which manages
memory and other system resources, imposes security policies, schedules threads
and processes, launches and closes user programs, and provides a user interface to
of software which manages the operations of a computer system, its resources and it
serves as an intermediary between the user and the computer system itself. The
computer system is made of hardware resources such as the central processing unit
(CPU), the memory, input/output (I/O) devices and application programs. One of
the most important parts of the operating system is its kernel, which is a hidden
program running at all times on the computer. The kernel manages access to resources
and handles interrupts (hardware-driven events) and system calls (software-driven
events). When the CPU is interrupted it stops every other activity and it immediately
3.2. BOOTPROCESS
but it is important to note how an OS is booted from the CPU. On the motherboard,
which connects all other parts of hardware together exists BIOS, which is a set of
When the computer is initially booted, the BIOS checks the state of the
computer, works out the dependencies and then starts loading the operating system by
copying data into the RAM. That data takes the form of machine instructions and is
called a bootstrap program; it is the first program that the computer runs. Registers
are small storage locations inside the CPU which hold instructions, addresses or any
other kind of data. Some of them are AX, BX, IP and SP. The bootstrap program tells the IP
(instruction pointer) to read the first block of the code just copied to the RAM. The
CPU reads those instructions and executes them. The bootloader is another
important program. While the bootstrap program loads the contents stored in
fast-access memory and automatically executes those instructions when the computer
is started, the bootloader runs before any operating system but after the bootstrap
program. The bootloader loads the operating system, and usually every operating
system has a specific set of bootloaders. The GRUB bootloader is used for this project,
and more about GRUB is written in the following chapters. The bootloader loads the
kernel into the RAM and the Instruction Pointer moves again, to the beginning of the kernel –
The bootstrap will initialize all the system elements – memory, registers and
controllers. The only thing the bootstrap program needs to know is how to load the
operating system and how to execute the operating system. In the state in which the
bootstrap program runs, all other processes are halted until the kernel later
activates them. Intel CPUs are backwards compatible, meaning that during the
booting stage, in the primitive power up state they behave like the original Intel 8086
3.3. STRUCTURE
The computer also needs storage to hold and retrieve its
programs, and within the computer organization there is only one large storage
area that the CPU can access directly: the RAM (random access
memory). Programs are typically executed by first fetching an instruction and
storing it in the above-mentioned instruction register, which holds the instruction
currently being executed. After that, the instruction can have several effects:
it can place a value in another register, it can generate new data, or it can
store values into the memory. Another way of storing data, albeit a lot slower than
the main memory is the secondary storage, provided by the computer systems to hold
managing I/O devices. To add any kind of input/output functionality, a device driver for
each controller is needed. The instructions for I/O operations are stored in registers.
If a key is pressed, the interrupt handler is triggered, the contents of those
registers determine what action to take, and the character is then
stored in a buffer. After the operation finishes, execution returns to the
operating system.
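The key-press flow described above can be sketched as a small ring buffer that an interrupt handler fills and the rest of the system drains. This is only an illustration under stated assumptions: the names KeyBuffer and onKeyInterrupt and the buffer size are hypothetical and are not taken from the myOS sources, and the handler is called directly here instead of by a real hardware interrupt.

```cpp
#include <cstddef>

// Hypothetical fixed-size ring buffer for typed characters.
// One slot is always left empty so that "full" and "empty" can be told apart.
class KeyBuffer {
public:
    // Stores one character; returns false (and drops the key) if the buffer is full.
    bool push(char c) {
        std::size_t next = (head_ + 1) % kSize;
        if (next == tail_) return false;
        data_[head_] = c;
        head_ = next;
        return true;
    }

    // Retrieves the oldest character; returns false if the buffer is empty.
    bool pop(char& out) {
        if (tail_ == head_) return false;
        out = data_[tail_];
        tail_ = (tail_ + 1) % kSize;
        return true;
    }

private:
    static constexpr std::size_t kSize = 16;  // arbitrary size chosen for the sketch
    char data_[kSize] = {};
    std::size_t head_ = 0;  // next slot to write
    std::size_t tail_ = 0;  // next slot to read
};

// Hypothetical handler: a real kernel would reach this from the interrupt
// table after reading the key from the keyboard controller.
void onKeyInterrupt(KeyBuffer& buffer, char key) {
    buffer.push(key);
}
```

After the handler returns, normal execution continues and the operating system can call pop() whenever it is ready to consume the typed characters.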
CPU. The program itself is a non-executable and passive entity, and by the time it
process. When the word processing program is idle, then it is a program. The
operating system manages resources to accomplish the processes running time. The
OS is responsible for creating and deleting processes, stopping and resuming
An important point to note here is that although modern computers generally run in
64-bit mode (it has become more or less a standard in the modern age), the processor
does not start in that mode. When myOS is booted, GRUB hands control to the kernel in
32-bit protected mode. The whole project is built on the basis of compatibility with 32-bit machines.
4. SYSTEM REQUIREMENTS
This section mentions the necessary packages and dependencies needed for
managing files inside the system, and translating them into objects. If the system is
built on a Linux OS or any distribution of a Linux OS, most of those packages are
A note regarding the flat binary execution. Everything would be simpler if the
main() function (or any other entry point routine) of the C or C++ program would be
the first byte in the memory of the system. Most older computers had the flat binary
execution, meaning that as soon as the computers would boot up the first byte would
run (mostly an environment to store the operating system, or the operating system
itself). During the process of writing a system software it is possible to overwrite that
installed. The command sudo apt-get install <packagename> is typed into the Linux Mint
terminal to install the packages. This command enables downloading and installing
the CPU that this user runs a certain command as another user, mostly a super user or
as the 'root' user. Apt-get is a command-line tool that exposes the operations
of the packaging tools to the user. Those commands can install, update, delete and
The first package to be installed, if it is not already present, is the g++ compiler of the
GNU Compiler Collection. GCC is already installed on most Linux systems, and the
g++ compiler's main role is to enable simple compiling of C++ code from the command
line. g++ is a useful and practical package, because it not only compiles single
C++ programs but can also compile multiple files and accepts options
and flags for the compilation process, such as enabling debug information or optimization.
Another package, also part of the GNU project, is binutils, which is a
collection of binary tools. It is mostly used for assembling and linking files, but is also
often used for low-level manipulation of files. The two commands
used most for this program are as (the GNU assembler) and ld (the GNU
linker). The last package needed is libc6-dev-i386, which contains the 32-bit development
libraries and object files needed to compile and link 32-bit programs. The libc6 package allows for better support
The system implements the keyboard and mouse drivers (most important
functionalities) by connecting them through ports and sending the signals to the CPU
where the next steps are taken – communication between the hardware is the basis for
the drivers to work, and is the necessary requirement to enable PS/2 device support for
5. DESIGN AND IMPLEMENTATION
In this section, insight into the implementation of the myOS is given. A section
explained. The nuisances and problems of building such a piece of software are also
elaborated on.
There are a couple of requirements that need to be fulfilled before starting this
project. An environment in which C++ and Assembly code can be written (the
requirement are the compilers used to compile the C++ and Assembly code. More
about them is written in the following sections. A very important thing to take into
account is that the system is divided into a number of different subsystems with their
own dependencies, meaning that there is no clear system from the beginning and that most
of the program is written by starting one file, then moving to another, then returning
to that file or creating a new one. The system is written by developing chunks of
5.1. DESIGNING THE BASE OF THE SYSTEM
5.1.1. LINKER
The linker script controls how the linker combines the compiled object files: it maps
the sections of the input files into the output file and controls the memory
layout of the output. A linker script is also useful when
the object files come from different compilers or assemblers. In its header the script
gives the linker information such as the output architecture and the output format. Another
important header declaration is the entry-point declaration (the ENTRY command), which
names the loader function inside the loader.s file and so tells the linker where
execution starts. The linker script uses the SECTIONS command, which states the memory
address at which the code is to be loaded.
Sections with similar properties regarding their flags and accessibility are
grouped together. The code is a non-modifiable entity and the flags given to the
code are read and execute. On the other hand all the variables can be modified. If the
code is supposed to be loaded at the address 0x0100000, then the following command
is written:
SECTIONS
{
. = 0x0100000;
The '.' symbol is a special symbol, the location counter. If an output section is
given no explicit address, it is placed at the current value of the
location counter. .text, .data and .bss are output sections; inside the braces of each
output section we write the input sections that will be placed in it. The sections are later
followed by page-aligned sections. For example, the code below tells the linker
that the .multiboot, .text and .rodata sections of all input files should be placed in
the .text output section:
.text :
{
*(.multiboot)
*(.text*)
*(.rodata)
}
The symbol '*' is a wildcard which matches every input file
name. The *(.multiboot) pattern tells the linker to take the .multiboot section from
every input file. Since the location counter starts at 0x100000 and the first
output section is .text, that output section gets the address 0x100000. After those
lines we come to the .data output section, which gets the addresses immediately after the
.text output section. In the same way, the .bss output section gets the memory space
5.1.2. KERNEL
The kernel file is written as a regular cpp file. The main method is the
function, and also initializes a GDT, Interrupt Driver, Keyboard and Mouse drivers
among others. The main purpose of the kernel file is to serve as the starting point
of the execution of the operating system. A more complex kernel would also have
the tasks of managing memory and processes, scheduling processes, and other
advanced features.
The printf function does not exist yet (no operating system is running, so there is
no printf function defined anywhere in the system) and it has to be written.
Thankfully, the buffer which stores the text shown on the screen is located in
physical memory (at 0xb8000) and each of its cells is of the unsigned
short data type, which means that it stores 16 bits, which are then
divided into an 'upper' 8 bits and a 'lower' 8 bits. The upper bits define the
foreground and background colors (there are 16 colors to choose from; for myOS
black and white have been chosen as the default colors), and the lower 8 bits tell the
screen controller which character to draw on the display. For example, if the word "Hello" is to
be printed to the display, the memory would store the elements like this:
The for loop inside the kernel.cpp file walks through the string until its
terminating character and copies each character to the corresponding
memory cell. Another loop, this time a while
loop, exists in the main method; it keeps the kernel from ever stopping execution
by going into an infinite loop, while(1). To make sure that the
color bytes are not overwritten, each cell is first AND-ed with the hex code 0xFF00 (the FF
keeps the upper attribute byte, the 00 clears the lower character byte), and then the
result of that is OR-ed with the character to output. If we try to compile a
kernel.cpp file that calls printf() while the only thing inside the file is the main
method, an error will be returned. Why does that happen? Because myOS has no standard I/O
libraries to include at the top of the cpp file, and there is no
operating system underneath to provide printf(). There are two ways this could
work: either by pulling in an existing library, for example by
installing glibc, which defeats the purpose of writing an operating system, or by writing the
functions ourselves. At this point nothing exists. No dynamic libraries exist to provide these
functions. Everything needed from now on has to be written inside the project
to actually work.
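The masking step described above can be illustrated with a minimal sketch. It is not the myOS printf itself: the function name printTo is hypothetical, the buffer is passed in as a parameter so that the sketch can run on an ordinary host (in a kernel the pointer would refer to physical address 0xb8000), and no cursor handling or scrolling is attempted.

```cpp
#include <cstdint>

// Text-mode geometry assumed for the sketch: 80 columns by 25 rows.
constexpr int kColumns = 80;
constexpr int kRows = 25;

// Copies str into a buffer of 16-bit cells, starting at the first cell.
// Each cell keeps its upper (color) byte and receives the character in its
// lower byte. Returns the number of characters written.
int printTo(std::uint16_t* video, const char* str) {
    int i = 0;
    for (; str[i] != '\0' && i < kColumns * kRows; ++i) {
        // AND with 0xFF00 preserves the attribute byte and clears the old
        // character; OR then places the new character in the lower byte.
        video[i] = static_cast<std::uint16_t>(
            (video[i] & 0xFF00) | static_cast<std::uint8_t>(str[i]));
    }
    return i;
}
```

Passing the buffer as an argument is a choice made only for this sketch; it lets the same loop be exercised against an ordinary array before it is pointed at real display memory.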
5.1.3. MAKEFILE
Makefiles are a simple way to automate the build of small, medium
or large projects. A makefile written for a project can easily
and effectively build the whole project without the need to compile each file by hand.
For example, the rule below simply means: build an object file from the matching cpp
file, using the g++ compiler and the parameters variable. The output will be the
%.o: %.cpp
exceptions -Wno-write-strings
Most makefile rules consist of targets, dependencies and commands. In
myOS the makefile takes care of compiling the .o files from the .cpp files using the
g++ compiler with its parameters, and the .o files from the .s files using the assembler
command with parameters specific to the assembler. In both cases the output
file is the target and the source file is the input (for the command). The mykernel.bin
file is also created; it depends on the objects variable (the compiled .o files) and the
linker file. As mentioned above, the linker links the input files and outputs
mykernel.bin (the basic kernel). The next command in the makefile is the boot
command, which takes the mykernel.bin file and copies the kernel to a specific folder
location. The makefile will be improved later as more files are added to ease the
5.1.4. LOADER
finished loading the image, and the function of the loader file is to add all
the necessary low-level setup, such as the stack pointer, and in addition
to reserve the memory required to run after boot. The first thing done is setting up the stack,
which is declared as being at least 2 MB in size to stop the stack pointer from
overwriting the boot memory and the GRUB memory. The most important thing that the
loader needs to do is to call the kernelMain() function, relying on the
linker to have combined the object files so that the call can be
resolved. Another important thing to note here is that, again, the loader should never
stop working, and to make sure that never happens another infinite loop is written. For
the bootloader to accept the loader file, a magic number is necessary. The cryptic bit of
code at the beginning of the loader.s file is just there to create the header variables.
.section .multiboot
.long MAGIC
.long FLAGS
.long CHECKSUM
The purpose of these three lines is to emit the variables that make up the multiboot
header. The magic number is a fixed value which lets the bootloader find and recognize the
header. The flags are built with the bitwise left-shift operator, which sets individual
bits (giving the values 1 and 2). The checksum validates the header: it is chosen so
that the magic number, the flags and the checksum add up to zero, and if that check
passes the bootloader accepts the OS as multiboot-compliant. An important prerequisite for
the whole project is a stack, and creating a stack is fairly simple in assembly: the
stack pointer register is set to the end of an area of free memory. The stack is
usually placed in the .bss segment, which holds mostly
uninitialized static data, where it can exist without the fear that something
may go wrong.
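The relationship between the three header values can be checked with a short sketch. The magic value 0x1BADB002 and the zero-sum rule come from the Multiboot specification; the particular flag bits chosen here (bits 0 and 1) and all identifier names are assumptions made for the illustration, since the actual definitions of MAGIC, FLAGS and CHECKSUM in loader.s are not shown above.

```cpp
#include <cstdint>

// Multiboot (version 1) header magic, as defined by the specification.
constexpr std::uint32_t kMagic = 0x1BADB002u;

// Assumed flag selection: bit 0 (page-align loaded modules) and
// bit 1 (request memory information), i.e. the values 1 and 2.
constexpr std::uint32_t kFlags = (1u << 0) | (1u << 1);

// The checksum is the value that makes the three fields sum to zero
// in 32-bit unsigned arithmetic.
constexpr std::uint32_t kChecksum = 0u - (kMagic + kFlags);

// Mirrors the check a multiboot-compliant bootloader performs on the header.
constexpr bool headerIsValid(std::uint32_t magic, std::uint32_t flags,
                             std::uint32_t checksum) {
    return static_cast<std::uint32_t>(magic + flags + checksum) == 0u;
}

static_assert(headerIsValid(kMagic, kFlags, kChecksum),
              "magic + flags + checksum must wrap around to zero");
```

Because the arithmetic is unsigned, the subtraction wraps around modulo 2^32, which is exactly what the header check relies on.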
5.1.5. PRINTF
The printf() contained inside the glibc library is a complex piece of code. In
unsigned short data type. The memory address 0xb8000 maps the display memory; it is
used as a pointer to unsigned short values. Each value is AND-ed with the constant
0xFF00. The string entered is then copied to the display memory address, and
then the string entered is added to make sure that the memory address which requires
A point to note here, the structures start_ctors and end_ctors are external
And then they are introduced to the linker to be used and to be available everywhere
start_ctors and end_ctors and invokes the constructor calls. That is one important
prerequisite to interact with the kernelMain function. This application will use both
memory-mapped I/O and I/O using ports. For the communication with the display
framebuffer has 80 columns and 25 rows, both numbered from 0, so the columns are
labeled 0-79 and the rows 0-24. As mentioned above, the framebuffer works
on the principle of memory-mapped I/O: if the character byte 0x41 followed by the
attribute byte 0x0F is written starting at address 0x000B8000, the display will output
a white letter A on a black background.
framebuffer the system needs some more commonly used global functions, to read
from and write to the I/O bus (ports are discussed later).
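The framebuffer layout described above can be sketched as follows. This is an illustration under stated assumptions, not the project's printf: the function names makeCell, writeCell and printAtOrigin are hypothetical, and the buffer is passed in as a pointer so that the code can be tried against an ordinary array (in the kernel the pointer would refer to 0xB8000). On little-endian x86 the low byte of each 16-bit cell holds the character and the high byte the attribute, so a white A on black is the cell value 0x0F41.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t FB_COLUMNS = 80;
constexpr std::size_t FB_ROWS = 25;

// Build one 16-bit cell: character in the low byte, attribute in the high
// byte (background colour in bits 12-15, foreground colour in bits 8-11).
constexpr std::uint16_t makeCell(char c, std::uint8_t foreground,
                                 std::uint8_t background) {
    return static_cast<std::uint16_t>(
        (static_cast<unsigned>(background & 0x0F) << 12) |
        (static_cast<unsigned>(foreground & 0x0F) << 8) |
        static_cast<unsigned char>(c));
}

// Write one cell at (column, row); positions outside the screen are ignored.
inline void writeCell(volatile std::uint16_t* fb, std::size_t column,
                      std::size_t row, std::uint16_t cell) {
    if (column < FB_COLUMNS && row < FB_ROWS) {
        fb[row * FB_COLUMNS + column] = cell;
    }
}

// Simplest possible print: always starts at the first cell and keeps the
// attribute byte already stored there by masking with 0xFF00.
inline void printAtOrigin(volatile std::uint16_t* fb, const char* str) {
    for (std::size_t i = 0; str[i] != '\0' && i < FB_COLUMNS * FB_ROWS; ++i) {
        fb[i] = static_cast<std::uint16_t>(
            (fb[i] & 0xFF00) | static_cast<unsigned char>(str[i]));
    }
}
```

Taking the buffer as a parameter keeps the sketch testable outside the kernel; the real code can hard-wire the address instead.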
5.2. ENVIRONMENT
At this stage a working, albeit very rudimentary, operating system exists in
memory. The next problem is how to run it. As it stands now, the only
way to properly run the system is to reboot the machine and select the operating
system from the BIOS boot table. That is tedious and time-consuming, and also extremely
difficult in VirtualBox (the environment in which Linux Mint is run). To properly run the
system from the operating system environment the packages QEMU and xorriso are
virtualization and system emulation. Xorriso enables manipulation and management
To automatically build the ISO image the makefile is edited. The ISO is derived
from the mykernel.bin file, and the grub.cfg file is added manually to the new folder
where the ISO and the grub files are copied. When communicating with hardware
necessarily signed short anymore. It stands for any signed 16-bit type, which depends
on the platform. On some other platform the signed 16-bit data type could be signed
int. All of this is done in case the system needs to be deployed to different platforms,
since different platforms can have different names and definitions for the basic types. To
make sure that the basic types are created and renamed properly, a new file named
sizes.h is created, to make sure that every data structure has the correct size and form and that
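A header of this kind could look like the following sketch. It is not the project's sizes.h: the namespace and the type names are assumptions, and the underlying types are the ones that have these widths on common 32-bit and 64-bit GCC and Clang targets. The static_assert lines stop the build on any platform where an assumption does not hold.

```cpp
// Hypothetical sizes.h-style definitions. Only this one file has to change
// when the system is ported to a platform with different basic types.
namespace sizes {
    typedef signed char            int8_t;
    typedef unsigned char          uint8_t;
    typedef short                  int16_t;
    typedef unsigned short         uint16_t;
    typedef int                    int32_t;
    typedef unsigned int           uint32_t;
    typedef long long int          int64_t;
    typedef unsigned long long int uint64_t;
}

// Fail the build immediately if a platform breaks the size assumptions.
static_assert(sizeof(sizes::int8_t)   == 1, "int8_t must be 1 byte");
static_assert(sizeof(sizes::uint8_t)  == 1, "uint8_t must be 1 byte");
static_assert(sizeof(sizes::int16_t)  == 2, "int16_t must be 2 bytes");
static_assert(sizeof(sizes::uint16_t) == 2, "uint16_t must be 2 bytes");
static_assert(sizeof(sizes::int32_t)  == 4, "int32_t must be 4 bytes");
static_assert(sizeof(sizes::uint32_t) == 4, "uint32_t must be 4 bytes");
static_assert(sizeof(sizes::int64_t)  == 8, "int64_t must be 8 bytes");
static_assert(sizeof(sizes::uint64_t) == 8, "uint64_t must be 8 bytes");
```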
Memory segments are portions of the address space, and each consists of a base
address and a limit. When a byte needs to be addressed in memory, a 48-bit
address is used: 16 bits select the segment and the other 32 bits specify the
offset within the segment. The Global Descriptor Table and the InterruptManager work
with these segments. If the memory is divided into segments, it could be divided into the kernel
memory, chunks of the kernel, probably some user programs, data of user programs
and similar. The data segments are the parts of memory the processor is not allowed to jump
into.
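The idea of a base address plus a limit can be modelled in a few lines. The following is a simplified sketch, not the hardware mechanism itself: the Segment struct and both function names are hypothetical, and privilege checks are left out.

```cpp
#include <cstdint>

// Hypothetical, simplified model of a segment.
struct Segment {
    std::uint32_t base;    // address at which the segment starts
    std::uint32_t limit;   // highest valid offset inside the segment
    bool executable;       // true for code segments, false for data segments
};

// Translate an offset inside a segment into a linear address.
// Returns false when the offset lies beyond the segment limit.
inline bool translate(const Segment& seg, std::uint32_t offset,
                      std::uint32_t& linear) {
    if (offset > seg.limit) {
        return false;
    }
    linear = seg.base + offset;
    return true;
}

// The processor may only jump into executable (code) segments.
inline bool canJumpTo(const Segment& seg, std::uint32_t offset) {
    return seg.executable && offset <= seg.limit;
}
```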
FIGURE 5 – SEGMENTATION
If a button on the keyboard is pressed, the input from the keyboard goes to the
keyboard driver, which passes it to the processor and causes an interrupt. If the program is
running inside the user space, the CPU immediately stops it and switches to the kernel
memory. A problem that appears at this point is how to return the execution back to
the user processes. For that, an interrupt descriptor table needs to be set up that lists the
different kinds of interrupts and the memory segment each one points to. How does an interrupt
know how much data to send, and how much data to receive, if the environment itself
doesn't have that kind of information? It needs to be written in the form of segments,
by first creating a global descriptor table so the application knows what parts of
The Global Descriptor Table is an array of descriptors that gives information
about segments, basically defining them. Each descriptor contains the starting point of the
segment, the length of the segment and segment flags such as what kind of segment it is, what
access rights it holds, what privileges it has and in which direction it grows.
The entries are 8 bytes long, divided as shown in the image: 16 bits are assigned to the
limit, 24 bits to the base pointer, 8 bits to the access rights, and another byte is divided
between the flags and four more bits of the limit. The last 8 bits are given to the base
pointer again. Basically, the base pointer is the address at which the segment starts, and the
limit is the size of the segment. They are divided into an 8 byte segment together
The Global Descriptor Table defines basic privileges for certain parts of
memory. Another important task of the GDT is describing whether a memory section
executable segment. The GDT is an array of 8-byte segment descriptors. The GDT
is created by writing the gdt.h and gdt.cpp files. The gdt.h file is the header
file for the Global Descriptor Table. The entries of the GDT are defined in it using the
custom-made data types from the sizes.h file. A few more members are
declared, including the constructor for the Global Descriptor Table, the Limit() method
and the Base() method. The items that should always be part of a GDT are the null
descriptor, the code segment and the data segment (both mentioned
above). The most important fields for the programmer are the Type field and the DPL
(Descriptor Privilege Level), and both of them are found inside the access byte.
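How the access byte splits into its fields can be shown with a small decoder. This is a sketch following the standard x86 layout for code and data segment descriptors; the struct AccessByte and the function decodeAccess are hypothetical names and not part of the project's gdt.h.

```cpp
#include <cstdint>

// Fields of a GDT access byte for code and data segments.
struct AccessByte {
    bool present;        // bit 7: the segment is valid
    std::uint8_t dpl;    // bits 5-6: privilege level, 0 (kernel) to 3 (user)
    bool codeOrData;     // bit 4: 1 for code/data, 0 for system segments
    bool executable;     // bit 3: 1 for a code segment, 0 for a data segment
    bool readWrite;      // bit 1: readable (code) or writable (data)
};

inline AccessByte decodeAccess(std::uint8_t access) {
    AccessByte a;
    a.present    = (access & 0x80) != 0;
    a.dpl        = static_cast<std::uint8_t>((access >> 5) & 0x03);
    a.codeOrData = (access & 0x10) != 0;
    a.executable = (access & 0x08) != 0;
    a.readWrite  = (access & 0x02) != 0;
    return a;
}
```

Decoding the common kernel values 0x9A and 0x92 with this function shows that they differ only in the executable bit, while a user code segment such as 0xFA differs from 0x9A only in the DPL field.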
they are given 64 MB of memory, and the access parameter changes depending on the
kind of segment (0x9A for code segments and 0x92 for data segments). The only
thing that changes is the access byte itself. If we compare access bytes, we can see
that the flags that change are mostly the descriptor privilege flags, ranging from 0 for
kernel code to 3 for user code. Inside the constructor definition for the Global
Descriptor Table there is a piece of inline assembly which seems complex, but it is simply
a volatile statement (volatile tells the compiler that it is not allowed to move the statement
or to delete it even if its results appear unused)
that executes the assembly instruction lgdt (load global descriptor table) with the memory
location "p", which is the value created to describe the GDT. Some more functionality
is added, such as the two functions that just return the addresses of
Because the fields are spread all around the descriptor, the only
way to properly assign them is to treat the entry as an array of bytes and explicitly set the values.
To properly create the segments every value needs to be checked. In the
example, if the limit value fits in 16 bits, the segment structure stores
the hexadecimal value 0x40 (64 in decimal), which keeps the declaration clean. But
most of the time problems arise, because the limit has 20 bits in total, there
is no data type that holds exactly 20 bits, and to make things worse the limit is
divided into two parts (it was only 16 bits long in the first versions, but because of the
constant need for more memory it was extended). That problem was inconveniently
solved by giving the limit field special powers. Basically, the limit can be
given as a 32-bit value, but its 12 lowest bits, the 'trash', all need to
be ones so that they can be cut off. Any other solution, such as considering these bits
empty, or making two or more structures to store the segment, would be even
nastier, because devastating things such as memory overlapping or memory spilling can
happen. The first two bytes of the limit field are set by the two commands:
The rest of the limit is supposed to go to the half-byte in the 6th index
of the segment array. But in the code before, 0x40 is already stored in the 6th index,
so the only way to add the limit data to the segment array is to bit-manipulate the byte
so that the data gets saved only in the 4 lowest bits of that index.
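The limit handling described above can be sketched as follows. This is an illustration and not the project's gdt.cpp: the function names and the use of std::array are assumptions. The base and the access byte are filled in as well, following the usual x86 descriptor layout, so that a complete entry can be read back.

```cpp
#include <array>
#include <cstdint>

// One GDT entry viewed as eight raw bytes.
typedef std::array<std::uint8_t, 8> Descriptor;

// Pack base, limit (highest valid byte offset) and access byte into an entry.
inline Descriptor encodeDescriptor(std::uint32_t base, std::uint32_t limit,
                                   std::uint8_t access) {
    Descriptor d = {};
    if (limit <= 0xFFFFu) {
        // The limit fits into 16 bits: byte granularity, only 0x40 is set.
        d[6] = 0x40;
    } else {
        // The limit is stored in 4 KiB units. The 12 bits that are cut off
        // are implicitly all ones, so if they are not, round down one unit.
        if ((limit & 0xFFFu) != 0xFFFu) {
            limit = (limit >> 12) - 1;
        } else {
            limit = limit >> 12;
        }
        d[6] = 0xC0;  // 0x40 plus the granularity bit 0x80
    }
    // Limit: bytes 0 and 1, then only the low nibble of byte 6.
    d[0] = static_cast<std::uint8_t>(limit & 0xFF);
    d[1] = static_cast<std::uint8_t>((limit >> 8) & 0xFF);
    d[6] = static_cast<std::uint8_t>(d[6] | ((limit >> 16) & 0x0F));
    // Base: bytes 2, 3, 4 and 7.
    d[2] = static_cast<std::uint8_t>(base & 0xFF);
    d[3] = static_cast<std::uint8_t>((base >> 8) & 0xFF);
    d[4] = static_cast<std::uint8_t>((base >> 16) & 0xFF);
    d[7] = static_cast<std::uint8_t>((base >> 24) & 0xFF);
    // Access byte.
    d[5] = access;
    return d;
}

// Reverse direction: reassemble the base from its four bytes.
inline std::uint32_t decodeBase(const Descriptor& d) {
    return static_cast<std::uint32_t>(d[2]) |
           (static_cast<std::uint32_t>(d[3]) << 8) |
           (static_cast<std::uint32_t>(d[4]) << 16) |
           (static_cast<std::uint32_t>(d[7]) << 24);
}

// Reverse direction: reassemble the limit and undo the 4 KiB scaling.
inline std::uint32_t decodeLimit(const Descriptor& d) {
    std::uint32_t limit = static_cast<std::uint32_t>(d[0]) |
                          (static_cast<std::uint32_t>(d[1]) << 8) |
                          (static_cast<std::uint32_t>(d[6] & 0x0F) << 16);
    if (d[6] & 0x80) {
        limit = (limit << 12) | 0xFFFu;
    }
    return limit;
}
```

Rounding down means that up to 4095 bytes at the end of a segment may stay unused, which is the price for never overlapping the following segment.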
After the encoding of the limit is finished, the base pointer needs to be
encoded. It is done by four commands storing the base value in the 2nd, 3rd, 4th
and 7th index of the array. After this, almost all fields are properly set. The
only one left is the access byte. With the writing of the access byte the
creation of the memory segment is finished. Now the entries can communicate one way
with the CPU and send information about the segment structure. The
only thing left to write is the reverse process, the CPU being able to read the entries
of the segment, especially the base pointer and the limit. After writing the
Global Descriptor Table, the header needs to be included in the kernel.cpp file, and
The ultimate goal of the project accompanying the thesis is to enable mouse
and keyboard support for the terminal, creating a basic working environment for
possible future upgrades of the system. How does a keyboard work? When a key on
the keyboard is pressed, a signal goes to the PIC (Programmable Interrupt Controller),
and the PIC will in most cases ignore the signal. If the engineer wants the signal to
arrive at the CPU, the PIC needs to be reprogrammed so that it no longer ignores signals. Apart
from that, to enable hardware support there needs to exist a mechanism to receive and
send signals, and technically that is most often done in the CPU with a
multiplexer (because this is more hardware oriented, and my thesis is mostly about the
development of software, I won't take much time explaining how this works; also, it is
not necessary to know how the multiplexer works in order to understand the code for
By connecting the hardware ports, the road toward writing the interrupts will
be easier, and with interrupts the basis of the 'system shell' is created; thus, adding
even more advanced concepts than the keyboard and the mouse will be easily
manageable. There exists an assembly instruction which is crucial for the
communication between the data and the hardware ports, and that is the outb
instruction. OUT and IN are the instructions for transferring data to and from
ports, and the b at the end signals that a byte is being transferred. For the port
communication to work a new file named port.cpp is created, which is more object-oriented
(than putting assembly commands straight into the code). In the port.cpp file the
Port class exists together with the classes that inherit from it, including the classes for
ports that send and receive 8 bits, 16 bits or 32 bits. The complex instruction __asm__
volatile("outb %0, %1" : : "a" (data), "Nd" (portnumber)); just passes
variables to the assembler. The colons delimit the operands: in this case
two colons appear after the assembly code, meaning that the output operands are
skipped and that only the input operands are read. Although it seems a bit cluttered, the
inline assembly instructions are a nice addition because they enable the user to
increase performance and access assembly instructions that don't exist in C or C++
without the need to write an assembly file just for that purpose. Depending on the size
of the input elements, the outb instruction might be changed to outw (for a word, a 16-bit
structure) or outl (for a long, a 32-bit structure). These structures are connecting
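A wrapper class around these instructions might be sketched as below. The class name Port8Bit and its methods are assumptions rather than the contents of the project's port.cpp, and the preprocessor guard is added only so that the sketch also builds with other compilers or on other architectures, where the bodies become no-ops. The IN and OUT instructions are privileged, so Write and Read are only meaningful when called from kernel code.

```cpp
#include <cstdint>

// Sketch of an 8-bit I/O port wrapper (hypothetical names).
class Port8Bit {
public:
    explicit Port8Bit(std::uint16_t portnumber) : portnumber_(portnumber) {}

    std::uint16_t Number() const { return portnumber_; }

    // Send one byte to the port. Only input operands are given, so the
    // output operand list between the first two colons stays empty.
    void Write(std::uint8_t data) const {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        __asm__ volatile("outb %0, %1" : : "a"(data), "Nd"(portnumber_));
#else
        (void)data;  // no port I/O available in this build
#endif
    }

    // Receive one byte from the port. Here the result is an output operand.
    std::uint8_t Read() const {
        std::uint8_t result = 0;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
        __asm__ volatile("inb %1, %0" : "=a"(result) : "Nd"(portnumber_));
#endif
        return result;
    }

private:
    std::uint16_t portnumber_;
};
```

A 16-bit or 32-bit variant would differ only in the data type and in using outw/inw or outl/inl.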
First step towards printing text on console window:
The function printf() has an issue. If the user wants to print another line of text
below the first line, it is not possible: the console output will always
overwrite the first line, because the function only points to the beginning of the video
memory, and every time printf is called again, it starts printing from the
beginning of the video memory (at this point in time, the kernel has no notion of rows,
columns, line breaks etc.). To fix this small problem, height and width variables for
the screen need to be created. The screen is 80 characters wide and 25 characters
high, and the memory location of each character is computed from the current row and
column.
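One way to keep track of the position is sketched below. The Console struct and its members are hypothetical names, and clearing the screen when the last row overflows is just the simplest policy available (a real console would scroll instead). As an assumption for testability, the video memory is passed in as a pointer; in the kernel it would be 0xB8000.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::size_t CONSOLE_WIDTH = 80;
constexpr std::size_t CONSOLE_HEIGHT = 25;

// Sketch of a console that remembers where the last print ended.
struct Console {
    volatile std::uint16_t* memory;
    std::size_t x = 0;  // current column
    std::size_t y = 0;  // current row

    explicit Console(volatile std::uint16_t* videoMemory)
        : memory(videoMemory) {}

    // Blank every cell (keeping its attribute) and return to the top left.
    void clear() {
        for (std::size_t i = 0; i < CONSOLE_WIDTH * CONSOLE_HEIGHT; ++i) {
            memory[i] = static_cast<std::uint16_t>((memory[i] & 0xFF00) | ' ');
        }
        x = 0;
        y = 0;
    }

    void print(const char* str) {
        for (std::size_t i = 0; str[i] != '\0'; ++i) {
            if (str[i] == '\n') {
                x = 0;
                ++y;
            } else {
                // Cell index computed from the current row and column.
                std::size_t index = y * CONSOLE_WIDTH + x;
                memory[index] = static_cast<std::uint16_t>(
                    (memory[index] & 0xFF00) |
                    static_cast<unsigned char>(str[i]));
                ++x;
            }
            if (x >= CONSOLE_WIDTH) {   // wrap to the next row
                x = 0;
                ++y;
            }
            if (y >= CONSOLE_HEIGHT) {  // screen full: start over
                clear();
            }
        }
    }
};
```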
5.6. INTERRUPTS
Interrupts are the first real step towards having a conversation with the
hardware. If a key on the keyboard is pressed, the signal travels to the PIC
interrupt information back to the CPU. For the interrupts to work, a table similar to
the Global Descriptor Table needs to be built, called the Interrupt Descriptor Table,
consisting of the interrupt number, the pointer to the handler, flags, the memory
segment and the access rights. Without the table telling the CPU what kind of
interrupts it encounters, the computer would most probably reboot every time an
interrupt occurs. So the IDT is responsible for telling what kind of interrupt occurred,
and there exists a timer connected to the PIC which tells the CPU how long an interrupt lasts.
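An entry of the Interrupt Descriptor Table can be sketched as follows. The layout is the standard 8-byte x86 gate descriptor for 32-bit protected mode; the struct, the constants and the function names are assumptions for illustration and not the project's idt.h.

```cpp
#include <cstdint>

// One 8-byte entry of the Interrupt Descriptor Table.
struct GateDescriptor {
    std::uint16_t handlerAddressLowBits;   // bits 0-15 of the handler address
    std::uint16_t codeSegmentSelector;     // GDT selector of the code segment
    std::uint8_t  reserved;                // always zero
    std::uint8_t  access;                  // present bit, DPL and gate type
    std::uint16_t handlerAddressHighBits;  // bits 16-31 of the handler address
};

static_assert(sizeof(GateDescriptor) == 8, "a gate descriptor is 8 bytes");

constexpr std::uint8_t IDT_DESC_PRESENT   = 0x80;
constexpr std::uint8_t IDT_INTERRUPT_GATE = 0x0E;  // 32-bit interrupt gate

// Fill one entry. The handler address is split into two 16-bit halves.
inline GateDescriptor makeGate(std::uint32_t handler, std::uint16_t selector,
                               std::uint8_t privilegeLevel,
                               std::uint8_t gateType) {
    GateDescriptor g;
    g.handlerAddressLowBits  = static_cast<std::uint16_t>(handler & 0xFFFF);
    g.handlerAddressHighBits = static_cast<std::uint16_t>((handler >> 16) & 0xFFFF);
    g.codeSegmentSelector    = selector;
    g.reserved               = 0;
    g.access = static_cast<std::uint8_t>(
        IDT_DESC_PRESENT | ((privilegeLevel & 0x03) << 5) | gateType);
    return g;
}

// Reassemble the handler address from the two halves.
inline std::uint32_t handlerAddress(const GateDescriptor& g) {
    return static_cast<std::uint32_t>(g.handlerAddressLowBits) |
           (static_cast<std::uint32_t>(g.handlerAddressHighBits) << 16);
}
```

The table itself is then simply an array of 256 such entries, indexed by the interrupt number.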
For the interrupts to be implemented and added to the project three files are
needed: interrupts.s (an assembly file), idt.h (the Interrupt
Descriptor Table header file) and idt.cpp (the Interrupt Descriptor Table source file).
The idt.h file consists of a static function that handles interrupts, with the parameters
being the interrupt number (a label for every interrupt) and the stack pointer. The
Now, after the idt.cpp file is compiled, the command nm lists the
symbols inside the object file. Once the symbol is found, which is the mangled C++
name of the function as seen by the assembler, it is declared as an external symbol
in the assembly file so that the function can be called from there. Inside the assembly file a call to
the function inside the .cpp file is made, and the stack pointer and the interrupt
number are pushed as its arguments. Before the call is made, all important registers are pushed
to the stack (in case they get overwritten), and after the call all registers are popped
from the stack. After that the assembly file gives control back to the CPU. The
to the PIC, and it has the built-in values 0x00 for the timer, 0x01 for the keyboard,
and others for different kinds of interrupts. Two macros are written, one for the
InterruptRequests and another one for the Exceptions. Interrupts, just like the memory
segments, have their own descriptor table, named the Interrupt Descriptor Table. The entries are
5.6.1. KEYBOARD
As mentioned before, the PIC is connected to the keyboard, the mouse and the
timer, and now, after writing the code for the Interrupt Manager, the CPU has
established the connection to the PIC. The only problem now is that the CPU has no
answer to any input from the PIC: as soon as an interrupt arrives at the CPU, the
CPU acknowledges the existence of the interrupt, but nothing happens after that. So a
way to handle interrupts needs to be written. In addition, even when code that
handles interrupts is written, the keyboard still does not generate ASCII codes; it
generates scan codes. So for the keyboard to work as intended, a method inside
the interrupt file is written to handle interrupts, together with a mechanism for the system
to translate the scan codes from the keyboard into ASCII codes. The keyboard
controller needs to control the keyboard auxiliary port, turn the A20 gate on or off,
and if the system needs to be improved it could use the PIT to control the speakers or
Inside the project there now exists a working Interrupt Manager together with an
interrupt handler for the timer, and they are coupled inside a C++ environment with
the main method, the Global Descriptor Table code and the interrupt code. When
an interrupt appears, it kicks the execution of the program outside the C++ bubble
and into the assembly bubble, thus losing the connection with the kernel. Inside the
assembly code, another 'bubble' exists that is able to return the execution back to
C++, and that execution context is found inside a static environment inside the C++
bubble. For the interrupt to get a response, however, the execution needs to be passed back to the
interrupt object, and for that the members of the interrupt object called the data port and the
command port are used. The keyboard responds to commands that are each one byte
in size, and some of them have a special meaning. The 0xF4 command enables
sending the scan codes from the keyboard back to the interrupt handler. There are
two auxiliary ports, one of which is used for the keyboard and the other for the
mouse (or any other pointing device). Another two important ports are 0x64, for
reading the status register and sending keyboard controller commands, and 0x60,
which serves as an input/output buffer. Some other commands are 0xF5, which
deactivates the keyboard and sets default values, 0xF6, which sets default values, and 0xFF,
which resets the keyboard and runs the self-test, among others.
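The translation from keyboard codes to ASCII mentioned in this section can be sketched with a small lookup. The port numbers and the 0xF4 command are the ones named in the text; everything else is an assumption: the function name is hypothetical, the table covers only part of scan code set 1 on a US layout, and shift or other modifier keys are ignored.

```cpp
#include <cstdint>

// Keyboard controller ports and the "enable scanning" command from the text.
constexpr std::uint16_t KEYBOARD_DATA_PORT       = 0x60;
constexpr std::uint16_t KEYBOARD_COMMAND_PORT    = 0x64;
constexpr std::uint8_t  KEYBOARD_ENABLE_SCANNING = 0xF4;

// Translate a scan code (set 1, US layout, no modifiers) into ASCII.
// Returns 0 for key releases (bit 7 set) and for keys not in this table.
inline char scanCodeToAscii(std::uint8_t code) {
    static const char digits[]    = "1234567890";  // codes 0x02 to 0x0B
    static const char topRow[]    = "qwertyuiop";  // codes 0x10 to 0x19
    static const char homeRow[]   = "asdfghjkl";   // codes 0x1E to 0x26
    static const char bottomRow[] = "zxcvbnm";     // codes 0x2C to 0x32

    if (code & 0x80) {
        return 0;  // key release, nothing to print
    }
    if (code >= 0x02 && code <= 0x0B) return digits[code - 0x02];
    if (code >= 0x10 && code <= 0x19) return topRow[code - 0x10];
    if (code >= 0x1E && code <= 0x26) return homeRow[code - 0x1E];
    if (code >= 0x2C && code <= 0x32) return bottomRow[code - 0x2C];
    if (code == 0x1C) return '\n';  // Enter
    if (code == 0x39) return ' ';   // space bar
    return 0;
}
```

An interrupt handler would read one byte from the data port, pass it through such a function and print the result whenever it is not zero.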
object files and the kernel a new method is introduced that moves the execution
point back into the C++ environment, called DoHandleInterrupt(), and it simply calls
the instance of the currently 'active interrupt' (there can be only one of those). The
instance of the active interrupt is the object that controls the interrupt cycles
inside the system. If the program is now run with a printf call
inside the DoHandleInterrupt method, it can display what kind of interrupt has
The keyboard service consists of two ports, a data port and a command port,
the constructor for the keyboard that takes an interrupt pointer as a parameter, a
destructor, and the handle interrupt method. The parameters for the constructor are a
call to the interrupt handler with the interrupt number 0x21 (the keyboard interrupt), the data port
A key pressed on the keyboard will now trigger the CPU into
responding by sending the kernel the keyboard input, but the kernel will still stop its
execution and not continue running, because the PIC does not accept further
input. The PIC demands that the CPU fetches the input (in this case the key
stroke) before it sends the signal that it is done. The keyboard driver class is
derived from the InterruptManager and has its own port, and when the
interrupt arrives the handler just reads from the port.
5.6.2. MOUSE
The mouse works in the same way as the keyboard does, using the same ports and
interface and the same interrupt handler functions. In order to distinguish which
data belong to which device, bit 5 (0x20) of the status byte is set for the
mouse and left clear for the keyboard. The difference is that once the
mouse has been initialized, it sends 3- or 4-byte packets to report
movement and button press/release events. The proper way to code the mouse
packages. The system looks at the contents of those bytes and acts accordingly.
The mouse has the same instance variables as the keyboard, including the data port
and the command port, and additionally a buffer array of three unsigned 8-bit
integers, an index into that array, and the buttons pressed. In addition to the variables,
there is a mouse constructor, a mouse destructor and a handler similar to the one
the keyboard has. The constructor works on a different interrupt number, but uses the
same data port and command port. The command port is used to write the commands that activate the
mouse and to read the current state. The handler checks the command port
for the status and for received data, using a specialized 'mouse bit': an additional bit
with the value 0x20 which indicates that the next byte came from the mouse.
To finish writing the code for the mouse driver, the service for moving the
mouse based on the inputs from the ports is written. Two variables that keep the
horizontal and vertical values are set from the values at the first and second index of the
buffer array.
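The packet handling described above can be sketched as a small state machine. This is an illustration under assumptions, not the project's driver: the struct and function names are hypothetical, the status and data bytes are passed in as parameters instead of being read from ports 0x64 and 0x60, the movement bytes are interpreted as signed values, and the position is clamped to the 80 by 25 text screen.

```cpp
#include <cstdint>

constexpr int SCREEN_WIDTH  = 80;
constexpr int SCREEN_HEIGHT = 25;
constexpr std::uint8_t STATUS_MOUSE_BIT = 0x20;  // byte came from the mouse

struct MouseState {
    std::uint8_t buffer[3];  // one packet: flags, x movement, y movement
    std::uint8_t offset;     // index of the next byte inside the packet
    std::uint8_t buttons;    // bit 0 left, bit 1 right, bit 2 middle
    int x;                   // cursor column
    int y;                   // cursor row
};

inline int clampTo(int value, int low, int high) {
    return value < low ? low : (value > high ? high : value);
}

// Feed one byte into the state machine. Returns true when a full 3-byte
// packet has been received and the position was updated.
inline bool handleMouseByte(MouseState& m, std::uint8_t status,
                            std::uint8_t data) {
    if (!(status & STATUS_MOUSE_BIT)) {
        return false;  // the byte belongs to the keyboard, ignore it
    }
    m.buffer[m.offset] = data;
    m.offset = static_cast<std::uint8_t>((m.offset + 1) % 3);
    if (m.offset != 0) {
        return false;  // packet not complete yet
    }
    m.buttons = static_cast<std::uint8_t>(m.buffer[0] & 0x07);
    int dx = static_cast<std::int8_t>(m.buffer[1]);
    int dy = static_cast<std::int8_t>(m.buffer[2]);
    m.x = clampTo(m.x + dx, 0, SCREEN_WIDTH - 1);
    // The mouse reports upward movement as positive, screen rows grow downward.
    m.y = clampTo(m.y - dy, 0, SCREEN_HEIGHT - 1);
    return true;
}
```

Clamping the coordinates is what keeps the cursor inside the borders of the window.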
FIGURE 8 - DISPLAYING THE MOUSE AND SETTING BORDERS ON THE WINDOW
6. CONCLUSION
The project is now finished and the system is a good starting point in the field
bootloader, and the bootloader is the only already existing resource in the project.
GRUB collects the machine code of the kernel, loads it into fast-access memory,
and jumps to it. Any code can be put in the machine code, but most C or C++
programs expect an OS. On the basis of that bootloader the other components are
assembled, including the kernel. The kernel does all the heavy lifting, initializes
functions, calls classes, and in more advanced systems provides disk services and
A lot of possible improvements could be built upon this, and this project is
writing possible upgrades is that new functionalities can be written by just adding
new .h and .cpp (or .s) files, by just stacking new services on top of an already
working system. Furthermore, a positive lesson learned while building this project
implementation and maintenance, and that was one of the reasons why C++ was
A really important and helpful thing could be the Graphics Interface, so that
the future versions can have some attractive and intuitive User Interfaces. Virtual
storage for possible important data. Network access is also a viable possibility.
Things that improve the readability and things that don't require new additions to
the code itself could be added, such as having a nice file structure, moving the
headers into one folder and the executables into another, having a separate folder
Currently, this system has no marketable value and no practical usage, and no
applications can run on it, because it is just an empty system shell with a couple of
drivers added. But the process of writing it was interesting, and the feeling
of having full control over the machine and its code is quite powerful.
Also, during the two months in which the system was developed, I learned a huge
possible level, how systems work, how processes are created, how assembly code
is written, how a loader, a makefile or a grub file are written. In addition, it was a
fun task to do, and I believe that this is not the final version of the software and
that there is a huge amount of work which can be done to improve the system
even further.
REFERENCES
1. Dougvj. Why is C used as the main programming language for operating systems? [closed].
StackOverflow WebSite. [Online] December 30, 2013.
https://stackoverflow.com/questions/20839352/why-is-c-used-as-the-main-programming-language-for-operating-systems.
6. Silberschatz, Abraham. Operating System Concepts. s.l. : John Wiley & Sons. Inc, 2005, pp.
3-22.
7. Erik Helin, Adam Helberg. The Little Book About OS Development. 2015.
11. Corbet, Jonathan. Linux Device Drivers, 3rd Edition. s.l. : O'Reilly, 2005. 0-596-00590-3.
12. Bovet, Daniel. Understanding the Linux Kernel. s.l. : O'Reilly Media, 2008.