Chapter 1 - Introduction to Linux Operating Systems

Introduction

This chapter provides an introduction to Linux systems. We start with a general description of
the components involved in computing and then focus on Linux powered operating systems.
From the large number of Linux distributions available, we use the Community Enterprise
Operating System (CentOS), because it is very similar to the well-known Red Hat Enterprise
Linux. Therefore, this document presents the install procedure and the file system structure of
CentOS 6.3. The last topic covered by this chapter is runlevels, the states of the computing
machine.

Hardware and Software

Hardware refers to the physical components that are part of a computer or other computing
machine (routers, switches, etc.). Hardware is usually a collection of electronic components
hosted in a case or a chassis. Most systems follow a classic architecture with a standard set of
parts that work together through interfaces.

Motherboard

The motherboard is the heart of the computer. It provides the socket for the Central
Processing Unit (CPU), defines the maximum capacity of RAM, and offers the interfaces for
plugging in other cards (PCI, PCI-X, AGP, PCIe, etc.) and for storage devices
(NVMe, SATA, SAS, ATA). It also provides the chipsets that manage peripherals (southbridge)
and interconnect the CPU, memory and graphical card (northbridge). Moreover, the
motherboard can integrate several other components, like Ethernet LAN and RAID
controllers.

Note that there is a huge difference between the motherboards used for desktops and the
motherboards used for servers. Some features available for servers are listed below:

hot swap components: memory DIMMs, hard drives or even CPUs can be replaced
while the system is running
support for multiple CPUs
support for advanced RAID controllers
multiple integrated LAN interfaces with advanced features
special design for a better cooling effect
support for high performance interfaces: SAS, PCI Express 8x/16x
integrated monitoring interface

Central Processing Unit

The CPU, or the processor of the computer, is considered by many to be the most important
part of the computer. The CPU is the component that fetches instructions from the main
memory and executes them, and it can be considered the brain (or the workhorse) of the
system. Some of the most important features of a CPU are described below:

architecture: the x86 architecture can use 32-bit registers (referred to as x86) or 64-bit
registers (referred to as x86_64). However, there are other architectures available on the
market: the ARM architecture, the PowerPC architecture, etc.

number of cores: most CPUs include several execution units (cores) inside a single device

frequency: specified in hertz; in simplified terms, it describes how many clock cycles the
CPU completes in a given interval (one second, for example)

cache memory: this is a very fast memory that can store frequently accessed data and
instructions. A larger cache memory can provide a serious boost of performance.

integrated technologies: most CPUs include special units for a set of purposes. For example,
newer CPUs include support for hardware virtualization technologies (Intel VT-x, AMD-V)
which provide better performance when using virtual machines.

Note: it is possible to extract information about the CPUs installed in a Linux machine by
inspecting the /proc/cpuinfo file.
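
As a minimal sketch of how that file can be inspected programmatically, the Python snippet below splits /proc/cpuinfo-style text into one dictionary per logical CPU. The function name parse_cpuinfo and the SAMPLE text are illustrative assumptions (the sample is not taken from a real machine); on a Linux system the real file is read instead.

```python
from pathlib import Path


def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical CPU."""
    cpus = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the block that describes one logical CPU.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, _, value = line.partition(":")
        current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


# Illustrative sample only; real files contain many more fields.
SAMPLE = """processor\t: 0
model name\t: Example CPU @ 2.00GHz
cpu cores\t: 2

processor\t: 1
model name\t: Example CPU @ 2.00GHz
cpu cores\t: 2
"""

cpuinfo_path = Path("/proc/cpuinfo")
text = cpuinfo_path.read_text() if cpuinfo_path.exists() else SAMPLE
cpus = parse_cpuinfo(text)
print("logical CPUs found:", len(cpus))
if cpus:
    # Field names differ between architectures, so use .get() with a default.
    print("model name:", cpus[0].get("model name", "unknown"))
```

The fields available depend on the CPU architecture, so the lookup of "model name" is hedged with a default value.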

Main Memory (Random Access Memory)

The Random Access Memory (yes, a volatile memory whose content is lost when the system
is powered down or rebooted) is the component that stores instructions and data accessed by
the CPU, as instructed by the operating system. Processes running in the OS require their own
memory space, thus it is important to consider that the capacity of this component influences
the processing capacity of the computer. Note that operating systems don't address the real
memory (physical memory) space. They use a virtual memory mechanism that can be mapped
either in the real memory or on a storage device. The extension of the memory on the storage
device is called swap. Let's present some of the characteristics of the Random Access
Memory:

capacity: measured in Gigabytes and is the sum of DIMMs (memory modules) installed in the
computer

generation: can be DDR1, DDR2, DDR3, DDR4. Newer generations offer better performance

peak transfer rate: usually expressed as a PC-XXXX value. This describes how many
megabytes a memory module can transfer in a second (e.g. PC-3200 memories can transfer 3.2
GB of data in a second, while DDR3 PC3-10666 memories are able to transfer up to
10.666 GB/s). However, the peak transfer rate is just a theoretical value that is not
sustained in practice.
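
The PC-XXXX value can be derived with simple arithmetic. The sketch below assumes the standard 64-bit (8-byte) data bus of a DIMM and takes the module's transfer rate in millions of transfers per second (for example 400 for DDR-400); the function name peak_transfer_mb_s is a hypothetical helper introduced for illustration.

```python
def peak_transfer_mb_s(mega_transfers_per_s, bus_width_bits=64):
    """Theoretical peak rate in MB/s: transfers per second times bytes per transfer."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * bytes_per_transfer


# DDR-400 modules are the ones labeled PC-3200.
print(peak_transfer_mb_s(400))   # 3200
# DDR3-1600 modules are the ones labeled PC3-12800.
print(peak_transfer_mb_s(1600))  # 12800
```

The result is the same theoretical figure described above; real workloads stay below it.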

Graphical Card

The graphical card is a system specialized in executing video and graphical
processing instructions. By using such a device, the system can offload this type of
processing from the main general purpose CPU to a specialized processor, the GPU
(Graphics Processing Unit). The graphical card contains its own processing unit and memory
and it is connected to the motherboard through a high-bandwidth interface: today PCI
Express 16x is the most commonly recommended interface, while AGP (Accelerated Graphics
Port) was used in the past.

The main characteristics of the graphical card are presented below:

GPU and memory frequency
memory capacity
graphical card interface

Note that a powerful graphical system can combine two or more graphical cards for improved
graphic performance through dedicated interfaces like Nvidia SLI or ATI Crossfire. Also,
newer designs can integrate the GPU right into the Central Processing Unit.

Storage Devices

We have already discussed one type of memory: the RAM. However, RAM is volatile,
therefore we need a way to store the data and allow it to safely survive a reboot. This is why
computers need persistent storage. Devices that offer this type of storage can be internal
and usually non-removable (Hard Disk Drives, Solid State Drives), external removable
devices (USB external drives) or network based (remote shares exported through a protocol,
IP based storage area networks). Servers can connect to specialized storage devices like Fibre
Channel drives. Let's present some of the characteristics of the storage devices:

capacity

interface: examples are SATA, SAS, SCSI, IDE. While IDE and SCSI are obsolete, SATA
and SAS are the most widely used standards. SATA is recommended for desktops, while SAS is
recommended for enterprise storage, where disks are combined in RAID arrays.

rotational speed (not applicable to SSDs): the platters of SATA drives spin at 5400, 7200
or 10000 RPM, while SAS drives spin their platters at 10000 or 15000 RPM. This gives SAS
drives faster average sector access.

data transfer speed: the speed at which data can be transferred to or from the drive. In this
respect, SSDs perform best: being based on NAND memory, they involve no mechanical
movement.
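
The link between rotational speed and sector access time mentioned in the list above can be made concrete. On average, the platter has to turn half a revolution before the requested sector passes under the head, so the average rotational latency is half the time of one revolution. The sketch below computes only this component; seek time and transfer time are deliberately ignored, and the function name is an illustrative assumption.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency in milliseconds: the time of half a revolution."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2


for rpm in (5400, 7200, 10000, 15000):
    print(f"{rpm:>5} RPM: {average_rotational_latency_ms(rpm):.2f} ms")
```

A 15000 RPM drive waits 2 ms on average, against roughly 4.17 ms for a 7200 RPM drive.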

Network interfaces

Computers don't work alone anymore. This is especially true for Linux systems, which use, by
default, Internet connectivity to install new software. Therefore, network interfaces have
become more and more important with the growth of networking technologies. Most
motherboards offer an integrated network interface, with the possibility to add additional
NICs (network interface cards) using the available PCI/PCI-X slots. The motherboards designed
for server use offer at least two integrated Gigabit interfaces, with the option to add other
cards using the high performance PCI Express 8x interface. Important features of a networking
interface are:

technology: Ethernet, Wireless Ethernet, DSL, etc.

bandwidth: 10 Mbps, 100 Mbps, 1 Gbps, 10 Gbps, etc.

auto MDI-X: allows the use of either a copper crossover or a copper straight through cable

port type: fiber or copper cable support

The components described here are among the most important hardware devices used by a
computer. However, there are a lot of additional devices:

power source: transforms AC to DC and powers the system through connectors

RAID controllers: often installed in servers, combine multiple hard disk drives or SSDs in
RAID arrays

case or chassis: defines the type of motherboard that can be used, the power sources that can
be installed and the cooling system.

cooling system: keeps the system at an acceptable temperature. Can be a set of fans, water
based, etc.

Now that we have briefly described the hardware components, it is time to start talking
about the software.

Device Firmware

The very first software component is the firmware that is stored in each component. Each
component has its own software written in a programmable memory. For example, SSDs
have their own controller firmware that can be updated when the vendor releases a new
version. This firmware provides an interface for the rest of the components or for the
Operating System and describes the set of operations that the component can perform.

BIOS/UEFI

The BIOS (Basic Input Output System) provided a very low level interface between the OS
and the hardware components. But that was only one of the features. Other features were:

test the installed components at startup through the POST (power on self test) procedure and
report errors through a standard code (usually beeps)

offer an interface to perform low level configuration

execute the first steps of the boot process and load the bootloader

Starting in 2005, the BIOS, as it was defined by IBM, began to be replaced by the UEFI
specification (Unified Extensible Firmware Interface), which provides improved flexibility,
but still offers all the services provided by BIOS.

Operating System

The operating system is a huge software component. It manages the available resources
(hardware components, storage space, processing power, memory, etc.), schedules and
controls processes and offers a supporting interface for other applications. The operating
systems provide a lot of additional services: security services (firewall, application jailing),
networking services (routing), accounting, etc.

The main component of the OS is the kernel. It is responsible for resource and process
management and it implements support for other services. It is important to remark one detail
here: from the development perspective, a programming error in the kernel can turn into a
kernel panic, the Linux counterpart of the Windows blue screen of death. Here is an example of
kernel panic output, which leaves the system unusable:

The drivers are another important component that integrates into the kernel. Drivers are
software components that allow the operating system and applications to efficiently use the
available hardware. The drivers for physical components are written specifically for the
controllers used on that board, so they are hardware dependent code.

Drivers can be a real pain with Linux. When you buy a new component and plan to use it in a
Linux powered system, you should check if that component is supported by the OS. A quick
search on the web would settle this. The big problem is that drivers (aka modules) are closely
associated with the kernel version and the developer should be aware of any changes in the
kernel. This is why vendors decide either to ignore Linux support, or to release a driver
version and then leave the maintenance to the development community. To make things
worse, if a driver contains an error, it turns into a kernel panic, as its code runs inside the
kernel space.

However, one of the most important features of the OS is the support and the abstraction layer
provided to the applications running on top of it. The Linux kernel, for example, provides a
system call interface which simplifies a lot of the operations from the application development
perspective. We'll come back to those details later on.

Libraries

Libraries represent a set of functionalities that can be used and shared by several applications.
For example, think of the procedure that prints a string to the terminal. How many processes do
this? A lot! Wouldn't it be a good idea to put that functionality in a binary file and share it among
all the processes that want to print a string to the terminal? Sure it would be. For example,
the print to terminal functionality is included in the standard C library (libc.so). To
inspect the libraries used by an application, it is possible to use the ldd command.
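
To illustrate the idea of shared functionality, the following sketch loads the standard C library from Python with the ctypes module and calls two of its functions, puts and strlen. It assumes a Linux or other Unix-like system; when find_library cannot locate the library, ctypes.CDLL(None) falls back to the symbols already loaded in the running process, which include the C library.

```python
import ctypes
import ctypes.util

# Locate and load the shared C library (assumption: a Unix-like system).
libc_name = ctypes.util.find_library("c")
libc = ctypes.CDLL(libc_name)

# Declare the C signatures so that ctypes converts arguments correctly.
libc.puts.argtypes = [ctypes.c_char_p]
libc.puts.restype = ctypes.c_int
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# The same puts() code is shared by every process that links against libc.
# C and Python buffer their output independently, so line order may vary.
libc.puts(b"printed by the shared C library")

length = libc.strlen(b"hello")
print("strlen says:", length)
```

The Python interpreter itself is one more application that reuses the code stored in the C library instead of carrying its own copy.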

Applications

Well, all the hardware and components exist to provide services and performance for
applications. Web servers, text editors, games, virtualization tools... all of them are
applications. The applications are mapped into processes that use libraries and the services
offered by the operating system.

Linux Architecture

A Linux powered system consists of a series of components. First of all, when talking about
Linux, there is the kernel. The Linux kernel sources are freely available on the Internet at the
following URL: https://www.kernel.org/. Apart from the kernel, a Linux system consists of
system applications and libraries tuned to work together.

It is very important to remark two aspects of Linux: it was designed to be multi-user and
multi-tasking. Multi-user means that a Linux system should support two or more concurrent
users. There is one root user on the system with full administrative access no matter what
restrictions or permissions are set. On the other hand, there are the normal users. It is possible
to start a new session as root or as a different user using the su command.
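
The distinction between root and normal users comes down to the numeric user ID: the root account is the one with UID 0. The short sketch below, which assumes a Unix-like system with a standard user database, looks up the account that owns UID 0 and checks whether the current process runs with root privileges. The helper name is_root is introduced only for illustration.

```python
import os
import pwd


def is_root(uid):
    """The administrative account is the one with user ID 0."""
    return uid == 0


# Look up the account that owns UID 0 in the user database.
root_entry = pwd.getpwuid(0)
print("UID 0 belongs to:", root_entry.pw_name)
print("home directory:", root_entry.pw_dir)

# The effective UID decides which privileges the current process has.
print("running as root:", is_root(os.geteuid()))
```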

Multi-tasking refers to the capability of the system to manage, schedule and run multiple
independent processes and to provide services for them.

There are two defined spaces: the user space and the kernel space. The kernel space is a
protected environment that cannot be accessed directly by any application. It has a separate
memory space and, when it comes to scheduling, anything related to the kernel takes priority
over anything else.

On the other hand, user space is reserved for applications. An error inside code running in user
space does not produce a panic; it just affects the process that caused it. Data from userspace
can be directly accessed by the kernel, while the reverse is not true. The only way to get data
from the kernel is to request it through the specific system call and to let the kernel write it in
user space. The structure of the Linux architecture is presented in the following figure
(copyright to www.ibm.com).

Linux is about software, which is why it is not tied to a particular hardware platform design.
The operating system provides architecture-dependent kernel code (yes, the kernel modules)
to interact with hardware. Those modules are inserted into the main kernel, which exposes a
system call interface to the applications running in userspace.

The applications running in userspace make use of the available libraries and place system
calls when they require access to functionality managed by the kernel. There are over 300
system calls in a Linux system, related to process management, file management, device
management, information management and communication. For example, the open, read and
write operations executed on a simple file are system calls. To analyze the system
calls executed by a process it is possible to use strace.
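
As a small illustration of the file-related system calls, the sketch below uses the low-level functions of Python's os module, which are thin wrappers over the corresponding kernel services. The temporary file and its content are arbitrary choices made for the example. If the script were run under strace, the open, write, read and close operations would be expected to appear in the trace (on current systems, open is typically reported as openat).

```python
import os
import tempfile

# Create an empty temporary file to work on, then release its descriptor.
fd, path = tempfile.mkstemp()
os.close(fd)

# open + write + close: each call asks the kernel to act on our behalf.
fd = os.open(path, os.O_WRONLY)
written = os.write(fd, b"hello kernel\n")
os.close(fd)

# open + read + close: the kernel copies the data into a user space buffer.
fd = os.open(path, os.O_RDONLY)
data = os.read(fd, 100)
os.close(fd)

os.remove(path)
print("bytes written:", written)
print("data read back:", data)
```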

Linux Distributions

Ok, now we know that a Linux system uses the Linux kernel, a series of system applications
(for formatting, file system checking, process management, etc.) and a set of applications that
offer the desired services. But who can guarantee that all those applications will actually work
together? Well, here we can start talking about Linux distributions.

A Linux distribution is a collection of kernel, system applications and other software
applications tuned to work together. A distro can be targeted for a specific purpose (embedded
systems, real-time applications, etc.) or it can be a general purpose distro.

The sources downloaded from kernel.org represent the so called vanilla version (original, not
customized). However, the developers that create and maintain distributions download it,
apply specific patches, compile the kernel and then ship it in the latest version of their distro.
The same applies to the additional software shipped inside the distributions. As most of the
applications are open-source, it is possible to download project sources, modify them,
recompile and re-pack the software.

Well-known server distributions are: Red Hat Enterprise Linux, CentOS, Debian, Ubuntu,
Scientific Linux, etc. Examples of desktop distributions are: Ubuntu, Fedora, openSUSE,
Linux Mint, etc.

Server distributions rely on the command line interface and are focused on delivering stable
software, while desktop distributions offer advanced features and nice graphical user interfaces.

One more thing must be discussed here. It is a known fact that Linux-related stuff is
open-source. But if you look at distributions like Red Hat, you will notice that there is a cost
that should be paid to use the software on a machine. How can that be explained?

Well, it is pretty simple. If the software is open-source, it means that you can get the sources
for free. But who guarantees that those sources will actually work for you? Red Hat is a
corporation specialized in integrating existing open-source software, as well as developing
new software, in new releases of the Red Hat Enterprise Linux distribution. When you pay for a
Red Hat subscription, you don't pay for the software. Instead, you pay for the support
offered: whenever something goes wrong with your RHEL system, you can call them and ask
for a solution. And they must deliver it. You also pay for the updates that you get from their
servers and for the personnel maintaining them. But be prepared: the subscription will cost
more than a Windows 7 license.

We also need to discuss licensing. BSD licensing offers a lot of freedom to developers
that create software based on BSD projects. You can modify the original software as you
want, with the single condition that the names of the original developers must be listed in the
documentation. However, licenses like the GPL (General Public License) offer public access to
sources, but require that any modifications made to the original sources be made public
when the modified software is distributed. Many open-source projects are licensed under the
GPL. This is why, even if Red Hat pays employees to customize such projects, it must also make
its optimizations publicly available. CentOS and Scientific Linux are distributions that are
based on the sources released by Red Hat, thus being called binary compatible distributions.
The similarities between RHEL, CentOS and Scientific Linux are very strong. This is why, if you
want to learn RHEL but cannot afford a subscription, you can safely try CentOS.

CentOS 6.x install

Installing CentOS is simple, as you will see. First of all, you need to download an ISO
that fits your needs. It is possible to download a minimal version or the full featured
version. The difference between them is that the minimal version allows you to install only
the Minimal software selection, while the full featured DVD set allows you to make
any software selection. Note that for a Desktop or a Minimal Desktop, the first DVD is
enough. You can download the DVDs for the x86_64 architecture from this website:
http://isoredirect.centos.org/centos/6/isos/x86_64/

After you download the ISO from the website, burn it on a DVD and prepare for installing.
However, before performing the install procedure, check if your computer meets the minimal
system requirements:

1 GHz CPU frequency
512 MB RAM
5 GB disk space
Graphics card and monitor capable of rendering 1024x768

From my personal experience, I managed to boot a Linux system with a 500 MHz CPU and 1 GB
of disk space, but for the moment stick to the requirements listed above.

Next step is to instruct your BIOS to boot from the newly burnt DVD. This will lead to the
following screen.
Just choose the first option to perform a new fresh install.
Click next to enter the install process.
Choose the install language.
Choose the keyboard layout.
If your system does not contain specialized storage devices (SAN, advanced hardware RAID
controllers) select the Basic Storage Devices option.
Confirm discarding the data from the device that will be used for install.
Define a hostname for the system. The current example uses centosvm.example.com.

Configure the timezone. This setting is very important for logging and audit purposes.
Set the password for the administrative user (root).
Select Create Custom Layout to perform advanced partitioning.
The next screen describes the selected partitioning.
First of all, the name of the drive is vda (/dev/vda) because it is a virtual drive used in a virtual
machine. On a real system you can find an sda (/dev/sda) drive. In fact, you can find multiple
drives in a system, which are named according to the interface (sd for SCSI and SATA, vd for
VirtIO, etc.) and the order of the drive (first is a - vda, second is b - vdb, etc.). The partitions
contained inside a drive are numbered (vda1, vda2, etc.).
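
The naming scheme described above can be sketched as a small parser. The snippet below handles only the two prefixes mentioned in the text (sd and vd) and single-letter drive names; both restrictions, as well as the function name parse_device_name, are simplifying assumptions made for illustration.

```python
import re

# Only the prefixes named in the text; other prefixes exist and are not covered.
PREFIXES = {"sd": "SCSI/SATA disk", "vd": "VirtIO disk"}


def parse_device_name(name):
    """Split a name such as 'vda1' into interface, drive order and partition."""
    match = re.fullmatch(r"(sd|vd)([a-z])(\d*)", name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name}")
    prefix, letter, number = match.groups()
    return {
        "interface": PREFIXES[prefix],
        # 'a' is the first drive, 'b' the second, and so on.
        "drive": ord(letter) - ord("a") + 1,
        # A name without a trailing number refers to the whole drive.
        "partition": int(number) if number else None,
    }


print(parse_device_name("vda1"))
print(parse_device_name("sdb"))
```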

On a standard storage device it is possible to define no more than 4 partitions in the
zone reserved for this purpose (the MBR, Master Boot Record). However, modern systems may
require more than 4 partitions. Let's present the partition types used.

primary partitions: are defined in the MBR, the zone dedicated to partition
definition. There can be at most 4 primary partitions. In our example, vda1, vda2 and vda3 are
primary partitions
extended partitions: are defined in the MBR, but only represent a container for logical
partitions. They cannot be formatted and used for data storage. Example: vda4
logical partitions: are defined inside an extended partition and can be used as simple
standard partitions. Example: vda5, vda6.

If you look in the type column, three partitions are marked with ext4. Those partitions are
used for data storage and have a mount point configured. Check the next chapter for more
details about mount points and mounting. Ext4 stands for the file system used for
formatting.

The partition mounted in /boot will store the kernel image and bootloader files. It is
recommended to allocate a separate partition for /boot and it must be formatted with ext3/4
(RHEL 6) or ext3 (RHEL 5).

The /home partition will be used for storing users' files and we can also apply special
mounting options to control storage restrictions. The root partition, mounted on / will store the
rest of the file system structure.

The swap partition (marked with swap) is used as a memory extension. RAM content that is
not frequently accessed can be written to disk on this swap partition and later loaded back into
RAM. There is one legacy rule regarding the size of the swap partition, which states that the
size of swap should be twice the amount of RAM. For a system with 1024 MB of RAM (1 GB of
RAM) the swap should be about 2048 MB. However, with modern systems that have
large amounts of memory the rule should be reconsidered. The recommendations are presented
below:

1. Systems with 4 GB of RAM or less require a minimum of 2 GB of swap space

2. Systems with 4 GB to 16 GB of RAM require a minimum of 4 GB of swap space

3. Systems with 16 GB to 64 GB of RAM require a minimum of 8 GB of swap space

4. Systems with 64 GB to 256 GB of RAM require a minimum of 16 GB of swap space
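
The recommendations above can be written as a simple lookup. The list leaves the boundaries ambiguous (for example, whether exactly 16 GB falls in the second or third range), so this sketch assumes that each upper bound is inclusive; the function name recommended_swap_gb is hypothetical.

```python
def recommended_swap_gb(ram_gb):
    """Minimum swap size in GB for a given amount of RAM, following the list above."""
    if ram_gb <= 0:
        raise ValueError("the amount of RAM must be positive")
    # Assumption: the upper bound of every range is inclusive.
    if ram_gb <= 4:
        return 2
    if ram_gb <= 16:
        return 4
    if ram_gb <= 64:
        return 8
    if ram_gb <= 256:
        return 16
    raise ValueError("the list gives no recommendation above 256 GB of RAM")


for ram in (1, 8, 32, 128):
    print(f"{ram} GB of RAM: at least {recommended_swap_gb(ram)} GB of swap")
```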

One more thing to remark: the LVM Physical volume partition was defined for later use when
configuring LVMs.

After defining the partitioning, the Anaconda installer will configure the bootloader. If the
installer detects other operating systems, they will be listed here. Just choose to install the
bootloader and select the default boot target.
Select the Minimal Desktop software selection and click Next.
Now packages will be extracted and installed.
A successful install will result in the following screen.

After the reboot, the First boot wizard will be displayed. Here you can configure a system
user, to avoid working directly as the root administrator, for safety purposes. First boot is
not displayed if the graphical user interface is not installed.
After completing first boot (a restart might be requested), you will be able to log in using the
Graphical User Interface (GUI). This concludes the install process.

File system structure

In a Windows system, the file system structure is determined by the defined partitions. Therefore
we have C:, D:, E: drives and so on. On these drives, the user can define any directory
structure, except on C:, where there are some canonical folders that should exist (Program
Files, Windows, etc.). At this point, we can consider that each partition can store a separate
tree of folders and the structure of those trees is customizable.

On the other hand, in Linux things are a bit different. There is a canonical file structure,
named FHS (https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard), with folders that
contain specific files. First of all, there is the root, '/'. Inside the root, there is a list of
standard folders with specific contents. But where are the partitions? Well, remember the
mount points we defined during the install process? Each partition used for data storage has a
mount point associated, and that means that everything stored inside the mount point directory
is stored on that partition. Check the following figure to understand the concept of
mounting.

As we can see in the figure, the /dev/sda2 partition is mounted on /, /dev/sda1 on /boot,
/dev/sda3 on /home and /dev/sda4 on /var. The idea is simple: if a file is written in the
/home/student folder, it is actually stored on the /dev/sda3 partition. The same applies to the
/var and /boot folders. However, if anything is added in /etc or /usr, /dev/sda2 is used.
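
The rule described above amounts to a longest-prefix match: a path belongs to the partition whose mount point is the deepest directory that contains it. The sketch below applies that rule to the mount table from the figure; the function name partition_for is an assumption, and the table is expected to contain an entry for /.

```python
import posixpath

# Mount table taken from the example in the text.
MOUNTS = {
    "/": "/dev/sda2",
    "/boot": "/dev/sda1",
    "/home": "/dev/sda3",
    "/var": "/dev/sda4",
}


def partition_for(path, mounts=MOUNTS):
    """Return the partition that stores path, using the deepest matching mount point."""
    path = posixpath.normpath(path)
    best = "/"
    for mount_point in mounts:
        prefix = mount_point.rstrip("/") + "/"
        inside = path == mount_point or path.startswith(prefix)
        if inside and len(mount_point) > len(best):
            best = mount_point
    return mounts[best]


print(partition_for("/home/student/notes.txt"))
print(partition_for("/etc/passwd"))
```

Matching on the mount point followed by a slash keeps a path such as /homework from being mistaken for something inside /home.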

Note that /boot, /home, /etc, /usr and /var, along with some other folders, are standard folders,
part of the canonical Linux file system structure. Let's present some details about the standard
folders from the root of a Linux system.

/bin: contains general purpose binaries used by the users of the system
/boot: contains the kernel images and the bootloader files and configuration
/dev: contains virtual file entries that represent the devices from the
computer
/etc: contains configurations for most of the services
/home: by default, this folder stores the home directories of system users. Each user
should have a home directory where they can store personal content and create files
and folders with more freedom.
/lib and /lib64: these folders contain libraries and kernel modules
/media and /mnt: empty folders by default, which can be used as mount points for
DVDs or network storage
/opt: stores files installed by third party applications
/proc: contains virtual files and folders that represent an interface with the kernel.
The contents of this folder are generated on-the-fly, on user request. Here we can
find information about kernel settings and process accounting
/root: the home directory of the administrator, the root user
/sbin: system binaries used for administration are stored here. Usually, these binaries
are executed with root privileges
/tmp: temporary files are written here. On some distributions, /tmp is stored in RAM.
However, in CentOS the content of /tmp is erased by a periodic scheduled job
/usr: this folder contains binaries, and the libraries they need, that should be used by
normal users of the system, without special privileges
/var: this is the folder for storing variable content. /var will store databases, log files,
virtual machine drives, etc.

Runlevels

Runlevels are a classic Linux concept and represent states of the system. Services deployed
on the machine can be configured to start or to be stopped on specific runlevels. The default
runlevel entered after the boot process is configured in the /etc/inittab file. Each runlevel is
associated with a number. Let's describe the 7 states:

0: shut down. A system entering runlevel 0 starts the shut down process
1: single user mode. The system boots and automatically provides root access (in
fact, there is no login process). This runlevel is used for recovery procedures (root
password recovery, for example)
2: multi-user mode without networking services
3: full multi-user mode with command line interface. This is the standard runlevel for
servers
4: not defined
5: full multi-user mode with graphical user interface
6: reboot. Entering runlevel 6 means reboot

If you want to configure the default runlevel entered at system initialization, edit the
/etc/inittab file and replace the number in the last line with the one associated with the
desired runlevel. Then just reboot and the machine will enter the configured runlevel.
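
The line in question follows the inittab entry format id:runlevels:action:process, with the action set to initdefault. As a minimal sketch, the snippet below extracts the default runlevel from inittab-style text; SAMPLE_INITTAB is an illustrative stand-in for the real file, and the function name default_runlevel is an assumption.

```python
# Illustrative content; a real /etc/inittab also carries explanatory comments.
SAMPLE_INITTAB = """# Default runlevel (comment lines start with '#')
id:5:initdefault:
"""


def default_runlevel(inittab_text):
    """Return the runlevel of the initdefault entry, or None if there is none."""
    for line in inittab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Entry format: id:runlevels:action:process
        fields = line.split(":")
        if len(fields) >= 3 and fields[2] == "initdefault":
            return int(fields[1])
    return None


print("default runlevel:", default_runlevel(SAMPLE_INITTAB))
```

Changing the 5 to a 3 in such a line would make the machine boot into the command line multi-user mode.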

It is possible to switch runlevels without rebooting. The init <runlevel> command can be used
for that purpose. Note that the runlevel command displays two values: the previous and the
current runlevel.