2017

Principles of Unix: A Report


A BRIEF INTRODUCTION TO THE UNIX SYSTEM
AMIT GUPTA AXG162930
Table of Contents
PRINCIPLES OF UNIX: A REPORT

INTRODUCTION

THE EVOLUTION OF THE UNIX TIME-SHARING SYSTEM

FILE INPUT OUTPUT

FORK

SHELL

THREADS

OUT OF BAND DATA | NONBLOCKING AND ASYNCHRONOUS I/O

SUMMARY

INTRODUCTION
This report, Principles of Unix, introduces the concepts behind Unix systems and their intricacies. It starts with the
period before Unix existed and the early days of the system, and then follows how such systems evolved. The report
tells the story of how a system built for use in businesses became a household possession. It covers the problems Unix
faced in performing some of the most common tasks, and how its developers removed those inefficiencies and added
functionality that made it user friendly. It also covers the architecture of the system and the procedures and
subroutines added in later years.

The Evolution of the Unix Time-sharing System

The failure of the Multics project to deliver any sort of usable system showed that the failure of an organizational
computer product was necessary for the separation of computing services from computing research. Had it not happened,
computer research would still be controlled by organizations and their patents, and would not be available as open
source to researchers everywhere. The group wanted to keep the technical advances it had achieved on the Multics
project, and to use them to grow a user base and build a community, as there were no alternatives to Multics. Multics
had achieved multi-user, time-sharing, remote-access operation and was faster than the keypunch systems that prevented
close communication. It is usually good software development practice to keep legacy software running and to maintain
backward compatibility. Money constraints and pressure from management and clients, along with the fear of not making
a big impact and ending in a failure similar to Multics, led the team of Thompson, R. H. Canaday, and Ritchie to design
the file system of today's Unix. The game Space Travel proved to be a milestone: it paved the way for a new way of
preparing programs for the PDP-7 and led to user-level utilities to copy, print, delete, and edit files, which was a
major step up from paper tape. The name Unix can be seen as a jab at the old, failed Multics system, whose name sounds
like "Multix".

The PDP-7 file system was nearly identical in structure to that of Unix today. It had an i-list that held the metadata
about files and directories, much like today's Unix. In my opinion the operating system's simplicity was well thought
out. It increased speed, which was one of the major issues in punch-card-based systems. Even though the newly developed
Unix system clearly lacked multiprogramming and overlapped I/O, it made up for that in speed. Despite its limitations,
Unix was well ahead of its time in leaving scope for future developments and additions, such as the > and < I/O
redirection operators. The original system's limit of one process per terminal was frustrating, and the inability to
specify directory paths was another major source of frustration. But the design of the system made it really simple to
add functions such as fork and exec.

The later version of Unix for the PDP-11 improved its capabilities immensely and added major features such as pipes,
which were appreciated all around for their inter-process communication capabilities. Various efforts were made to
come up with a programming language that would move beyond the assembly language used on the PDP-7. Fortran was the
first choice, but it was soon replaced by B, which paved the way for C. This series of events was essential, and the
limitations felt by the original creators and by the community of developers and users helped improve the system
significantly in a short amount of time. The Unix system took its final form quite early in its formative years, and
today's system is still close to what it was back then.

File Input Output
Unix systems provide five main functions for file input and output: open, read, write, lseek, and close. The read and
write functions described in this chapter are unbuffered I/O, in the sense that each call invokes a system call in the
kernel. The chapter introduces file descriptors, the small non-negative integers the kernel uses to refer to open
files. Two functions are available for opening a file, open and openat. They take a number of arguments, chiefly flags
that specify whether the file is opened for reading, writing, or both. Both return the lowest-numbered unused file
descriptor, a guarantee that some applications rely on to open a new file on standard input or standard output.
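
The lowest-numbered-descriptor behaviour can be observed directly. The following is a minimal sketch, assuming Python's os module as a thin wrapper over the open and close system calls; the function name reopen_descriptor is made up for illustration, and the result assumes no other thread opens a file in between.

```python
import os


def reopen_descriptor(path):
    """Open, close, and reopen `path`; return both descriptor numbers."""
    first = os.open(path, os.O_RDONLY)   # lowest unused descriptor right now
    os.close(first)                      # that number becomes free again
    second = os.open(path, os.O_RDONLY)  # so the kernel hands the same number out
    os.close(second)
    return first, second


if __name__ == "__main__":
    first, second = reopen_descriptor(os.devnull)
    print("first:", first, "second:", second)
```

Closing descriptor 0 and then calling open works the same way, which is how a program can attach a new file to standard input.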

Apart from the open functions, other useful functions are creat, close, and lseek, which create a new file, close an
open file, and set the current file offset, respectively. Two of the most important functions are read and write: read
returns the number of bytes actually read, and write returns the number of bytes written, which is normally equal to
its nbytes argument. Efficiency is another important issue in read and write operations. Most systems improve it with
read-ahead: on detecting sequential reads, the system reads in more data than the application requested, on the
assumption that the application will ask for it shortly.
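
The five calls fit together as in the sketch below. It is only an illustration under stated assumptions: Python's os module stands in for the C functions, and the name write_then_read_back and the file name demo.txt are hypothetical.

```python
import os
import tempfile


def write_then_read_back(path, payload):
    """Write `payload` to a new file, rewind, and read it back."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        written = os.write(fd, payload)   # number of bytes written
        os.lseek(fd, 0, os.SEEK_SET)      # move the file offset back to the start
        data = os.read(fd, len(payload))  # bytes actually read
        at_eof = os.read(fd, 1)           # empty bytes object at end of file
    finally:
        os.close(fd)
    return written, data, at_eof


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(write_then_read_back(os.path.join(folder, "demo.txt"), b"hello"))
```

Without the lseek call the first read would start at the offset left by write, which is already the end of the file.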

The Unix system supports sharing of open files among different processes. The chapter also introduces a data structure
called the v-node, which holds the metadata of an open file so that every process using the file works from the same
information. The concept of atomic operations is also introduced: an atomic operation completes as a single step that
no other process can interrupt, which prevents confusion when several processes read and write a single file.
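
One common atomic operation, chosen here as an illustration and not named in the text above, is appending with the O_APPEND flag: the kernel moves to the end of the file and writes in one step. The sketch below opens the same file twice, so each descriptor has its own file offset, and compares the result with and without O_APPEND. The function name interleaved_writes is hypothetical.

```python
import os
import tempfile


def interleaved_writes(path, append):
    """Write through two independent descriptors on the same file."""
    flags = os.O_WRONLY | os.O_CREAT | (os.O_APPEND if append else 0)
    a = os.open(path, flags | os.O_TRUNC, 0o600)
    b = os.open(path, flags, 0o600)   # second open: separate file offset
    try:
        os.write(a, b"AAAA")
        os.write(b, b"BB")            # without O_APPEND this starts at offset 0
    finally:
        os.close(a)
        os.close(b)
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        target = os.path.join(folder, "log.txt")
        print(interleaved_writes(target, append=True))
        print(interleaved_writes(target, append=False))
```

With O_APPEND both writes survive; without it the second descriptor overwrites the start of the first write.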

Fork
Unix systems allow an existing process to create a new process by calling the C function fork. The new process is
called the child process. The function is called once but returns twice: the return value in the child is 0, and the
return value in the parent is the process ID of the new child. The child's process ID is returned to the parent because
a process can have more than one child and has no other function for obtaining the process IDs of its children. The
child receives 0 because a process can have only one parent, and the child can always call getppid to obtain its
parent's process ID. The child is a copy of the parent and gets a copy of the parent's data space, heap, and stack,
which lets it carry on with the same tasks as the parent. Modern implementations do not copy these regions right away.
Parent and child share them under copy-on-write, and the kernel copies a page only when one of the two processes
modifies it, which gives better memory utilization and less copying.
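
The two return values can be seen in a few lines. This is a minimal sketch, assuming Python's os.fork as a wrapper over the C function; the name fork_and_report is made up, and the child reports through its exit status only to keep the example short.

```python
import os


def fork_and_report():
    """Fork once; return (child pid seen by parent, child's exit status)."""
    parent_pid = os.getpid()
    pid = os.fork()
    if pid == 0:
        # Child branch: fork returned 0 here. getppid names the parent.
        same_parent = os.getppid() == parent_pid
        os._exit(0 if same_parent else 1)
    # Parent branch: fork returned the child's process ID.
    _, status = os.waitpid(pid, 0)
    return pid, os.WEXITSTATUS(status)


if __name__ == "__main__":
    child_pid, exit_status = fork_and_report()
    print("child pid:", child_pid, "exit status:", exit_status)
```

The child leaves with os._exit so that it does not run the rest of the parent's program a second time.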

File sharing between the parent and child processes is simple and efficient: every file descriptor that is open in the
parent is duplicated in the child, as if the dup function had been called on each one. Parent and child therefore share
the same file offset. When the parent's standard output has been redirected, the child can write to standard output
while the parent waits for it to finish, and the parent's later output is appended after whatever the child wrote.
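
The shared file offset can be checked with an ordinary file in place of a redirected standard output. The sketch below is an illustration only; child_then_parent_write is a hypothetical name, and Python's os module stands in for the C calls.

```python
import os
import tempfile


def child_then_parent_write(path):
    """Let a forked child write first, then the parent, on one descriptor."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    pid = os.fork()
    if pid == 0:
        os.write(fd, b"child\n")   # advances the offset shared with the parent
        os._exit(0)
    os.waitpid(pid, 0)             # parent waits for the child to finish
    os.write(fd, b"parent\n")      # continues where the child stopped
    os.close(fd)
    with open(path, "rb") as handle:
        return handle.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        print(child_then_parent_write(os.path.join(folder, "out.txt")))
```

If the two processes had separate offsets, the parent's write would start at offset 0 and overwrite the child's output.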

The fork call is not failure proof. The two main reasons for it to fail are that too many processes are already in the
system, and that the number of processes for a particular user exceeds the system's limit. There are two main uses of
fork. In the first, the parent and child execute different sections of the code at the same time. In the second, which
is common for shells, the child calls exec to run a different program right after it returns from the fork.
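
The shell-style use of fork followed by exec can be sketched as follows. The name run_program and the exit status 127 for a failed exec are choices made for this illustration (127 follows the common shell convention); in Python a failed fork raises OSError.

```python
import os
import sys


def run_program(argv):
    """Run `argv` in a child process and return its exit status."""
    pid = os.fork()                      # raises OSError if fork fails
    if pid == 0:
        try:
            os.execvp(argv[0], argv)     # replaces the child with the new program
        finally:
            os._exit(127)                # reached only when exec itself failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else None


if __name__ == "__main__":
    code = run_program([sys.executable, "-c", "raise SystemExit(7)"])
    print("exit status:", code)
```

The parent never runs the new program itself; it only waits and collects the status, which is what a shell does for a foreground command.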

Shell
In this section we look at how shells execute programs and how this relates to process groups, controlling terminals,
and sessions. When the shell does not support job control, the example ps output shows the shell and the ps command in
the same session and in the same foreground process group (949). A background job is not put into its own process
group, and the controlling terminal is not taken away from the background job. When the shell supports job control, a
background job is put into a background process group, and a SIGTTIN signal is generated if the background job tries to
read from its controlling terminal.

In the pipeline example, the last process in the pipeline is the child of the shell, and all the previous processes in
the pipeline are children of that last process. Because the last process in the pipeline is a child of the login shell,
the shell is notified when that process terminates. The improvement in the Bourne-again shell is that it puts the
foreground job (ps, in the example) into its own process group, which becomes the foreground process group. Our login
shell is then in a background process group while ps executes.

Under FreeBSD 8.0, Linux 3.2.0, and Mac OS X 10.6.8, the ps command can print this information exactly. A process does
not have a terminal process group of its own. A process belongs to a process group, and the process group belongs to a
session. The session may or may not have a controlling terminal. If the session does have a controlling terminal, then
the terminal device knows the process group ID of the foreground process group. This value can be set in the terminal
driver with tcsetpgrp. It makes sense for the foreground process group ID to be an attribute of the terminal and not of
the process. This value is the TPGID that the ps command prints. If ps finds that the session has no controlling
terminal, it prints either 0 or -1, depending on the platform.
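
The same identifiers that ps reports can be read from inside a process. The sketch below is an illustration under stated assumptions: it uses Python's os wrappers for getpgrp, getsid, and tcgetpgrp, opens /dev/tty to reach the controlling terminal, and uses None (a choice made here, not what ps prints) when there is no controlling terminal. The name describe_process is hypothetical.

```python
import os


def describe_process():
    """Collect the process, process group, session, and terminal group IDs."""
    info = {"pid": os.getpid(), "pgid": os.getpgrp(), "sid": os.getsid(0)}
    try:
        tty = os.open("/dev/tty", os.O_RDONLY)   # the controlling terminal, if any
    except OSError:
        info["tpgid"] = None                     # session has no controlling terminal
        return info
    try:
        info["tpgid"] = os.tcgetpgrp(tty)        # foreground process group of the terminal
    except OSError:
        info["tpgid"] = None
    finally:
        os.close(tty)
    return info


if __name__ == "__main__":
    print(describe_process())
```

Run in the foreground from an interactive shell with job control, the pgid and tpgid values would normally match; run in the background, they would differ.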

Threads
To overcome the limits on sharing between processes, a lightweight form of process called a thread was introduced.
Threads let the flows of control within one process share resources, and they come with synchronization tools to keep
the shared data consistent. A thread is basically a flow of control within a process that performs one task at a time.
Each thread carries information such as a thread ID, register values, a stack, a signal mask, and an errno variable. A
thread ID identifies a thread and, unlike a process ID, has to be unique only within the context of its process.
Traditional Unix systems were based on a single-thread-per-process model. A pthreads program likewise starts out as a
single process with a single thread, and it adds more threads by calling the pthread_create function.
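
A small sketch of thread creation follows. It assumes Python's threading module, which on Unix is built on top of pthreads, as a stand-in for calling pthread_create directly; the name spawn_one_thread is made up for illustration.

```python
import os
import threading


def spawn_one_thread():
    """Start one extra thread and record what it sees."""
    seen = {}

    def worker():
        seen["pid"] = os.getpid()              # same process as the main thread
        seen["tid"] = threading.get_ident()    # thread ID, meaningful within this process

    thread = threading.Thread(target=worker)   # comparable to pthread_create
    thread.start()
    thread.join()                              # comparable to pthread_join
    return seen


if __name__ == "__main__":
    print("main thread:", os.getpid(), threading.get_ident())
    print("new thread: ", spawn_one_thread())
```

Both threads report the same process ID but different thread IDs, since the main thread is still alive while the worker runs.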

Thread termination is equally important and needs to be coordinated. If any thread calls exit, the whole process
terminates, so the threads must synchronize before exit is called, for example by joining them with pthread_join, so
that none of them is stopped midway through its work. Unix systems also introduced mutexes, which prevent different
threads from using a single resource at the same time by putting a lock on the resource while it is in use. The lock
keeps other threads from changing the resource while one thread is updating it. Another type of lock is the
reader-writer lock, which, unlike a mutex, has more than one mode. When the lock is held in write mode, the thread
holding it is the only one that can access the resource. When it is held in read mode, multiple threads can read the
resource at the same time. Unix systems also provide spin locks and barriers. A spin lock is similar to a mutex, but a
thread waiting for it does not sleep; it busy-waits until it acquires the lock. Barriers help with synchronization by
making each thread wait until all the other cooperating threads have reached the same point.
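
Two of these primitives, the mutex and the barrier, can be sketched with Python's threading.Lock and threading.Barrier. These are assumed here as stand-ins for the pthread versions; the Python standard library has no direct reader-writer lock or spin lock, so those are left out. The name locked_counter is hypothetical.

```python
import threading


def locked_counter(n_threads, increments):
    """Increment a shared counter from several threads under a mutex."""
    total = 0
    lock = threading.Lock()                  # mutex protecting `total`
    barrier = threading.Barrier(n_threads)   # releases threads once all have arrived

    def worker():
        nonlocal total
        barrier.wait()                       # wait until every thread reaches this point
        for _ in range(increments):
            with lock:                       # only one thread updates at a time
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return total


if __name__ == "__main__":
    print(locked_counter(4, 1000))
```

The lock makes each read-modify-write of the counter a unit, so no increment is lost however the threads are scheduled.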

Out of Band Data | Nonblocking and Asynchronous I/O
To provide higher-priority data delivery, Unix systems support an optional mechanism called out-of-band data, which
is offered by some communication protocols. Its noteworthy property is that it jumps the queue: it is delivered ahead
of any data already waiting in line. It is supported by TCP but not by UDP. TCP calls this kind of data urgent data.
TCP can carry only a single byte of urgent data, and the programmer requests it by passing the MSG_OOB flag to any of
the send functions. If more than one byte is sent with MSG_OOB, the last byte is treated as the urgent byte.

The signal delivered on arrival of urgent data is SIGURG, provided the process has arranged to receive it. A related
concept, the urgent mark, identifies the position in the normal data stream where the urgent data belongs. When an
urgent byte is available in the socket's read queue, the select function returns the file descriptor as having an
exception condition pending. Because TCP queues only one byte of urgent data, if another urgent byte arrives before
the current one has been received, the existing one is discarded.
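
Sending and receiving one urgent byte over a loopback TCP connection might look like the sketch below. It is a hedged illustration, not a definitive recipe: it assumes a usable 127.0.0.1 interface, default socket options (urgent data not delivered inline), and a platform whose select reports TCP urgent data as an exception condition. The name send_urgent_byte is made up.

```python
import select
import socket


def send_urgent_byte():
    """Send normal data plus one urgent byte; return (urgent, normal)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))          # port 0: let the kernel pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        client.sendall(b"normal")
        client.send(b"!", socket.MSG_OOB)    # one byte of TCP urgent data
        # select lists the socket as exceptional once urgent data is pending.
        _, _, exceptional = select.select([], [], [server], 5.0)
        if not exceptional:
            return None
        urgent = server.recv(1, socket.MSG_OOB)   # fetch the urgent byte
        normal = server.recv(16)                  # ordinary data before the mark
        return urgent, normal
    finally:
        client.close()
        server.close()
        listener.close()


if __name__ == "__main__":
    print(send_urgent_byte())
```

The urgent byte is read separately from the ordinary stream, which is why the receiver can fetch it before consuming the data that was sent ahead of it.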

Normally the receive functions block when no data is available, and the send functions block when there is not enough
room in the socket's output queue. In non-blocking mode these functions do not block: a call that cannot proceed fails
immediately with the error EWOULDBLOCK or EAGAIN. Support for a general asynchronous I/O mechanism is included in the
Single UNIX Specification. The socket mechanism is different from it and is not standardized. Socket-based
asynchronous I/O is often referred to as signal-based I/O.
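
The non-blocking failure can be reproduced with a connected socket pair that has nothing to read. This is a minimal sketch, assuming Python's socket module; it surfaces the C-level error as a BlockingIOError whose errno field holds the code. The name nonblocking_recv_errno is hypothetical.

```python
import errno
import socket


def nonblocking_recv_errno():
    """Try to read from an empty non-blocking socket; return the errno."""
    left, right = socket.socketpair()
    try:
        left.setblocking(False)       # put the socket in non-blocking mode
        try:
            left.recv(1)              # nothing has been sent, so this cannot proceed
        except BlockingIOError as exc:
            return exc.errno          # EAGAIN or EWOULDBLOCK
        return None
    finally:
        left.close()
        right.close()


if __name__ == "__main__":
    code = nonblocking_recv_errno()
    print(errno.errorcode.get(code, code))
```

In blocking mode the same recv call would simply wait until the other end sent something.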

Socket-based asynchronous I/O lets a process arrange to be sent the SIGIO signal when data can be read from a socket
or when space becomes available in the socket's write queue.

Summary
The failure of the Multics project led to the development of Unix. The Unix I/O functions were among the first and most
useful features introduced in the early Unix days. Forking allows processes to create child processes and carry out
various activities through them. The introduction of threads in Unix systems was considered a milestone because it
added concurrency within a process, letting a program do several things at once on a single machine. Out-of-band data
lets programmers give priority to some data so that it does not get stuck behind ordinary, non-priority data.
