
Unix System Kernel

Instructors:
Fu-Chiung Cheng
Associate Professor
Computer Science & Engineering
Tatung Institute of Technology

Unix: Introduction
Operating System: a system that manages the resources
of a computer.
Resources: CPUs, Memory, I/O devices, Network
Kernel: the memory-resident portion of the Unix system.
The file system and the process control system are the two major
components of the Unix kernel.

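Architecture of Unix System

[Figure: layered architecture. The hardware is at the center, surrounded by the kernel. Around the kernel are utilities and tools (sh, who, date, ed, wc, grep, nroff, emacs, and the compiler chain cc, cpp, as, ld), with other applications in the outermost layer.]

The OS interacts directly with the hardware.
Such an OS is called the system kernel.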

Unix System Kernel


Three major tasks of kernel:
Process Management
Device Management
File Management
Three additional Services for Kernel:
Virtual Memory
Networking
Network File Systems

Experimental Kernel Features:


Multiprocessor support
Lightweight process (thread) support

Block Diagram of System Kernel

[Figure: block diagram in three levels.
User level: user programs and libraries.
Kernel level: the system call interface; below it the file subsystem (with device drivers) and the process control subsystem (inter-process communication, scheduler, memory management); at the bottom, hardware control.
Hardware level: the hardware.]

Process Control Subsystem


Process Synchronization
Interprocess communication
Memory management
Scheduler: process scheduling
(allocates the CPU to processes)

File subsystem
A file system is a collection of files and directories on
a disk or tape in standard UNIX file system format.
The kernel's file subsystem regulates data flow between
the kernel and secondary storage devices.

Hardware Control
Hardware control is responsible for handling interrupts
and for communicating with the machine.
Devices such as disks or terminals may interrupt the
CPU while a process is executing.
The kernel may resume execution of the interrupted
process after servicing the interrupt.

Processes
A program is an executable file.
A process is an instance of the program in execution.
For example: create two active processes
$ emacs &
$ emacs &
$ ps
PID TTY TIME CMD
12893 pts/4 0:00 tcsh
12581 pts/4 0:01 emacs
12582 pts/4 0:01 emacs
$

Processes
A process has
text: machine instructions
(may be shared by other processes)
data
stack
A process may execute either in user mode or in kernel
mode.
Process information is stored in two places:
Process table
User table

User mode and Kernel mode


At any given instant, a computer running the Unix system
is executing either a user process or the kernel itself.
The computer is in user mode when it is executing
instructions in a user process, and in kernel mode
when it is executing instructions in the kernel.
The switch from user mode to kernel mode happens when
a process executes a system call (for example, to perform I/O operations)
an interrupt occurs (for example, the system clock interrupt)

Process Table
Process table: an entry in the process table holds the following
information:
process state:
A. running in user mode or kernel mode
B. ready in memory or ready but swapped
C. asleep in memory or asleep and swapped
PID: process id
UID: user id
scheduling information
signals that have been sent to the process but not yet handled
a pointer to the per-process region table
There is a single process table for the entire system.

User Table (u area)


Each process has only one private user table.
User table contains information that must be accessible
while the process is in execution.
A pointer to the process table slot
parameters of the current system call, return values
error codes
file descriptors for all open files
current directory and current root
process and file size limits.
User table is an extension of the process table.

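[Figure: data structures for an active process. The process table (resident) and the u area (swappable) are in the kernel address space. The process table entry points to the per-process region table, whose entries point into the region table; the regions describe the text, data, and stack in the user address space.]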

Shared Program Text and Software Libraries

Many programs, such as the shell, are often executed
by several users simultaneously.
The text (program) part can be shared.
In order to be shared, a program must be compiled using
a special option that arranges the process image so that
the variable part (data and stack) and the fixed part (text)
are cleanly separated.
An extension of the idea of sharing text is sharing
libraries.
Without shared libraries, every executing program
contains its own copy of the library code.

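[Figure: shared text. Process table entries of active processes point to per-process region tables with text, data, and stack entries. The text entries point to one shared text region in the region table (reference count = 2); the data and stack regions are private.]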

System Call
A process accesses system resources through system calls.
System calls for
Process control:
fork: create a new process
wait: allow a parent process to synchronize its
execution with the exit of a child process
exec: invoke a new program
exit: terminate process execution
File system:
File: open, read, write, lseek, close
inode: chdir, chown, chmod, stat, fstat
others: pipe, dup, mount, umount, link, unlink

System call: fork()


fork: the only way for a user to create a process in the Unix
operating system.
The process that invokes fork is called the parent process,
and the newly created process is called the child process.
The syntax of the fork system call:
newpid = fork();
On return from the fork system call, the two processes have
identical copies of their user-level context except for the
return value newpid.
In the parent process, newpid = child process id
In the child process, newpid = 0

/* forkEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>   /* fork */

int main(void)
{
    pid_t fpid;
    printf("Before forking ...\n");
    fpid = fork();   /* 0 in the child, the child's PID in the parent */
    if (fpid == 0) {
        printf("Child Process fpid=%d\n", (int)fpid);
    } else {
        printf("Parent Process fpid=%d\n", (int)fpid);
    }
    printf("After forking fpid=%d\n", (int)fpid);   /* run by both processes */
    return 0;
}

$ cc forkEx1.c -o forkEx1
$ forkEx1
Before forking ...
Child Process fpid=0
After forking fpid=0
Parent Process fpid=14707
After forking fpid=14707
$

/* forkEx2.c */
#include <stdio.h>
#include <stdlib.h>   /* system */
#include <sys/types.h>
#include <unistd.h>   /* fork */

int main(void)
{
    pid_t fpid;
    printf("Before forking ...\n");
    system("ps");
    fpid = fork();
    system("ps");   /* run by both parent and child */
    printf("After forking fpid=%d\n", (int)fpid);
    return 0;
}

$ ps
PID TTY TIME CMD
14759 pts/9 0:00 tcsh
$

$ forkEx2
Before forking ...
PID TTY TIME CMD
14759 pts/9 0:00 tcsh
14778 pts/9 0:00 sh
14777 pts/9 0:00 forkEx2
PID TTY TIME CMD
14781 pts/9 0:00 sh
14759 pts/9 0:00 tcsh
14782 pts/9 0:00 sh
14780 pts/9 0:00 forkEx2
14777 pts/9 0:00 forkEx2
After forking fpid=14780
$ PID TTY TIME CMD
14781 pts/9 0:00 sh
14759 pts/9 0:00 tcsh
14780 pts/9 0:00 forkEx2
After forking fpid=0

System Call: getpid() getppid()


Each process has a unique process id (PID).
PID is an integer, typically in the range 0 through 30000.
Kernel assigns the PID when a new process is created.
Processes can obtain their PID by calling getpid().
Each process has a parent process and a corresponding
parent process ID.
Processes can obtain their parent's PID by calling
getppid().

/* pid.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    printf("pid=%d ppid=%d\n", (int)getpid(), (int)getppid());
    return 0;
}

$ cc pid.c -o pid
$ pid
pid=14935 ppid=14759
$

/* forkEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    pid_t fpid;
    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    }
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           (int)fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc forkEx3.c -o forkEx3
$ forkEx3
Before forking ...
Parent Process fpid=14942 pid=14941 ppid=14759
After forking fpid=14942 pid=14941 ppid=14759
$ Child Process fpid=0 pid=14942 ppid=1
After forking fpid=0 pid=14942 ppid=1
$ ps
PID TTY TIME CMD
14759 pts/9 0:00 tcsh

Note: the child reports ppid=1 because its parent had already exited,
so the child was adopted by init (process 1).

System Call: wait()


wait system call allows a parent process to wait
for the demise of a child process.
See forkEx4.c


/* forkEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait */
#include <unistd.h>

int main(void)
{
    pid_t fpid;
    int status;
    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    }
    wait(&status);   /* parent blocks here; in the child it fails at once (no children) */
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           (int)fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc forkEx4.c -o forkEx4
$ forkEx4
Before forking ...
Parent Process fpid=14980 pid=14979 ppid=14759
Child Process fpid=0 pid=14980 ppid=14979
After forking fpid=0 pid=14980 ppid=14979
After forking fpid=14980 pid=14979 ppid=14759
$


System Call: exec()


The exec() system call invokes another program by replacing
the image of the current process.
No new process table entry is created for the exec'ed program.
Thus, the total number of processes in the system isn't
changed.
Six different exec functions:
execlp, execvp, execl, execv, execle, execve
(see the man page for more detail).
The exec system call allows a process to choose its successor.

/* execEx1.c */
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    printf("Before execing ...\n");
    execl("/bin/date", "date", (char *)NULL);   /* argument list ends with a null pointer */
    printf("After exec\n");                     /* reached only if execl fails */
    return 1;
}

$ execEx1
Before execing ...
Sun May 9 16:39:17 CST 1999
$

/* execEx2.c */
#include <sys/types.h>
#include <unistd.h>
#include <stdio.h>

int main(void)
{
    pid_t fpid;
    printf("Before execing ...\n");
    fpid = fork();
    if (fpid == 0) {
        execl("/bin/date", "date", (char *)NULL);   /* the child becomes date */
    }
    printf("After exec and fpid=%d\n", (int)fpid);
    return 0;
}

$ execEx2
Before execing ...
After exec and fpid=14903
$ Sun May 9 16:47:08 CST 1999
$

Handling Signal
A signal is a message from one process to another.
Signals are sometimes called software interrupts.
Signals usually occur asynchronously.
Signals can be sent
A. by one process to another (or to itself)
B. by the kernel to a process.
Unix signals are content-free. That is, the only thing that
can be said about a signal is whether it has arrived or not.

Handling Signal
Most signals have predefined meanings:
A. SIGHUP (hangup): when a terminal is closed, the
hangup signal is sent to every process attached to that controlling terminal.
B. SIGINT (interrupt): politely ask a process to terminate.
C. SIGQUIT (quit): ask a process to terminate and produce a
core dump.
D. SIGKILL (kill): force a process to terminate.
See sigEx1.c

/* sigEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>   /* wait */
#include <unistd.h>

int main(void)
{
    pid_t fpid;
    int status;   /* wait() needs the address of real storage */
    printf("Before forking ...\n");
    fpid = fork();
    if (fpid == 0) {
        printf("Child Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
        for (;;)
            ;   /* loop forever */
    } else {
        printf("Parent Process fpid=%d pid=%d ppid=%d\n",
               (int)fpid, (int)getpid(), (int)getppid());
    }
    wait(&status);   /* wait for child process */
    printf("After forking fpid=%d pid=%d ppid=%d\n",
           (int)fpid, (int)getpid(), (int)getppid());
    return 0;
}

$ cc sigEx1.c -o sigEx1
$ sigEx1 &
Before forking ...
Parent Process fpid=14989 pid=14988 ppid=14759
Child Process fpid=0 pid=14989 ppid=14988
$ ps
PID TTY TIME CMD
14988 pts/9 0:00 sigEx1
14759 pts/9 0:01 tcsh
14989 pts/9 0:09 sigEx1
$ kill -9 14989
$ ps
...

Scheduling Processes
On a time-sharing system, the kernel allocates the CPU to
a process for a period of time (a time slice or time quantum),
preempts the process and schedules another one when the
time slice expires, and reschedules the process to continue
execution at a later time.
The scheduler uses a round-robin with multilevel feedback
algorithm to choose which process to execute:
A. The kernel allocates the CPU to a process for a time slice.
B. It preempts a process that exceeds its time slice.
C. It feeds the process back into one of several priority queues.

Process Priority

[Figure: priority levels, listed from highest to lowest, each with its queue of processes.]
Kernel mode priorities:
swapper
wait for disk I/O
wait for buffer
wait for inode
...
wait for child exit
User mode priorities:
user level 0
user level 1
...
user level n

Process Scheduling
(Unix System V)
There are 3 processes A, B, and C under the following
assumptions:
A. They are created simultaneously with initial priority 60.
B. The clock interrupts the system 60 times per second.
C. These processes make no system calls.
D. No other processes are ready to run.
E. CPU usage calculation: CPU = decay(CPU) = CPU/2
F. Process priority calculation: priority = CPU/2 + 60
G. The rescheduling calculation is done once per second.

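Time    Process A          Process B          Process C
(sec)   Priority  CPU      Priority  CPU      Priority  CPU
0       60        0-60     60        0        60        0
1       75        30       60        0-60     60        0
2       67        15       75        30       60        0-60
3       63        7-67     67        15       75        30
4       76        33       63        7-67     67        15
...

A CPU count written a-b marks the process that runs during that second;
its count rises from a to b before the once-per-second recalculation.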


Booting
When the computer is powered on or rebooted, a short
built-in program (possibly stored in ROM) reads the first
block or two of the disk into memory. These blocks
contain a loader program, which was placed on the disk
when the disk was formatted.
The loader is started. The loader searches the root
directory for /unix or /root/unix and loads the file into
memory.
The kernel starts to execute.

The first processes


The kernel initializes its internal data structures:
it constructs linked lists of free inodes, regions, and page tables.
The kernel creates the u area and initializes slot 0 of the process
table.
Process 0 is created.
Process 0 forks, invoking the fork algorithm directly
from the kernel. Process 1 is created.
In kernel mode, process 1 creates its user-level context
(regions) and copies code (/etc/init) into the new region.
Process 1 calls exec (executes init).

init process
The init process is a process dispatcher: it spawns
processes that allow users to log in.
init reads /etc/inittab and spawns getty.
When a user logs in successfully, getty goes through a login
procedure and execs a login shell.
init executes the wait system call, monitoring the death
of its child processes and of processes orphaned
by an exiting parent.

[Figure: the login cycle for one terminal line.]
A. init forks and execs a getty program to manage the line.
B. getty prints the login: message and waits for someone to log in.
C. The login process prints the password message, reads the password, then checks the password.
D. The shell runs programs for the user until the user logs off.
E. When the shell dies, init wakes up and forks and execs a getty for the line.

File Subsystem
A file system is a collection of files and directories on
a disk or tape in standard UNIX file system format.
Each UNIX file system contains four major parts:
A. boot block:
B. superblock:
C. i-node table:
D. data block: file storage


File System Layout

Block 0: bootstrap
Block 1: superblock
Blocks 2 to n: i-nodes
Blocks n+1 to last: files

Boot Block
A boot block may contain several physical blocks.
Note that a physical block contains 512 bytes
(or 1 KB or 2 KB).
A boot block contains a short loader program for
booting.
It is blank on other file systems.

Superblock
The superblock contains key information about a file system.
Superblock information:
A. Size and status of the file system:
label: name of this file system
size: the number of logical blocks
date: the last modification date of the superblock
B. Information about i-nodes:
the number of i-nodes
the number of free i-nodes
C. Information about data blocks: free data blocks
The information in the superblock is loaded into memory.

I-nodes
i-node: index node (information node)
i-list: the list of i-nodes
i-number: the index of i-list.
The size of an i-node: 64 bytes.
i-node 0 is reserved.
i-node 1 is the root directory.
i-node structure: next page


[Figure: I-node structure. The i-node holds mode, owner, timestamp, size, reference count, block count, and direct block pointers 0-9, each pointing to a data block. The single indirect pointer leads through one indirect block, the double indirect pointer through two levels of indirect blocks, and the triple indirect pointer through three levels, all ending in data blocks.]

I-node structure
mode: A. type: file, directory, pipe, symbolic link
B. access: read/write/execute (owner, group, others)
owner: who owns this i-node (file, directory, ...)
timestamp: creation, modification, access time
size: the number of bytes
block count: the number of data blocks
direct blocks: pointers to the data blocks
single indirect: pointer to a block that holds
pointers to data blocks (128 data blocks)
double indirect: (128*128 = 16384 data blocks)
triple indirect: (128*128*128 data blocks)

Data Block
A data block has 512 bytes.
A. Some file systems have 1 KB or 2 KB per block.
B. See block size effect (next page)
A data block may contain file data or directory data.
File: a stream of bytes.
Directory entry format:
i-# | Next | size | File name | pad

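[Figure: a directory tree with the names home, alex, john, jenny, Report.txt, notes, bin, grep, and find, and the matching directory data block: a chain of entries in the format above, for example (i-#, Next, 10, bin, pad), followed by entries for Report.txt and notes.]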

[Figure: how the pieces connect. On disk: boot block, superblock, i-nodes, and data blocks holding directories (home, kc, alex, source, notes, ...) and files (Report.txt, grep, find, ...). In memory: the in-core inodes and the u area, whose current directory field points to an in-core inode. The device driver and hardware control sit between the in-core inodes and the disk.]

In-core inode table


The UNIX system keeps regular files and directories on block
devices such as disk or tape.
Such disk space is called the physical device address space.
The kernel deals with the file system on a logical level
(logical device address space) rather than with disks.
The disk driver translates logical addresses into physical
device addresses.
The in-core (memory-resident) inode table stores
inode information in kernel space.

In-core inode table


An in-core inode contains
A. all the information of the inode on disk
B. the status of the in-core inode:
inode is locked
inode data changed
file data changed
C. the logical device number of the file system
D. the inode number
E. a reference count

File table
The kernel has a global data structure, called the file table,
that stores information about file access.
Each entry in the file table contains:
A. a pointer to an in-core inode table entry
B. the offset of the next read or write in the file
C. the access rights (r/w) allowed to the opening process
D. a reference count

User File Descriptor table


Each process has a user file descriptor table that identifies
all of its open files.
An entry in the user file descriptor table points to an entry
in the kernel's global file table.
Entry 0: standard input
Entry 1: standard output
Entry 2: standard error

System Call: open


open: A process may open an existing file to read or write
syntax:
fd = open(pathname, mode);
A. pathname: the name of the file to be opened
B. mode: read/write
Example

/* openEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>

int main(void)
{
    int fd1, fd2, fd3;
    printf("Before open ...\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./openEx1.c", O_WRONLY);
    fd3 = open("/etc/passwd", O_RDONLY);
    printf("fd1=%d fd2=%d fd3=%d \n", fd1, fd2, fd3);
    return 0;
}

$ cc openEx1.c -o openEx1
$ openEx1
Before open ...
fd1=3 fd2=4 fd3=5
$

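[Figure: state after the three open calls. In the u area, user file descriptor table entries 3, 4, and 5 each point to their own file table entry (CNT=1 R, CNT=1 W, CNT=1 R). The two read entries point to the same in-core inode for /etc/passwd (CNT=2); the write entry points to an in-core inode with CNT=1, labeled ./openEx2.c in the figure.]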

System Call: read


read: A process may read from an open file
syntax:
n = read(fd, buffer, count);
A. fd: file descriptor
B. buffer: where the data read is stored
C. count: the number of bytes to read
n: the number of bytes actually read
Example

/* openEx2.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read */

int main(void)
{
    int fd1;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("=======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd1, buf2, 19);   /* continues where the first read stopped */
    printf("fd1=%d buf2=%s \n", fd1, buf2);
    printf("=======\n");
    return 0;
}

$ cc openEx2.c -o openEx2
$ openEx2
=======
fd1=3 buf1=root:x:0:1:Super-Us
fd1=3 buf2=er:/:/sbin/sh
daemo
=======
$

/* openEx3.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read */

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("/etc/passwd", O_RDONLY);   /* second open: its own offset */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx3.c -o openEx3
$ openEx3
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=root:x:0:1:Super-Us
======
$

[Figure: state after opening /etc/passwd twice. Descriptors 3 and 4 point to two separate file table entries (CNT=1 R each), so each has its own offset; both entries point to the same in-core inode for /etc/passwd (CNT=2).]

System Call: dup


dup: copy a file descriptor into the first free slot of the
user file descriptor table.
syntax:
newfd = dup(fd);
A. fd: file descriptor
Example


/* openEx4.c */
#include <stdio.h>
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* read, dup */

int main(void)
{
    int fd1, fd2;
    char buf1[20], buf2[20];

    buf1[19] = '\0';
    buf2[19] = '\0';
    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = dup(fd1);   /* same file table entry, so the offset is shared */
    read(fd1, buf1, 19);
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    read(fd2, buf2, 19);
    printf("fd2=%d buf2=%s \n", fd2, buf2);
    printf("======\n");
    return 0;
}

$ cc openEx4.c -o openEx4
$ openEx4
======
fd1=3 buf1=root:x:0:1:Super-Us
fd2=4 buf2=er:/:/sbin/sh
daemo
======
$

[Figure: state after open followed by dup. Descriptors 3 and 4 point to the same file table entry (CNT=2 R), so they share one offset; that entry points to the in-core inode for /etc/passwd (CNT=1).]

System Call: creat


creat: A process may create a new file with the creat system
call
syntax:
fd = creat(pathname, mode);
A. pathname: file name
B. mode: access permissions of the new file (for example, 0644)
Example

System Call: close


close: A process may close a file by close system
call
syntax:
close(fd);
A. fd: file descriptor
Example


System Call: write


write: A process may write data to an open file
syntax:
n = write(fd, buffer, count);
A. fd: file descriptor
B. buffer: the data to be written
C. count: the number of bytes to write
n: the number of bytes actually written
Example

/* creatEx1.c */
#include <stdio.h>
#include <string.h>     /* strlen */
#include <sys/types.h>
#include <sys/stat.h>   /* chmod */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
    int fd1;
    char *buf1 = "I am a string\n";
    char *buf2 = "second line\n";

    printf("======\n");
    fd1 = creat("./testCreat.txt", 0644);   /* second argument is the permission mode */
    write(fd1, buf1, strlen(buf1));         /* write only the bytes in the string */
    write(fd1, buf2, strlen(buf2));
    printf("fd1=%d buf1=%s \n", fd1, buf1);
    close(fd1);
    chmod("./testCreat.txt", 0666);
    printf("======\n");
    return 0;
}

$ cc creatEx1.c -o creatEx1
$ creatEx1
======
fd1=3 buf1=I am a string
======
$ ls -l testCreat.txt
-rw-rw-rw- 1 cheng staff 26 May 10 20:37 testCreat.txt
$ more testCreat.txt
...

System Call: stat/fstat


stat/fstat: A process may query the status of a file (locked),
file type, file owner, access permissions, file size, number
of links, inode number, access time.
syntax:
stat(pathname, statbuffer); fstat(fd, statbuffer);
A. pathname: file name
B. statbuffer: buffer that receives the status data
C. fd: file descriptor
Example

/* statEx1.c */
#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(void)
{
    int fd1, fd2;
    struct stat bufStat1, bufStat2;

    printf("======\n");
    fd1 = open("/etc/passwd", O_RDONLY);
    fd2 = open("./statEx1", O_RDONLY);
    fstat(fd1, &bufStat1);
    fstat(fd2, &bufStat2);
    /* the st_* fields have system-dependent integer types, so cast for printf */
    printf("fd1=%d inode no=%ld block size=%ld blocks=%ld\n",
           fd1, (long)bufStat1.st_ino, (long)bufStat1.st_blksize, (long)bufStat1.st_blocks);
    printf("fd2=%d inode no=%ld block size=%ld blocks=%ld\n",
           fd2, (long)bufStat2.st_ino, (long)bufStat2.st_blksize, (long)bufStat2.st_blocks);
    printf("======\n");
    return 0;
}

$ cc statEx1.c -o statEx1
$ statEx1
======
fd1=3 inode no=21954 block size=8192 blocks=6
fd2=4 inode no=190611 block size=8192 blocks=
======
...


System Call: link/unlink


link: create a hard link to an existing file; unlink: remove a link
syntax:
link(sourceFile, targetFile); unlink(file);
A. sourceFile, targetFile, file: file names
Example:
Lab exercise: write a C program that uses the link/unlink
system calls. Use ls -l to see the reference count.


System Call: chdir


chdir: A process may change its current directory
syntax:
chdir(pathname);
A. pathname: directory name
Example

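#include <stdio.h>
#include <stdlib.h>   /* system */
#include <sys/types.h>
#include <fcntl.h>
#include <unistd.h>   /* chdir */

int main(void)
{
    chdir("/usr/bin");
    system("ls -l");   /* lists /usr/bin, the new current directory */
    return 0;
}

$ ls -l /usr/bin
$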

End of
System Kernel Lecture
