
CS2106 Introduction to Operating Systems

Basics

- Operating Systems:
  - Manage resources and coordination (process synchronization, resource sharing)
  - Simplify programming (abstraction of hardware, convenient services)
  - Enforce usage policies
  - Provide security and protection
  - Enable user program portability (across different hardware)
  - Provide efficiency (optimized for particular usage and hardware)
- Kernel mode: complete access to all hardware resources
  User mode: limited access to hardware resources
- Interrupt: Asynchronous (occurs independent of program execution, e.g. timer, mouse movement, keypress). Executes an interrupt handler; program execution is suspended.
- Exception: Synchronous (occurs due to program execution, e.g. arithmetic errors, memory access errors). Executes an exception handler, like a forced syscall.
- Monolithic OS: Kernel is one big special program.
  Microkernel OS: Kernel is very small and clean, and only provides basic and essential facilities (e.g. IPC, address space management, thread management); higher-level services (device drivers, process management, memory management, file system) are built on top of the basic facilities and run outside the kernel (using IPC to communicate). The kernel is more robust, with more isolation and protection between kernel and higher-level services, but at lower performance.
- Type 1 Hypervisor: Runs directly on hardware
  Type 2 Hypervisor: Runs on a host operating system

Process Abstraction in Unix

- pid_t fork(void): Parent returns PID of child; child returns zero
- int execl(const char *path, const char *arg, ...)
  int execv(const char *path, char *const argv[])
  e.g. execl("/bin/ls", "ls", "-la", NULL)
- init process is the root process (traditionally PID=1)
- int wait(int *status): blocks until a child terminates. Set status to NULL to ignore the status; the least significant 8 bits of the status are the value passed to exit(int).
Process Abstraction

- Generic 5-State Process Model (generic states; details vary in actual OSes):
  New --admit--> Ready --switch: scheduled--> Running --exit--> Terminated
  Running --switch: release CPU--> Ready
  Running --event wait--> Blocked --event occurs--> Ready
- Process Control Block (PCB): information about a process: registers, memory region info, PID, process state
- Syscall mechanism:
  1. User program invokes a library call
  2. Library call places the syscall number in a designated location (e.g. a register)
  3. Library call executes a TRAP instruction to switch to kernel mode
  4. The appropriate system call handler is determined using the syscall number as an index (usually handled by a dispatcher)
  5. Syscall handler is executed (this carries out the actual request)
  6. Control is returned to the library call, switching back to user mode
  7. Library call returns to the user program
Process Scheduling

- Non-preemptive (cooperative): a process stays scheduled until it blocks or yields
  Preemptive: at the end of its time quota, the process is suspended (it may still block or yield early)
- Turnaround time: total time taken from arrival to finish (including waiting time)
  Throughput: number of tasks finished per unit time
  CPU utilization: percentage of time the CPU is doing work

Batch Processing (no user interaction; non-preemptive scheduling is predominant)

- First-come first-served (FCFS): FIFO queue based on arrival time (when a task is blocked it is removed; it is placed at the back of the queue when it is ready again). Guaranteed to have no starvation.
- Shortest Job First (SJF): select the task with the smallest CPU time (until the next I/O). Starvation is possible because a long job may never get a chance (when short jobs keep arriving). Prediction of CPU time usually uses an exponential average of history.
- Shortest Remaining Time (SRT): preemptive version of SJF
- Convoy effect: many tasks contend for the CPU (while I/O is idle), and then contend for I/O (while the CPU is idle)

Interactive Environment (preemptive scheduling algorithms are used to ensure good response time for any process/user)

- Response time: time between request and response by the system
  Predictability: less variation in response time
  Time quantum: execution duration given to a process; must be a multiple of the timer interrupt interval
- Round robin: like first-come first-served, but the running task is interrupted when its time quantum elapses
- Priority scheduling: each task gets a priority; the highest priority gets scheduled first.
  Preemptive variant: a new higher-priority process preempts the currently running lower-priority process.
  Non-preemptive variant: a new higher-priority process has to wait for the next round of scheduling.
  A low-priority process can starve.
  Priority inversion: a higher-priority task is forced to block while a lower-priority task gets to run.
- Multi-level feedback queue (MLFQ):
  If Priority(A) > Priority(B) then A runs
  If Priority(A) == Priority(B) then round-robin
  A new job gets the highest priority
  If a job fully utilizes its time slice, its priority is reduced
  If a job yields/blocks, its priority is retained
- Lottery scheduling: lottery tickets are assigned to processes (possibly unevenly, depending on priority), and a randomly chosen winner is allowed to run (preemptive). A parent can distribute tickets to its child processes, and each shared resource (CPU, I/O) can have its own set of tickets.
Threads ("lightweight processes")

- Benefits:
  Much less additional resource needed compared to processes
  No need for an additional mechanism to pass information between threads
  Multithreaded programs can appear much more responsive
  Multithreaded programs can take advantage of multiple CPUs
- Problems:
  Parallel syscalls are possible - correctness has to be guaranteed
  Process behaviour: what should fork()/exec()/exit() do when there are multiple threads?
- User thread: thread implemented as a user library (just library calls); the kernel is not aware of the threads. Implemented by a library, so more flexible (e.g. customized thread scheduling policy); but one blocked thread blocks all threads, and multiple CPUs cannot be exploited.
- Kernel thread: thread implemented in the OS (using syscalls); thread-level scheduling is possible, and multiple threads from the same process can run simultaneously. Thread operations are syscalls: more resource-intensive and slower; less flexible, since it must be generic enough for all multithreaded programs.
- Hybrid thread model: has both kernel and user threads; the OS schedules kernel threads only, and a user thread can bind to a kernel thread. Offers great flexibility: can limit the concurrency of any process.
- pthreads API:
  int pthread_create(pthread_t *thread,
      const pthread_attr_t *attr /*NULL*/,
      void *(*start_routine) (void *) /*function ptr*/,
      void *arg /*argument for start_routine*/);
  int pthread_exit(void *retval);
  int pthread_join(pthread_t thread, void **retval);
Inter-Process Communication

Shared memory

- Advantages: efficient (only the initial setup involves the OS); easy to use (the shared memory region behaves like normal memory)
- Disadvantages: requires synchronization (of access); implementation is usually harder
- API:
  int shmget(key_t key /*can be IPC_PRIVATE*/,
      size_t size, int shmflg /* IPC_CREAT | 0600 */);
  // IPC_CREAT means the memory region is created if nonexistent
  void *shmat(int shmid, const void *shmaddr /*NULL*/,
      int shmflg /*0*/);
  int shmdt(const void *shmaddr);
  int shmctl(int shmid, int cmd /*IPC_RMID*/,
      struct shmid_ds *buf /*unused for IPC_RMID*/);

Message passing

- Messages are stored in kernel memory space
- Direct communication: sender/receiver explicitly names the other party; one buffer per (sender, receiver) pair
  Indirect communication: sender sends to a mailbox/port; receiver receives from the mailbox/port
- Blocking primitives (synchronous): Send() blocks until the message is received; Receive() blocks until a message has arrived
  Non-blocking primitives (asynchronous): Send() does not block; Receive() returns some indication if no message is available
- Advantages: portable (can be implemented in a distributed system or network); easier synchronization (blocking primitives implicitly synchronize sender and receiver)
- Disadvantages: inefficient (needs OS intervention); harder to use (messages are limited in size/format)

Unix Pipes

Pipes function as a fixed-size circular byte buffer with implicit synchronization: writers wait when the buffer is full; readers wait when the buffer is empty.

int pipe(int pipefd[2]); // create new pipe
pipefd[0]: file descriptor for reading
pipefd[1]: file descriptor for writing

Unix Signals

typedef void (*sighandler_t)(int);
sighandler_t signal(int signum, sighandler_t handler);
// returns previous signal handler, or SIG_ERR on error
Synchronization

Critical Sections

- Properties of a correct implementation:
  Mutual exclusion: if there is a process in the CS, all other processes cannot enter the CS
  Progress: if no process is in the CS, one waiting process should be granted access
  Bounded wait: after a process requests to enter the CS, there exists an upper bound on the number of times other processes can enter the CS before this process
  Independence: a process not in the CS should never block other processes
- Symptoms of incorrect synchronization:
  Deadlock: all processes blocked
  Livelock: processes are not blocked, but they keep changing state to avoid deadlock and make no other progress
  Starvation: some processes are blocked forever
- Test and Set: TestAndSet Reg, Mem
  Atomically loads the current content at Mem into Reg, and stores 1 into Mem
- Peterson's Algorithm (shared: Turn = 0; Want[0] = 0; Want[1] = 0):

  Process P0:               Process P1:
  Want[0] = 1;              Want[1] = 1;
  Turn = 1;                 Turn = 0;
  while (Want[1] &&         while (Want[0] &&
         Turn == 1);               Turn == 0);
  Critical Section          Critical Section
  Want[0] = 0;              Want[1] = 0;

  Assumption: writing to Turn is an atomic operation
  Disadvantages: busy waiting, low level, not general
Semaphores

- Wait()/P()/Down() and Signal()/V()/Up()
- Threads queue up on a semaphore (fair scheduling)
- General semaphore: value can be any non-negative integer
  Binary semaphore: value can be only 0 or 1 (undefined behaviour to Signal() on a binary semaphore which is currently 1)
- Producer-Consumer (blocking version), with a buffer of K slots:

  Initial values: count = in = out = 0; mutex = S(1), notFull = S(K), notEmpty = S(0)

  Producer process:          Consumer process:
  while (TRUE) {             while (TRUE) {
      Produce Item;              wait( notEmpty );
      wait( notFull );           wait( mutex );
      wait( mutex );             item = buffer[out];
      buffer[in] = item;         out = (out+1) % K;
      in = (in+1) % K;           count--;
      count++;                   signal( mutex );
      signal( mutex );           signal( notFull );
      signal( notEmpty );        Consume Item;
  }                          }
Memory Management

- Memory regions: Text (instructions), Data (global variables), Heap (dynamic allocations), Stack (function invocations)
- Transient data: variables with automatic storage duration
  Persistent data: globals, dynamically allocated memory
- Alternatives for memory abstraction:
  Address relocation: translate all addresses at load time
  Base + Limit registers: generate instructions to add Base to all memory references at compile time, and check against Limit for validity
- Memory partitioning: every process gets a contiguous memory region
  Fixed partitioning: physical memory is split into a fixed number of partitions; each process occupies exactly one partition. Internal fragmentation occurs when a process does not need the whole partition.
  Dynamic partitioning: a partition is created based on the actual size of the process; the OS keeps track of memory regions, splitting/merging free regions when necessary. External fragmentation: unused "holes" in physical memory due to process creation/termination.

Dynamic Allocation Algorithms

- Linear search based:
  First-Fit: take the first hole that is large enough
  Best-Fit: take the smallest hole that is large enough
  Worst-Fit: take the largest hole
- Merging & Compaction: when a partition is freed, try merging with adjacent holes; occupied partitions can be moved around to consolidate holes
- Buddy system:
  - Each element A[J] is a linked list that keeps track of free blocks of size 2^J
  - Each free block is indicated just by its starting address, so blocks can grow/shrink freely
  - There might be a smallest allocatable block size, i.e. a constant K > 0 such that A[J] exists only when J >= K
  - To allocate size 2^S: find the smallest free block with size >= 2^S, then repeatedly split until there is a block of size 2^S, and return that block
  - To deallocate: if the buddy is also free, merge with the buddy and repeat; otherwise add the block to the linked list
Paging

- (Physical) frames and (logical) pages have the same size; logical memory remains contiguous but physical memory may be disjoint
- Two important design decisions simplify the address translation calculation:
  1. Keep the frame size (= page size) a power of 2
  2. Physical frame size == logical page size
- Address translation: an m-bit logical address is split into a page number p (upper m - n bits) and an offset d (lower n bits); the page table maps p to a frame number f, and the physical address is f concatenated with d
- Fragmentation: paging removes external fragmentation (all free frames can be used without wastage), but pages can still have internal fragmentation (logical memory required may not be a multiple of the page size)
- Page table: stores the physical frame for each logical page
- Extensions for protection:
  Access-right (RWX) bits: memory access is checked against the access-right bits (by hardware)
  Valid bit: represents invalid logical addresses; invalid access (e.g. through a NULL pointer) will be caught by the OS
- Page sharing: several processes use the same physical frame, e.g. shared libraries, copy-on-write from fork()
- Translation look-aside buffer (TLB): cache of a few page table entries
- Memory access time with TLB = P(hit) x (T_TLB + T_mem) + P(miss) x (T_TLB + T_mem + T_mem),
  e.g. 40% x (1ns + 50ns) + 60% x (1ns + 50ns + 50ns)
- Context switching & TLB: on a context switch, TLB entries are flushed (so the incoming process won't get incorrect translations); optionally, save the TLB state to the PCB and restore the TLB data for the incoming process (to reduce TLB misses)
- (x86) On a TLB miss, the hardware searches through the page table (without invoking the OS); the OS is informed only on a page fault

Segmentation

- Each region of memory is placed in a separate segment, so regions can grow/shrink freely
- Each memory segment has a segment id and a limit; memory references are specified as segment id + offset; the hardware adds the segment base (from the segment table) and checks the offset against the limit, raising an addressing error if it exceeds the limit
- Can cause external fragmentation
- Segmentation with paging: a logical address <S, P, D> selects segment S, whose segment table entry points to a page table; page P is translated to frame F, giving physical address <F, D>

Page Table Structure

- Direct paging: all pages in a single table, which might itself occupy several memory pages
- 2-level paging: keep a page directory whose entries point to page tables; page tables for unused regions of the address space need not be allocated
- Inverted page table: a single table for all processes, storing (pid, logical page), indexed by frame number

Secondary Storage (With Paging)

- Some pages can be stored on secondary storage, so that a process can use more logical memory than what is physically available
- The page table stores a memory resident bit:
  memory resident: page in physical memory (RAM)
  non-memory resident: page in secondary storage
- Page fault: when the CPU tries to access a non-memory-resident page, the OS locates the page in secondary storage and loads it into physical memory
- Thrashing: page faults happen too often; for well-behaved programs it is unlikely to happen, due to temporal and spatial locality

Page Replacement Algorithms

- Optimum (OPT): replace the page that will not be used again for the longest period of time; not feasible as it needs future knowledge
- First In First Out (FIFO): evict the oldest page first
  - simple to implement: the OS maintains a queue of resident page numbers
  - does not exploit temporal locality
  - can be tricked: try 3 vs 4 frames with the reference string 1 2 3 4 1 2 5 1 2 3 4 5 (Belady's Anomaly)
- Least Recently Used (LRU): evict the page that has not been used for the longest time
  - makes use of temporal locality
  - does not suffer from Belady's Anomaly
  - difficult to implement; needs hardware support:
    (option 1) store a "time-of-use" and update it on every access; need to search through all pages to find the earliest time-of-use
    (option 2) maintain a "stack": when a page is accessed, remove it from the stack (if present) and push it on top
- Second Chance Page Replacement (CLOCK):
  Modified FIFO that gives a second chance to pages that are accessed. Maintain a circular queue of page numbers, with a pointer to the oldest (victim) page; each page table entry has a "reference bit" (1 = accessed, 0 = not accessed).
  Algorithm:
  1. The oldest FIFO page is selected
  2. If reference bit == 0, the page is replaced
  3. If reference bit == 1, the page is given a second chance: the reference bit is cleared to 0 and the arrival time is reset (the page is taken as newly loaded); the next FIFO page is selected, go to step 2
  Degenerates into the FIFO algorithm when all pages have reference bit == 1.
  Implementation: advance the pointer until a page with a 0 reference bit is found, clearing reference bits as the pointer passes through.

Frame Allocation

- Simple approaches:
  Equal allocation: each process gets the same number of frames
  Proportional allocation: each process gets a number of frames proportional to its memory usage
- Local replacement: the evicted page is selected from the same process; thrashing is limited to a single process
  Global replacement: the evicted page can be from any process; can cause thrashing in other processes
- Working Set Model:
  - The set of pages referenced by a process is relatively constant in a period of time ("locality"); page faults occur mainly when changing to a new locality
  - Use a working set window Δ (an interval of memory references): W(t, Δ) is the set of pages referenced in the window of Δ references around time t, and |W(t, Δ)| frames are needed
  - e.g. for the reference string ... 2 6 5 7 1 1 2 7 5 6 3 4 4 4 3 5 3 ... with Δ = 5 references: W(t1, Δ) = {1, 2, 5, 6, 7} (5 frames needed), W(t2, Δ) = {3, 4} (2 frames needed)
  - Try using different Δ values
File System Management

- Access types: Read, Write, Execute, Append, Delete, List (retrieve metadata of the file)
- Access control list (ACL): list of user identities and allowed access types (very customizable, but uses a large amount of space)
- Permission bits: Owner/Group/Universe, Read/Write/Execute, e.g. rwxr--r--
- In Unix, a minimal ACL = permission bits; an extended ACL adds named users/groups
- File structure:
  Array of bytes (usual)
  Array of fixed-length records (can jump to any record easily)
  Array of variable-length records (flexible, but hard to find a record)
- Access methods:
  Sequential access: have to read in order and rewind to restart
  Random access: read in any order, exposed either way:
  1. Read(Offset): every read explicitly states the position
  2. Seek(Offset): special operation to move to a new location
  Direct access: like random access, but for fixed-length records (e.g. in a database)
- Generic operations on file data: Create, Open, Read, Write, Reposition/Seek, Truncate
- Information kept for an opened file: file pointer (current location in the file), disk location, open count (number of processes that have this file opened)
- Open file table (Unix): processes make file system calls, usually via a file descriptor; each open file table entry keeps the operation type, file offset, and a reference to the file data
  - Two file descriptors (i.e. open file table entries) can point to the same file (e.g. two processes open the same file, or the same process opens the file twice)
  - Two processes can share the same file descriptor (i.e. open file table entry), e.g. after fork()
- Links in the directory structure:
  Hard link (limited to files only): reference counted; creates a DAG
  Symbolic link (can be a file or directory): uses a special link file; can create a general graph
File System Implementations

- Generic disk organization: MBR and partition table, followed by partitions; each partition contains the boot code, a superblock (partition details), free-space information, the directory structure (information for all files), and file data (data for all files)
- Block allocation:
  Contiguous: allocate consecutive disk blocks to a file; simple to keep track of, fast access (no need to seek), but suffers external fragmentation
  Linked list: each disk block also stores the next block number; no external fragmentation, but slow random access to the file, and part of each block is used for the pointer
  File allocation table (FAT): the next-block numbers are stored in a single table that is always kept in memory; faster random access, but the FAT keeps track of all disk blocks (takes up memory space)
  Indexed allocation: each file has an index block (storing the list of blocks containing the file); less memory overhead and fast direct access, but limited maximum file size and index block overhead
  Indexed allocation with linked list: the index block contains a pointer to the next index block (of the same file); no file size limit
  Multi-level index: like multi-level paging; very large file size limit
  Combination: e.g. the Unix i-node
- Free space management:
  Bitmap: each disk block represented by 1 bit, e.g. 1 = free, 0 = occupied
  Linked list: use an unrolled linked list of disk blocks; store the free list in free disk blocks
- Directory implementation: each file/subdirectory is represented as a directory entry; a subdirectory is usually stored as a file entry with a special type in a directory
  Linear list: requires linear search; usually the last few searches are cached
  Hash table: the file name is hashed; fast lookup, but the hash table has limited size and depends on a good hash function
Disk Scheduling

- Magnetic disk access involves a seek (changing track) and rotation (changing sector); the head reads one sector at a time
- First-Come-First-Serve (FCFS): service requests in arrival order
- Shortest Seek First (SSF): closest track first
- SCAN (elevator): sweep the head across the tracks, servicing requests along the way; C-SCAN services in one direction only (e.g. outside to inside)
- LOOK (real elevator): like SCAN, but reverses direction at the last pending request instead of the disk edge
Microsoft FAT File System

- Layout: MBR, then partitions; each FAT partition holds the boot sector, the FAT, an optional duplicate FAT, the root directory, and the data blocks
- The file allocation table contains one entry per disk block, holding one of:
  FREE, <block number> of the next block, EOF, or BAD
- Directory entry (32 bytes): file name (8) + extension (3), attributes (1), reserved (10), creation time + date (2 + 2), first disk block (2), file size in bytes (4)
- File deletion: set the first letter of the filename to 0xE5
- Free space: must be calculated by going through the FAT
- Clusters: (for newer FATs) a group of disk blocks is the smallest allocation unit
- Virtual FAT (VFAT): use multiple directory entries to store a long file name
Ext2

- Layout: MBR, then partitions; each Ext2 partition consists of a boot block followed by block groups (Block Group 0, 1, 2, ...)
- Each block group contains: superblock, group descriptors, block bitmap, i-node bitmap, i-node table, data blocks
- The superblock and group descriptors are duplicated in each block group
- Block and i-node bitmaps: 1 = occupied, 0 = free
- I-node structure (128 bytes):
  Mode (2 bytes): file type (regular, directory, special, etc.) + file permissions
  Owner info (4 bytes): user id (2) + group id (2)
  File size (4/8 bytes): file size in bytes; larger (8 bytes) for regular files
  Timestamps (3 x 4 bytes): creation, modification & deletion timestamps
  Data block pointers (15 x 4 bytes): indices of data blocks; 12 direct, 1 single indirect, 1 double indirect, 1 triple indirect
  Reference count (2 bytes): number of times this i-node is referenced by directory entries
  ... other fields ...
- Directory entry fields: i-node number, entry size, name length, entry type, name
  - the entry size includes all subfields and any possible gap to the next entry
  - an i-node number of 0 indicates an unused entry
  - the root directory has a fixed i-node number
  e.g. {91, F, 4, "Hi.c"} {39, F, 8, "lab5.pdf"} {74, D, 3, "sub"}
ˆ Deleting a file:
- remove its directory entry from the parent directory by
adjusting previous size to point to next entry
- update I-node bitmap by marking file’s I-node as free
- update block bitmap by marking file’s blocks as free

ˆ Hard link: multiple directory entries point to same I-node

ˆ Sym. link: file (not I-node) content is path of target file


- can become invalid if target is deleted

Journaling

Write information about the operation (or the actual data) to a separate log before performing the file operation, so that the file system can recover from a system crash.