
Assembly Programmer's View

[Figure: CPU (EIP, registers, condition codes) exchanging addresses, data, and instructions with Memory (object code, program data, OS data, stack)]

Programmer-Visible State
 EIP (Program Counter)
 Address of next instruction
 Register File
 Heavily used program data
 Condition Codes
 Store status information about most recent arithmetic operation
 Used for conditional branching
 Memory
 Byte addressable array
 Code, user data, (most) OS data
 Includes stack used to support procedures
Turning C into Object Code
 Code in files p1.c p2.c
 Compile with command: gcc -O p1.c p2.c -o p
Use optimizations (-O)
Put resulting binary in file p

text    C program (p1.c p2.c)
            | Compiler (gcc -S)
text    Asm program (p1.s p2.s)
            | Assembler (gcc or as)
binary  Object program (p1.o p2.o)   +   Static libraries (.a)
            | Linker (gcc or ld)
binary  Executable program (p)

Compiling Into Assembly

C Code:

int sum(int x, int y)
{
  int t = x+y;
  return t;
}

Generated Assembly:

_sum:
  pushl %ebp
  movl %esp,%ebp
  movl 12(%ebp),%eax
  addl 8(%ebp),%eax
  movl %ebp,%esp
  popl %ebp
  ret

Obtain with command
  gcc -O -S code.c
Produces file code.s
Assembly Characteristics
Minimal data types
 Integer data of 1, 2, or 4 bytes
 Data values
 Addresses (untyped pointers)
 Floating-point data of 4, 8, or 10 bytes
 No aggregate types such as arrays or structures
 Just contiguously allocated bytes in memory

Primitive operations
 Perform arithmetic function on register or memory data
 Transfer data between memory and register
 Load data from memory into register
 Store register data into memory
 Transfer control
 Unconditional jumps to/from procedures
 Conditional branches
Object Code

Code for sum:
0x401040 <sum>:
  0x55 0x89 0xe5 0x8b 0x45 0x0c 0x03
  0x45 0x08 0x89 0xec 0x5d 0xc3
 Total of 13 bytes
 Each instruction 1, 2, or 3 bytes
 Starts at address 0x401040

Assembler
 Translates .s into .o
 Binary encoding of each instruction
 Nearly-complete image of executable code
 Missing linkages between code in different files

Linker
 Resolves references between files
 Combines with static run-time libraries
 E.g., code for malloc, printf
 Some libraries are dynamically linked
 Linking occurs when program begins execution
Machine Instruction Example

C Code: int t = x+y;
 Add two signed integers

Assembly: addl 8(%ebp),%eax
 Add 2 4-byte integers
 "Long" words in GCC parlance
 Same instruction whether signed or unsigned
 Similar to C expression y += x
 Operands:
   y: Register %eax
   x: Memory M[%ebp+8]
   t: Register %eax
     » Return function value in %eax

Object Code: 0x401046: 03 45 08
 3-byte instruction
 Stored at address 0x401046
Disassembling Object Code
Disassembled
00401040 <_sum>:
0: 55 push %ebp
1: 89 e5 mov %esp,%ebp
3: 8b 45 0c mov 0xc(%ebp),%eax
6: 03 45 08 add 0x8(%ebp),%eax
9: 89 ec mov %ebp,%esp
b: 5d pop %ebp
c: c3 ret
d: 8d 76 00 lea 0x0(%esi),%esi

Disassembler
objdump -d p
 Useful tool for examining object code
 Analyzes bit pattern of series of instructions
 Produces approximate rendition of assembly code
 Can be run on either a.out (complete executable) or .o file
Alternate Disassembly

Object (bytes at 0x401040): 55 89 e5 8b 45 0c 03 45 08 89 ec 5d c3

Disassembled:
0x401040 <sum>:     push %ebp
0x401041 <sum+1>:   mov %esp,%ebp
0x401043 <sum+3>:   mov 0xc(%ebp),%eax
0x401046 <sum+6>:   add 0x8(%ebp),%eax
0x401049 <sum+9>:   mov %ebp,%esp
0x40104b <sum+11>:  pop %ebp
0x40104c <sum+12>:  ret
0x40104d <sum+13>:  lea 0x0(%esi),%esi

Within gdb Debugger
gdb p
disassemble sum
 Disassemble procedure
x/13b sum
 Examine the 13 bytes starting at sum
What Can Be Disassembled?
% objdump -d WINWORD.EXE

WINWORD.EXE: file format pei-i386

No symbols in "WINWORD.EXE".
Disassembly of section .text:

30001000 <.text>:
30001000: 55 push %ebp
30001001: 8b ec mov %esp,%ebp
30001003: 6a ff push $0xffffffff
30001005: 68 90 10 00 30 push $0x30001090
3000100a: 68 91 dc 4c 30 push $0x304cdc91

 Anything that can be interpreted as executable code
 Disassembler examines bytes and reconstructs assembly source
Moving Data

movl Source,Dest:
 Move 4-byte ("long") word
 Lots of these in typical code

Integer registers: %eax, %edx, %ecx, %ebx, %esi, %edi, %esp, %ebp

Operand Types
 Immediate: Constant integer data
 Like C constant, but prefixed with '$'
 E.g., $0x400, $-533
 Encoded with 1, 2, or 4 bytes
 Register: One of 8 integer registers
 But %esp and %ebp reserved for special use
 Others have special uses for particular instructions
 Memory: 4 consecutive bytes of memory
 Various "address modes"
movl Operand Combinations

Source  Destination  Example                C Analog
Imm     Reg          movl $0x4,%eax         temp = 0x4;
Imm     Mem          movl $-147,(%eax)      *p = -147;
Reg     Reg          movl %eax,%edx         temp2 = temp1;
Reg     Mem          movl %eax,(%edx)       *p = temp;
Mem     Reg          movl (%eax),%edx       temp = *p;

 Cannot do memory-memory transfers with single instruction
Simple Addressing Modes

Normal: (R)  Mem[Reg[R]]
 Register R specifies memory address
 movl (%ecx),%eax

Displacement: D(R)  Mem[Reg[R]+D]
 Register R specifies start of memory region
 Constant displacement D specifies offset
 movl 8(%ebp),%edx
Using Simple Addressing Modes

void swap(int *xp, int *yp)
{
  int t0 = *xp;
  int t1 = *yp;
  *xp = t1;
  *yp = t0;
}

swap:
  pushl %ebp            # Set
  movl %esp,%ebp        # Up
  pushl %ebx

  movl 12(%ebp),%ecx
  movl 8(%ebp),%edx
  movl (%ecx),%eax      # Body
  movl (%edx),%ebx
  movl %eax,(%edx)
  movl %ebx,(%ecx)

  movl -4(%ebp),%ebx
  movl %ebp,%esp        # Finish
  popl %ebp
  ret
Understanding Swap

Stack layout (offset relative to %ebp):
 12  yp
  8  xp
  4  Rtn adr
  0  Old %ebp   <- %ebp
 -4  Old %ebx

Register  Variable
%ecx      yp
%edx      xp
%eax      t1
%ebx      t0

movl 12(%ebp),%ecx  # ecx = yp
movl 8(%ebp),%edx   # edx = xp
movl (%ecx),%eax    # eax = *yp (t1)
movl (%edx),%ebx    # ebx = *xp (t0)
movl %eax,(%edx)    # *xp = eax
movl %ebx,(%ecx)    # *yp = ebx
Understanding Swap: Execution Trace

[Sequence of animation slides; initial state: %ebp = 0x104, xp (at offset 8) = 0x124, yp (at offset 12) = 0x120, Mem[0x124] = 123, Mem[0x120] = 456]

Instruction                        Effect
movl 12(%ebp),%ecx  # ecx = yp     %ecx = 0x120
movl 8(%ebp),%edx   # edx = xp     %edx = 0x124
movl (%ecx),%eax    # eax = *yp    %eax = 456 (t1)
movl (%edx),%ebx    # ebx = *xp    %ebx = 123 (t0)
movl %eax,(%edx)    # *xp = eax    Mem[0x124] = 456
movl %ebx,(%ecx)    # *yp = ebx    Mem[0x120] = 123
Indexed Addressing Modes
Most General Form
D(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]+ D]
 D: Constant “displacement” 1, 2, or 4 bytes
 Rb: Base register: Any of 8 integer registers
 Ri: Index register: Any, except for %esp
Unlikely you’d use %ebp, either
 S: Scale: 1, 2, 4, or 8
Special Cases
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
Address Computation Examples

%edx = 0xf000, %ecx = 0x100

Expression       Computation        Address
0x8(%edx)        0xf000 + 0x8       0xf008
(%edx,%ecx)      0xf000 + 0x100     0xf100
(%edx,%ecx,4)    0xf000 + 4*0x100   0xf400
0x80(,%edx,2)    2*0xf000 + 0x80    0x1e080
Address Computation Instruction
leal Src,Dest
 Src is address mode expression
 Set Dest to address denoted by expression

Uses
 Computing address without doing memory reference
 E.g., translation of p = &x[i];
 Computing arithmetic expressions of the form x + k*y
 k = 1, 2, 4, or 8.

LEARN THIS INSTRUCTION!!!
 Used heavily by compiler
 Appears regularly on exams
Some Arithmetic Operations
Format Computation
Two Operand Instructions
addl Src,Dest Dest = Dest + Src
subl Src,Dest Dest = Dest - Src
imull Src,Dest Dest = Dest * Src
sall k,Dest Dest = Dest << k Also called shll
sarl k,Dest Dest = Dest >> k Arithmetic
shrl k,Dest Dest = Dest >> k Logical
k is an immediate value or contents of %cl
xorl Src,Dest Dest = Dest ^ Src
andl Src,Dest Dest = Dest & Src
orl Src,Dest Dest = Dest | Src
Some Arithmetic Operations
Format Computation
One Operand Instructions
incl Dest Dest = Dest + 1
decl Dest Dest = Dest - 1
negl Dest Dest = -Dest
notl Dest Dest = ~Dest
Using leal for Arithmetic Expressions

int arith
(int x, int y, int z)
{
  int t1 = x+y;
  int t2 = z+t1;
  int t3 = x+4;
  int t4 = y * 48;
  int t5 = t3 + t4;
  int rval = t2 * t5;
  return rval;
}

arith:
  pushl %ebp              # Set
  movl %esp,%ebp          # Up

  movl 8(%ebp),%eax
  movl 12(%ebp),%edx
  leal (%edx,%eax),%ecx
  leal (%edx,%edx,2),%edx # Body
  sall $4,%edx
  addl 16(%ebp),%ecx
  leal 4(%edx,%eax),%eax
  imull %ecx,%eax

  movl %ebp,%esp
  popl %ebp               # Finish
  ret
Understanding arith

(C code as above)

Stack layout (offset relative to %ebp):
 16  z
 12  y
  8  x
  4  Rtn adr
  0  Old %ebp   <- %ebp

movl 8(%ebp),%eax        # eax = x
movl 12(%ebp),%edx       # edx = y
leal (%edx,%eax),%ecx    # ecx = x+y (t1)
leal (%edx,%edx,2),%edx  # edx = 3*y
sall $4,%edx             # edx = 48*y (t4)
addl 16(%ebp),%ecx       # ecx = z+t1 (t2)
leal 4(%edx,%eax),%eax   # eax = 4+t4+x (t5)
imull %ecx,%eax          # eax = t5*t2 (rval)
Another Example

int logical(int x, int y)
{
  int t1 = x^y;
  int t2 = t1 >> 17;
  int mask = (1<<13) - 7;
  int rval = t2 & mask;
  return rval;
}

logical:
  pushl %ebp           # Set
  movl %esp,%ebp       # Up

  movl 8(%ebp),%eax
  xorl 12(%ebp),%eax
  sarl $17,%eax        # Body
  andl $8185,%eax

  movl %ebp,%esp
  popl %ebp            # Finish
  ret

2^13 = 8192, 2^13 - 7 = 8185

movl 8(%ebp),%eax    # eax = x
xorl 12(%ebp),%eax   # eax = x^y (t1)
sarl $17,%eax        # eax = t1>>17 (t2)
andl $8185,%eax      # eax = t2 & 8185
CISC Properties
 Instruction can reference different operand types
 Immediate, register, memory
 Arithmetic operations can read/write memory
 Memory reference can involve complex computation
 Rb + S*Ri + D
 Useful for arithmetic expressions, too
 Instructions can have varying lengths
 IA32 instructions can range from 1 to 15 bytes
Summary: Abstract Machines

Machine Models

C (mem, proc):
 Data: 1) char  2) int, float  3) double  4) struct, array  5) pointer
 Control: 1) loops  2) conditionals  3) switch  4) Proc. call  5) Proc. return

Assembly (mem, regs, alu, condition codes, stack, processor):
 Data: 1) byte  2) 2-byte word  3) 4-byte long word  4) contiguous byte allocation  5) address of initial byte
 Control: 3) branch/jump  4) call  5) ret
Whose Assembler?

Intel/Microsoft Format              GAS/Gnu Format
lea eax,[ecx+ecx*2]                 leal (%ecx,%ecx,2),%eax
sub esp,8                           subl $8,%esp
cmp dword ptr [ebp-8],0             cmpl $0,-8(%ebp)
mov eax,dword ptr [eax*4+100h]      movl 0x100(,%eax,4),%eax

Intel/Microsoft Differs from GAS
 Operands listed in opposite order
   mov Dest, Src          movl Src, Dest
 Constants not preceded by '$'; denote hex with 'h' at end
   100h                   $0x100
 Operand size indicated by operands rather than operator suffix
   sub                    subl
 Addressing format shows effective address computation
   [eax*4+100h]           0x100(,%eax,4)
19CS2106R Operating Systems Design
Session 39: Deadlocks
Common Concurrency Problems
More recent work focuses on studying other types of common concurrency bugs.

• Take a brief look at some example concurrency problems found in real code bases.
• Focus on four major open-source applications
– MySQL, Apache, Mozilla, OpenOffice.

Application   What it does      Non-Deadlock   Deadlock
MySQL         Database Server   14             9
Apache        Web Server        13             4
Mozilla       Web Browser       41             16
OpenOffice    Office Suite      6              2
Total                           74             31

Bugs In Modern Applications

Non-deadlock bugs make up a majority of concurrency bugs.
Two major types of non-deadlock bugs:
• Atomicity violation
• Order violation
Atomicity-Violation Bugs
• The desired serializability among multiple memory
accesses is violated.
– Simple Example found in MySQL:
• Two different threads access the field proc_info in the struct
thd.

Thread1::
if (thd->proc_info) {
  …
  fputs(thd->proc_info, …);
  …
}

Thread2::
thd->proc_info = NULL;
Atomicity-Violation Bugs (Cont.)
• Solution: Simply add locks around the shared-
variable references.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

Thread1::
pthread_mutex_lock(&lock);
if (thd->proc_info) {
  …
  fputs(thd->proc_info, …);
  …
}
pthread_mutex_unlock(&lock);

Thread2::
pthread_mutex_lock(&lock);
thd->proc_info = NULL;
pthread_mutex_unlock(&lock);
Order-Violation Bugs
• The desired order between two memory accesses
is flipped.
– i.e., A should always be executed before B, but the
order is not enforced during execution.
– Example:
• The code in Thread2 seems to assume that the variable
mThread has already been initialized (and is not NULL).
Thread1::
void init() {
  mThread = PR_CreateThread(mMain, …);
}

Thread2::
void mMain(…) {
  mState = mThread->State;
}
Order-Violation Bugs (Cont.)
• Solution: Enforce ordering using condition
variables
pthread_mutex_t mtLock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t mtCond = PTHREAD_COND_INITIALIZER;
int mtInit = 0;

Thread 1::
void init() {
  …
  mThread = PR_CreateThread(mMain, …);

  // signal that the thread has been created.
  pthread_mutex_lock(&mtLock);
  mtInit = 1;
  pthread_cond_signal(&mtCond);
  pthread_mutex_unlock(&mtLock);
  …
}

Thread2::
void mMain(…) {
  …
Order-Violation Bugs (Cont.)
  // wait for the thread to be initialized …
  pthread_mutex_lock(&mtLock);
  while (mtInit == 0)
    pthread_cond_wait(&mtCond, &mtLock);
  pthread_mutex_unlock(&mtLock);

  mState = mThread->State;
  …
}
Deadlock Bugs
Thread 1:    Thread 2:
lock(L1);    lock(L2);
lock(L2);    lock(L1);

– The presence of a cycle
• Thread1 holds lock L1 and is waiting for another one, L2.
• Thread2 holds lock L2 and is waiting for L1 to be released.

[Figure: Thread 1 holds Lock L1 and wants Lock L2; Thread 2 holds Lock L2 and wants Lock L1 — a cycle]
Conditions for Deadlock
• Four conditions need to hold for a deadlock to occur.

Condition          Description
Mutual Exclusion   Threads claim exclusive control of resources that they require.
Hold-and-wait      Threads hold resources allocated to them while waiting for additional resources.
No preemption      Resources cannot be forcibly removed from threads that are holding them.
Circular wait      There exists a circular chain of threads such that each thread holds one or more resources that are being requested by the next thread in the chain.

– If any of these four conditions is not met, deadlock cannot occur.
Prevention – Circular Wait
• Provide a total ordering on lock acquisition
– This approach requires careful design of global locking
strategies.
• Example:
– There are two locks in the system (L1 and L2)
– We can prevent deadlock by always acquiring L1
before L2.
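A minimal sketch of one such global strategy in C — ordering locks by address, so every thread agrees on a single total order. The helper name do_both() and the use of pthread mutexes are illustrative assumptions, not from the slides:

#include <pthread.h>

// Hypothetical helper: takes two locks in a globally consistent order.
void do_both(pthread_mutex_t *m1, pthread_mutex_t *m2) {
    // Acquire the lock at the lower address first; since all threads
    // use the same rule, no circular wait can form.
    if (m1 > m2) { pthread_mutex_t *tmp = m1; m1 = m2; m2 = tmp; }
    pthread_mutex_lock(m1);
    pthread_mutex_lock(m2);
    /* ... critical section using both resources ... */
    pthread_mutex_unlock(m2);
    pthread_mutex_unlock(m1);
}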
Prevention – Hold-and-wait
• Acquire all locks at once, atomically.

lock(prevention);
lock(L1);
lock(L2);
…
unlock(prevention);

– This code guarantees that no untimely thread switch can occur in the midst of lock acquisition.
– Problem:
• Requires us to know, when calling a routine, exactly which locks must be held, and to acquire them ahead of time.
• Decreases concurrency.
Prevention – No Preemption
• Multiple lock acquisition often gets us into trouble because when
waiting for one lock we are holding another.
• trylock()
– Used to build a deadlock-free, ordering-robust lock acquisition protocol.
– Grab the lock (if it is available).
– Or, return -1: you should try again later.

top:
  lock(L1);
  if (tryLock(L2) == -1) {
    unlock(L1);
    goto top;
  }
Prevention – No Preemption (Cont.)
• Livelock
– Both threads run through this code sequence over and over again.
– Progress is not being made.
– Solution:
• Add a random delay before looping back and trying the entire thing over again (see the sketch below).
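A minimal sketch of that idea with POSIX calls (the lock names and the delay range are illustrative assumptions):

#include <pthread.h>
#include <stdlib.h>
#include <unistd.h>

pthread_mutex_t L1 = PTHREAD_MUTEX_INITIALIZER;
pthread_mutex_t L2 = PTHREAD_MUTEX_INITIALIZER;

void acquire_both(void) {
    for (;;) {
        pthread_mutex_lock(&L1);
        if (pthread_mutex_trylock(&L2) == 0)
            return;                    // got both locks
        pthread_mutex_unlock(&L1);     // give up what we hold before retrying
        usleep(rand() % 1000);         // random delay so threads don't retry in lockstep
    }
}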
Prevention – Mutual Exclusion
• wait-free
– Using powerful hardware instruction.
– You can build data structures in a manner that does
not require explicit locking.
int CompareAndSwap(int *address, int expected, int new) {
  if (*address == expected) {
    *address = new;
    return 1; // success
  }
  return 0; // failure
}
Prevention – Mutual Exclusion (Cont.)
• We now wanted to atomically increment a value
by a certain amount:
void AtomicIncrement(int *value, int amount) {
  int old;
  do {
    old = *value;
  } while (CompareAndSwap(value, old, old + amount) == 0);
}

– Repeatedly tries to update the value to the new


amount and uses the compare-and-swap to do so.

– No lock is acquired
– No deadlock can arise
– livelock is still a possibility.
Prevention – Mutual Exclusion (Cont.)
• More complex example: list insertion
void insert(int value) {
  node_t *n = malloc(sizeof(node_t));
  assert(n != NULL);
  n->value = value;
  n->next = head;
  head = n;
}

– If called by multiple threads at the “same time”, this


code has a race condition.
Prevention – Mutual Exclusion (Cont.)
• Solution:
– Surrounding this code with a lock acquire and release.
void insert(int value) {
  node_t *n = malloc(sizeof(node_t));
  assert(n != NULL);
  n->value = value;
  lock(listlock);   // begin critical section
  n->next = head;
  head = n;
  unlock(listlock); // end critical section
}

– wait-free manner using the compare-and-swap instruction:

void insert(int value) {
  node_t *n = malloc(sizeof(node_t));
  assert(n != NULL);
  n->value = value;
  do {
    n->next = head;
  } while (CompareAndSwap(&head, n->next, n) == 0);
}
Deadlock Avoidance via Scheduling
• In some scenarios deadlock avoidance is
preferable.
– Global knowledge is required:
• Which locks various threads might grab during their
execution.
• Subsequently schedules said threads in a way as to
guarantee no deadlock can occur.
Example of Deadlock Avoidance via Scheduling (1)
• We have two processors and four threads.
– Lock acquisition demands of the threads:

      T1   T2   T3   T4
L1    yes  yes  no   no
L2    yes  yes  yes  no

– A smart scheduler could compute that as long as T1 and T2 are not run at the same time, no deadlock could ever arise.

CPU 1: T3 T4
CPU 2: T1 T2
Example of Deadlock Avoidance via Scheduling (2)
• More contention for the same resources:

      T1   T2   T3   T4
L1    yes  yes  yes  no
L2    yes  yes  yes  no

– A possible schedule that guarantees that no deadlock could ever occur:

CPU 1: T4
CPU 2: T1 T2 T3

• The total time to complete the jobs is lengthened considerably.


Detect and Recover
• Allow deadlock to occasionally occur and then take some action.
– Example: if an OS freezes, you would reboot it.

• Many database systems employ deadlock detection and recovery techniques.
– A deadlock detector runs periodically.
– Builds a resource graph and checks it for cycles.
– In deadlock, the system needs to be restarted.
Resources
• Resource types R1, R2, . . ., Rm
CPU cycles, memory space, I/O devices
• Each resource type Ri has Wi instances.
• Sequence of events required to use a
resource
1. Request the resource.
2. Use the resource.
3. Release the resource.
The Deadlock Problem
• A set of blocked processes each holding a resource and waiting to acquire a resource held by another process in the set
• Example
– System has 2 disk drives
– P1 and P2 each hold one disk drive and each needs another one
• Example
– semaphores A and B, initialized to 1

P0          P1
wait(A);    wait(B);
wait(B);    wait(A);
Resource Acquisition (1)–(2), Deadlock Modeling (1)
[Figures only; not reproduced in this extraction]
Basic Facts
• If graph contains no cycles ⇒ no deadlock

• If graph contains a cycle ⇒
– if only one instance per resource type, then deadlock
– if several instances per resource type, possibility of deadlock
Deadlock Modeling (2)–(4)
An example of how deadlock occurs and how it can be avoided. [Figures only; not reproduced in this extraction]
Several Instances of a Resource Type
• Resources in Existence: A vector of length m indicates
the total number of resources in the system

• Available: A vector of length m indicates the number of


available resources of each type.

• Allocation: An n x m matrix defines the number of


resources of each type currently allocated to each
process.

• Request: An n x m matrix indicates the current request of each process. If Request[i,j] = k, then process Pi is requesting k more instances of resource type Rj.
Deadlock Detection with Multiple
Resources of Each Type (1)

The four data structures needed by the deadlock detection algorithm.


Deadlock Detection with Multiple
Resources of Each Type (2)
Deadlock detection algorithm:
1. Look for unmarked process, Pi , for which the
i-th row of R is less than or equal to A.
2. If such a process is found, add the i-th row of
C to A, mark the process, go back to step 1.
3. If no such process exists, algorithm
terminates.
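A minimal C sketch of this algorithm under the slide's notation (C = current allocation matrix, R = request matrix, A = available vector); the function name and the fixed sizes N and M are illustrative assumptions:

#include <stdbool.h>

#define N 4  /* processes */
#define M 3  /* resource types */

/* Returns true if every process can be marked (no deadlock). */
bool detect(int C[N][M], int R[N][M], int A[M]) {
    bool marked[N] = { false };
    bool progress = true;
    while (progress) {
        progress = false;
        for (int i = 0; i < N; i++) {
            if (marked[i]) continue;
            int j;
            for (j = 0; j < M; j++)        // is row i of R <= A ?
                if (R[i][j] > A[j]) break;
            if (j == M) {                  // request can be satisfied
                for (j = 0; j < M; j++)    // add row i of C to A
                    A[j] += C[i][j];
                marked[i] = true;
                progress = true;           // go back to step 1
            }
        }
    }
    for (int i = 0; i < N; i++)
        if (!marked[i]) return false;      // unmarked process => deadlocked
    return true;
}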
Deadlock Detection with Multiple
Resources of Each Type (3)

Figure 6-7. An example for the deadlock detection algorithm.


Safe State
• When a process requests an available resource, system must
decide if immediate allocation leaves the system in a safe state

• System is in a safe state if there exists a sequence <P1, P2, …, Pn> of ALL the processes in the system such that for each Pi, the resources that Pi can still request can be satisfied by currently available resources + resources held by all the Pj, with j < i
• That is:
– If Pi resource needs are not immediately available, then Pi
can wait until all Pj have finished
– When Pj is finished, Pi can obtain needed resources,
execute, return allocated resources, and terminate
– When Pi terminates, Pi +1 can obtain its needed resources,
and so on
Resource-Allocation Graph
A set of vertices V and a set of edges E.

• V is partitioned into two types:


– P = {P1, P2, …, Pn}, the set consisting of all the processes
in the system

– R = {R1, R2, …, Rm}, the set consisting of all resource types


in the system
• request edge – directed edge Pi → Rj
• assignment edge – directed edge Rj → Pi
Resource-Allocation Graph (Cont.)
• Process: circle labeled Pi
• Resource Type with 4 instances: rectangle containing 4 dots
• Pi requests instance of Rj: edge Pi → Rj
• Pi is holding an instance of Rj: edge Rj → Pi
Example of Detection Algorithm
• Five processes P0 through P4; three resource types A (7 instances), B (2 instances), and C (6 instances)
• Snapshot at time T0:

     Allocation   Request   Available
     A B C        A B C     A B C
P0   0 1 0        0 0 0     0 0 0
P1   2 0 0        2 0 2
P2   3 0 3        0 0 0
P3   2 1 1        1 0 0
P4   0 0 2        0 0 2

• Sequence <P0, P2, P3, P1, P4> will result in Finish[i] = true for all i
Example (Cont.)
• P2 requests an additional instance of type C

     Request
     A B C
P0   0 0 0
P1   2 0 1
P2   0 0 1
P3   1 0 0
P4   0 0 2

• State of system?
– Can reclaim resources held by process P0, but insufficient resources to fulfill other processes' requests
– Deadlock exists, consisting of processes P1, P2, P3, and P4
Banker's Algorithm
• Multiple instances
• Each process must a priori claim maximum use
• When a process requests a resource it may have to wait
• When a process gets all its resources it must return them in a finite amount of time
Data Structures for the Banker’s Algorithm

Let n = number of processes, and m = number of resources types.


• Available: Vector of length m. If available [j] = k, there are k
instances of resource type Rj available
• Max: n x m matrix. If Max [i,j] = k, then process Pi may request at
most k instances of resource type Rj
• Allocation: n x m matrix. If Allocation[i,j] = k then Pi is currently
allocated k instances of Rj
• Need: n x m matrix. If Need[i,j] = k, then Pi may need k more
instances of Rj to complete its task

Need [i,j] = Max[i,j] – Allocation [i,j]


Safety Algorithm
1. Let Work and Finish be vectors of length m and n, respectively. Initialize:
   Work = Available
   Finish[i] = false for i = 0, 1, …, n-1
2. Find an i such that both:
   (a) Finish[i] = false
   (b) Needi ≤ Work
   If no such i exists, go to step 4
3. Work = Work + Allocationi
   Finish[i] = true
   go to step 2
4. If Finish[i] == true for all i, then the system is in a safe state
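A minimal C sketch of the safety algorithm above (the fixed sizes and the function name is_safe() are illustrative assumptions; Need[i][j] = Max[i][j] − Allocation[i][j] as defined earlier):

#include <stdbool.h>

#define NPROC 5
#define NRES  3

bool is_safe(int Available[NRES], int Allocation[NPROC][NRES],
             int Need[NPROC][NRES]) {
    int Work[NRES];
    bool Finish[NPROC] = { false };
    for (int j = 0; j < NRES; j++) Work[j] = Available[j];   // step 1

    for (;;) {
        int i, j;
        for (i = 0; i < NPROC; i++) {                        // step 2
            if (Finish[i]) continue;
            for (j = 0; j < NRES; j++)
                if (Need[i][j] > Work[j]) break;
            if (j == NRES) break;      // found i with Need_i <= Work
        }
        if (i == NPROC) break;         // no such i: go to step 4
        for (int j = 0; j < NRES; j++) // step 3: Work += Allocation_i
            Work[j] += Allocation[i][j];
        Finish[i] = true;
    }
    for (int i = 0; i < NPROC; i++)                          // step 4
        if (!Finish[i]) return false;
    return true;
}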
Resource-Request Algorithm for Process Pi

Requesti = request vector for process Pi. If Requesti[j] = k then process Pi wants k instances of resource type Rj
1. If Requesti ≤ Needi go to step 2. Otherwise, raise error condition, since process has exceeded its maximum claim
2. If Requesti ≤ Available, go to step 3. Otherwise Pi must wait, since resources are not available
3. Pretend to allocate requested resources to Pi by modifying the state as follows:
   Available = Available – Requesti;
   Allocationi = Allocationi + Requesti;
   Needi = Needi – Requesti;
 If safe ⇒ the resources are allocated to Pi
 If unsafe ⇒ Pi must wait, and the old resource-allocation state is restored
Example of Banker's Algorithm
• 5 processes P0 through P4;
3 resource types: A (10 instances), B (5 instances), and C (7 instances)
Snapshot at time T0:

     Allocation   Max     Available
     A B C        A B C   A B C
P0   0 1 0        7 5 3   3 3 2
P1   2 0 0        3 2 2
P2   3 0 2        9 0 2
P3   2 1 1        2 2 2
P4   0 0 2        4 3 3
Example (Cont.)
• The content of the matrix Need is defined to be Max – Allocation

     Need
     A B C
P0   7 4 3
P1   1 2 2
P2   6 0 0
P3   0 1 1
P4   4 3 1

• The system is in a safe state since the sequence <P1, P3, P4, P2, P0> satisfies safety criteria
Example: P1 Request (1,0,2)
• Check that Request ≤ Available (that is, (1,0,2) ≤ (3,3,2)) ⇒ true

     Allocation   Need    Available
     A B C        A B C   A B C
P0   0 1 0        7 4 3   2 3 0
P1   3 0 2        0 2 0
P2   3 0 2        6 0 0
P3   2 1 1        0 1 1
P4   0 0 2        4 3 1

• Executing safety algorithm shows that sequence <P1, P3, P4, P0, P2> satisfies safety requirement
• Can request for (3,3,0) by P4 be granted?
• Can request for (0,2,0) by P0 be granted?

Operating Systems Design - 19CS2106R
Session 37: The Producer/Consumer (Bounded Buffer), Reader-Writer Locks, The Dining Philosophers Problem
The Producer/Consumer (Bounded-Buffer)
Problem
• Producer: put() interface
• Wait for a buffer to become empty in order to put data into it.
• Consumer: get() interface
• Wait for a buffer to become filled before using it.
int buffer[MAX];
int fill = 0;
int use = 0;

void put(int value) {
  buffer[fill] = value;       // line f1
  fill = (fill + 1) % MAX;    // line f2
}

int get() {
  int tmp = buffer[use];      // line g1
  use = (use + 1) % MAX;      // line g2
  return tmp;
}
The Producer/Consumer (Bounded-Buffer)
Problem
sem_t empty;
sem_t full;

void *producer(void *arg) {
  int i;
  for (i = 0; i < loops; i++) {
    sem_wait(&empty);   // line P1
    put(i);             // line P2
    sem_post(&full);    // line P3
  }
}

void *consumer(void *arg) {
  int i, tmp = 0;
  while (tmp != -1) {
    sem_wait(&full);    // line C1
    tmp = get();        // line C2
    sem_post(&empty);   // line C3
    printf("%d\n", tmp);
  }
}

First Attempt: Adding the Full and Empty Conditions
The Producer/Consumer (Bounded-Buffer)
Problem
int main(int argc, char *argv[]) {
  // …
  sem_init(&empty, 0, MAX); // MAX buffers are empty to begin with…
  sem_init(&full, 0, 0);    // … and 0 are full
  // …
}

First Attempt: Adding the Full and Empty Conditions (Cont.)

• Imagine that MAX is greater than 1.
• If there are multiple producers, a race condition can happen at line f1.
• It means that the old data there is overwritten.
• What we've forgotten here is mutual exclusion.
• The filling of a buffer and incrementing of the index into the buffer is a critical section.
A Solution: Adding Mutual Exclusion
sem_t empty;
sem_t full;
sem_t mutex;

void *producer(void *arg) {
  int i;
  for (i = 0; i < loops; i++) {
    sem_wait(&mutex);   // line p0 (NEW LINE)
    sem_wait(&empty);   // line p1
    put(i);             // line p2
    sem_post(&full);    // line p3
    sem_post(&mutex);   // line p4 (NEW LINE)
  }
}

(Cont.)

Adding Mutual Exclusion (Incorrectly)
A Solution: Adding Mutual Exclusion
(Cont.)
void *consumer(void *arg) {
  int i;
  for (i = 0; i < loops; i++) {
    sem_wait(&mutex);   // line c0 (NEW LINE)
    sem_wait(&full);    // line c1
    int tmp = get();    // line c2
    sem_post(&empty);   // line c3
    sem_post(&mutex);   // line c4 (NEW LINE)
    printf("%d\n", tmp);
  }
}

Adding Mutual Exclusion (Incorrectly)

A Solution: Adding Mutual Exclusion (Cont.)
• Imagine two threads: one producer and one consumer.
• The consumer acquires the mutex (line c0).
• The consumer calls sem_wait() on the full semaphore (line c1).
• The consumer is blocked and yields the CPU.
• The consumer still holds the mutex!
• The producer calls sem_wait() on the binary mutex semaphore (line p0).
• The producer is now stuck waiting too. A classic deadlock.
Finally, A Working Solution
sem_t empty;
sem_t full;
sem_t mutex;

void *producer(void *arg) {
  int i;
  for (i = 0; i < loops; i++) {
    sem_wait(&empty);   // line p1
    sem_wait(&mutex);   // line p1.5 (MOVED MUTEX HERE…)
    put(i);             // line p2
    sem_post(&mutex);   // line p2.5 (… AND HERE)
    sem_post(&full);    // line p3
  }
}

(Cont.)

Adding Mutual Exclusion (Correctly)
Finally, A Working Solution
(Cont.)
void *consumer(void *arg) {
  int i;
  for (i = 0; i < loops; i++) {
    sem_wait(&full);    // line c1
    sem_wait(&mutex);   // line c1.5 (MOVED MUTEX HERE…)
    int tmp = get();    // line c2
    sem_post(&mutex);   // line c2.5 (… AND HERE)
    sem_post(&empty);   // line c3
    printf("%d\n", tmp);
  }
}

int main(int argc, char *argv[]) {
  // …
  sem_init(&empty, 0, MAX); // MAX buffers are empty to begin with …
  sem_init(&full, 0, 0);    // ... and 0 are full
  sem_init(&mutex, 0, 1);   // mutex=1 because it is a lock
  // …
}

Adding Mutual Exclusion (Correctly)
Reader-Writer Locks
• Imagine a number of concurrent list operations, including inserts and
simple lookups.
• insert:
• Change the state of the list
• A traditional critical section makes sense.
• lookup:
• Simply read the data structure.
• As long as we can guarantee that no insert is on-going, we can allow many lookups to
proceed concurrently.

This special type of lock is known as a reader-writer lock.
A Reader-Writer Lock
• Only a single writer can acquire the lock.
• Once a reader has acquired a read lock,
• More readers will be allowed to acquire the read lock too.
• A writer will have to wait until all readers are finished.

typedef struct _rwlock_t {
  sem_t lock;      // binary semaphore (basic lock)
  sem_t writelock; // used to allow ONE writer or MANY readers
  int readers;     // count of readers reading in critical section
} rwlock_t;

void rwlock_init(rwlock_t *rw) {
  rw->readers = 0;
  sem_init(&rw->lock, 0, 1);
  sem_init(&rw->writelock, 0, 1);
}

void rwlock_acquire_readlock(rwlock_t *rw) {
  sem_wait(&rw->lock);
  …
A Reader-Writer Lock (Cont.)
  rw->readers++;
  if (rw->readers == 1)
    sem_wait(&rw->writelock); // first reader acquires writelock
  sem_post(&rw->lock);
}

void rwlock_release_readlock(rwlock_t *rw) {
  sem_wait(&rw->lock);
  rw->readers--;
  if (rw->readers == 0)
    sem_post(&rw->writelock); // last reader releases writelock
  sem_post(&rw->lock);
}

void rwlock_acquire_writelock(rwlock_t *rw) {
  sem_wait(&rw->writelock);
}

void rwlock_release_writelock(rwlock_t *rw) {
  sem_post(&rw->writelock);
}
A Reader-Writer Lock (Cont.)
• Reader-writer locks have a fairness problem.
• It would be relatively easy for readers to starve writers.
• How to prevent more readers from entering the lock once a writer is waiting?
The Dining Philosophers
• Assume there are five "philosophers" sitting around a table.
• Between each pair of philosophers is a single fork (five total).
• The philosophers each have times where they think, and don't need any forks, and times where they eat.
• In order to eat, a philosopher needs two forks, both the one on their left and the one on their right.
• The contention for these forks causes the synchronization problems that follow.

[Figure: philosophers P0–P4 around a table, with forks f0–f4 between adjacent pairs]
The Dining Philosophers (Cont.)
• Key challenge
• There is no deadlock.
• No philosopher starves and never gets to eat.
• Concurrency is high.

Basic loop of each philosopher:

while (1) {
  think();
  getforks();
  eat();
  putforks();
}

Helper functions (Downey's solutions):

int left(int p)  { return p; }
int right(int p) { return (p + 1) % 5; }

• Philosopher p wishes to refer to the fork on their left ⇒ call left(p).
• Philosopher p wishes to refer to the fork on their right ⇒ call right(p).
The Dining Philosophers (Cont.)
• We need some semaphores, one for each fork: sem_t forks[5].

void getforks() {
  sem_wait(&forks[left(p)]);
  sem_wait(&forks[right(p)]);
}

void putforks() {
  sem_post(&forks[left(p)]);
  sem_post(&forks[right(p)]);
}

The getforks() and putforks() Routines (Broken Solution)
• Deadlock occurs!
• If each philosopher happens to grab the fork on their left before any philosopher can grab the fork on their right,
• Each will be stuck holding one fork and waiting for another, forever.
A Solution: Breaking The Dependency
• Change how forks are acquired.
• Let's assume that philosopher 4 acquires the forks in a different order.

void getforks() {
  if (p == 4) {
    sem_wait(&forks[right(p)]);
    sem_wait(&forks[left(p)]);
  } else {
    sem_wait(&forks[left(p)]);
    sem_wait(&forks[right(p)]);
  }
}

• There is no situation where each philosopher grabs one fork and is stuck waiting for another. The cycle of waiting is broken.
Thank you

Operating Systems Design - 19CS2106R
Session 37: Binary Semaphores (Locks), Counting Semaphores, algorithm semop
Semaphore: A definition
• An object with an integer value
• We can manipulate it with two routines: sem_wait() and sem_post().
• Initialization
• Declare a semaphore s and initialize it to the value 1:

sem_t s;
sem_init(&s, 0, 1);

• The second argument, 0, indicates that the semaphore is shared between threads in the same process.
Semaphore: Interact with semaphore
• sem_wait()
int sem_wait(sem_t *s) {
  decrement the value of semaphore s by one
  wait if value of semaphore s is negative
}

• If the value of the semaphore was one or higher when called sem_wait(),
return right away.
• It will cause the caller to suspend execution waiting for a subsequent post.
• When negative, the value of the semaphore is equal to the number of waiting
threads.
Semaphore: Interact with semaphore (Cont.)
• sem_post()
int sem_post(sem_t *s) {
  increment the value of semaphore s by one
  if there are one or more threads waiting, wake one
}

• Simply increments the value of the semaphore.


• If there is a thread waiting to be woken, wakes one of them up.
Binary Semaphores (Locks)
• What should X be?
• The initial value should be 1.

sem_t m;
sem_init(&m, 0, X); // initialize semaphore to X; what should X be?

sem_wait(&m);
// critical section here
sem_post(&m);
Thread Trace: Single Thread Using A Semaphore

Value of Semaphore   Thread 0             Thread 1
1
1                    call sem_wait()
0                    sem_wait() returns
0                    (crit sect)
0                    call sem_post()
1                    sem_post() returns
Thread Trace: Two Threads Using A Semaphore

Value  Thread 0                 State     Thread 1              State
1                               Running                         Ready
1      call sem_wait()          Running                         Ready
0      sem_wait() returns       Running                         Ready
0      (crit sect: begin)       Running                         Ready
0      Interrupt; Switch → T1   Ready                           Running
0                               Ready     call sem_wait()       Running
-1                              Ready     decrement sem         Running
-1                              Ready     (sem < 0) → sleep     Sleeping
-1     Switch → T0              Running                         Sleeping
-1     (crit sect: end)         Running                         Sleeping
-1     call sem_post()          Running                         Sleeping
0      increment sem            Running                         Sleeping
0      wake(T1)                 Running                         Ready
0      sem_post() returns       Running                         Ready
0      Interrupt; Switch → T1   Ready                           Running
0                               Ready     sem_wait() returns    Running
0                               Ready     (crit sect)           Running
0                               Ready     call sem_post()       Running
1                               Ready     sem_post() returns    Running
Semaphores As Condition Variables

sem_t s;

void *
child(void *arg) {
  printf("child\n");
  sem_post(&s); // signal here: child is done
  return NULL;
}

int
main(int argc, char *argv[]) {
  sem_init(&s, 0, X); // what should X be?
  printf("parent: begin\n");
  pthread_t c;
  pthread_create(&c, NULL, child, NULL);
  sem_wait(&s); // wait here for child
  printf("parent: end\n");
  return 0;
}

A Parent Waiting For Its Child

The execution result:
parent: begin
child
parent: end

• What should X be?
• The value of the semaphore should be set to 0.
Thread Trace: Parent Waiting For Child (Case 1)
• The parent calls sem_wait() before the child has called sem_post().

Value  Parent               State     Child                         State
0      Create(Child)        Running   (Child exists; is runnable)   Ready
0      call sem_wait()      Running                                 Ready
-1     decrement sem        Running                                 Ready
-1     (sem < 0) → sleep    Sleeping                                Ready
-1     Switch → Child       Sleeping  child runs                    Running
-1                          Sleeping  call sem_post()               Running
0                           Sleeping  increment sem                 Running
0                           Ready     wake(Parent)                  Running
0                           Ready     sem_post() returns            Running
0                           Ready     Interrupt; Switch → Parent    Ready
0      sem_wait() returns   Running                                 Ready
Thread Trace: Parent Waiting For Child (Case 2)
• The child runs to completion before the parent calls sem_wait().

Value  Parent                      State     Child                         State
0      Create(Child)               Running   (Child exists; is runnable)   Ready
0      Interrupt; Switch → Child   Ready     child runs                    Running
0                                  Ready     call sem_post()               Running
1                                  Ready     increment sem                 Running
1                                  Ready     wake(nobody)                  Running
1                                  Ready     sem_post() returns            Running
1      parent runs                 Running   Interrupt; Switch → Parent    Ready
1      call sem_wait()             Running                                 Ready
0      decrement sem               Running                                 Ready
0      (sem ≥ 0) → awake           Running                                 Ready
0      sem_wait() returns          Running                                 Ready
How To Implement Semaphores
• Build our own version of semaphores called Zemaphores

typedef struct __Zem_t {
  int value;
  pthread_cond_t cond;
  pthread_mutex_t lock;
} Zem_t;

// only one thread can call this
void Zem_init(Zem_t *s, int value) {
  s->value = value;
  Cond_init(&s->cond);
  Mutex_init(&s->lock);
}

void Zem_wait(Zem_t *s) {
  Mutex_lock(&s->lock);
  while (s->value <= 0)
    Cond_wait(&s->cond, &s->lock);
  s->value--;
  Mutex_unlock(&s->lock);
}
How To Implement Semaphores (Cont.)

void Zem_post(Zem_t *s) {
  Mutex_lock(&s->lock);
  s->value++;
  Cond_signal(&s->cond);
  Mutex_unlock(&s->lock);
}

• Zemaphores don't maintain the invariant that the value of the semaphore, when negative, reflects the number of waiting threads.
• The value is never lower than zero.
• This behavior is easier to implement and matches the current Linux implementation.
Counting Semaphores
• These are normally initialized to some value N, which indicates the number of resources (say buffers) available. We show examples of both binary semaphores and counting semaphores throughout the session.
• We often differentiate between a binary semaphore and a counting semaphore, and we do so for our own edification. No difference exists between the two in the system code that implements a semaphore.
System V Semaphores
• A semaphore in UNIX System V consists of the following elements:
• The value of the semaphore.
• The process ID of the last process to manipulate the semaphore.
• The number of processes waiting for the semaphore value to increase.
• The number of processes waiting for the semaphore value to equal 0.
• The system calls are:
• semget to create and gain access to a set of semaphores.
• semctl to do various control operations on the set.
• semop to manipulate the values of semaphores.

semget creates an array of semaphores:
id = semget(key, count, flag);
The kernel allocates an entry that points to an array of semaphore structures with count elements. [Figure omitted]
The entry also specifies the number of semaphores in the array, the time of the last semop call, and the time of the last semctl call.
Processes manipulate semaphores with
the semop system call:
oldval = semop(id, oplist, count);
where oplist is a pointer to an array of semaphore operations, and count is
the size of the array. The return value, oldval, is the value of the last
semaphore operated on in the set before the operation was done. The
format of each element of oplist is,
•The semaphore number identifying the semaphore array entry being
operated on
•The operation
•Flags

The algorithm is given below:


The semop algorithm:
/* Algorithm: semop
* Input: semaphore descriptor
* array of semaphore operations
* number of elements in array
* Output: start value of last semaphore operated on
*/
{
check legality of semaphore descriptor;
start: read array of semaphore operations from user to kernel space;
check permissions for all semaphore operations;
for (each semaphore operation in array)
{
if (semaphore operation is positive)
{
add "operation" to semaphore value;
if (UNDO flag set on semaphore operation)
update process undo structure;
wakeup all processes sleeping (event: semaphore value increases);
}
The semop algorithm (cont.)
else if (semaphore operation is negative)
{
if ("operation" + semaphore value >= 0)
{
add "operation" to semaphore value;
if (UNDO flag set)
update process undo structure;
if (semaphore value is 0)
wakeup all processes sleeping (event: semaphore value becomes 0);
continue;
}
reverse all semaphore operations already done
this system call (previous iterations);
if (flags specify not to sleep)
return with error;
sleep (event: semaphore value increases);
goto start;
}
The semop algorithm (cont.)
else // semaphore operation is zero
{
if (semaphore value is nonzero)
{
reverse all semaphore operations done this system call;
if (flags specify not to sleep)
return with error;
sleep (event: semaphore value == 0);
goto start;
}
}
}
// semaphore operations all succeeded
update time stamps, process IDs;
return value of last semaphore operated on before call succeeded;
}
The semop algorithm
• If the kernel must sleep, it restores all the operations previously done and
then sleeps until the event it is sleeping for, happens, and then it restarts
the system call. The kernel stores the operations in a global array, it reads
the array from user space again if it must restart the system call. That is
how the operations are done atomically -- either all at once or not at all.
• Whenever a process sleeps in the middle of a semaphore operation, it
sleeps at an interruptible priority. If a process exits without resetting the
semaphore value, a dangerous situation could occur. To avoid this, a
process can set the SEM_UNDO flag in the semop call. If this flag is set,
the kernel reverses the effect of every semaphore operation the process
had done. The kernel maintains a table with one entry for every process in
the system. Each entry points to a set of undo structures, one for each
semaphore used by the process. Each undo structure is an array of triples
consisting of a semaphore ID, a semaphore number in the set identified by
ID, and an adjustment value. The kernel allocates undo structure
dynamically when a process executes its first semop system call with the
SEM_UNDO flag set.
Syntax of semctl:
semctl(id, number, cmd, arg);
where arg is declared as a union:
union semunion {
  int val;
  struct semid_ds *semstat;
  unsigned short *array;
} arg;
where the kernel interprets arg based on the value of cmd, similar to the way it interprets the ioctl command.
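A minimal sketch of creating a one-semaphore set and setting its initial value with semctl; the key 0x1234, the permissions, and the helper name make_binary_sem() are illustrative assumptions:

#include <sys/ipc.h>
#include <sys/sem.h>

union semunion {
    int val;
    struct semid_ds *semstat;
    unsigned short *array;
};

int make_binary_sem(void) {
    // Create (or get) a set containing one semaphore.
    int id = semget(0x1234, 1, IPC_CREAT | 0600);
    if (id < 0) return -1;

    // Initialize semaphore 0 of the set to 1 (a binary semaphore).
    union semunion arg;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) return -1;
    return id;
}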
Thank you
19CS2106R Operating Systems Design
Session 36: Mutex, Concurrent Linked Lists
Synchronisation and Communication
 The correct behaviour of a concurrent program depends on synchronisation
and communication between its processes
 Synchronisation: the satisfaction of constraints on the interleaving of the
actions of processes (e.g. an action by one process only occurring after an
action by another)
 Communication: the passing of information from one process to another
– Concepts are linked since communication requires synchronisation, and synchronisation
can be considered as contentless communication.
– Data communication is usually based upon either shared variables or message passing.

• A sequence of statements that must appear to be executed indivisibly is called a critical section
• The synchronisation required to protect a critical section is known as mutual exclusion
Synchronization Synchronize threads/coordinate their activities so that
when you access the shared data (e.g., global variables)
An example: race condition.
you are not having a trouble.

Multiple processes sharing a file or shared memory


segment also require synchronization (= critical section
handling).

Critical
section:
Critical
section:

critical section respected  not respected 


Protecting Accesses to Shared Variables: Mutexes
This program creates two threads, each of which executes the same function. The
function executes a loop that repeatedly increments a global variable, glob, by copying
glob into the local variable loc, incrementing loc, and copying loc back to glob. (Since loc
is an automatic variable allocated on the per-thread stack, each thread has its own copy
of this variable.) The number of iterations of the loop is determined by the command-line
argument supplied to the program, or by a default value, if no argument is supplied.
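A minimal sketch of the loop the text describes (glob and loc follow the text; the function name threadFunc and the argument handling are illustrative assumptions):

#include <pthread.h>

static int glob = 0;

static void *threadFunc(void *arg) {
    int loops = *((int *) arg);
    for (int j = 0; j < loops; j++) {
        int loc = glob;   // fetch the shared variable into a per-thread local
        loc++;            // increment the local copy
        glob = loc;       // write it back: not atomic, so updates can be lost
    }
    return NULL;
}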
When we run the program by specifying that each thread should increment the variable 1000 times, all seems well:
$ ./thread_incr 1000
glob = 2000
However, what has probably happened here is that the first thread completed all of its work and terminated before the second thread even started. When we ask both threads to do a lot more work, we see a rather different result:
$ ./thread_incr 10000000
glob = 16517656
At the end of this sequence, the value of glob should have been 20 million. The problem here results from execution sequences such as the following.
Protecting Accesses to Shared Variables:
Mutexes
1. Thread 1 fetches the current value of glob into its local variable loc. Let’s assume that the current value
of glob is 2000.
2. The scheduler time slice for thread 1 expires, and thread 2 commences execution.
3. Thread 2 performs multiple loops in which it fetches the current value of glob into its local variable loc,
increments loc, and assigns the result to glob. In the first of these loops, the value fetched from glob will
be 2000. Let’s suppose that by the time the time slice for thread 2 has expired, glob has been increased to
3000.
4. Thread 1 receives another time slice and resumes execution where it left off. Having previously (step 1)
copied the value of glob (2000) into its loc, it now increments loc and assigns the result (2001) to glob. At
this point, the effect of the increment operations performed by thread 2 is lost.
If we run the program in Listing 30-1 multiple times with the same command-line argument, we see that the printed value of glob fluctuates wildly:
$ ./thread_incr 10000000
glob = 10880429
$ ./thread_incr 10000000
glob = 13493953
This nondeterministic behavior is a consequence of the vagaries of the kernel's CPU scheduling decisions. In complex programs, this nondeterministic behavior means that such errors may occur only rarely, be hard to reproduce, and therefore be difficult to find.
Protecting Accesses to Shared Variables:
Mutexes
To avoid the problems that can occur when threads try to update a shared variable, we must use a mutex
(short for mutual exclusion) to ensure that only one thread at a time can access the variable. More generally,
mutexes can be used to ensure atomic access to any shared resource, but protecting shared variables is
the most common use.

A mutex has two states: locked and unlocked. At any moment, at most one thread may hold the lock on a
mutex. Attempting to lock a mutex that is already locked either blocks or fails with an error, depending on the
method used to place the lock.
When a thread locks a mutex, it becomes the owner of that mutex. Only the mutex owner can unlock the
mutex. This property improves the structure of code that uses mutexes and also allows for some
optimizations in the implementation of mutexes. Because of this ownership property, the terms acquire and
release are
sometimes used synonymously for lock and unlock.
In general, we employ a different mutex for each shared resource (which may consist of multiple related
variables), and each thread employs the following protocol for accessing a resource:
• lock the mutex for the shared resource;
• access the shared resource; and
• unlock the mutex.
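A hedged sketch of that protocol applied to the earlier increment loop (glob and threadFunc are the names assumed in the previous sketch):

#include <pthread.h>

static int glob = 0;  // shared variable from the earlier sketch
static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;

static void *threadFunc(void *arg) {
    int loops = *((int *) arg);
    for (int j = 0; j < loops; j++) {
        pthread_mutex_lock(&mtx);    // lock the mutex for the shared resource
        int loc = glob;              // access the shared resource
        loc++;
        glob = loc;
        pthread_mutex_unlock(&mtx);  // unlock the mutex
    }
    return NULL;
}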

Protecting Accesses to Shared Variables:
Mutexes
Finally, note that mutex locking is advisory, rather than mandatory. By this, we
mean that a thread is free to ignore the use of a mutex and simply access the
corresponding shared variable(s). In order to safely handle shared variables, all
threads must cooperate in their use of a mutex, abiding by the locking rules it
enforces.

Lock-based Concurrent Data structure
Adding locks to a data structure makes the structure thread safe.
A block of code is thread-safe if it can be simultaneously executed by
multiple threads without causing problems.
• Thread-safeness: in a nutshell, refers an application's ability to execute
multiple threads simultaneously without "clobbering" shared data or
creating "race" conditions.
• For example, suppose that your application creates several threads, each
of which makes a call to the same library routine:
• This library routine accesses/modifies a global structure or location in memory.
• As each thread calls this routine it is possible that they may try to modify this
global structure/memory location at the same time.
• If the routine does not employ some sort of synchronization constructs to
prevent data corruption, then it is not thread-safe.
Lock-based Concurrent Data structure
Solution #1
• An obvious solution is to simply lock the list any time that a thread attempts to
access it.
• A call to each of the three functions can be protected by a mutex.
Solution #2
• Instead of locking the entire list, we could try to lock individual nodes.
• A “finer-grained” approach.
// basic node structure
typedef struct __node_t {
  int key;
  struct __node_t *next;
  pthread_mutex_t lock;
} node_t;
Concurrent Linked Lists
// basic node structure
typedef struct __node_t {
  int key;
  struct __node_t *next;
} node_t;

// basic list structure (one used per list)
typedef struct __list_t {
  node_t *head;
  pthread_mutex_t lock;
} list_t;

void List_Init(list_t *L) {
  L->head = NULL;
  pthread_mutex_init(&L->lock, NULL);
}

(Cont.)
Concurrent Linked Lists(Cont.)
(Cont.)

int List_Insert(list_t *L, int key) {
  pthread_mutex_lock(&L->lock);
  node_t *new = malloc(sizeof(node_t));
  if (new == NULL) {
    perror("malloc");
    pthread_mutex_unlock(&L->lock);
    return -1; // fail
  }
  new->key = key;
  new->next = L->head;
  L->head = new;
  pthread_mutex_unlock(&L->lock);
  return 0; // success
}

(Cont.)
Concurrent Linked Lists(Cont.)
(Cont.)

int List_Lookup(list_t *L, int key) {
  pthread_mutex_lock(&L->lock);
  node_t *curr = L->head;
  while (curr) {
    if (curr->key == key) {
      pthread_mutex_unlock(&L->lock);
      return 0; // success
    }
    curr = curr->next;
  }
  pthread_mutex_unlock(&L->lock);
  return -1; // failure
}
Concurrent Linked Lists (Cont.)
 The code acquires a lock in the insert routine upon entry.
 The code releases the lock upon exit.
 If malloc() happens to fail, the code must also release the lock before failing the insert.
 This kind of exceptional control flow has been shown to be quite error prone.
 Solution: The lock and release only surround the actual critical section in the insert code.
Concurrent Linked List: Rewritten

void List_Init(list_t *L) {
  L->head = NULL;
  pthread_mutex_init(&L->lock, NULL);
}

void List_Insert(list_t *L, int key) {
  // synchronization not needed
  node_t *new = malloc(sizeof(node_t));
  if (new == NULL) {
    perror("malloc");
    return;
  }
  new->key = key;

  // just lock critical section
  pthread_mutex_lock(&L->lock);
  new->next = L->head;
  L->head = new;
  pthread_mutex_unlock(&L->lock);
}
Concurrent Linked List: Rewritten (Cont.)

int List_Lookup(list_t *L, int key) {
  int rv = -1;
  pthread_mutex_lock(&L->lock);
  node_t *curr = L->head;
  while (curr) {
    if (curr->key == key) {
      rv = 0;
      break;
    }
    curr = curr->next;
  }
  pthread_mutex_unlock(&L->lock);
  return rv; // now both success and failure
}
Scaling Linked Lists
 Hand-over-hand locking (lock coupling)
 Add a lock per node of the list instead of having a single lock for the entire list.
 When traversing the list,
 First grab the next node's lock.
 And then release the current node's lock.
 Enables a high degree of concurrency in list operations (see the sketch below).
 However, in practice, the overheads of acquiring and releasing locks for each node of a list traversal are prohibitive.
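A minimal sketch of hand-over-hand lookup, assuming the per-node-lock variant of node_t shown in "Solution #2" earlier; the function name is an illustrative assumption:

#include <pthread.h>

// Per-node lock variant of node_t (as in Solution #2).
typedef struct __node_t {
    int key;
    struct __node_t *next;
    pthread_mutex_t lock;
} node_t;

// Hold at most two node locks at any moment while walking the list.
int List_Lookup_HoH(node_t *head, int key) {
    if (head == NULL)
        return -1;
    pthread_mutex_lock(&head->lock);
    node_t *curr = head;
    while (curr) {
        if (curr->key == key) {
            pthread_mutex_unlock(&curr->lock);
            return 0;                           // success
        }
        node_t *next = curr->next;
        if (next)
            pthread_mutex_lock(&next->lock);    // first grab the next node's lock
        pthread_mutex_unlock(&curr->lock);      // then release the current node's
        curr = next;
    }
    return -1;                                  // failure
}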
Pthreads Read-Write Locks
 Neither of our multi-threaded linked lists exploits the potential for simultaneous access to any node by threads that are executing Member.
 The first solution only allows one thread to access the entire list at any instant.
 The second only allows one thread to access any given node at any instant.
 A read-write lock is somewhat like a mutex except that it provides two lock functions.
 The first lock function locks the read-write lock for reading, while the second locks it for writing.

Pthreads Read-Write Locks
 So multiple threads can simultaneously obtain the lock by calling the read-lock function, while only one thread can obtain the lock by calling the write-lock function.
 Thus, if any threads own the lock for reading, any threads that want to obtain the lock for writing will block in the call to the write-lock function.
 If any thread owns the lock for writing, any threads that want to obtain the lock for reading or writing will block in their respective locking functions.
Pthreads Read-Write Locks
 Reader-writer locks are similar to mutexes, except that they allow for higher degrees of parallelism. With a mutex, the state is either locked or unlocked, and only one thread can lock it at a time. Three states are possible with a reader-writer lock: locked in read mode, locked in write mode, and unlocked. Only one thread at a time can hold a reader-writer lock in write mode, but multiple threads can hold a reader-writer lock in read mode at the same time.
 When a reader-writer lock is write-locked, all threads attempting to lock it block until it is unlocked. When a reader-writer lock is read-locked, all threads attempting to lock it in read mode are given access, but any threads attempting to lock it in write mode block until all the threads have relinquished their read locks. Although implementations vary, reader-writer locks usually block additional readers if a lock is already held in read mode and a thread is blocked trying to acquire the lock in write mode. This prevents a constant stream of readers from starving waiting writers.

Pthreads Read-Write Locks
 Reader-writer locks are well suited for situations in which data structures are read more often than they are modified. When a reader-writer lock is held in write mode, the data structure it protects can be modified safely, since only one thread at a time can hold the lock in write mode. When the reader-writer lock is held in read mode, the data structure it protects can be read by multiple threads, as long as the threads first acquire the lock in read mode.
 Reader-writer locks are also called shared-exclusive locks. When a reader-writer lock is read-locked, it is said to be locked in shared mode. When it is write-locked, it is said to be locked in exclusive mode.
 As with mutexes, reader-writer locks must be initialized before use and destroyed before freeing their underlying memory.
Pthreads Read-Write Locks

#include <pthread.h>
int pthread_rwlock_init(pthread_rwlock_t *restrict rwlock, const
pthread_rwlockattr_t *restrict attr);
int pthread_rwlock_destroy(pthread_rwlock_t *rwlock);
Both return: 0 if OK, error number on failure

#include <pthread.h>
int pthread_rwlock_rdlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_wrlock(pthread_rwlock_t *rwlock);
int pthread_rwlock_unlock(pthread_rwlock_t *rwlock);
All return: 0 if OK, error number on failure
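A minimal usage sketch, assuming a single shared value guarded by a statically initialized read-write lock (all names here are illustrative):

#include <pthread.h>
#include <stdio.h>

// Illustrative shared data guarded by a read-write lock.
static int shared_value = 0;
static pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;

void *reader(void *arg) {
    pthread_rwlock_rdlock(&rwlock);   // many readers may hold this at once
    printf("read %d\n", shared_value);
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

void *writer(void *arg) {
    pthread_rwlock_wrlock(&rwlock);   // exclusive: blocks readers and writers
    shared_value++;
    pthread_rwlock_unlock(&rwlock);
    return NULL;
}

int main(void) {
    pthread_t r, w;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    pthread_rwlock_destroy(&rwlock);
    return 0;
}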
23
Thank you
19CS2106R​

Operating Systems Design​


Session 35: Thread API and Condition Variables

© 2020 KL University
Concurrency vs. Parallelism
Concurrency: 2 processes or threads run concurrently (are concurrent) if
their flows overlap in time
Otherwise, they are sequential.
Examples (running on single core):
Concurrent: A & B, A & C
Sequential: B & C

Parallelism: requires multiple resources to execute multiple processes or
threads at a given time instant (e.g., A & B running on two cores at once).
Thread Concepts
• A typical UNIX process can be thought of as having a single thread of control: each
process is doing only one thing at a time. With multiple threads of control, we can
design our programs to do more than one thing at a time within a single process,
with each thread handling a separate task. This approach can have several benefits.
o We can simplify code that deals with asynchronous events by assigning a separate thread to
handle each event type. Each thread can then handle its event using a synchronous programming
model. A synchronous programming model is much simpler than an asynchronous one.
o Multiple processes have to use complex mechanisms provided by the operating system to share
memory and file descriptors. Threads, on the other hand, automatically have access to the same
memory address space and file descriptors.
o Some problems can be partitioned so that overall program throughput can be improved. A single
process that has multiple tasks to perform implicitly serializes those tasks, because there is only
one thread of control. With multiple threads of control, the processing of independent tasks can
be interleaved by assigning a separate thread per task. Two tasks can be interleaved only if they
don't depend on the processing performed by each other.
o Similarly, interactive programs can realize improved response time by using multiple threads to
separate the portions of the program that deal with user input and output from the other parts
of the program.
Multithreading Example: WWW
 Client (Chrome) requests a page from server (amazon.com).

 Server gives the page name to the thread and resumes listening.
 Thread checks the disk cache in memory; if page not there, do disk I/O;
sends the page to the client.
(New) Process Address Space w/ Threads
Thread State
 State shared by all threads in process:
 Memory content (global variables, heap, code, etc).
 I/O (files, network connections, etc).
 A change to a global variable will be seen by all other threads (unlike with processes).

 State private to each thread:


 Kept in TCB (Thread Control Block).
 CPU registers, program counter.
 Stack (what functions it is calling, parameters, local variables, return addresses).
 Pointer to enclosing process (PCB).
Single- vs. Multi-threaded Processes

Shared and private stuff:


Single- to Multi-thread Conversion

Careful with global variable:


Single- to Multi-thread Conversion

Careful with global variable:


Thread Creation
• How to create and control threads?
#include <pthread.h>

int
pthread_create( pthread_t* thread,
const pthread_attr_t* attr,
void* (*start_routine)(void*),
void* arg);

• thread: Used to interact with this thread.


• attr: Used to specify any attributes this thread might have.
• Stack size, Scheduling priority, …
• start_routine: the function this thread starts running in.
• arg: the argument to be passed to the function (start routine)
• a void pointer allows us to pass in any type of argument.
Thread Identification
Recall that a process ID, represented by the pid_t data type, is a non-
negative integer. A thread ID is represented by the pthread_t data type.
Implementations are allowed to use a structure to represent the pthread_t
data type, so portable implementations can't treat them as integers.
Therefore, a function must be used to compare two thread IDs.

#include <pthread.h>
int pthread_equal(pthread_t tid1, pthread_t tid2);

Returns: nonzero if equal, 0 otherwise

11
Thread Identification
A thread can obtain its own thread ID by calling the pthread_self function.

#include <pthread.h>
pthread_t pthread_self(void);

Returns: the thread ID of the calling thread

This function can be used with pthread_equal when a thread needs to identify data structures that are
tagged with its thread ID. For example, a master thread might place work assignments on a queue and use the
thread ID to control which jobs go to each worker thread.
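A small sketch of that pattern (the master/worker naming here is illustrative):

#include <stdio.h>
#include <pthread.h>

// The master records its own ID; any thread can check against it.
static pthread_t master_tid;

void *worker(void *arg) {
    if (pthread_equal(pthread_self(), master_tid))
        printf("I am the master\n");
    else
        printf("I am a worker\n");
    return NULL;
}

int main(void) {
    master_tid = pthread_self();
    pthread_t p;
    pthread_create(&p, NULL, worker, NULL);
    worker(NULL);            // master runs the same code path
    pthread_join(p, NULL);
    return 0;
}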
Thread Termination
• If any thread within a process calls exit, _Exit, or _exit, then the entire
process terminates. Similarly, when the default action is to terminate
the process, a signal sent to a thread will terminate the entire process.
• A single thread can exit in three ways, thereby stopping its flow of
control, without terminating the entire process.
• The thread can simply return from the start routine. The return value is the
thread's exit code.
• The thread can be canceled by another thread in the same process.
• The thread can call pthread_exit.
#include <pthread.h>
void pthread_exit(void *rval_ptr);
The rval_ptr is a typeless pointer, similar to the single argument passed to the start routine.
This pointer is available to other threads in the process by calling the pthread_join function.
13
Wait for a thread to complete
#include <pthread.h>
int pthread_join(pthread_t thread, void **rval_ptr);

Returns: 0 if OK, error number on failure

• The calling thread will block until the specified thread calls pthread_exit, returns
from its start routine, or is canceled. If the thread simply returned from its start
routine, rval_ptr will contain the return code. If the thread was canceled, the
memory location specified by rval_ptr is set to PTHREAD_CANCELED.
• By calling pthread_join, we automatically place a thread in the detached state
(discussed shortly) so that its resources can be recovered. If the thread was
already in the detached state, calling pthread_join fails, returning EINVAL.
• If we're not interested in a thread's return value, we can set rval_ptr to NULL. In
this case, calling pthread_join allows us to wait for the specified thread, but
does not retrieve the thread's termination status.
14
Joining

15
Detach
• The pthread_detach() routine can be used to explicitly detach a thread even
though it was created as joinable
• There is no converse routine
• Recommendations:
• If a thread requires joining, consider explicitly creating it as joinable
• This provides portability as not all implementations may create threads as joinable
by default
• If you know in advance that a thread will never need to join with another thread,
consider creating it in a detached state
• Some system resources may be able to be freed (a sketch of creating a detached thread follows).
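A small sketch of explicitly creating a detached thread via an attribute object (the worker function and the sleep() are illustrative):

#include <stdio.h>
#include <pthread.h>
#include <unistd.h>

void *worker(void *arg) {
    printf("detached worker running\n");
    return NULL;   // resources are reclaimed automatically; no join needed
}

int main(void) {
    pthread_attr_t attr;
    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    pthread_t p;
    pthread_create(&p, &attr, worker, NULL);
    pthread_attr_destroy(&attr);

    sleep(1);      // crude: give the detached thread time to finish
    return 0;      // pthread_join(p, ...) would fail with EINVAL here
}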

16
17
Locks
• a synchronization mechanism for enforcing limits on access to a resource in an environment where
there are many threads of execution

• Provide mutual exclusion to a critical section


• Interface
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

• Usage (w/o lock initialization and error check)


pthread_mutex_t lock;
pthread_mutex_lock(&lock);
x = x + 1; // or whatever your critical section is
pthread_mutex_unlock(&lock);
• No other thread holds the lock → the thread will acquire the lock and enter the critical
section.
• If another thread holds the lock → the thread will not return from the call until it has
acquired the lock.

18
Locks (Cont.)
• All locks must be properly initialized.
• One way: using PTHREAD_MUTEX_INITIALIZER
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

• The dynamic way: using pthread_mutex_init()


int rc = pthread_mutex_init(&lock, NULL);
assert(rc == 0); // always check success!

19
Locks (Cont.)
• Check error codes when calling lock and unlock
• An example wrapper
// Use this to keep your code clean but check for failures
// Only use if exiting program is OK upon failure
void Pthread_mutex_lock(pthread_mutex_t *mutex) {
int rc = pthread_mutex_lock(mutex);
assert(rc == 0);
}

• These two calls are used in lock acquisition


int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *mutex,
struct timespec *abs_timeout);
• trylock: return failure if the lock is already held
• timedlock: return after a timeout

20
Locks (Cont.)
• These two calls are also used in lock acquisition
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *mutex,
struct timespec *abs_timeout);

• trylock: return failure if the lock is already held


• timedlock: return after a timeout or after acquiring the lock (see the sketch below)
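A minimal sketch combining both calls (the helper name and the two-second deadline are illustrative; note the standard spelling is pthread_mutex_timedlock):

#include <stdio.h>
#include <time.h>
#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

// Try to grab the lock immediately; otherwise wait at most 2 seconds.
// Returns 0 with the lock held, or ETIMEDOUT (caller must not unlock then).
int lock_with_timeout(void) {
    if (pthread_mutex_trylock(&lock) == 0)
        return 0;                          // got it without blocking

    struct timespec abs_timeout;
    clock_gettime(CLOCK_REALTIME, &abs_timeout);
    abs_timeout.tv_sec += 2;               // absolute deadline: now + 2s

    return pthread_mutex_timedlock(&lock, &abs_timeout);
}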

21
Condition Variables
A condition variable allows a thread to block itself until specified data
reaches a predefined state.
A condition variable is associated with a predicate.
When the predicate becomes true, the condition variable is used to signal one
or more threads waiting on the condition.
A single condition variable may be associated with more than one
predicate.
A condition variable always has a mutex associated with it.
A thread locks this mutex and tests the predicate defined on the shared
variable.
If the predicate is not true, the thread waits on the condition variable
associated with the predicate using the function
pthread_cond_wait.
Condition Variables
• Condition variables are useful when some kind of signaling must take place
between threads.
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);

• pthread_cond_wait:
• Put the calling thread to sleep.
• Wait for some other thread to signal it.
• pthread_cond_signal:
• Unblock at least one of the threads that are blocked on the condition variable
• A condition variable is a data object that allows a thread to suspend execution
until a certain event or condition occurs.
• When the event or condition occurs another thread can signal the thread to
“wake up.”

23
Condition Variables (Cont.)
• A thread calling wait routine:
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;

pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);

• The wait call releases the lock when putting the caller to sleep.
• Before returning after being woken, the wait call re-acquires the lock.
• A thread calling signal routine:
pthread_mutex_lock(&lock);
initialized = 1;
pthread_cond_signal(&init);
pthread_mutex_unlock(&lock);
24
Condition Variables (Cont.)
• The waiting thread re-checks the condition in a while loop, instead of
a simple if statement.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;

pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);

• Without re-checking, the waiting thread may proceed as if the
condition had changed even though it has not.

25
Condition Variables (Cont.)
• Don’t ever do this.
• A thread calling wait routine:
while(initialized == 0)
; // spin

• A thread calling signal routine:


initialized = 1;

• It performs poorly in many cases → it just wastes CPU cycles.


• It is error prone.

26
Compiling and Running
• To compile them, you must include the header pthread.h
• Explicitly link with the pthreads library, by adding the -lpthread flag.
prompt> gcc -o main main.c -lpthread

• For more information,


man 7 pthreads
man 3 pthread_create

27
/* A simple child/parent signaling example. - main-signal.c */
#include <stdio.h>
#include <pthread.h>
int done = 0;
void* worker(void* arg) {
printf("this should print first\n");
done = 1;
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t p;
pthread_create(&p, NULL, worker, NULL);
while (done == 0)
;
printf("this should print last\n");
return 0;
}
/*
vishnu@mannava:~/threads$ cc main-signal.c -lpthread
vishnu@mannava:~/threads$ ./a.out
this should print first
this should print last
*/
28
/* A more efficient signaling via condition variables. - main-signal-cv.c */
#include <stdio.h>
#include <pthread.h>
/* simple synchronizer: allows one thread to wait for another.
   The structure "synchronizer_t" has all the needed data; methods are:
   init (called by one thread)
   wait (to wait for a thread)
   done (to indicate thread is done) */
typedef struct __synchronizer_t {
pthread_mutex_t lock;
pthread_cond_t cond;
int done;
} synchronizer_t;
synchronizer_t s;
void signal_init(synchronizer_t *s) {
pthread_mutex_init(&s->lock, NULL);
pthread_cond_init(&s->cond, NULL);
s->done = 0;
}
void signal_done(synchronizer_t *s) {
pthread_mutex_lock(&s->lock);
s->done = 1;
pthread_cond_signal(&s->cond);
pthread_mutex_unlock(&s->lock);
29
}
void signal_wait(synchronizer_t *s) {
pthread_mutex_lock(&s->lock);
while (s->done == 0)
pthread_cond_wait(&s->cond, &s->lock);
pthread_mutex_unlock(&s->lock);
}
void* worker(void* arg) {
printf("this should print first\n");
signal_done(&s);
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t p;
signal_init(&s);
pthread_create(&p, NULL, worker, NULL);
signal_wait(&s);
printf("this should print last\n");
return 0;
}
/*
vishnu@mannava:~/threads$ cc main-signal-cv.c -lpthread
vishnu@mannava:~/threads$ ./a.out
this should print first
this should print last
30
*/
Thank you
19CS2106R​

Operating Systems Design​


Session 34: Thread API and Condition Variables

© 2020 KL University
Thread Concepts
• A typical UNIX process can be thought of as having a single thread of control: each
process is doing only one thing at a time. With multiple threads of control, we can
design our programs to do more than one thing at a time within a single process,
with each thread handling a separate task. This approach can have several benefits.
o We can simplify code that deals with asynchronous events by assigning a separate thread to
handle each event type. Each thread can then handle its event using a synchronous programming
model. A synchronous programming model is much simpler than an asynchronous one.
o Multiple processes have to use complex mechanisms provided by the operating system to share
memory and file descriptors. Threads, on the other hand, automatically have access to the same
memory address space and file descriptors.
o Some problems can be partitioned so that overall program throughput can be improved. A single
process that has multiple tasks to perform implicitly serializes those tasks, because there is only
one thread of control. With multiple threads of control, the processing of independent tasks can
be interleaved by assigning a separate thread per task. Two tasks can be interleaved only if they
don't depend on the processing performed by each other.
o Similarly, interactive programs can realize improved response time by using multiple threads to
separate the portions of the program that deal with user input and output from the other parts
of the program.
Thread Creation
• How to create and control threads?
#include <pthread.h>

int
pthread_create( pthread_t* thread,
const pthread_attr_t* attr,
void* (*start_routine)(void*),
void* arg);

• thread: Used to interact with this thread.


• attr: Used to specify any attributes this thread might have.
• Stack size, Scheduling priority, …
• start_routine: the function this thread starts running in.
• arg: the argument to be passed to the function (start routine)
• a void pointer allows us to pass in any type of argument.
Thread Identification
Recall that a process ID, represented by the pid_t data type, is a non-
negative integer. A thread ID is represented by the pthread_t data type.
Implementations are allowed to use a structure to represent the pthread_t
data type, so portable implementations can't treat them as integers.
Therefore, a function must be used to compare two thread IDs.

#include <pthread.h>
int pthread_equal(pthread_t tid1, pthread_t tid2);

Returns: nonzero if equal, 0 otherwise

4
Thread Identification
A thread can obtain its own thread ID by calling the pthread_self function.

#include <pthread.h>
pthread_t pthread_self(void);

Returns: the thread ID of the calling thread

This function can be used with pthread_equal when a thread needs to identify data structures that are
tagged with its thread ID. For example, a master thread might place work assignments on a queue and use the
thread ID to control which jobs go to each worker thread.
Thread Termination
• If any thread within a process calls exit, _Exit, or _exit, then the entire
process terminates. Similarly, when the default action is to terminate
the process, a signal sent to a thread will terminate the entire process.
• A single thread can exit in three ways, thereby stopping its flow of
control, without terminating the entire process.
• The thread can simply return from the start routine. The return value is the
thread's exit code.
• The thread can be canceled by another thread in the same process.
• The thread can call pthread_exit.
#include <pthread.h>
void pthread_exit(void *rval_ptr);
The rval_ptr is a typeless pointer, similar to the single argument passed to the start routine.
This pointer is available to other threads in the process by calling the pthread_join function.
6
Wait for a thread to complete
#include <pthread.h>
int pthread_join(pthread_t thread, void **rval_ptr);

Returns: 0 if OK, error number on failure

• The calling thread will block until the specified thread calls pthread_exit, returns
from its start routine, or is canceled. If the thread simply returned from its start
routine, rval_ptr will contain the return code. If the thread was canceled, the
memory location specified by rval_ptr is set to PTHREAD_CANCELED.
• By calling pthread_join, we automatically place a thread in the detached state
(discussed shortly) so that its resources can be recovered. If the thread was
already in the detached state, calling pthread_join fails, returning EINVAL.
• If we're not interested in a thread's return value, we can set rval_ptr to NULL. In
this case, calling pthread_join allows us to wait for the specified thread, but
does not retrieve the thread's termination status.
7
8
Locks
• a synchronization mechanism for enforcing limits on access to a resource in an environment where
there are many threads of execution

• Provide mutual exclusion to a critical section


• Interface
int pthread_mutex_lock(pthread_mutex_t *mutex);
int pthread_mutex_unlock(pthread_mutex_t *mutex);

• Usage (w/o lock initialization and error check)


pthread_mutex_t lock;
pthread_mutex_lock(&lock);
x = x + 1; // or whatever your critical section is
pthread_mutex_unlock(&lock);
• No other thread holds the lock → the thread will acquire the lock and enter the critical
section.
• If another thread holds the lock → the thread will not return from the call until it has
acquired the lock.

9
Locks (Cont.)
• All locks must be properly initialized.
• One way: using PTHREAD_MUTEX_INITIALIZER
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

• The dynamic way: using pthread_mutex_init()


int rc = pthread_mutex_init(&lock, NULL);
assert(rc == 0); // always check success!

10
Locks (Cont.)
• Check error codes when calling lock and unlock
• An example wrapper
// Use this to keep your code clean but check for failures
// Only use if exiting program is OK upon failure
void Pthread_mutex_lock(pthread_mutex_t *mutex) {
int rc = pthread_mutex_lock(mutex);
assert(rc == 0);
}

• These two calls are used in lock acquisition


int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *mutex,
struct timespec *abs_timeout);
• trylock: return failure if the lock is already held
• timedlock: return after a timeout

11
Locks (Cont.)
• These two calls are also used in lock acquisition
int pthread_mutex_trylock(pthread_mutex_t *mutex);
int pthread_mutex_timedlock(pthread_mutex_t *mutex,
struct timespec *abs_timeout);

• trylock: return failure if the lock is already held


• timedlock: return after a timeout or after acquiring the lock

12
Condition Variables
• Condition variables are useful when some kind of signaling must take place
between threads.
int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_signal(pthread_cond_t *cond);

• pthread_cond_wait:
• Put the calling thread to sleep.
• Wait for some other thread to signal it.
• pthread_cond_signal:
• Unblock at least one of the threads that are blocked on the condition variable
• A condition variable is a data object that allows a thread to suspend execution
until a certain event or condition occurs.
• When the event or condition occurs another thread can signal the thread to
“wake up.”
• A condition variable is always associated with a mutex.

13
Condition Variables (Cont.)
• A thread calling wait routine:
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;

pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);

• The wait call releases the lock when putting the caller to sleep.
• Before returning after being woken, the wait call re-acquires the lock.
• A thread calling signal routine:
pthread_mutex_lock(&lock);
initialized = 1;
pthread_cond_signal(&init);
pthread_mutex_unlock(&lock);
14
Condition Variables (Cont.)
• The waiting thread re-checks the condition in a while loop, instead of
a simple if statement.
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t init = PTHREAD_COND_INITIALIZER;

pthread_mutex_lock(&lock);
while (initialized == 0)
pthread_cond_wait(&init, &lock);
pthread_mutex_unlock(&lock);

• Without re-checking, the waiting thread may proceed as if the
condition had changed even though it has not.

15
Condition Variables (Cont.)
• Don’t ever do this.
• A thread calling wait routine:
while(initialized == 0)
; // spin

• A thread calling signal routine:


initialized = 1;

• It performs poorly in many cases → it just wastes CPU cycles.


• It is error prone.

16
Compiling and Running
• To compile them, you must include the header pthread.h
• Explicitly link with the pthreads library, by adding the -lpthread flag.
prompt> gcc -o main main.c -lpthread

• For more information,


man 7 pthreads
man 3 pthread_create

17
/* A simple child/parent signaling example. - main-signal.c */
#include <stdio.h>
#include <pthread.h>
int done = 0;
void* worker(void* arg) {
printf("this should print first\n");
done = 1;
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t p;
pthread_create(&p, NULL, worker, NULL);
while (done == 0)
;
printf("this should print last\n");
return 0;
}
/*
vishnu@mannava:~/threads$ cc main-signal.c -lpthread
vishnu@mannava:~/threads$ ./a.out
this should print first
this should print last
*/
18
/* A more efficient signaling via condition variables. - main-signal-cv.c */
#include <stdio.h>
#include <pthread.h>
/* simple synchronizer: allows one thread to wait for another.
   The structure "synchronizer_t" has all the needed data; methods are:
   init (called by one thread)
   wait (to wait for a thread)
   done (to indicate thread is done) */
typedef struct __synchronizer_t {
pthread_mutex_t lock;
pthread_cond_t cond;
int done;
} synchronizer_t;
synchronizer_t s;
void signal_init(synchronizer_t *s) {
pthread_mutex_init(&s->lock, NULL);
pthread_cond_init(&s->cond, NULL);
s->done = 0;
}
void signal_done(synchronizer_t *s) {
pthread_mutex_lock(&s->lock);
s->done = 1;
pthread_cond_signal(&s->cond);
pthread_mutex_unlock(&s->lock);
19
}
void signal_wait(synchronizer_t *s) {
pthread_mutex_lock(&s->lock);
while (s->done == 0)
pthread_cond_wait(&s->cond, &s->lock);
pthread_mutex_unlock(&s->lock);
}
void* worker(void* arg) {
printf("this should print first\n");
signal_done(&s);
return NULL;
}
int main(int argc, char *argv[]) {
pthread_t p;
signal_init(&s);
pthread_create(&p, NULL, worker, NULL);
signal_wait(&s);
printf("this should print last\n");
return 0;
}
/*
vishnu@mannava:~/threads$ cc main-signal-cv.c -lpthread
vishnu@mannava:~/threads$ ./a.out
this should print first
this should print last
20
*/
Thank you
19CS2106R​
Operating Systems Design​
Session 34: Shared Memory Interprocess
communication

© 2020 KL University
Shared Memory
• Normally, the Unix kernel prohibits one process from accessing
(reading, writing) memory belonging to another process
• Sometimes, however, this restriction is inconvenient
• At such times, System V IPC Shared Memory can be created to
specifically allow one process to read and/or write to memory created
by another process
• Efficiency:
• unlike message queues and pipes, which copy data from the process into
memory within the kernel, shared memory is directly accessed
• Shared memory resides in the user process memory, and is then shared
among other processes
Disadvantages of Shared Memory
• No automatic synchronization as in pipes or message queues (you
have to provide the synchronization yourself). Synchronize with semaphores
or signals.
• You must remember that pointers are only valid within a given
process. Thus, pointer offsets cannot be assumed to be valid across
inter-process boundaries. This complicates the sharing of linked lists
or binary trees.
Shared Memory
Sharing the part of virtual memory and reading to and writing
from it, is another way for the processes to communicate. The
system calls are:
• shmget creates a new region of shared memory or returns an
existing one.
• shmat logically attaches a region to the virtual address space of
a process.
• shmdt logically detaches a region.
• shmctl manipulates the parameters associated with the region.
Shared Memory
• Allows multiple processes to share a region of
memory
• Fastest form of IPC: no need of data copying between client & server

• If a shared memory segment is attached,
it becomes part of the process’s data space, and is shared among multiple processes

• Readers and writers may use semaphores to
synchronize access to a shared memory segment (see the sketch below)
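A hypothetical sketch of that pattern: an unnamed POSIX semaphore placed inside the System V segment, so related processes can synchronize on it (all names are illustrative; assumes process-shared semaphores are available; compile with -lpthread):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <semaphore.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/wait.h>

typedef struct {
    sem_t sem;           /* protects data[] */
    char  data[256];
} shmseg_t;

int main(void)
{
    int shmid = shmget(IPC_PRIVATE, sizeof(shmseg_t), IPC_CREAT | 0666);
    if (shmid < 0) { perror("shmget"); exit(1); }

    shmseg_t *seg = shmat(shmid, NULL, 0);
    if (seg == (void *) -1) { perror("shmat"); exit(1); }

    sem_init(&seg->sem, 1, 1);        /* pshared = 1: usable across processes */

    if (fork() == 0) {                /* child: writer */
        sem_wait(&seg->sem);
        strcpy(seg->data, "hello from child");
        sem_post(&seg->sem);
        _exit(0);
    }

    wait(NULL);                       /* parent: reader */
    sem_wait(&seg->sem);
    printf("parent read: %s\n", seg->data);
    sem_post(&seg->sem);

    shmdt(seg);
    shmctl(shmid, IPC_RMID, NULL);    /* clean up the segment */
    return 0;
}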
Shared Memory Segment Structure
• Each shared memory has a structure
struct shmid_ds {
struct ipc_perm shm_perm;
struct anon_map *shm_amp; /* pointer in kernel */
int shm_segsz; /* size of segment in bytes */
ushort shm_lkcnt; /* # of times segment is being locked */
pid_t shm_lpid; /* pid of last shmop() */
pid_t shm_cpid; /* pid of creator */
ulong shm_nattch; /* # of current attaches */
ulong shm_cnattch; /* used only for shminfo() */
time_t shm_atime; /* last-attach time */
time_t shm_dtime; /* last-detach time */
time_t shm_ctime; /* last-change time */
};

• We can get the structure using shmctl() function.


• Actually, however, we don’t need to know the structure in detail.
shmget()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmget(key_t key, int size, int flag);


Returns: shared memory ID if OK, -1 on error

• Obtain a shared memory identifier


• size: is the size of the shared memory segment
• flag: ipc_perm.mode
• Example
o shmId = shmget(key, size, PERM|IPC_CREAT|IPC_EXCL|0666);
Syntax for shmget:
shmid = shmget(key, size, flag);
where size is the number of bytes in the region. If the region is to
be created, allocreg is used. It sets a flag in the shared memory
table to indicate that the memory is not allocated to the region.
The memory is allocated only when the region gets attached. A
flag is set in the region table which indicates that the region
should not be freed when the last process referencing it exits.
The data structures are shown below:
shmat()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

void *shmat (int shmid, void *addr, int flag);


Returns: pointer to shared memory segment if OK, -1 on error

• Attaches a shared memory segment to an address

• flag = SHM_RDONLY: the segment is read-only

• addr==0: at the first address selected by the kernel (recommended!)

• addr!=0: at the address given by addr


Memory Layout
[Figure: typical memory layout, from high address to low: command-line
arguments and environment variables; stack (0xf7fffb2c); shared memory
(0xf77d0000–0xf77e86a0); heap (malloc of 100,000 bytes,
0x00024c28–0x0003d2c8); uninitialized data (bss) (array[] of 40,000 bytes);
initialized data; text.]
Data Structures for Shared Memory
Algorithm: shmat
/* Algorithm: shmat
* Input: shared memory descriptor
* virtual addresses to attach memory
* flags
* Output: virtual address where memory was attached
*/
{ check validity of descriptor, permissions;
if (user specified virtual address)
{
round off virtual address, as specified by flags;
check legality of virtual address, size of region;
}
else // user wants kernel to find good address
kernel picks virtual address: error if none available;
attach region to process address space (algorithm: attachreg);
if (region being attached for first time)
allocate page tables, memory for region (algorithm: growreg);
return (virtual address where attached);
}
If the address where the region is to be attached is given as 0, the kernel chooses a convenient virtual address. If the
calling process is the first process to attach that region, it means that page tables and memory are not allocated for that
region; therefore, the kernel allocates both using growreg.
The syntax for shmdt:
shmdt(addr);
where addr is the virtual address returned by a prior shmat call.
The kernel searches for the process region attached at the
indicated virtual address and detaches it using detachreg.
Because the region tables have no back pointers to the shared
memory table, the kernel searches the shared memory table for
the entry that points to the region and adjusts the field for the time
the region was last detached.

Syntax of shmctl:
shmctl(id, cmd, shmstatbuf);
which is similar to msgctl
shmdt()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmdt(const void *addr);


Returns: 0 if OK, -1 on error

• Detach a shared memory segment


shmctl()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

int shmctl(int shmid, int cmd, struct shmid_ds *buf);


Returns: 0 if OK, -1 on error
• Performs various shared memory operations
• cmd = IPC_STAT:
fetch the shmid_ds structure into buf
• cmd = IPC_SET:
set the following three fields from buf: shm_perm.uid, shm_perm.gid, and
shm_perm.mode
• cmd = IPC_RMID:
remove the shared memory segment from the system
Program to Demonstrate System V Shared memory
/* writememory.c – Program to write data into the attached shared memory segment */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int main()
{ char *str;
int shmid;
key_t key = ftok("sharedmem",'a');
if ((shmid = shmget(key, 1024,0666|IPC_CREAT)) < 0) {
perror("shmget");
exit(1);
}
if ((str = shmat(shmid, NULL, 0)) == (char *) -1) {
perror("shmat");
exit(1);
}
printf("Enter the string to be written in memory : ");
fgets(str, 1024, stdin); /* fgets() instead of the unsafe gets() */
printf("String written in memory: %s\n",str);
shmdt(str);
return 0;
}
Program to Demonstrate System V Shared memory
/* readmemory.c – Program to read data from the attached shared memory segment */
#include <stdio.h>
#include <stdlib.h>
#include <sys/ipc.h>
#include <sys/shm.h>
int main()
{ int shmid;
char * str;
key_t key = ftok("sharedmem",'a');
if ((shmid = shmget(key, 1024,0666|IPC_CREAT)) < 0) {
perror("shmget");
exit(1);
}
if ((str = shmat(shmid, NULL, 0)) == (char *) -1) {
perror("shmat");
exit(1);
}
printf("Data read from memory: %s\n",str);
shmdt(str);
shmctl(shmid,IPC_RMID,NULL);
return 0;
}
Program to Demonstrate System V Shared memory – run
writememory.c and readmemory.c in separate terminals
[vishnu@team-osd ~]$ vi writememory.c
[vishnu@team-osd ~]$ cc writememory.c -o writememory
[vishnu@team-osd ~]$ ./writememory
Enter the string to be written in memory : hello
String written in memory: hello

[vishnu@team-osd ~]$ ipcs -m

------ Shared Memory Segments --------
key        shmid   owner  perms  bytes  nattch  status
0xffffffff 688285  vishnu 666    1024   0

[vishnu@team-osd ~]$ vi readmemory.c
[vishnu@team-osd ~]$ cc readmemory.c -o readmemory
[vishnu@team-osd ~]$ ./readmemory
Data read from memory: hello
Thank you
19CS2106R​
Operating Systems Design​
Session 33: Message Queue Interprocess
communication

© 2020 KL University
UNIX provides several different IPC mechanisms.
Interprocess interactions have several distinct purposes:
• Data transfer — One process may wish to send data to another process. The amount of data sent may vary
from one byte to several megabytes.
• Sharing data — Multiple processes may wish to operate on shared data, such that if a process modifies the
data, that change will be immediately visible to other processes sharing it.
• Event notification — A process may wish to notify another process or set of processes that some event has
occurred. For instance, when a process terminates, it may need to inform its parent process. The receiver
may be notified asynchronously, in which case its normal processing is interrupted. Alternatively, the
receiver may choose to wait for the notification.
• Resource sharing — Although the kernel provides default semantics for resource allocation, they are not
suitable for all applications. A set of cooperating processes may wish to define their own protocol for
accessing specific resources. Such rules are usually implemented by a locking and synchronization scheme,
which must be built on top of the basic set of primitives provided by the kernel.
• Process control — A process such as a debugger may wish to assume complete control over the execution
of another (target) process. The controlling process may wish to intercept all traps and exceptions intended
for the target and be notified of any change in the target's state.
Interprocess Communication
Processes have to communicate to exchange data and to synchronize
operations. Several forms of interprocess communication are: pipes,
named pipes and signals. But each one has some limitations.

• System V IPC
The UNIX System V IPC package consists of three mechanisms:
1. Message Queues/Messages allow processes to send formatted data
streams to arbitrary processes.
2. Shared memory allows processes to share parts of their virtual
address space. an area of memory accessible by multiple processes.
3. Semaphores allow processes to synchronize execution. Can be used
to implement critical-section problems; allocation of resources.
IPC System Calls

Functionality    Message Queue     Semaphore   Shared Memory
Allocate IPC     msgget            semget      shmget
Access IPC       msgsnd, msgrcv    semop       shmat, shmdt
IPC Control      msgctl            semctl      shmctl

msg/sem/shm get: create a new or open an existing IPC structure; returns an IPC identifier.
msg/sem/shm ctl: determine status, set options and/or permissions; remove an IPC identifier.
msg/sem/shm op: operate on an IPC identifier. For example (message queue):
add a new msg to a queue (msgsnd), receive a msg from a queue (msgrcv).
They share common properties:
• Each mechanism contains a table whose entries describe all instances of
the mechanism.
• Each entry contains a numeric key, which is its user-chosen name.
• Each mechanism contains a "get" system call to create a new entry or to
retrieve an existing one, and parameters to the calls include a key and
flags.
• Processes can call the "get" system calls with the key IPC_PRIVATE to
assure the return of an unused entry. They can set the IPC_CREAT bit in
the flag field to create a new entry if one by the given key does not already
exist, and they can force an error notification by setting the IPC_EXCL and
IPC_CREAT flags, if an entry already exists for the key.
• The kernel uses the following formula to find the index into the table of
data structures from the descriptor: index = descriptor modulo (number of
entries in table);
• Each entry has a permissions structure that includes the user ID and group
ID of the process that created the entry, a user and group ID set by the
"control" system call (studied below), and a set of read-write-execute
permissions for user, group, and others, similar to the file permission
modes.
• Each entry contains other information such as the process ID of the last
process to update the entry, and time of last access or update.
They share common properties:
• Each mechanism contains a "control" system call to query status of an entry, to set status
information, or to remove the entry from the system.
• When a user allocates an IPC resource, the kernel returns the resource ID or descriptor,
which it computes by the formula
• Id or descriptor = seq * table size + index;
• where seq is the sequence number of this resource, table size is the size of the resource
table, and index is the index of the resource in the table. This ensures that a new id is
generated if a table element is reused, since seq is incremented. This prevents processes
from accessing a resource using a stale id.
• For example, if the table of message structures contains 100 entries, the descriptors for
entry 1 are 1, 101, 201, and so on. When a process removes an entry, the kernel
increments the descriptor associated with it by the number of entries in the table: The
incremented value becomes the new descriptor for the entry when it is next allocated by a
"get" call. Processes that attempt to access the entry by its old descriptor fail on their
access. Referring to the previous example, if the descriptor associated with message
entry 1 is 201 when it is removed, the kernel assigns a new descriptor, 301, to the entry.
Processes that attempt to access descriptor 201 receive an error, because it is no longer
valid. The kernel eventually recycles descriptor numbers, presumably after a long time
lapse.
Messages/Message Queues
• There are four system calls for messages/messages queues:
1. msgget returns (and possibly creates) a message descriptor.
2. msgctl has options to set and return parameters associated
with a message descriptor and an option to remove descriptors.
3. msgsnd sends a message.
4. msgrcv receives a message.
Permission Structure
• ipc_perm is associated with each IPC structure.

• Defines the permissions and owner.


struct ipc_perm {
uid_t uid; /* owner's effective user id */
gid_t gid; /* owner's effective group id */
uid_t cuid; /* creator's effective user id */
gid_t cgid; /* creator’s effective group id */
mode_t mode; /* access modes */
ulong seq; /* slot usage sequence number */
key_t key; /* key */
};
Message Queues
• One process establishes a message queue that others may access.
Often a server will establish a message queue that multiple clients
can access
• Features of Message Queues
• A process generating a message may specify its type when it places the
message in a message queue.
• Another process accessing the message queue can use the message
type to selectively read only messages of specific type(s) in a first-in-
first-out manner.
• Message queues provide a user with a means of multiplexing data from
one or more producer(s) to one or more consumer(s).
Message Queue Structure
Message Queues
• Each queue has a structure
struct msqid_ds {
struct ipc_perm msg_perm;
struct msg *msg_first; /* ptr to first msg on queue */
struct msg *msg_last; /* ptr to last msg on queue */
ulong msg_cbytes; /* current # bytes on queue */
ulong msg_qnum; /* # msgs on queue */
ulong msg_qbytes; /* max # bytes on queue */
pid_t msg_lspid; /* pid of last msgsnd() */
pid_t msg_lrpid; /* pid of last msgrcv() */
time_t msg_stime; /* last-msgsnd() time */
time_t msg_rtime; /* last-msgrcv() time */
time_t msg_ctime; /* last-change time */
};

• We can get the structure using msgctl() function.


• Actually, however, we don’t need to know the structure in detail.
Syntax of msgget:
msgqid = msgget(key, flag);
where msgqid is the descriptor returned by the call, and key and flag
have the semantics described above for the general "get" calls. The
kernel stores messages on a linked list (queue) per descriptor, and it
uses msgqid as an index into an array of message queue headers.
The queue structure contains the following fields, in addition to the
common fields:
• Pointers to the first and last messages on a linked list.
• The number of messages and total number of data bytes on the
linked list.
• The maximum number of bytes of data that can be on the linked list.
• The process IDs of the last processes to send and receive
messages.
• Time stamps of the last msgsnd, msgrcv, and msgctl operations.
Syntax of msgget:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgget(key_t key, int flag);


Returns: msg queue ID if OK, -1 on error

• Create new or open existing queue


• flag : ipc_perm.mode
• Example
msg_qid = msgget(DEFINED_KEY, IPC_CREAT | 0666);
Syntax of msgsnd:
msgsnd(msgqid, msg, count, flag);

flag describes the action the kernel should take if it runs out of internal
buffer space. The algorithm is given below:

/* Algorithm: msgsnd
 * Input: message queue descriptor
 *        address of message structure
 *        size of message
 *        flags
 * Output: number of bytes sent
 */
{
    check legality of descriptor, permissions;
    while (not enough space to store message)
    {
        if (flags specify not to wait)
            return;
        sleep(event: enough space is available);
    }
    get message header;
    read message text from user space to kernel;
    adjust data structures: enqueue message header,
        message header points to data, counts,
        time stamps, process ID;
    wakeup all processes waiting to read message from queue;
}
The diagram shows the structure of message
queues:
msgsnd()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgsnd(int msqid, const void *ptr, size_t nbytes, int flag);
Returns: 0 if OK, -1 on error
• msgsnd() places a message at the end of the queue.
o ptr: pointer that points to a message
o nbytes: length of message data
o if flag = IPC_NOWAIT: IPC_NOWAIT is similar to the nonblocking I/O flag for file I/O.
• Structure of messages

struct mymesg {
long mtype; /* positive message type */
char mtext[512]; /* message data, of length nbytes */
};
Syntax for msgrcv:
count = msgrcv(id, msg, maxcount, type, flag);

/* Algorithm: msgrcv
 * Input: message descriptor
 *        address of data array for incoming message
 *        size of data array
 *        requested message type
 *        flags
 * Output: number of bytes in returned message
 */
{
    check permissions;
loop:
    check legality of message descriptor;
    // find message to return to user
    if (requested message type == 0)
        consider first message on queue;
    else if (requested message type > 0)
        consider first message on queue with given type;
    else // requested message type < 0
        consider first of the lowest typed messages on queue,
            such that its type is <= absolute value of requested type;
    if (there is a message)
    {
        adjust message size or return error if user size too small;
        copy message type, text from kernel space to user space;
        unlink message from queue;
        return;
    }
    // no message
    if (flags specify not to sleep)
        return with error;
    sleep(event: message arrives on queue);
    goto loop;
}
msgrcv()
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgrcv(int msqid, void *ptr, size_t nbytes, long type, int flag);
Returns: data size in message if OK, -1 on error
• msgrcv() retrieves a message from a queue.
• type == 0: the first message on the queue is returned
• type > 0: the first message on the queue whose message type equals type is returned
• type < 0: the first message on the queue whose message type is the lowest value less than or equal
to the absolute value of type is returned
• flag may be given by IPC_NOWAIT
Algorithm: msgrcv
• If processes were waiting to send messages because there was no
room on the list, the kernel awakens them after it removes a
message from the message queue. If a message is bigger
than maxcount, the kernel returns an error for the system call and leaves
the message on the queue. If a process ignores the size constraints
(MSG_NOERROR bit is set in flag), the kernel truncates the
message, returns the requested number of bytes, and removes the
entire message from the list.
• If the type is a positive integer, the kernel returns the first message of
the given type. If it is a negative, the kernel finds the lowest type of
all message on the queue, provided it is less than or equal to the
absolute value of the type, and returns the first message of that type.
For example, if a queue contains three messages whose types are
3, 1, and 2, respectively, and a user requests a message with type -2,
the kernel returns the message of type 1.
The syntax of msgctl:
msgctl(id, cmd, mstatbuf);
where cmd specifies the type of command, and mstatbuf is the address of a
user data structure that will contain control parameters or the results of a
query.
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>

int msgctl(int msqid, int cmd, struct msqid_ds *buf);
Returns: 0 if OK, -1 on error

Performs various operations on a queue:
• cmd = IPC_STAT:
fetch the msqid_ds structure for this queue, storing it in buf
• cmd = IPC_SET:
set the following four fields from buf: msg_perm.uid, msg_perm.gid,
msg_perm.mode, and msg_qbytes
• cmd = IPC_RMID:
remove the message queue.
example: sender.c – send/store 3 messages into MQ
#include <stdio.h> // sender.c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define DEFINED_KEY 0x10101010
int main(int argc, char **argv)
{
int msg_qid;
struct {
long mtype;
char content[256];
} msg;
fprintf(stdout, "=========SENDER==========\n");
if((msg_qid = msgget(DEFINED_KEY, IPC_CREAT | 0666)) < 0) {
perror("msgget: "); exit(-1);
}
msg.mtype = 1; int i=3;
while(i--) {
memset(msg.content, 0x0, 256);
fgets(msg.content, sizeof(msg.content), stdin); /* fgets() instead of unsafe gets() */
if(msgsnd(msg_qid, &msg, sizeof(msg.content), 0) < 0) {
perror("msgsnd: "); exit(-1);
}
}
return 0;
}
example: receiver.c – fetch 3 messages from MQ
#include <stdio.h> // receiver.c
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/msg.h>
#define DEFINED_KEY 0x10101010
int main(int argc, char **argv)
{
int msg_qid;
struct {
long mtype;
char content[256];
} msg;
fprintf(stdout, "=========RECEIVER==========\n");
if((msg_qid = msgget(DEFINED_KEY, IPC_CREAT | 0666)) < 0) {
perror("msgget: "); exit(-1);
}
int i=3;
while(i--) {
memset(msg.content, 0x0, 256);
if(msgrcv(msg_qid, &msg, 256, 0, 0) < 0) {
perror("msgrcv: "); exit(-1);
}
puts(msg.content);
}
return 0;
}
Execute sender.c and receiver.c on two different terminals

[vishnu@team-osd ~]$ vi sender.c
[vishnu@team-osd ~]$ cc sender.c -o sender
[vishnu@team-osd ~]$ ./sender
=========SENDER==========
i
am
happy

[vishnu@team-osd ~]$ vi receiver.c
[vishnu@team-osd ~]$ cc receiver.c -o receiver
[vishnu@team-osd ~]$ ./receiver
=========RECEIVER==========
i
am
happy
Thank you
19CS2106R​

Operating Systems Design​


Session 32: Locking: Spin Locks

© 2020 KL University
Fetch-And-Add

• Atomically increment a value while returning
the old value at a particular address.

Fetch-And-Add hardware atomic instruction (C-style):
1 int FetchAndAdd(int *ptr) {
2 int old = *ptr;
3 *ptr = old + 1;
4 return old;
5 }


Ticket Lock
• Ticket lock can be built with fetch-and add.
– Ensure progress for all threads.  fairness
1 typedef struct __lock_t {
2 int ticket;
3 int turn;
4 } lock_t;
5
6 void lock_init(lock_t *lock) {
7 lock->ticket = 0;
8 lock->turn = 0;
9 }
10
11 void lock(lock_t *lock) {
12 int myturn = FetchAndAdd(&lock->ticket);
13 while (lock->turn != myturn)
14 ; // spin
15 }
16 void unlock(lock_t *lock) {
17 FetchAndAdd(&lock->turn);
18 }
So Much Spinning
• Hardware-based spin locks are simple and they
work.

• In some cases, these solutions can be quite


inefficient.
– Any time a thread gets caught spinning, it wastes an
entire time slice doing nothing but checking a value.

How To Avoid Spinning?


We’ll need OS Support too!
A Simple Approach: Just Yield
• When you are going to spin, give up the CPU to another
thread.
– OS system call moves the caller from the running state to
the ready state.
– The cost of a context switch can be substantial and the
starvation problem still exists.
1 void init() {
2 flag = 0;
3 }
4
5 void lock() {
6 while (TestAndSet(&flag, 1) == 1)
7 yield(); // give up the CPU
8 }
9
10 void unlock() {
11 flag = 0;
12 }
Lock with Test-and-set and Yield
Using Queues: Sleeping Instead of
Spinning
• Queue to keep track of which threads are
waiting to enter the lock.
• park()
– Put a calling thread to sleep
• unpark(threadID)
– Wake a particular thread as designated by
threadID.
Using Queues: Sleeping Instead of
Spinning
1 typedef struct __lock_t { int flag; int guard; queue_t *q; } lock_t;
2
3 void lock_init(lock_t *m) {
4 m->flag = 0;
5 m->guard = 0;
6 queue_init(m->q);
7 }
8
9 void lock(lock_t *m) {
10 while (TestAndSet(&m->guard, 1) == 1)
11 ; // acquire guard lock by spinning
12 if (m->flag == 0) {
13 m->flag = 1; // lock is acquired
14 m->guard = 0;
15 } else {
16 queue_add(m->q, gettid());
17 m->guard = 0;
18 park();
19 }
20 }
21 …
Using Queues: Sleeping Instead of
Spinning
22 void unlock(lock_t *m) {
23 while (TestAndSet(&m->guard, 1) == 1)
24 ; // acquire guard lock by spinning
25 if (queue_empty(m->q))
26 m->flag = 0; // let go of lock; no one wants it
27 else
28 unpark(queue_remove(m->q)); // hold lock (for next thread!)
29 m->guard = 0;
30 }

Lock With Queues, Test-and-set, Yield, And Wakeup (Cont.)


Thread Creation
• How to create and control threads?
#include <pthread.h>

int
pthread_create( pthread_t* thread,
const pthread_attr_t* attr,
void* (*start_routine)(void*),
void* arg);

– thread: Used to interact with this thread.


– attr: Used to specify any attributes this thread might
have.
• Stack size, Scheduling priority, …
– start_routine: the function this thread starts
running in.
– arg: the argument to be passed to the function
(start routine)
• a void pointer allows us to pass in any type of argument.
Thread Creation (Cont.)
• If start_routine instead required another
type argument, the declaration would look like
this:
– An integer argument:
int
pthread_create(…, // first two args are the same
void* (*start_routine)(int),
int arg);

– Return an integer:
int
pthread_create(…, // first two args are the same
int (*start_routine)(void*),
void* arg);
Example: Creating a Thread
#include <stdio.h>
#include <pthread.h>

typedef struct __myarg_t {
    int a;
    int b;
} myarg_t;

void *mythread(void *arg) {
    myarg_t *m = (myarg_t *) arg;
    printf("%d %d\n", m->a, m->b);
    return NULL;
}

int main(int argc, char *argv[]) {
    pthread_t p;
    int rc;

    myarg_t args;
    args.a = 10;
    args.b = 20;
    rc = pthread_create(&p, NULL, mythread, &args);
    pthread_join(p, NULL); /* wait: args lives on main's stack */
    return 0;
}
Wait for a thread to complete
int pthread_join(pthread_t thread, void **value_ptr);

– thread: Specify which thread to wait for


– value_ptr: A pointer to the return value
• Because pthread_join() routine changes the
value, you need to pass in a pointer to that value.
Example: Waiting for Thread
Completion
1 #include <stdio.h>
2 #include <pthread.h>
3 #include <assert.h>
4 #include <stdlib.h>
5
6 typedef struct __myarg_t {
7 int a;
8 int b;
9 } myarg_t;
10
11 typedef struct __myret_t {
12 int x;
13 int y;
14 } myret_t;
15
16 void *mythread(void *arg) {
17 myarg_t *m = (myarg_t *) arg;
18 printf("%d %d\n", m->a, m->b);
19 myret_t *r = malloc(sizeof(myret_t));
20 r->x = 1;
21 r->y = 2;
22 return (void *) r;
23 }
24
Example: Waiting for Thread
Completion (Cont.)
25 int main(int argc, char *argv[]) {
26 int rc;
27 pthread_t p;
28 myret_t *m;
29
30 myarg_t args;
31 args.a = 10;
32 args.b = 20;
33 pthread_create(&p, NULL, mythread, &args);
34 pthread_join(p, (void **) &m); // this thread has been waiting
// inside of the pthread_join() routine.
35 printf("returned %d %d\n", m->x, m->y);
36 return 0;
37 }
Example: Dangerous code
Be careful with how values are returned from a
thread.
1 void *mythread(void *arg) {
2 myarg_t *m = (myarg_t *) arg;
3 printf("%d %d\n", m->a, m->b);
4 myret_t r; // ALLOCATED ON STACK: BAD!
5 r.x = 1;
6 r.y = 2;
7 return (void *) &r;
8 }

 When mythread returns, its stack variable r is automatically
deallocated, so the returned pointer is dangling.
Example: Simpler Argument Passing
to a Thread
Just passing in a single value
1 void *mythread(void *arg) {
2 int m = (int) arg;
3 printf("%d\n", m);
4 return (void *) (arg + 1);
5 }
6
7 int main(int argc, char *argv[]) {
8 pthread_t p;
9 int rc, m;
10 pthread_create(&p, NULL, mythread, (void *) 100);
11 pthread_join(p, (void **) &m);
12 printf("returned %d\n", m);
13 return 0;
14 }
CO4 – Concurrency
19CS2106R​

Operating Systems Design​


Session 31: Locking: Spin Locks

© 2020 KL University
Recap of CO3
• Operating system organization: creating and running the first process, Page
tables: Paging, hardware, Process address space
• Page tables: Physical memory allocation
• Systems calls, exceptions and interrupts, Assembly trap handlers
• Disk driver and Disk scheduling
• Manipulation of the process address space
• Page tables: User part of an address space, sbrk, exec
• memory management policies: swapping, demand paging
• memory management policies: Page faults and replacement algorithms
• TLB, Segmentation
• Hybrid approach: paging and Segmentation, Multi-level paging
CO4 Topics
• Locking
• Inter-process communication
• Models of Inter-process communication
• Thread API, Conditional Variable
• Mutex, Concurrent Linked List
• Semaphores
• Concurrency Control Problems
• Deadlocks
• Boot Loader
Process memory layout
A program is a file containing a range of information that describes how to construct a process at run time.
The memory allocated to each process is composed of a number of parts, usually referred to as segments.
These segments are as follows:
a. Text: the instructions of the program.
b. The initialized data segment contains global and static variables that are explicitly Initialized
c. The uninitialized data segment contains global and static variables that are not explicitly initialized.
d. Heap: an area from which programs can dynamically allocate extra memory.
e. Stack: a piece of memory that grows and shrinks as functions are called and return and that is used to
allocate storage for local variables and function call linkage information
Several more segment types exist in an a.out, containing the symbol table, debugging information, linkage
tables for dynamic shared libraries, and the like. These additional sections don't get loaded as part of the
program's image executed by a process.
The size(1) command reports the sizes (in bytes) of the text, data, and bss segments. For example:

$ size /usr/bin/cc /bin/sh


text data bss dec hex filename
79606 1536 916 82058 1408a /usr/bin/cc
619234 21120 18260 658614 a0cb6 /bin/sh
Stack and Heap Segment
Stack Segment
• The stack segment is used to store local variables, function
parameters, and the return address. (A return address is the memory
address where a CPU will continue its execution after the return from
a function call).
• Local variables are declared inside the opening left curly brace of a
function body, including the main() or other left curly braces that are
not defined as static. Thus, the scopes of those variables are limited
to the function’s body. The life of a local variable is defined until the
execution control is within the respective function body.
Heap Segment
• The heap area is allocated to each process by the OS when the process is created.
Dynamic memory is obtained from the heap. They are obtained with the help of the
malloc(), calloc(), and realloc() function calls. Memory from the heap can only be
accessed via pointers. Process address space grows and shrinks at runtime as
memory gets allocated and deallocated. Memory is given back to the heap using
free(). Data structures such as linked lists and trees can be easily implemented
using heap memory. Keeping track of heap memory is an overhead. If not utilized
Typical memory arrangement
Typical memory arrangement
Race Condition Example - 1
As an example of why we need locks, consider several processors sharing a single
disk, such as the IDE disk in xv6. The disk driver maintains a linked list of the outstanding disk requests (4226)
and processors may add new requests to the list concurrently (4354). If there were no concurrent requests, you
might implement the linked list as follows:
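The insert code that the following discussion refers to is not reproduced on the slide; this sketch is the version given in the xv6 book, with the two statements the text calls line 15 and line 16 marked in comments:

struct list {
    int data;
    struct list *next;
};

struct list *list = 0;

void
insert(int data)
{
    struct list *l;

    l = malloc(sizeof *l);
    l->data = data;
    l->next = list;   /* line 15: point the new node at the current head */
    list = l;         /* line 16: publish the new node as the head */
}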
Race Condition: when multiple CPUs update the same data simultaneously,
such parallel access is likely, without careful design, to yield
incorrect results or a broken data structure.

A race condition occurs when multiple processes are trying to do something with shared data and the
final outcome depends on the order in which the processes run.
Race Condition
This implementation is correct if executed in isolation. However, the code is not
correct if more than one copy executes concurrently. If two CPUs execute insert
at the same time, it could happen that both execute line 15 before either executes
line 16 (see Figure 4-1). If this happens, there will now be two list nodes with next set
to the former value of list. When the two assignments to list happen at line 16, the
second one will overwrite the first; the node involved in the first assignment will be
lost.
The lost update at line 16 is an example of a race condition. A race condition is
a situation in which a memory location is accessed concurrently, and at least one
access is a write. A race is often a sign of a bug, either a lost update (if the
accesses are writes) or a read of an incompletely-updated data structure. The
outcome of a race depends on the exact timing of the two CPUs involved and how
their memory operations are ordered by the memory system, which can make
race-induced errors difficult to reproduce and debug.
Race Condition Example – 2 (figure: shared variable i = 5)
• You also need to synchronize two or more threads
that might try to modify the same variable at the
same time. Consider the case in which you
increment a variable. The increment operation is
usually broken down into three steps:
1. Read the memory location into a register.
2. Increment the value in the register.
3. Write the new value back to the memory
location.
• When two or more processes are reading or writing
some shared data and the final result depends on
who runs precisely when, we have a race condition.
• A race condition occurs when two or more operations
happen in an undefined order.
• Race conditions should be avoided because they can
cause subtle errors in applications and are difficult to
debug.
Simple example of the kind of problems that can occur
when shared resources are not accessed atomically.
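A minimal sketch of this race using POSIX threads (a hypothetical example, not from the slides): two threads each increment a shared counter one million times, and because counter++ is the three-step read-modify-write described above, the final value is usually less than 2,000,000. Compile with gcc -pthread.

#include <pthread.h>
#include <stdio.h>

int counter = 0;                 /* shared variable */

void *worker(void *arg)
{
    for (int i = 0; i < 1000000; i++)
        counter++;               /* read, increment, write back: not atomic */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d (expected 2000000)\n", counter);
    return 0;
}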
Sequence-number-increment Problem Example - 3
• The technique used by the print spoolers is to have a file for each printer that
contains the next sequence number to be used. The file is just a single line
containing the sequence number in ASCII. Each process that needs to assign a
sequence number goes through three steps:
1. it reads the sequence number from the file,
2. it uses the number,
3. it increments the number and writes it back.
• The problem is that in the time a single process takes to execute these three steps,
another process can perform the same three steps. Chaos can result, as we will see
in some examples that follow.
Sequence-number-increment problem
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define MAXLINE  4096          /* max text line length */
#define SEQFILE  "seqno"       /* filename */
#define LOCKFILE "seqno.lock"

void my_lock(int), my_unlock(int);

int main(int argc, char **argv)
{
    int fd;
    long i, seqno;
    pid_t pid;
    ssize_t n;
    char line[MAXLINE + 1];

    pid = getpid();
    fd = open(SEQFILE, O_RDWR, 0666);
    for (i = 0; i < 20; i++) {
        my_lock(fd);                      /* lock the file */
        lseek(fd, 0L, SEEK_SET);          /* rewind before read */
        n = read(fd, line, MAXLINE);
        line[n] = '\0';                   /* null terminate for sscanf */
        n = sscanf(line, "%ld\n", &seqno);
        printf("%s: pid = %ld, seq# = %ld\n", argv[0], (long) pid, seqno);
        seqno++;                          /* increment sequence number */
        snprintf(line, sizeof(line), "%ld\n", seqno);
        lseek(fd, 0L, SEEK_SET);          /* rewind before write */
        write(fd, line, strlen(line));
        my_unlock(fd);                    /* unlock the file */
    }
    exit(0);
}

/* These versions of my_lock and my_unlock do nothing: no locking at all */
void my_lock(int fd)
{
    return;
}

void my_unlock(int fd)
{
    return;
}
If the sequence number in the file is initialized to one, and a single copy of the program
is run, we get the following output:

[vishnu@team-osd ~]$ cc seqnumnolock.c
[vishnu@team-osd ~]$ vi seqno
[vishnu@team-osd ~]$ ./a.out
./a.out: pid = 5448, seq# = 1
./a.out: pid = 5448, seq# = 2
./a.out: pid = 5448, seq# = 3
./a.out: pid = 5448, seq# = 4
./a.out: pid = 5448, seq# = 5
./a.out: pid = 5448, seq# = 6
./a.out: pid = 5448, seq# = 7
./a.out: pid = 5448, seq# = 8
./a.out: pid = 5448, seq# = 9
./a.out: pid = 5448, seq# = 10
./a.out: pid = 5448, seq# = 11
./a.out: pid = 5448, seq# = 12
./a.out: pid = 5448, seq# = 13
./a.out: pid = 5448, seq# = 14
./a.out: pid = 5448, seq# = 15
./a.out: pid = 5448, seq# = 16
./a.out: pid = 5448, seq# = 17
./a.out: pid = 5448, seq# = 18
./a.out: pid = 5448, seq# = 19
./a.out: pid = 5448, seq# = 20

When the sequence number is again initialized to one, and the program is run twice
in the background, we have the following output:

[vishnu@team-osd ~]$ vi seqno
[vishnu@team-osd ~]$ ./a.out & ./a.out &
[1] 7891
[2] 7892
[vishnu@team-osd ~]$ ./a.out: pid = 7892, seq# = 1
./a.out: pid = 7892, seq# = 2
./a.out: pid = 7892, seq# = 3
./a.out: pid = 7892, seq# = 4
./a.out: pid = 7892, seq# = 5
./a.out: pid = 7892, seq# = 6
./a.out: pid = 7892, seq# = 7
./a.out: pid = 7892, seq# = 8
./a.out: pid = 7892, seq# = 9
./a.out: pid = 7892, seq# = 10
./a.out: pid = 7892, seq# = 11
./a.out: pid = 7892, seq# = 12
./a.out: pid = 7892, seq# = 13
./a.out: pid = 7892, seq# = 14
./a.out: pid = 7891, seq# = 8
./a.out: pid = 7892, seq# = 15
./a.out: pid = 7892, seq# = 16
./a.out: pid = 7892, seq# = 17
./a.out: pid = 7891, seq# = 17
./a.out: pid = 7892, seq# = 18
./a.out: pid = 7891, seq# = 19
./a.out: pid = 7892, seq# = 19
./a.out: pid = 7892, seq# = 20
./a.out: pid = 7891, seq# = 20
./a.out: pid = 7891, seq# = 21
./a.out: pid = 7891, seq# = 22
./a.out: pid = 7891, seq# = 23
./a.out: pid = 7891, seq# = 24
./a.out: pid = 7891, seq# = 25
./a.out: pid = 7891, seq# = 26
./a.out: pid = 7891, seq# = 27
./a.out: pid = 7891, seq# = 28
./a.out: pid = 7891, seq# = 29
./a.out: pid = 7891, seq# = 30
./a.out: pid = 7891, seq# = 31
./a.out: pid = 7891, seq# = 32
./a.out: pid = 7891, seq# = 33
./a.out: pid = 7891, seq# = 34
./a.out: pid = 7891, seq# = 35
./a.out: pid = 7891, seq# = 36
[1]- Done ./a.out
[2]+ Done ./a.out
Critical Section
• A critical section is a block of
code that only one process at a
time can execute
• The critical section problem is
to ensure that only one process
at a time is allowed to be
operating in its critical section
• Each process takes permission
from the operating system to enter
into the critical section

do {
    entry section
        critical section
    exit section
} while (TRUE);
The term critical section is used to refer to a section of code that accesses a shared
resource and whose execution should be atomic; that is, its execution should not be
interrupted by another thread that simultaneously accesses the same shared resource.
Mutual exclusion
• If a process is executing in its critical section, then no other process is
allowed to execute in the critical section
• No two processes can be in the same critical section at the same time.
This is called mutual exclusion
Locks: The Basic Idea
• Ensure that any critical section executes as if it were a single
atomic instruction.
• An example: the canonical update of a shared variable

balance = balance + 1;

• Add some code around the critical section:

lock_t lk;              // some globally-allocated lock 'lk'
...
lock(&lk);
balance = balance + 1;
unlock(&lk);
Locks: The Basic Idea
• A lock variable holds the state of the lock.
• available (or unlocked or free)
• No thread holds the lock.
• acquired (or locked or held)
• Exactly one thread holds the lock and presumably is in a critical section.
The semantics of lock()
• lock()
• Try to acquire the lock.
• If no other thread holds the lock, the thread will acquire the lock.
• Enter the critical section.
• This thread is said to be the owner of the lock.
• Other threads are prevented from entering the critical section while the first
thread that holds the lock is in there.
Building A Lock
Efficient locks provide mutual exclusion at low cost.
Building a lock needs some help from the hardware and the
OS.
Evaluating locks – Basic criteria
• Mutual exclusion
• Does the lock work, preventing multiple threads
from entering a critical section?
• Fairness
• Does each thread contending for the lock get a
fair shot at acquiring it once it is free? (starvation)
• Performance
• The time overheads added by using the lock
Controlling Interrupts
• Disable Interrupts for critical sections
• One of the earliest solutions used to provide mutual exclusion
• Invented for single-processor systems.
void lock() {
    DisableInterrupts();
}
void unlock() {
    EnableInterrupts();
}

• Problems:
• Requires too much trust in applications
• A greedy (or malicious) program could monopolize the processor.
• Does not work on multiprocessors
• Code that masks or unmasks interrupts tends to execute slowly on modern CPUs
Why hardware support needed?
First attempt: Using a flag denoting whether the lock is held or
not.
The code below has problems.
typedef struct __lock_t { int flag; } lock_t;

void init(lock_t *mutex) {
    // 0 -> lock is available, 1 -> held
    mutex->flag = 0;
}

void lock(lock_t *mutex) {
    while (mutex->flag == 1)  // TEST the flag
        ;                     // spin-wait (do nothing)
    mutex->flag = 1;          // now SET it!
}

void unlock(lock_t *mutex) {
    mutex->flag = 0;
}
Why hardware support needed? (Cont.)
• Problem 1: No mutual exclusion (assume flag = 0 to begin)

Thread 1                                Thread 2
call lock()
while (flag == 1)
interrupt: switch to Thread 2
                                        call lock()
                                        while (flag == 1)
                                        flag = 1;
interrupt: switch to Thread 1
flag = 1; // set flag to 1 (too!)

• Problem 2: Spin-waiting wastes time waiting for another thread.

• So, we need an atomic instruction supported by the hardware:
• the test-and-set instruction, also known as atomic exchange
Test And Set (Atomic Exchange)
• An instruction to support the creation of simple locks
int TestAndSet(int *ptr, int new) {
    int old = *ptr;  // fetch old value at ptr
    *ptr = new;      // store 'new' into ptr
    return old;      // return the old value
}

• Returns (tests) the old value pointed to by ptr.
• Simultaneously updates (sets) that value to new.
• This sequence of operations is performed atomically.
A Simple Spin Lock using test-and-set
typedef struct __lock_t {
    int flag;
} lock_t;

void init(lock_t *lock) {
    // 0 indicates that lock is available,
    // 1 that it is held
    lock->flag = 0;
}

void lock(lock_t *lock) {
    while (TestAndSet(&lock->flag, 1) == 1)
        ; // spin-wait
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

• Note: To work correctly on a single processor, it requires a preemptive scheduler.
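A usage sketch (hypothetical, assuming the lock_t, init, lock, and unlock defined above): the spin lock guards the canonical balance update from earlier.

lock_t balance_lock;             // guards balance
int balance = 0;

void deposit(void)
{
    lock(&balance_lock);         // spin until we hold the lock
    balance = balance + 1;       // critical section
    unlock(&balance_lock);       // let the next thread in
}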
A Simple Spin Lock using open()
We are guaranteed that only one process at a time can create the file (i.e., obtain the lock); to release the lock,
we just unlink the file.
void
my_lock(int fd)
{
int tempfd;
while ( (tempfd = open(LOCKFILE, O_RDWR|O_CREAT|O_EXCL, 0644)) < 0) {
    if (errno != EEXIST) {
        perror("open error for lock file"); /* unexpected error: give up */
        exit(1);
    }
    /* someone else has the lock, loop around and try again */
}
close(tempfd); /* opened the file, we have the lock */
}
void
my_unlock(int fd)
{
unlink(LOCKFILE); /* release lock by removing file */
}
Evaluating Spin Locks
• Correctness: yes
• The spin lock only allows a single thread to enter the critical section.

• Fairness: no
• Spin locks don’t provide any fairness guarantees.
• Indeed, a thread spinning may spin forever.

• Performance:
• On a single CPU, performance overheads can be quite painful.
• If the number of threads roughly equals the number of CPUs, spin locks work
reasonably well.
Compare-And-Swap
• Tests whether the value at the address (ptr) is equal to expected.
• If so, updates the memory location pointed to by ptr with the new value.
• In either case, returns the actual value at that memory location.

int CompareAndSwap(int *ptr, int expected, int new) {
    int actual = *ptr;
    if (actual == expected)
        *ptr = new;
    return actual;
}

Compare-and-Swap hardware atomic instruction (C-style)

void lock(lock_t *lock) {
    while (CompareAndSwap(&lock->flag, 0, 1) == 1)
        ; // spin
}

Spin lock with compare-and-swap
Load-Linked and Store-Conditional
int LoadLinked(int *ptr) {
    return *ptr;
}

int StoreConditional(int *ptr, int value) {
    if (no one has updated *ptr since the LoadLinked to this address) {
        *ptr = value;
        return 1; // success!
    } else {
        return 0; // failed to update
    }
}

• The store-conditional only succeeds if no intervening store to the address has
taken place.
• success: return 1 and update the value at ptr to value.
• fail: the value at ptr is not updated and 0 is returned.
Load-Linked and Store-Conditional (Cont.)
void lock(lock_t *lock) {
    while (1) {
        while (LoadLinked(&lock->flag) == 1)
            ; // spin until it’s zero
        if (StoreConditional(&lock->flag, 1) == 1)
            return; // if set-it-to-1 was a success: all done
        // otherwise: try it all over again
    }
}

void unlock(lock_t *lock) {
    lock->flag = 0;
}

Using LL/SC To Build A Lock
A more concise form of lock() using LL/SC:

void lock(lock_t *lock) {
    while (LoadLinked(&lock->flag) || !StoreConditional(&lock->flag, 1))
        ; // spin
}
Implementation of locks in xv6
Locking Introduction
• xv6 runs on multiprocessors
• Computers with multiple CPUs executing independently
• These multiple CPUs share physical RAM, and xv6 exploits the sharing
to maintain data structures that all CPUs read and write
• This sharing raises the possibility of one CPU reading a data structure
while another CPU is mid-way through updating it
• When multiple CPUs update the same data simultaneously, such
parallel access is likely, without careful design, to yield incorrect
results or a broken data structure.
Locking Introduction
• Even on a uni-processor, an interrupt routine that uses the same data
as some interruptible code could damage the data if the interrupt
occurs at just the wrong time
• Any code that accesses shared data concurrently must have a strategy
for maintaining correctness despite concurrency.
• The concurrency may arise from accesses by multiple cores, or by
multiple threads, or by interrupt code.
• xv6 uses a handful of simple concurrency control strategies; much
more sophistication is possible. The one examined here is the lock.
Locking Introduction
• A lock provides mutual exclusion, ensuring that only one CPU at a
time can hold the lock.
• If a lock is associated with each shared data item, and the code
always holds the associated lock when using a given item, then we
can be sure that the item is used from only one CPU at a time.
• In this situation, we say that the lock protects the data item.
Code: Locks
• xv6 has two types of locks: spin-locks and sleep-locks.
• xv6 represents a spin-lock as a struct spinlock.
Code: Locks
• The important field in the structure is locked
• a word that is zero when the lock is available and non-zero when it is
held.
• Logically, xv6 should acquire a lock by executing code like
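The snippet the slide refers to is a sketch of the broken acquire from the xv6 book, with the two statements the discussion below calls line 25 and line 26 marked in comments:

void
acquire(struct spinlock *lk)
{
    for(;;) {
        if(!lk->locked) {    // line 25: test whether the lock is free
            lk->locked = 1;  // line 26: grab the lock
            break;
        }
    }
}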
• This code does not guarantee mutual exclusion on a multiprocessor.
• It could happen that two CPUs simultaneously reach line 25 and
read lk->locked.
• If it is zero, then both grab the lock by executing line 26:
lk->locked = 1;
• At this point, two different CPUs hold the lock,
which violates the mutual exclusion property.
• Rather than helping us avoid race conditions, this
implementation of acquire has its own race
condition.
• The problem here is that lines 25 and 26 execute
as separate actions.
• In order for the routine above to be correct, lines
25 and 26 must execute in one atomic (i.e.,
indivisible) step.
Code: Locks
• To execute those two lines atomically, xv6 relies on a special x86
instruction, xchg:
xchg(volatile uint *addr, uint newval)
• Locks (i.e., spinlocks) in xv6 are implemented using the xchg atomic
instruction.
• In one atomic operation, xchg swaps a word in memory with the
contents of a register.
• The function acquire repeats this xchg instruction in a loop:
acquire(struct spinlock *lk)
• Each iteration atomically reads lk->locked and sets it to 1:
while(xchg(&lk->locked, 1) != 0)
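For reference, a sketch of the real acquire, adapted from the xv6 (x86) source; pushcli, holding, mycpu, and getcallerpcs are xv6 helper routines:

void
acquire(struct spinlock *lk)
{
    pushcli(); // disable interrupts to avoid deadlock with interrupt handlers
    if(holding(lk))
        panic("acquire");

    // The xchg is atomic: each iteration reads lk->locked and sets it to 1.
    while(xchg(&lk->locked, 1) != 0)
        ;

    // Tell the C compiler and the processor not to move loads or stores
    // past this point, so the critical section's memory references
    // happen strictly after the lock is acquired.
    __sync_synchronize();

    // Record info about lock acquisition for debugging.
    lk->cpu = mycpu();
    getcallerpcs(&lk, lk->pcs);
}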
Code: Locks
• If the lock is already held, lk->locked will already be 1, so the xchg
returns 1 and the loop continues.
• If the xchg returns 0, however, acquire has successfully acquired
the lock: locked was 0 and is now 1, so the loop can stop.
• Once the lock is acquired , acquire records, for debugging, the
CPU and stack trace that acquired the lock.
• If a process forgets to release a lock, this information can help to
identify the culprit.
• These debugging fields are protected by the lock and must only
be edited while holding the lock.
Code: Locks
• The function release is the opposite of acquire: it clears the
debugging fields and then releases the lock.
release(struct spinlock *lk)
• The function uses an assembly instruction to clear locked, because
clearing this field should be atomic so that the xchg instruction won’t
see a subset of the 4 bytes that hold locked updated.
• The x86 guarantees that a 32-bit movl updates all 4 bytes atomically.
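A matching sketch of release, likewise adapted from the xv6 (x86) source; popcli is the counterpart of pushcli:

void
release(struct spinlock *lk)
{
    if(!holding(lk))
        panic("release");

    // Clear the debugging fields while still holding the lock.
    lk->pcs[0] = 0;
    lk->cpu = 0;

    // Order the stores above before the store that releases the lock.
    __sync_synchronize();

    // Release the lock, equivalent to lk->locked = 0.
    // This can't use a plain C assignment, since C does not guarantee
    // the assignment is atomic; the x86 movl updates all 4 bytes atomically.
    asm volatile("movl $0, %0" : "+m" (lk->locked) : );

    popcli();
}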
Code: Using locks
• xv6 uses locks in many places to avoid race conditions
• A hard part about using locks is deciding how many locks to use and
which data and invariants each lock protects.
• There are a few basic principles.
• First, any time a variable can be written by one CPU at the same time
that another CPU can read or write it, a lock should be introduced to
keep the two operations from overlapping.
• Second, remember that locks protect invariants: if an invariant
involves multiple memory locations, typically all of them need to be
protected by a single lock to ensure the invariant is maintained.
Code: Using locks
• These two rules say when locks are necessary, but nothing about
when locks are unnecessary.
• It is important for efficiency not to lock too much, because locks
reduce parallelism.
• If parallelism isn’t important, then one could arrange to have only a
single thread and not worry about locks.
• A simple kernel can do this on a multiprocessor by having a single lock
that must be acquired on entering the kernel and released on exiting
the kernel
Sleep locks
• Sometimes xv6 code needs to hold a lock for a long time.
• For example, the file system keeps a file locked while reading and
writing its content on the disk, and these disk operations can take
tens of milliseconds.
• Efficiency demands that the processor be yielded while waiting so
that other threads can make progress, and this in turn means that xv6
needs locks that work well when held across context switches.
• xv6 provides such locks in the form of sleep-locks.
• Xv6 sleep-locks support yielding the processor during their critical
sections.
Sleep locks
• This property poses a design challenge: if thread T1 holds lock L1 and
has yielded the processor, and thread T2 wishes to acquire L1, we
have to ensure that T1 can execute while T2 is waiting so that T1 can
release L1.
• T2 can’t use the spin-lock acquire function here: it spins with
interrupts turned off, and that would prevent T1 from running.
• To avoid this deadlock, the sleep-lock acquire routine (called
acquiresleep) yields the processor while waiting, and does not disable
interrupts
acquiresleep(struct sleeplock *lk)
• At a high level, a sleep-lock has a locked field that is protected by a
spinlock, and acquiresleep’s call to sleep atomically yields the CPU
and releases the spin-lock.
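A sketch of acquiresleep, adapted from the xv6 source; sleep and myproc are xv6 routines, and lk->lk is the spinlock that protects the locked field:

void
acquiresleep(struct sleeplock *lk)
{
    acquire(&lk->lk);          // protect the locked field with a spinlock
    while (lk->locked) {
        sleep(lk, &lk->lk);    // atomically yield the CPU and release lk->lk
    }
    lk->locked = 1;            // we now hold the sleep-lock
    lk->pid = myproc()->pid;   // record the holder for debugging
    release(&lk->lk);
}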
Sleep locks
• The result is that other threads can execute while acquiresleep waits.
• Because sleep-locks leave interrupts enabled, they cannot be used in
interrupt handlers.
• Because acquiresleep may yield the processor, sleep-locks cannot be
used inside spin-lock critical sections (though spin-locks can be used
inside sleep-lock critical sections).
• xv6 uses spin-locks in most situations, since they have low overhead.
• It uses sleep-locks only in the file system, where it is convenient to be
able to hold locks across lengthy disk operations
Limitations of locks
• Locks often solve concurrency problems cleanly, but there are times
when they are awkward.
• Sometimes a function uses data which must be guarded by a lock,
but the function is called both from code that already holds the lock
and from code that wouldn't otherwise need the lock.
• One way to deal with this is to have two variants of the function, one
that acquires the lock, and the other that expects the caller to already
hold the lock (see the wakeup/wakeup1 sketch after this list)
• Another approach is for the function to require callers to hold the lock
whether the caller needs it or not, as with sched in xv6
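The first approach above is what xv6 does with wakeup and wakeup1; a sketch adapted from xv6's proc.c (ptable, NPROC, SLEEPING, and RUNNABLE are xv6 definitions):

// Wake up all processes sleeping on chan.
// The caller must hold ptable.lock.
static void
wakeup1(void *chan)
{
    struct proc *p;

    for(p = ptable.proc; p < &ptable.proc[NPROC]; p++)
        if(p->state == SLEEPING && p->chan == chan)
            p->state = RUNNABLE;
}

// Wake up all processes sleeping on chan.
// Acquires the lock itself, then calls the locked variant.
void
wakeup(void *chan)
{
    acquire(&ptable.lock);
    wakeup1(chan);
    release(&ptable.lock);
}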
Limitations of locks
• Kernel developers need to be aware of such requirements.
• It might seem that one could simplify situations where both caller and
callee need a lock by allowing "recursive locks",
• so that if a function holds a lock, any function it calls is allowed to
re-acquire the lock.
Locks in xv6
• initlock
• acquire
• release
• Spin Lock
• wakeup
• bcache.lock
• cons.lock
• ftable.lock
Thank you