Threads
Concurrent work within the same process, for efficiency.
Several activities going on as part of the same process.
Threads share memory and the other resources of the process, but each thread has its own registers and stack.
Be careful with data synchronization (race conditions & deadlocks).
A process (a single thread) vs. multiple threads.
Naming
(Figure: naming example with threads T5 and T3 and function foo.)
Threads vs. Processes
(Figure: Thread 1 and Thread 2 running within the same process.)
(Old) Process Address Space
The address space:

0xFFFFFFFF   (reserved for OS)
             Stack                 <- stack pointer
             ...
             Heap
             Uninitialized vars (BSS segment)
             Initialized vars (data segment)
0x00000000   Code (text segment)   <- program counter
(New) Process Address Space w/ Threads
Same layout, but each thread now gets its own stack (plus its own program counter and registers); the code, data, and heap are shared by all threads.
Implementing Threads
Single-threaded:

main()
{
    computePI();    // never finishes
    printf("hi");   // never reached
}

A process has a single thread of control: if it blocks on something, nothing else can be done.

Multi-threaded:

main()
{
    createThread( computePI() );   // never finishes
    createThread( printf("hi") );  // reached
}

main()
{
    createThread( scanf() );        // does not finish until the user enters input
    createThread( autoSaveDoc() );  // reached while scanf() waits on I/O
}

A runnable pthread version of this idea is sketched below.
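For concreteness, a minimal runnable pthread sketch of the multi-threaded case (an infinite loop stands in for computePI(); compile with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

void *compute_pi(void *arg) {    /* stands in for computePI() */
    for (;;)
        ;                        /* never finishes */
    return NULL;
}

int main(void) {
    pthread_t tid;
    pthread_create(&tid, NULL, compute_pi, NULL);
    printf("hi\n");              /* reached: main() is not blocked by the thread */
    return 0;                    /* returning from main() ends all threads */
}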
Thread Behavior
Execution flow: (figure: the interleaved execution flow of the threads.)
Threads on a Single CPU
Running multiple threads on a single CPU is still possible.
Multitasking idea: share one CPU among many processes (context switch).
Multithreading idea: share the same process among many threads (thread switch).
Whenever this process gets the opportunity to run on the CPU, the OS can select one of its many threads and run it for a while, and so on.
One pid, several thread ids.
If every thread were CPU-bound, a single CPU would gain nothing; luckily this is usually not the case, e.g., 1 thread does the I/O, ..
Select your threads carefully: one is I/O-bound, another is CPU-bound, ..
With multicores, we still gain big even if the threads are all CPU-bound.
Single-threaded Process on Multiple CPUs
Note that even if you have 8 CPUs, a single-threaded process can use only one of them at a time.
Other processes may utilize the unused CPUs, but if there are fewer runnable processes than CPUs, the system is underutilized.
If you implement a multithreaded version of the same program, then all 8 CPUs can serve the same process and make it finish much earlier.
Multithreading Concept
Responsiveness
One thread blocks, another runs.
One thread may always wait for the user.
Resource sharing
Very easy sharing (use global variables; unlike message queues, pipes, shmget).
Be careful about data synchronization though.
Economy
Thread creation is fast.
Context switching among threads may be faster.
Because you do not have to duplicate code and global variables (unlike processes).
Scalability
Multiprocessors can be utilized better.
A process that has created 8 threads can use all 8 cores (a single-threaded process utilizes only 1 core).
Multithreading Example: WWW
The server hands the requested page name to a thread and resumes listening.
The thread checks the in-memory disk cache; if the page is not there, it does disk I/O; then it sends the page to the client (network I/O). A sketch of this dispatch loop follows.
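A sketch of that loop; accept_request(), in_cache(), read_from_disk(), and send_page() are hypothetical stubs standing in for the real server code:

#include <pthread.h>
#include <unistd.h>

/* Hypothetical stubs, not a real API: */
long accept_request(void)     { sleep(1); return 42; }   /* next request */
int  in_cache(long req)       { return 0; }              /* cache lookup */
void read_from_disk(long req) { }                        /* disk I/O     */
void send_page(long req)      { }                        /* network I/O  */

void *worker(void *arg) {
    long req = (long)arg;
    if (!in_cache(req))         /* check the in-memory disk cache */
        read_from_disk(req);    /* miss: do the disk I/O          */
    send_page(req);             /* send the page (network I/O)    */
    return NULL;
}

int main(void) {
    for (;;) {
        long req = accept_request();   /* hand off, resume listening */
        pthread_t tid;
        pthread_create(&tid, NULL, worker, (void *)req);
        pthread_detach(tid);           /* no join needed */
    }
}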
Threading Support
User-level threads are threads that the OS is not aware of. They exist entirely within a process and are scheduled to run within that process's time slices.
Kernel-level threads are scheduled by the OS, and each thread can be granted its own time slices by the scheduling algorithm. The kernel scheduler can thus make intelligent decisions among threads and avoid scheduling processes that consist entirely of idle threads (or I/O-bound threads). A task that has multiple I/O-bound threads, or that has many threads (and thus will benefit from the additional time slices that kernel threads receive), might best be handled by kernel threads.
Kernel-level threads require a system call for a switch to occur; user-level threads do not.
Threading Support
Functions in the pthread library actually perform Linux system calls, e.g., pthread_create() calls clone().
Example: thread1 (main) creates thread2 (runner) and waits for it.

int sum;   /* shared */

void *runner(void *param)
{
    ..
    sum = ..;
    pthread_exit(0);
}

int main(..)
{
    pthread_t tid;
    ..
    pthread_create(&tid, .., runner, ..);
    pthread_join(tid, NULL);    /* wait */
    printf("%d\n", sum);
}
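A runnable completion of this sketch; summing 0..n is an assumed filler for the elided ".." parts (compile with gcc -pthread):

#include <pthread.h>
#include <stdio.h>

int sum = 0;   /* shared between main() and the runner thread */

void *runner(void *param) {
    int n = *(int *)param;
    for (int i = 0; i <= n; i++)
        sum += i;
    pthread_exit(0);
}

int main(void) {
    pthread_t tid;
    int n = 10;
    pthread_create(&tid, NULL, runner, &n);
    pthread_join(tid, NULL);        /* wait until runner finishes */
    printf("sum = %d\n", sum);      /* safe: runner is done */
    return 0;
}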
Single- to Multi-thread Conversion
In a simple world:
Identify functions as parallel activities.
Run them as separate threads.
In the real world:
Single-threaded programs use global variables and library functions (malloc).
Be careful with them.
Global variables are good for easy communication but need special care.
Single- to Multi-thread Conversion
#define NITERS 100000000

/* shared */
volatile unsigned int cnt = 0;   /* see the Note section below for volatile */

/* thread routine */
void *count(void *arg)
{
    int i;
    for (i = 0; i < NITERS; i++)
        cnt++;
    return NULL;
}

int main()
{
    pthread_t tid1, tid2;
    Pthread_create(&tid1, NULL, count, NULL);
    Pthread_create(&tid2, NULL, count, NULL);
    Pthread_join(tid1, NULL);
    Pthread_join(tid2, NULL);
    if (cnt != (unsigned)NITERS*2)
        printf("BOOM! cnt=%d\n", cnt);
    else
        printf("OK cnt=%d\n", cnt);
}

linux> ./badcnt
BOOM! cnt=198841183
linux> ./badcnt
BOOM! cnt=198261801
linux> ./badcnt
BOOM! cnt=198269672

cnt should be equal to 200,000,000. What went wrong?
Thread Issues
The part of the process that is accessing and changing shared data is
called its critical section.
(Figure: two threads each change shared variables X and Y; the code that changes X is a critical section for X, and likewise for Y. A producer and a consumer form the same situation.)
Synchronization
count++ and count-- are not atomic; each compiles to three machine instructions that go through a CPU register:

PRODUCER (count++)            CONSUMER (count--)
register1 = count             register2 = count
register1 = register1 + 1     register2 = register2 - 1
count = register1             count = register2

Starting with count = 5 in main memory, an unlucky interleaving of these instructions can leave count at 4 or at 6 instead of 5.
Synchronization
(Figure: Thread 1 enters its critical section to modify the account balance. Thread 2 arrives while Thread 1 is inside: the 2nd thread must wait. When Thread 1 leaves, Thread 2 enters its own critical section.)
Busy wait: the waiting thread keeps checking in a loop until it can enter.
Synchronization
Shared variables: flag[] and turn. The indices i = 0 and j = 1 are local to each thread.
Synchronization
do {
    acquire lock
        critical section
    release lock
        remainder section
} while (TRUE);
How to implement acquire/release lock?
Use special machine instructions: TestAndSet, Swap.
Synchronization
boolean TestAndSet(boolean *target)
{
    boolean rv = *target;
    *target = TRUE;
    return rv;
} // atomic (not interruptible)!
Synchronization
Shared: boolean lock = FALSE;

do {
    while ( TestAndSet(&lock) )
        ;   // busy wait
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);
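In portable C, the hardware TestAndSet is available through C11 atomics; a minimal spinlock sketch built on atomic_flag:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;     /* clear = FALSE = unlocked */

void acquire(void) {
    /* atomic_flag_test_and_set returns the old value and sets the
       flag to TRUE, atomically: exactly the TestAndSet above */
    while (atomic_flag_test_and_set(&lock))
        ;                                /* busy wait */
}

void release(void) {
    atomic_flag_clear(&lock);            /* lock = FALSE */
}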
Synchronization
A thread can be suspended/interrupted between the TestAndSet and the subsequent compare (CMP), but not during the TestAndSet itself.
Synchronization
void Swap(boolean *a, boolean *b)
{
    boolean temp = *a;
    *a = *b;
    *b = temp;
} // atomic (not interruptible)!
Synchronization
Shared: boolean lock = FALSE; each thread has a local boolean key.

do {
    key = TRUE;
    while (key == TRUE)
        Swap(&lock, &key);   // spins until lock was FALSE
    // critical section
    lock = FALSE;
    // remainder section
} while (TRUE);
Synchronization
Solution: Semaphores.
wait(s) = P(s) = down(s) and signal(s) = V(s) = up(s); the semaphore s is modified only via these functions.
These functions can be implemented in the kernel as system calls.
The kernel makes sure that wait(s) & signal(s) are atomic.
Operations (kernel code); they can be implemented with busy-waiting or, more efficiently, by blocking.

wait(s):
    if s is positive
        s-- and return
    else
        s-- and block/wait ('till somebody wakes you up; then return)
Synchronization
Solution: Semaphores.
Operations.

signal(s):
    s++
    if there is 1+ process waiting (i.e., new s <= 0)
        wake one of them up
    return
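POSIX semaphores provide exactly these operations: sem_wait() is wait/P/down and sem_post() is signal/V/up. A minimal usage sketch:

#include <semaphore.h>
#include <stdio.h>

int main(void) {
    sem_t s;
    sem_init(&s, 0, 1);   /* 2nd arg 0: shared by threads of this process */
    sem_wait(&s);         /* wait(s): decrements, blocks if s would drop below 0 */
    printf("inside\n");
    sem_post(&s);         /* signal(s): increments, wakes one waiter if any */
    sem_destroy(&s);
    return 0;
}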
Synchronization
Solution: Semaphores.
Types.
Binary semaphore
Integer value can range only between 0 and 1; can be simpler to implement;
aka mutex locks.
Provides mutual exclusion; can be used for the critical section problem.
Counting semaphore
Integer value can range over an unrestricted domain.
Can be used for other synchronization problems; for example for resource
allocation.
Example: you have 10 instances of a resource. Init semaphore s to 10 in this case.
Synchronization
Solution: Semaphores.
Usage.
An integer variable s that can be shared by N processes/threads.
s can be modified only by the atomic system calls wait() & signal().
s has a queue of waiting processes/threads that might be sleeping on it.

typedef struct {
    int value;
    struct process *list;   /* waiting processes */
} semaphore;
Solution: Semaphores.
Usage.
Binary semaphores (mutexes) can be used to solve critical
section problems.
Solution: Semaphores.
Usage.
Process 0 and Process 1 each run the same loop (mutex is a binary semaphore initialized to 1):

do {
    wait(mutex);
    // Critical Section
    signal(mutex);
    // remainder section
} while (TRUE);
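Applied to the earlier badcnt.c, guarding cnt++ this way (sketched here with a pthread mutex, which acts as a binary semaphore) yields OK cnt=200000000, at the cost of serializing the increments:

#include <pthread.h>

#define NITERS 100000000

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
volatile unsigned int cnt = 0;

void *count(void *arg) {
    for (int i = 0; i < NITERS; i++) {
        pthread_mutex_lock(&m);     /* wait(mutex)      */
        cnt++;                      /* critical section */
        pthread_mutex_unlock(&m);   /* signal(mutex)    */
    }
    return NULL;
}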
Solution: Semaphores.
Usage.
The kernel puts the processes/threads waiting on s in a FIFO queue. Why FIFO? Fairness: every waiter is eventually woken, so no one starves.
Synchronization
Solution: Semaphores.
Usage other than critical section.
Ensure S1 definitely executes before S2 (just a synchronization problem).
Solution via semaphores: Semaphore x = 0; // initialized to 0

P0:              P1:
...              ...
S1;              wait(x);
signal(x);       S2;
...              ...
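The same ordering trick, runnable with POSIX semaphores (printfs stand in for S1 and S2):

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

sem_t x;   /* Semaphore x = 0; initialized in main() */

void *p0(void *arg) {
    printf("S1\n");     /* S1 */
    sem_post(&x);       /* signal(x) */
    return NULL;
}

void *p1(void *arg) {
    sem_wait(&x);       /* wait(x): blocks until P0 has done S1 */
    printf("S2\n");     /* S2 runs strictly after S1 */
    return NULL;
}

int main(void) {
    pthread_t t0, t1;
    sem_init(&x, 0, 0);
    pthread_create(&t1, NULL, p1, NULL);
    pthread_create(&t0, NULL, p0, NULL);
    pthread_join(t0, NULL);
    pthread_join(t1, NULL);
    return 0;
}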
Synchronization
Solution: Semaphores.
Usage other than critical section.
Resource allocation (just another synchronization problem).
We have N processes that want a resource R that has 5 instances.
Solution:
Semaphore rs = 5;
Every process that wants to use R does wait(rs):
  if some instance is available, rs stays nonnegative -> no blocking;
  if all 5 instances are used, rs goes negative -> block until rs is nonnegative again.
Every process that finishes with R does signal(rs):
  a blocked process changes state from waiting to ready.
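A sketch of this with a POSIX counting semaphore:

#include <semaphore.h>

sem_t rs;                 /* Semaphore rs = 5 */

void init(void) { sem_init(&rs, 0, 5); }   /* 5 instances of R */

void use_resource(void) {
    sem_wait(&rs);        /* wait(rs): blocks when all 5 instances are in use */
    /* ... use one instance of R ... */
    sem_post(&rs);        /* signal(rs): releases the instance, wakes a waiter */
}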
Synchronization
Solution: Semaphores.
Usage other than critical section.
Enforce the consumer to sleep while there is no item in the buffer (another synchronization problem).

Semaphore Full_Cells = 0;  // initialized to 0

Producer:
do {
    // produce item
    put item into buffer
    signal(Full_Cells);
} while (TRUE);

Consumer:
do {
    wait(Full_Cells);  // instead of busy-waiting, go to sleep and give the
                       // CPU back to the producer for faster production
                       // (efficiency!)
    remove item from buffer
} while (TRUE);
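A runnable sketch; a large array stands in for the unbounded buffer, and a mutex (which the slide elides) protects the buffer indices:

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

int buffer[1000000], in = 0, out = 0;   /* "unbounded" buffer */
sem_t Full_Cells;
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

void *producer(void *arg) {
    for (int item = 0; item < 10; item++) {
        pthread_mutex_lock(&m);
        buffer[in++] = item;        /* put item into buffer */
        pthread_mutex_unlock(&m);
        sem_post(&Full_Cells);      /* signal(Full_Cells) */
    }
    return NULL;
}

void *consumer(void *arg) {
    for (int i = 0; i < 10; i++) {
        sem_wait(&Full_Cells);      /* sleep instead of busy-waiting */
        pthread_mutex_lock(&m);
        int item = buffer[out++];   /* remove item from buffer */
        pthread_mutex_unlock(&m);
        printf("consumed %d\n", item);
    }
    return NULL;
}

int main(void) {
    pthread_t p, c;
    sem_init(&Full_Cells, 0, 0);    /* initialized to 0 */
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&c, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}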
Synchronization
Solution: Semaphores.
(Figure: cumulative number of items produced and consumed over time.)
The consumer can never cross the producer curve.
The difference between produced and consumed items can be at most BUFSIZE.
Synchronization
Another problem: a low-priority process may cause a high-priority process to wait (priority inversion).
Synchronization
(Figure: a bounded buffer between producer (prod) and consumer (cons); currently full = 4, empty = 6.)
Problem: allow multiple readers to read at the same time, but only a single writer can access the shared data at a time (no readers or other writers while a writer is active).
Synchronization
Case 1: The first reader acquired the lock and is reading; what happens if a writer arrives? It must wait.
Case 2: The first reader acquired the lock and is reading; what happens if reader 2 arrives? It starts reading too.
Case 3: The writer acquired the lock and is writing; what happens if reader 1 arrives? It must wait.
Case 4: The writer acquired the lock and is writing; what happens if reader 2 arrives? It must wait.
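A sketch of the classic first readers-writers solution with two POSIX semaphores; it produces exactly this behavior:

#include <semaphore.h>

sem_t rw_mutex;       /* init 1: held by the writer, or by the readers as a group */
sem_t mutex;          /* init 1: protects read_count */
int read_count = 0;

void init(void) { sem_init(&rw_mutex, 0, 1); sem_init(&mutex, 0, 1); }

void writer(void) {
    sem_wait(&rw_mutex);      /* blocks while readers are active (Case 1) */
    /* ... write the shared data ... */
    sem_post(&rw_mutex);
}

void reader(void) {
    sem_wait(&mutex);
    if (++read_count == 1)    /* first reader locks writers out */
        sem_wait(&rw_mutex);  /* blocks if a writer is active (Cases 3, 4) */
    sem_post(&mutex);

    /* ... read the shared data; more readers may join (Case 2) ... */

    sem_wait(&mutex);
    if (--read_count == 0)    /* last reader lets writers in */
        sem_post(&rw_mutex);
    sem_post(&mutex);
}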
Synchronization
A philosopher is in 2 states: eating (needs both forks) and thinking (needs no forks).

semaphore forks[5];   // one per fork, each initialized to 1

do {
    wait( forks[i] );             // grab the left fork
    wait( forks[ (i + 1) % 5] );  // grab the right fork
    // eat
    signal( forks[i] );
    signal( forks[ (i + 1) % 5] );
    // think
} while (TRUE);
Synchronization
Deadlock in a circular fashion: 4 gets the left fork, context switch (cs), 3 gets the left fork, cs, .., 0 gets the left fork, cs; 4 now wants the right fork, which is held by 0, forever.
Such an unlucky sequence of context switches is not likely, but it is possible.
A perfect solution without deadlock danger is possible, again with semaphores:
Solution #1: put the left fork back if you cannot grab the right one.
Solution #2: grab both forks at once (atomically).
Synchronization
Less efficient: might have to switch between A and B one more time than necessary. A arrives first.
Synchronization
Any problem?
Synchronization
Barrier: no thread executes its critical point until after all threads have executed rendezvous.
That is, when the first n-1 threads arrive, they should block until the nth thread arrives.
Solution attempt: semaphore mutex = 1, barrier = 0; int n = 5, count = 0;
The first n-1 threads wait when they get to the barrier; the nth thread unlocks the barrier.
Problem: deadlock! The nth thread signals only 1 of the waiting threads; no one signals again.
Another problem: deadlock! If a thread waits on the barrier while still holding mutex, the 1st thread blocks and, since mutex is locked, no one else can do count++.
A corrected sketch follows.
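A sketch of the corrected barrier with POSIX semaphores: count++ is done under mutex but the mutex is released before waiting (fixing the second deadlock), and the barrier semaphore is used as a turnstile where each woken thread wakes the next (fixing the first):

#include <semaphore.h>

sem_t mutex, barrier;             /* semaphore mutex = 1, barrier = 0 */
int n = 5, count = 0;

void init(void) {
    sem_init(&mutex, 0, 1);
    sem_init(&barrier, 0, 0);
}

void rendezvous(void) {
    sem_wait(&mutex);
    count++;
    if (count == n)
        sem_post(&barrier);       /* nth thread unlocks the barrier */
    sem_post(&mutex);             /* release mutex BEFORE waiting   */

    sem_wait(&barrier);           /* first n-1 threads block here   */
    sem_post(&barrier);           /* turnstile: each woken thread   */
                                  /* wakes the next one             */

    /* critical point: all n threads have passed rendezvous */
}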
Solution: Monitors.
Idea: get help not from the OS but from the programming language.
A high-level abstraction for process/thread synchronization.
C does not provide monitors (use semaphores), but Java does.
The compiler ensures that the critical regions of your code are protected:
you just identify the critical sections of the code, put them into a monitor, and the compiler inserts the protection code.
Monitor implementation using semaphores:
the compiler writer/language developer has to worry about this stuff, not the casual application programmer.
Synchronization
Solution: Monitors.
A monitor is a construct in the language, like the class construct:

monitor monitor-name {
    // shared variable declarations
    procedure P1 (..) { .. }
    ..
    procedure Pn (..) { .. }
    initialization_code (..) { .. }
}
Solution: Monitors.
The monitor construct guarantees that only one process may be active within the monitor at a time.
This means that if a process is running inside the monitor (= running a procedure, say P1()), then no other process can be active inside the monitor (= can run P1() or any other procedure of the monitor) at the same time.
Solution: Monitors.
(Figure: schematic view of a monitor: shared data, the procedures P1..Pn, and an entry queue of processes waiting to get in.)
This monitor solution solves the critical section (mutual exclusion) problem, but not the other synchronization problems such as producer-consumer or dining philosophers.
Synchronization
Solution: Monitors.
Condition variables solve all the synchronization problems.
In the previous model there is no way to force a process/thread to wait until some condition happens.
Now we can, using condition variables:

condition x, y;

x.wait() suspends the calling process on x; x.signal() resumes exactly one process waiting on x (and is a no-op if no one waits).
Solution: Monitors.
(Figure: schematic view of a monitor with condition variables; each condition variable has its own queue of waiting processes.)
A new active process in the monitor (fetched from the entry queue) does x.signal() from the same or a different procedure; the previously blocked process then resumes from where it got blocked.
Synchronization
Solution: Monitors.
An example: We have 5 instances of a resource and N processes.
Only 5 processes can use the resource simultaneously.
(Figure: the process code calling the monitor, and the monitor code guarding the resource.) A C translation of such a monitor is sketched below.
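C has no monitor construct, but the usual translation (one pthread mutex as the monitor lock plus a pthread condition variable) gives a sketch of this 5-instance resource monitor:

#include <pthread.h>

/* Monitor state, protected by the monitor lock m: */
pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
pthread_cond_t  x = PTHREAD_COND_INITIALIZER;
int available = 5;                 /* 5 instances of the resource */

void acquire(void) {
    pthread_mutex_lock(&m);        /* entry: one active process in the monitor */
    while (available == 0)         /* x.wait(): releases the lock while asleep */
        pthread_cond_wait(&x, &m); /* and reacquires it on wakeup              */
    available--;
    pthread_mutex_unlock(&m);      /* exit the monitor */
}

void release(void) {
    pthread_mutex_lock(&m);
    available++;
    pthread_cond_signal(&x);       /* x.signal(): wake one waiter */
    pthread_mutex_unlock(&m);
}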
Synchronization
Solution: Monitors.
An example: Dining philosophers.
monitor DP { void test (int i) {
enum { THINKING, //not holding/wanting resources if ( (state[(i + 4) % 5] != EATING) &&
HUNGRY, //not holding but wanting (state[(i + 1) % 5] != EATING) &&
EATING} //has the resources (state[i] == HUNGRY)) {
state[5]; condition cond[5]; //each philosopher may state[i] = EATING ;
need to wait (no fork to eat), so need 5 condition variables cond[i].signal();
}
//no need for entry/exit code to pickup() ‘cos its in monitor }
void pickup (int i) {
state[i] = HUNGRY; //initially all thinking
test(i); initialization_code() {
if (state[i] != EATING) for (int i = 0; i < 5; i++)
cond[i].wait(); state[i] = THINKING;
} }
void putdown (int i) {
state[i] = THINKING; } /* end of monitor */
// test left and right neighbors
test((i + 4) % 5)
test((i + 1) % 5);
}
Synchronization
Solution: Monitors.
Each philosopher/process does this in an endless loop:

DP DiningPhilosophers;

Philosopher i:
    while (1) {
        // THINK..
        DiningPhilosophers.pickup(i);
        // EAT..
        DiningPhilosophers.putdown(i);
        // THINK..
    }
Synchronization
Solution: Monitors.
First things first: what are the IDs to access the neighbors? Philosopher i's left neighbor is (i + 4) % 5 and the right neighbor is (i + 1) % 5.
Solution: Monitors.
General idea.
Solution: Monitors.
An example: allocate a resource to one of several processes.
Priority-based: among the processes that want the resource, the one that will use it for the shortest (known) amount of time gets it first.
(Figure: several processes/threads that want to use the resource; one resource.)
Synchronization
Solution: Monitors.
An example: allocate a resource to one of several processes.
Assume we have a condition variable implementation that can enqueue sleeping/waiting processes w.r.t. a priority specified as a parameter to the wait() call:

condition x;
x.wait(priority);
Solution: Monitors.
An example: allocate a resource to one of several processes.

monitor ResourceAllocator
{
    boolean busy;   // true if the resource is currently in use/allocated
    condition x;    // sleeps a process that cannot acquire the resource

    void acquire(int time) {   // time: how long the caller will hold the resource
        if (busy)
            x.wait(time);      // shorter requested times wake up first
        busy = TRUE;
    }

    void release() {
        busy = FALSE;
        x.signal();
    }
}
Each process should use the resource between its acquire() and release() calls.