
Basic Concepts

Kostis Sagonas
kostis@it.uu.se



Overview

Concurrent programming using shared memory


Concurrent programming using message passing



Concurrent Programming

Remember: a concurrent program consists of independent tasks, which may
execute during overlapping time periods.

On multi-processor machines, each processor may be running one of these
tasks simultaneously (true parallelism).

On uni-processor machines, only one task is running at any instant.
However, because of multitasking (e.g., time-sharing), several tasks may
appear to run simultaneously.



Concurrency: Interaction

Truly independent tasks are easy to program—but not very useful.


We need tasks that communicate (e.g., initial data, intermediate results,
events such as user input) and share resources (e.g., system devices).

For this, we need some form of interaction between tasks.



Shared Memory

Shared memory is memory that may be accessed simultaneously by multiple
processes (or by multiple threads within a process).

Shared memory provides an efficient means of sharing data and
communicating between different processes/threads.

Most multi-processor architectures today are shared memory architectures:
each CPU core has access to the same (shared) main memory.




Shared Memory: Example

Let us write ... ||| ... to denote concurrent tasks.

What will be the output of the following program?

int x = 0;
int y = x+1;       |||   int z = x+1;
printf("%d", y);   |||   printf("%d", z);

The program will output “1” followed by another “1”. (It can’t output
anything else because printf is thread-safe. More on that later.)

Note that the two tasks may execute on different processors, or even on
different computers. Shared memory provides an abstraction from the
actual hardware.
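
As a hedged aside (not on the slide): one way the two tasks above could be
realized with POSIX threads, assuming a POSIX system and compiling with
-pthread; the names task1/task2 are made up for the sketch.

#include <pthread.h>
#include <stdio.h>

int x = 0;                        /* shared memory */

void *task1(void *arg) { int y = x + 1; printf("%d", y); return NULL; }
void *task2(void *arg) { int z = x + 1; printf("%d", z); return NULL; }

int main(void) {
  pthread_t t1, t2;
  pthread_create(&t1, NULL, task1, NULL);   /* start both tasks ...  */
  pthread_create(&t2, NULL, task2, NULL);
  pthread_join(t1, NULL);                   /* ... and wait for them */
  pthread_join(t2, NULL);
  return 0;
}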



Shared Memory: A Simple Abstraction?

The shared memory abstraction is deceptively simple:


Tasks just perform regular memory operations (loads and stores).
Communication is implicit, i.e., there are no explicit annotations in
the code to indicate where tasks are communicating.

However, writing correct code that uses shared memory can be very
tricky. We’ll now discuss some of the challenges.



Concurrency and Non-Determinism

A deterministic algorithm, given any particular input, always performs the
same computations and produces the same output.

Sequential algorithms are deterministic, unless they depend on external
state (such as user input, hardware signals, calling random(), etc.).

Concurrent programs are often timing-sensitive: their output depends on,
e.g., scheduling decisions. Thus, they are often non-deterministic.




Concurrency and Non-Determinism: Example

What will be the output of the following program?

int x = 0;
x = 1;   |||   printf("%d", x);

It could be either “0” or “1”, depending on which task is executed first.

(Actually, the program contains a data race. Depending on your hardware and
programming language, it may not output “0” or “1” after all: it could print a
different value or even crash. More on that soon.)



Problem: Race Conditions

A race condition occurs when the result of a concurrent program depends on
the timing of its execution (i.e., different tasks race to perform some
operations or to access a shared resource).

Race conditions easily lead to program bugs when the programmer did not
anticipate all possible executions.



Race Conditions: Example

Consider the following code to transfer money between accounts:

transfer(amount, account_from, account_to) {
  if (account_from.balance < amount) return NO;
  account_to.balance += amount;
  account_from.balance -= amount;
  return YES;
}
What might go wrong when there are two concurrent transfers from the
same account?



The Check-Then-Act Error Pattern

The code on the previous slide is an instance of the check-then-act error
pattern:

if (check(x)) { act(x); }

There is likely a race condition when another task concurrently modifies x
after the check, but before the action.
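
As a hedged illustration (not from the slides): the transfer race can be
avoided by holding a lock across both the check and the act. The account
structure and the use of a pthread mutex below are assumptions made for
this sketch, not part of the slide's code.

#include <pthread.h>

struct account { int balance; pthread_mutex_t lock; };

int transfer(int amount, struct account *from, struct account *to) {
  pthread_mutex_lock(&from->lock);      /* check and act become one atomic step */
  if (from->balance < amount) {
    pthread_mutex_unlock(&from->lock);
    return 0;                           /* NO  */
  }
  from->balance -= amount;
  pthread_mutex_unlock(&from->lock);

  pthread_mutex_lock(&to->lock);
  to->balance += amount;
  pthread_mutex_unlock(&to->lock);
  return 1;                             /* YES */
}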



Race Conditions: Another Example

Consider the following code to implement a counter:

int counter = 0;
count() {
  counter = counter + 1;   // the same issue arises with  counter += 1;  or  counter++;
}

What might go wrong when there are two concurrent calls of count()?


The Read-Modify-Write Error Pattern

The code on the previous slide is an instance of the read-modify-write
error pattern:

1. Read a variable.
2. Compute a new value (that depends on the value read).
3. Update the variable.

There is likely a race condition when another task concurrently modifies
the variable after the read, but before the update.
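
One way to eliminate this read-modify-write race (a sketch, assuming a C11
compiler with <stdatomic.h>) is to make the whole update a single atomic
operation:

#include <stdatomic.h>

atomic_int counter = 0;

void count(void) {
  atomic_fetch_add(&counter, 1);   /* read, add, and write as one indivisible step */
}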



Synchronization and Mutual Exclusion
To prevent race conditions, we need to achieve some synchronization
between concurrent tasks.
A critical section is a piece of code that accesses a shared resource (e.g., a
data structure in shared memory) that must not be accessed concurrently.
The basic goal of process synchronization is to ensure mutual exclusion:
no two tasks execute parts of their critical sections at the same time.



Dekker’s Algorithm
bool flag_0 = false;  bool flag_1 = false;  int turn = 0;  // or 1

P0:
  flag_0 = true;
  while (flag_1) {
    if (turn != 0) {
      flag_0 = false;
      while (turn != 0) { /* busy wait */ }
      flag_0 = true;
    }
  }
  // critical section
  ...
  turn = 1;
  flag_0 = false;

P1:
  flag_1 = true;
  while (flag_0) {
    if (turn != 1) {
      flag_1 = false;
      while (turn != 1) { /* busy wait */ }
      flag_1 = true;
    }
  }
  // critical section
  ...
  turn = 0;
  flag_1 = false;



Dekker’s Algorithm: Remarks

Dekker’s algorithm (ca. 1962) was the first algorithm to solve the mutual
exclusion problem, using only shared memory for communication.

However, Dekker’s algorithm


is limited to two processes,
makes use of busy waiting (rather than suspending processes),
assumes that the concurrent execution of P0 and P1 is equivalent to
some interleaving of their instructions (which is often not the case on
modern hardware).

There are more advanced synchronization primitives: locks, monitors,
message passing, etc.
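
For instance, with POSIX threads the same mutual exclusion can be obtained
with a lock (a minimal sketch, assuming pthreads and the shared counter
from the earlier example):

#include <pthread.h>

pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
int counter = 0;

void count(void) {
  pthread_mutex_lock(&m);     /* at most one task gets past this point */
  counter = counter + 1;      /* critical section */
  pthread_mutex_unlock(&m);
}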



Problem: Deadlocks

While too little synchronization leads to race conditions, too much (or
improper) synchronization likewise causes problems.

When one task is executing its critical section, other tasks that want to
begin executing their critical sections must wait.

A deadlock occurs when two (or more) tasks are waiting for each other.

“When two trains approach each other at a crossing, both shall come to a
full stop and neither shall start up again until the other has gone.”
(alleged Kansas state law, 20th century)




Deadlocks: Example

Consider the following algorithm for copying a file:


1 Open the source file for exclusive access. (Assume that this blocks until
no other process has the file open. Once this call returns, other processes
that attempt to open the file block until the file has been closed again by
the current process.)
2 Open the destination file for exclusive access.
3 Copy data from source to destination.
4 Close the destination file.
5 Close the source file.

What can possibly go wrong?


Consider concurrent calls copy("A", "B") and copy("B", "A").
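
One common way out (a hedged sketch, not from the slides) is to impose a
global order on the resources and always acquire them in that order, e.g.,
open the file with the lexicographically smaller name first. The helpers
open_exclusive(), close_file() and copy_data() below are hypothetical,
standing in for steps 1-5 of the algorithm above.

#include <string.h>

void copy(const char *src, const char *dst) {
  const char *first  = strcmp(src, dst) < 0 ? src : dst;   /* lock files in a    */
  const char *second = strcmp(src, dst) < 0 ? dst : src;   /* fixed global order */
  void *f = open_exclusive(first);    /* hypothetical: steps 1-2 of the slide     */
  void *s = open_exclusive(second);
  copy_data(src, dst);                /* hypothetical: step 3, copies src -> dst  */
  close_file(s);                      /* hypothetical: steps 4-5                  */
  close_file(f);
}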



Problem: Livelocks

A livelock is similar to a deadlock (two or more tasks are waiting for each
other), but the tasks involved keep changing their state, without making
proper progress.

Real-life example: two people meet in a narrow corridor. Each tries to be
polite by moving aside to let the other person pass.

In practice, livelocks occur less often than deadlocks, but are somewhat
harder to detect.



Problem: Resource Starvation

A task suffers from resource starvation when it is waiting for a resource
that is repeatedly granted to other tasks instead.

For instance, a (bad) scheduling algorithm might never schedule a task as
long as there is another task with higher priority.

Fairness means that as long as the system is making progress, a task that
is waiting for a resource will be granted the resource eventually. (However,
there is not necessarily a fixed upper bound on the waiting time.)



Problem: Data Races

A data race occurs when two (or more) tasks attempt to access the same
shared memory location,
at least one of the accesses is a write, and
the accesses may happen simultaneously.

For instance (as before):

int x = 0;
x = 1;   |||   printf("%d", x);

While race conditions may be benign, data races must be avoided! In many
programming languages, they have very weak semantics (e.g., your program
might crash).
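
A sketch of how the data race above can be removed in C11 (assuming
<stdatomic.h> is available): declaring x as an atomic makes the concurrent
accesses well-defined. The output still depends on timing, so a race
condition remains, but there is no longer a data race.

#include <stdatomic.h>
#include <stdio.h>

atomic_int x = 0;

void task1(void) { atomic_store(&x, 1); }             /* x = 1;            */
void task2(void) { printf("%d", atomic_load(&x)); }   /* printf("%d", x);  */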



Shared Memory: Limitations

Main issue: many processors need fast access to memory.

[Figure: several CPUs and an I/O subsystem connected to a single shared
memory through a system bus or crossbar switch.]

The CPU-to-memory connection is a bottleneck. Shared memory does not scale
well to many (> 10) processors.

Per-processor caches, commonly employed to reduce memory access times,
must be kept coherent (i.e., in sync).



Distributed Memory

Distributed memory refers to a multiple-processor system in which each
processor has its own private memory.

Tasks can only operate on local data. If remote data is required, tasks
must communicate with one or more remote processors.

http://en.wikipedia.org/wiki/Distributed_memory



Distributed Memory: Remarks

Advantages of distributed memory (vs. shared memory):


Scales to many processors
No data races (communication between processors is explicit)

Disadvantages of distributed memory (vs. shared memory):


No uniform address space
High access latency for remote data—programmers must think about
how to distribute data



Distributed Shared Memory

Physically distributed memory can be accessed via the same shared address
space from different processors.

+ : Uniform address space, implicit communication


– : High access latency for remote data



(Non-)Uniform Memory Access

We can also classify memory architectures according to how different
processors access memory.

Uniform memory access (UMA): all processors access memory in the same way,
with access times that are independent of the processor and memory
location. In a symmetric multi-processor (SMP) system, a single OS
instance additionally treats all processors equally.
→ shared memory

Non-uniform memory access (NUMA): memory access times depend on the
location relative to the processor.
→ distributed memory



Message Passing

Shared memory is tricky to program (critical sections, mutual exclusion
problem, ...). Processes might communicate in other ways.

Message passing makes communication between processes explicit. It relies
on two primitives:

send: sends a copy of some private data to another process


receive: copies data sent by another task to private address space
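
As a hedged illustration (not from the slides): on a POSIX system the two
primitives map naturally onto a pipe between two processes, with write()
playing the role of send and read() the role of receive.

#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
  int fd[2];
  pipe(fd);                                /* fd[0]: receive end, fd[1]: send end      */
  if (fork() == 0) {                       /* child process: the sender                */
    const char *msg = "hello";
    write(fd[1], msg, strlen(msg) + 1);    /* send: copy private data out              */
    _exit(0);
  }
  char buf[16];
  read(fd[0], buf, sizeof buf);            /* receive: copy data into private memory   */
  printf("parent received: %s\n", buf);
  return 0;
}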



Synchronous vs. Asynchronous Message Passing

Message passing may be synchronous or asynchronous.

Synchronous: the sender of a message is blocked until the receiver calls
receive.

Asynchronous: the sender of a message can proceed immediately. The message
is buffered until the receiver calls receive.



Direct Communication vs. Channels

Processes may send messages directly to other (named) processes.

Alternatively, processes may send and receive messages via named
communication channels.

