
UNIT-1 CONCURRENT AND PARALLEL PROGRAMMING

Syllabus: Concurrent versus sequential programming. Concurrent programming constructs and race conditions. Synchronization primitives.

Introduction: An "ordinary" program consists of data declarations and assignment and control-flow statements in a programming language. Modern languages include structures such as procedures and modules for organizing large software systems through abstraction and encapsulation, but the statements that are actually executed are still the elementary statements that compute expressions, move data and change the flow of control. In fact, these are precisely the instructions that appear in the machine code that results from compilation. These machine instructions are executed sequentially on a computer and access data stored in the main or secondary memories.

A concurrent program is a set of sequential programs that can be executed in parallel. We use the word process for the sequential programs that comprise a concurrent program and reserve the term program for this set of processes.

-A concurrent system is one where a computation can advance without waiting for all other computations to complete.

-Concurrency deals with managing access to shared state from different threads.

-Concurrency means that multiple processes or threads are making progress concurrently. While only one thread is executed at a time by the CPU, these threads can be switched in and out as required, so no thread necessarily runs to completion before another is scheduled. In this sense, all the threads are executing concurrently.

-As an illustration, picture four threads running concurrently: all four are in progress at once, yet only one of them can be scheduled on a processor at any given time.

For example, concurrent processes can be executed on one core by interleaving the execution steps of each process via time-sharing slices: only one process runs at a time, and if it does not complete during its time slice, it is paused, another process begins or resumes, and then later the original process is resumed. In this way, multiple processes are part-way through execution at a single instant, but only one process is being executed at that instant.
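A minimal Python sketch of such interleaving (the thread names and the short sleep are illustrative; the exact output order varies from run to run):

import threading, time

def worker(name):
    # Each worker performs several small steps; the short sleep yields the
    # processor, much as an expiring time slice would.
    for step in range(3):
        print(f"{name}: step {step}")
        time.sleep(0.001)

threads = [threading.Thread(target=worker, args=(n,)) for n in ("P1", "P2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

A typical run interleaves P1 and P2 lines: both workers are part-way through execution at the same time, while only one of them executes at any instant.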

Levels of Concurrency

Low-Level Concurrency

In this level of concurrency, atomic operations are used explicitly. We cannot use this kind of concurrency for application building, as it is very error-prone and difficult to debug. Even Python does not support this kind of concurrency directly.

Mid-Level Concurrency

In this concurrency, there is no use of explicit atomic operations; explicit locks are used instead. Python and other programming languages support this kind of concurrency, and it is what application programmers mostly use.

High-Level Concurrency

In this concurrency, neither explicit atomic operations nor explicit locks are used. Python has the concurrent.futures module to support this kind of concurrency.
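As a small illustration of high-level concurrency, the following sketch uses Python's concurrent.futures module (the square function and worker count are arbitrary choices for the example):

import concurrent.futures

def square(n):
    return n * n

# The executor hides thread creation, locking and scheduling entirely;
# user code contains no explicit lock or atomic operation.
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(square, range(10)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]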

Concurrency can be divided into different levels


● Instruction level is the execution of two or more machine instructions simultaneously
● Statement level is the execution of two or more statements simultaneously
● Unit level is the execution of two or more subprogram units simultaneously
● Program level is the execution of two or more programs simultaneously
● Concurrent control methods increase programming flexibility

The book by Sebesta mentions the following levels of concurrency:

Instruction concurrency: this is actually concurrency at the hardware level, called "concurrency at the level of instruction execution" above.

Statement concurrency: This means that several statements of a program are executed
concurrently on some physical parallel computer architecture. Usually, one considers a
conventional (sequential - non-concurrent) programming language, such as Fortran or C.
Certain optimizing compilers can automatically determine which statements are independent
of one another and can be executed concurrently without changing the result of the program
execution. They can also take the existing parallel hardware architecture into account (on
which the program is supposed to be executed) in order to generate code for this hardware
architecture which provides for concurrent statement execution. Note: The programmer is normally not much involved in this kind of concurrency, since it is dealt with automatically by the optimizing compiler.

Unit (or sub-program) concurrency: Here one talks about concurrent "units", which are
defined by the programming language and are intended to be executed concurrently.
Programming languages with concurrent "units" allow the programmer to design programs
which include a number of concurrent activities and the concurrency structure of a program
depends very much on the application. A typical application area where concurrent "units"
are very useful is simulation of process control or transportation systems, where the simulated
objects evolve in real time concurrently. This level of concurrency is also called logical
concurrency. In contrast to physical concurrency, it does not imply that the logically
concurrent "units" are really executed in physical concurrency. Often they are executed in an
interleaved manner (time-sharing, see below). Note: Instead of "unit", one often uses the
words process, thread, task, or simply concurrent statements. Note: This kind of concurrency
must be understood by the programmer that uses the programming language that supports
concurrent threads (e.g. Java). In this course, we are mainly interested in this kind of
concurrency.

Program (or task) concurrency: Often several programs are executed concurrently (in an
interleaved manner) on a computer within a time-sharing operating system. The CPU of the
computer is shared among the programs by allocating the CPU to each program for a short
time slice at a time. The allocation is done by a scheduler (see Sebesta's book). The same
technique of time sharing is usually also used for realizing concurrent "units" within a given program.

CATEGORIES OF CONCURRENCY
 There are two distinct categories of concurrent unit control: physical concurrency and logical concurrency.
 Physical concurrency happens when several program units from the same program literally execute simultaneously on more than one processor.
 Logical concurrency, on the other hand, happens when the execution of several programs takes place in an interleaved fashion on a single processor.

MOTIVATION FOR STUDYING CONCURRENCY


 Provides a method of conceptualizing program solutions to problems.
 The need for software to make effective use of the increasingly common hardware capability of multiple-processor concurrency.
Properties of Concurrent Systems:

For a program or concurrent system to be correct, some properties must be satisfied by it. Properties related to the termination of the system are as follows:

Correctness property

The correctness property means that the program or the system must provide the desired correct answer. Put simply, the system must map the starting program state to the final state correctly.

Safety property

The safety property means that the program or the system must remain in a "good" or "safe" state and never do anything "bad".

Liveness property

This property means that a program or system must "make progress" and eventually reach some desirable state.

Actors of concurrent systems

A common property of a concurrent system is that it can contain multiple processes and threads, which run at the same time to make progress on their own tasks. These processes and threads are called the actors of the concurrent system.
Resources of Concurrent Systems

The actors must utilize the resources such as memory, disk, printer etc. in order to perform
their tasks.

Certain set of rules

Every concurrent system must possess a set of rules to define the kind of tasks to be performed by the actors and the timing for each. The tasks could be acquiring locks, sharing memory, modifying state, etc.

Barriers of Concurrent Systems

While implementing concurrent systems, the programmer must take into consideration the following two important issues, which can be barriers to concurrency:

Sharing of data

An important issue while implementing concurrent systems is the sharing of data among multiple threads or processes. The programmer must ensure that locks protect the shared data so that all accesses to it are serialized and only one thread or process can access the shared data at a time. When multiple threads or processes all try to access the same shared data, all but one of them are blocked and remain idle. In other words, only one process or thread can proceed at a time while the lock is in force. There are some simple solutions to remove the above-mentioned barrier:

Data Sharing Restriction

The simplest solution is not to share any mutable data. In this case we need not use explicit locking, and the concurrency barrier due to mutable data is removed.

Data Structure Assistance

Many times, concurrent processes need to access the same data at the same time. Another solution, rather than using explicit locks, is to use a data structure that supports concurrent access. For example, we can use the queue module, which provides thread-safe queues, or the multiprocessing.JoinableQueue class for multiprocessing-based concurrency.
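The following sketch shows the idea with Python's thread-safe queue module (the producer/consumer functions and the item count are illustrative):

import queue
import threading

q = queue.Queue()  # internally synchronized; no explicit lock in user code

def producer():
    for i in range(5):
        q.put(i)           # safe to call from any thread

def consumer():
    for _ in range(5):
        item = q.get()     # blocks until an item is available
        print("consumed", item)
        q.task_done()

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()
q.join()  # wait until every produced item has been processed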

Immutable Data Transfer

Sometimes the data structure that we are using, say a concurrent queue, is not suitable; in that case we can pass immutable data without locking it.
Mutable Data Transfer

In continuation of the above solution, if it is required to pass mutable data rather than immutable data, then we can pass a read-only view or copy of the mutable data.

Sharing of I/O Resources

Another important issue in implementing concurrent systems is the use of I/O resources by threads or processes. The problem arises when one thread or process uses the I/O for a long time while the others sit idle. We can see this kind of barrier while working with an I/O-heavy application. For example, consider a web browser requesting pages: if the rate at which the data is requested is slower than the rate at which it is consumed, then we have an I/O barrier in our concurrent system.

Concurrent versus sequential programming:

Concurrent programming:
- Concurrent computation: simultaneous execution of multiple processes or threads at the same time.
- Multiple threads of execution are running simultaneously through your program.
- Multiple PCs (program counters) are active, one for each thread.

Sequential programming:
- A consecutive and ordered execution of processes, one after another.
- A single thread of execution weaves its way through your program.
- A single PC (program counter) identifies the current instruction being executed.

A simple example of the difference is a series of additions. Computed sequentially, the terms are added one at a time:

0 + 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 = 45

Computed concurrently, independent pairs can be added at the same time and the partial sums combined:

(1 + 9) + (2 + 8) + (3 + 7) + (4 + 6) + 5 + 0 = 45
Concurrent Programming Constructs:

1) Interleaving:

Interleaving is a process or methodology to make a system more efficient, fast and reliable by
arranging data in a noncontiguous manner. There are many uses for interleaving at the system
level, including:
 Storage: As hard disks and other storage devices are used to store user and system
data, there is always a need to arrange the stored data in an appropriate way.
 Error Correction: Errors in data communication and memory can be corrected
through interleaving.
 Multi-Dimensional Data Structures

Interleaving is also known as sector interleave.


Interleaving divides memory into small chunks. It is used as a high-level technique to solve memory issues for motherboards and chips. By increasing the bandwidth with which chunks of memory can be accessed, interleaving increases the overall performance of the processor and system, because the processor can fetch and send more data to and from memory in the same amount of time.

Interleaving is the only technique supported by all kinds of motherboards. High-level processing management systems are constantly required to implement such techniques. Interleaving promotes efficient database access and communication for servers in large organizations.

There are various types of interleaving:


1. Two-Way Interleaving: Two memory blocks are accessed at the same time for reading and writing operations. The chance for overlapping exists.
2. Four-Way Interleaving: Four memory blocks are accessed at the same time.
3. Error-Correction Interleaving: Errors in communication systems occur in high
volumes rather than in single attacks. Interleaving controls these errors with specific
algorithms.
Latency is one disadvantage of interleaving: the rearrangement itself takes time, and interleaving hides error structures, which is not efficient.

2) Mutual Exclusion:

A mutual exclusion (mutex) is a program object that prevents simultaneous access to a shared
resource. This concept is used in concurrent programming with a critical section, a piece of
code in which processes or threads access a shared resource. Only one thread owns the mutex
at a time, thus a mutex with a unique name is created when a program starts. When a thread
holds a resource, it has to lock the mutex from other threads to prevent concurrent access of
the resource. Upon releasing the resource, the thread unlocks the mutex.
Mutex comes into the picture when two threads work on the same data at the same time. It
acts as a lock and is the most basic synchronization tool. When a thread tries to acquire a
mutex, it gains the mutex if it is available, otherwise the thread is set to sleep condition.
Mutual exclusion reduces latency and busy-waits using queuing and context switches. Mutex
can be enforced at both the hardware and software levels.
Disabling interrupts for the smallest number of instructions is the best way to enforce mutex
at the kernel level and prevent the corruption of shared data structures. If multiple processors
share the same memory, a flag is set to enable and disable the resource acquisition based on
availability. The busy-wait mechanism enforces mutex in the software areas. This is
furnished with algorithms such as Dekker's algorithm, the black-white bakery algorithm,
Szymanski's algorithm, Peterson's algorithm and Lamport's bakery algorithm.
Mutually exclusive readers and read/write mutex class codes can be defined for an efficient
implementation of mutex.
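A minimal Python sketch of a mutex protecting a shared counter (the counter and iteration count are illustrative):

import threading

counter = 0
lock = threading.Lock()        # the mutex guarding the shared counter

def increment():
    global counter
    for _ in range(100_000):
        with lock:             # lock the mutex; it is released on exit
            counter += 1       # critical section: one thread at a time

threads = [threading.Thread(target=increment) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # always 200000, because the accesses are serialized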

3) Safety and Liveness property:

Any property of all executions of a concurrent program can be formulated in terms of safety
and liveness.

Safety: A safety property asserts that nothing "bad" happens throughout execution.

Liveness: A liveness property asserts that something "good" eventually does happen.

For example, the property that a program always produces the correct answer can be formulated using one safety property and one liveness property. The safety property is that the program never terminates with the wrong answer: terminating with the wrong answer is the "bad thing". The liveness property is that the program eventually does terminate: termination is the "good thing". We might also desire that a program generate answers in a timely manner. This is also a safety property, where the "bad thing" is that the clock (a hidden variable) has a certain value and the program counter (also a hidden variable) has not reached the statement that generates the answer.
The key attribute of safety properties is that once the proscribed "bad thing" happens, no
subsequent execution can cause the safety property to hold. On the other hand, the key
attribute of liveness properties is that no partial execution is irremediable: it always remains
possible for the "good thing" to occur during subsequent execution.

4) Semaphores:

Semaphores are integer variables that are used to solve the critical section problem by using
two atomic operations, wait and signal that are used for process synchronization.
The definitions of wait and signal are as follows:
1. Wait

The wait operation decrements the value of its argument S if S is positive. If S is zero or negative, the operation waits (busy-waits in the pseudo-code below) until S becomes positive.

wait(S) {
    while (S <= 0)
        ;        // busy-wait until S becomes positive
    S--;
}

2. Signal

The signal operation increments the value of its argument S.

signal(S) {
    S++;
}

Types of Semaphores
There are two main types of semaphores i.e. counting semaphores and binary semaphores.
Details about these are given as follows:
1. Counting Semaphores

These are integer-valued semaphores with an unrestricted value domain. These semaphores are used to coordinate resource access, where the semaphore count is the number of available resources. If resources are added, the semaphore count is automatically incremented, and if resources are removed, the count is decremented.
2. Binary Semaphores
The binary semaphores are like counting semaphores, but their value is restricted to 0 and 1. The wait operation only succeeds when the semaphore is 1, and the signal operation succeeds when the semaphore is 0. Binary semaphores are sometimes easier to implement than counting semaphores.
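Both kinds are available in Python's threading module. The sketch below models a pool of three identical resources with a counting semaphore (the worker function and counts are illustrative):

import threading

pool = threading.BoundedSemaphore(3)   # counting semaphore: 3 resources

def use_resource(worker_id):
    with pool:                 # wait: decrements the count, blocks at zero
        print(f"worker {worker_id} holds one of the 3 resources")
    # leaving the block signals: the count is incremented again

binary = threading.Semaphore(1)        # binary: value restricted to 0 and 1

for i in range(5):
    threading.Thread(target=use_resource, args=(i,)).start()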
Advantages of Semaphores
Some of the advantages of semaphores are as follows:
1. Semaphores allow only one process into the critical section. They follow the mutual
exclusion principle strictly and are much more efficient than some other methods of
synchronization.
2. There is no resource wastage because of busy waiting in semaphores as processor
time is not wasted unnecessarily to check if a condition is fulfilled to allow a process
to access the critical section.
3. Semaphores are implemented in the machine independent code of the microkernel. So
they are machine independent.

Disadvantages of Semaphores
Some of the disadvantages of semaphores are as follows:
1. Semaphores are complicated so the wait and signal operations must be implemented
in the correct order to prevent deadlocks.
2. Semaphores are impractical for large-scale use, as their use leads to loss of modularity. This happens because the wait and signal operations prevent the creation of a structured layout for the system.
3. Semaphores may lead to a priority inversion where low priority processes may access
the critical section first and high priority processes later.

5) Monitors:

Monitors and semaphores are used for process synchronization and allow processes to access
the shared resources using mutual exclusion. However, monitors and semaphores contain
many differences. Details about both of these are given as follows:
Monitors are a synchronization construct that were created to overcome the problems caused
by semaphores such as timing errors.
Monitors are abstract data types and contain shared data variables and procedures. The shared
data variables cannot be directly accessed by a process and procedures are required to allow a
single process to access the shared data variables at a time.
This is demonstrated as follows:

monitor monitorName
{
    // shared data variables

    procedure P1(....) { .... }
    procedure P2(....) { .... }
    ...
    procedure Pn(....) { .... }

    initialization code(....) { .... }
}

Only one process can be active in a monitor at a time. Other processes that need to access the
shared variables in a monitor have to line up in a queue and are only provided access when
the previous process releases the shared variables.
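Languages without built-in monitors can imitate one. The sketch below wraps shared data and a single lock in a Python class, so the data can only be reached through the guarded procedures (the account example is illustrative):

import threading

class Account:
    """Monitor-style class: the shared balance is accessed only through
    procedures that hold the monitor's single internal lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self._balance = 0              # shared data variable

    def deposit(self, amount):
        with self._lock:               # only one process active at a time
            self._balance += amount

    def withdraw(self, amount):
        with self._lock:
            if self._balance >= amount:
                self._balance -= amount
                return True
            return False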

6) Channel (programming):

In computing, a channel is a model for inter-process communication and synchronization via message passing. A message may be sent over a channel, and another process or thread that has a reference to the channel is able to receive messages sent over it, as a stream. Different implementations of channels may be buffered or not, and either synchronous or asynchronous.
Channels are fundamental to the process calculus approach to concurrency, and originated in communicating sequential processes (CSP), a formal model for concurrency. They have been used in many derived languages, such as occam and the Limbo programming language (via Newsqueak and the Alef programming language). They are also used in Plan 9's libthread, as well as in Stackless Python and the Go programming language.
Channel implementations
Channels modeled after the CSP model are inherently synchronous: a process waiting to
receive an object from a channel will block until the object is sent. This is also
called rendezvous behaviour. Typical supported operations are presented below using the
example of the libthread channel API.

 Channel creation of fixed or variable size, returning a reference or handle:

Channel* chancreate(int elemsize, int bufsize)

 Sending to a channel:

int chansend(Channel *c, void *v)

 Receiving from a channel:

int chanrecv(Channel *c, void *v)

libthread channels
The Multithreading library, libthread, which was first created for the operating system Plan 9,
offers inter-thread communication based on fixed-size channels.
OCaml events
The OCaml event module offers typed channels for synchronization. When the module's send
and receive functions are called, they create corresponding send and receive events which can
be synchronized.

Examples
XMOS XC
The XMOS programming language XC provides a primitive type "chan" and two operators
"<:" and ":>" for sending and receiving data from a channel.
In this example, two hardware threads are started on the XMOS, running the two lines in the
"par" block. The first line transmits the number 42 through the channel while the second
waits until it is received and sets the value of x. The XC language also allows asynchronous
receiving on channels through a select statement.

chan c;
int x;
par {
c <: 42;
c :> x;
}

Go
This snippet of Go code performs similarly to the XC code. First the channel c is created,
then a goroutine is spawned which sends 42 through the channel. When the number is put in
the channel x is set to 42. Go allows channels to buffer contents, as well as non blocking
receiving through the use of a select block.

c := make(chan int)

go func() { c <- 42 }()

x := <-c

Applications
In addition to their fundamental use for interprocess communication, channels can be used as
a primitive to implement various other concurrent programming constructs which can be
realized as streams. For example, channels can be used to construct futures and promises,
where a future is a one-element channel, and a promise is a process that sends to the channel,
fulfilling the future. Similarly, iterators can be constructed directly from channels.
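The sketch below imitates this in Python, modelling the future as a one-element channel (here a queue.Queue of capacity 1; the names and computed value are illustrative):

import threading
import queue

future = queue.Queue(maxsize=1)   # a future: a one-element channel

def promise():
    # The promise is a process that sends to the channel,
    # fulfilling the future.
    future.put(6 * 7)

threading.Thread(target=promise).start()
print(future.get())               # blocks until fulfilled, then prints 42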
7) Message Passing:

Process communication is the mechanism provided by the operating system that allows
processes to communicate with each other. This communication could involve a process
letting another process know that some event has occurred or transferring of data from one
process to another. One of the models of process communication is the message passing
model.
Message passing model allows multiple processes to read and write data to the message
queue without being connected to each other. Messages are stored on the queue until their
recipient retrieves them. Message queues are quite useful for inter process communication
and are used by most operating systems.
In the message passing model, two processes P1 and P2 can both access the message queue and store and retrieve data.
Advantages of Message Passing Model
Some of the advantages of message passing model are given as follows:
1. The message passing model is much easier to implement than the shared memory
model.
2. It is easier to build parallel hardware using message passing model as it is quite
tolerant of higher communication latencies.

Disadvantage of Message Passing Model

The message passing model has slower communication than the shared memory model because the connection setup takes time.
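A small Python sketch of the model, passing a message between two processes through a queue (the message text is illustrative):

import multiprocessing as mp

def sender(q):
    q.put("some event has occurred")   # write the message to the queue

if __name__ == "__main__":
    q = mp.Queue()                     # message queue shared by both processes
    p = mp.Process(target=sender, args=(q,))
    p.start()
    # The processes are not connected to each other; the message stays
    # on the queue until the recipient retrieves it.
    print(q.get())
    p.join()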

Race condition
A race condition is an undesirable situation that occurs when a device or system attempts to perform two or more operations at the same time but, because of the nature of the device or system, the operations must be done in the proper sequence to be done correctly.

A simple example of a race condition is a light switch. In some homes there are multiple light
switches connected to a common ceiling light. When these types of circuits are used, the
switch position becomes irrelevant. If the light is on, moving either switch from its current
position turns the light off. Similarly, if the light is off, then moving either switch from its
current position turns the light on. With that in mind, imagine what might happen if two
people tried to turn on the light using two different switches at exactly the same time.
One instruction might cancel the other or the two actions might trip the circuit breaker.

Race conditions are most commonly associated with computer science. In computer memory or storage, a race condition may occur if commands to read and write a large amount of data are received at almost the same instant, and the machine attempts to overwrite some or all of the old data while that old data is still being read. The result may be one or more of the following: a computer crash, an "illegal operation" notification and shutdown of the program, errors reading the old data or errors writing the new data. A race condition can also occur if instructions are processed in the incorrect order.

Suppose for a moment that two processes need to perform a bit flip at a specific memory
location. Under normal circumstances the operation should work like this:

Process 1      Process 2      Memory Value
Read value                    0
Flip value                    1
               Read value     1
               Flip value     0

In this example, Process 1 performs a bit flip, changing the memory value from 0 to 1. Process 2 then performs a bit flip and changes the memory value from 1 to 0.

If a race condition occurred causing these two processes to overlap, the sequence could
potentially look more like this:

Process 1      Process 2      Memory Value
Read value                    0
               Read value     0
Flip value                    1
               Flip value     1

In this example, the bit has an ending value of 1 when its value should be 0. This occurs
because Process 2 is unaware that Process 1 is performing a simultaneous bit flip.
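The overlap can be reproduced in Python with two threads flipping a shared bit without synchronization (the iteration count is illustrative; the outcome varies between runs):

import threading

value = 0

def flip_many():
    global value
    for _ in range(100_000):
        value ^= 1   # read-modify-write is not atomic: both threads can
                     # read the same old value and overwrite each other

t1 = threading.Thread(target=flip_many)
t2 = threading.Thread(target=flip_many)
t1.start(); t2.start()
t1.join(); t2.join()

print(value)  # expected 0, but a race can leave it at 1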

Security vulnerabilities caused by race conditions


When a program that is designed to handle tasks in a specific sequence is asked to perform two or more operations simultaneously, an attacker can take advantage of the time gap between when the service is initiated and when a security control takes effect in order to create a deadlock or thread-block situation. With deadlock, two or more threads must wait for a lock in a circular chain. This defect can cause the entire software system to halt, because such locks can never be acquired if the chain is circular. Thread block can also dramatically impact application performance. In this type of concurrency defect, one thread calls a long-running operation while holding a lock, preventing the progress of all other threads.

Preventing race conditions

In computing environments, race conditions can be prevented by serialization of memory or storage access. This means that if read and write commands are received close together, the read command is executed and completed first by default.

In a network, a race condition may occur if two users attempt to access an available channel at the same instant, and neither computer receives notification that the channel is occupied before the system grants access. Statistically, this kind of coincidence will most likely occur in networks with long lag times, such as those that use geostationary satellites. To prevent such a race condition from developing, a priority scheme must be devised. For example, the subscriber whose username begins with the earlier letter of the alphabet (or the lower numeral) may get priority by default when two subscribers attempt to access the system within a prescribed increment of time. Hackers can take advantage of race-condition vulnerabilities to gain unauthorized access to networks.

Race conditions occasionally occur in logic gates when certain inputs come into conflict.
Because the gate output state takes a finite, nonzero amount of time to react to any change in
input states, sensitive circuits or devices following the gate may be fooled by the state of the
output, and thereby not operate properly.

Synchronization

Synchronization is an enforcing mechanism used to impose constraints on the order of execution of threads. Synchronization controls the relative order of thread execution and resolves any conflict among threads that might produce unwanted behavior. In simple terms, synchronization is used to coordinate thread execution and manage shared data.
In an environment where messages are used for communicating between a sender and a
receiver, synchronization is implicit, as a message must be sent before the message can be
received. On the other hand, for a shared-memory based environment, threads have no
implicit interdependency unless some constraints are imposed. Two types of synchronization
operations are widely used: mutual exclusion and condition synchronization.
In the case of mutual exclusion , one thread blocks a critical section – a section of code that
contains shared data – and one or more threads wait to get their turn to enter into the section.
This helps when two or more threads share the same memory space and run simultaneously.
The mutual exclusion is controlled by a scheduler and depends on the granularity of the
scheduler. Condition synchronization, on the other hand, blocks a thread until the system
state specifies some specific conditions. The condition synchronization allows a thread to
wait until a specific condition is reached. Figure 1.2 below shows the generic representation
of synchronization.

Figure 1.2. Generic Representation of Synchronization Block inside Source Code

While a number of techniques are available for synchronization, only a few methods are used
by developers on a regular basis. The techniques used are also to some extent determined by
the programming environment.
The scope of synchronization is broad. Proper synchronization orders the updates to data and provides an expected outcome. As shown in Figure 1.3 below, shared data d can be accessed by threads Ti and Tj at times ti, tj, tk, tl, and proper synchronization maintains the order of updates to d at these instants, treating the state of d as a synchronization function of time. This synchronization function, s, represents the behavior of a synchronized construct with respect to the execution time of a thread.
Figure 1.3. Shared Data Synchronization, Where Data d Is Protected by a
Synchronization Operation

Figure 1.4 below represents, in a generic form, how synchronization operations are performed in an actual multi-threaded implementation, and demonstrates the flow of threads. When m >= 1, the creation timing for the initial threads T1 ... Tm might not be the same. After block Bi as well as Bj, the number of threads could be different, which means m is not necessarily equal to n, and n is not necessarily equal to p. For all operational environments, the values of m, n, and p are at least 1.
Figure 1.4. Operational Flow of Threads for an Application

Synchronization primitives

As in a traditional computing environment, synchronization in the parallel programming realm is typically performed by three types of primitives: semaphores, locks, and condition variables. The use of these primitives depends on the application requirements. These synchronization primitives are implemented by atomic operations and use appropriate memory fences. A memory fence, sometimes called a memory barrier, is an operation that guarantees that threads see other threads' memory operations in a consistent order. To hide the granularity of these synchronization primitives, higher-level synchronizations are used.
(i) Semaphores

Semaphores, the first set of software-oriented primitives to accomplish mutual exclusion of parallel process synchronization, were introduced by the well-known mathematician Edsger Dijkstra in his 1968 paper, "The Structure of the 'THE'-Multiprogramming System" (Dijkstra 1968). Dijkstra illustrated that synchronization can be achieved by using only traditional machine instructions or hierarchical structure. He proposed that a semaphore can be represented by an integer, sem, and showed that a semaphore can be bounded by two basic atomic operations, P (proberen, which means test) and V (verhogen, which means increment).
These atomic operations are also referred to as synchronizing primitives. Even though the details of Dijkstra's semaphore representation have evolved, the fundamental principle remains the same: P represents the "potential delay" or "wait" and V represents the "barrier removal" or "release" of a thread. These two synchronizing primitives can be sketched for a semaphore sem as follows.
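The following Python formulation is one common sketch of these semantics (in this sketch the counter never goes negative, whereas Dijkstra's description also records the number of blocked threads as a negative value):

import threading

class Semaphore:
    def __init__(self, sem=1):          # sem is initialized to 0 or 1
        self._sem = sem
        self._cond = threading.Condition()

    def P(self):                        # proberen: test / potential delay
        with self._cond:
            while self._sem <= 0:
                self._cond.wait()       # the calling thread blocks
            self._sem -= 1

    def V(self):                        # verhogen: increment / release
        with self._cond:
            self._sem += 1
            self._cond.notify()         # barrier removal: wake one thread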

Here the semaphore value sem is initialized with the value 0 or 1 before the parallel processes get started. In Dijkstra's representation, T referred to processes; threads are used here instead, to be more precise and to remain consistent about the differences between threads and processes.
The P operation blocks the calling thread if the value remains 0, whereas the V operation,
independent of P operation, signals a blocked thread to allow it to resume operation.
These P and V operations are "indivisible actions": each executes atomically, without interleaving with other P or V operations. A positive value of sem represents the number of threads that can proceed without blocking, and a negative number refers to the number of blocked threads. When the sem value becomes zero, no thread is waiting, and if a thread needs to decrement, the thread gets blocked and keeps itself in a waiting list. When the value of sem is restricted to only 0 and 1, the semaphore is a binary semaphore.
To use a semaphore, you can consider it as a counter that supports two atomic operations. Implementations of semaphores vary. From a usability perspective, two kinds of semaphores exist: strong and weak. These kinds describe the guarantees for individual calls on P. A strong semaphore maintains a First-Come-First-Served (FCFS) model and guarantees service to threads that call P, avoiding starvation. A weak semaphore is one that does not provide any guarantee of service to a particular thread, and the thread might starve. For example, POSIX semaphores can get into starvation status; they are implemented differently than what Dijkstra defined and are considered weak semaphores (Reek 2002).
According to Dijkstra, the mutual exclusion of parallel threads is achieved by bracketing each thread's critical section with the P and V atomic operations.

Semaphores are largely of historical interest. They are the unstructured "goto" statements of multi-threaded programming. Most programming environments provide higher-level structured synchronization primitives. However, like the goto, semaphores are occasionally the best primitive to use. A typical use of a semaphore is protecting a shared resource of which at most n instances are allowed to exist simultaneously. The semaphore starts out with value n. A thread that needs to acquire an instance of the resource performs operation P. It releases the resource using operation V.
Let's examine how semaphores might be used for the producer-consumer problem and whether the problem can be resolved using semaphores or not. Producer-consumer is a classic synchronization problem, also known as the bounded-buffer problem. Here a producer function generates data to be placed in a shared buffer and a consumer function receives the data out of the buffer and operates on it, where both producer and consumer functions execute concurrently. Pseudo-code using a semaphore for the producer-consumer problem is shown in Figure 1.5 below.

Figure 1.5 Pseudo-code of Producer-Consumer Problem

Here neither producer nor consumer maintains any order. If the producer function runs far ahead of the consumer function, the system would require infinite buffer capacity, which is not possible. That is why the buffer size needs to be bounded to handle this type of scenario and make sure that, if the producer gets ahead of the consumer, the producer is restricted. The synchronization problem can be removed by adding one more semaphore to the previous solution shown in Figure 1.5. Adding the semaphore maintains the boundary of the buffer as shown in Figure 1.6 below, where sEmpty and sFull retain the constraints of buffer capacity for the operating threads.

Figure 1.6 Dual Semaphores Solution for Producer-Consumer Problem
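In Python, the dual-semaphore solution can be sketched as follows (buffer size and item counts are illustrative; sEmpty and sFull play the roles described above):

import threading

BUFFER_SIZE = 8
buffer = []
mutex = threading.Semaphore(1)             # mutual exclusion on the buffer
sEmpty = threading.Semaphore(BUFFER_SIZE)  # counts free slots
sFull = threading.Semaphore(0)             # counts filled slots

def producer():
    for item in range(20):
        sEmpty.acquire()       # wait for a free slot: bounds the buffer
        with mutex:
            buffer.append(item)
        sFull.release()        # signal: one more filled slot

def consumer():
    for _ in range(20):
        sFull.acquire()        # wait for data
        with mutex:
            item = buffer.pop(0)
        sEmpty.release()       # signal: one more free slot
        print("consumed", item)

threading.Thread(target=producer).start()
threading.Thread(target=consumer).start()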


Instead of using two independent semaphores and having a constraint-based solution, the solution in Figure 1.6 above can be implemented using other synchronization primitives as well. The following sections discuss how to solve the producer-consumer problem using locks and condition variables.

(ii) Locks
Locks are similar to semaphores except that a single thread handles a lock at one instance.
Two basic atomic operations get performed on a lock:
* acquire(): Atomically waits for the lock state to be unlocked and sets the lock state to locked.
* release(): Atomically changes the lock state from locked to unlocked.
At most one thread acquires the lock. A thread has to acquire a lock before using a shared
resource; otherwise it waits until the lock becomes available.
When one thread wants to access shared data, it first acquires the lock, exclusively performs
operations on the shared data and later releases the lock for other threads to use.
The level of granularity can be either coarse or fine depending on the type of shared data
that needs to be protected from threads. The coarse granular locks have higher lock
contention than finer granular ones.
To remove issues with lock granularity, most of the processors support the Compare and
Swap (CAS) operation, which provides a way to implement lock-free synchronization. The
atomic CAS operations guarantee that the shared data remains synchronized among threads.
If you require the use of locks, it is recommended that you use the lock inside a critical
section with a single entry and single exit, as shown in Figure 1.7 below .

Figure 1.7 A Lock Used Inside a Critical Section
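In Python, this single-entry, single-exit structure corresponds to a with-statement, or an equivalent try-finally, sketched below with an illustrative shared list:

import threading

lock = threading.Lock()
shared = []

def update(item):
    with lock:               # single entry; the lock is always released
        shared.append(item)  # critical section

def update_explicit(item):
    lock.acquire()
    try:                     # the same structure with explicit operations
        shared.append(item)
    finally:
        lock.release()       # single exit, even if an exception occurs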

From an implementation perspective, it is always safer to use explicit locks rather than relying on implicit locks. In general, a lock must not be held for a long period of time. Explicit locks are defined by the developer, whereas implicit locks come from the underlying framework; for example, database engines provide locks to maintain data consistency.
In the producer-consumer problem, if the consumer wants to consume shared data before the producer produces it, the consumer must wait. To use locks for the producer-consumer problem, the consumer must loop until the data is ready from the producer. The reason for looping is that the lock does not support any wait operation, whereas condition variables do.
(i) Lock Types

An application can have different types of locks according to the constructs required to
accomplish the task. You must avoid mixing lock types within a given task. For this reason,
special attention is required when using any third party library.
If your application has a third-party dependency for a resource R and the third party uses lock type L for R, then if you need to use a lock mechanism for R, you must use lock type L rather than any other lock type. The following sections cover these locks and define their purposes.
1) Mutexes. The mutex is the simplest lock an implementation can use. Some texts use
the mutex as the basis to describe locks in general. The release of a mutex does not
depend on the release() operation only. A timer attribute can be added with a mutex.
If the timer expires before a release operation, the mutex releases the code block or shared
memory to other threads. A try-finally clause can be used to make sure that the mutex gets
released when an exception occurs. The use of a timer or try-finally clause helps to prevent a
deadlock scenario.
2) Recursive Locks. Recursive locks are locks that may be repeatedly acquired by the
thread that currently owns the lock without causing the thread to deadlock. No other
thread may acquire a recursive lock until the owner releases it once for each time the
owner acquired it.
Thus when using a recursive lock, be sure to balance acquire operations with release
operations. The best way to do this is to lexically balance the operations around single-entry
single-exit blocks, as was shown for ordinary locks.
The recursive lock is most useful inside a recursive function. In general, recursive locks are slower than non-recursive locks. An example of recursive lock use is shown in Figure 1.8 below.

Figure 1.8 An Example of Recursive Lock Use
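In the spirit of Figure 1.8, a Python sketch using a recursive lock inside a recursive function (the factorial function is illustrative):

import threading

rlock = threading.RLock()    # recursive lock

def factorial(n):
    # The owning thread re-acquires the lock at each recursion level;
    # acquires and releases stay balanced around a single-entry,
    # single-exit block.
    with rlock:
        if n <= 1:
            return 1
        return n * factorial(n - 1)

print(factorial(5))  # 120; an ordinary Lock would deadlock here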


3) Read-Write Locks. Read-write locks are also called shared-exclusive or multiple-read/single-write locks, or non-mutual-exclusion semaphores. Read-write locks allow simultaneous read access to multiple threads but limit write access to only one thread.
This type of lock can be used efficiently for those instances where multiple threads need to
read shared data simultaneously but do not necessarily need to perform a write operation. For
lengthy shared data, it is sometimes better to break the data into smaller segments and operate
multiple read-write locks on the dataset rather than having a data lock for a longer period of
time.
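Python's standard library has no read-write lock, but a simple one can be sketched with a condition variable (a reader-preference sketch; writers can starve if readers keep arriving):

import threading

class ReadWriteLock:
    """Allows simultaneous readers but only one exclusive writer."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0

    def acquire_read(self):
        with self._cond:
            self._readers += 1          # many readers may hold the lock

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all() # a waiting writer may proceed

    def acquire_write(self):
        self._cond.acquire()            # writers hold the underlying lock
        while self._readers > 0:
            self._cond.wait()           # wait until all readers are done

    def release_write(self):
        self._cond.release()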
4) Spin Locks. Spin locks are non-blocking locks owned by a thread. Waiting threads must "spin," that is, poll the state of a lock rather than get blocked. Spin locks are used mostly on multiprocessor systems. This is because while the thread spins on a single-core processor system, no processor resources are available to run the other thread that would release the lock. The appropriate condition for using spin locks is whenever the hold time of a lock is less than the time of blocking and waking up a thread.
The change of control for threads involves context switching of threads and updating thread data structures, which could require more instruction cycles than spin locks. The spin time of spin locks should be limited to about 50 to 100 percent of a thread context switch (Kleiman 1996), and a spin lock should not be held during calls to other subsystems.
Improper use of spin locks might cause thread starvation, so think carefully before using this locking mechanism. The thread starvation problem of spin locks can be alleviated by using a queuing technique, where every waiting thread spins on a separate local flag in memory using a First-In, First-Out (FIFO) queue construct.
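A minimal Python sketch of spinning, polling a lock with a non-blocking acquire instead of blocking (sensible only when the expected hold time is very short):

import threading

spin_flag = threading.Lock()

def spin_acquire():
    # Spin: repeatedly poll the lock state rather than block the thread.
    while not spin_flag.acquire(blocking=False):
        pass                  # busy-wait

def spin_release():
    spin_flag.release()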

(iii) Condition Variables

Important in the parallel programming environment, condition variables are based on Dijkstra's semaphore semantics, with the exception that no stored value is associated with the operation.
This means condition variables do not contain the actual condition to test; a shared data state
is used instead to maintain the condition for threads. A thread waits or wakes up other
cooperative threads until a condition is satisfied.
Condition variables are preferable to locks when pooling is required and some scheduling behavior is needed among threads. To operate on shared data, a condition variable C uses a lock, L. Three basic atomic operations are performed on a condition variable C:
* wait(L): Atomically releases the lock and waits; when wait returns, the lock has been acquired again.
* signal(L): Enables one of the waiting threads to run; when signal returns, the lock is still acquired.
* broadcast(L): Enables all of the waiting threads to run; when broadcast returns, the lock is still acquired.
To control a pool of threads, use of a signal function is recommended. The penalty for using a
broadcast-based signaling function could be severe and extra caution needs to be undertaken
before waking up all waiting threads.
For some instances, however, broadcast signaling can be effective. As an example, a “write”
lock might allow all “readers” to proceed at the same time by using a broadcast mechanism.
To show the use of a condition variable for a synchronization problem, the pseudo code
in Figure 1.9 below solves the producer-consumer problem discussed earlier. A
variable LC is used to maintain the association between condition variable C and an
associated lock L .

Figure 1.9 Use of a Condition Variable for the Producer-Consumer Problem

Monitors

For structured synchronization, a higher level construct is introduced for simplifying the use
of condition variables and locks, known as a monitor. The purpose of the monitor is to
simplify the complexity of primitive synchronization operations and remove the
implementation details from application developers.
The compiler for the language that supports monitors automatically inserts lock operations at
the beginning and the end of each synchronization-aware routine.
Most recent programming languages do not support monitors explicitly; rather, they expose lock and unlock operations to developers. The Java language supports explicit monitor objects along with synchronized blocks inside a method. In Java, the monitor is maintained by the "synchronized" construct, such as a synchronized (object) { ... } block, where the "condition" primitives are used by the wait(), notify(), or notifyAll() methods. Do not confuse this with the Monitor object in the Java SDK, though: the Java Monitor object is used to perform resource management in Java Management Extensions (JMX). Similarly, the Monitor object in C# is used as a lock construct.
