
Parallelism

• Executing two or more operations at the same time is known as Parallelism.

Goals of Parallelism:
• The purpose of parallel processing is to speed up the computer's processing capability or, in other words, to increase its computational speed.
• Increases throughput, i.e. amount of processing that can
be accomplished during a given interval of time.
• Improves the performance of the computer for a given
clock speed.
• Two or more ALUs in CPU can work concurrently to
increase throughput.
• The system may have two or more processors operating
concurrently.
Exploitation of Concurrency:
Techniques of Concurrency:
 Overlap : execution of multiple operations by heterogeneous
functional units.
 Parallelism : execution of multiple operations by homogeneous
functional units.

Throughput Enhancement
A computer’s performance is measured by the time taken for executing
a program.
The program execution involves performing instruction cycles, which include two types of operations:
 Internal Micro-operations: performed inside the hardware functional
units such as the processor, memory, I/O etc.
 Transfer of information: between different functional hardware units
for Instruction fetch, operand fetch, I/O operation etc.
Types of Parallelism:

Instruction Level Parallelism (ILP)
 Pipelining
 Superscalar

Processor Level Parallelism
 Array Computer
 Multiprocessor
Instruction Pipeline

• An instruction pipeline reads consecutive instructions from memory while previous instructions are being executed in other segments.
• Computer needs to process each instruction with the
following sequence of steps.
 Fetch the instruction from memory
 Decode the instruction
 Calculate the effective address
 Fetch the operands from memory
 Execute the instruction
 Store the result in the proper place
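The six steps above can be sketched as a tiny fetch-decode-execute loop. This is a minimal illustration for a hypothetical one-address machine; the instruction format, memory layout, and opcode names are assumptions made for the example, not a real ISA.

```python
# Toy memory: addresses 0-2 hold instructions, 10-12 hold data.
# (Illustrative layout, not a real machine.)
MEMORY = {
    0: ("LOAD", 10),   # ACC <- M[10]
    1: ("ADD", 11),    # ACC <- ACC + M[11]
    2: ("STORE", 12),  # M[12] <- ACC
    10: 5, 11: 7, 12: 0,
}

def run(pc_end=3):
    pc, acc = 0, 0
    while pc < pc_end:
        instr = MEMORY[pc]            # 1. fetch the instruction from memory
        opcode, addr_field = instr    # 2. decode the instruction
        ea = addr_field               # 3. calculate the effective address (direct mode)
        operand = MEMORY.get(ea)      # 4. fetch the operand from memory
        if opcode == "LOAD":          # 5. execute the instruction
            acc = operand
        elif opcode == "ADD":
            acc += operand
        elif opcode == "STORE":
            MEMORY[ea] = acc          # 6. store the result in the proper place
        pc += 1
    return acc

print(run())          # 12
print(MEMORY[12])     # 12
```

A pipelined CPU overlaps these same steps for several instructions at once instead of finishing one instruction before starting the next.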
Four-segment CPU Pipeline

[Flowchart] Segment 1: Fetch instruction → Segment 2: Decode and calculate effective address → Segment 3: Fetch operand → Segment 4: Execute instruction. If the decoded instruction is a branch, or an interrupt is pending after execution, the pipe is emptied and the PC is updated before fetching continues; otherwise the PC is updated and the next instruction is fetched.
Timing of Instruction Pipeline

Step:          1   2   3   4   5   6   7   8   9  10  11  12  13
Instruction 1: FI  DA  FO  EX
            2:     FI  DA  FO  EX
            3:         FI  DA  FO  EX   (branch)
            4:             FI  --  --  FI  DA  FO  EX
            5:                 --  --  --  FI  DA  FO  EX
            6:                             FI  DA  FO  EX
            7:                                 FI  DA  FO  EX
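The timing diagram above follows directly from two small formulas: in a k-segment pipeline the first instruction takes k cycles, and each later instruction completes one cycle after its predecessor. A short sketch (ignoring branch stalls, which is why the diagram above needs 13 steps rather than 10):

```python
# Cycles needed by a k-segment pipeline for n instructions, versus
# non-pipelined execution where every instruction takes all k cycles.

def pipelined_cycles(k, n):
    # first instruction fills the pipe (k cycles), then one completes per cycle
    return k + (n - 1)

def nonpipelined_cycles(k, n):
    return k * n

k, n = 4, 7   # four segments (FI, DA, FO, EX), seven instructions
print(pipelined_cycles(k, n))      # 10 cycles, ignoring the branch penalty
print(nonpipelined_cycles(k, n))   # 28 cycles
print(nonpipelined_cycles(k, n) / pipelined_cycles(k, n))  # speedup = 2.8
```

As n grows large, the speedup approaches k, the number of pipeline segments.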
Pipeline Conflicts

• Resource conflicts are caused by access to memory by two segments at the same time. These may be resolved by using separate instruction and data memories.

• Data dependency conflicts arise when an instruction depends on the result of a previous instruction, but this result is not yet available.

• Branch difficulties arise from branch and other instructions that change the value of the PC.
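The data dependency conflict above is the classic read-after-write (RAW) hazard, and detecting it is a simple set check. A minimal sketch, assuming an illustrative instruction encoding of (destination register, source registers):

```python
# Detect a read-after-write (RAW) hazard between two instructions.
# The (dest, sources) tuple encoding is an assumption for this example.

def raw_hazard(earlier, later):
    """True if `later` reads a register that `earlier` writes."""
    dest, _ = earlier
    _, srcs = later
    return dest in srcs

i1 = ("R1", ("R2", "R3"))   # R1 <- R2 + R3
i2 = ("R4", ("R1", "R5"))   # R4 <- R1 + R5: needs R1 before i1 completes
i3 = ("R6", ("R2", "R7"))   # independent of i1

print(raw_hazard(i1, i2))   # True  -> pipeline must stall or forward the result
print(raw_hazard(i1, i3))   # False -> no conflict
```

Real pipelines resolve a detected hazard by stalling, by operand forwarding, or (with compiler help) by scheduling an independent instruction into the gap.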
Instruction-level parallelism (ILP)

• Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.

• Micro-architectural techniques that are used to exploit ILP include:
 Instruction pipelining, where the execution of multiple instructions can be partially overlapped.
 Superscalar execution, in which multiple execution units are used to execute multiple instructions in parallel. In typical superscalar processors, the instructions executing simultaneously are adjacent in the original program order.
• A superscalar CPU architecture implements a form of parallelism called instruction-level parallelism within a single processor.

• It therefore allows faster CPU throughput than would otherwise be possible at a given clock rate.

• A superscalar processor executes more than one instruction during a clock cycle by simultaneously dispatching multiple instructions to redundant functional units on the processor.

• Each functional unit is not a separate CPU core but an execution resource within a single CPU, such as an arithmetic logic unit, a bit shifter, or a multiplier.

• A superscalar CPU is typically also pipelined; even so, pipelining and superscalar architecture are considered different performance enhancement techniques.
The superscalar technique is associated with several identifying characteristics (within a given CPU core):

 Instructions are issued from a sequential instruction stream.
 CPU hardware dynamically checks for data dependencies between instructions at run time (versus software checking at compile time).
 The CPU accepts multiple instructions per clock cycle.

• The performance improvement available from superscalar techniques is limited by three key areas:
 The degree of intrinsic parallelism in the instruction stream, i.e. the limited amount of instruction-level parallelism,
 The complexity and time cost of the dispatcher and associated dependency-checking logic, and
 Branch instruction processing.
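The run-time dependency check mentioned above can be sketched as a dual-issue decision: two adjacent instructions may issue in the same cycle only if the second does not need the first's result and a free functional unit exists for each. The (dest, sources, unit) encoding and unit names are illustrative assumptions, not any particular processor's format:

```python
# Sketch of a superscalar dispatcher's per-cycle check for issuing two
# adjacent instructions together. Encoding is an illustrative assumption.

def can_dual_issue(a, b, free_units):
    a_dest, _, a_unit = a
    b_dest, b_srcs, b_unit = b
    no_raw = a_dest not in b_srcs        # b must not read a's result (RAW)
    no_waw = a_dest != b_dest            # no write-after-write clash
    units_ok = (a_unit in free_units and b_unit in free_units and
                (a_unit != b_unit or free_units.count(a_unit) >= 2))
    return no_raw and no_waw and units_ok

add1 = ("R1", ("R2", "R3"), "ALU")   # R1 <- R2 + R3
mul1 = ("R4", ("R5", "R6"), "MUL")   # independent multiply
add2 = ("R7", ("R1", "R8"), "ALU")   # reads R1, so depends on add1

print(can_dual_issue(add1, mul1, ["ALU", "MUL"]))  # True: independent, different units
print(can_dual_issue(add1, add2, ["ALU", "MUL"]))  # False: RAW hazard on R1
```

The cost of performing this check across a wide issue window every cycle is exactly the "dispatcher and dependency-checking logic" limit listed above.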
Processor Level Parallelism

• Multiprocessing is the use of two or more central processing units (CPUs) within a single computer system.
• The term also refers to the ability of a system to support more than
one processor and/or the ability to allocate tasks between them.
• Multiprocessing sometimes refers to the execution of multiple
concurrent software processes in a system as opposed to a single
process at any one instant.
• The terms multitasking or multiprogramming are more appropriate to
describe this concept, which is implemented mostly in software,
whereas multiprocessing is more appropriate to describe the use of
multiple hardware CPUs.
• A system can be both multiprocessing and multiprogramming, only
one of the two, or neither of the two.
• In a multiprocessing system, all CPUs may be equal, or some may
be reserved for special purposes.
• In multiprocessing, the processors can be used to execute a single sequence of instructions in multiple contexts.
• In single instruction stream, single data stream (SISD), one processor sequentially processes instructions, and each instruction processes one data item.
• Single instruction, multiple data (SIMD) is often used in vector processing.
• Multiple sequences of instructions in a single context: multiple instruction, single data (MISD), sometimes used to describe pipelined processors.
• Multiple sequences of instructions in multiple contexts: multiple instruction, multiple data (MIMD).
Modes of transfer

• Data transfer between the central computer and the I/O devices may be handled in a variety of modes.

• The modes of transfer are:
1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)
Programmed I/O
• Programmed I/O operations are the result of I/O instructions written
in the computer program.
• Each data item transfer is initiated by an instruction in the program.
• Usually, the transfer is between a CPU register and the peripheral.
• Other instructions are needed to transfer the data between the CPU and memory.
• Transferring data under program control requires constant
monitoring of the peripheral by the CPU.
• Once a data transfer is initiated, the CPU is required to monitor the
interface to see when a transfer can again be made.
• It is up to the programmed instructions executed in the CPU to keep
close tabs on everything that is taking place in the interface unit and
the I/O device.
• In this method, the CPU stays in a program loop until the I/O unit
indicates that it is ready for data transfer.
• This is a time-consuming process since it keeps the processor busy
needlessly.
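The busy-wait behaviour described above can be sketched directly. The device model below (its ready flag and word buffer) is an assumption made for the example; the point is the inner polling loop, where the CPU does no useful work:

```python
# Programmed I/O sketch: the CPU polls a status flag before every word.
# The Device class is an illustrative stand-in for an interface register.

class Device:
    def __init__(self, words):
        self._words = list(words)
        self._polls_until_ready = 3   # pretend the device is slow

    def ready(self):
        if self._polls_until_ready > 0:
            self._polls_until_ready -= 1
            return False
        return bool(self._words)

    def read_word(self):
        self._polls_until_ready = 3   # device goes busy again after each word
        return self._words.pop(0)

def programmed_io_read(device, count):
    buffer, polls = [], 0
    while len(buffer) < count:
        while not device.ready():     # CPU stays in this loop, doing nothing useful
            polls += 1
        buffer.append(device.read_word())
    return buffer, polls

data, wasted = programmed_io_read(Device([10, 20, 30]), 3)
print(data)    # [10, 20, 30]
print(wasted)  # 9 polls spent busy-waiting
```

Every word costs several wasted polls here; with a genuinely slow peripheral the wasted cycles dominate, which is the motivation for interrupt-initiated I/O below.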
Interrupt-Initiated I/O

• In the programmed I/O method, the CPU stays in a program loop until the I/O unit indicates that it is ready for data transfer.
• This is a time-consuming process since it keeps the processor
busy needlessly.
• It can be avoided by using interrupt facility and special
commands to inform the interface to issue an interrupt request
signal when the data are available from the device.
• In the mean-time the CPU can proceed to execute another
program.
• The interface meanwhile keeps monitoring the device.
• When the interface determines that the device is ready for the
data transfer, it generates an interrupt request to the computer.
• Upon detecting the external interrupt signal, the CPU
momentarily stops the task it is processing, branches to a service
program to process the I/O transfer, and then returns to the task it
was originally performing.
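The contrast with polling can be sketched as follows. The Interface class and its tick-driven event loop are assumptions standing in for the interrupt hardware; the key point is that the CPU registers a service routine and keeps doing other work until the interface signals readiness:

```python
# Interrupt-initiated I/O sketch. The event loop below stands in for the
# interrupt mechanism; the Interface class is an illustrative model.

class Interface:
    def __init__(self, device_words):
        self._words = list(device_words)
        self.handler = None

    def on_interrupt(self, handler):
        self.handler = handler        # "special command" arming the interrupt

    def tick(self):                   # interface keeps monitoring the device
        if self._words and self.handler:
            # device ready -> interrupt request -> service routine runs
            self.handler(self._words.pop(0))

received = []
other_work_done = 0

iface = Interface([1, 2, 3])
iface.on_interrupt(received.append)   # register the interrupt service routine

while len(received) < 3:
    other_work_done += 1              # CPU executes another program meanwhile
    iface.tick()

print(received)         # [1, 2, 3]
print(other_work_done)  # 3 units of useful work done instead of busy-waiting
```

Unlike the programmed I/O loop, the cycles between transfers are spent on useful work rather than on polling.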
Types of Interrupt
1. External interrupts
2. Internal interrupts
3. Software interrupts

• External interrupts come from I/O devices, from a timing device, from a circuit monitoring the power supply, or from any other external source. For example: a timeout interrupt.

• Internal interrupts arise from illegal or erroneous use of an instruction or data. Internal interrupts are also called traps. For example: an attempt to divide by zero.
The difference between internal and external interrupts:
 The internal interrupt is initiated by some exceptional
condition caused by program itself rather than by an external
event.
 External interrupts depend on external conditions that are
independent of the program.

• Software Interrupt: A software interrupt is initiated by executing an instruction. A software interrupt is a special call instruction that behaves like an interrupt rather than a subroutine call. The most common use of a software interrupt is associated with a supervisor call instruction, which provides a means for switching from CPU user mode to supervisor mode.
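The divide-by-zero trap described above can be illustrated by analogy with a high-level exception mechanism. This is only a software analogy, not the hardware mechanism itself: the erroneous operation transfers control to a handler, and normal execution then resumes, just as a trap vectors the CPU to a service routine:

```python
# Analogy for an internal interrupt (trap): the running program itself
# causes the exceptional condition, control jumps to a handler, and
# execution resumes. The recovery value is an illustrative choice.

def service_routine(exc):
    print("trap handled:", exc)
    return 0                            # illustrative recovery value

def divide(a, b):
    try:
        return a // b
    except ZeroDivisionError as exc:    # the "trap" fires here
        return service_routine(exc)

print(divide(10, 2))   # 5
print(divide(10, 0))   # trap handled, then returns 0
```

An external interrupt differs in that it would arrive asynchronously, between any two instructions, rather than being caused by the instruction stream itself.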
DMA (Direct Memory Access)
• Direct memory access is an I/O technique used for high speed data
transfer.
• In DMA, the interface transfers data into and out of the memory unit
through the memory bus.
• In DMA, the CPU releases the control of the buses to a device
called a DMA controller.
• Removing the CPU from the path and letting the peripheral device
manage the memory buses directly would improve the speed of
transfer.
• The CPU initiates the transfer by supplying the interface with the
starting address and the number of words needed to be transferred
and then proceeds to execute other tasks.
• When the transfer is made, the DMA requests memory cycles
through the memory bus.
• When the request is granted by the memory controller, the DMA
transfers the data directly into memory.
• The CPU merely delays its memory access operation to allow the
direct memory I/O transfer.
• Many computers combine the interface logic with the requirements
for direct memory access into one unit and call it an I/O processor
( IOP).
• A DMA controller takes over the memory buses to manage the
transfer directly between the I/O device and memory using 2 special
control signals BR And BG.
• The BR (bus request) signal is asserted by the DMA controller to ask the CPU to relinquish control of the memory buses.
• The CPU places its buses in the high-impedance state and activates the BG (bus grant) signal to inform the DMA controller that it may take over.
• Then DMA takes the control of memory buses.
• DMA CONTROLLER: It needs the usual circuit of an interface to
communicate with the CPU and I/O device.
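The sequence above (program the controller, BR/BG handshake, transfer, release) can be sketched as follows. All classes, register names, and the handshake simplification are assumptions made for the example; a real controller such as a discrete DMA chip steals bus cycles rather than holding the bus for the whole block:

```python
# DMA transfer sketch: CPU supplies start address and word count, then the
# controller moves data into memory on its own. Illustrative model only.

class CPU:
    def __init__(self):
        self.bus_granted = False

    def bus_request(self):            # DMA asserts BR
        self.bus_granted = True       # CPU floats its buses and asserts BG
        return self.bus_granted

class DMAController:
    def __init__(self, cpu, memory):
        self.cpu, self.memory = cpu, memory
        self.address_reg = 0          # starting memory address
        self.word_count = 0           # number of words to transfer

    def program(self, start, count): # CPU writes the controller's registers
        self.address_reg, self.word_count = start, count

    def transfer(self, device_words):
        assert self.cpu.bus_request() # BR -> BG handshake before using the bus
        for word in device_words[:self.word_count]:
            self.memory[self.address_reg] = word
            self.address_reg += 1     # address register advances per word
        self.word_count = 0
        self.cpu.bus_granted = False  # release the buses back to the CPU

memory = [0] * 8
dma = DMAController(CPU(), memory)
dma.program(start=4, count=3)
dma.transfer([7, 8, 9, 99])           # fourth word exceeds the count, ignored

print(memory)          # [0, 0, 0, 0, 7, 8, 9, 0]
print(dma.word_count)  # 0 -> transfer complete
```

When the word count reaches zero, a real controller raises an interrupt so the CPU knows the block transfer has completed.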
CPU bus signals for DMA Transfer

[Diagram] The DMA controller's BR (bus request) line is an input to the CPU, and the CPU's BG (bus grant) line is its reply. While BG is active, the CPU's address bus (ABUS), data bus (DBUS), RD, and WR lines are placed in the high-impedance state so the DMA controller can drive them.
Block diagram of DMA Controller

[Diagram] The DMA controller connects to the system buses through data bus buffers and address bus buffers. Its control logic is selected through the DS (DMA select) and RS (register select) inputs and responds to the RD and WR lines. Internally it holds three registers: an address register, a word count register, and a control register. On the CPU side it exchanges the BR (bus request) and BG (bus grant) signals and can raise an interrupt; on the device side it exchanges DMA request and DMA acknowledge signals with the I/O device.
