
Chapter 1: Introduction

Dr. Fatemah K. Al-Assfor


Parallel processing means the execution of several activities at the same time.
Examples:
* 2 multiplications at the same time on 2 different processors,
* Printing a file on two printers at the same time.

Parallel processing is commonly used to perform complex tasks and computations, i.e., to process tasks (programs) that are data-intensive.

It is a technique in which two or more processors (CPUs) run simultaneously to handle separate parts of an overall program (task).

Examples of parallel processing systems: the multicore processors commonly found in computers today, and multi-processor machines.

In contrast to parallel processing is sequential processing, an older technique used to process only a single task at a time.

Reasons for using parallel processing:
o Save time
o Solve larger problems
o Parallel nature of the problem, so parallel models fit it best
o Provide concurrency (do multiple things at the same time)
o Take advantage of non-local resources
o Cost savings
o Overcome memory constraints

• Nuclear physics
• Fluid dynamics
• Weather forecast
• Image processing, Image synthesis, Virtual reality
• Petroleum
• Virtual prototyping
• Biology and genomics
• Business Intelligence
• Banking, Finance, Insurance, Risk Analysis
• Regression tests for large software
• Storage and access to large logs
• Security: Finger print matching, Image behavior recognition
A multicore processor is an integrated circuit that has two or more CPUs called cores used
to read and execute the actual program instructions.

In a multicore system, a program is broken up into several parts that are executed among multiple cores; each core performs its operations in parallel as instructed, pulling data from the computer's memory. This helps reduce the execution time of the program, and thus enhances performance and reduces power consumption.
o A multicore processor also enables more efficient simultaneous processing of multiple tasks, such as multithreading (also called hyperthreading).

o The individual cores can execute multiple instructions in parallel, increasing the performance of
software which is written to take advantage of the unique architecture.

o Today, multicore processors are created with two cores ("dual-core"), four cores ("quad-core"), six cores ("hexa-core"), and eight cores ("octa-core").
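As a rough sketch of the idea (not part of the original notes; the function names are invented for illustration), Python's multiprocessing module can divide work among worker processes that the operating system may schedule on separate cores:

```python
# Illustrative sketch: splitting a task across cores with worker processes.
# "square" is an invented example function, not from the notes.
from multiprocessing import Pool

def square(n):
    # Each worker process runs this, potentially on its own core in parallel.
    return n * n

if __name__ == "__main__":
    with Pool(processes=4) as pool:           # one worker per core (assumed 4 cores)
        results = pool.map(square, range(8))  # the work is divided among the workers
    print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
```

Whether the workers truly run in the same instant depends on how many physical cores the machine actually has; with fewer cores than workers, the operating system interleaves them.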
- A thread is a small set of instructions designed to be scheduled and executed by the CPU independently
of the parent process.
A multi-threading CPU is capable of executing multiple threads concurrently.
- A CPU can give you the illusion that it is doing multiple computations at the same time. It does that by spending a bit of time on each computation, which is possible because it keeps an execution context for each computation.

- Hyperthreading: an Intel technology, divides a physical core into two logical cores, executing an
additional, concurrent set of instructions to increase performance.
Whenever work is performed with multiple threads of a program running, the basic naming convention for the cores is (say, for 4 threads running on the system): logical-core 0, logical-core 1, logical-core 2, and logical-core 3.

- For an Intel Core i3 processor (2 cores) with hyper-threading (2 threads/core), the max number of threads/logical cores will be: 2 cores x 2 threads/core = 4 (cores 0, 1, 2, and 3).

The last logical core is called logical-core 3. So, they named the product i3.

- For an Intel Core i7 processor with hyper-threading (2 threads per core), the max number of threads/logical cores will be: 4 cores x 2 threads/core = 8.
Logical cores: 0, 1, 2, ..., 7

* The last logical core is called logical-core 7. So, the product is core i7.
Note: The Intel Core i5 processor (7xx family) does NOT support hyper-threading and has 4 cores.
- Thus, the max number of threads/logical cores will be: 4 x 1 = 4 (0, 1, 2, and 3). The last logical core is called logical-core 3. But the performance is somewhere in between i3 and i7 because the 4 threads are running on 4 different cores.
So, they decided to call it Intel Core i5.
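On a real machine, the logical-core count can be queried from Python's standard library (a minimal sketch; `os.cpu_count()` reports logical cores, i.e. hardware threads, not physical cores):

```python
# Sketch: querying the number of logical cores (hardware threads).
import os

logical = os.cpu_count()          # e.g. 4 on a 2-core i3 with hyper-threading
print("logical cores:", logical)  # the cores are numbered 0 .. logical-1
```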
Concurrency - means that multiple tasks can be executed in an overlapping time period. One of the tasks can begin before the preceding one is completed; however, they won't be running at the same instant.

Parallelism - is the ability to execute independent tasks of a program in the same instant of time.
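The distinction can be sketched with threads, which in CPython typically interleave on the CPU (concurrency) rather than run in the same instant (an illustrative example, not from the notes):

```python
# Sketch of concurrency: two tasks overlap in time by interleaving,
# rather than running in the same instant.
import threading

log = []

def task(name):
    for i in range(3):
        log.append((name, i))   # the scheduler may interleave A and B here

a = threading.Thread(target=task, args=("A",))
b = threading.Thread(target=task, args=("B",))
a.start(); b.start()
a.join(); b.join()
print(log)  # six entries from A and B, possibly interleaved
```

For true parallelism in Python, separate processes (as in the earlier multiprocessing sketch) are used, since CPython's global interpreter lock prevents two threads from executing Python bytecode in the same instant.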
The internal structure of a traditional uniprocessor computer consists of four main structural
components:
- Central processing unit (CPU): Controls the operation of the computer and performs its data processing functions.
- Main memory: Stores data and instructions.
- I/O: Moves data between the computer and its external environment.
- System Interconnection: Some mechanism that provides for communication among CPU,
main memory, and I/O. A common example of system interconnection is by means of a system
bus, consisting of a number of conducting wires to which all the other components attach.
• Address bus
• Data bus
• Control signals
Control Unit
–It is the heart of the CPU, used to control the operation of the CPU and hence the computer
Arithmetic and Logic Unit (ALU) or Execution Units
–Performs the computer’s data processing function
Registers
–Provide storage internal to the CPU

CPU Interconnection
–Some mechanism that provides for communication among the control unit, ALU, and
registers
Figure 1.3 The CPU parts
- The von Neumann architecture was first published by John von Neumann in 1945. It is a uniprocessor computer architecture.
- The von Neumann architecture is based on the stored-program computer concept, where instructions and data are stored in the same memory.
- The von Neumann architecture is also referred to as the IAS computer (Institute for Advanced Study).

Figure 1.4 Block diagram of Von Neumann


- The IAS architecture design consists of:
o A main memory, which stores both data and instructions
o An arithmetic and logic unit (ALU) capable of operating on binary data
o A control unit, which interprets the instructions in memory and causes them to be executed
o Input-output (I/O) equipment operated by the control unit

- The IAS architecture was used to perform the most frequent elementary operations of arithmetic, like addition, subtraction, multiplication, and division.
- The IAS computer, although not completed until 1952, is the prototype of all subsequent
general- purpose computers.
- The control unit operates the IAS by fetching instructions from memory and executing them
one at a time (sequentially).
- The memory of the IAS consists of 4,096 storage locations; each location is called a "word".
- Each memory word is 40 bits in size.

- Both data and instructions are stored in the same memory.


- Numbers are represented in binary form. Each number is represented by a sign bit and a 39-bit value.
- Each instruction is a binary code.
- A word may alternatively contain two 20-bit instructions, with each instruction consisting of an 8-bit operation code (opcode) specifying the operation to be performed and a 12-bit address designating one of the words in memory (numbered from 0 to 4,095).

Figure 1.6 IAS Memory Formats
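The word layout above can be sketched with bit operations (an illustrative Python sketch; the opcode and address values used are invented):

```python
# Sketch of the IAS word layout: a 40-bit word holds two 20-bit
# instructions, each an 8-bit opcode followed by a 12-bit address.
def pack_instruction(opcode, address):
    assert 0 <= opcode < 2**8 and 0 <= address < 2**12
    return (opcode << 12) | address            # one 20-bit instruction

def pack_word(left_instr, right_instr):
    return (left_instr << 20) | right_instr    # one 40-bit word

word = pack_word(pack_instruction(0x01, 500), pack_instruction(0x02, 501))

# Unpacking the right-hand instruction back out of the word:
right = word & 0xFFFFF                         # low 20 bits
opcode, address = right >> 12, right & 0xFFF
print(opcode, address)  # 2 501
```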


- Both the control unit and the ALU contain storage locations, called registers, defined as follows:

01 Memory buffer register (MBR) (40 bits): Contains a word to be stored in memory or sent to the I/O unit, or receives a word from memory or from the I/O unit.
02 Memory address register (MAR) (12 bits): Specifies the address in memory of the word to be written from or read into the MBR.
03 Instruction register (IR) (8 bits): Contains the opcode part of the instruction being executed.
04 Instruction buffer register (IBR) (20 bits): Employed to temporarily hold the right-hand instruction from a word in memory.
05 Program counter (PC) (12 bits): Contains the address of the next instruction to be fetched from memory.
06 Accumulator (AC) and multiplier quotient (MQ) (40 bits each): Employed to temporarily hold operands and results of ALU operations.

The IAS computer had two general-purpose registers available:
- Accumulator (AC), and
- Multiplier/Quotient (MQ)
* They are employed to temporarily hold operands and results of ALU operations.

EX: the result of multiplying two 40-bit numbers is an 80-bit number; the most
significant 40 bits are stored in the AC and the least significant in the MQ.
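This AC/MQ split of an 80-bit product can be sketched as (an illustrative example, not from the notes):

```python
# Sketch: the 80-bit product of two 40-bit numbers, split into
# AC (most significant 40 bits) and MQ (least significant 40 bits).
MASK40 = (1 << 40) - 1

def ias_multiply(a, b):
    product = a * b          # up to 80 bits wide
    ac = product >> 40       # high 40 bits -> AC
    mq = product & MASK40    # low 40 bits  -> MQ
    return ac, mq

ac, mq = ias_multiply(2**39, 2)  # product is exactly 2**40
print(ac, mq)  # 1 0
```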

• It was an asynchronous machine, meaning that there was no central clock


regulating the timing of the instructions.
• One instruction started executing when the previous one finished.
• The addition-operation time was 62 microseconds and the multiplication-
operation time was 713 microseconds.
The IAS operates by repetitively performing an instruction cycle, as shown in Fig. 1.3. Each instruction cycle consists of three phases: instruction fetch, decode, and execute.

Execute cycle
Once the opcode is in the IR, the execute cycle is performed. Control circuitry interprets
the opcode and executes the instruction by sending out the appropriate control signals to
cause data to be moved or an operation to be performed by the ALU.
Figure 1.7 Partial Flowchart of IAS Operation
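The repeating fetch-decode-execute cycle can be sketched as a toy loop (the mini instruction set below is invented for illustration and is not the real 21-instruction IAS set):

```python
# Toy fetch-decode-execute loop with an invented mini instruction set.
# Each instruction word is modeled as an (opcode, address) pair.
memory = {
    0: ("LOAD", 10),   # AC <- M[10]
    1: ("ADD", 11),    # AC <- AC + M[11]
    2: ("STORE", 12),  # M[12] <- AC
    3: ("HALT", 0),
    10: 5, 11: 7, 12: 0,
}
pc, ac = 0, 0
while True:
    opcode, addr = memory[pc]   # fetch the instruction at PC
    pc += 1                     # advance to the next instruction
    if opcode == "LOAD":        # decode + execute
        ac = memory[addr]
    elif opcode == "ADD":
        ac += memory[addr]
    elif opcode == "STORE":
        memory[addr] = ac
    elif opcode == "HALT":
        break
print(memory[12])  # 12
```

Like the IAS, the loop executes one instruction at a time, sequentially, until a halt; control circuitry plays the role of the `if/elif` dispatch here.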
The IAS computer had a total of 21 instructions. The IAS computer instructions are grouped as follows:

Data transfer: Move data between memory and ALU registers or between two ALU registers.

Unconditional branch: Normally, the control unit executes instructions in sequence from memory. This sequence can be changed by a branch instruction, which facilitates repetitive operations.

Conditional branch: The branch can be made dependent on a condition, thus allowing decision points.

Arithmetic: Operations performed by the ALU.

Address modify: Permits addresses to be computed in the ALU and then inserted into instructions stored in memory. This allows a program considerable addressing flexibility.
