
22CS2112: COMPUTER ORGANIZATION

AND ARCHITECTURE
UNIT - I
By
Dr. G. Bhaskar
Associate Professor
Dept. of ECE
Email Id: bhaskar.0416@gmail.com
Outline:

 Computer Organization and Architecture

 Introduction and Syllabus

COMPUTER ORGANIZATION AND ARCHITECTURE: SYLLABUS

Pre-requisite: A Course on “Digital Electronics”.


Course Objectives:
 The purpose of the course is to introduce principles of computer organization and the
basic architectural concepts.
 It begins with basic organization, design, and programming of a simple digital
computer and introduces simple register transfer language to specify various computer
operations.
 Topics include computer arithmetic, instruction set design, microprogrammed control
unit, pipelining and vector processing, memory organization and I/O systems, and
multiprocessors.

UNIT - I
Digital Computers: Introduction, Block diagram of Digital Computer, Definition of
Computer Organization, Computer Design and Computer Architecture.
Register Transfer Language and Micro operations: Register Transfer language, Register
Transfer, Bus and memory transfers, Arithmetic Micro operations, logic micro operations,
shift micro operations, Arithmetic logic shift unit.
Basic Computer Organization and Design: Instruction codes, Computer Registers,
Computer instructions, Timing and Control, Instruction cycle, Memory Reference
Instructions, Input – Output and Interrupt.
UNIT - II
Microprogrammed Control: Control memory, Address sequencing, micro program
example, design of control unit.
Central Processing Unit: General Register Organization, Instruction Formats, Addressing
modes, Data Transfer and Manipulation, Program Control.

UNIT - III
Data Representation: Data types, Complements, Fixed Point Representation, Floating
Point Representation.
Computer Arithmetic: Addition and subtraction, multiplication Algorithms, Division
Algorithms, Floating – point Arithmetic operations. Decimal Arithmetic unit, Decimal
Arithmetic operations.

UNIT - IV
Input-Output Organization: Input-Output Interface, Asynchronous data transfer, Modes
of Transfer, Priority Interrupt Direct memory Access.
Memory Organization: Memory Hierarchy, Main Memory, Auxiliary memory,
Associative Memory, Cache Memory.
UNIT -V
Reduced Instruction Set Computer: CISC Characteristics, RISC Characteristics.
Pipeline and Vector Processing: Parallel Processing, Pipelining, Arithmetic Pipeline,
Instruction Pipeline, RISC Pipeline, Vector Processing, Array Processor.
Multi Processors: Characteristics of Multiprocessors, Interconnection Structures, Inter
processor arbitration, Inter processor communication and synchronization, Cache
Coherence.

TEXT BOOK:
1. Computer System Architecture – M. Morris Mano, Third Edition, Pearson/PHI.

REFERENCE BOOKS:
1. Computer Organization – Carl Hamacher, Zvonko Vranesic, Safwat Zaky, 5th Edition,
McGraw Hill.
2. Computer Organization and Architecture – William Stallings, Sixth Edition,
Pearson/PHI.
3. Structured Computer Organization – Andrew S. Tanenbaum, 4th Edition, PHI/Pearson.
Course Outcomes:

 Understand the basics of instruction sets and their impact on processor design.

 Demonstrate an understanding of the design of the functional units of a digital computer system.

 Evaluate cost performance and design trade-offs in designing and constructing a computer processor including memory.

 Design a pipeline for consistent execution of instructions with minimum hazards.

 Recognize and manipulate representations of numbers stored in digital computers.
UNIT - I
Digital Computers: Introduction, Block diagram of Digital Computer, Definition of
Computer Organization, Computer Design and Computer Architecture.
Register Transfer Language and Micro operations: Register Transfer language, Register
Transfer, Bus and memory transfers, Arithmetic Micro operations, logic micro
operations, shift micro operations, Arithmetic logic shift unit.
Basic Computer Organization and Design: Instruction codes, Computer Registers,
Computer instructions, Timing and Control, Instruction cycle, Memory Reference
Instructions, Input – Output and Interrupt.
Digital Computers: Introduction
 The digital computer is a digital system that performs various computational
tasks.
 The word digital implies that the information in the computer is represented by
variables that take a limited number of discrete values.
 These values are processed internally by components that can maintain a
limited number of discrete states.
 The decimal digits 0, 1, 2, . . . , 9, for example, provide 10 discrete values.
 Digital computers use the binary number system, which has two digits: 0 and 1.
 A binary digit is called a bit. Information is represented in digital computers in
groups of bits.
 By using various coding techniques, groups of bits can be made to represent
not only binary numbers but also other discrete symbols, such as decimal digits
or letters of the alphabet.
Block diagram of Digital Computer:

[Figure 1.1: Block diagram of a digital computer.]
What is a computer?
• A computer is a sophisticated electronic calculating
machine that:
– Accepts input information,
– Processes the information according to a list of internally
stored instructions and
– Produces the resulting output information.
• Functions performed by a computer are:
– Accepting information to be processed as input.
– Storing a list of instructions to process the information.
– Processing the information according to the list of
instructions.
– Providing the results of the processing as output.

[Figure: Block diagram of a digital computer.]

COMPUTER ORGANISATION AND ARCHITECTURE

• Computer organization deals with the components from which computers are built.
• In contrast, computer architecture is the science of integrating those components to achieve a level of functionality and performance.
• It is as if computer organization examines the lumber, bricks, nails, and other building material,
• while computer architecture looks at the design of the house.
Brief History of Computer Evolution
Two phases (VLSI = Very Large Scale Integration):
1. Before VLSI (1945 – 1978)
• ENIAC
• IAS
• IBM
• PDP-8
2. VLSI (1978 – present day)
• microprocessors
Evolution of Computers
FIRST GENERATION (1945 – 1955)

• Program and data reside in the same memory
(stored program concept – John von Neumann)
• ALP (Assembly Language Programming) was used to write programs
• Vacuum tubes were used to implement the functions (ALU & CU design)
• Magnetic core and magnetic tape storage devices were used
• Electronic vacuum tubes served as the switching components
SECOND GENERATION (1955 – 1965)

• Transistors were used to design the ALU & CU
• HLL was used (FORTRAN)
• Compilers were used to convert HLL to MLL
• Separate I/O processors were developed to operate in parallel with the CPU, thus improving performance
• Invention of the transistor, which was faster, smaller and required considerably less power to operate
THIRD GENERATION (1965-1975)

• IC technology improved
• Improved IC technology helped in designing low cost, high
speed processor and memory modules
• Multiprogramming, pipelining concepts were incorporated
• DOS allowed efficient and coordinated operation of computer systems with multiple users
• Cache and virtual memory concepts were developed
• More than one circuit on a single silicon chip became
available
FOURTH GENERATION (1975-1985)

• CPU – Termed as microprocessor


• INTEL, MOTOROLA, TEXAS, NATIONAL semiconductor companies started developing microprocessors
• Workstations, microprocessor-based PCs & notebook computers were developed
• Interconnection of different computers for better communication: LAN, MAN, WAN
• Computational speed increased by 1000 times
• Specialized processors like Digital Signal Processor
were also developed
BEYOND THE FOURTH GENERATION
(1985 – TILL DATE)

• E-Commerce, E-banking, home office


• ARM, AMD, INTEL, MOTOROLA
• High speed processor - GHz speed
• Because of submicron IC technology, a lot of features were added in a small size
Generations of Progress
Table 3.2: The 5 generations of digital computers, and their ancestors.

Generation (begun)  | Processor technology | Memory innovations | I/O devices introduced        | Dominant look & feel
0 (1600s)           | (Electro-)mechanical | Wheel, card        | Lever, dial, punched card     | Factory equipment
1 (1950s)           | Vacuum tube          | Magnetic drum      | Paper tape, magnetic tape     | Hall-size cabinet
2 (1960s)           | Transistor           | Magnetic core      | Drum, printer, text terminal  | Room-size mainframe
3 (1970s)           | SSI/MSI              | RAM/ROM chip       | Disk, keyboard, video monitor | Desk-size mini
4 (1980s)           | LSI/VLSI             | SRAM/DRAM          | Network, CD, mouse, sound     | Desktop/laptop micro
5 (1990s)           | ULSI/GSI/WSI, SOC    | SDRAM, flash       | Sensor/actuator, point/click  | Invisible, embedded
COMPUTER TYPES

Computers are classified based on parameters like:
• Speed of operation
• Cost
• Computational power
• Type of application
DESKTOP COMPUTER

• Processing & storage units, visual display & audio units, keyboards
• Storage media: hard disks, CD-ROMs
• Eg: Personal computers, which are used in homes and offices
• Advantage: cost effective, easy to operate, suitable for general purpose educational or business applications
NOTEBOOK COMPUTER

• Compact form of personal computer (laptop)
• Advantage is portability
WORK STATIONS

• More computational power than PC
• Costlier
• Used to solve complex problems which arise in engineering applications (graphics, CAD/CAM etc.)

ENTERPRISE SYSTEM (MAINFRAME)

• More computational power
• Larger storage capacity
• Used for business data processing in large organizations
• Commonly referred to as servers or super computers
SERVER SYSTEM

• Supports large volumes of data which frequently need to be accessed or modified
• Supports request-response operation

SUPER COMPUTERS

• Faster than mainframes
• Help in performing large-scale numerical and algorithmic calculations in a short span of time
• Used for aircraft design and testing, military applications and weather forecasting
HANDHELD
• Also called a PDA (Personal
Digital Assistant).
• A computer that fits into a
pocket, runs on batteries, and
is used while holding the unit
in your hand.
• Typically used as an
appointment book, address
book, calculator, and notepad.
• Can be synchronized with a
personal microcomputer as a
backup.
Basic Terminology
• Computer
– A device that accepts input, processes data, stores data, and produces output, all according to a series of stored instructions.
• Hardware
– Includes the electronic and mechanical devices that process the data; refers to the computer as well as peripheral devices.
• Software
– A computer program that tells the computer how to perform particular tasks.
• Network
– Two or more computers and other devices that are connected, for the purpose of sharing data and programs.
• Peripheral devices
– Used to expand the computer’s input, output and storage capabilities.
Basic Terminology
• Input
– Whatever is put into a computer system.
• Data
– Refers to the symbols that represent facts, objects, or ideas.
• Information
– The results of the computer storing data as bits and bytes; the words,
numbers, sounds, and graphics.
• Output
– Consists of the processing results produced by a computer.
• Processing
– Manipulation of the data in many ways.
• Memory
– Area of the computer that temporarily holds data waiting to be processed,
stored, or output.
• Storage
– Area of the computer that holds data on a permanent basis when it is not
immediately needed for processing.
Basic Terminology

• Assembly language program (ALP) – Programs are written using mnemonics
• Mnemonic – An instruction written in an English-like form
• Assembler – Software which converts ALP to MLL (Machine Level Language)
• HLL (High Level Language) – Programs are written using English-like statements
• Compiler – Converts HLL to MLL, does this job by reading the source program at once
Basic Terminology

• Interpreter – Converts HLL to MLL, does this job statement by statement
• System software – Program routines which aid the user in the execution of programs, eg: assemblers, compilers
• Operating system – Collection of routines responsible for controlling and coordinating all the activities in a computer system
Computing Systems

Computers have two kinds of components:
• Hardware, consisting of its physical devices (CPU, memory, bus, storage devices, ...)
• Software, consisting of the programs it has (operating system, applications, utilities, ...)
Functions of computer
• ALL computer functions are:
– Data PROCESSING
– Data STORAGE
– Data MOVEMENT
– CONTROL (coordinates how information is used; data = information)
Functional Units of a Computer

[Figure: Input and Output units connected to Memory and the Processor, which contains the Control unit and the ALU.]

Since 1946 all computers have had 5 components!!!
Functional units of a computer

[Figure: Input, Memory, Output, ALU and Control units, with the ALU and Control unit forming the Processor.]

• Input unit accepts information from human operators, electromechanical devices (keyboard), or other computers.
• Memory stores information: instructions and data.
• Arithmetic and logic unit (ALU) performs the desired operations on the input information as determined by instructions in the memory.
• Output unit sends results of processing (data) to a monitor display or to a printer.
• Control unit coordinates various actions: instructions, input, output, processing.
FUNCTIONAL UNITS OF COMPUTER
• Input Unit

• Output Unit

• Central processing Unit (ALU and Control Units)

• Memory

• Bus Structure
INPUT UNIT:

• Converts the external world data to a binary format, which can be understood by the CPU
• Eg: Keyboard, Mouse, Joystick etc.

OUTPUT UNIT:

• Converts the binary format data to a format that a common man can understand
• Eg: Monitor, Printer, LCD, LED etc.
Input unit
Binary information must be presented to a computer in a specific format. This
task is performed by the input unit:
- Interfaces with input devices.
- Accepts binary information from the input devices.
- Presents this binary information in a format expected by the computer.
- Transfers this information to the memory or processor.
[Figure: Real-world input devices (keyboard, audio input, ...) feeding the Input Unit, which transfers information to the computer’s memory and processor.]
Output unit
•Computers represent information in a specific binary form. Output units:
- Interface with output devices.
- Accept processed results provided by the computer in specific binary form.
- Convert the information in binary form to a form understood by an
output device.

[Figure: The computer’s memory and processor feeding the Output Unit, which drives real-world output devices (printer, graphics display, speakers, ...).]
CPU
• The “brain” of the machine
• Responsible for carrying out computational tasks
• Contains ALU, CU, Registers
• ALU performs arithmetic and logical operations
• CU provides control signals in accordance with some timings which in turn control the execution process
• Registers store data and results and speed up the operation
Arithmetic and logic unit (ALU)

• Operations are executed in the Arithmetic and Logic Unit (ALU).
– Arithmetic operations such as addition, subtraction.
– Logic operations such as comparison of numbers.
• In order to execute an instruction, operands need to be brought into the ALU from the memory.
– Operands are stored in general purpose registers available in the ALU.
– Access times of general purpose registers are faster than the cache.
• Results of the operations are stored back in the memory or retained in the processor for immediate use.
Control unit

• Operation of a computer can be summarized as:
– Accepts information from the input units (Input unit).
– Stores the information (Memory).
– Processes the information (ALU).
– Provides processed results through the output units (Output unit).
• Operations of Input unit, Memory, ALU and Output unit are coordinated by Control unit.
• Instructions control “what” operations take place (e.g. data transfer, processing).
• Control unit generates timing signals which determine “when” a particular operation takes place.
MEMORY

• Stores data, results, programs
• Two classes of storage: (i) Primary (ii) Secondary
• Two types are RAM (R/W memory) and ROM (read only memory)
• ROM is used to store data and programs which are not going to change
• Secondary storage is used for bulk storage or mass storage
Memory unit
• Memory unit stores instructions and data.
– Recall, data is represented as a series of bits.
– To store data, memory unit thus stores bits.
• Processor reads instructions and reads/writes data from/to the memory during the execution of a program.
– In theory, instructions and data could be fetched one bit at a time.
– In practice, a group of bits is fetched at a time.
– The group of bits stored or retrieved at a time is termed a “word”.
– The number of bits in a word is termed the “word length” of a computer.
• In order to read/write to and from memory, a processor should know where to look:
– An “address” is associated with each word location.
Memory unit (contd..)
• Processor reads/writes to/from memory based on the memory address:
– Access any word location in a short and fixed amount of time based on the address.
– Random Access Memory (RAM) provides fixed access time independent of the location of the word.
– Access time is known as “Memory Access Time”.
• Memory and processor have to “communicate” with each other in order to read/write information.
– In order to reduce “communication time”, a small amount of RAM (known as Cache) is tightly coupled with the processor.
• Modern computers have three to four levels of RAM units with different speeds and sizes:
– Fastest, smallest known as Cache.
– Slowest, largest known as Main memory.
Memory unit (contd..)

• Primary storage of the computer consists of RAM units.
– Fastest, smallest unit is Cache.
– Slowest, largest unit is Main Memory.
• Primary storage is insufficient to store large amounts of data and programs.
– Primary storage can be added, but it is expensive.
• Store large amounts of data on secondary storage devices:
– Magnetic disks and tapes,
– Optical disks (CD-ROMs).
– Access to data stored in secondary storage is slower, but this takes advantage of the fact that some information may be accessed infrequently.
• Cost of a memory unit depends on its access time; lesser access time implies higher cost.
MEMORY LOCATIONS AND ADDRESSES
• Main memory is the second major subsystem in a computer. It consists of a collection of storage locations, each with a unique identifier, called an address.
• Data is transferred to and from memory in groups of bits called words. A word can be a group of 8 bits, 16 bits, 32 bits or 64 bits (and growing).
• If the word is 8 bits, it is referred to as a byte. The term “byte” is so common in computer science that sometimes a 16-bit word is referred to as a 2-byte word, or a 32-bit word is referred to as a 4-byte word.

[Figure 5.3: Main memory. Memory addresses are defined using unsigned binary integers.]
Basic Operational Concepts

Basic Function of Computer

• To execute a given task as per the appropriate program
• Program consists of a list of instructions stored in memory
Interconnection between Processor and Memory
Registers
Registers are fast stand-alone storage locations that hold data
temporarily. Multiple registers are needed to facilitate the
operation of the CPU. Some of these registers are

 Two registers, MAR (Memory Address Register) and MDR (Memory Data Register), handle the data transfer between main memory and the processor. MAR holds addresses, MDR holds data.
 Instruction register (IR): Holds the instruction that is currently being executed.
 Program counter (PC): Points to the next instruction that is to be fetched from memory.
• (PC) → (MAR): the contents of PC are transferred to MAR
• (MAR) → Address bus: selects a particular memory location
• Issues RD control signal
• Instruction present in memory is read and loaded into MDR
• Instruction is placed in IR (contents transferred from MDR to IR)
• Instruction present in IR is decoded, by which the processor understands what operation it has to perform

•Increments the contents of PC by 1, so that it


points to the next instruction address

•If data required for operation is available in


register, it performs the operation

•If data is present in memory following sequence


is performed
• Address of the data → MAR
• MAR → Address bus: selects the memory location, where the RD signal is issued
• Data is read via the data bus → MDR
• From MDR, data can be directly routed to the ALU, or it can be placed in a register and then the operation can be performed
• Results of the operation can be directed towards an output device, memory or a register
• Normal execution can be preempted (interrupt)
BUS STRUCTURE
Connecting CPU and memory
The CPU and memory are normally connected by three groups of connections, each called a bus: data bus, address bus and control bus.

[Figure: Connecting CPU and memory using three buses.]
BUS STRUCTURE
• Group of wires which carries information from CPU to peripherals or vice versa
• Single bus structure: a common bus used to communicate between peripherals and the microprocessor

[Figure: Single bus structure, with the INPUT, MEMORY, PROCESSOR and OUTPUT units sharing one common bus.]
Continued:-

• To improve performance, a multibus structure can be used
• In a two-bus structure: one bus can be used to fetch instructions, the other can be used to fetch data required for execution
• This improves performance, but the cost increases
[Figure: Processor connected to memory through the address bus (A0–A2), data bus (D0–D7) and control bus (RD, W/R, CS).]

A2 A1 A0 | Selected location
 0  0  0 | 0th location
 0  0  1 | 1st location
 0  1  0 | 2nd location
 0  1  1 | 3rd location
 1  0  0 | 4th location
 1  0  1 | 5th location
 1  1  0 | 6th location
 1  1  1 | 7th location
Cont:-

• 2³ = 8, i.e., 3 address lines are required to select 8 locations
• In general 2^x = n, where x is the number of address lines (address bits) and n is the number of locations
• Address bus: unidirectional: group of wires which carries address information bits from processor to peripherals (16, 20, 24 or more parallel signal lines)
Cont:-

• Data bus: bidirectional: group of wires which carries data information bits from processor to peripherals and vice versa
• Control bus: bidirectional: group of wires which carries control signals from processor to peripherals and vice versa
• The figure shown earlier illustrates the address, data and control buses and their connection with the peripheral and the microprocessor
PERFORMANCE

• Time taken by the system to execute a program
• Parameters which influence the performance are:
• Clock speed
• Type and number of instructions available
• Average time required to execute an instruction
• Memory access time
• Power dissipation in the system
• Number and types of I/O devices connected
• The data transfer capacity of the bus
Information in a computer -- Instructions
• Instructions specify commands to:
– Transfer information within a computer (e.g., from
memory to ALU)
– Transfer of information between the computer and I/O
devices (e.g., from keyboard to computer, or computer to
printer)
– Perform arithmetic and logic operations (e.g., Add two
numbers, Perform a logical AND).
• A sequence of instructions to perform a task is
called a program, which is stored in the memory.
• Processor fetches instructions that make up a
program from the memory and performs the
operations stated in those instructions.

Information in a computer -- Data

• Data are the “operands” upon which instructions operate.
• Data could be:
– Numbers,
– Encoded characters.
• Data, in a broad sense, means any digital information.
• Computers use data that is encoded as a string of binary digits called bits.
How are the functional units connected?
• For a computer to achieve its operation, the functional units need to communicate with each other.
• In order to communicate, they need to be connected.

[Figure: Input, Output, Memory and Processor units connected by a bus.]

• Functional units may be connected by a group of parallel wires.
• The group of parallel wires is called a bus.
• Each wire in a bus can transfer one bit of information.
• The number of parallel wires in a bus is equal to the word length of a computer.
Organization of cache and main memory

[Figure: Main memory and cache memory connected to the processor over the bus, with the cache placed next to the processor.]

Why is the access time of the cache memory less than the access time of the main memory?
Computer Components: Top-Level View
Basic Operational Concepts
Computer Systems and Their Parts

[Figure: The space of computer systems, with what we normally mean by the word “computer” highlighted. The taxonomy branches as follows:]
• Computer: Analog or Digital
• Digital: Fixed-function or Stored-program
• Stored-program: Electronic or Nonelectronic
• Electronic: General-purpose or Special-purpose
• General-purpose: Number cruncher or Data manipulator
REGISTER TRANSFER LANGUAGE
• The symbolic notation used to describe the micro-operation transfers among registers is called RTL (Register Transfer Language).
• The information transferred from one register to another is represented in symbolic form by a replacement operator; this is called a Register Transfer.
• The use of symbols instead of a narrative explanation provides an organized and concise manner for listing the micro-operation sequences in registers and the control functions that initiate them.
• A register transfer language is a system for expressing in symbolic form the microoperation sequences among the registers of a digital module.
• It is a convenient tool for describing the internal organization of digital computers in a concise and precise manner.
• Registers:

• Computer registers are designated by upper case letters (and
optionally followed by digits or letters) to denote the function of
the register.
• For example, the register that holds an address for the memory unit
is usually called a memory address register and is designated by the
name MAR.
• Other designations for registers are PC (for program counter), IR (for instruction register), and R1 (for processor register).
• The individual flip-flops in an n-bit register are numbered in
sequence from 0 through n-1, starting from 0 in the rightmost
position and increasing the numbers toward the left.
Register Transfer Logic
• Register Transfer:
• Information transfer from one register to another is designated in symbolic form by means of a replacement operator.
• The statement R2← R1 denotes a transfer of the
content of register R1 into register R2.
• It designates a replacement of the content of R2
by the content of R1.
• By definition, the content of the source register R1 does not change after the transfer.
• If we want the transfer to occur only under a predetermined control
condition then it can be shown by an if-then statement.

if (P=1) then R2← R1

P is the control signal generated by a control section.
• We can separate the control variables from the register transfer operation
by specifying a Control Function.
• Control function is a Boolean variable that is equal to 0 or 1.
• control function is included in the statement as
• P: R2← R1
• The control condition is terminated with a colon; it implies that the transfer operation is executed by the hardware only if P = 1.
• Every statement written in a register transfer notation implies a hardware
construction for implementing the transfer.
Basic symbols of the register transfer notation

Symbol                  | Description                     | Examples
Letters (and numerals)  | Denotes a register              | MAR, R2
Parentheses ( )         | Denotes a part of a register    | R2(0-7), R2(L)
Arrow ←                 | Denotes transfer of information | R2 ← R1
Comma ,                 | Separates two microoperations   | R2 ← R1, R1 ← R2
• A comma is used to separate two or more operations that are executed at the same time.
• The statement
• T : R2 ← R1, R1 ← R2 (exchange operation)
• denotes an operation that exchanges the contents of two registers during one common clock pulse, provided that T = 1.
• Bus and Memory Transfers:
• A more efficient scheme for transferring information
between registers in a multiple-register configuration
is a Common Bus System.
• A common bus consists of a set of common lines, one
for each bit of a register.
• Control signals determine which register is selected
by the bus during each particular register transfer.
• Different ways of constructing a Common Bus System
• Using Multiplexers
• Using Tri-state Buffers
• In general a bus system has
– multiplexers for “k” registers,
– each register of “n” bits,
– to produce an “n-line bus”
• no. of multiplexers required = n
• size of each multiplexer = k x 1
• When the bus is included in the statement, the register transfer is symbolized as follows:
• BUS ← C, R1 ← BUS
• The content of register C is placed on the bus, and the content of the bus is loaded into register R1 by activating its load control input. If the bus is known to exist in the system, it may be convenient just to show the direct transfer:
• R1 ← C
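A small Python sketch of the multiplexer view of the bus: the select lines choose which of k registers drives the n-line bus (the register contents are illustrative):

regs = {0: 0x000C, 1: 0x00A5, 2: 0x1234, 3: 0xFFFF}   # k = 4 registers, n = 16 bits

def bus_value(select):
    # Mux behavior: the register chosen by the select lines appears on the bus
    return regs[select]

# BUS <- C, R1 <- BUS  (register 2 plays the role of C, register 1 of R1)
bus = bus_value(2)
regs[1] = bus
print(hex(regs[1]))   # 0x1234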
• Three-State Bus Buffers:

• A bus system can be constructed with three-state gates instead of
multiplexers.
• A three-state gate is a digital circuit that exhibits three states.
• Two of the states are signals equivalent to logic 1 and 0 as in a
conventional gate.
• The third state is a high-impedance state.
• The high-impedance state behaves like an open circuit, which means that
the output is disconnected and does not have logic significance.
• Because of this feature, a large number of three-state gate outputs can be
connected with wires to form a common bus line without endangering
loading effects.

The graphic symbol of a three-state buffer gate is shown in Fig. 4-4.
• Memory Transfer:
• The transfer of information from a memory word to the outside environment is called a read operation.
• The transfer of new information to be stored into the memory is
called a write operation.
• A memory word will be symbolized by the letter M.
• The particular memory word among the many available is selected
by the memory address during the transfer.
• It is necessary to specify the address of M when writing memory
transfer operations.
• This will be done by enclosing the address in square brackets
following the letter M.

• Consider a memory unit that receives the address from a register,
called the address register, symbolized by AR.
• The data are transferred to another register, called the data register,
symbolized by DR.
• The read operation can be stated as follows:
• Read: DR<- M [AR]

• This causes a transfer of information into DR from the memory
word M selected by the address in AR.
• The write operation transfers the content of a data register to a
memory word M selected by the address. Assume that the input
data are in register R1 and the address is in AR.
• The write operation can be stated as follows:
• Write: M [AR] <- R1
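A minimal Python sketch of these two transfers, with memory modeled as a list of 16-bit words (the sizes and addresses are illustrative):

M = [0] * 4096            # 4096-word memory
R1 = 0x00FF
M[0x941] = 0x1234

AR = 0x941
DR = M[AR]                # Read:  DR <- M[AR]
M[AR] = R1 & 0xFFFF       # Write: M[AR] <- R1
print(hex(DR), hex(M[0x941]))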
Micro-operations
• Types of Micro-operations:
• Register Transfer Micro-operations: Transfer binary information
from one register to another.
• Arithmetic Micro-operations: Perform arithmetic operation on
numeric data stored in registers.
• Logical Micro-operations: Perform bit manipulation operations on
data stored in registers.
• Shift Micro-operations: Perform shift operations on data stored in
registers.
Register Transfer Micro-operations don’t change the information content when the binary information moves from the source register to the destination register.

The other three types of micro-operations change the information content during the transfer.
Arithmetic Micro Operations
• The basic arithmetic micro-operations are
• Addition
• Subtraction
• Increment
• Decrement
• Shift
• The arithmetic micro-operation defined by the statement below specifies the add micro-operation:
• R3 ← R1 + R2
• It states that the contents of R1 are added to the contents of R2 and the sum is transferred to R3.
• To implement this statement, the hardware requires 3 registers and a digital component that performs addition.
• Subtraction is most often implemented through complementation and addition.
• Instead of the minus operator, the subtract operation is specified by the statement
• R3 ← R1 + R2' + 1
• where R2' is the symbol for the 1’s complement of R2.
• Adding 1 to the 1’s complement produces the 2’s complement.
• Adding the contents of R1 to the 2’s complement of R2 is equivalent to R1 − R2.
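A quick worked check in Python that R1 + R2' + 1 equals R1 − R2 in 16-bit arithmetic (the register values are illustrative):

MASK16 = 0xFFFF
R1, R2 = 25, 7
R3 = (R1 + ((~R2) & MASK16) + 1) & MASK16   # R3 <- R1 + R2' + 1
print(R3, (R1 - R2) & MASK16)               # both print 18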
• Binary Adder:
• A digital circuit that forms the arithmetic sum of 2 bits and a previous carry is called a FULL ADDER.
• A digital circuit that generates the arithmetic sum of 2 binary numbers of any length is called a BINARY ADDER.
• Figure 4-6 shows the interconnection of four full-adders (FA) to provide a 4-bit binary adder.
• The augend bits of A and the addend bits of B are designated by subscript numbers from right to left, with subscript 0 denoting the low-order bit.
• The carries are connected in a chain through the full-adders. The input carry to the binary adder is C0 and the output carry is C4. The S outputs of the full-adders generate the required sum bits.
• An n-bit binary adder requires n full-adders.
• Binary Adder – Subtractor:
• The addition and subtraction operations can be combined into one common circuit by including an exclusive-OR gate with each full-adder.
• A 4-bit adder-subtractor circuit is shown in Fig. 4-7.
• The mode input M controls the operation. When M = 0 the circuit is an adder and when M = 1 the circuit becomes a subtractor.
• Each exclusive-OR gate receives input M and one of the inputs of B.
• When M = 0, we have B xor 0 = B. The full-adders receive the value of B, the input carry is 0, and the circuit performs A plus B.
• When M = 1, we have B xor 1 = B' and the input carry C0 = 1.
• The B inputs are all complemented and a 1 is added through the input carry.
• The circuit performs the operation A plus the 2's complement of B.
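A gate-level Python sketch in the spirit of Fig. 4-7: each B bit is XORed with the mode input M, and M also feeds the input carry (the 4-bit width follows the figure):

def full_adder(a, b, c):
    s = a ^ b ^ c                        # sum bit
    carry = (a & b) | (a & c) | (b & c)  # carry out
    return s, carry

def add_sub_4bit(A, B, M):
    # M = 0: A + B.  M = 1: A + B' + 1 = A - B (4-bit result plus carry C4)
    carry, result = M, 0
    for i in range(4):
        a_i = (A >> i) & 1
        b_i = ((B >> i) & 1) ^ M         # exclusive-OR gate on each B input
        s, carry = full_adder(a_i, b_i, carry)
        result |= s << i
    return result, carry

print(add_sub_4bit(9, 4, 0))   # (13, 0): addition
print(add_sub_4bit(9, 4, 1))   # (5, 1): subtraction, end carry discarded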
Logical Micro Operations
• Logic Micro-operations:
• Logic microoperations specify binary operations for strings of bits stored in registers.
• These operations consider each bit of the register separately and treat them as binary variables.
• For example, the exclusive-OR microoperation with the contents of two registers R1 and R2 is symbolized by the statement
• P: R1 ← R1 ⊕ R2
• It specifies a logic microoperation to be executed on the individual bits of the registers provided that the control variable P = 1.
• Hardware Implementation:
• The hardware implementation of logic microoperations
requires that logic gates be inserted for each bit or pair of
bits in the registers to perform the required logic function.
• Although there are 16 logic microoperations, most
computers use only four--AND, OR, XOR (exclusive-OR), and
complement from which all others can be derived.
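A short Python sketch of those four basic logic microoperations on 16-bit register values (all other logic operations can be derived from these):

MASK16 = 0xFFFF
R1, R2 = 0b1010101010101010, 0b0000111100001111

print(bin(R1 & R2))           # AND microoperation
print(bin(R1 | R2))           # OR microoperation
print(bin(R1 ^ R2))           # XOR (exclusive-OR) microoperation
print(bin((~R1) & MASK16))    # complement microoperation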
Shift Micro Operations
• Shift microoperations are used for serial transfer of
data.
• The contents of a register can be shifted to the left or
the right.
• During a shift-left operation the serial input transfers a
bit into the rightmost position.
• During a shift-right operation the serial input transfers
a bit into the leftmost position.
• There are three types of shifts: logical, circular, and
arithmetic.
• The symbolic notation for the shift microoperations is
shown in Table 4-7.
• Logical Shift:
• A logical shift is one that transfers 0 through the serial input.
• The symbols shl and shr denote the logical shift-left and shift-right microoperations.
• The microoperations that specify a 1-bit shift to the left of the content of register R and a 1-bit shift to the right of the content of register R are shown in Table 4-7.
• The bit transferred to the end position through the serial input is assumed to be 0 during a logical shift.
• Circular Shift:
• The circular shift (also known as a rotate operation) circulates the
bits of the register around the two ends without loss of
information.
• This is accomplished by connecting the serial output of the shift
register to its serial input.
• We will use the symbols cil and cir for the circular shift left and
right, respectively.
• Arithmetic Shift:
• An arithmetic shift is a microoperation that shifts
a signed binary number to the left or right.
• An arithmetic shift-left multiplies a signed binary
number by 2.
• An arithmetic shift-right divides the number by 2.
• Arithmetic shifts must leave the sign bit
unchanged because the sign of the number
remains the same when it is multiplied or divided
by 2.
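A Python sketch of 1-bit shift microoperations on a 16-bit register value (logical shl/shr, circular cil/cir, and arithmetic shift-right ashr):

MASK16, SIGN = 0xFFFF, 0x8000

def shl(r):  return (r << 1) & MASK16                       # 0 enters on the right
def shr(r):  return r >> 1                                  # 0 enters on the left
def cil(r):  return ((r << 1) | (r >> 15)) & MASK16         # rotate left
def cir(r):  return ((r >> 1) | ((r & 1) << 15)) & MASK16   # rotate right
def ashr(r): return (r >> 1) | (r & SIGN)                   # sign bit preserved

R = 0b1100000000000110
for op in (shl, shr, cil, cir, ashr):
    print(op.__name__, format(op(R), '016b'))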
Arithmetic Logic Shift Unit
• Instead of having individual registers performing the microoperations directly, computer systems employ a number of storage registers connected to a common operational unit called an arithmetic logic unit, abbreviated ALU.
• The ALU is a combinational circuit, so the entire register transfer operation from the source registers through the ALU and into the destination register can be performed during one clock pulse period.
• The shift microoperations are often performed in a separate unit, but sometimes the shift unit is made part of the overall ALU.
• The arithmetic, logic, and shift circuits introduced in previous sections can be combined into one ALU with common selection variables. One stage of an arithmetic logic shift unit is shown in Fig. 4-13.
• A particular microoperation is selected with inputs S1 and S0. A 4 x 1 multiplexer at the output chooses between an arithmetic output in Di and a logic output in Ei.
• The data in the multiplexer are selected with inputs S3 and S2. The other two data inputs to the multiplexer receive input Ai+1 for the shift-right operation and Ai-1 for the shift-left operation.
• The circuit whose one stage is specified in Fig. 4-13 provides eight arithmetic operations, four logic operations, and two shift operations.
• Each operation is selected with the five variables S3, S2, S1, S0 and Cin.
• The input carry Cin is used for selecting an arithmetic operation only.
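A functional Python sketch in the spirit of Fig. 4-13: S3 S2 pick among the arithmetic, logic and shift sections, while S1 S0 (with Cin) pick the operation inside a section (the operation encoding below is modeled on the textbook's selection table):

MASK16 = 0xFFFF

def alu(s3s2, s1s0, cin, A, B):
    if s3s2 == 0b00:                       # arithmetic section (uses Cin)
        ops = [A, A + B, A + ((~B) & MASK16), (A - 1) & MASK16]
        return (ops[s1s0] + cin) & MASK16
    if s3s2 == 0b01:                       # logic section (Cin has no effect)
        return [A & B, A | B, A ^ B, (~A) & MASK16][s1s0]
    if s3s2 == 0b10:                       # shift right
        return A >> 1
    return (A << 1) & MASK16               # shift left

print(alu(0b00, 0b01, 0, 9, 4))   # A + B      -> 13
print(alu(0b00, 0b10, 1, 9, 4))   # A + B' + 1 -> A - B = 5
print(alu(0b01, 0b10, 0, 9, 4))   # A xor B    -> 13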
Instruction Codes
• Instruction:
• Instructions specify commands to:
– Transfer information within a computer (e.g., from memory to
ALU)
– Transfer of information between the computer and I/O devices
(e.g., from keyboard to computer, or computer to printer)
– Perform arithmetic and logic operations (e.g., Add two numbers,
Perform a logical AND).
• A sequence of instructions to perform a task is called a
program, which is stored in the memory.
• Processor fetches instructions that make up a program
from the memory and performs the operations stated
in those instructions.
Instruction Codes
● The organization of the computer is defined by its
internal registers, the timing and control structure,
and the set of instructions that it uses.
● A computer instruction is a binary code that
specifies a sequence of microoperations for the
computer.
● An instruction code is a group of bits that instruct
the computer to perform a specific operation.
● Instruction code is usually divided into two parts.
○ Operation part - Group of bits that define such
operations as add, subtract, multiply, shift, and
complement.
○ Address part - Contains registers or memory words
where the address of operand is found or the result is to
be stored.
● Each computer has its own instruction code format.
Operation Code
● The operation code(op-code) of an
instruction is a group of bits that define such
operations as add, subtract, multiply, shift,
and complement.
● The number of bits required for the operation code of an instruction depends on the total number of operations available in the computer (n bits for 2^n operations).
● An operation code is sometimes called a
macro-operation because it specifies a set
of micro-operations.
Stored Program Organization
● The simplest way to organize a computer is to have one processor register (Accumulator, AC) and an instruction code format with two parts.
○ First-Operation to be performed(Opcode)
○ Second – Address

● The memory address tells the control where to


find an operand in memory.
● This operand is read from memory and used as
the data to be operated on together with the
data stored in the processor register.
● Instructions are stored
in one section of the
memory and data in
another.
● For a memory unit with 4096 words we need 12 bits to specify an address, since 2^12 = 4096.
● 4 bits are available for
opcode to specify one
out of 16 possible
operations.
● The control reads a 16-bit instruction from
the program portion of memory.
● It uses the 12-bit address part of the
instruction to read a 16-bit operand from
the data portion of memory.
● It then executes the operation specified
by the operation code.
● The operation is performed with the
memory operand and the content of AC.
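A minimal Python sketch of this two-part instruction code: for a 4096-word memory, the low 12 bits address the operand and the top 4 bits are the opcode:

def split_instruction(instr):
    opcode = (instr >> 12) & 0xF    # 4 bits -> 16 possible operations
    address = instr & 0xFFF         # 12 bits -> 4096 memory words
    return opcode, address

print(split_instruction(0x1940))    # (1, 0x940): opcode 1, operand at address 940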
Computer Registers
Need of Registers?
● Computer instructions are normally stored in consecutive memory
locations and are executed sequentially one at a time.
● The control reads an instruction from a specific address in memory
and executes it. It then continues by reading the next instruction in
sequence and executes it, and so on.
● This type of instruction sequencing needs a counter to calculate
the address of the next instruction after execution of the current
instruction is completed.
● It is also necessary to provide a register in the control unit for
storing the instruction code after it is read from memory.
● The computer needs processor registers for manipulating data and
a register for holding a memory address.
List of basic Registers
Code Bits Name Purpose

DR 16 Data Register Holds memory operand

AR 12 Address register Holds address for memory

AC 16 Accumulator Processor register

IR 16 Instruction register Holds instruction code

PC 12 Program counter Holds address of instruction

TR 16 Temporary register Holds temporary data

INPR 8 Input register Holds input character

OUTR 8 Output register Holds output character


Common Bus System
Need of Common Bus System ?
● The basic computer has eight registers, a memory
unit, and a control unit.
● Paths must be provided to transfer information from
one register to another and between memory and
registers.
● The number of wires will be excessive if connections
are made between the outputs of each register and
the inputs of the other registers.
● Hence more efficient scheme with a common bus is
used.
● The outputs of seven registers and memory are connected to
the common bus.
● The specific output that is selected for the bus lines at any
given time is determined from the binary value of the
selection variables S2, S1, and S0.
● The number along each output shows the decimal equivalent
of the required binary selection.
● For example, the number along the output of DR is 3. The
16-bit outputs of DR are placed on the bus lines when
S2S1S0 =011 since this is the binary value of decimal 3.
● The lines from the common bus are connected to the inputs
of each register and the data inputs of the memory.
● The particular register whose LD (load) input is enabled
receives the data from the bus during the next clock pulse
transition.
● The memory receives the contents of the bus when its write
input is activated.
● The memory places its 16-bit output onto the bus when the
read input is activated and S2S1S0 =111.
● INPR is connected to provide information to the bus
but OUTR can only receive information from the bus.
● This is because INPR receives a character from an
input device which is then transferred to AC.
● OUTR receives a character from AC and delivers it to
an output device.
● There is no transfer from OUTR to any of the other
registers.
● The inputs of AC come from an adder and logic
circuit.
● This circuit has three sets of inputs.
● One set of 16-bit inputs come from the outputs of AC.
They are used to implement register micro-operations
such as complement AC and shift AC.
● Another set of 16-bit inputs come from the data register DR. The inputs from DR and AC are used for arithmetic and logic microoperations.
● A third set of 8-bit inputs come from the input register
INPR.
Computer Instructions
Instruction Cycle
• The processing required for a single instruction is called an instruction cycle.
• The CPU performs a sequence of micro operations for each instruction. The sequence for each instruction of the Basic Computer can be refined into 4 abstract phases:
1. Fetch instruction
2. Decode
3. Fetch operand
4. Execute
• ADD LocationA, LocationB
• This instruction adds the contents of memory LocationA and the contents of memory LocationB in 3 instruction cycles.
• The program fragment shown adds the contents of the memory word at address 940 to the contents of the memory word at address 941 and stores the result in the latter location.
• Three instructions, which can be described as three fetch and three execute cycles, are required.
INSTRUCTION CYCLE
• Program execution can be represented in above
figure:
• Instruction 1
step1. Fetch instruction1
step2. Execute instruction1
• Instruction 2
step3. Fetch instruction2
step4. Execute instruction2
• Instruction 3
step5. Fetch instruction3
step6. Execute instruction3
1. The PC contains 300, the address of the first instruction.
This instruction (the value 1940 in hexadecimal) is
loaded into the instruction register IR, and the PC is
incremented.
2. The first 4 bits (first hexadecimal digit) in the IR indicate
that the AC is to be loaded. The remaining 12 bits (three
hexadecimal digits) specify the address (940) from
which data are to be loaded.
3. The next instruction (5941) is fetched from location 301,
and the PC is incremented.
4. The old contents of the AC and the contents of location
941 are added, and the result is stored in the AC.
5. The next instruction (2941) is fetched from location 302,
and the PC is incremented.
6. The contents of the AC are stored in location 941.
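A Python sketch simulating the six steps above; the opcodes 1 (load AC), 5 (add to AC) and 2 (store AC) belong to this illustrative machine, and all values are hexadecimal:

M = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941,
     0x940: 0x0003, 0x941: 0x0002}
PC, AC = 0x300, 0

for _ in range(3):                            # three fetch-execute cycles
    IR = M[PC]                                # fetch instruction into IR
    PC += 1                                   # increment PC
    opcode, addr = IR >> 12, IR & 0xFFF       # decode
    if opcode == 0x1:
        AC = M[addr]                          # load AC from memory
    elif opcode == 0x5:
        AC = (AC + M[addr]) & 0xFFFF          # add memory word to AC
    elif opcode == 0x2:
        M[addr] = AC                          # store AC to memory

print(hex(M[0x941]))                          # 0x5 = 3 + 2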
Instruction Format
● The basic computer has three instruction
code formats each having 16 bits
○ Memory reference instructions
○ Register reference instructions
○ I/O instructions
● The opcode part of the instruction contains
three bits and the meaning of the remaining
13 bits depends on the operation code
encountered.
Memory reference instructions
● Bits 0-11 for specifying address.
● Bits 12-14 for specifying opcode.
● 15th bit specifies addressing modes. (0 for direct
and 1 for indirect)

● Opcode=000 through 110


● I=0 or 1
● Eg:
○ AND - 0xxx(direct) or 8xxx(indirect)
○ ADD - 1xxx or 9xxx
Register reference instructions
● Recognized by the operation code 111 with a 0 in the
15th bit of the instruction.
● Specifies an operation on or a test of the AC register.
● An operand from memory is not needed.
● Therefore the 12 bits are used to specify the operation
to be executed.

● Eg:
○ CLA - 7800 : Clear AC
○ CLE - 7400 : Clear E
I/O instructions
● These instructions are needed for transferring information to and from the AC register.
● Recognized by the opcode 111 and a 1 in the
15th bit.
● Bits 0-11 specify the type of I/O Operation
performed.

● Eg:
○ INP - F800 : Input characters to AC
○ OUT - F400 : Output characters from AC
Instruction Set Completeness
● The set of instructions are said to be
complete if the computer includes a sufficient
number of instructions in each of the
following categories
a. Arithmetic, logical, and shift instructions
b. Instructions for moving information to and from
memory and processor registers
c. Program control instructions
d. Input and output instructions
Timing and Control
Timing and Control
● The timing for all registers in the basic computer
is controlled by a master clock generator.
● The clock pulses are applied to all flip-flops and
registers.
● The clock pulses do not change the state of a
register unless the register is enabled by a
control signal.
● Two major types of control organization:
○ hardwired control
○ microprogrammed control.
Example
Add R1, R2

T1: Enable R1
T2: Enable R2
T3: Enable ALU for addition operation
T4: Store the result back in R2

• Control unit works with a reference signal called the processor clock
• Processor divides the operations into basic steps
• Each basic step is executed in one clock cycle

[Figure: Timing signals T1–T4 gating R1 and R2 into the ALU and the result back into R2.]
Hardwired Control
● The control logic is implemented with gates, flip-flops, decoders, and other digital circuits.
● It can be optimized to produce a fast mode of operation.
● Requires changes in the wiring among the various components if the design has to be modified or changed.

Microprogrammed Control
● The control information is stored in a control memory.
● The control memory is programmed to initiate the required sequence of microoperations.
● Required changes or modifications can be done by updating the microprogram in control memory.
Hardwired control unit
● Consists of two decoders, a sequence counter, and a
number of control logic gates.
● An instruction read from memory is placed in the
instruction register(IR).
● The IR is divided into three parts:
○ I bit, opcode, and Address bits.

● The opcode in bits 12-14 is decoded with a 3x8 decoder.
● 8 outputs of the decoder are designated by the
symbols D0 through D7.
● Bit 15 is transferred to a flip-flop I.
● Bits 0-11 are applied to the control logic gates.
Hardwired control unit

[Figure: Hardwired control unit of the basic computer: the IR feeding a 3x8 opcode decoder, a 4-bit sequence counter with a 4x16 timing decoder, and the control logic gates.]
Hardwired control unit

● The 4-bit sequence counter can count in binary from 0-


15.
● The outputs of the counter are decoded into 16 timing
signals T0-T15.
● The sequence counter SC can be incremented or
cleared synchronously.
● Mostly,SC is incremented to provide the sequence of
timing signals(T1,T2,...,T15)
● Once in a while, the counter is cleared to 0, causing
the next active timing signal to be T0.
● Eg: Suppose, at time T4, SC is cleared to 0 if decoder output D3 is active: D3T4: SC ← 0.
Hardwired control unit

● The SC responds to the positive transition of the clock.
● Initially, the CLR input of SC is active.
● Hence it clears SC to 0, giving the timing signal T0
out of the decoder.
● T0 is active during one clock cycle and will trigger
only those registers whose control inputs are
connected to timing signal T0.
● SC is incremented with every positive clock
transition, unless its CLR input is active.
● This produces the sequence of timing signals T0, T1, T2, T3, T4 up to T15 and back to T0.
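A small Python sketch of the sequence counter and its 4x16 decoder: a 4-bit counter whose decoded output raises exactly one of T0..T15 in each clock cycle (the clear condition in the demo is illustrative):

class SequenceCounter:
    def __init__(self):
        self.sc = 0                                   # 4-bit counter value
    def clock(self, clr=False):
        # Positive clock transition: clear to 0 if CLR is active, else increment
        self.sc = 0 if clr else (self.sc + 1) % 16
    def timing(self):
        return [1 if i == self.sc else 0 for i in range(16)]   # 4x16 decoder

sc = SequenceCounter()
for _ in range(5):
    print("T%d active" % sc.sc)
    sc.clock(clr=(sc.sc == 3))    # clear after T3, so T0 follows T3 in this demo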
Interrupt
• When the external device becomes ready to be serviced, it sends an interrupt request signal to the processor. The processor responds by suspending operation of the current program.
• With interrupts, the processor can be engaged in executing other instructions while an I/O operation is in progress, and it resumes the original execution after the device is serviced.
• Processor provides the requested service by executing an interrupt service routine (ISR)
• Contents of PC, general registers, and some control information are stored in memory.
• When the ISR is completed, the processor state is restored, so that the interrupted program may continue
• It suspends execution of the current program
being executed and saves its context. This
means saving the address of the next
instruction to be executed
END OF UNIT 1
22CS2112: COMPUTER ORGANIZATION
AND ARCHITECTURE
UNIT - II
By
Dr. G. Bhaskar
Associate Professor
Dept. of ECE
Email Id: bhaskar.0416@gmail.com
UNIT – II
• Micro Programmed Control:
• Control memory
• Address sequencing
• Micro program Example
• Design of Control Unit
• Central Processing Unit:
• General Register Organization
• Instruction Formats
• Addressing Modes
• Data Transfer and Manipulation
• Program Control
Control Memory
● A computer with microprogrammed control unit will have 2
memories.
○ Main Memory.
○ Control Memory.
● The main memory is available to the user for storing the
programs.
● The contents of main memory may alter when the data are
manipulated and every time that the program is changed.
● A memory that is part of a control unit is referred to as a control
memory.
● The control memory holds a fixed microprogram that cannot be
altered by the user and contains various control signals.
● The control memory can be a read-only memory (ROM) since
alterations are not needed.
● Writable control memory is used in dynamic microprogramming.
● The function of the control unit in a digital computer is to initiate
sequences of microoperations.
● Two methods of implementing control unit are
○ Hardwired control
○ Microprogrammed control.
● Hardwired Control
○ Design involves the use of fixed instructions, fixed logic blocks, encoders, decoders, etc.
○ Key characteristics are high-speed operation, expensive, relatively complex, and no flexibility of adding new instructions.
○ Example CPUs: Intel 8085, Motorola 6802, and any RISC (Reduced Instruction Set Computer) CPUs.
Microprogrammed Control Unit
● A control unit whose binary control variables are stored in memory
is called a microprogrammed control unit.
● Main advantage - for different control sequence; only have to
change microprogram residing in control memory. (No need of
hardware changes)
● The control function that specifies a microoperation is a binary
variable.
● The control variables at any given time can be represented by a
string of 1’s and 0’s called a control word.
● Each word in control memory contains a microinstruction.
● The microinstruction specifies one or more microoperations.
● A sequence of microinstructions constitutes a microprogram.
Microoperations ➙ Control word ➙ Microinstruction ➙
Microprogram ➙ Control Memory.
● The control memory is a ROM in which all control information is
permanently stored.
● The control address register specifies the address of the microinstruction
● Control data register holds the microinstruction read from memory.
● The microinstruction contains a control word that specifies one or more
microoperations for the data processor.
● The next address is computed in the next address generator(sequencer)
and then transferred into the control address register to read the next
microinstruction.
● The control data register(pipeline register) holds the present
microinstruction while the next address is computed and read from
memory.
Address Sequencing
● Microinstructions are stored in control memory
in groups, with each group specifying a routine.
● Each computer instruction has its own
microprogram routine in control memory to
generate the microoperations that execute the
instruction.
● The hardware that controls the address
sequencing must be capable of sequencing the
microinstructions within a routine and be able to
branch from one routine to another.
Address Sequencing
The address sequencing required in a control
memory. It can be achieved by:
1. Incrementing of the control address register.
2. Unconditional branch or conditional branch,
depending on status bit conditions.
3. A mapping process from the bits of the
instruction to an address for control memory.
4. A facility for subroutine call and return.
Incrementing CAR:
● The incrementer increments the content of the control address register by one, to select the next microinstruction in sequence.
Conditional Branching:
● Branch logic provides decision-making capabilities in the control unit.
● The status bits, together with the field in the microinstruction that specifies a branch address, control the conditional branch decisions.
● The simplest way to implement branch logic is to test the specified condition and branch to an address if the condition is met; else increment the address register.
Mapping of Instruction:
● A special type of branch instruction.
● Here a branching is done to the first word in control memory where
a microprogram routine for an instruction is located.
● The status bits for this branch are the bits in the operation code of
the instruction.
● Can be implemented using ROM.
● The bits of the instruction specify the address of a mapping ROM.
● The contents of the mapping ROM give the bits for the control
address register.
● This concept provides flexibility for adding instructions for control
memory as the need arises.
Subroutines:
● Programs that are used by other routines to accomplish a particular task.
● Microinstructions can be saved by employing subroutines that use common sections of microcode.
● Must have a provision for storing the return address during a subroutine call and restoring the address during a subroutine return.
● This may be accomplished by placing the incremented address from the control address register into a subroutine register and branching to the beginning of the subroutine.
● The subroutine register can then become the source for transferring the address for the return to the main routine.
 Instruction Fetch Routine
○ An initial address is loaded into the CAR when power is
turned on.
○ Routine is sequenced by incrementing the control address
register(CAR).
○ At the end of the routine, the instruction is in the
instruction register (IR)
 Effective Address Computation Routine
○ It determines the effective address of the operand.
○ Routine can be reached through a branch microinstruction,
based on the status of the mode bits of the instruction.
○ At the end the address of the operand is in the memory
address register.
 Generating Microoperations
○ Depend on the operation code part of the instruction.
○ The transformation from the instruction code bits to an
address in control memory where the routine is located is
referred to as a mapping process.
Microinstruction Format
● Generating microcode for the control memory is called
microprogramming.
● The 20 bits of the microinstruction are divided into four functional
parts.
● Three fields F1, F2, and F3 specify microoperations for the
computer.
○ Each field is 3 bits wide, allowing seven distinct microoperations
plus NOP, so the three fields together provide 21 microoperations.
● The CD field selects status bit conditions.
○ Its 2 bits are encoded to specify 4 status bit conditions.
● The BR field specifies the type of branch to be used.
○ It is used in conjunction with the AD field to choose the address
of the next microinstruction.
● The AD field contains a branch address. Since the control memory has
128 = 2^7 words, AD has 7 bits.
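A minimal sketch of how such a 20-bit word could be packed in C, assuming the field order F1 F2 F3 CD BR AD with F1 in the most significant bits (the helper name and layout are illustrative, not part of the text):

   #include <stdint.h>

   /* Pack F1(3) F2(3) F3(3) CD(2) BR(2) AD(7) into a 20-bit word. */
   uint32_t pack_microinstruction(unsigned f1, unsigned f2, unsigned f3,
                                  unsigned cd, unsigned br, unsigned ad)
   {
       return ((uint32_t)(f1 & 7)   << 17) |
              ((uint32_t)(f2 & 7)   << 14) |
              ((uint32_t)(f3 & 7)   << 11) |
              ((uint32_t)(cd & 3)   <<  9) |
              ((uint32_t)(br & 3)   <<  7) |
               (uint32_t)(ad & 0x7F);
   }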
F1   Microoperation      Symbol      F2   Microoperation      Symbol
000  None                NOP         000  None                NOP
001  AC ← AC + DR        ADD         001  AC ← AC - DR        SUB
010  AC ← 0              CLRAC       010  AC ← AC ∨ DR        OR
011  AC ← AC + 1         INCAC       011  AC ← AC ∧ DR        AND
100  AC ← DR             DRTAC       100  DR ← M[AR]          READ
101  AR ← DR(0-10)       DRTAR       101  DR ← AC             ACTDR
110  AR ← PC             PCTAR       110  DR ← DR + 1         INCDR
111  M[AR] ← DR          WRITE       111  DR(0-10) ← PC       PCTDR

F3   Microoperation      Symbol
000  None                NOP
001  AC ← AC ⊕ DR        XOR
010  AC ← AC' (complement)  COM
011  AC ← shl AC         SHL
100  AC ← shr AC         SHR
101  PC ← PC + 1         INCPC
110  PC ← AR             ARTPC
111  Reserved
CD   Condition    Symbol   Comments
00   Always = 1   U        Unconditional branch
01   DR(15)       I        Indirect address bit
10   AC(15)       S        Sign bit of AC
11   AC = 0       Z        Zero value in AC

BR   Symbol   Function
00   JMP      CAR ← AD if condition = 1; CAR ← CAR + 1 if condition = 0
01   CALL     CAR ← AD, SBR ← CAR + 1 if condition = 1;
              CAR ← CAR + 1 if condition = 0
10   RET      CAR ← SBR (return from subroutine)
11   MAP      CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0
• The symbols defined in the above tables can be
used to specify microinstructions in symbolic
form.
• A symbolic microprogram can be translated
into its binary equivalent by means of an
assembler.
• Each symbolic microinstruction is divided into
five fields: label, microoperations, CD, BR, and
AD.
• The label field may be empty or it may specify a
symbolic address.
- A label is terminated with a colon (:).
• The microoperations field consists of one, two, or
three symbols separated by commas.
• The CD field has one of the letters U, I, S, or Z.
• The BR field contains one of the 4 symbols.
• The AD field specifies a value for the address field
with (i) a symbolic address, (ii) the symbol NEXT, or
(iii) the RET or MAP symbol.
Microprogram Example
• Microprograms can be specified in two ways:
1) Symbolic microprogram
2) Binary microprogram
• The symbolic microprogram is a convenient form
for writing microprograms in a way that people
can read and understand.
• The symbolic microprogram must be translated to
binary to be stored in control memory.
• The equivalent binary form of the symbolic
microprogram is called the binary microprogram.
Microprogram Example

Label    Microoperations   CD   BR    AD
FETCH:   PCTAR             U    JMP   NEXT
         READ, INCPC       U    JMP   NEXT
         DRTAR             U    MAP

The fetch routine needs three microinstructions, as given below:
AR ← PC
DR ← M[AR], PC ← PC + 1
AR ← DR(0-10), CAR(2-5) ← DR(11-14), CAR(0,1,6) ← 0
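Assuming, as in the textbook example, that the fetch routine is placed starting at control-memory address 64, the three symbolic microinstructions assemble field by field into the binary microprogram below; AD holds the address of the next word for JMP NEXT and is left at zero for MAP:

   Address   F1    F2    F3    CD   BR   AD
   64        110   000   000   00   00   1000001     (PCTAR, U, JMP, 65)
   65        000   100   101   00   00   1000010     (READ, INCPC, U, JMP, 66)
   66        101   000   000   00   11   0000000     (DRTAR, U, MAP)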
Design of Control Unit
● Bits of the microinstruction are usually divided
into fields, with each field defining a separate
function.(F1-F2-F3, CD, BR,AD)
● Each field requires a decoder to produce the
corresponding control signals.
● Each of the three fields of the microinstruction
presently available in the output of control
memory is decoded with a 3x8 decoder to
provide eight outputs.
● Each of these outputs must be connected to the
proper circuit to initiate the corresponding
microoperation.
Microprogram sequencer
● The basic components of microprogrammed control unit are the
control memory and the circuits that select the next address.
● The address selection part is called as microprogram sequencer.
● Microprogram sequencer can be constructed with digital functions
to suit a particular application.
● Two important factors must be considered while designing the
microprogram sequencer:
○ The size of the microinstruction.
○ The address generation time.
● The purpose of microprogram sequencer is to present an address to
the control memory so that a microinstruction may be read and
executed.
● The next address logic of the sequencer determines the specific
address source to be loaded into the CAR.
● The choice of the address source is guided by the next-address
information bits that the sequencer receives from the present
microinstruction.
Central Processing Unit
● The part of the computer that performs the data-processing
operations is called the central processing unit (CPU).
● The register set stores data used during the execution of the
instructions.
● ALU performs the required microoperations for executing the
instructions.
● The control unit supervises the transfer of information among the
registers and instructs the ALU as to which operation to perform.
General Register Organization
● Registers are used to store the intermediate values
during instruction execution.
● Register organization show how registers are selected
and how data flow between register and ALU.
● A decoder is used to select a particular register.
● The output of each register is connected to two
multiplexers to form the two buses A and B.
● The selection lines in each multiplexer select the input
data for the particular bus.
● The A and B buses form the two inputs of an ALU.
● The operation select lines decide the micro operation
to be performed by ALU.
● The result of the microoperation is available at the output bus.
● The output bus is connected to the inputs of all registers; thus, by
selecting a destination register, it is possible to store the result in it.
Example: To perform R1 ← R2 + R3,
1. MUX A selector (SELA): places the content of R2 onto bus A.
2. MUX B selector (SELB): places the content of R3 onto bus B.
3. ALU operation selector (OPR): selects arithmetic addition A + B.
4. Destination decoder (SELD): transfers the content of the
output bus into R1.
Control Word
● The combined value of the binary selection inputs specifies the
control word.
● It consists of four fields: SELA, SELB, and SELD contain three bits
each, and the OPR field contains four bits, so the control word is
13 bits in total.
● The three bits of SELA select a source register for the A input of the
ALU.
● The three bits of SELB select a source register for the B input of the
ALU.
● The three bits of SELD select a destination register through the
decoder.
● The four bits of OPR select the operation to be performed by the ALU.
Example: R2 ← R1 + R3
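As a small sketch of this example (assuming registers are numbered so that R1 = 001, R2 = 010, R3 = 011, and assuming a hypothetical 4-bit ADD code of 0010; the real encodings depend on the ALU design), the 13-bit control word can be assembled like this:

   #include <stdint.h>

   #define OPR_ADD 0x2   /* hypothetical 4-bit ADD code */

   /* Pack SELA(3) SELB(3) SELD(3) OPR(4) into a 13-bit control word. */
   uint16_t control_word(unsigned sela, unsigned selb,
                         unsigned seld, unsigned opr)
   {
       return (uint16_t)(((sela & 7) << 10) | ((selb & 7) << 7) |
                         ((seld & 7) << 4)  |  (opr  & 0xF));
   }

   /* R2 <- R1 + R3:  control_word(1, 3, 2, OPR_ADD)
      = SELA 001, SELB 011, SELD 010, OPR 0010 */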
Instruction Formats
● The bits of the instruction are divided into groups called fields.
1. Opcode field - specifies the operation to be performed.
2. Address field - specifies a memory address / processor register.
3. Mode field - specifies the way the operand or the effective address is
determined.
● The number of address fields depends on the internal organization
of registers.
● Three types of CPU organizations:
1. Single accumulator organization.
Eg: ADD X
2. General register organization.
Eg: ADD R1, R2, R3 - MOV R1, R2
3. Stack organization
Eg: PUSH X
● Based on these, instructions are classified into four formats.
Three-Address Instruction
● Computers with three-address instruction formats can
use each address field to specify either a processor
register or a memory operand.
● The program in assembly language that evaluates X =
(A + B) * (C + D)
ADD R1, A, B // R1 ← M[A] + M[B]
ADD R2, C, D // R2 ← M[C] + M[D]
MUL X, R1, R2 // M[X] ← R1* R2
● Advantage - It results in short programs when
evaluating arithmetic expressions.
● Disadvantage - The binary-coded instructions require
too many bits to specify three addresses.
● Eg: Commercial computer Cyber 170.
Two-Address Instruction
● Two-address instructions are the most common
in commercial computers.
● Here again each address field can specify either
a processor register or a memory word.
● The program to evaluate X = (A + B) * (C + D)
MOV R1, A // R1 ← M[A]
ADD R1, B // R1 ← R1 + M[B]
MOV R2, C // R2 ← M[C]
ADD R2, D // R2 ← R2 + M[D]
MUL R1, R2 // R1 ← R1*R2
MOV X, R1 // M[X] ← R1
One-Address Instruction
● Use an implied accumulator (AC) register for all data
manipulation.
● Here we neglect the second register and assume that
the AC contains the result of all operations.
● The program to evaluate X = (A + B) * (C + D)
LOAD A // AC ← M[A]
ADD B // AC ← AC + M[B]
STORE T // M[T] ← ΑC
LOAD C // AC ← M[C]
ADD D // AC ← AC + M[D]
MUL T // AC ← AC * M[T]
STORE X // M[X] ← AC
● T is the address of a temporary memory location
required for storing the intermediate result.
Zero-Address Instruction
● Used in stack-organized computers.
● The program to evaluate X = (A + B) * (C + D) for a
stack-organized computer
PUSH A // TOS ← A
PUSH B // TOS ← B
ADD // TOS ← (A + B)
PUSH C // TOS ← C
PUSH D // TOS ← D
ADD // TOS ← (C + D)
MUL // TOS ← (C + D)*(A + B)
POP X // M[X] ← TOS
● To evaluate arithmetic expressions in a stack
computer, it is necessary to convert the expression
into reverse Polish notation; for example,
(A + B) * (C + D) becomes AB+ CD+ *.
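The PUSH/ADD/MUL/POP program above is exactly that postfix string read left to right. A minimal sketch of the stack evaluation in C (the variable names and operand values are illustrative only):

   #include <stdio.h>

   /* Sketch: evaluate (A + B) * (C + D) the way a stack machine would. */
   static double stack[16];
   static int top = -1;

   static void push(double v) { stack[++top] = v; }
   static double pop(void)    { return stack[top--]; }

   int main(void)
   {
       double A = 1, B = 2, C = 3, D = 4;   /* illustrative values */

       push(A);                 /* PUSH A */
       push(B);                 /* PUSH B */
       push(pop() + pop());     /* ADD: TOS = A + B */
       push(C);                 /* PUSH C */
       push(D);                 /* PUSH D */
       push(pop() + pop());     /* ADD: TOS = C + D */
       push(pop() * pop());     /* MUL: TOS = (A+B)*(C+D) */

       printf("%f\n", pop());   /* POP X: prints 21.000000 */
       return 0;
   }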
Addressing Modes
● The operation field of an instruction specifies the operation to be
performed.
● This operation must be executed on some data stored in computer
registers or memory words.
● The way the operands are chosen during program execution is
dependent on the addressing mode of the instruction.
● The addressing mode specifies a rule for interpreting or modifying
the address field of the instruction before the operand is actually
referenced.
● Computers use addressing mode techniques for the purpose of
accommodating one or both of the following provisions:
1. To give programming versatility to the user by providing such
facilities as pointers to memory, counters for loop control, indexing
of data, and program relocation.
2. To reduce the number of bits in the addressing field of the
instruction
1. Implied Mode
● The operands are specified implicitly in the
definition of the instruction.
● Eg: CMA - Complement Accumulator.
● All register reference instructions that use an
accumulator are implied-mode instructions.
● Zero-address instructions in a stack-organized
computer are implied-mode instructions
since the operands are implied to be on top
of the stack.
2. Immediate Mode
● The operand is specified in the instruction
itself.
● It has an operand field rather than an address
field.
● The operand field contains the actual
operand to be used in conjunction with the
operation specified in the instruction.
● They are useful for initializing registers to
a constant value.
● Eg: ADD 7
3. Register Mode
● The address field specifies a processor register.
● In this mode the operands are in registers that reside
within the CPU.
● The particular register is selected from a register field
in the instruction.
● A k-bit field can specify any one of 2^k registers.
● Eg: ADD R1
4. Register Indirect Mode
● The instruction specifies a register in the CPU whose contents give the
address of the operand in memory.
● The selected register contains the address of the operand rather than the
operand itself.
● Before using a register indirect mode instruction, the programmer must
ensure that the memory address of the operand is placed in the
processor register with a previous instruction.
● Advantage - The address field of the instruction uses fewer bits to select
a register.
5. Autoincrement or Autodecrement Mode
● Similar to the register indirect mode except
that the register is incremented or
decremented after (or before) its value is
used to access memory.
● When the address stored in the register refers
to a table of data in memory, it is necessary
to increment or decrement the register after
every access to the table.
● This can be achieved by using the increment
or decrement instruction.
6.Direct Address Mode
● The effective address is equal to the address part of the
instruction.
● The operand resides in memory and its address is given
directly by the address field of the instruction.
● In a branch-type instruction the address field specifies the
actual branch address.
7. Indirect Address Mode
● The address field of the instruction gives the address
where the effective address is stored in memory.
● Control fetches the instruction from memory and uses
its address part to access memory again to read the
effective address.
8. Relative Address Mode
● Content of the program counter is added to the address part of
the instruction in order to obtain the effective address.
● The address part of the instruction is usually a signed number
(positive or negative).
● This number is added to the content of the program counter,
producing an effective address whose position in memory is
relative to the address of the next instruction.
● It is often used with branch-type instructions.
● Example:
Let PC contains the number 825.
The address part of the instruction contains the number 24.
The instruction at location 825 is read from memory during the
fetch phase and the program counter is then incremented by one
to 826.
The effective address computation for the relative address mode is
826 + 24 = 850.
9. Indexed Addressing Mode
● Content of an index register is added to the address part
of the instruction to obtain the effective address.
● The index register is a special CPU register that contains an
index value.
● The address field of the instruction defines the beginning
address of a data array in memory.
● The distance between the beginning address and the
address of the operand is the index value stored in the
index register.
● Any operand in the array can be accessed with the same
instruction if the index register contains the correct index
value.
● The index register can be incremented to facilitate access
to consecutive operands.
10. Base Register Addressing Mode
● In this mode the content of a base register is added to the address
part of the instruction to obtain the effective address.
● Similar to the indexed addressing mode except that the register is
now called a base register.
● The difference between the two modes is in the way they are used
rather than in the way that they are computed.
● A base register holds a base address and the address field of the
instruction gives a displacement relative to this base address.
● This mode is used to facilitate the relocation of programs in
memory.
● When programs and data are moved from one segment of
memory to another, as required in multiprogramming systems, the
address values of instructions must reflect this change of position.
● Here only the value of the base register requires updating to
reflect the beginning of a new memory segment.
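To tie the memory-based modes together, here is a hedged sketch in C of how a CPU might compute the effective address (the mode names, register arguments, and memory array are invented for illustration):

   /* Illustrative effective-address computation for several modes. */
   enum mode { DIRECT, INDIRECT, RELATIVE, INDEXED, BASE_REG };

   unsigned effective_address(enum mode m, unsigned addr_field,
                              unsigned pc, unsigned xr, unsigned br,
                              const unsigned *mem)
   {
       switch (m) {
       case DIRECT:   return addr_field;        /* EA = address field       */
       case INDIRECT: return mem[addr_field];   /* EA fetched from memory   */
       case RELATIVE: return pc + addr_field;   /* EA = PC + offset         */
       case INDEXED:  return addr_field + xr;   /* EA = array base + index  */
       case BASE_REG: return br + addr_field;   /* EA = base + displacement */
       }
       return 0;
   }

With pc = 826 and addr_field = 24, the RELATIVE case reproduces the 826 + 24 = 850 example above.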
Data Transfer & Manipulation
● Computers provide an extensive set of instructions to give the user
the flexibility to carry out various computational tasks.
● The basic set of operations available in a typical computer can be
classified into three categories:
1. Data transfer instructions
2. Data manipulation instructions
3. Program control instructions
● Data transfer instructions cause transfer of data from one location to
another without changing the binary information content.
● Data manipulation instructions are those that perform arithmetic,
logic, and shift operations.
● Program control instructions provide decision-making capabilities and
change the path taken by the program when executed in the
computer.
Data Transfer Instruction
● Data transfer instructions move data from one place in the computer
to another without changing the data content.
● The most common transfers are between memory and processor
registers, between processor registers and input or output, and
between the processor registers themselves.
● Typical Instructions are,
1. Load (LD) - Transfer from memory to a processor register, usually
an accumulator.
2. Store (ST) - Transfer from a processor register into memory.
3. Move (MOV) - Transfer from one register to another.
4. Exchange (XCH) - Swaps information between two registers or a
register and a memory word.
5. Input (IN), Output (OUT) - Transfer data among processor
registers and input or output terminals.
6. Push (PUSH), Pop(POP) - Transfer data between processor
registers and a memory stack.
Data Manipulation Instructions
● Perform operations on data and provide the
computational capabilities for the computer.
● Divided into three basic types:
1. Arithmetic instructions.
2. Logical and bit manipulation instructions.
3. Shift instructions.
Arithmetic Instructions
● Increment (INC) : adds 1 to the value stored in a register or memory
word.
● Decrement (DEC) : subtracts 1 from a value stored in a register or
memory word.
● Add (ADD) : Addition.
● Subtract (SUB) : Subtraction.
● Multiply (MUL) : Multiplication.
● Divide (DIV) : Division.
● Add with carry (ADDC) : Performs the addition on two operands plus
the value of the carry from the previous computation.
● Subtract with borrow (SUBB) : Subtracts two words and a borrow
which may have resulted from a previous subtract operation.
● Negate (NEG) : Forms the 2’s complement of a number.
Logical and Bit Manipulation Instructions
● Clear (CLR)
● Complement (COM)
● AND (AND)
● OR (OR)
● Exclusive-OR (XOR)
● Clear carry (CLRC)
● Set carry (SETC)
● Complement carry (COMC)
● Enable interrupt (EI)
● Disable interrupt (DI)
Shift Instructions
● Shifts are operations in which the bits of a word are
moved to the left or right.
● The bit shifted in at the end of the word determines
the type of shift used.
● Types:
1. Logical shift
2. Arithmetic shift
3. Rotate shift
Logical Shift
● Inserts 0 into the end bit position.
● The end position is the leftmost bit for a shift right and the rightmost bit for a shift left.

Logical Shift Right (SHR): 11010 → 01101
Logical Shift Left (SHL):  11010 → 10100
Arithmetic Shift
● Arithmetic shift right preserves the sign bit in the leftmost position: the other bits are
shifted right along with the sign bit, but the sign bit itself remains unchanged.

Arithmetic Shift Right (SHRA): 11010 → 11101
Arithmetic Shift Left (SHLA):  11010 → 10100
Rotate Shift
● Bits shifted out of one end are not lost but are circulated back into the other end.
● Rotate Left through Carry (ROLC) and Rotate Right through Carry (RORC) treat the carry bit
as an extension of the register whose word is being rotated.

Rotate Right (ROR): 10110 → 01011
Rotate Left (ROL):  10110 → 01101
Program Control
Program Control Instructions
● Program control instructions specify conditions for altering the
content of the program counter.
● Provides decision making capabilities.
● Branch (BR) : BR ADR, branch the program to Address ADR.
(PC←ADR)
● Jump (JMP)
● Skip (SKP) : Skip instruction(PC←PC + 1) if some condition is
met.
● Call (CALL) : Used with subroutines
● Return (RET)
● Compare (CMP) : Compare by subtraction.
● Test (TST) : an AND operation without storing the result.
END OF UNIT 2
22CS2112: COMPUTER ORGANIZATION
AND ARCHITECTURE
UNIT-III

By
Dr. G. Bhaskar
Associate Professor
Dept. of ECE
Email Id:bhaskar.0416@gmail.com
UNIT 3
• Data Representation: Data types, Complements,
• Fixed Point Representation,
• Floating Point Representation.
• Computer Arithmetic: Addition and subtraction,
• multiplication Algorithms,
• Division Algorithms,
• Floating – point Arithmetic operations.
• Decimal Arithmetic unit, Decimal Arithmetic operations.
Computer data types
• Computer programs or applications may use different types of data based on the
problem or requirement.
• Given below are the different types of data that a computer uses:
• Numeric data – integer and real numbers
• Non-numeric data – character data, address data, logical data
• Let's study each with further sub-categories.
• Numeric data
• It can be of the following two types:
• Integers
• Real Numbers
• Real numbers can be represented as:
• Fixed point representation
• Floating point representation
• Character data
• A sequence of characters is called character data.
• A character may be alphabetic (A-Z or a-z), numeric (0-9),
a special character (+, #, *, @, etc.), or a combination of all of
these. A character is represented by a group of bits.
• When multiple characters are combined, they form
meaningful data. A character is represented in the
standard ASCII format. Another popular format is EBCDIC,
used in large computer systems.
• Example of character data
• Rajneesh1#
• 229/3, xyZ
• Mission Milap – X/10
• Logical data
• Logical data is used by computer systems to take logical decisions.
• Logical data differs from numeric or alphanumeric data in that
numeric and alphanumeric data are associated with numbers or
characters, whereas logical data is denoted by one of two values:
true (T) or false (F).
• You can see an example of logical data in the construction of truth
tables for logic gates.
• Logical data can also be a statement consisting of numeric or
character data with relational symbols (>, <, =, etc.).
• Character set
• Character sets can be of the following types in computers:
• Alphabetic characters- It consists of alphabet characters A-Z or a-z.
• Numeric characters- It consists of digits from 0 to 9.
• Special characters- Special symbols are +, *, /, -, ., <, >, =, @, %, #,
etc.
Fixed point representation
• In computers, fixed-point representation is a real data
type for numbers. Fixed point representation can
convert data into binary form, and then the data is
processed, stored, and used by the computer. It has a
fixed number of bits for the integral and fractional
parts. For example, if given fixed-point representation
is IIIII.FFF, we can store a minimum value of 00000.001
and a maximum value of 99999.999.
• There are three parts of the fixed-point number
representation: Sign bit, Integral part, and Fractional
part.
• Sign bit:- The fixed-point number representation
in binary uses a sign bit. The negative number has
a sign bit 1, while a positive number has a bit 0.
• Integral Part:- The integral part in fixed-point
numbers is of different lengths at different places.
It depends on the register's size; for an 8-bit
register, the integral part is 4 bits.
• Fractional part:- The Fractional part is of different
lengths at different places. It depends on the
registers; for an 8-bit register, the fractional part
is 3 bits.
How to write numbers in Fixed-point notation?
• Now that we have learned about fixed-point number
representation, let's see how to represent it.
• The number considered is 4.5
• Step 1: We will convert the number 4.5 to binary
form. 4.5 = 100.1
• Step 2: Represent the binary number in fixed-point
notation with the following format.
• Fixed Point Notation of 4.5
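A small sketch of the same idea in C, assuming a format with 3 fraction bits (so stored values are scaled by 2^3 = 8; the format choice mirrors the 8-bit register example above and is illustrative):

   #include <stdio.h>

   #define FRAC_BITS 3   /* assumed I.F split: 3 fraction bits */

   int main(void)
   {
       /* 4.5 * 2^3 = 36 = 00100100 in an 8-bit register: 00100.100 */
       int fx = (int)(4.5 * (1 << FRAC_BITS));
       printf("stored pattern: %d (0x%02X)\n", fx, fx);
       printf("recovered value: %f\n", fx / (double)(1 << FRAC_BITS));
       return 0;
   }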
Floating Point Representations
Floating-point arithmetic
❑ We often encounter floating-point programming.
– Floating point greatly simplifies working with large (e.g., 2^70) and
small (e.g., 2^-17) numbers.
❑ We'll focus on the IEEE 754 standard for floating-point arithmetic.
– How FP numbers are represented
– Limitations of FP numbers
– FP addition and multiplication
Floating-point representation
❑ IEEE numbers are stored using a kind of scientific notation.
   mantissa × 2^exponent
❑ We can represent floating-point numbers with three binary
fields: a sign bit s, an exponent field e, and a fraction field f.
   s | e | f
❑ The IEEE 754 standard defines several different precisions.
— Single precision numbers include an 8-bit exponent field
and a 23-bit fraction, for a total of 32 bits.
— Double precision numbers have an 11-bit exponent field
and a 52-bit fraction, for a total of 64 bits.
Sign
   s | e | f
❑ The sign bit is 0 for positive numbers and 1 for negative
numbers.
❑ But unlike integers, IEEE values are stored in signed magnitude
format.
Mantissa
   s | e | f
❑ There are many ways to write a number in scientific notation, but
there is always a unique normalized representation, with exactly one
non-zero digit to the left of the point.
   0.232 × 10^3 = 23.2 × 10^1 = 2.32 × 10^2 = …
   01001 = 1.001 × 2^3 = …
❑ What's the normalized representation of 00101101.101?
   00101101.101 = 1.01101101 × 2^5
❑ What's the normalized representation of 0.0001101001110?
   0.0001101001110 = 1.110100111 × 2^-4
Mantissa
   s | e | f
❑ The field f contains a binary fraction.
❑ The actual mantissa of the floating-point value is (1 + f).
– In other words, there is an implicit 1 to the left of the binary
point.
– For example, if f is 01101…, the mantissa would be 1.01101…
❑ A side effect is that we get a little more precision: there are 24 bits in
the mantissa, but we only need to store 23 of them.
❑ But, what about the value 0?
Exponent
   s | e | f
❑ There are special cases that require encodings:
– Infinities (overflow)
– NaN (divide by zero)
❑ For example:
– Single precision: 8 bits in e → 256 codes; 11111111 reserved for
special cases → 255 codes; one code (00000000) for zero → 254
codes; need both positive and negative exponents → half
positive (127) and half negative (127)
– Double precision: 11 bits in e → 2048 codes; 111…1 reserved for
special cases → 2047 codes; one code for zero → 2046 codes;
need both positive and negative exponents → half positive
(1023) and half negative (1023)
Exponent
   s | e | f
❑ The e field represents the exponent as a biased number.
– It contains the actual exponent plus 127 for single precision, or
the actual exponent plus 1023 for double precision.
– This converts all single-precision exponents from -126 to +127
into unsigned numbers from 1 to 254, and all double-precision
exponents from -1022 to +1023 into unsigned numbers from 1 to
2046.
❑ Two examples with single-precision numbers are shown below.
– If the exponent is 4, the e field will be 4 + 127 = 131 (10000011₂).
– If e contains 01011101 (93₁₀), the actual exponent is 93 - 127 = -34.
❑ Storing a biased exponent means we can compare IEEE values as if
they were signed integers.
Mapping Between e and Actual Exponent

   e           Actual Exponent
   0000 0000   Reserved
   0000 0001   1 - 127 = -126
   0000 0010   2 - 127 = -125
   …           …
   0111 1111   127 - 127 = 0
   …           …
   1111 1110   254 - 127 = 127
   1111 1111   Reserved
Converting an IEEE 754 number to decimal
   s | e | f
❑ The decimal value of an IEEE number is given by the formula:
   (1 - 2s) × (1 + f) × 2^(e - bias)
❑ Here, the s, f and e fields are assumed to be in decimal.
– (1 - 2s) is 1 or -1, depending on whether the sign bit is 0
or 1.
– We add an implicit 1 to the fraction field f, as mentioned
earlier.
– Again, the bias is either 127 or 1023, for single or double
precision.
Example IEEE-decimal conversion
❑ Let's find the decimal value of the following IEEE number.
   1 01111100 11000000000000000000000
❑ First convert each individual field to decimal.
– The sign bit s is 1.
– The e field contains 01111100 = 124₁₀.
– The mantissa is 0.11000… = 0.75₁₀.
❑ Then just plug these decimal values of s, e and f into our formula:
   (1 - 2s) × (1 + f) × 2^(e - bias)
❑ This gives us (1 - 2) × (1 + 0.75) × 2^(124-127) = (-1.75 × 2^-3) = -0.21875.
Converting a decimal number to IEEE 754
❑ What is the single-precision representation of 347.625?
1. First convert the number to binary: 347.625 = 101011011.101₂.
2. Normalize the number by shifting the binary point until there is
a single 1 to the left:
   101011011.101 × 2^0 = 1.01011011101 × 2^8
3. The bits to the right of the binary point comprise the fractional
field f.
4. The number of times you shifted gives the exponent. The field e
should contain: exponent + 127.
5. Sign bit: 0 if positive, 1 if negative.
Example
❑ What is the single-precision representation of 639.6875?
   639.6875 = 1001111111.1011₂
            = 1.0011111111011 × 2^9
   s = 0
   e = 9 + 127 = 136 = 10001000
   f = 0011111111011
The single-precision representation is:
   0 10001000 00111111110110000000000
Examples: Compare FP numbers (<, > ?)
1. 0 0111 1111 110…0   vs   0 1000 0000 110…0
   +1.11₂ × 2^(127-127) = 1.75₁₀   vs   +1.11₂ × 2^(128-127) = 11.1₂ = 3.5₁₀
   Exponent fields: 0111 1111 < 1000 0000
   Directly comparing the exponents as unsigned values gives the result.
2. 1 0111 1111 110…0   vs   1 1000 0000 110…0
   -f × 2^(0111 1111)   vs   -f × 2^(1000 0000)
   For the exponents: 0111 1111 < 1000 0000
   So -f × 2^(0111 1111) > -f × 2^(1000 0000)
Special Values (single-precision)

   E          F      Meaning                Notes
   00000000   0…0    0                      +0.0 and -0.0
   00000000   X…X    Valid unnormalized     = (-1)^S × 2^-126 × (0.F)
                     number
   11111111   0…0    Infinity
   11111111   X…X    Not a Number (NaN)

   E           Real Exponent   F       Value
   0000 0000   Reserved        000…0   0
                               xxx…x   Unnormalized: (-1)^S × 2^-126 × (0.F)
   0000 0001   -126
   0000 0010   -125            any     Normalized: (-1)^S × 2^(e-127) × (1.F)
   …           …
   0111 1111   0
   …           …
   1111 1110   127
   1111 1111   Reserved        000…0   Infinity
                               xxx…x   NaN
Range of numbers
❑ Normalized (positive range; negative is symmetric)
   smallest: 0 00000001 00000000000000000000000 = +2^-126 × (1+0) = 2^-126
   largest:  0 11111110 11111111111111111111111 = +2^127 × (2 - 2^-23)
❑ Unnormalized
   smallest: 0 00000000 00000000000000000000001 = +2^-126 × 2^-23 = 2^-149
   largest:  0 00000000 11111111111111111111111 = +2^-126 × (1 - 2^-23)
(Number line: positive underflow lies below 2^-149; representable positive values run
from 2^-149 through 2^-126 × (1 - 2^-23) and 2^-126 up to 2^127 × (2 - 2^-23); beyond
that is positive overflow.)
In comparison
❑ The smallest and largest possible 32-bit integers in two's
complement are only -2^31 and 2^31 - 1.
❑ How can we represent so many more values in the IEEE 754
format, even though we use the same number of bits as regular
integers?

What's the next representable FP number after 2^-126?
   +2^-126 × (1 + 2^-23), which differs from the smallest normalized
   number by 2^-149.
Finiteness
❑ There aren't more IEEE numbers.
❑ With 32 bits, there are 2^32, or about 4 billion, different bit patterns.
– These can represent 4 billion integers or 4 billion reals.
– But there are an infinite number of reals, and the IEEE format
can only represent some of the ones from about -2^128 to +2^128.
– It represents the same number of values between 2^n and 2^(n+1)
as between 2^(n+1) and 2^(n+2).
❑ Thus, floating-point arithmetic has "issues":
– Small roundoff errors can accumulate with multiplications or
exponentiations, resulting in big errors.
– Rounding errors can invalidate many basic arithmetic
principles such as the associative law, (x + y) + z = x + (y + z).
❑ The IEEE 754 standard guarantees that all machines will produce
the same results, but those results may not be mathematically
accurate!
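The associative-law failure is easy to demonstrate in C (a sketch; any values whose magnitudes differ by more than the mantissa can hold will do):

   #include <stdio.h>

   int main(void)
   {
       float x = 1.0e20f, y = -1.0e20f, z = 1.0f;
       printf("%f\n", (x + y) + z);   /* 1.000000: x + y cancels first */
       printf("%f\n", x + (y + z));   /* 0.000000: z is lost in y + z  */
       return 0;
   }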
Limits of the IEEE representation
❑ Even some integers cannot be represented in the IEEE format.

   #include <stdio.h>
   int main(void) {
       int x = 33554431;       /* 2^25 - 1 */
       float y = 33554431;     /* float has only 24 significand bits */
       printf( "%d\n", x );
       printf( "%f\n", y );
       return 0;
   }

   33554431
   33554432.000000

❑ Some simple decimal numbers cannot be represented exactly
in binary to begin with.
   0.10₁₀ = 0.0001100110011…₂
0.10
❑ During the Gulf War in 1991, a U.S. Patriot missile failed to intercept
an Iraqi Scud missile, and 28 Americans were killed.
❑ A later study determined that the problem was caused by the
inaccuracy of the binary representation of 0.10.
– The Patriot incremented a counter once every 0.10 seconds.
– It multiplied the counter value by 0.10 to compute the actual
time.
❑ However, the (24-bit) binary representation of 0.10 actually
corresponds to 0.099999904632568359375, which is off by
0.000000095367431640625.
❑ This doesn’t seem like much, but after 100 hours the time ends up
being off by 0.34 seconds—enough time for a Scud to travel 500
meters!
❑ Professor Skeel wrote a short article about this.
Roundoff Error and the Patriot Missile. SIAM News, 25(4):11, July 1992.
Floating-point addition example
❑ To get a feel for floating-point operations, we'll do an addition
example.
– To keep it simple, we'll use base 10 scientific notation.
– Assume the mantissa has four digits, and the exponent
has one digit.
❑ An example for the addition:
   99.99 + 0.161 = 100.151
❑ As normalized numbers, the operands would be written as:
   9.999 × 10^1      1.610 × 10^-1
Steps 1-2: the actual addition
1. Equalize the exponents.
   The operand with the smaller exponent should be rewritten by
   increasing its exponent and shifting the point leftwards.
   1.610 × 10^-1 = 0.01610 × 10^1
   With four significant digits, this gets rounded to 0.016.
   This can result in a loss of least significant digits, the rightmost 1 in
   this case. But rewriting the number with the larger exponent could
   result in loss of the most significant digits, which is much worse.
2. Add the mantissas.
      9.999 × 10^1
   +  0.016 × 10^1
     10.015 × 10^1
Steps 3-5: representing the result
3. Normalize the result if necessary.
   10.015 × 10^1 = 1.0015 × 10^2
   This step may cause the point to shift either left or right, and the
   exponent to either increase or decrease.
4. Round the number if needed.
   1.0015 × 10^2 gets rounded to 1.002 × 10^2
5. Repeat step 3 if the result is no longer normalized.
   We don't need this in our example, but it's possible for rounding to
   add digits; for example, rounding 9.9995 yields 10.000.

Our result is 1.002 × 10^2, or 100.2. The correct answer is 100.151, so we have
the right answer to four significant digits, but there's a small error already.
Example
❑ Calculate 0 1000 0001 110…0 plus 0 1000 0010 00110…0
   (both in single-precision IEEE 754 representation).
1. 1st number: 1.11₂ × 2^(129-127); 2nd number: 1.0011₂ × 2^(130-127)
2. Compare the e fields: 1000 0001 < 1000 0010
3. Align the exponents to 1000 0010; the 1st number becomes:
   0.111₂ × 2^3
4. Add the mantissas:
      1.0011
   +  0.1110
     10.0001
5. So the sum is 10.0001 × 2^3 = 1.00001 × 2^4
   The IEEE 754 format is: 0 1000 0011 000010…0
Multiplication
❑ To multiply two floating-point values, first multiply their magnitudes
and add their exponents.
      9.999 × 10^1
   ×  1.610 × 10^-1
     16.098 × 10^0
❑ You can then round and normalize the result, yielding 1.610 × 10^1.
❑ The sign of the product is the exclusive-or of the signs of the
operands.
– If two numbers have the same sign, their product is positive.
– If two numbers have different signs, the product is negative.
   0 ⊕ 0 = 0   0 ⊕ 1 = 1   1 ⊕ 0 = 1   1 ⊕ 1 = 0
❑ This is one of the main advantages of using signed magnitude.
The history of floating-point
computation
❑ In the past, each machine had its own implementation of
floating-point arithmetic hardware and/or software.
– It was impossible to write portable programs that would
produce the same results on different systems.
❑ It wasn’t until 1985 that the IEEE 754 standard was adopted.
– Having a standard at least ensures that all compliant
machines will produce the same outputs for the same
program.
Floating-point hardware
❑ When floating point was introduced in microprocessors, there
weren't enough transistors on the chip to implement it.
– You had to buy a floating point co-processor (e.g., the
Intel 8087)
❑ As a result, many ISA’s use separate registers for floating
point.
❑ Modern transistor budgets enable floating point to be on chip.
– Intel’s 486 was the first x86 with built-in floating point
(1989)
❑ Even the newest ISA’s have separate register files for floating
point.
– Makes sense from a floor-planning perspective.
DIVISION IN BINARY
DIVISION HARDWARE
DIVISION FLOW CHART
Floating point operations
• Addition:       X + Y = (adjusted Xm + Ym) × 2^Ye, where Xe ≤ Ye
• Subtraction:    X - Y = (adjusted Xm - Ym) × 2^Ye, where Xe ≤ Ye
• Multiplication: X × Y = (Xm × Ym) × 2^(Xe+Ye)
• Division:       X / Y = (Xm / Ym) × 2^(Xe-Ye)
Algorithm FP Addition/Subtraction
• Let X and Y be the FP numbers involved in
addition/subtraction, where Ye > Xe.
• Basic steps:
• Compute Ye - Xe, a fixed point subtraction.
• Shift the mantissa Xm by (Ye - Xe) steps to the right
(forming Xm × 2^-(Ye-Xe)) if Xe is smaller than Ye;
otherwise the mantissa Ym has to be adjusted.
• Compute Xm × 2^-(Ye-Xe) ± Ym.
• Determine the sign of the result.
• Normalize the resulting value, if necessary.
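A toy version of these steps in C, using small integer mantissas and exponents rather than real IEEE fields (the type and function names are mine, and the normalization is deliberately crude):

   /* Toy FP add: values are mant * 2^exp. */
   typedef struct { long mant; int exp; } toyfp;

   toyfp toy_add(toyfp x, toyfp y)
   {
       if (x.exp > y.exp) { toyfp t = x; x = y; y = t; }  /* ensure Xe <= Ye */

       x.mant >>= (y.exp - x.exp);   /* align X's mantissa; bits shifted   */
                                     /* out here are the roundoff loss     */
       toyfp r = { x.mant + y.mant, y.exp };              /* add mantissas */

       while (r.mant != 0 && (r.mant & 1) == 0) {  /* crude normalization: */
           r.mant >>= 1;                           /* strip trailing zero  */
           r.exp  += 1;                            /* bits, value unchanged */
       }
       return r;
   }

For example, adding {3, 0} (= 3) and {1, 2} (= 4) aligns 3 >> 2 to 0, so the toy result is {1, 2} = 4 rather than 7: the roundoff loss the slides describe.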
Multiplication and Division
Results in FP arithmetic
• FP arithmetic results will have to be produced in normalised form.
• Adjusting the bias of the resulting exponent is required. Biased
representation of exponent causes a problem when the exponents
are added in a multiplication or subtracted in the case of division,
resulting in a double biased or wrongly biased exponent. This must
be corrected. This is an extra step to be taken care of by FP
arithmetic hardware.
• When the result is zero, the resulting mantissa is all zeros but the
exponent is not. A special step is needed to make the exponent
bits zero.
• Overflow – is to be detected when the result is too large to be
represented in the FP format.
• Underflow – is to be detected when the result is too small to be
represented in the FP format. Overflow and underflow are
automatically detected by hardware, however, sometimes the
mantissa in such occurrence may remain in denormalised form.
• Handling the guard bits (extra bits carried during the computation)
becomes an issue when the result is to be rounded rather than truncated.
DECIMAL ADDER
What is a BCD Adder ?
• A decimal adder and a BCD adder are the same thing.
• A BCD adder, also known as a Binary-Coded Decimal
adder, is a digital circuit that performs addition
operations on Binary-Coded Decimal numbers. BCD is a
numerical representation that uses a four-bit binary
code to represent each decimal digit from 0 to 9. BCD
encoding allows for direct conversion between binary
and decimal representations, making it useful for
arithmetic operations on decimal numbers.
• The purpose of a BCD adder is to add two BCD
numbers together and produce a BCD result. It follows
specific rules to ensure accurate decimal results. The
BCD adder circuit typically consists of multiple stages,
each representing one decimal digit, and utilizes binary
addition circuits combined with BCD-specific rules.
Working of Decimal Adder
• We take a 4-bit Binary-Adder, which takes addend and
augend bits as an input with an input carry 'Carry in'.
• The Binary-Adder produces five outputs, i.e., Z8, Z4, Z2, Z1,
and an output carry K.
• With the help of the output carry K and Z8, Z4, Z2, Z1
outputs, the logical circuit is designed to identify the Cout
• Cout = K + Z8*Z4 + Z8*Z2
• The Z8, Z4, Z2, and Z1 outputs of the binary adder are
passed into the 2nd 4-bit binary adder as an Augend.
• The addend bits of the 2nd 4-bit binary adder are designed in
such a way that the 1st and the 4th bits of the addend are 0 and
the 2nd and 3rd bits are the same as Cout. When Cout is 0, the
addend is 0000, which leaves the result of the 1st 4-bit binary
adder unchanged. But when Cout is 1, the addend is 0110, i.e., 6,
which is added to the augend to get a valid BCD number.
• Example: 1001+1000
• First, add both the numbers using a 4-bit binary adder and pass the
input carry to 0.
• The binary adder produced the result 0001 and carried output 'K' 1.
• Then, find the Cout value to identify whether the produced BCD is
valid or invalid, using the expression Cout = K + Z8·Z4 + Z8·Z2.
K=1
Z8 = 0
Z4 = 0
Z2 = 0
Cout = 1+0*0+0*0
Cout = 1+0+0
Cout = 1
• The value of Cout is 1, which indicates that the produced BCD code
is invalid. So, add the output of the 1st 4-bit binary adder to
0110:
= 0001+0110
= 0111
• The BCD is represented by the carry output as:
BCD=Cout Z8 Z4 Z2 Z1=1 0 1 1 1
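The correction rule can be captured in a few lines of C (a sketch of one BCD digit stage; z is the 4-bit binary sum and the z > 9 test is equivalent to Cout = K + Z8·Z4 + Z8·Z2):

   /* One BCD digit stage: add two BCD digits plus a carry-in. */
   unsigned bcd_digit_add(unsigned a, unsigned b, unsigned cin,
                          unsigned *cout)
   {
       unsigned z = a + b + cin;   /* first 4-bit binary adder (Z, K) */
       if (z > 9) {                /* invalid BCD: Cout = 1           */
           z += 6;                 /* second adder: add 0110          */
           *cout = 1;
       } else {
           *cout = 0;
       }
       return z & 0xF;
   }

   /* Example from above: a = 9 (1001), b = 8 (1000)
      -> z = 17, corrected to 0111 with cout = 1, i.e., BCD 1 0111. */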
Algorithm for Decimal Adder
• BCD Addition of Given Decimal Number
• BCD addition of a given decimal number involves performing
addition operations on the individual BCD digits of the number.
• Step 1: Convert the decimal number into BCD representation:
• Take each decimal digit and convert it into its BCD equivalent, which
is a four-bit binary code.
• For example, the decimal number 456 would be represented as
0100 0101 0110 in BCD.
• Step 2: Align the BCD numbers for addition:
Ensure that the BCD numbers to be added have the same number
of digits.
If necessary, pad the shorter BCD number with leading zeros to
match the length of the longer BCD number.
• Step 3: Perform binary addition on the BCD digits:
• Start from the rightmost digit and add the corresponding BCD digits
of the two numbers.
• If the sum of the BCD digits is less than or equal to 9 (0000 to 1001
in binary), it represents a valid BCD digit.
• If the sum is greater than 9 (1010 to 1111 in binary), it indicates a
carry-out, and a correction is needed.
• Step 4: Handle carry-out and correction:
• When a carry-out occurs, it needs to be propagated to the next
higher-order digit. Add the carry-out to the next higher-order digit's
BCD digit and continue the process until all digits have been
processed.
• Step 5: Obtain the final BCD result:
• Once all the BCD digits have been processed, the resulting BCD
digits represent the decimal sum of the original BCD numbers.
Decimal Arithmetic Operation
• Decimal numbers in BCD are stored in
computer registers in groups of 4 bits.
• Each 4-bit group must be taken as a unit when
performing decimal microoperations.
• The following are the decimal arithmetic
microoperation symbols:
Decimal Arithmetic Operation
• Incrementing/decrementing a register is similar
for binary and BCD, except that a binary
counter goes through 16 states, from 0000 to
1111, while a BCD counter goes through 10
states, from 0000 to 1001 and back to 0000.
• A decimal shift right or left is preceded by the
symbol d to indicate a shift over the four bits
that hold each decimal digit.
Addition and Subtraction
• Decimal data can be added in 3 different ways, as
described below.
• The parallel method uses a decimal arithmetic
unit composed of as many BCD adder as there are
digits in the number.
• In digit serial bit parallel method, the digits are
applied to a single BCD adder serially, while the
bits of each coded digit are transferred in parallel.
• In the serial method, the bits are shifted one at a
time through a single full adder. The binary sum formed
after four shifts must be corrected into a valid BCD
digit.
• If the sum is ≥ 1010, it is corrected by
adding 0110, which generates a carry for the next pair
of digits.
Addition and Subtraction
• The parallel method is fast but requires a
large number of BCD adders.
• The digit serial bit parallel method
requires only one BCD adder, which is
shared by all the digits, so it is slower than
the parallel method.
• The serial method requires a minimum
amount of equipment, only one
full adder, but is very slow.
22CS2112: COMPUTER ORGANIZATION
AND ARCHITECTURE
UNIT - IV
INPUT-OUTPUT ORGANIZATION AND MEMORY
ORGANIZATION
By
Dr. G. Bhaskar
Associate Professor
Dept. of ECE
Email Id:bhaskar.0416@gmail.com
INPUT-OUTPUT ORGANIZATION
• Peripheral Devices
• Input-Output Interface
• Asynchronous Data Transfer
• Modes of Transfer
• Priority Interrupt
• Direct Memory Access
• Input-Output Processor
• Serial Communication
Peripheral Devices
PERIPHERAL DEVICES
Input devices:
• Keyboard
• Optical input devices: card reader, paper tape reader,
  bar code reader, digitizer, optical mark reader
• Magnetic input devices: magnetic stripe reader
• Screen input devices: touch screen, light pen, mouse
• Analog input devices
Output devices:
• Card puncher, paper tape puncher
• CRT
• Printer (impact, ink jet, laser, dot matrix)
• Plotter
• Analog
• Voice
Input/Output Interfaces
INPUT/OUTPUT INTERFACE
• Provides a method for transferring information between internal storage (such as
memory and CPU registers) and external I/O devices.
• Resolves the differences between the computer and peripheral devices:
- Nature of devices: peripherals are electromechanical devices, while the CPU and
  memory are electronic devices.
- Data transfer rate: peripherals are usually slower than the CPU or memory, so some
  kind of synchronization mechanism may be needed.
- Unit of information: peripherals transfer bytes or blocks, while the CPU and memory
  operate on words.
- Data representations may differ.
Input/Output Interfaces
I/O BUS AND INTERFACE MODULES
(Block diagram: the processor drives an I/O bus of data, address, and control lines;
each peripheral (keyboard and display terminal, printer, magnetic disk, magnetic tape)
attaches to the bus through its own interface module.)

Each peripheral has an interface module associated with it. The interface:
- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
  the transfer rate between peripheral and CPU or memory
Typical I/O instruction format:
   Op. code | Device address | Function code (command)
Input/Output Interfaces
I/O BUS AND MEMORY BUS
Functions of the buses:
* The MEMORY BUS is for information transfers between the CPU and main memory.
* The I/O BUS is for information transfers between the CPU
  and I/O devices through their I/O interfaces.
Input/Output Interfaces
I/O INTERFACE
(Block diagram: bidirectional data bus buffers connect the CPU to an internal bus;
the interface contains Port A and Port B I/O data registers, a control register, and a
status register, selected by chip select CS and register select lines RS1, RS0, with
I/O read RD and I/O write WR driving the timing and control logic toward the I/O device.)

   CS  RS1  RS0   Register selected
   0   x    x     None: data bus in high-impedance state
   1   0    0     Port A register
   1   0    1     Port B register
   1   1    0     Control register
   1   1    1     Status register

Programmable Interface
- Information in each port can be assigned a meaning
  depending on the mode of operation of the I/O device
  → Port A = Data; Port B = Command; Port C = Status
- The CPU initializes (loads) each port by transferring a byte to the control register
  → Allows the CPU to define the mode of operation of each port
  → Programmable port: by changing the bits in the control register, it is
    possible to change the interface characteristics
Asynchronous Data Transfer
ASYNCHRONOUS DATA TRANSFER
Synchronous and asynchronous operations:
• Synchronous - all devices derive their timing information from a common clock line.
• Asynchronous - no common clock.
Asynchronous data transfer between two independent units requires that
control signals be transmitted between the communicating units to
indicate the time at which data is being transmitted.
Two asynchronous data transfer methods:
• Strobe pulse
  - A strobe pulse is supplied by one unit to indicate to
    the other unit when the transfer has to occur.
• Handshaking
  - A control signal is accompanied with each data item
    being transmitted to indicate the presence of data.
  - The receiving unit responds with another control
    signal to acknowledge receipt of the data.
Asynchronous Data Transfer
STROBE CONTROL
* Employs a single control line to time each transfer
* The strobe may be activated by either the source or
  the destination unit
(Block and timing diagrams: in source-initiated transfer, the source places valid data
on the data bus and pulses the strobe line; in destination-initiated transfer, the
destination pulses the strobe to request the data.)
Asynchronous Data Transfer
HANDSHAKING
Strobe methods:
• Source-initiated: the source unit that initiates the transfer has
  no way of knowing whether the destination unit
  has actually received the data.
• Destination-initiated: the destination unit that initiates the transfer has
  no way of knowing whether the source has
  actually placed the data on the bus.
To solve this problem, the HANDSHAKE method
introduces a second control signal that provides a reply
to the unit that initiates the transfer.
Asynchronous Data Transfer
SOURCE-INITIATED TRANSFER USING HANDSHAKE
(Block and timing diagrams: the source drives a "data valid" line and the destination
replies on a "data accepted" line.)
Sequence of events:
1. Source: places data on the bus and enables "data valid".
2. Destination: accepts the data from the bus and enables "data accepted".
3. Source: disables "data valid" and invalidates the data on the bus.
4. Destination: disables "data accepted"; ready to accept data (initial state).
* Allows arbitrary delays from one state to the next
* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit
Asynchronous Data Transfer
DESTINATION-INITIATED TRANSFER USING HANDSHAKE
(Block and timing diagrams: the destination drives a "ready for data" line and the
source replies on a "data valid" line.)
Sequence of events:
1. Destination: ready to accept data; enables "ready for data".
2. Source: places data on the bus and enables "data valid".
3. Destination: accepts the data from the bus and disables "ready for data".
4. Source: disables "data valid" and invalidates the data on the bus (initial state).
* Handshaking provides a high degree of flexibility and reliability because the
  successful completion of a data transfer relies on active participation by both units
* If one unit is faulty, the data transfer will not be completed
  -> This can be detected by means of a timeout mechanism
Asynchronous Data Transfer
ASYNCHRONOUS SERIAL TRANSFER
Four different types of transfer: asynchronous serial, synchronous serial,
asynchronous parallel, and synchronous parallel.
Asynchronous serial transfer:
- Employs special bits which are inserted at both ends of the character code
- Each character consists of three parts: a start bit, the character bits, and stop bits.

   Start bit | 1 1 0 0 0 1 0 1 | Stop bits
   (1 bit)     character bits    (at least 1 bit)

A character can be detected by the receiver from the knowledge of four rules:
- When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected
  by a start bit, which is always 0
- The character bits always follow the start bit
- After the last character bit, a stop bit is detected when
  the line returns to the 1-state for at least one bit time
The receiver knows in advance the transfer rate of the bits and the number of
information bits to expect.
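As a sketch, the frame for one character can be built bit by bit in C (LSB first, one stop bit; the function and array names are illustrative):

   /* Build an async serial frame: start bit 0, 8 data bits LSB first, stop bit 1. */
   int frame_char(unsigned char c, unsigned char bits[10])
   {
       bits[0] = 0;                        /* start bit: line drops from idle 1 */
       for (int i = 0; i < 8; i++)
           bits[1 + i] = (c >> i) & 1;     /* character bits                    */
       bits[9] = 1;                        /* stop bit: line returns to 1-state */
       return 10;                          /* bits per frame                    */
   }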
Asynchronous Data Transfer
UNIVERSAL ASYNCHRONOUS RECEIVER-TRANSMITTER (UART)
A typical asynchronous communication interface, available as an IC.
(Block diagram: bidirectional data bus buffers feed an internal bus that connects the
transmitter register and its shift register, the receiver register and its shift register,
the control register, and the status register; timing and control logic uses chip select
CS, register select RS, RD, and WR together with the transmitter and receiver clocks.)

   CS  RS  Oper.  Register selected
   0   x   x      None
   1   0   WR     Transmitter register
   1   1   WR     Control register
   1   0   RD     Receiver register
   1   1   RD     Status register

Transmitter register:
- Accepts a data byte (from the CPU) through the data bus
- The byte is transferred to a shift register for serial transmission
Receiver:
- Receives serial information into another shift register
- The complete data byte is sent to the receiver register
Status register bits:
- Used for I/O flags and for recording errors
Control register bits:
- Define the baud rate, the number of bits in each character, whether
  to generate and check parity, and the number of stop bits
Modes of Transfer
MODES OF TRANSFER - PROGRAM-CONTROLLED I/O
Three different data transfer modes between the central
computer (CPU or memory) and peripherals:
1. Program-controlled I/O
2. Interrupt-initiated I/O
3. Direct memory access (DMA)
Program-controlled I/O (input device to CPU):
(Block diagram: the interface holds a data register and a status register with flag
bit F; the device signals "data valid" and the interface answers "data accepted".)
Polling loop (flowchart): read the status register; check the flag bit; if flag = 0,
keep polling; if flag = 1, read the data register and transfer the data to memory;
repeat until the operation is complete, then continue with the program.
Polling or status checking:
• Continuous CPU involvement
• CPU slowed down to I/O speed
• Simple
• Least hardware
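In software, program-controlled I/O is just a busy-wait loop. A sketch in C, with made-up register addresses (0x40 and 0x41) and flag bit; on a real machine these would come from the system's memory map:

   #include <stdint.h>

   /* Hypothetical memory-mapped interface registers. */
   #define STATUS_REG (*(volatile uint8_t *)0x40)
   #define DATA_REG   (*(volatile uint8_t *)0x41)
   #define FLAG_BIT   0x01

   uint8_t poll_read(void)
   {
       while ((STATUS_REG & FLAG_BIT) == 0)
           ;                   /* flag = 0: keep polling (CPU busy-waits) */
       return DATA_REG;        /* flag = 1: read the data register        */
   }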
Priority Interrupt
PRIORITY INTERRUPT
Priority:
- Determines which interrupt is to be served first
  when two or more requests are made simultaneously
- Also determines which interrupts are permitted to
  interrupt the computer while another is being serviced
- Higher priority interrupts can make requests while
  a lower priority interrupt is being serviced
Priority interrupt by software (polling):
- Priority is established by the order of polling the devices (interrupt sources)
- Flexible, since the order is established by software
- Low cost, since it needs very little hardware
- Very slow
Priority interrupt by hardware:
- Requires a priority interrupt manager which accepts
  all the interrupt requests and determines the highest priority request
- Fast, since the highest priority interrupt request is identified by the hardware
- Fast, since each interrupt source has its own interrupt vector, giving direct
  access to its own service routine
Priority Interrupt
HARDWARE PRIORITY INTERRUPT - DAISY-CHAIN
(Block diagram: devices 1, 2, 3 are chained PI → PO along the processor data bus, each
able to place its vector address VAD on the bus; a single common interrupt request line
INT runs to the CPU, and the interrupt acknowledge INTACK is daisy-chained from device
to device.)
* Serial hardware priority function
* Interrupt request line: single common line
* Interrupt acknowledge line: daisy-chain
Interrupt request from any device (one or more):
-> The CPU responds with INTACK ← 1
-> Any device that receives INTACK = 1 at its PI input puts its VAD on the bus
Among the requesting devices, the one physically closest to the CPU gets INTACK = 1
and blocks INTACK from propagating to the next device.
One stage of the daisy-chain priority arrangement:

   PI  RF   PO  Enable VAD
   0   0    0   0
   0   1    0   0
   1   0    1   0
   1   1    0   1

(RF is the interrupt-request flip-flop of the stage; PO = PI · RF', and the stage
enables its vector address onto the bus when PI · RF = 1.)
Priority Interrupt
PARALLEL PRIORITY INTERRUPT
(Block diagram: interrupt requests from disk (I0), printer (I1), reader (I2), and
keyboard (I3) are latched in the interrupt register, gated by the mask register, and
fed to a priority encoder whose outputs x, y form the vector address VAD placed on the
bus through a tri-state buffer when INTACK arrives; IEN and IST gate the interrupt
signal to the CPU.)
IEN: set or cleared by the instructions ION or IOF.
IST: indicates that an unmasked interrupt has occurred. INTACK enables the
tri-state bus buffer to load the VAD generated by the priority logic.
Interrupt register:
- Each bit is associated with an interrupt request from a
  different interrupt source, i.e., a different priority level
- Each bit can be cleared by a program instruction
Mask register:
- The mask register is associated with the interrupt register
- Each bit can be set or cleared by an instruction
Priority Interrupt
INTERRUPT PRIORITY ENCODER
Determines the highest priority interrupt when more than one interrupt takes place.
Priority encoder truth table:

   Inputs           Outputs
   I0 I1 I2 I3      x  y  IST     Boolean functions
   1  d  d  d       0  0  1
   0  1  d  d       0  1  1       x = I0' I1'
   0  0  1  d       1  0  1       y = I0' I1 + I0' I2'
   0  0  0  1       1  1  1       IST = I0 + I1 + I2 + I3
   0  0  0  0       d  d  0
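The truth table is equivalent to a scan from the highest priority input down. A sketch in C returning the encoded address bits and IST (the function name and output convention are mine):

   /* Priority encoder: I0 has the highest priority.
      *xy receives the encoded bits (x in bit 1, y in bit 0);
      the return value is IST. */
   int priority_encode(unsigned i0, unsigned i1, unsigned i2, unsigned i3,
                       unsigned *xy)
   {
       if (i0) { *xy = 0; return 1; }   /* x y = 00 */
       if (i1) { *xy = 1; return 1; }   /* x y = 01 */
       if (i2) { *xy = 2; return 1; }   /* x y = 10 */
       if (i3) { *xy = 3; return 1; }   /* x y = 11 */
       return 0;                        /* IST = 0: no request pending */
   }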
Priority Interrupt
INTERRUPT CYCLE
At the end of each instruction cycle:
- The CPU checks IEN and IST
- If IEN · IST = 1, the CPU enters the interrupt cycle:

   SP ← SP - 1      Decrement the stack pointer
   M[SP] ← PC       Push PC onto the stack
   INTACK ← 1       Enable interrupt acknowledge
   PC ← VAD         Transfer the vector address to PC
   IEN ← 0          Disable further interrupts
   Go to fetch      Execute the first instruction
                    in the interrupt service routine
Priority Interrupt
INTERRUPT SERVICE ROUTINE
(Memory map example: locations 0-3 hold JMP DISK, JMP PTR, JMP RDR, and JMP KBD,
leading to the service programs for the magnetic disk, line printer, character reader,
and keyboard; the main program is interrupted at location 749/750, the return address
750 is pushed on the stack, and VAD = 00000011 vectors through JMP KBD to the keyboard
service program; a later disk interrupt at location 255/256 vectors through JMP DISK.)
Initial and final operations:
Each interrupt service routine must have an initial and a final set of
operations for controlling the registers in the hardware interrupt system.

Initial sequence:
[1] Clear lower-level mask register bits
[2] IST ← 0
[3] Save the contents of the CPU registers
[4] IEN ← 1
[5] Go to the interrupt service routine

Final sequence:
[1] IEN ← 0
[2] Restore the CPU registers
[3] Clear the bit in the interrupt register
[4] Set lower-level mask register bits
[5] Restore the return address, IEN ← 1
Direct Memory Access
DIRECT MEMORY ACCESS
* Block data transfers from high speed devices (drum, disk, tape)
* DMA controller: an interface which allows I/O transfer directly between
  memory and the device, freeing the CPU for other tasks
* The CPU initializes the DMA controller by sending the memory
  address and the block size (number of words)
CPU bus signals for DMA transfer:
- Bus request BR (input) and bus grant BG (output); when BG is enabled, the CPU
  places its address bus (ABUS), data bus (DBUS), RD, and WR lines in the
  high-impedance (disabled) state.
(Block diagram of the DMA controller: data bus buffers and address bus buffers connect
to an internal bus holding the address register, word count register, and control
register; the control logic uses DMA select DS, register select RS, RD, WR, bus
request BR, bus grant BG, interrupt, and the DMA request/acknowledge handshake with
the I/O device.)
Direct Memory Access
DMA I/O OPERATION
Starting an I/O operation, the CPU executes instructions to:
- Load the memory address register
- Load the word counter
- Load the function (read or write) to be performed
- Issue a GO command
Upon receiving a GO command, the DMA controller performs the I/O
operation as follows, independently of the CPU:

Input:
[1] Input device ← R (read control signal)
[2] Buffer (DMA controller) ← input byte; the bytes are
    assembled into a word until the word is full
[3] M ← memory address, W (write control signal)
[4] Address register ← address register + 1; WC (word counter) ← WC - 1
[5] If WC = 0, interrupt to acknowledge completion; else go to [1]

Output:
[1] M ← memory address, R;
    memory address register ← memory address register + 1, WC ← WC - 1
[2] Disassemble the word
[3] Buffer ← one byte; output device ← W, for all disassembled bytes
[4] If WC = 0, interrupt to acknowledge completion; else go to [1]
Direct Memory Access
CYCLE STEALING
While DMA I/O takes place, the CPU is also executing instructions.
The DMA controller and the CPU both access memory -> memory access conflict.
Memory bus controller:
- Coordinates the activities of all devices requesting memory access
- Priority system
Memory accesses by the CPU and the DMA controller are interwoven,
with the top priority given to the DMA controller
-> cycle stealing.
Cycle steal:
- The CPU is usually much faster than I/O (DMA), thus the
  CPU uses most of the memory cycles
- The DMA controller steals memory cycles from the CPU
- For those stolen cycles, the CPU remains idle
- For a slow CPU, the DMA controller may steal most of the memory
  cycles, which may cause the CPU to remain idle for a long time
Direct Memory Access

DMA TRANSFER

[Figure: DMA transfer in a computer system. The CPU, the random-access
memory unit (RAM) and the DMA controller share the read control, write
control, data and address buses. The DMA controller exchanges BR/BG and
Interrupt signals with the CPU; an address-select block decodes DS and
RS, and DMA request / DMA acknowledge lines connect the controller to
the peripheral device.]
MEMORY ORGANIZATION

• Memory Hierarchy

• Main Memory

• Auxiliary Memory

• Associative Memory

• Cache Memory

• Virtual Memory

• Memory Management Hardware


MEMORY HIERARCHY
The goal of a memory hierarchy is to obtain the highest possible
access speed while minimizing the total cost of the memory system.

[Figure: the CPU accesses cache memory and main memory directly;
auxiliary memory (magnetic disks and magnetic tapes) is reached
through an I/O processor.]

Hierarchy, from fastest/smallest to slowest/largest:

  Register
  Cache
  Main Memory
  Magnetic Disk
  Magnetic Tape
Main Memory

MAIN MEMORY
RAM and ROM Chips

Typical RAM chip (128 x 8): inputs Chip select 1 (CS1), Chip select 2
(CS2, active low per the table below), Read (RD), Write (WR), a 7-bit
address (AD 7), and an 8-bit bidirectional data bus.

CS1 CS2 RD WR   Memory function   State of data bus
 0   0   x  x   Inhibit           High-impedance
 0   1   x  x   Inhibit           High-impedance
 1   0   0  0   Inhibit           High-impedance
 1   0   0  1   Write             Input data to RAM
 1   0   1  x   Read              Output data from RAM
 1   1   x  x   Inhibit           High-impedance

Typical ROM chip (512 x 8): inputs CS1 and CS2, a 9-bit address (AD 9),
and an 8-bit data bus (output only).
Main Memory

MEMORY ADDRESS MAP

Address space assignment to each memory chip

Example: 512 bytes of RAM and 512 bytes of ROM

            Hexa            Address bus
Component   address        10 9 8 7 6 5 4 3 2 1
RAM 1       0000 - 007F     0 0 0 x x x x x x x
RAM 2       0080 - 00FF     0 0 1 x x x x x x x
RAM 3       0100 - 017F     0 1 0 x x x x x x x
RAM 4       0180 - 01FF     0 1 1 x x x x x x x
ROM         0200 - 03FF     1 x x x x x x x x x

Memory Connection to CPU

- RAM and ROM chips are connected to the CPU
  through the data and address buses

- The low-order lines of the address bus select the
  byte within a chip; the remaining lines select a
  particular chip through its chip-select inputs
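
A C sketch of this decoding (a hypothetical helper, not from the original slides): line 10 selects the ROM, lines 8-9 select one of the four RAM chips, and the low-order lines address the byte within the chip.

#include <stdint.h>
#include <stdio.h>

/* Decode a 10-bit address (bus lines 1..10 in the slide's numbering)
   into a chip id and a byte offset, following the memory address map. */
void decode(uint16_t addr, const char **chip, uint16_t *offset)
{
    if (addr & 0x200) {               /* line 10 = 1 -> ROM        */
        *chip = "ROM";
        *offset = addr & 0x1FF;       /* 9-bit offset within ROM   */
    } else {
        static const char *ram[4] = {"RAM 1", "RAM 2", "RAM 3", "RAM 4"};
        *chip = ram[(addr >> 7) & 3]; /* lines 8-9 pick the chip   */
        *offset = addr & 0x7F;        /* 7-bit offset within chip  */
    }
}

int main(void)
{
    const char *chip; uint16_t off;
    decode(0x0085, &chip, &off);      /* 0085 -> RAM 2, offset 5   */
    printf("%s offset %u\n", chip, off);
    return 0;
}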
Main Memory

CONNECTION OF MEMORY TO CPU

[Figure: the CPU's 16-bit address bus connects to the memory chips.
Address lines 1-7 feed the AD 7 inputs of the four 128 x 8 RAM chips;
lines 8-9 drive a decoder whose outputs 0-3 select RAM 1-4 through
their CS1 inputs; line 10 distinguishes RAM from ROM through the CS2
chip-select inputs, and lines 1-9 form the ROM's AD 9 address. The
CPU's RD and WR lines go to each RAM chip, and all chips share the
common data bus.]
Auxiliary Memory

AUXILIARY MEMORY

Information Organization on Magnetic Tapes

[Figure: a file on tape is recorded as a sequence of blocks (block 1,
block 2, block 3, ...), each holding several records (R1-R6); blocks
are separated by inter-record gaps (IRG), and the file is terminated
by an EOF mark.]

Organization of Disk Hardware

[Figure: moving-head disk (one head per surface, positioned over the
addressed track) versus fixed-head disk (one head per track).]
Associative Memory

ASSOCIATIVE MEMORY
- Accessed by the content of the data rather than by an address
- Also called Content Addressable Memory (CAM)

Hardware Organization

[Figure: an argument register (A) and a key register (K), each n bits
wide, sit above an associative memory array of m words x n bits per
word, with a match register M holding one bit per word. Input, Read
and Write lines connect to the array and its logic.]

- Each word in the CAM is compared in parallel with the
  content of A (argument register)
- If CAM word[i] = A, then M(i) = 1
- Matching words are then read out sequentially by accessing
  CAM word(i) for each M(i) = 1
- K (key register) provides a mask for choosing a particular
  field or key of the argument in A (only those bits of the
  argument that have 1's in the corresponding positions of K
  are compared)
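
A software sketch of the masked CAM search (the hardware compares all words in parallel; this sequential loop with hypothetical names is only a functional model):

#include <stdint.h>

#define M_WORDS 8

/* Functional model of a CAM search: set match[i] = 1 for every word
   that equals argument A in the bit positions selected by key K.
   Hardware does all m comparisons in parallel; this loop does not. */
void cam_search(const uint16_t cam[M_WORDS], uint16_t A, uint16_t K,
                int match[M_WORDS])
{
    for (int i = 0; i < M_WORDS; i++)
        match[i] = ((cam[i] ^ A) & K) == 0;  /* compare masked bits */
}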
Associative Memory

ORGANIZATION OF CAM

[Figure: the CAM array. Word i consists of cells Ci1 ... Cij ... Cin,
one cell per bit position; column j receives argument bit Aj and key
bit Kj, and each word i produces a match output Mi.]

Internal organization of a typical cell Cij

[Figure: each cell holds one flip-flop Fij, set and reset through the
Input and Write lines, with a Read line gating the Output. Match logic
compares Fij against Aj under control of Kj and contributes to the
word's match line Mi.]
Associative Memory

MATCH LOGIC

For word i, the match logic compares every stored bit Fij with the
corresponding argument bit Aj, ignoring positions masked off by K:

    xj = Aj Fij + Aj' Fij'                (1 if bit j matches)

    Mi = (x1 + K1')(x2 + K2') ... (xn + Kn')
Cache Memory

CACHE MEMORY
Locality of Reference
- The references to memory at any given time interval tend to be
  confined within a few localized areas
- Such an area contains a set of information, and its membership
  changes gradually over time
- Temporal Locality
  Information that has been used recently is likely to be used
  again in the near future (e.g., reuse of information in loops)
- Spatial Locality
  If a word is accessed, adjacent (nearby) words are likely to be
  accessed soon (e.g., related data items such as arrays are usually
  stored together; instructions are executed sequentially)

Cache
- The property of locality of reference is what makes cache memory
  systems work
- A cache is a fast, small-capacity memory that should hold the
  information most likely to be accessed

[Figure: CPU <-> cache memory <-> main memory]
Cache Memory

PERFORMANCE OF CACHE
Memory Access
All memory accesses are directed first to the cache.
If the word is in the cache, access the cache to provide it to the CPU.
If the word is not in the cache, bring in a block (or a line) containing
that word, replacing a block now in the cache.

- How can we know whether the required word is in the cache?
- If a new block is to replace one of the old blocks,
  which one should we choose?

Performance of Cache Memory System

Hit Ratio (h) - fraction of memory accesses satisfied by the cache

Te: effective memory access time of the cache memory system
Tc: cache access time
Tm: main memory access time

    Te = Tc + (1 - h) Tm

Example: Tc = 0.4 us, Tm = 1.2 us, h = 0.85

    Te = 0.4 + (1 - 0.85) * 1.2 = 0.58 us
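
The same calculation in C (a trivial sketch of the formula above):

#include <stdio.h>

/* Effective access time: every access pays Tc; misses add Tm. */
double effective_time(double tc, double tm, double hit_ratio)
{
    return tc + (1.0 - hit_ratio) * tm;
}

int main(void)
{
    printf("Te = %.2f us\n", effective_time(0.4, 1.2, 0.85)); /* 0.58 */
    return 0;
}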
Cache Memory

MEMORY AND CACHE MAPPING - ASSOCIATIVE MAPPING -

Mapping Function
  Specification of the correspondence between main
  memory blocks and cache blocks
    Associative mapping
    Direct mapping
    Set-associative mapping

Associative Mapping
- Any block location in the cache can store any block of memory
  -> most flexible
- The mapping table is implemented in associative memory
  -> fast, but very expensive
- The mapping table stores both the address (15 bits) and the
  content of the memory word

        Address   Data
  CAM    01000    3450
         02777    6710
         22235    1234

(the CPU address is placed in the argument register and searched
against the address field of the CAM)
Cache Memory

MEMORY AND CACHE MAPPING - DIRECT MAPPING -

- Each memory block has only one place where it can be loaded in cache
- The mapping table is made of RAM instead of CAM
- The n-bit memory address consists of 2 parts: k bits of Index field
  and n-k bits of Tag field
- The n-bit address is used to access main memory,
  and the k-bit index is used to access the cache

Addressing Relationships

[Figure: a 15-bit address split into Tag (6 bits) and Index (9 bits).
Main memory: 32K x 12 (address = 15 bits, data = 12 bits, octal
addresses 00000-77777). Cache memory: 512 x 12 (address = 9 bits,
data = 12 bits, octal addresses 000-777).]

Direct Mapping Cache Organization

[Figure: example contents. Main memory: address 00000 holds 1220,
00777 holds 2340, 01000 holds 3450, 01777 holds 4560, 02000 holds
5670, 02777 holds 6710. Cache: index 000 holds tag 00, data 1220;
index 777 holds tag 02, data 6710.]
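
A sketch of a direct-mapped lookup in C (hypothetical structure; 9-bit index and 6-bit tag as in the example above):

#include <stdint.h>

#define INDEX_BITS 9
#define CACHE_LINES (1 << INDEX_BITS)     /* 512 cache words */

struct line { uint8_t tag; uint16_t data; int valid; };
struct line cache[CACHE_LINES];

/* Return 1 on hit (word placed in *out), 0 on miss. The 15-bit
   address is split into a 6-bit tag and a 9-bit index. */
int cache_read(uint16_t addr, uint16_t *out)
{
    uint16_t index = addr & (CACHE_LINES - 1);   /* low 9 bits  */
    uint8_t  tag   = addr >> INDEX_BITS;         /* high 6 bits */
    if (cache[index].valid && cache[index].tag == tag) {
        *out = cache[index].data;                /* cache hit   */
        return 1;
    }
    return 0;  /* miss: fetch from main memory and refill the line */
}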
Thank You
22CS2112: COMPUTER ORGANIZATION
AND ARCHITECTURE
UNIT - V
REDUCED INSTRUCTION SET COMPUTER, PIPELINE
AND VECTOR PROCESSING AND MULTI PROCESSORS
By
Dr. G. Bhaskar
Associate Professor
Dept. of ECE
Email Id:bhaskar.0416@gmail.com
Reduced Instruction Set Computer:

CISC Characteristics

RISC Characteristics.

Pipeline and Vector Processing:

Parallel Processing, Pipelining, Arithmetic Pipeline, Instruction Pipeline,

RISC Pipeline, Vector Processing, Array Processor.

Multi Processors:

Characteristics of Multiprocessors, Interconnection Structures, Interprocessor

arbitration, Interprocessor communication and synchronization,

Cache Coherence.
What is CISC?
Complex Instruction Set Computer (CISC) Characteristics
 Major characteristics of a CISC architecture
• A large number of instructions - typically from 100 to 250
instructions
• Some instructions that perform specialized tasks and are used
infrequently
• A large variety of addressing modes - typically from 5 to 20
different modes
• Variable-length instruction formats
• Instructions that manipulate operands in memory (in RISC,
operands are kept in registers)
What is RISC?

• Reduced Instruction Set Computer (RISC)

 Major characteristics of a RISC architecture

• Relatively few instructions

• Relatively few addressing modes

• Memory access limited to load and store instructions

• All operations done within the registers of the CPU

• Fixed-length, easily decoded instruction format

• Single-cycle instruction execution

• Hardwired rather than microprogrammed control


Advantages of CISC
• Reduced code size
• More memory efficient
• Widely used

Disadvantages of CISC
• Slower execution
• More complex design
• Higher power consumption

Advantages of RISC:
• Simpler instructions
• Faster execution
• Lower power consumption

Disadvantages of RISC
• More instructions required
• Increased memory usage
• Higher cost
RISC vs CISC:

- RISC stands for Reduced Instruction Set Computer; CISC stands for
  Complex Instruction Set Computer.
- RISC focuses on software; CISC focuses on hardware.
- RISC uses only a hardwired control unit; CISC uses both hardwired
  and microprogrammed control units.
- In RISC, transistors are used for more registers; in CISC,
  transistors are used for storing complex instructions.
- RISC has fixed-size instructions; CISC has variable-size instructions.
- RISC can perform only register-to-register arithmetic operations;
  CISC can perform REG-to-REG, REG-to-MEM, or MEM-to-MEM operations.
- RISC requires more registers; CISC requires fewer registers.
- RISC code size is large; CISC code size is small.
- In RISC, an instruction executes in a single clock cycle; in CISC,
  an instruction takes more than one clock cycle.
- A RISC instruction fits in one word; CISC instructions can be
  larger than one word.
- RISC has simple, limited addressing modes; CISC has complex and
  more numerous addressing modes.
- RISC has fewer instructions than CISC.
- RISC consumes low power; CISC consumes more power.
- RISC is highly pipelined; CISC is less pipelined.
- RISC requires more RAM; CISC requires less RAM.
CISC vs RISC:

- CISC is short for Complex Instruction Set Computer; RISC refers to
  Reduced Instruction Set Computer.
- A CISC architecture has a large number of instructions; a RISC
  architecture has very few, generally fewer than 100.
- The CISC architecture processes complex instructions that require
  several clock cycles to execute - on average two to five clock
  cycles per instruction (CPI). The RISC architecture executes simple
  yet optimized instructions, averaging about 1.5 CPI.
- CISC uses variable-length instruction encodings; RISC uses
  fixed-length encodings.
- CISC instructions require high execution time; RISC instructions
  require less time for execution.
- Examples of CISC processors: Intel x86 CPUs, System/360, VAX,
  PDP-11, Motorola 68000 family, and AMD. Examples of RISC
  processors: Alpha, ARC, ARM, AVR, MIPS, PA-RISC, PIC, Power
  Architecture, and SPARC.
- CISC instructions are less amenable to pipelining; RISC processors
  support instruction pipelining.
Parallel Processing

• Parallel processing describes a class of techniques that enable a
  system to perform simultaneous data-processing tasks, increasing
  the computational speed of the computer system.

• A parallel processing system can carry out simultaneous data
  processing to achieve faster execution time. For instance, while an
  instruction is being processed in the ALU of the CPU, the next
  instruction can be read from memory.

• The amount of hardware increases with parallel processing, and with
  it, the cost of the system increases.
• The purpose of parallel processing is to speed up the
computer processing capability and increase its throughput,
that is, the amount of processing that can be accomplished
during a given interval of time.
• A parallel processing system can be achieved by having a
multiplicity of functional units that perform identical or
different operations simultaneously. The data can be
distributed among various multiple functional units.
• At the lowest level, we distinguish between parallel and
serial operations by the type of registers used. e.g. shift
registers and registers with parallel load
Parallel Processing
At the lower level:
  serial shift registers vs parallel-load registers

At the higher level:
  a multiplicity of functional units that perform identical or
  different operations simultaneously
Parallel Computers
PIPELINING

A technique of decomposing a sequential process into
suboperations, with each subprocess executed in a
special dedicated segment that operates concurrently
with all other segments.

A pipeline can be visualized as a collection of processing
segments through which binary information flows.

The name "pipeline" implies a flow of information analogous
to an industrial assembly line.
Example of the Pipeline Organization
OPERATIONS IN EACH PIPELINE STAGE
GENERAL PIPELINE
Speedup ratio of pipeline
Cont.
PIPELINE AND MULTIPLE FUNCTION UNITS
Cont.
ARITHMETIC PIPELINE
Cont.
INSTRUCTION CYCLE
INSTRUCTION PIPELINE
INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE
Pipeline
Space time diagram
DATA HAZARDS
FORWARDING HARDWARE
INSTRUCTION SCHEDULING
CONTROL HAZARDS
CONTROL HAZARDS
CONTROL HAZARDS
VECTOR PROCESSING

There is a class of computational problems that is
beyond the capabilities of conventional computers.
These problems are characterized by the fact that
they require a vast number of computations that would
take a conventional computer days or even weeks to
complete.
VECTOR PROCESSING
VECTOR PROGRAMMING
VECTOR INSTRUCTIONS
Matrix Multiplication

The multiplication of two n x n matrices consists of n^2 inner
products or n^3 multiply-add operations.

Example: product of two 3 x 3 matrices

    c11 = a11*b11 + a12*b21 + a13*b31

This requires 3 multiplications and 3 additions. The total
number of multiply-adds required to compute the matrix
product is 9 x 3 = 27.

In general, the inner product consists of the sum of k product
terms of the form

    C = A1B1 + A2B2 + A3B3 + ... + AkBk

In a pipeline with four segments, this sum can be accumulated
as four partial sums that are added together at the end:

    C = A1B1 + A5B5 + A9B9   + A13B13 + ...
      + A2B2 + A6B6 + A10B10 + A14B14 + ...
      + A3B3 + A7B7 + A11B11 + A15B15 + ...
      + A4B4 + A8B8 + A12B12 + A16B16 + ...
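
A C sketch of this four-way interleaved accumulation (hypothetical code; in hardware the four partial sums live inside the pipeline, here they are explicit variables):

/* Inner product accumulated as four interleaved partial sums,
   mirroring a 4-segment pipelined adder. k may be arbitrary. */
double inner_product(const double *A, const double *B, int k)
{
    double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    int i = 0;
    for (; i + 3 < k; i += 4) {      /* four independent accumulators */
        s0 += A[i]   * B[i];
        s1 += A[i+1] * B[i+1];
        s2 += A[i+2] * B[i+2];
        s3 += A[i+3] * B[i+3];
    }
    for (; i < k; i++)               /* leftover terms */
        s0 += A[i] * B[i];
    return (s0 + s1) + (s2 + s3);    /* combine the partial sums */
}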
VECTOR INSTRUCTION FORMAT
MULTIPLE MEMORY MODULE AND INTERLEAVING
MULTIPLE MEMORY MODULE AND INTERLEAVING
MULTIPLE MEMORY MODULE AND INTERLEAVING
ARRAY PROCESSOR
attached array processor with host computer
SIMD array processor Organization

PIPELINING AND VECTOR PROCESSING

• Parallel Processing

• Pipelining

• Arithmetic Pipeline

• Instruction Pipeline

• RISC Pipeline

• Vector Processing

• Array Processors
Parallel Processing

PARALLEL PROCESSING

Execution of Concurrent Events in the computing


process to achieve faster Computational Speed

Levels of Parallel Processing

- Job or Program level

- Task or Procedure level

- Inter-Instruction level

- Intra-Instruction level
Parallel Processing

PARALLEL COMPUTERS
Architectural Classification

– Flynn's classification
» Based on the multiplicity of Instruction Streams and
Data Streams
» Instruction Stream
• Sequence of Instructions read from memory
» Data Stream
• Operations performed on the data in the processor

                               Number of Data Streams
                               Single       Multiple

Number of         Single       SISD         SIMD
Instruction
Streams           Multiple     MISD         MIMD
Parallel Processing
COMPUTER ARCHITECTURES FOR PARALLEL PROCESSING

Von-Neumann based:
  SISD:  Superscalar processors
         Superpipelined processors
         VLIW
  MISD:  Nonexistent
  SIMD:  Array processors
         Systolic arrays
         Associative processors
  MIMD:  Shared-memory multiprocessors
           Bus based
           Crossbar switch based
           Multistage IN based
         Message-passing multicomputers
           Hypercube
           Mesh
           Reconfigurable

Dataflow
Reduction
Parallel Processing

SISD COMPUTER SYSTEMS

[Figure: Control Unit -> (instruction stream) -> Processor Unit
<-> (data stream) <-> Memory]

Characteristics
- Standard von Neumann machine
- Instructions and data are stored in memory
- One operation at a time

Limitations
Von Neumann bottleneck

Maximum speed of the system is limited by the
memory bandwidth (bits/sec or bytes/sec)

- Limitation on memory bandwidth
- Memory is shared by CPU and I/O
Parallel Processing

SISD PERFORMANCE IMPROVEMENTS

• Multiprogramming
• Spooling
• Multifunction processor
• Pipelining
• Exploiting instruction-level parallelism
- Superscalar
- Superpipelining
- VLIW (Very Long Instruction Word)
Parallel Processing

MISD COMPUTER SYSTEMS

[Figure: several control units (CU), each driving its own processor
(P) with a distinct instruction stream; all processors operate on a
single data stream drawn from a shared memory.]

Characteristics
- There is no computer at present that can be
  classified as MISD
Parallel Processing

SIMD COMPUTER SYSTEMS

[Figure: a single control unit fetches the program from memory over
the data bus and broadcasts one instruction stream to an array of
processor units P; each P works on its own data stream, and an
alignment network connects the processors to the memory modules M.]

Characteristics
- Only one copy of the program exists
- A single controller executes one instruction at a time
Parallel Processing

TYPES OF SIMD COMPUTERS

Array Processors
- The control unit broadcasts instructions to all PEs,
and all active PEs execute the same instructions
- ILLIAC IV, GF-11, Connection Machine, DAP, MPP

Systolic Arrays

- Regular arrangement of a large number of


very simple processors constructed on
VLSI circuits
- CMU Warp, Purdue CHiP

Associative Processors
- Content addressing
- Data transformation operations over many sets
of arguments with a single instruction
- STARAN, PEPE
Parallel Processing

MIMD COMPUTER SYSTEMS

[Figure: processor-memory pairs (P-M) connected by an
interconnection network to a shared memory.]

Characteristics
- Multiple processing units
- Execution of multiple instructions on multiple data

Types of MIMD computer systems
- Shared memory multiprocessors
- Message-passing multicomputers
Parallel Processing

SHARED MEMORY MULTIPROCESSORS

[Figure: processors P connected to memory modules M through an
interconnection network (IN): buses, multistage IN, or crossbar
switch.]

Characteristics
  All processors have equally direct access to
  one large memory address space

Example systems
  Bus and cache-based systems
  - Sequent Balance, Encore Multimax
  Multistage IN-based systems
  - Ultracomputer, Butterfly, RP3, HEP
  Crossbar switch-based systems
  - C.mmp, Alliant FX/8

Limitations
  Memory access latency
  Hot spot problem
Parallel Processing

MESSAGE-PASSING MULTICOMPUTER

[Figure: processor-memory nodes (P-M) joined by a message-passing
network of point-to-point connections.]

Characteristics
- Interconnected computers
- Each processor has its own memory, and processors
  communicate via message passing

Example systems
- Tree structure: Teradata, DADO
- Mesh-connected: Rediflow, Series 2010, J-Machine
- Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III

Limitations
- Communication overhead
- Hard to program
Pipelining

PIPELINING
A technique of decomposing a sequential process
into suboperations, with each subprocess executed
in a special dedicated segment that operates
concurrently with all other segments.

Example: Ai * Bi + Ci for i = 1, 2, 3, ..., 7

[Figure: Segment 1 loads Ai and Bi into R1 and R2; Segment 2 feeds
R1 and R2 into a multiplier whose result goes to R3, while Ci (from
memory) is loaded into R4; Segment 3 adds R3 and R4 into R5.]

R1 <- Ai, R2 <- Bi          Load Ai and Bi
R3 <- R1 * R2, R4 <- Ci     Multiply and load Ci
R5 <- R3 + R4               Add
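
A cycle-by-cycle C simulation of this 3-segment pipeline (a didactic sketch; register names follow the slide, and the sample operand values are invented):

#include <stdio.h>

#define N 7

int main(void)
{
    double A[N+1] = {0, 1, 2, 3, 4, 5, 6, 7};   /* 1-indexed operands */
    double B[N+1] = {0, 1, 1, 1, 1, 1, 1, 1};
    double C[N+1] = {0, 9, 9, 9, 9, 9, 9, 9};
    double R1 = 0, R2 = 0, R3 = 0, R4 = 0, R5 = 0;

    /* One loop iteration = one clock pulse. Stages are evaluated in
       reverse order so each stage reads last cycle's register values. */
    for (int t = 1; t <= N + 2; t++) {
        if (t >= 3) R5 = R3 + R4;                     /* segment 3 */
        if (t >= 2 && t <= N + 1) {                   /* segment 2 */
            R3 = R1 * R2;
            R4 = C[t-1];
        }
        if (t <= N) { R1 = A[t]; R2 = B[t]; }         /* segment 1 */
        if (t >= 3) printf("t=%d  R5 = %g\n", t, R5); /* Ai*Bi+Ci  */
    }
    return 0;
}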
Pipelining

OPERATIONS IN EACH PIPELINE STAGE

Clock   Segment 1     Segment 2           Segment 3
Pulse   R1    R2      R3         R4       R5
  1     A1    B1
  2     A2    B2      A1 * B1    C1
  3     A3    B3      A2 * B2    C2       A1 * B1 + C1
  4     A4    B4      A3 * B3    C3       A2 * B2 + C2
  5     A5    B5      A4 * B4    C4       A3 * B3 + C3
  6     A6    B6      A5 * B5    C5       A4 * B4 + C4
  7     A7    B7      A6 * B6    C6       A5 * B5 + C5
  8                   A7 * B7    C7       A6 * B6 + C6
  9                                       A7 * B7 + C7
Pipelining

GENERAL PIPELINE
General Structure of a 4-Segment Pipeline

[Figure: Input -> S1 -> R1 -> S2 -> R2 -> S3 -> R3 -> S4 -> R4, with
a common clock driving all the registers.]

Space-Time Diagram

             1    2    3    4    5    6    7    8    9    Clock cycles
Segment 1    T1   T2   T3   T4   T5   T6
        2         T1   T2   T3   T4   T5   T6
        3              T1   T2   T3   T4   T5   T6
        4                   T1   T2   T3   T4   T5   T6
Pipelining

PIPELINE SPEEDUP
n: number of tasks to be performed

Conventional Machine (Non-Pipelined)
  tn: clock cycle
  T1: time required to complete the n tasks
      T1 = n * tn

Pipelined Machine (k stages)
  tp: clock cycle (time to complete each suboperation)
  Tk: time required to complete the n tasks
      Tk = (k + n - 1) * tp

Speedup
  Sk: speedup

      Sk = n * tn / ((k + n - 1) * tp)

      lim Sk = tn / tp     ( = k, if tn = k * tp )
     n->inf
Pipelining

PIPELINE AND MULTIPLE FUNCTION UNITS

Example
- 4-stage pipeline
- suboperation in each stage: tp = 20 ns
- 100 tasks to be executed
- 1 task in the non-pipelined system: 20 * 4 = 80 ns

Pipelined System
  (k + n - 1) * tp = (4 + 99) * 20 = 2060 ns

Non-Pipelined System
  n * k * tp = 100 * 80 = 8000 ns

Speedup
  Sk = 8000 / 2060 = 3.88

A 4-stage pipeline is basically identical to a system
with 4 identical function units operating in parallel.

[Figure: instructions Ii, Ii+1, Ii+2, Ii+3 distributed over multiple
functional units P1-P4.]
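
The same numbers in a short C check (sketch only):

#include <stdio.h>

int main(void)
{
    int k = 4, n = 100;          /* stages, tasks       */
    double tp = 20.0;            /* ns per suboperation */
    double pipelined     = (k + n - 1) * tp;    /* 2060 ns */
    double non_pipelined = (double)n * k * tp;  /* 8000 ns */
    printf("speedup = %.2f\n", non_pipelined / pipelined); /* 3.88 */
    return 0;
}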


Arithmetic Pipeline

ARITHMETIC PIPELINE
Floating-point adder

    X = A x 2^a
    Y = B x 2^b

Four suboperations:
  1  Compare the exponents
  2  Align the mantissas
  3  Add/subtract the mantissas
  4  Normalize the result

[Figure: Segment 1 compares the exponents a and b by subtraction;
Segment 2 chooses the larger exponent and aligns the mantissa of the
smaller number; Segment 3 adds or subtracts the mantissas; Segment 4
adjusts the exponent and normalizes the result. Registers (R)
separate the segments.]
Arithmetic Pipeline

4-STAGE FLOATING POINT ADDER

    A = a x 2^p,  B = b x 2^q

[Figure: Stage S1 - an exponent subtractor computes t = |p - q| and a
fraction selector picks the fraction with min(p,q); r = max(p,q). A
right shifter aligns that fraction by t positions. Stage S2 - the
fraction adder produces c. Stage S3 - a leading-zero counter and left
shifter normalize c to d. Stage S4 - an exponent adder adjusts r to s.]

    C = A + B = c x 2^r = d x 2^s
    (r = max(p,q), 0.5 <= d < 1)
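
A scalar C sketch of the four suboperations for numbers given as (fraction, exponent) pairs (purely illustrative; fractions are base-2 values normalized to [0.5, 1)):

#include <stdio.h>
#include <math.h>

/* Four-stage floating-point addition on (fraction, exponent) pairs:
   A = a * 2^p, B = b * 2^q, with 0.5 <= |fraction| < 1. */
void fp_add(double a, int p, double b, int q, double *d, int *s)
{
    /* S1: compare exponents (by subtraction) */
    int t = p - q;
    int r = (t > 0) ? p : q;

    /* S2: align the fraction with the smaller exponent */
    if (t > 0) b = b / pow(2, t);
    else       a = a / pow(2, -t);

    /* S3: add the fractions */
    double c = a + b;

    /* S4: normalize the result (shift and adjust the exponent) */
    while (c != 0 && fabs(c) >= 1.0) { c /= 2; r++; }
    while (c != 0 && fabs(c) < 0.5)  { c *= 2; r--; }
    *d = c; *s = r;
}

int main(void)
{
    double d; int s;
    fp_add(0.9504, 3, 0.8200, 2, &d, &s);  /* 7.6032 + 3.28 */
    printf("%g x 2^%d = %g\n", d, s, d * pow(2, s)); /* 10.8832 */
    return 0;
}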
Instruction Pipeline

INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle
 1  Fetch an instruction from memory
 2  Decode the instruction
 3  Calculate the effective address of the operand
 4  Fetch the operands from memory
 5  Execute the operation
 6  Store the result in the proper place

* Some instructions skip some phases
* Effective address calculation can be done as
  part of the decoding phase
* Storage of the operation result into a register
  is done automatically in the execution phase

==> 4-Stage Pipeline

 1  FI: Fetch an instruction from memory
 2  DA: Decode the instruction and calculate
        the effective address of the operand
 3  FO: Fetch the operand
 4  EX: Execute the operation
Instruction Pipeline

INSTRUCTION PIPELINE

Execution of Three Instructions in a 4-Stage Pipeline

Conventional (sequential):

  i     FI DA FO EX
  i+1               FI DA FO EX
  i+2                           FI DA FO EX

Pipelined (overlapped):

  i     FI DA FO EX
  i+1      FI DA FO EX
  i+2         FI DA FO EX
Instruction Pipeline

INSTRUCTION EXECUTION IN A 4-STAGE PIPELINE

[Flowchart: Segment 1 fetches the instruction from memory; Segment 2
decodes it and calculates the effective address, then tests for a
branch; if not a branch, Segment 3 fetches the operand from memory;
Segment 4 executes the instruction and then checks for an interrupt -
interrupt handling if yes, otherwise update PC, emptying the pipe
after a branch.]

Timing (instruction 3 is a branch):

      Step:      1   2   3   4   5   6   7   8   9  10  11  12  13
Instruction 1    FI  DA  FO  EX
            2        FI  DA  FO  EX
  (Branch)  3            FI  DA  FO  EX
            4                FI  --  --  FI  DA  FO  EX
            5                            FI  DA  FO  EX
            6                                FI  DA  FO  EX
            7                                    FI  DA  FO  EX
Instruction Pipeline

MAJOR HAZARDS IN PIPELINED EXECUTION

Structural hazards (resource conflicts)
  Hardware resources required by instructions in
  simultaneous overlapped execution cannot all be met

Data hazards (data dependency conflicts)
  An instruction scheduled to be executed in the pipeline requires the
  result of a previous instruction, which is not yet available

    R1 <- B + C       the ADD proceeds, but the dependent
    R1 <- R1 + 1      INC must wait (bubble) until R1 is written

Control hazards
  Branches and other instructions that change the PC
  delay the fetch of the next instruction

    JMP ...           the branch address is not known until the
    (bubble)          jump completes, so the next fetch is delayed

Hazards in pipelines may make it necessary to stall the pipeline.
Pipeline interlock: detect hazards and stall until they are cleared.
Instruction Pipeline

STRUCTURAL HAZARDS
Structural Hazards
  Occur when some resource has not been
  duplicated enough to allow all combinations
  of instructions in the pipeline to execute

Example: with one memory port, a data fetch and an instruction fetch
cannot be initiated in the same clock cycle:

  i     FI  DA  FO  EX
  i+1       FI  DA  FO  EX
  i+2           --  --  FI  DA  FO  EX

The pipeline is stalled for a structural hazard
  <- two loads with a one-port memory
  -> a two-port memory would serve both without a stall
Instruction Pipeline

DATA HAZARDS
Data Hazards
  Occur when the execution of an instruction
  depends on the results of a previous instruction

    ADD R1, R2, R3
    SUB R4, R1, R5

Data hazards can be dealt with by either hardware
or software techniques.

Hardware Techniques
  Interlock
  - hardware detects the data dependencies and delays the scheduling
    of the dependent instruction by stalling enough clock cycles
  Forwarding (bypassing, short-circuiting)
  - accomplished by a data path that routes a value from a source
    (usually an ALU) to a user, bypassing a designated register. This
    allows the value to be used at an earlier stage in the
    pipeline than would otherwise be possible

Software Technique
  Instruction scheduling (by the compiler) for delayed load
Instruction Pipeline

FORWARDING HARDWARE
Example:
    ADD R1, R2, R3
    SUB R4, R1, R5

3-stage pipeline
  I: Instruction fetch
  A: Decode, read registers, ALU operations
  E: Write the result to the destination register

[Figure: the register file feeds the ALU through two MUXes; a bypass
path routes the ALU result buffer straight back to the MUX inputs, so
a result can be used before it is written to the register file.]

    ADD   I  A  E
    SUB      I  -  A  E     without bypassing (stall)

    SUB      I  A  E        with bypassing


Instruction Pipeline

INSTRUCTION SCHEDULING
    a = b + c;
    d = e - f;

Unscheduled code:          Scheduled code:

    LW  Rb, b                  LW  Rb, b
    LW  Rc, c                  LW  Rc, c
    ADD Ra, Rb, Rc             LW  Re, e
    SW  a, Ra                  ADD Ra, Rb, Rc
    LW  Re, e                  LW  Rf, f
    LW  Rf, f                  SW  a, Ra
    SUB Rd, Re, Rf             SUB Rd, Re, Rf
    SW  d, Rd                  SW  d, Rd

Delayed Load
  A load requiring that the following instruction not use its result
Instruction Pipeline

CONTROL HAZARDS
Branch Instructions

- The branch target address is not known until
  the branch instruction is completed:

    Branch        FI  DA  FO  EX
    instruction
    Next                          FI  DA  FO  EX
    instruction                   ^ target address available

- Stall -> wasted cycle times

Dealing with Control Hazards
  * Prefetch Target Instruction
  * Branch Target Buffer
  * Loop Buffer
  * Branch Prediction
  * Delayed Branch
Instruction Pipeline

CONTROL HAZARDS
Prefetch Target Instruction
  - Fetch instructions in both streams, branch not taken and branch taken
  - Both are saved until the branch is executed; then select the right
    instruction stream and discard the wrong one
Branch Target Buffer (BTB; associative memory)
  - Entry: address of a previously executed branch; the target
    instruction and the next few instructions
  - When fetching an instruction, search the BTB
  - If found, fetch the instruction stream from the BTB;
    if not, fetch a new stream and update the BTB
Loop Buffer (high-speed register file)
  - Stores an entire loop so that it can execute without accessing memory
Branch Prediction
  - Guess the branch condition and fetch an instruction stream based on
    the guess; a correct guess eliminates the branch penalty
Delayed Branch
  - The compiler detects the branch and rearranges the instruction
    sequence, inserting useful instructions that keep the pipeline busy
    in the presence of a branch instruction
RISC Pipeline

RISC PIPELINE
RISC
  - Machine with a very fast clock cycle that
    executes at the rate of one instruction per cycle
    <- simple instruction set
       fixed-length instruction format
       register-to-register operations

Instruction Cycles of a Three-Stage Instruction Pipeline

Data Manipulation Instructions
  I: Instruction fetch
  A: Decode, read registers, ALU operations
  E: Write a register

Load and Store Instructions
  I: Instruction fetch
  A: Decode, evaluate effective address
  E: Register-to-memory or memory-to-register

Program Control Instructions
  I: Instruction fetch
  A: Decode, evaluate branch address
  E: Write register (PC)
RISC Pipeline

DELAYED LOAD
    LOAD:  R1 <- M[address 1]
    LOAD:  R2 <- M[address 2]
    ADD:   R3 <- R1 + R2
    STORE: M[address 3] <- R3

Three-segment pipeline timing

Pipeline timing with data conflict:

    clock cycle    1   2   3   4   5   6
    Load R1        I   A   E
    Load R2            I   A   E
    Add R1+R2              I   A   E
    Store R3                   I   A   E

(the Add reads R2 in cycle 4, in the same cycle in which the
second Load is still writing it)

Pipeline timing with delayed load:

    clock cycle    1   2   3   4   5   6   7
    Load R1        I   A   E
    Load R2            I   A   E
    NOP                    I   A   E
    Add R1+R2                  I   A   E
    Store R3                       I   A   E

The data dependency is taken care of by the compiler rather
than by the hardware.
RISC Pipeline

DELAYED BRANCH
The compiler analyzes the instructions before and after
the branch and rearranges the program sequence by
inserting useful instructions in the delay steps.

Using no-operation instructions:

    Clock cycles:   1   2   3   4   5   6   7   8   9   10
    1. Load         I   A   E
    2. Increment        I   A   E
    3. Add                  I   A   E
    4. Subtract                 I   A   E
    5. Branch to X                  I   A   E
    6. NOP                              I   A   E
    7. NOP                                  I   A   E
    8. Instr. in X                              I   A   E

Rearranging the instructions:

    Clock cycles:   1   2   3   4   5   6   7   8
    1. Load         I   A   E
    2. Increment        I   A   E
    3. Branch to X          I   A   E
    4. Add                      I   A   E
    5. Subtract                     I   A   E
    6. Instr. in X                      I   A   E
VECTOR PROCESSING
• A vector processor is an ensemble of hardware resources, including
  vector registers, functional pipelines, processing elements and
  register counters, for performing vector operations.

• Vector processing occurs when arithmetic or logical operations are
  applied to vectors. It is distinguished from scalar processing,
  which operates on one datum or one pair of data. The conversion
  from scalar code to vector code is called vectorization.

• Both pipelined processors and SIMD computers can perform vector
  operations.

• Vector processing reduces the software overhead incurred in the
  maintenance of looping control, reduces memory access conflicts, and
  above all matches nicely with pipelining and segmentation to
  generate one result per clock cycle continuously.
Vector Processing

VECTOR PROCESSING

Vector Processing Applications


• Problems that can be efficiently formulated in terms of vectors
– Long-range weather forecasting
– Petroleum explorations
– Seismic data analysis
– Medical diagnosis
– Aerodynamics and space flight simulations
– Artificial intelligence and expert systems
– Mapping the human genome
– Image processing

Vector Processor (computer)


Ability to process vectors, and related data structures such as matrices
and multi-dimensional arrays, much faster than conventional computers

Vector Processors may also be pipelined


Vector Processing

VECTOR PROGRAMMING

    DO 20 I = 1, 100
 20 C(I) = B(I) + A(I)

Conventional computer (scalar loop):

    Initialize I = 0
 20 Read A(I)
    Read B(I)
    Store C(I) = A(I) + B(I)
    Increment I = I + 1
    If I <= 100 go to 20

Vector computer (one instruction):

    C(1:100) = A(1:100) + B(1:100)


Vector Processing

VECTOR INSTRUCTIONS

  f1: V -> V        (vector-vector instruction)
  f2: V -> S        (vector reduction instruction)
  f3: V x V -> V    (vector-vector instruction)     V: vector operand
  f4: V x S -> V    (vector-scalar instruction)     S: scalar operand

Type  Mnemonic  Description (I = 1, ..., n)
 f1   VSQR      Vector square root    B(I) <- SQR(A(I))
      VSIN      Vector sine           B(I) <- sin(A(I))
      VCOM      Vector complement     A(I) <- A(I)'
 f2   VSUM      Vector summation      S <- sum of A(I)
      VMAX      Vector maximum        S <- max{A(I)}
 f3   VADD      Vector add            C(I) <- A(I) + B(I)
      VMPY      Vector multiply       C(I) <- A(I) * B(I)
      VAND      Vector AND            C(I) <- A(I) . B(I)
      VLAR      Vector larger         C(I) <- max(A(I), B(I))
      VTGE      Vector test >=        C(I) <- 0 if A(I) < B(I)
                                      C(I) <- 1 if A(I) >= B(I)
 f4   SADD      Vector-scalar add     B(I) <- S + A(I)
      SDIV      Vector-scalar divide  B(I) <- A(I) / S
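
For illustration, the f3-type VADD operation expressed in C (a scalar loop standing in for what a vector machine does with one instruction):

/* VADD: C(I) <- A(I) + B(I) for I = 1..n. A vector machine issues
   this as one instruction; here it is modeled as a loop. */
void vadd(const double *A, const double *B, double *C, int n)
{
    for (int i = 0; i < n; i++)
        C[i] = A[i] + B[i];
}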
Vector Processing

VECTOR INSTRUCTION FORMAT

Vector Instruction Format

  | Operation | Base address | Base address | Base address | Vector |
  | code      | source 1     | source 2     | destination  | length |

Pipeline for Inner Product

[Figure: source A and source B feed a multiplier pipeline whose
products flow into an adder pipeline that accumulates the sum.]
Matrix Multiplication

    | A11 A12 A13 |   | B11 B12 B13 |   | C11 C12 C13 |
    | A21 A22 A23 | x | B21 B22 B23 | = | C21 C22 C23 |
    | A31 A32 A33 |   | B31 B32 B33 |   | C31 C32 C33 |

    C11 = (A11 x B11) + (A12 x B21) + (A13 x B31)
    ...

In general:

    Cij = sum over k of (Aik x Bkj)     (for k = 1 to 3)
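
A direct C rendering of this formula (illustrative 3 x 3 version):

#define N 3

/* C = A x B for N x N matrices: Cij = sum over k of Aik * Bkj. */
void matmul(const double A[N][N], const double B[N][N], double C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)    /* inner product */
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}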


Vector Processing

MULTIPLE MEMORY MODULE AND INTERLEAVING

Multiple Module Memory

[Figure: four memory modules M0-M3 share the address bus and the data
bus; each module has its own address register (AR), memory array and
data register (DR), so the modules can operate concurrently.]

Address Interleaving

  Different sets of addresses are assigned to
  different memory modules
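
With low-order interleaving across four modules, the two low-order address bits pick the module and the remaining bits index within it; a sketch of that decode (my own illustration, not from the slides):

#include <stdint.h>

#define MODULES 4   /* must be a power of two */

/* Low-order interleaving: consecutive addresses fall in consecutive
   modules, so sequential access keeps all modules busy. */
void interleave(uint32_t addr, uint32_t *module, uint32_t *offset)
{
    *module = addr % MODULES;   /* two low-order bits    */
    *offset = addr / MODULES;   /* word within the module */
}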
Multiprocessors
Characteristics of multiprocessors - Interconnection structures - Interprocessor
arbitration - Interprocessor communication and synchronization - Cache coherence

Fig. 5.1 Basic multiprocessor architecture

1. A multiprocessor system is an interconnection of two or more CPUs with memory and input-output
   equipment.
2. Multiprocessor systems are classified as multiple instruction stream, multiple data stream
   (MIMD) systems.
3. There is a distinction between multiprocessors and multicomputers, though both support
   concurrent operations.
4. In a multicomputer, several autonomous computers are connected through a network and they may
   or may not communicate; in a multiprocessor system there is a single OS control that provides
   interaction between processors, and all the components of the system cooperate in the solution
   of a problem.
5. VLSI circuit technology has reduced the cost of computers to such a low level that the concept
   of applying multiple processors to meet system performance requirements has become an attractive
   design possibility.
Fig. 5.2 Taxonomy of mono- and multiprocessor organizations
Characteristics of Multiprocessors:
Benefits of Multiprocessing:
1. Multiprocessing increases the reliability of the system, so that a failure or error in one part
   has a limited effect on the rest of the system. If a fault causes one processor to fail, a second
   processor can be assigned to perform the functions of the disabled one.
2. Improved system performance. The system derives high performance from the fact that
   computations can proceed in parallel in one of two ways:
   • Multiple independent jobs can be made to operate in parallel.
   • A single job can be partitioned into multiple parallel tasks. This can be achieved in two
     ways:
     - The user explicitly declares that the tasks of the program be executed in
       parallel.
     - The compiler, provided with multiprocessor software, automatically detects parallelism in
       the program by checking for data dependency.
COUPLING OF PROCESSORS
Tightly Coupled System / Shared Memory:
- Tasks and/or processors communicate in a highly synchronized fashion
- Communicate through a common global shared memory
- Shared memory system (this doesn't preclude each processor from having its own local memory,
  e.g. cache memory)
Loosely Coupled System / Distributed Memory:
- Tasks or processors do not communicate in a synchronized fashion
- Communicate by message-passing packets consisting of an address, the data content, and some
  error detection code
- Overhead for data exchange is high
- Distributed memory system
Loosely coupled systems are more efficient when the interaction between tasks is minimal, whereas
tightly coupled systems can tolerate a higher degree of interaction between tasks.

Shared (Global) Memory
- A global memory space accessible by all processors
- Processors may also have some local memory
Distributed (Local, Message-Passing) Memory
- All memory units are associated with processors
- To retrieve information from another processor's memory, a message must be sent there
Uniform Memory
- All processors take the same time to reach all memory locations
Non-uniform (NUMA) Memory
- Memory access is not uniform

Fig. 5.3 Shared and distributed memory

Shared memory multiprocessor:

Fig 5.4 Shared memory multiprocessor


Characteristics
- All processors have equally direct access to one large memory address space
Limitations
- Memory access latency; Hot spot problem

5.2 Interconnection Structures:

The interconnection between the components of a multiprocessor system can have different
physical configurations, depending on the number of transfer paths that are available between the
processors and memory in a shared memory system, and among the processing elements in a loosely
coupled system.
Some of the schemes are:
- Time-Shared Common Bus
- Multiport Memory
- Crossbar Switch
- Multistage Switching Network
- Hypercube System
a. Time-Shared Common Bus
- All processors (and memory) are connected to a common bus or buses
- Memory access is fairly uniform, but not very scalable
- A collection of signal lines that carry module-to-module communication
- Data highways connecting several digital system elements

Fig. 5.5 Time-shared common bus organization

Fig. 5.6 System bus structure for multiprocessors

In the figure above, a number of local buses are each connected to their own local memory and to one
or more processors. Each local bus may be connected to a CPU, an IOP, or any combination of
processors. A system bus controller links each local bus to a common system bus. The I/O devices
connected to the local IOP, as well as the local memory, are available to the local processor. The
memory connected to the common system bus is shared by all processors. If an IOP is connected
directly to the system bus, the I/O devices attached to it may be made available to all processors.
Disadvantages:
 Only one processor can communicate with the memory or another processor at any given time.
 As a consequence, the total overall transfer rate within the system is limited by the speed of
  the single path.
b. Multiport Memory:
Multiport Memory Module
- Each port serves a CPU
- Each memory module has control logic
- Memory module conflicts are resolved by a fixed priority among the CPUs
Advantages:
- A high transfer rate can be achieved because of the multiple paths.
Disadvantages:
- It requires expensive memory control logic and a large number of cables and connections.

Fig. 5.7 Multiport memory


c. Crossbar Switch:

- Each switch point has control logic to set up the transfer path between a processor and a
  memory module.
- It also resolves multiple requests for access to the same memory on a predetermined
  priority basis.
- This organization supports simultaneous transfers from all memory modules, because
  there is a separate path associated with each module.
- The hardware required to implement the switch can become quite large and complex.

Fig. 5.8 a) Crossbar switch b) Block diagram of crossbar switch

Advantage:
- Supports simultaneous transfers from all memory modules
Disadvantage:
- The hardware required to implement the switch can become quite large and complex.
d. Multistage Switching Network:
- The basic component of a multistage switching network is a two-input, two-output interchange
  switch.

Fig. 5.9 Operation of a 2x2 interchange switch

Using the 2x2 switch as a building block, it is possible to build a multistage network to control
the communication between a number of sources and destinations.
- To see how this is done, consider the binary tree shown in the figure below.
- Certain request patterns cannot be satisfied simultaneously: e.g., if P1 is connected to one of
  the destinations 000-011, P2 can only reach destinations 100-111 at the same time.

Fig. 5.10 Binary tree with 2x2 switches

Fig. 5.11 8x8 omega switching network

- In the omega network, some request patterns cannot be connected simultaneously: e.g., two sources
  cannot be connected simultaneously to destinations 000 and 001.
- In a tightly coupled multiprocessor system, the source is a processor and the destination is a
  memory module: set up the path, transfer the address into memory, then transfer the data.
- In a loosely coupled multiprocessor system, both the source and the destination are processing
  elements.
e. Hypercube System:

The hypercube or binary n-cube multiprocessor structure is a loosely coupled
system composed of N = 2^n processors interconnected in an n-dimensional binary cube.
- Each processor forms a node of the cube; in effect it contains not only a CPU but also local
  memory and an I/O interface.
- Each processor's address differs from that of each of its n neighbors by exactly one bit
  position.
- The figure below shows the hypercube structure for n = 1, 2, and 3.
- Routing messages through an n-cube structure may take from one to n links from a source node to
  a destination node.
- A routing procedure can be developed by computing the exclusive-OR of the source node address
  with the destination node address.
- The message is then sent along any one of the axes where the resulting binary value has a 1 bit,
  these being the axes on which the two node addresses differ.
- A representative of the hypercube architecture is the Intel iPSC computer complex.
- It consists of 128 (n = 7) microcomputers; each node consists of a CPU, a floating-point
  processor, local memory, and serial communication channels.

Fig. 5.12 Hypercube structures for n = 1, 2, 3
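
A small C sketch of this XOR-based routing rule (illustrative only): the message is forwarded along the lowest-numbered axis on which the current node and the destination differ.

#include <stdio.h>

#define DIM 3   /* n = 3: an 8-node hypercube */

/* Route one hop: the XOR exposes the bits where the two addresses
   differ; flip the lowest differing bit to move one link closer. */
int next_hop(int node, int dest)
{
    int diff = node ^ dest;
    for (int axis = 0; axis < DIM; axis++)
        if (diff & (1 << axis))
            return node ^ (1 << axis);
    return node;               /* already at the destination */
}

int main(void)
{
    int node = 0;              /* source 000      */
    int dest = 6;              /* destination 110 */
    while (node != dest) {
        node = next_hop(node, dest);
        printf("-> %d\n", node);   /* visits 010 (2), then 110 (6) */
    }
    return 0;
}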

5.3 Inter-processor Arbitration
- Only one of CPU, IOP, and memory can be granted use of the bus at a time.
- An arbitration mechanism is needed to handle multiple requests to the shared resources and to
  resolve the resulting contention.
- SYSTEM BUS:
  • A bus that connects the major components such as CPUs, IOPs and memory
  • A typical system bus consists of about 100 signal lines divided into three functional groups:
    data, address and control lines. In addition there are power distribution lines to the
    components.
- Synchronous Bus
  • Each data item is transferred during a time slice
  • known in advance to both the source and the destination unit
  • Either a common clock source is used, or separate clocks with a synchronization signal
    transmitted periodically to synchronize the clocks in the system
- Asynchronous Bus
  • Each data item is transferred by a handshake mechanism
     The unit that transmits the data sends a control signal that indicates the presence of data
     The unit that receives the data responds with another control signal to acknowledge receipt
    o Strobe pulse - supplied by one of the units to indicate to the other unit when the data
      transfer has to occur
Table 5.1 IEEE standard 796 multibus signals

Fig. 5.13 Inter-processor arbitration (static arbitration)


Interprocessor Arbitration: Dynamic Arbitration
- Priorities of the units can be dynamically changed while the system is in operation
- Time Slice
  o A fixed-length time slice is given sequentially to each processor, in round-robin fashion
- Polling
  o Unit address polling - the bus controller advances the address to identify the requesting
    unit. When a processor that requires access recognizes its address, it activates the bus-busy
    line and then accesses the bus. After a number of bus cycles, polling continues by choosing a
    different processor.
- LRU
  o The least recently used algorithm gives the highest priority to the requesting device that has
    not used the bus for the longest interval.
- FIFO
  o In the first-come first-served scheme, requests are served in the order received. The bus
    controller maintains a queue data structure.
- Rotating Daisy Chain
  o Conventional daisy chain - highest priority to the unit nearest the bus controller
  o Rotating daisy chain - the PO output of the last device is connected to the PI of the first
    one. Highest priority goes to the unit that is nearest to the unit that has most recently
    accessed the bus (it becomes the bus controller).
5.4 Inter-processor communication and synchronization:
- The various processors in a multiprocessor system must be provided with a facility for
  communicating with each other.
  o A communication path can be established through a portion of memory or
    a common input-output channel.
- The sending processor structures a request, a message, or a procedure, and places it in the
  memory mailbox.
  o Status bits reside in common memory.
  o The receiving processor can check the mailbox periodically.
  o The response time of this procedure can be long.
- A more efficient procedure is for the sending processor to alert the receiving processor
  directly by means of an interrupt signal.
- In addition to shared memory, a multiprocessor system may have other shared resources,
  o e.g., a magnetic disk storage unit.
- To prevent conflicting use of shared resources by several processors, there must be a provision
  for assigning resources to processors, i.e., an operating system.
- Three organizations have been used in the design of operating systems for multiprocessors:
  master-slave configuration, separate operating system, and distributed operating system.
- In a master-slave mode, one processor, the master, always executes the operating system
  functions.
- In the separate operating system organization, each processor can execute the operating system
  routines it needs. This organization is more suitable for loosely coupled systems.
- In the distributed operating system organization, the operating system routines are distributed
  among the available processors, but each particular operating system function is assigned to
  only one processor at a time. This is also referred to as a floating operating system.
Loosely Coupled System
- There is no shared memory for passing information.
- The communication between processors is by means of message passing through I/O channels.
- The communication is initiated by one processor calling a procedure that resides in the memory
  of the processor with which it wishes to communicate.
- The communication efficiency of the interprocessor network depends on the communication routing
  protocol, processor speed, data link speed, and the topology of the network.
Interprocess Synchronization
- The instruction set of a multiprocessor contains basic instructions that are used to implement
  communication and synchronization between cooperating processes.
  o Communication refers to the exchange of data between different processes.
  o Synchronization refers to the special case where the data used to
    communicate between processors is control information.
- Synchronization is needed to enforce the correct sequence of processes and to ensure mutually
  exclusive access to shared writable data.
- Multiprocessor systems usually include various mechanisms to deal with the synchronization of
  resources.
  o Low-level primitives are implemented directly in hardware.
  o These primitives are the basic mechanisms that enforce mutual exclusion for more complex
    mechanisms implemented in software.
  o A number of hardware mechanisms for mutual exclusion have been developed, e.g. the binary
    semaphore.
Mutual Exclusion with Semaphore
- A properly functioning multiprocessor system must provide a mechanism that will guarantee
  orderly access to shared memory and other shared resources.
  o Mutual exclusion: necessary to protect data from being changed simultaneously by two or more
    processors.
  o Critical section: a program sequence that must complete execution before another processor
    accesses the same shared resource.
- A binary variable called a semaphore is often used to indicate whether or not a processor is
  executing a critical section.
- Testing and setting the semaphore is itself a critical operation and must be performed as a
  single indivisible operation.
- A semaphore can be initialized by means of a test-and-set instruction in conjunction with a
  hardware lock mechanism.
- The instruction TSL SEM is executed in two memory cycles (the first to read, the second to
  write) as follows:
      R <- M[SEM],  M[SEM] <- 1
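
As a sketch in C, using the GCC/Clang atomic builtin __sync_lock_test_and_set to stand in for the indivisible TSL instruction:

#include <stdio.h>

static int sem = 0;   /* 0 = free, 1 = locked */

/* TSL SEM: atomically read the old value and set the semaphore to 1.
   Spin until the old value was 0, i.e. until we acquired the lock. */
void lock(void)
{
    while (__sync_lock_test_and_set(&sem, 1) == 1)
        ;  /* busy-wait: another processor is in its critical section */
}

void unlock(void)
{
    __sync_lock_release(&sem);   /* M[SEM] <- 0 */
}

int main(void)
{
    lock();
    puts("in critical section");   /* shared resource access */
    unlock();
    return 0;
}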
5.5 Cache Coherence
Cache coherence is the consistency of shared resource data that ends up stored in multiple local
caches. When clients in a system maintain caches of a common memory resource, problems may arise
with inconsistent data; this is particularly the case with CPUs in a multiprocessing system.

Fig. 5.14 Cache coherence

Shared Cache
- Disallow private caches
- Access time delay

Software Approaches
* Read-Only Data are Cacheable
  - Private cache is used only for read-only data
  - Shared writable data are not cacheable
  - The compiler tags data as cacheable or noncacheable
  - Degrades performance due to software overhead
* Centralized Global Table
  - The status of each memory block is maintained in the CGT: RO (read-only) or RW (read and
    write)
  - All caches can have copies of RO blocks
  - Only one cache can have a copy of an RW block

Hardware Approaches
* Snoopy Cache Controller
  - Cache controllers monitor all the bus requests from CPUs and IOPs
  - All caches attached to the bus monitor the write operations
  - When a word in a cache is written, memory is also updated (write-through)
  - Local snoopy controllers in all other caches check their memory to
    determine if they have a copy of that word; if they do, that location is marked invalid
    (a future reference to this location causes a cache miss)