Intisar's Computer Organization and Architecture

Computer Organization and Architecture
Dr. Intisar Al-Shummari
Al-Isra Private University
Al-Isra Private University

College of Science and Information Technology
Computer Science Department
Text Book: Computer Organization and Architecture, William Stallings, 6th edition, 2004
Internet Resources: Web site for the book: http://williamstallings.com/COA6e.html
Chapter 1
Introduction
Architecture & Organization
Architecture refers to those attributes of a system that are visible to the programmer:
Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
Architecture issue: e.g. Is there a multiply instruction?
Organization refers to how those features are implemented
(i.e. hardware details that are transparent to the programmer):
Control signals, interfaces, memory technology.
Organization issue: e.g. Is there a hardware multiply unit, or is multiplication done by repeated addition?
All members of the Intel x86 family share the same basic architecture
The IBM System/370 family shares the same basic architecture
This gives code compatibility
At least backward compatibility
Organization differs between different versions
However, the relation between architecture and organization is very close, and the two are often hard to distinguish.
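The multiply example above can be made concrete. The sketch below assumes non-negative integers and shows two organizations of the same architectural operation: repeated addition (the slow option named in the text) and shift-and-add, which stands in here for a dedicated multiply unit. The function names are hypothetical; a programmer who only sees "multiply" gets the same answer from either.

```python
def multiply_by_repeated_addition(a: int, b: int) -> int:
    """Organization 1: add a to itself b times (b additions)."""
    if a < 0 or b < 0:
        raise ValueError("sketch handles non-negative integers only")
    result = 0
    for _ in range(b):
        result += a
    return result


def multiply_by_shift_and_add(a: int, b: int) -> int:
    """Organization 2: one add-and-shift step per bit of b."""
    if a < 0 or b < 0:
        raise ValueError("sketch handles non-negative integers only")
    result = 0
    while b:
        if b & 1:      # low bit of the multiplier set: add the shifted multiplicand
            result += a
        a <<= 1        # shift the multiplicand left
        b >>= 1        # move on to the next multiplier bit
    return result


# Architecturally identical: both produce a * b.
for x, y in [(0, 7), (13, 11), (255, 255)]:
    assert multiply_by_repeated_addition(x, y) == multiply_by_shift_and_add(x, y) == x * y
```

The visible result is the same; only the number of steps differs (b additions versus one step per bit of b), which is exactly the kind of difference the text classifies as organization.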
Introduction
Computers are used in scientific calculations, commercial and business data
processing, air traffic control, space guidance, the educational field, and many other
areas. The most striking property of a digital computer is its generality.
• It can follow a sequence of instructions, called a program, that operates on given data.
• The user can specify and change programs and/or data according to the specific need.
Structure & Function
Structure is the way in which components relate to each other

Function is the operation of individual components as part of the structure
A computer is a complex system that is hierarchical in nature: it consists of a set of interrelated subsystems, each of which is in turn hierarchical in its structure. This hierarchical nature is essential for both design and description.
♦ There are two approaches to description: 1- Bottom-up and 2- Top-down.
The latter is clearer and will be adopted here.
Function
The basic computer functions are:
Data processing
Data storage
Data movement
Control
Data can move in and out of the three other functions under the management of the control function, as shown diagrammatically below
Functional view of the Computer System
Operations
Data movement: transferring data from one peripheral to another, e.g. keyboard to screen
Data storage: data transferred from the external environment to computer storage (read) or vice versa (write), e.g. Internet download to disk
Processing from/to internal storage: e.g. updating a bank statement
Processing from storage to I/O: processing en route between storage and the external environment, e.g. printing a bank statement
Structure - Top Level
A computer is an entity that interacts in some fashion with its external environment, as shown simply below
A block diagram of the digital computer is shown in the following figure.
The memory unit stores programs as well as input, output, and intermediate
data.
The processor unit performs arithmetic and other data-processing tasks as
specified by a program.
The control unit supervises the flow of information between the various units.
The control unit retrieves the instructions, one by one, from the program
which is stored in memory.
A top-level view of the computer system shows four main structural components:
Central Processing Unit (CPU): controls the operation of the computer and performs its data processing functions; often simply called the processor.
Main memory (MM): stores data and programs.
Input/output (I/O): moves data between the computer and its external environment.
System interconnection: some mechanism that provides for communication among the CPU, MM and I/O.
Structure - The CPU
The most interesting of all the parts is the CPU, which consists of:
Control Unit (CU): controls the computer's operations.
Arithmetic and Logic Unit (ALU): performs data processing functions.
Internal registers: provide internal storage to the CPU.
Internal connection facilities: some mechanism that provides communication among the CU, ALU and registers.
Each component will be examined in detail later. Several approaches to implementing the control unit are possible, but the most common is the microprogrammed implementation. It is depicted in the following diagram, which will also be discussed in this course.
Structure - The Control Unit
In this scheme, the CU consists of:
Sequential Logic.
Control Unit Registers and Decoders.
Control memory.
Chapter 2
Computer Evolution and Performance
ENIAC background: Electronic Numerical Integrator And Computer
First general-purpose electronic digital computer
Designed by: John Eckert and John Mauchly
At: University of Pennsylvania
Used to develop range and trajectory tables for weapons (computing a single value manually could take one person several days)
Started 1943
Finished 1946
Too late for the war effort
Used until 1955 (operated under the management of the Army's Ballistic Research Laboratory, BRL)
ENIAC - details
Decimal machine (not binary): i.e. number representations and operations
were performed in decimal.
20 accumulators of 10 digits
Programmed manually by switches
18,000 vacuum tubes
30 tons
1,500 square feet
140 kW power consumption
5,000 additions per second
ENIAC drawbacks (disadvantages)
Frequent failures
Entering and altering data and programs was manual; there was no program storage.
Programming was done manually by setting switches and plugging and unplugging cables, which was extremely tedious.
von Neumann/Turing (proposed as the Electronic Discrete Variable Computer, EDVAC)
Prototype of all subsequent general-purpose computers
Introduced the stored-program concept
Developed at: Princeton Institute for Advanced Study
IAS Computer
Completed 1952
Consists of:
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing them
Input and output equipment operated by control unit
Structure of the von Neumann machine
IAS – details (Features)

Memory: 1000 locations (words) x 40 bits per word
Binary numbers (data)
2 x 20-bit instructions per word
• i.e. 8 bits for the operation code (opcode)
• and 12 bits for the address of a memory word
The control unit operates by fetching instructions from the memory and executing
them one at a time.
IAS – Memory Formats
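The instruction word format can be checked with a few bit operations. A minimal sketch, assuming the IAS bit numbering in which bit 0 is the most significant of the 40 bits, so the left instruction occupies bits 0-19 with its address in bits 8-19 (matching the STOR M(X, 8:19) field in the instruction set). The helper names are hypothetical. The example packs LOAD M(500) and STOR M(501) using the opcodes listed in the IAS instruction set.

```python
def encode_ias_instruction(opcode: int, address: int) -> int:
    """Pack an 8-bit opcode and a 12-bit address into a 20-bit instruction."""
    if not (0 <= opcode < 1 << 8 and 0 <= address < 1 << 12):
        raise ValueError("opcode must fit in 8 bits and address in 12 bits")
    return (opcode << 12) | address


def encode_ias_word(left: int, right: int) -> int:
    """Place two 20-bit instructions in one 40-bit word (left = bits 0-19)."""
    if not (0 <= left < 1 << 20 and 0 <= right < 1 << 20):
        raise ValueError("each instruction is 20 bits wide")
    return (left << 20) | right


def decode_ias_word(word: int):
    """Return ((left_opcode, left_address), (right_opcode, right_address))."""
    if not 0 <= word < 1 << 40:
        raise ValueError("an IAS word is 40 bits wide")
    halves = (word >> 20, word & 0xFFFFF)        # left half, right half
    return tuple((half >> 12, half & 0xFFF) for half in halves)


word = encode_ias_word(
    encode_ias_instruction(0b00000001, 500),     # LOAD M(500)
    encode_ias_instruction(0b00100001, 501),     # STOR M(501)
)
print(decode_ias_word(word))                     # ((1, 500), (33, 501))
```

Twelve address bits can name 4096 locations, comfortably covering the 1000-word memory.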
Structure of IAS - detail
A detailed structure diagram of the IAS computer is shown next; it has the following set of registers (storage in the CPU):
Memory Buffer Register, MBR: contains a word to be stored in memory, or is used to receive a word from memory.
Memory Address Register, MAR: specifies the address in memory of the word to be written from or read into the MBR.
Instruction Register, IR: contains the 8-bit opcode of the instruction being executed.
Instruction Buffer Register, IBR: temporarily holds the right-hand instruction from a word in memory.
Program Counter, PC: holds the address of the next instruction pair to be fetched from memory.
Accumulator, AC, and Multiplier Quotient, MQ: temporarily hold operands and results of ALU operations.
Fetch-Execute Cycle
The system works by a cycle called the fetch-execute cycle,
i.e.
• Each instruction is first brought from main memory into the IR (fetch cycle). Then
• It is decoded and executed (execute cycle),
as shown below.
All instructions are taken sequentially, one after the other, unless a jump is executed.
IAS instruction set

For the IAS computer, there are 21 instructions which fall into 5 categories.
i.e.
• Data transfer: moves data between memory and ALU registers or between two ALU registers, e.g.
LOAD MQ (00001010): transfer contents of MQ to AC
LOAD MQ, M(X) (00001001): transfer contents of memory location X to MQ
STOR M(X) (00100001): transfer contents of AC to memory location X
LOAD M(X) (00000001): transfer M(X) to AC
LOAD -M(X) (00000010): transfer -M(X) to AC
LOAD |M(X)| (00000011): transfer absolute value of M(X) to AC
LOAD -|M(X)| (00000100): transfer -|M(X)| to AC
• Unconditional branch: changes the execution sequence, e.g.
JUMP M(X, 0:19): take the next instruction from the left half of M(X)
JUMP M(X, 20:39): take the next instruction from the right half of M(X)
• Conditional branch: branching depends on a condition, e.g.
JUMP +M(X, 0:19): if the number in the AC is non-negative, take the next instruction from the left half of M(X)
JUMP +M(X, 20:39): if the number in the AC is non-negative, take the next instruction from the right half of M(X)
• Arithmetic: operations performed by the ALU, e.g.
ADD M(X): add M(X) to AC; put the result in AC
ADD |M(X)|: add |M(X)| to AC; put the result in AC
SUB M(X): subtract M(X) from AC; put the result in AC
...
• Address modify: changes an address field according to a calculation in the ALU, e.g.
STOR M(X, 8:19): replace the left address field at M(X) by the 12 rightmost bits of AC
STOR M(X, 28:39): replace the right address field at M(X) by the 12 rightmost bits of AC
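The data transfer, branch and arithmetic instructions listed above can be exercised with a tiny interpreter. This is a simplified sketch, not the IAS itself: it assumes one instruction per memory word (the real machine packs two 20-bit instructions per word and uses the IBR), symbolic mnemonics instead of 8-bit opcodes, plain Python integers for data, and a hypothetical HALT instruction that the IAS table does not include. The example program computes |a - b| using a conditional branch.

```python
def run_ias(memory: dict, pc: int = 0, max_steps: int = 1000) -> int:
    """Fetch-execute loop over a dict memory; returns AC when HALT is reached."""
    ac = 0                                   # accumulator
    for _ in range(max_steps):
        op, x = memory[pc]                   # fetch: operation and address field
        pc += 1                              # default: next sequential instruction
        if op == "LOAD":                     # LOAD M(X): AC <- M(X)
            ac = memory[x]
        elif op == "LOAD-":                  # LOAD -M(X): AC <- -M(X)
            ac = -memory[x]
        elif op == "LOAD||":                 # LOAD |M(X)|: AC <- |M(X)|
            ac = abs(memory[x])
        elif op == "STOR":                   # STOR M(X): M(X) <- AC
            memory[x] = ac
        elif op == "ADD":                    # ADD M(X): AC <- AC + M(X)
            ac += memory[x]
        elif op == "SUB":                    # SUB M(X): AC <- AC - M(X)
            ac -= memory[x]
        elif op == "JUMP":                   # unconditional branch
            pc = x
        elif op == "JUMP+":                  # conditional branch: AC non-negative
            if ac >= 0:
                pc = x
        elif op == "HALT":                   # hypothetical stop instruction
            return ac
        else:
            raise ValueError(f"unknown instruction {op!r} at {pc - 1}")
    raise RuntimeError("step limit reached without HALT")


# Program and data share one memory (stored-program concept).
memory = {
    0: ("LOAD", 10),      # AC <- a
    1: ("SUB", 11),       # AC <- a - b
    2: ("STOR", 12),      # M(12) <- a - b
    3: ("JUMP+", 5),      # if AC >= 0, skip the negation
    4: ("LOAD-", 12),     # AC <- -(a - b)
    5: ("STOR", 12),      # M(12) <- |a - b|
    6: ("HALT", None),
    10: 3, 11: 8, 12: 0,  # a, b, result
}
print(run_ias(memory), memory[12])   # 5 5
```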
Commercial Computers
1947 - Eckert-Mauchly Computer Corporation
UNIVAC I (Universal Automatic Computer)
US Bureau of Census 1950 calculations
Became part of Sperry-Rand Corporation
Late 1950s - UNIVAC II
Faster
More memory
IBM
Punched-card processing equipment
1953 - the 701
IBM’s first stored program computer
Scientific calculations
1955 - the 702
Business applications
Led to the 700/7000 series, which established IBM as the dominant computer manufacturer.
Transistors
Replaced vacuum tubes (1948)
Smaller
Cheaper
Less heat dissipation
Solid State device
Made from Silicon (Sand)
Invented 1947 at Bell Labs
William Shockley et al.
Transistor Based Computers

Second generation machines
NCR & RCA produced small transistor machines
IBM 7000
DEC - 1957
Produced PDP-1
Microelectronics
Literally - “small electronics”
A computer is made up of gates, memory cells and interconnections
These can be manufactured on a semiconductor
e.g. silicon wafer
Integrated Circuits (ICs)
Generations of Computer
Vacuum tube: 1946-1957
Transistor: 1958-1964 (separate devices)
Small-scale integration (SSI): 1965 on
Up to 100 devices on a chip (up to 10 devices/cm²)
Medium-scale integration (MSI): to 1971
100 ~ 3,000 devices on a chip (10 ~ 100 devices/cm²)
Large-scale integration (LSI): 1971-1977
3,000 ~ 100,000 devices on a chip (100 ~ 1000 devices/cm²)
Very-large-scale integration (VLSI): 1978 to date
100,000 ~ 100,000,000 devices on a chip (1000 ~ 10000 devices/cm²)
Ultra-large-scale integration (ULSI): 1980s
Over 100,000,000 devices on a chip (> 10000 devices/cm²)
Moore’s Law
Increased density of components on chip
Gordon Moore - cofounder of Intel
Number of transistors on a chip will double every year
Since the 1970s development has slowed a little
Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higher performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increase reliability
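The two doubling rates quoted above (every year, later every 18 months) compound quickly. A minimal sketch of the arithmetic, assuming a constant doubling period; the starting count of 1,000 transistors is a hypothetical number for illustration, not historical data.

```python
def projected_transistors(initial_count: float, years: float,
                          doubling_months: float = 18) -> float:
    """Moore's-law style projection: the count doubles every `doubling_months`."""
    doublings = years * 12 / doubling_months
    return initial_count * 2 ** doublings


start = 1_000                                   # hypothetical starting count
print(projected_transistors(start, 3))          # 4000.0 (two 18-month doublings)
print(projected_transistors(start, 3, 12))      # 8000.0 (three yearly doublings)
```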
Growth in CPU Transistor Count
Semiconductor Memory
1970
Fairchild
Chip about the size of a single core (i.e. one bit of magnetic-core storage)
Holds 256 bits
Non-destructive read
Much faster than core
Capacity approximately doubles each year
Speeding it up
Pipelining
On board cache
On board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
Performance Mismatch
Processor speed increased
Memory capacity increased
Memory speed lags behind processor speed
DRAM and Processor Characteristics
Solutions
Increase number of bits retrieved at one time
Make DRAM “wider” rather than “deeper”
Change DRAM interface
Cache
Reduce frequency of memory access
More complex cache and cache on chip
Increase interconnection bandwidth
High speed buses
Hierarchy of buses
Chapter 3
A View of Computer Function and Interconnection
(System Buses)
Program Concept
Hardwired systems are inflexible
General purpose hardware can do different tasks, given correct control signals
Instead of re-wiring, supply a new set of control signals
What is a program?
A sequence of steps
For each step, an arithmetic or logical operation is done
For each operation, a different set of control signals is needed
Function of Control Unit

For each operation a unique code is provided
e.g. ADD, MOVE
A hardware segment accepts the code and issues the control signals
We have a computer!
Computer Components
Virtually all contemporary computer designs are based on the concepts of the von Neumann architecture, which is based on:
Data and instructions are stored in a single R/W memory.
Contents of memory are addressable.
Execution occurs in a sequential fashion unless explicitly modified.
This configuration can be achieved in two ways:
(1) Programming in hardware: the hardware performs various functions on data depending on the control signals applied to it, as shown below.
(2) Programming in software: general-purpose hardware accepts data and control signals and produces results. A unique code is provided for each possible set of control signals, and a hardware segment is added that accepts a code and generates the corresponding control signals, as shown.
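Approach (2) above can be sketched as a lookup from an instruction code to a set of control signals, feeding a piece of general-purpose "hardware" whose behaviour is fixed only by those signals. Everything here is hypothetical and for illustration only: the codes LOAD/ADD/STORE and the signal names are invented, not taken from a real machine.

```python
# Hypothetical code -> control-signal table: the "hardware segment" that
# accepts a code and generates control signals. "mem_read" would tell memory
# to supply M(X); in this sketch the memory value is simply passed in.
CONTROL_SIGNALS = {
    "LOAD":  {"mem_read", "ac_from_mem"},
    "ADD":   {"mem_read", "alu_add", "ac_from_alu"},
    "STORE": {"mem_write"},
}


def general_purpose_unit(signals: set, ac: int, mem_value: int):
    """Fixed hardware: what it does depends only on the signals applied."""
    if "ac_from_mem" in signals:
        ac = mem_value
    elif "ac_from_alu" in signals and "alu_add" in signals:
        ac = ac + mem_value
    write_value = ac if "mem_write" in signals else None
    return ac, write_value


def execute(code: str, ac: int, mem_value: int = 0):
    """Decode the code into control signals, then drive the unit with them."""
    try:
        signals = CONTROL_SIGNALS[code]
    except KeyError:
        raise ValueError(f"no control signals defined for {code!r}") from None
    return general_purpose_unit(signals, ac, mem_value)


ac, _ = execute("LOAD", 0, 7)      # AC <- 7
ac, _ = execute("ADD", ac, 5)      # AC <- 12
_, stored = execute("STORE", ac)   # value written to memory: 12
```

Changing the "program" means supplying a different sequence of codes; the general-purpose unit itself is never rewired.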
The Control Unit and the Arithmetic and Logic Unit constitute the Central
Processing Unit, CPU.
Data and instructions need to get into the system, and results need to get out:
Input/output
Temporary storage of code and results is needed:
Main memory: holds data and instructions during execution
Computer Components:
Top Level View
I/O AR (I/O address register): specifies a particular I/O device.
I/O BR (I/O buffer register): used for the exchange of data between an I/O module and the CPU.
I/O module: transfers data from external devices to the CPU and memory, and vice versa.
Memory locations are defined by sequentially numbered addresses.
Instruction Cycle
The processing required for a single instruction is called an instruction cycle. A program consists of a set of instructions stored in memory and executed sequentially, one instruction cycle at a time.
Each instruction cycle consists of two steps:
Fetch: read the instruction from memory.
Execute: execute the instruction.
Basic Instruction cycle
Fetch Cycle
Program Counter (PC) holds address of next instruction to be fetched
Processor fetches instruction from memory location pointed to by PC
Increment PC (i.e. PC = PC + 1)
Unless told otherwise
Instruction loaded into Instruction Register (IR)
Processor interprets instruction and performs required actions
Execute Cycle
Processor-memory
data transfer between CPU and main memory
Processor I/O
Data transfer between CPU and I/O module
Data processing
Some arithmetic or logical operation on data
Control
Alteration of sequence of operations
e.g. jump
Combination of above
Example
Suppose a hypothetical machine is used with the following instruction and
integer formats:
Suppose the PC is set to location 300. The processor fetches the instruction at location 300, and the PC changes to 301.
The content of location 300 is placed in the IR.
This content is interpreted and the required action is taken (e.g. opcode 0001 means: load the AC with the content of address 940).
The next step loads the AC with 003 (the content of location 940).
Example of Program Execution
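The example above can be traced with a small fetch-decode-execute loop. A minimal sketch under stated assumptions: 16-bit instructions split into a 4-bit opcode and a 12-bit address; addresses and contents written in hexadecimal; opcode 0001 = load AC (from the text), 0010 = store AC and 0101 = add to AC (taken from the textbook's version of this example, so treat them as assumptions); and a hypothetical HALT opcode 0000 so the loop stops. The program loads location 940, adds location 941 and stores the sum back into 941.

```python
LOAD_AC, STORE_AC, ADD_AC, HALT = 0x1, 0x2, 0x5, 0x0   # 0x0 = HALT is hypothetical


def run(memory: dict, pc: int):
    """Run fetch-execute cycles until HALT; return the final (AC, PC)."""
    ac = 0
    while True:
        ir = memory[pc]                          # fetch: IR <- M[PC]
        pc += 1                                  # PC <- PC + 1
        opcode, address = ir >> 12, ir & 0xFFF   # decode: 4-bit opcode, 12-bit address
        if opcode == LOAD_AC:
            ac = memory[address]                 # AC <- M[address]
        elif opcode == STORE_AC:
            memory[address] = ac                 # M[address] <- AC
        elif opcode == ADD_AC:
            ac = ac + memory[address]            # AC <- AC + M[address] (overflow ignored)
        elif opcode == HALT:
            return ac, pc
        else:
            raise ValueError(f"undefined opcode {opcode:#x} at {pc - 1:#x}")


memory = {
    0x300: 0x1940,   # load AC from 940
    0x301: 0x5941,   # add M[941] to AC
    0x302: 0x2941,   # store AC into 941
    0x303: 0x0000,   # HALT (hypothetical)
    0x940: 0x0003,
    0x941: 0x0002,
}
ac, pc = run(memory, 0x300)
print(ac, hex(pc), memory[0x941])   # 5 0x304 5
```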
Instruction Cycle - State Diagram
Instruction address calculation (iac): determines the address of the next instruction. Usually this means adding a fixed number to the previous address: e.g. for 16-bit instructions, add 1 if memory is organized as 16-bit words, or 2 if memory is byte-addressable.
Instruction fetch (if): read the instruction from its memory location into the IR.
Instruction operation decoding (iod): analyse the instruction to determine the type of operation to be performed.
Operand address calculation (oac): if the operation involves a reference to an operand in memory or available via I/O, determine the address of the operand.
Operand fetch (of): fetch the operand from memory or read it in from I/O.
Data operation (do): perform the operation indicated in the instruction.
Operand store (os): write the result into memory or out to I/O.
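The iac rule and the order of states above can be sketched in a few lines. This is a minimal illustration, assuming fixed-length instructions; the function names are hypothetical, and the operand and result counts are parameters standing in for the loops the state diagram allows for multiple operands and results.

```python
def next_instruction_address(pc: int, instruction_bits: int,
                             addressable_unit_bits: int) -> int:
    """iac: advance the address past one fixed-length instruction."""
    if instruction_bits % addressable_unit_bits:
        raise ValueError("instruction must span whole addressable units")
    return pc + instruction_bits // addressable_unit_bits


def instruction_states(source_operands: int, results: int) -> list:
    """Order in which one instruction visits the states of the diagram."""
    states = ["iac", "if", "iod"]
    states += ["oac", "of"] * source_operands    # locate and fetch each operand
    states.append("do")                          # perform the operation
    states += ["oac", "os"] * results            # locate and store each result
    return states


print(next_instruction_address(300, 16, 16))   # 301: 16-bit words
print(next_instruction_address(300, 16, 8))    # 302: byte-addressable memory
print(instruction_states(1, 1))
```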
Interrupts
Mechanism by which other modules (e.g. I/O) may interrupt the normal sequence of processing. Several classes of interrupt exist:
Program: generated by some condition that occurs as a result of instruction execution, such as arithmetic overflow, division by zero, or an attempt to execute an illegal machine instruction.
Timer: generated by a timer within the processor. This allows the OS to perform certain functions on a regular basis.
I/O: generated by an I/O controller to signal normal completion of an operation or to signal a variety of error conditions.
Hardware failure: generated by a failure, such as a power failure or a memory parity error.
Interrupts are used mainly to improve processing efficiency, as shown in the following example
Program Flow Control
(a) No interrupts: the processor waits for the whole write operation to finish, which could take quite long.
The I/O program consists of:
• A sequence of instructions to prepare for the actual I/O operation (including copying data to the printer buffer and preparing printer commands), labeled 4 in the figure.
• The actual I/O command.
• A sequence of instructions to complete the operation (which may set a flag indicating success or failure of printing), labeled 5 in the figure.
(b) Interrupts included (short I/O wait):
When a write call is executed, it invokes the I/O program, but only the preparation code and the I/O command are executed; control then returns to the user program.
The program then continues execution while data is transferred from memory to the printer, concurrently.
When the I/O device is ready for more output, its I/O module sends an interrupt request signal to the processor, which suspends execution of the program, branches to the interrupt handler to service the I/O device, then resumes the program, as in figure (b).
(c) Interrupts with a long I/O wait: if the user program reaches the next write call before the previous I/O operation has completed, it must wait at that point; processing and I/O still overlap for part of the time, as shown in the diagram.
Example for the timing diagram
Interrupt Cycle
Added to instruction cycle

Processor checks for interrupt
Indicated by an interrupt signal
If no interrupt, processor proceeds to fetch next instruction
If interrupt pending:
Suspend execution of current program
Save context (register contents, including all intermediate values and addresses)
Set PC to start address of interrupt handler routine (which is part of the
OS)
Process interrupt
Restore context and continue interrupted program
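The interrupt cycle above can be modelled as a check made after every instruction. A toy sketch, assuming: instructions are just labels that get logged when executed, the context saved is only the PC, the handler runs to completion (further interrupts wait), and the instruction labels, handler contents and the moment the request arrives are all hypothetical.

```python
def run_with_interrupts(program, handlers, raised_after):
    """Return the order in which instructions are executed.

    program      : list of user-program instruction labels
    handlers     : interrupt name -> list of handler instruction labels
    raised_after : step number -> interrupt name raised after that step
    """
    executed = []
    pending = []                         # interrupt requests not yet serviced
    pc = 0
    step = 0
    while pc < len(program):
        executed.append(program[pc])     # fetch + execute (modelled as logging)
        pc += 1
        step += 1
        if step in raised_after:         # an I/O module raises an interrupt
            pending.append(raised_after[step])
        if pending:                      # interrupt cycle: is an interrupt pending?
            irq = pending.pop(0)
            saved_pc = pc                # save context (here, only the PC)
            executed.extend(handlers[irq])   # PC <- handler start; run the handler
            pc = saved_pc                # restore context, resume the program
    return executed


trace = run_with_interrupts(
    ["u1", "u2", "u3"],
    {"printer": ["h1", "h2"]},
    {1: "printer"},
)
print(trace)   # ['u1', 'h1', 'h2', 'u2', 'u3']
```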
Instruction Cycle (with Interrupts) - State Diagram
Multiple Interrupts
Disable interrupts
Processor will ignore further interrupts while processing one interrupt
Interrupts remain pending and are checked after first interrupt has been
processed
Interrupts handled in sequence as they occur
Define priorities
Low priority interrupts can be interrupted by higher priority interrupts
When higher priority interrupt has been processed, processor returns to
previous interrupt
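The priority approach can be simulated one time unit at a time. A sketch under assumptions: a higher number means higher priority, the user program has priority 0, each handler needs a fixed number of time units, and the arrival times and priorities (printer 2, communications 5, disk 4, each handler 10 units) are modelled on the textbook's nested-interrupt example and are illustrative only. The simpler "disable interrupts" policy would instead finish each handler before accepting the next request.

```python
def nested_interrupts(arrivals: dict, service_time: dict, end_time: int):
    """Return who runs in each time unit under priority-based nested interrupts.

    arrivals     : time -> list of (interrupt name, priority); user program = 0
    service_time : interrupt name -> time units its handler needs
    """
    active = []      # stack of [name, priority, remaining] for handlers in progress
    pending = []     # requests not yet accepted
    timeline = []
    for t in range(end_time):
        pending.extend(arrivals.get(t, []))
        current = active[-1][1] if active else 0
        if pending:
            best = max(pending, key=lambda req: req[1])
            if best[1] > current:                # only a higher priority preempts
                pending.remove(best)
                active.append([best[0], best[1], service_time[best[0]]])
        if active:
            active[-1][2] -= 1                   # run the top handler for one unit
            timeline.append(active[-1][0])
            if active[-1][2] == 0:
                active.pop()                     # return to whatever it interrupted
        else:
            timeline.append("user")
    return timeline


def segments(timeline):
    """Compress a per-unit timeline into (who, start, end) runs."""
    runs = []
    for t, who in enumerate(timeline):
        if runs and runs[-1][0] == who:
            runs[-1] = (who, runs[-1][1], t + 1)
        else:
            runs.append((who, t, t + 1))
    return runs


timeline = nested_interrupts(
    arrivals={10: [("printer", 2)], 15: [("comms", 5)], 20: [("disk", 4)]},
    service_time={"printer": 10, "comms": 10, "disk": 10},
    end_time=45,
)
for run in segments(timeline):
    print(run)
```

The disk request arriving at t = 20 waits because communications has higher priority; when communications finishes, the disk handler runs before the interrupted printer handler is resumed.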
Multiple Interrupts – Sequential
Multiple Interrupts – Nested
Connecting
A computer is a network of basic modules. All the units must be connected by paths; the collection of paths is called the interconnection structure (buses), and its design depends on the exchanges that must be made between modules.
Different types of units need different types of connection, as follows:
Memory: consists of N words of equal length, addressed 0, 1, 2, ..., N-1
• Operations:
– Read, write control
– Input: Data and Addresses
– Output: Data
Memory Connection
Receives and sends data
Receives addresses (of locations)
Receives control signals
Read
Write
Timing
Input / Output ( I/O Module):
Similar to memory from computer’s
viewpoint
Output
Receive data from computer
Send data to peripheral
Input
Receive data from peripheral
Send data to computer
Receive control signals from computer

Send control signals to peripherals
e.g. spin disk
Receive addresses from computer
e.g. port number to identify peripheral
Send interrupt signals (control)
Processor:
Reads in instruction and data
Writes out data (after processing)
Sends control signals to other units
Receives (& acts on) interrupts
Generally, the interconnection structure must support the following types of
transfer:
• Memory to processor
• Processor to memory
• I/O to processor
• Processor to I/O
• I/O to/from memory
Buses
There are a number of possible interconnection systems

Single and multiple bus structures are most common
e.g. Control/Address/Data bus (PC)
e.g. Unibus (DEC-PDP)
What is a Bus?
A bus is a communication pathway connecting two or more devices.
Shared transmission medium: only one device may transmit at a time.
Usually broadcast
Often grouped
A number of channels in one bus
e.g. a 32-bit data bus is 32 separate single-bit channels, rather than one line carrying the bits serially.
Power lines may not be shown
Bus interconnections:
A computer may have different buses for various purposes; a bus that connects the major computer components (processor, memory, I/O) is called a system bus. Such a bus carries address, data and control information and may consist of 50 to 100 lines, each with a particular function.
The lines can be classified into three functional groups:
Data
Address
Control
In addition, there may be power distribution lines that supply power to the attached modules.
Data Bus
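Data lines provide a path for moving data between system modules; collectively they are called the data bus. The data bus may consist of 8, 16, 32 or more separate lines; the number of lines is called the data bus width, and it affects the speed of data transfer.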
Carries data
Remember that there is no difference between “data” and “instruction”
at this level
Width is a key determinant of performance
8, 16, 32, 64 bit
Address bus
Identify the source or destination address of data in memory or I/O module
e.g. CPU needs to read an instruction (data) from a given location in memory
Bus width determines maximum memory capacity of system
e.g. the Intel 8080 has a 16-bit address bus, giving a 64K (2^16 = 65,536) address space, i.e. a maximum memory size of 64K locations.
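The relationship between address bus width and address space is a single power of two. A minimal sketch; the function name is hypothetical and the widths other than the 8080's 16 lines are arbitrary examples.

```python
def address_space(address_lines: int) -> int:
    """Number of distinct locations an address bus of this width can select."""
    if address_lines < 0:
        raise ValueError("width must be non-negative")
    return 2 ** address_lines


for width in (8, 16, 20, 32):
    size = address_space(width)
    print(f"{width:2d} lines -> {size:,} locations (highest address {size - 1:#x})")
```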
Control Bus
Used to carry control signals and timing information. Typical control lines include the following:
Memory read/write signal
Interrupt request
Clock signals
I/O read/write signal
Status information
Acknowledgement
Bus request
Bus granting
Interrupt acknowledgement
Reset
Bus Interconnection Scheme
The operation of the bus is done in two steps:
To send data to another module:
1. Obtain the use of the bus.
2. Transfer the data via the bus.
To request data from another module:
1. Obtain the use of the bus.
2. Transfer a request to the other module over the appropriate control and address lines, then wait for that module to send the data.
What do buses look like?

Parallel lines on circuit boards
Ribbon cables
Strip connectors on motherboards, e.g. PCI
Sets of wires
Single Bus Problems
Lots of devices or modules on one bus lead to:
Propagation delays
• Long data paths mean that coordination of bus use can adversely affect performance
The bus becoming a bottleneck
• if the aggregate data transfer demand approaches the bus capacity
Most systems use multiple buses to overcome these problems
These problems can be overcome by one of the following:
(1) Increasing the data transfer rate by using wider buses
- e.g. use a 64-line data bus instead of 32 lines
(2) Using multiple buses, as in the traditional bus architecture shown next.
Traditional (ISA) (with cache)
This bus hierarchy introduces a cache memory, and main memory is moved off the local bus onto the system bus, isolated from the CPU. As a result, I/O transfers to and from main memory do not interfere with processor activity.
An expansion bus is also added to increase efficiency, allowing support for a wide variety of I/O devices, such as:
Network connection to a LAN
SCSI (Small Computer System Interface) to a local hard disk or other devices
Serial port to a printer
Scanner
Modem to the Internet, ...
Another improvement is the use of a high-performance architecture known as the mezzanine architecture, in which a high-speed bus is integrated into the system, as follows:
High Performance Bus
The cache controller is integrated into a bridge or buffer device that connects to the high-speed bus, which supports:
high-speed LANs (such as 100 Mbps Ethernet),
video and graphics controllers,
SCSI controllers, and
FireWire controllers (FireWire: a high-speed bus arrangement designed for high-capacity I/O devices).
Lower-speed devices such as a modem, serial printer or fax are still supported on the expansion bus interface.
Elements of Bus Design
They can be categorized as follows:

1. Type:
Dedicated
Multiplexed
2. Method of arbitration:
Centralized
Decentralized
3. Timing:
Synchronous
Asynchronous
4. Bus width
Address
Data
5. Data Transfer type:
Read
Write
Read-modify-write
Read-after-write
Block
1. Bus Types:
Dedicated
Permanently assigned bus lines to one function or physical subset of
computer components.
e.g. Separate data & address lines
Multiplexed
Data and address can be sent on the same bus (or shared lines) by using
an address valid or data valid control line.
• Advantage: fewer lines are used, saving space and cost.
• Disadvantages:
• More complex control circuitry
• Potential reduction in ultimate performance, because events that share the same lines cannot take place in parallel
2. Bus Arbitration
Centralized arbitration
Only single bus controller (or arbiter) is responsible for allocating
time on the bus (bus access) for each module.
[May be part of CPU or separate]
Decentralized arbitration
More than one module takes part: each module has its own access control logic, and all modules act together to share the bus. But only one module may control the bus at a time.
In both schemes, the module that gains control (e.g. the CPU or a DMA controller) acts as master and may initiate a data transfer with another device, which acts as slave for that particular exchange.
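Centralized arbitration can be sketched as a single arbiter object that grants the bus to exactly one requester per bus cycle. The round-robin policy used here is one possible choice, picked for illustration because it prevents any master from being starved; the text does not specify a policy, and the module names are hypothetical.

```python
class RoundRobinArbiter:
    """Sketch of a centralized bus arbiter (round-robin policy assumed)."""

    def __init__(self, masters):
        self.masters = list(masters)   # modules that may become bus master
        self.last = -1                 # index of the most recent grant

    def grant(self, requests):
        """Return the single master granted the bus this cycle, or None."""
        n = len(self.masters)
        for offset in range(1, n + 1):             # search after the last winner
            i = (self.last + offset) % n
            if self.masters[i] in requests:
                self.last = i
                return self.masters[i]
        return None                                # nobody requested the bus


arb = RoundRobinArbiter(["cpu", "dma", "io"])
everyone = {"cpu", "dma", "io"}
print([arb.grant(everyone) for _ in range(4)])   # ['cpu', 'dma', 'io', 'cpu']
```

A decentralized scheme would spread this logic across the modules themselves, but the invariant is the same: at most one master per cycle.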
3. Timing
Timing is the coordination of events on the bus; it can be synchronous or asynchronous.
Synchronous timing is simple, but all devices are tied to a fixed clock rate.
With asynchronous timing, mixing old and new (or slow and fast) devices is possible, but more complicated bus control is required.
Synchronous
Events determined by clock signals
The control bus includes a clock line, which carries a regular sequence of clock pulses that can be read by all devices
A single 1-0 clock cycle is a bus cycle
Usually sync on leading edge
Usually a single cycle for an event
A read operation is shown next
Synchronous Timing Diagram (for a read operation)
Asynchronous Timing
The occurrence of one event on the bus follows, and depends on, the occurrence of a previous event.
The asynchronous timing diagram for a READ operation is:
The asynchronous timing diagram for WRITE operation is:
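The asynchronous read can be modelled as a fully interlocked handshake in which each step is triggered by the other side's previous signal change, not by a clock. A sketch under assumptions: two threads stand in for master and slave, a dict stands in for the bus lines, and the signal names READ and ACK are hypothetical labels for the read and acknowledge lines.

```python
import threading


def asynchronous_read(memory: dict, address: int, timeout: float = 1.0):
    """Model a 4-phase asynchronous read; return the value and the event order."""
    bus = {"address": None, "read": False, "data": None, "ack": False}
    cond = threading.Condition()
    events = []

    def wait_until(predicate):
        # Each side waits for the other's signal change; there is no clock.
        if not cond.wait_for(predicate, timeout):
            raise TimeoutError("handshake stalled")

    def slave():
        with cond:
            wait_until(lambda: bus["read"])           # see READ asserted
            bus["data"] = memory[bus["address"]]      # place data on the bus
            bus["ack"] = True                         # then acknowledge
            events.append("slave: data + ACK")
            cond.notify_all()
            wait_until(lambda: not bus["read"])       # see READ released
            bus["data"], bus["ack"] = None, False     # release data and ACK
            events.append("slave: release data + ACK")
            cond.notify_all()

    slave_thread = threading.Thread(target=slave)
    slave_thread.start()
    with cond:                                        # master side
        bus["address"] = address                      # address first...
        bus["read"] = True                            # ...then READ
        events.append("master: address + READ")
        cond.notify_all()
        wait_until(lambda: bus["ack"])                # wait for ACK
        value = bus["data"]                           # latch the data
        bus["read"] = False                           # release READ
        events.append("master: latch data, release READ")
        cond.notify_all()
        wait_until(lambda: not bus["ack"])            # wait for ACK release
    slave_thread.join()
    return value, events


value, events = asynchronous_read({0x10: 42}, 0x10)
print(value)     # 42
print(events)
```

Because every step waits on the previous event, a slow slave simply stretches the handshake; nothing in the protocol assumes a fixed clock rate.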
4. Bus width
The wider the data bus, the greater the number of bits transferred at one time.
The wider the address bus, the greater the range of locations that can be referenced.
e.g.
Data bus: 8 lines means 8 bits/transfer
16 lines means 16 bits/transfer, etc.
Address bus: 8 lines means 2^8 = 256 locations
16 lines means 2^16 = 65,536 (64K) locations, etc.
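The effect of data bus width can be seen by counting transfers. A minimal sketch that ignores address cycles, wait states and arbitration; the 64-byte block size and the function name are hypothetical.

```python
def bus_transfers(block_bytes: int, data_lines: int) -> int:
    """Transfers needed to move a block of bytes over a data bus of the given width."""
    if block_bytes < 0 or data_lines <= 0:
        raise ValueError("invalid size or width")
    bits = block_bytes * 8
    return (bits + data_lines - 1) // data_lines   # round up: a partial transfer still costs one


for width in (8, 16, 32, 64):
    print(f"{width:2d}-line data bus: {bus_transfers(64, width)} transfers for a 64-byte block")
```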
5. Data Transfer Type
Write (multiplexed) operation
Read (multiplexed) operation
Write (non-multiplexed) operation: data and address are sent by the master in the same cycle over separate bus lines
Read (non-multiplexed) operation: the master puts the address on the address bus and a read control signal on the control bus, then waits for the data to come on the data bus
Read-modify-write operation: a read followed immediately by a write to the same address, as one indivisible operation
Read-after-write operation: a write followed immediately by a read from the same address, for checking the written data
Block data transfer: for multiple reads or writes
PCI Bus
Peripheral Component Interconnect
Intel released it to the public domain in 1990.
32 or 64 bit
50 lines
PCI is a high-bandwidth, processor-independent bus that can function as a mezzanine or peripheral bus.
Better system performance for high-speed I/O subsystems (e.g. graphics display adapters, network controllers, disk controllers, etc.)
Currently allows: 64-bit data bus, 66 MHz clock, and a transfer rate of 528 Mbyte/s, or 4.224 Gbps.
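The quoted PCI figure follows from width times clock rate. A minimal sketch, assuming one data transfer per clock cycle with no wait states or protocol overhead, so it gives a peak rather than a sustained rate.

```python
def peak_bandwidth_bytes_per_s(data_bits: int, clock_hz: int) -> int:
    """Peak rate assuming one data transfer per clock cycle (no overhead)."""
    return data_bits * clock_hz // 8


pci = peak_bandwidth_bytes_per_s(64, 66_000_000)
print(pci)        # 528000000 bytes/s  = 528 Mbyte/s
print(pci * 8)    # 4224000000 bits/s  = 4.224 Gbps
```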
PCI Bus Lines (required)

Systems lines
Including clock and reset
Address & Data
32 time multiplexed lines for address/data
Interrupt & validate lines
Interface Control: control the timing of transaction and provide coordination
among initiators and targets.
Arbitration pins:
Not shared. i.e. each PCI master has its own pair of arbitration lines
that connect it directly to the PCI bus arbiter.
Error lines: used to report parity and other errors
PCI Bus Lines (Optional)
Interrupt lines
Not shared
Cache support
64-bit Bus Extension
Additional 32 lines
Time multiplexed
2 lines to enable devices to agree to use 64-bit transfer
JTAG/Boundary Scan
For testing procedures defined in IEEE Standard 1149.1
Further Reading:
www.pcguide.com/ref/mbsys/buses/
www.pcguide.com
Chapter 4 & 5
Internal Memory and Cache Memory
Characteristics:
Memories are characterized by:
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
Internal (or CPU): accessed directly
Processor (CU) registers
Main memory
Cache
External: always accessed via an I/O controller
Peripheral storage devices:
• Disks
• Tapes
Capacity (or maximum size)

It is measured in number of words, bytes or bits, where:
Word size: the natural unit of organisation of memory, usually equal to the data length (and sometimes to the instruction length).
Byte: 8 bits.
For internal memory, capacity is expressed in bytes (8 bits) or words (8, 16, 32 or 64 bits).
For external memory, capacity is expressed in bytes.
Unit of Transfer
Internal: the number of data lines into and out of a memory module, often equal to the word length
Usually governed by data bus width
External: the number of bits read out of or written into the memory at a time
Usually a block of data (2 ~ 4 Kbytes), which is much larger than a word
Addressable unit:
Smallest location that can be uniquely addressed
Usually the word itself internally, but some systems allow byte-level addressing.
If an address is A bits long, the number of addressable units is N = 2^A
Access Methods
Sequential access
Start at the beginning and read through in order
Access time depends on location of data and previous location
e.g. tape
Direct access
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
e.g. disk
Random
Individual addresses identify locations exactly (i.e. each location has its own physical, wired-in addressing mechanism).
Access time is independent of location or previous access, (i.e. same
for all locations)
e.g. RAM
Associative
Data is located by a comparison with contents of a portion of the store
Access time is independent of location or previous access
e.g. cache
Performance
Access time: The time it takes to perform a read or write operation. i.e.
Time between presenting the address and getting the valid data
Memory Cycle time
Time may be required for the memory to “recover” before next access
Cycle time = access time + recovery time
Transfer Rate
Rate at which data can be moved or transferred into or out of a
memory unit.
For random-access memory, transfer rate = 1/(cycle time).
For non-random-access memory:
T_N = T_A + N/R
where
T_N: average time to read or write N bits
T_A: average access time
N: number of bits
R: transfer rate (bits/sec.)
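Both transfer-rate formulas translate directly into code. A minimal sketch; the function names are hypothetical and the figures in the usage lines (a 100 ns cycle, a 10 ms access time, 1 Mbit at 100 Mbit/s) are illustrative, not measurements of any real device.

```python
def ram_transfer_rate(cycle_time_s: float) -> float:
    """Random-access memory: one transfer per cycle, so rate = 1 / cycle time."""
    return 1.0 / cycle_time_s


def time_to_transfer(n_bits: int, access_time_s: float, rate_bps: float) -> float:
    """Non-random-access memory: T_N = T_A + N / R."""
    return access_time_s + n_bits / rate_bps


# Hypothetical figures, for illustration only.
print(ram_transfer_rate(100e-9))                  # transfers per second for a 100 ns cycle
print(time_to_transfer(1_000_000, 0.01, 100e6))   # 10 ms access + 1 Mbit at 100 Mbit/s
```

For the non-random device, the access time and the streaming time contribute equally here (10 ms each), which is why block transfers are used to amortize the access time.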
Physical Types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD & DVD
Others
Bubble
Hologram
Physical Characteristics
Volatility: volatile memory loses its contents when the power is switched off, e.g. semiconductor RAM.
Non-volatile: e.g. magnetic memory and read-only memory (ROM)
Erasable (alterable), e.g. magnetic memory, and non-erasable (permanent), e.g. read-only memory (ROM).
Power consumption
Organisation
A memory design issue: the physical arrangement of bits to form words
Not always obvious
e.g. interleaved
Memory Hierarchy
Roughly, it includes the following:
Registers
In CPU
Internal or Main memory

May include one or more levels of cache
“RAM”
External memory
Backing store
The Bottom Line

The questions raised in the design process are:
How much?
Capacity or size: depends on the applications; if the capacity is there, applications will be developed to use it.
How fast?
Time is money: memory should be able to keep up with the processor.
How expensive? The cost must be reasonable and affordable.
There is a trade-off between cost, capacity and access time:
Faster access time --- greater cost per bit
Greater capacity --- smaller cost per bit
Greater capacity --- slower access time
For a better balance, do not rely on a single memory component or technology; employ a memory hierarchy, as below:
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
So you want fast?

It is possible to build a computer which uses only static RAM (see later)
This would be very fast
This would need no cache
How can you cache cache?
This would cost a very large amount
Therefore smaller, more expensive and faster memories are supplemented by larger, cheaper and slower memories
Example
Suppose a computer system has two memory levels:
Level 1 (cache): access time T1 = 0.1 μs, capacity 1000 words
Level 2 (RAM): access time T2 = 1 μs, capacity 100 000 words
If a word to be accessed is in level 2, it is first transferred to level 1 and then accessed directly by the processor from level 1.
(Ignore the time required to determine whether the word is in level 1 or level 2.)
If 95% of accessed words are found in the cache (hit ratio H = 0.95), then the average time to access a word is:
$T_{avg} = H \cdot T_1 + (1 - H)(T_1 + T_2)$
$T_{avg} = (0.95)(0.1) + (0.05)(0.1 + 1) = 0.095 + 0.055 = 0.15\ \mu s$
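The example above can be reproduced with a small sketch. It assumes the model stated in the example: a miss costs T2 to bring the word into level 1 plus T1 to access it there. The function name is hypothetical.

```python
def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Two-level memory: a hit costs T1; a miss costs T2 (transfer to level 1) plus T1."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)


t_avg = average_access_time(0.95, 0.1, 1.0)   # times in microseconds
print(f"{t_avg:.3f} us")                      # prints 0.150 us

# How the average moves with the hit ratio (same hypothetical T1 and T2):
for h in (0.5, 0.9, 0.95, 0.99):
    print(h, round(average_access_time(h, 0.1, 1.0), 4))
```

As H approaches 1, the average approaches T1. This is why locality of reference makes a small cache effective.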
Locality of Reference
During the course of the execution of a program, memory references tend to cluster, e.g. in loops.
Semiconductor Memory
Main Memory: called Random Access Memory (RAM)
Built of semiconductor microelectronics, with the following features:
The name is a misnomer, because all semiconductor memories (not only main memory) are random access
Read/Write
Volatile
Temporary storage
Either (1) Dynamic OR (2) Static
Dynamic RAM
Bits stored as charge in capacitors
Charges leak
Need refreshing even when powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Suitable as Main memory
Static RAM
Bits stored as on/off switches
No charges to leak
No refreshing needed when powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Suitable as Cache
Read Only Memory (ROM)
Permanent storage
Useful for Microprogramming and other applications like:
Frequently used library subroutines
Systems programs (BIOS)
Function tables
Problems with ROM:
High fixed cost, especially when only a few copies are needed.
No room for errors: if one bit is wrong, the whole batch must be thrown out.
Types of ROM
Written during manufacture (simple ROM)
Very expensive for small runs
Programmable (once)
PROM: can be programmed electrically once.
Cheaper and more convenient for small runs, BUT needs special equipment to program
Read “mostly”: can be rewritten many times
Erasable Programmable (EPROM)
– Erased by UV light
Electrically Erasable (EEPROM)
– Takes much longer to write than read
Flash memory
– Erased electrically, the whole memory or whole blocks at a time
Semiconductor Memory Types
Organisation:
The basic element of a semiconductor memory is the memory cell, which has the following properties:
Exhibits two stable states, representing 0 and 1.
Can be written into (at least once) to set the state.
Can be sensed (read) to determine the state.
The basic element has three functional terminals:
Select: selects the cell for a read or write operation.
Control: indicates read or write.
In/Out terminal: carries data in (write) and data out, i.e. the sensed data (read).
Memory Organisation in Detail

Each memory unit contains an array of memory cells, organised into m words of b bits each.
For example:
A 16 Mbit chip can be organised as 1M words of 16 bits each
A bit-per-chip system has 16 lots of 1 Mbit chips, with bit 1 of each word in chip 1, and so on
A 16 Mbit chip can be organised as a 2048 x 2048 x 4 bit array (four square arrays)
Reduces the number of address pins
– Multiplex the row address and the column address
– 11 pins to address each dimension ($2^{11} = 2048$)
– Adding one more pin doubles the range of both row and column values, so capacity grows by a factor of 4
Typical 16 Mb DRAM (4M x 4)
Notes
Address lines A0–A10 are half the number (22) expected for a 2048 x 2048 array, i.e. a saving in the number of pins.
First, 11 bits define the row address; then a second 11 bits define the column address.
These signals are accompanied by the Row Address Select (RAS) and Column Address Select (CAS) signals.
WE: Write enable
OE: Output enable (read)
A refresh counter is needed in DRAMs. It steps through all of the row values.
For each row, the output lines from the refresh counter are supplied to the row decoder and the RAS line is activated. This causes each cell in the row to be refreshed.
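The sketch below shows how a 22-bit cell address for the 4M x 4 chip could be sent over the 11 shared pins A0–A10, one half with RAS and the other with CAS. It assumes the row is the high-order half; the constant and function names are hypothetical.

```python
ROW_BITS = 11   # row half, latched when RAS is asserted
COL_BITS = 11   # column half, latched when CAS is asserted


def split_dram_address(addr: int) -> tuple[int, int]:
    """Split a 22-bit cell address into (row, column), each 11 bits wide."""
    if not 0 <= addr < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range for a 4M-location chip")
    row = addr >> COL_BITS                 # high 11 bits, sent first on A0..A10
    col = addr & ((1 << COL_BITS) - 1)     # low 11 bits, sent second on A0..A10
    return row, col


row, col = split_dram_address((5 << 11) | 9)
print(row, col)   # the same 11 pins carry 5, then 9
```

Two 11-bit transfers over the same pins replace one 22-bit transfer. This is the pin saving described in the notes, paid for by a slightly longer addressing sequence.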
Packaging
D1 – D4: in/out
A0 – A10, CAS, RAS, OE & WE: input only
VCC: power supply to the chip
VSS: ground pin
Error Correction
Hard Failure
Permanent physical defect, so that the affected memory cells cannot reliably store data.
Caused by:
– Harsh environmental abuse,
– Manufacturing defects and
– Wear.
Soft Error
Random, non-destructive event
No permanent damage to memory
Caused by:
– Power supply problems
– Alpha particles (from radioactive decay)
Both hard and soft errors are undesirable, so memory systems often include logic for detecting and correcting errors, e.g. a Hamming error-correcting code.
Error Correcting Code Function
Generally, the process takes the following form (see the Error Correcting Code Function figure):
When M bits of data are to be written into memory, a calculation, depicted as a function f, is performed on the data to produce a K-bit code.
The M data bits and the K code bits (M + K bits in total) are stored.
When the word is read out, a new K-bit code is generated from the M data bits and compared with the K-bit code fetched from memory. The comparison is used to detect, and possibly correct, errors.
The result is one of three:
1- No error is detected.
2- An error is detected, and it can be corrected.
3- An error is detected, but it cannot be corrected.
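The process above can be illustrated with a single-error-correcting (SEC) Hamming code. The sketch below follows one common convention, which is an assumption here: check bits sit at the power-of-two bit positions 1, 2, 4, 8, ..., and check bit i is the even parity of the data bits whose position number has bit i set. With 8 data bits, this gives K = 4 and 12 stored bits. The XOR of the stored code and the regenerated code (the syndrome) points at the bit in error. All function names are hypothetical.

```python
def num_check_bits(m: int) -> int:
    """Smallest K with 2**K >= M + K + 1, so every bit position (or 'no error') has a code."""
    k = 0
    while (1 << k) < m + k + 1:
        k += 1
    return k


def compute_code(word: list[int], n: int, k: int) -> int:
    """The function f: check bit i = XOR of data bits whose position has bit i set.
    word is 1-based (word[0] unused); powers of two are check positions."""
    code = 0
    for i in range(k):
        parity = 0
        for pos in range(1, n + 1):
            is_data = (pos & (pos - 1)) != 0
            if is_data and (pos & (1 << i)):
                parity ^= word[pos]
        code |= parity << i
    return code


def hamming_encode(data: list[int]) -> list[int]:
    """Return the M + K bits to store (bit position 1 first)."""
    k = num_check_bits(len(data))
    n = len(data) + k
    word = [0] * (n + 1)
    bits = iter(data)
    for pos in range(1, n + 1):
        if pos & (pos - 1):                 # data position
            word[pos] = next(bits)
    code = compute_code(word, n, k)
    for i in range(k):
        word[1 << i] = (code >> i) & 1      # place check bits at 1, 2, 4, 8, ...
    return word[1:]


def hamming_check(stored: list[int]) -> tuple[str, list[int]]:
    """Regenerate the code from the data bits and compare it with the stored code."""
    n = len(stored)
    k = n.bit_length()                      # number of powers of two <= n
    word = [0] + list(stored)
    stored_code = sum(word[1 << i] << i for i in range(k))
    syndrome = stored_code ^ compute_code(word, n, k)
    if syndrome == 0:
        return "no error", list(stored)
    if syndrome <= n:
        word[syndrome] ^= 1                 # flip the bit the syndrome points at
        return "corrected", word[1:]
    # SEC alone cannot reliably flag double errors; SEC-DED adds one more parity bit.
    return "uncorrectable", list(stored)


stored = hamming_encode([1, 0, 1, 1, 0, 0, 1, 0])  # 8 data bits + 4 check bits
damaged = stored.copy()
damaged[5] ^= 1                                    # flip bit position 6
print(hamming_check(damaged)[0])
```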
Error Correcting Code Function
Cache
Small amount of fast memory, used so that a large memory built from less expensive types can deliver speed approaching that of the fast memory.
Sits between normal main memory and CPU
May be located on CPU chip or module
Cache operation - overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache
slot
Cache Read Operation – Flowchart
Cache Design
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Typical Cache Organization
Comparison of Cache Sizes
Processor        Type                           Year  L1 cache (a)    L2 cache        L3 cache
IBM 360/85       Mainframe                      1968  16 to 32 KB     -               -
PDP-11/70        Minicomputer                   1975  1 KB            -               -
VAX 11/780       Minicomputer                   1978  16 KB           -               -
IBM 3033         Mainframe                      1978  64 KB           -               -
IBM 3090         Mainframe                      1985  128 to 256 KB   -               -
Intel 80486      PC                             1989  8 KB            -               -
Pentium          PC                             1993  8 KB/8 KB       256 to 512 KB   -
PowerPC 601      PC                             1993  32 KB           -               -
PowerPC 620      PC                             1996  32 KB/32 KB     -               -
PowerPC G4       PC/server                      1999  32 KB/32 KB     256 KB to 1 MB  2 MB
IBM S/390 G4     Mainframe                      1997  32 KB           256 KB          2 MB
IBM S/390 G6     Mainframe                      1999  256 KB          8 MB            -
Pentium 4        PC/server                      2000  8 KB/8 KB       256 KB          -
IBM SP           High-end server/supercomputer  2000  64 KB/32 KB     8 MB            -
CRAY MTA         Supercomputer                  2000  8 KB            2 MB            -
Itanium          PC/server                      2001  16 KB/16 KB     96 KB           4 MB
SGI Origin 2001  High-end server                2001  32 KB/32 KB     4 MB            -
Itanium 2        PC/server                      2002  32 KB           256 KB          6 MB
IBM POWER5       High-end server                2003  64 KB           1.9 MB          36 MB
CRAY XD-1        Supercomputer                  2004  64 KB/64 KB     1 MB            -
(Year = year of introduction)
(a) Two values separated by a slash refer to instruction and data caches.
Mapping Function
Cache of 64 KBytes
Cache block of 4 bytes
i.e. cache is 16K ($2^{14}$) lines of 4 bytes
16 MBytes main memory
24 bit address
($2^{24}$ = 16M)
Direct Mapping
Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least Significant w bits identify unique word
Most Significant s bits specify one memory block
The MSBs are split into a cache line field r and a tag of s-r (most significant)
Direct Mapping Address Structure
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
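The address structure above (8-bit tag, 14-bit line, 2-bit word) can be sketched as follows. The cache model stores only the tags and is hypothetical, not a description of real hardware. For address 16339C (hex), the split gives tag 16, line 0CE7, word 0. Two addresses that share a line but differ in tag evict each other on every access, which illustrates the con listed under the pros and cons of direct mapping.

```python
WORD_BITS = 2    # 4-byte blocks
LINE_BITS = 14   # 16K cache lines
TAG_BITS = 8     # 24 - 14 - 2


def split_direct(addr: int) -> tuple[int, int, int]:
    """24-bit address -> (tag, line, word) for a direct-mapped cache."""
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


class DirectMappedCache:
    """Tag-only model: each memory block has exactly one candidate line."""

    def __init__(self) -> None:
        self.tags = [None] * (1 << LINE_BITS)   # None means the line is empty

    def access(self, addr: int) -> bool:
        tag, line, _ = split_direct(addr)
        if self.tags[line] == tag:
            return True                         # hit
        self.tags[line] = tag                   # miss: load block into its only line
        return False


print([hex(f) for f in split_direct(0x16339C)])
cache = DirectMappedCache()
print([cache.access(a) for a in [0x16339C, 0x17339C] * 3])   # all misses
```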
Direct Mapping Cache Organization
Direct Mapping Example
Direct Mapping pros & cons

• Simple
• Inexpensive
• Fixed location for given block
— If a program repeatedly accesses 2 blocks that map to the same line, the blocks are continually swapped in and out and the miss rate is very high (thrashing).
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
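With the same 24-bit addresses, associative mapping uses a 22-bit tag and a 2-bit word field. For address 16339C (hex), the tag is 058CE7. The sketch below is a tag-only model with hypothetical names. A real cache compares all tags in parallel, and that comparison hardware is what makes searching expensive; here a Python membership test stands in for it. Evicting the oldest tag when the cache is full is an assumption, since replacement is covered separately.

```python
WORD_BITS = 2    # 4-byte blocks
TAG_BITS = 22    # 24 - 2


def split_assoc(addr: int) -> tuple[int, int]:
    """24-bit address -> (tag, word) for a fully associative cache."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


class FullyAssociativeCache:
    """Any block may occupy any line; every stored tag is examined on a lookup."""

    def __init__(self, n_lines: int) -> None:
        self.n_lines = n_lines
        self.tags: list[int] = []

    def access(self, addr: int) -> bool:
        tag, _ = split_assoc(addr)
        if tag in self.tags:           # stands in for the parallel tag comparison
            return True
        if len(self.tags) == self.n_lines:
            self.tags.pop(0)           # full: evict the oldest tag (assumed FIFO)
        self.tags.append(tag)
        return False


cache = FullyAssociativeCache(n_lines=2)
print([cache.access(a) for a in [0x16339C, 0x17339C] * 3])
```

The two addresses that thrash a direct-mapped cache can now coexist, so only the first two accesses miss.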
Fully Associative Cache Organization
Associative Mapping Example
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
— e.g. Block B can be in any line of set i
• e.g. 2 lines per set
— 2 way associative mapping
— A given block can be in one of 2 lines in only one set
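A two-way version of the same 16K-line cache has 8K sets. Assuming the 24-bit address used earlier, this gives a 9-bit tag, a 13-bit set field and a 2-bit word field. Address 16339C (hex) then maps to tag 02C, set 0CE7, word 0. The model below is hypothetical and stores only tags; evicting the oldest line in a full set is an assumption.

```python
WORD_BITS, SET_BITS = 2, 13   # 16K lines / 2 ways = 8K sets; tag = 24 - 13 - 2 = 9 bits
WAYS = 2


def split_set_assoc(addr: int) -> tuple[int, int, int]:
    """24-bit address -> (tag, set, word)."""
    word = addr & ((1 << WORD_BITS) - 1)
    set_index = (addr >> WORD_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (WORD_BITS + SET_BITS)
    return tag, set_index, word


class TwoWaySetAssociativeCache:
    """A block may go in either line of exactly one set."""

    def __init__(self) -> None:
        self.sets: list[list[int]] = [[] for _ in range(1 << SET_BITS)]

    def access(self, addr: int) -> bool:
        tag, set_index, _ = split_set_assoc(addr)
        lines = self.sets[set_index]   # only this set is searched
        if tag in lines:
            return True
        if len(lines) == WAYS:
            lines.pop(0)               # set full: evict its oldest line (assumed FIFO)
        lines.append(tag)
        return False


print([hex(f) for f in split_set_assoc(0x16339C)])
cache = TwoWaySetAssociativeCache()
print([cache.access(a) for a in [0x16339C, 0x17339C] * 3])
```

Only the two lines of one set are compared, rather than every line in the cache. This compromise keeps lookup cheap while removing the two-block conflict.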
Two Way Set Associative Cache Organization
Replacement Algorithms
Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Associative & Set Associative

• Hardware implemented algorithm (speed)
• Least Recently used (LRU)
• e.g. in 2 way set associative
— Which of the 2 blocks is LRU?
• First in first out (FIFO)
— replace block that has been in cache longest
• Least frequently used (LFU)
— replace block which has had fewest hits
• Random
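LRU replacement within one set can be modelled in software with an ordered dictionary that keeps tags from least to most recently used. This is a minimal sketch with hypothetical names. Real caches implement the policy in hardware; for example, a 2-way set can use a single USE bit per line.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class LRUSet:
    """One set of a set-associative cache with least-recently-used replacement."""

    def __init__(self, ways: int) -> None:
        self.ways = ways
        self.lines: "OrderedDict[int, None]" = OrderedDict()   # LRU first, MRU last

    def access(self, tag: int) -> Tuple[bool, Optional[int]]:
        """Return (hit, evicted_tag)."""
        if tag in self.lines:
            self.lines.move_to_end(tag)            # now the most recently used
            return True, None
        victim = None
        if len(self.lines) == self.ways:
            victim, _ = self.lines.popitem(last=False)   # evict the LRU tag
        self.lines[tag] = None
        return False, victim


s = LRUSet(ways=2)
for tag in (1, 2, 1, 3):
    print(tag, s.access(tag))
```

When tag 3 arrives, tag 2 is evicted because tag 1 was used more recently. FIFO would instead have evicted tag 1, the block that has been in the cache longest.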
Write Policy
Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to
CPU) cache up to date
• Lots of traffic
• Slows down writes
Write back
• Updates initially made in cache only
• Update bit for cache slot is set when update occurs
• If block is to be replaced, write to main memory only if update bit
is set
• Other caches get out of sync
• I/O must access main memory through cache
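The write-back policy can be sketched as a small cache model in which each line carries an update (dirty) bit. This is an illustration under stated assumptions, not a real design: main memory is a Python dict keyed by block number, the cache is fully associative, and the oldest line is chosen as the victim. All names are hypothetical.

```python
class WriteBackCache:
    """Writes update the cache only; a line is copied to memory when evicted, if dirty."""

    def __init__(self, memory: dict, n_lines: int) -> None:
        self.memory = memory
        self.n_lines = n_lines
        self.lines = {}             # block -> [value, dirty]; dict keeps insertion order
        self.memory_writes = 0

    def _load(self, block: int) -> None:
        if len(self.lines) == self.n_lines:
            victim = next(iter(self.lines))             # oldest line (assumed policy)
            value, dirty = self.lines.pop(victim)
            if dirty:                                   # write back only if updated
                self.memory[victim] = value
                self.memory_writes += 1
        self.lines[block] = [self.memory.get(block, 0), False]

    def read(self, block: int) -> int:
        if block not in self.lines:
            self._load(block)
        return self.lines[block][0]

    def write(self, block: int, value: int) -> None:
        if block not in self.lines:
            self._load(block)
        self.lines[block] = [value, True]               # set the update bit


memory = {0: 0, 1: 7}
cache = WriteBackCache(memory, n_lines=1)
for v in range(10):
    cache.write(0, v)        # ten writes, but main memory is not touched yet
print(cache.memory_writes, memory[0])
cache.read(1)                # evicts block 0, which is dirty
print(cache.memory_writes, memory[0])
```

Under write through, the same loop would have produced ten main-memory writes. Until the eviction, memory[0] is stale, which is why other caches can get out of sync.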
Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – 8k using 16 byte lines and four way set associative
organization
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
— L1 caches
– 8k bytes
– 64 byte lines
– four way set associative
— L2 cache
– Feeding both L1 caches
– 256k
– 128 byte lines
– 8 way set associative
— L3 cache on chip
Pentium 4 Block Diagram
Chapter 6
External Memory
Types of External Memory

Magnetic Disk
RAID
Removable
Optical
CD-ROM
CD-Writable (WORM)
CD-R/W
DVD
Magnetic Tape
Magnetic Disk
Metal or plastic disk coated with magnetizable material (iron oxide…rust)
Range of packaging
Floppy
Winchester hard disk
Removable hard disk
Data Organization and Formatting

Concentric rings or tracks
Gaps between tracks
Reduce gap to increase capacity
Same number of bits per track (variable packing density)
Constant angular velocity
Tracks divided into sectors
Minimum block size is one sector
May have more than one sector per block
Disk Data Layout
Disk Velocity
• Bit near centre of rotating disk passes fixed point slower than bit on outside of
disk
• Increase spacing between bits in different tracks
• Rotate disk at constant angular velocity (CAV)
— Gives pie shaped sectors and concentric tracks
— Individual tracks and sectors addressable
— Move head to given track and wait for given sector
— Waste of space on outer tracks
– Lower data density
• Can use zones to increase capacity
— Each zone has fixed bits per track
— More complex circuitry
Comparison of Disk Layouts
Fixed/Movable Head Disk
Fixed head
One read write head per track
Heads mounted on fixed rigid arm
Movable head
One read write head per side
Mounted on a movable arm
Removable or Not
Removable disk
Can be removed from drive and replaced with another disk
Provides unlimited storage capacity
Easy data transfer between systems
Non removable disk
Permanently mounted in the drive
Multiple Platter
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form cylinders
• Data is striped by cylinder
— reduces head movement
— Increases speed (transfer rate)
Tracks and Cylinders
Floppy Disk
8”, 5.25”, 3.5”
Small capacity
Up to 1.44Mbyte (2.88M never popular)
Slow
Universal
Cheap
Winchester Hard Disk

Developed by IBM ("Winchester" was the project's internal code name)
Sealed unit
One or more platters (disks)
Heads fly on boundary layer of air as disk spins
Very small head to disk gap
Getting more robust
Winchester Hard Disk (continued)
Universal
Cheap
Fastest external storage
Getting larger all the time
Multiple Gigabyte now usual
Removable Hard Disk
ZIP
Cheap
Very common
Only 100M
JAZ
Not cheap
1G
L-120 (a: drive)
Also reads 3.5” floppy
Becoming more popular?
Finding Sectors
Must be able to identify start of track and sector
Format disk
Additional information not available to user
Marks tracks and sectors
ST506 format (old!)
Characteristics
Fixed (rare) or movable head
Removable or fixed
Single or double (usually) sided
Single or multiple platter
Head mechanism
Contact (Floppy)
Fixed gap
Flying (Winchester)
Speed
Seek time
Moving head to correct track
(Rotational) latency
Waiting for data to rotate under head
Access time = Seek + Latency
Transfer rate
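The timing terms above combine as follows. The average rotational latency is half a revolution, and transferring one sector takes the revolution time divided by the number of sectors per track. The drive parameters in the example (4 ms average seek, 15,000 rpm, 500 sectors per track) are hypothetical, and the function name is illustrative.

```python
def disk_times_ms(seek_ms: float, rpm: float, sectors_per_track: int,
                  sectors: int = 1) -> tuple[float, float]:
    """Return (access time, access + transfer time) in milliseconds."""
    rev_ms = 60_000 / rpm                            # one full revolution
    latency_ms = rev_ms / 2                          # on average, wait half a revolution
    access_ms = seek_ms + latency_ms                 # access time = seek + latency
    transfer_ms = rev_ms * sectors / sectors_per_track
    return access_ms, access_ms + transfer_ms


access, total = disk_times_ms(seek_ms=4, rpm=15_000, sectors_per_track=500)
print(access, total)   # 4 + 2 = 6 ms to reach the sector, plus 0.008 ms to read it
```

For small random reads, seek and latency dominate. For this reason, data is laid out by cylinder to reduce head movement.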
Timing of Disk I/O Transfer
RAID
Redundant Array of Independent Disks
Redundant Array of Inexpensive Disks
6 levels in common use
Not a hierarchy
Set of physical disks viewed as single logical drive by O/S
Data distributed across physical drives
Can use redundant capacity to store parity information
RAID 0
No redundancy
Data striped across all disks
Round Robin striping
Increase speed
Multiple data requests probably not on same disk
Disks seek in parallel
A set of data is likely to be striped across multiple disks
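Round-robin striping amounts to a modulo mapping from logical strips to disks. The sketch below assumes strips are numbered from 0 and that the array has a hypothetical 4 disks; the function name is illustrative.

```python
def raid0_location(logical_strip: int, n_disks: int) -> tuple[int, int]:
    """Round-robin striping: logical strip -> (disk number, strip index on that disk)."""
    return logical_strip % n_disks, logical_strip // n_disks


# Hypothetical 4-disk array: consecutive strips land on different disks,
# so a request covering strips 0..3 can be served by all four disks in parallel.
for strip in range(8):
    disk, row = raid0_location(strip, 4)
    print(f"strip {strip} -> disk {disk}, row {row}")
```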
RAID 1
Mirrored Disks
Data is striped across disks
2 copies of each stripe on separate disks
Read from either
Write to both
Recovery is simple
Swap faulty disk & re-mirror
No down time
Expensive
RAID 2
Disks are synchronized
Very small stripes
Often single byte/word
Error correction calculated across corresponding bits on disks
Multiple parity disks store Hamming code error correction in corresponding
positions
Lots of redundancy
Expensive
Not used
RAID 3
Similar to RAID 2
Only one redundant disk, no matter how large the array
Simple parity bit for each set of corresponding bits
Data on failed drive can be reconstructed from surviving data and parity info
Very high transfer rates
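The parity used in RAID 3 (and in RAID 4 and 5) is a bitwise XOR across corresponding positions on the data disks. Because XOR is its own inverse, XOR-ing the surviving data with the parity recreates the failed disk's contents. This sketch uses byte strings as stand-ins for disk strips. The strings and function names are hypothetical, and the strips are assumed to be of equal length.

```python
from functools import reduce


def parity(strips: list[bytes]) -> bytes:
    """Bitwise XOR of corresponding bytes across all strips (equal lengths assumed)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


def rebuild(surviving: list[bytes], parity_strip: bytes) -> bytes:
    """XOR of the surviving data strips and the parity strip gives the lost strip."""
    return parity(surviving + [parity_strip])


data = [b"RAID", b"disk", b"XOR!"]       # three data disks
p = parity(data)                         # stored on the redundant disk
print(rebuild([data[0], data[2]], p))    # disk 1 has failed
```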
RAID 4
Each disk operates independently
Good for high I/O request rate
Large stripes
Bit by bit parity calculated across stripes on each disk
Parity stored on parity disk
RAID 5
Like RAID 4
Parity striped across all disks
Round robin allocation for parity stripe
Avoids RAID 4 bottleneck at parity disk
Commonly used in network servers
RAID 6
• Two parity calculations
• Stored in separate blocks on different disks
• User requirement of N disks needs N+2
• High data availability
— Three disks need to fail for data loss
— Significant write penalty
RAID 0, 1, 2
RAID 3 & 4
RAID 5 & 6
Optical Storage CD-ROM

Originally for audio
650Mbytes giving over 70 minutes audio
Polycarbonate coated with highly reflective coat, usually aluminum
Data stored as pits
Read by reflecting laser
Constant packing density
Constant linear velocity
CD-ROM Drive Speeds
Audio is single speed
Constant linear velocity
1.2 m/s
Track (spiral) is 5.27 km long
Gives 4391 seconds = 73.2 minutes
Other speeds are quoted as multiples
e.g. 24x
The quoted figure is the maximum the drive can achieve
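The single-speed playing time quoted above follows directly from the track length and the constant linear velocity. A quick check:

```python
TRACK_LENGTH_M = 5_270        # spiral track length (5.27 km)
LINEAR_VELOCITY_M_S = 1.2     # single-speed (audio) constant linear velocity

seconds = TRACK_LENGTH_M / LINEAR_VELOCITY_M_S
print(f"{int(seconds)} s = {seconds / 60:.1f} min")   # prints 4391 s = 73.2 min
```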
CD-ROM Format
Mode 0=blank data field

Mode 1=2048 byte data+error correction
Mode 2=2336 byte data
Random Access on CD-ROM

Difficult
Move head to rough position
Set correct speed
Read address
Adjust to required location
Other Optical Storage

CD-Writable
WORM
Now affordable
Compatible with CD-ROM drives
CD-RW
Erasable
Getting cheaper
Mostly CD-ROM drive compatible
DVD - what’s in a name?

Digital Video Disk
Used to indicate a player for movies
– Only plays video disks
Digital Versatile Disk
Used to indicate a computer drive
– Will read computer disks and play video disks
DVD - technology
Multi-layer
Very high capacity (4.7G per layer)
Full length movie on single disk
Using MPEG compression
Finally standardized (honest!)
Movies carry regional coding
Players only play correct region films
Can be “fixed”
CD and DVD
Magnetic Tape
Serial access
Slow
Very cheap
Backup and archive
Digital Audio Tape (DAT)

Uses rotating head (like video)
High capacity on small tape
4Gbyte uncompressed
8Gbyte compressed
Backup of PC/network servers
Chapter 7
Input/Output
Input/Output Problems
Wide variety of peripherals
Delivering different amounts of data
At different speeds
In different formats
All slower than CPU and RAM
Need I/O modules
Input/Output Module
Interface to CPU and Memory through system bus
Interface to one or more peripherals
Contains logic for performing the communication functions between the peripherals and the bus. Why is such a module needed?
1- It is impractical to build the logic for every kind of peripheral into the processor.
2- Peripheral data transfer rates differ from those of the processor and memory, so they do not match.
3- Peripherals often use data formats different from those of the computer.
i.e. an I/O module has two major functions:
1- Interface to the processor and memory via the system bus.
2- Interface to one or more peripherals by tailored data links.
GENERIC MODEL OF I/O
External Devices
An external device attached to an I/O module can be modelled as shown in the diagram:
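Control signals: determine the function to perform, e.g. Read, Write
Data: bits to send to or receive from the I/O module
Status: state of the device, e.g. Ready, Not Ready
Control logic: controls the device operation
Transducer: converts data from electrical form to other forms of energy (on output), and from other forms to electrical (on input)
Buffer: temporary storage for data being transferred (typically 8 to 16 bits)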
External Devices
Human readable: suitable for communication with computer users
Screen, printer, keyboard
Machine readable: suitable for communication with equipment
Monitoring and control, disk, tape, CD, sensor
Communication: suitable for communication with remote devices.
Modem, Network Interface Card (NIC)
I/O Module Function

Control & Timing: coordinates the flow of traffic between internal resources and external devices. For a transfer to the processor, it might include:
Checking the status of the external device
Returning the device status
Requesting a data transfer
Transferring the data from the I/O module
CPU Communication: communication between the I/O module and the processor, which includes:
Command decoding
Exchange of data
Status reporting
Address recognition
Device Communication: involves commands, status information and data.
I/O Module Function (continue)
Data Buffering: needed because different devices transfer data at different rates.
Error Detection: the I/O module detects and reports errors, such as mechanical and electrical malfunctions (e.g. a paper jam or a bad disk track) and data transmission errors (detected, for example, with parity bits).
I/O Steps
CPU checks I/O module device status
I/O module returns status
If ready, CPU requests data transfer
I/O module gets data from device
I/O module transfers data to CPU
Variations for output, DMA, etc.
I/O Module Diagram Block diagram of an I/O module
Data transferred to and from the module are buffered in one or more data registers.
There may also be one or more status registers, which may also function as control registers.
Each I/O module has a unique address, or a set of unique addresses.
An I/O module contains logic specific to the interface with each device that it controls.
An I/O module that takes on most of the detailed processing, presenting a high-level interface to the processor, is called an I/O channel or I/O processor. A simple I/O module that requires detailed control is called a device controller. (The former is usually found on mainframes, the latter on microcomputers and PCs.)
Input Output Techniques

There are three possible techniques for I/O operation:
Programmed I/O: data are exchanged between the processor and the I/O module. The processor must wait until the I/O operation is complete, so valuable processor time may be wasted.
Interrupt-driven I/O: the processor issues an I/O command and continues to execute other work until it is interrupted by the I/O module when the module's work is complete.
Direct Memory Access (DMA): the I/O module and main memory exchange data directly, without processor involvement.
Programmed I/O
CPU has direct control over I/O
Sensing status
Read/write commands
Transferring data
CPU waits for I/O module to complete operation
Wastes CPU time
Programmed I/O - detail

CPU requests I/O operation
I/O module performs operation
I/O module sets status bits
CPU checks status bits periodically
I/O module does not inform CPU directly
I/O module does not interrupt CPU
CPU may wait or come back later
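The polling behaviour described above can be simulated. The sketch below is a toy model, not a real device API: `SimulatedIOModule` is a hypothetical class with a status bit and a data register, and a seeded random number generator decides when a byte becomes ready. The CPU side issues a read command, repeatedly tests the status bit, then moves the data itself. The poll count shows the CPU time that programmed I/O wastes.

```python
import random


class SimulatedIOModule:
    """Hypothetical I/O module: a ready status bit and a one-byte data register."""

    def __init__(self, data: bytes, seed: int = 0) -> None:
        self._data = iter(data)
        self._rng = random.Random(seed)
        self.ready = False
        self.data_register = 0

    def command_read(self) -> None:
        self.ready = False                     # module starts fetching from the device

    def tick(self) -> None:
        """Advance device time; the byte arrives after a random delay."""
        if not self.ready and self._rng.random() < 0.3:
            self.data_register = next(self._data)
            self.ready = True                  # module sets the status bit


def programmed_read(module: SimulatedIOModule, count: int) -> tuple[bytes, int]:
    received, polls = [], 0
    for _ in range(count):
        module.command_read()                  # CPU issues the read command
        while not module.ready:                # CPU checks the status bit again and again
            polls += 1
            module.tick()
        received.append(module.data_register)  # CPU transfers the word itself
    return bytes(received), polls


data, polls = programmed_read(SimulatedIOModule(b"hello", seed=1), 5)
print(data, polls)
```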
I/O Commands
CPU issues address
Identifies module (& device if >1 per module)
CPU issues command
Control - telling module what to do
– e.g. spin up disk
Test - check status
– e.g. power? Error?
Read/Write
– Module transfers data via buffer from/to device
Addressing I/O Devices

Under programmed I/O data transfer is very like memory access (CPU
viewpoint)
Each device given unique identifier
CPU commands contain identifier (address)
I/O Mapping
Memory mapped I/O
Devices and memory share an address space
I/O looks just like memory read/write
No special commands for I/O
– Large selection of memory access commands available
Isolated I/O
Separate address spaces
Need I/O or memory select lines
Special commands for I/O
– Limited set
Interrupt Driven I/O

Overcomes CPU waiting
No repeated CPU checking of device
I/O module interrupts when ready
Interrupt Driven I/O Basic Operation
CPU issues read command
I/O module gets data from peripheral whilst CPU does other work
I/O module interrupts CPU
CPU requests data
I/O module transfers data
Simple Interrupt Processing
CPU Viewpoint
Issue read command
Do other work
Check for interrupt at end of each instruction cycle
If interrupted:-
Save context (registers)
Process interrupt
– Fetch data & store
Direct Memory Access

• Interrupt driven and programmed I/O require active CPU intervention
— Transfer rate is limited
— CPU is tied up
• DMA is the answer.
— Additional Module (hardware) on bus is used.
— DMA controller takes over from CPU for I/O
Typical DMA Module Diagram
DMA Operation
• CPU tells DMA controller:-
— Read/Write
— Device address
— Starting address of memory block for data
— Amount of data to be transferred
• CPU carries on with other work
• DMA controller deals with transfer
• DMA controller sends interrupt when finished
DMA Transfer Cycle Stealing

• DMA controller takes over bus for a cycle
• Transfer of one word of data
• Not an interrupt
— CPU does not switch context
• CPU suspended just before it accesses bus
— i.e. before an operand or data fetch or a data write
• Slows down CPU but not as much as CPU doing transfer
DMA and Interrupt Breakpoints During an Instruction Cycle
DMA Configurations
(a)
• Single Bus, Detached DMA controller

• Each transfer uses bus twice
— I/O to DMA then DMA to memory
• CPU is suspended twice
(b)
• Single Bus, Integrated DMA controller

• Controller may support >1 device
• Each transfer uses bus once
— DMA to memory
• CPU is suspended once
(c)
• Separate I/O Bus

• Bus supports all DMA enabled devices
• Each transfer uses bus once
— DMA to memory
• CPU is suspended once
Chapter 11
Instruction Sets:
Addressing Modes and Formats
Addressing Modes
• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement (Indexed)
• Stack
Immediate Addressing
• Operand is part of instruction
• Operand = address field
• e.g. ADD 5
— Add 5 to contents of accumulator
— 5 is operand
• No memory reference to fetch data
• Fast
• Limited range
Immediate Addressing Diagram
Direct Addressing
• Address field contains address of operand
• Effective address (EA) = address field (A)
• e.g. ADD A
— Add contents of cell A to accumulator
— Look in memory at address A for operand
• Single memory reference to access data
• No additional calculations to work out effective address
• Limited address space
Direct Addressing Diagram
Indirect Addressing
• Memory cell pointed to by address field contains the address of (pointer to)
the operand
• EA = (A)
— Look in A, find address (A) and look there for operand
• e.g. ADD (A)
— Add contents of cell pointed to by contents of A to accumulator
• Large address space
• $2^n$ where n = word length
• May be nested, multilevel, cascaded
— e.g. EA = (((A)))
– Draw the diagram yourself
• Multiple memory accesses to find operand
• Hence slower
Indirect Addressing Diagram
Register Addressing
• Operand is held in register named in address field
• EA = R
• Limited number of registers
• Very small address field needed
— Shorter instructions
— Faster instruction fetch
• No memory access
• Very fast execution
• Very limited address space
• Multiple registers helps performance
— Requires good assembly programming or compiler writing
— N.B. C programming
• register int a;
Register Addressing Diagram
Register Indirect Addressing
• EA = (R)
• Operand is in memory cell pointed to by contents of register R
• Large address space ($2^n$)
• One fewer memory access than indirect addressing
Register Indirect Addressing Diagram
Displacement Addressing
• EA = A + (R)
• Address field holds two values
— A = base value
— R = register that holds displacement
— or vice versa
Displacement Addressing Diagram
Relative Addressing
• A version of displacement addressing
• R = Program counter, PC
• EA = A + (PC)
• i.e. the operand is A cells away from the current location pointed to by the PC
Base-Register Addressing
• A holds displacement
• R holds pointer to base address
• R may be explicit or implicit
• e.g. segment registers in 80x86
Indexed Addressing
• A = base
• R = displacement
• EA = A + R
• Good for accessing arrays
— EA = A + R
— R++
Combinations
• Post index
• EA = (A) + (R)
• Pre index
• EA = (A+(R))
• (Draw the diagrams)
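The effective-address rules above can be compared side by side on a tiny made-up machine state. In this sketch, `A` is the instruction's address field, `R` is the contents of the named register, `PC` is the program counter, and `memory` is a dict. All values are hypothetical. Immediate and register addressing are omitted because they involve no memory EA: the operand is in the instruction or in the register itself. The sketch also ignores the fact that real machines usually advance the PC past the current instruction before computing a relative address.

```python
def effective_addresses(A: int, R: int, PC: int, memory: dict) -> dict:
    """EA for each memory-addressing mode, given field A, register contents R and PC."""
    return {
        "direct":            A,                 # EA = A
        "indirect":          memory[A],         # EA = (A)
        "register indirect": R,                 # EA = (R)
        "displacement":      A + R,             # EA = A + (R)
        "relative":          A + PC,            # EA = A + (PC)
        "post-index":        memory[A] + R,     # EA = (A) + (R)
        "pre-index":         memory[A + R],     # EA = (A + (R))
    }


# Hypothetical machine state, for illustration only:
for mode, ea in effective_addresses(A=100, R=50, PC=40, memory={100: 200, 150: 320}).items():
    print(f"{mode:18} EA = {ea}")
```

Post-indexing and pre-indexing differ only in whether the register is added after or before the indirect memory reference.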
Stack Addressing
• Operand is (implicitly) on top of stack
• e.g.
— ADD Pop top two items from stack and add
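Stack addressing can be illustrated with a tiny interpreter in which ADD and MUL carry no operand fields: both operands are implicitly the top two stack items. The instruction names and the program are hypothetical, and MUL is added only to make the example less trivial. The program computes (2 + 3) * 4.

```python
def run_stack_program(program: list[tuple]) -> list[int]:
    stack: list[int] = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])            # the only instruction with an explicit operand
        elif op in ("ADD", "MUL"):
            b = stack.pop()                  # operands are implicit: top two items
            a = stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown instruction {op}")
    return stack


print(run_stack_program([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]))
```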
Pentium Addressing Modes
• Virtual or effective address is offset into segment
— Starting address plus offset gives linear address
— This goes through page translation if paging enabled
• 12 addressing modes available
— Immediate
— Register operand
— Displacement
— Base
— Base with displacement
— Scaled index with displacement
— Base with index and displacement
— Base with scaled index and displacement
— Relative
Pentium Addressing Mode Calculation
Instruction Formats
• Layout of bits in an instruction
• Includes opcode
• Includes (implicit or explicit) operand(s)
• Usually more than one instruction format in an instruction set
Instruction Length
• Affected by and affects:
— Memory size
— Memory organization
— Bus structure
— CPU complexity
— CPU speed
• Trade off between powerful instruction repertoire and saving space
Allocation of Bits
• Number of addressing modes
• Number of operands
• Register versus memory
• Number of register sets
• Address range
• Address granularity
PDP-10 Instruction Format
Chapter 13
Reduced Instruction Set Computers
Major Advances in Computers(1)

• The family concept
— IBM System/360 1964
— DEC PDP-8
— Separates architecture from implementation
• Microprogrammed control unit
— Idea by Wilkes 1951
— Produced by IBM S/360 1964
• Cache memory
— IBM S/360 model 85 1969
• Solid State RAM

— (See memory notes)
• Microprocessors
— Intel 4004 1971
• Pipelining
— Introduces parallelism into fetch execute cycle
• Multiple processors
The Next Step - RISC

• Reduced Instruction Set Computer
• Key features
— Large number of general purpose registers
— or use of compiler technology to optimize register use
— Limited and simple instruction set
— Emphasis on optimising the instruction pipeline
RISC Characteristics
• One instruction per cycle
• Register to register operations
• Few, simple addressing modes
• Few, simple instruction formats
• Hardwired design (no microcode)
• Fixed instruction format
• More compile time/effort
Comparison of processors
Chapter 16
Control Unit Operation
Micro-Operations
• A computer executes a program
• Fetch/execute cycle
• Each cycle has a number of steps
— see pipelining
• Called micro-operations
• Each step does very little
• Atomic operation of CPU
Constituent Elements of Program Execution
Fetch - 4 Registers
• Memory Address Register (MAR)
— Connected to address bus
— Specifies address for read or write op
• Memory Buffer Register (MBR)
— Connected to data bus
— Holds data to write or last data read
• Program Counter (PC)
— Holds address of next instruction to be fetched
• Instruction Register (IR)
— Holds last instruction fetched
Fetch Sequence (symbolic)
• t1: MAR <- (PC)
• t2: MBR <- (memory)
• PC <- (PC) +1
• t3: IR <- (MBR)
(tx = time unit/clock cycle)
Or
• t1: MAR <- (PC)
• t2: MBR <- (memory)
• t3: PC <- (PC) +1
• IR <- (MBR)
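The fetch micro-operations can be traced on a register model. In this sketch, registers and memory are plain dicts, and the starting PC and the instruction word are hypothetical values. The two t3 micro-operations do not interfere with each other, so hardware can perform them in the same time unit; Python simply executes them one after the other.

```python
def fetch(regs: dict, memory: dict) -> None:
    """One fetch cycle expressed as register-transfer micro-operations."""
    regs["MAR"] = regs["PC"]              # t1: MAR <- (PC)
    regs["MBR"] = memory[regs["MAR"]]     # t2: MBR <- (memory)
    regs["PC"] = regs["PC"] + 1           # t3: PC <- (PC) + 1   (independent of the next
    regs["IR"] = regs["MBR"]              # t3: IR <- (MBR)       line, so same time unit)


regs = {"PC": 0x300, "MAR": 0, "MBR": 0, "IR": 0}
memory = {0x300: 0x1940}                  # hypothetical instruction word
fetch(regs, memory)
print({name: hex(value) for name, value in regs.items()})
```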
Indirect Cycle
• MAR <- (IRaddress) - address field of IR
• MBR <- (memory)
• IRaddress <- (MBRaddress)
• MBR contains an address

• IR is now in same state as if direct addressing had been used
• (What does this say about IR size?)
Interrupt Cycle
• t1: MBR <-(PC)
• t2: MAR <- save-address
• PC <- routine-address
• t3: memory <- (MBR)
• This is a minimum
— May be additional micro-ops to get addresses
— N.B. saving context is done by interrupt handler routine, not micro-ops
Execute Cycle (ADD)

• Different for each instruction
• e.g. ADD R1,X - add the contents of location X to Register 1, result in R1
• t1: MAR <- (IRaddress)
• t2: MBR <- (memory)
• t3: R1 <- (R1) + (MBR)
• Note no overlap of micro-operations
Instruction Cycle
• Each phase decomposed into sequence of elementary micro-operations
• E.g. fetch, indirect, and interrupt cycles
• Execute cycle
— One sequence of micro-operations for each opcode
• Need to tie sequences together
• Assume new 2-bit register
— Instruction cycle code (ICC) designates which part of the cycle the processor is in
– 00: Fetch
– 01: Indirect
– 10: Execute
– 11: Interrupt
Flowchart for Instruction Cycle
Types of Micro-operation
• Transfer data between registers
• Transfer data from register to external
• Transfer data from external to register
• Perform arithmetic or logical operations
Functions of Control Unit

• Sequencing
— Causing the CPU to step through a series of micro-operations
• Execution
— Causing the performance of each micro-op
• This is done using Control Signals
Control Signals
• Clock
— One micro-instruction (or set of parallel micro-instructions) per clock
cycle
• Instruction register
— Op-code for current instruction
— Determines which micro-instructions are performed
• Flags
— State of CPU
— Results of previous operations
• From control bus
— Interrupts
— Acknowledgements
Model of Control Unit
Control Signals - output
• Within CPU
— Cause data movement
— Activate specific functions
• Via control bus
— To memory
— To I/O modules
CPU with Internal Bus
Hardwired Implementation
• Instruction register
— Op-code causes different control signals for each different instruction
— Unique logic for each op-code
— Decoder takes encoded input and produces single output
— n binary inputs and $2^n$ outputs
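The decoder described above can be modelled in a few lines: n binary inputs select exactly one of $2^n$ output lines. In a hardwired control unit, the op-code bits would be the inputs, and each output would enable the logic for one instruction. The sketch assumes the inputs are listed most significant bit first, and the function name is hypothetical.

```python
def decode(inputs: list[int]) -> list[int]:
    """n-to-2^n decoder: exactly one output line is 1, selected by the binary input."""
    index = 0
    for bit in inputs:                # most significant bit first
        index = (index << 1) | bit
    outputs = [0] * (1 << len(inputs))
    outputs[index] = 1
    return outputs


print(decode([1, 0]))      # input 10 (binary) selects output line 2 of 4
print(decode([1, 1, 1]))   # input 111 (binary) selects output line 7 of 8
```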
Problems With Hard Wired Designs
• Complex sequencing & micro-operation logic
• Difficult to design and test
• Inflexible design
• Difficult to add new instructions
Chapter 17
Micro-programmed Control
Control Unit Organization
Micro-programmed Control
• Use sequences of instructions (see earlier notes) to control complex operations
• Called micro-programming or firmware
Chapter 18
Parallel Processing
Multiple Processor Organization

• Single instruction, single data stream - SISD
• Single instruction, multiple data stream - SIMD
• Multiple instruction, single data stream - MISD
• Multiple instruction, multiple data stream- MIMD
Single Instruction, Single Data Stream - SISD

• Single processor
• Single instruction stream
• Data stored in single memory
• Uni-processor
Single Instruction, Multiple Data Stream - SIMD

• Single machine instruction
• Controls simultaneous execution
• Number of processing elements
• Lockstep basis
• Each processing element has associated data memory
• Each instruction executed on different set of data by different processors
• Vector and array processors
Multiple Instruction, Single Data Stream - MISD

• Sequence of data
• Transmitted to set of processors
• Each processor executes different instruction sequence
• Never been implemented
Multiple Instruction, Multiple Data Stream- MIMD

• Set of processors
• Simultaneously execute different instruction sequences
• Different sets of data
• SMPs, clusters and NUMA systems
Taxonomy of Parallel Processor Architectures
Tutorial and Sample Exams
1. Explain the difference between computer architecture and computer organization.
2. What are the main functions of a computer? Describe how the concepts of
function and structure have different meanings.
3. Show (with diagrams) the possible computer operations.
4. What are the main components of the computer? Describe the function of
each.
5. What are the components of CPU? How may the control unit be implemented?
6. What are the main technological characteristics of each computer generation?
Answer: Table 2-2.
7. Describe the ENIAC in brief. What was its purpose?
8. What concept was introduced by the von Neumann machine (IAS computer)?
Give the block diagram of the IAS computer showing its main parts.
9. Define each of the following: MBR, MAR, IR, IBR, PC, AC. Draw the
extended structure of the IAS machine.
10. What formats were used for words in IAS memory? What functions did the
IAS instructions have?
11. What benefits are gained from using transistors as compared to vacuum tubes?
12. Explain Moore's law (1965). What are its consequences?
13. What are the techniques used to speed up the processing?
14. What is meant by performance mismatch? How can it be solved?
15. Explain the instruction fetch cycle.
16. The following instructions and data are stored (in hex) in memory:
300 1940
301 5941
302 3940
303 2941
.. ..
940 0010
941 0021
What are the contents of AC, IR, PC and M[941] after the program executes? Assume
that the initial value of PC is 300 and that the following operation codes (op codes,
in binary) are used:
0001 Load AC from memory
0010 Store AC into memory
0011 Add AC to memory word
0100 Shift AC right one bit and put into memory
0101 Shift AC left and put into memory
0110 OR AC with memory word
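One way to check an answer is to trace the fetch-execute cycle mechanically. The sketch below makes several assumptions. Words are 16 bits, with a 4-bit opcode followed by a 12-bit address, which is what the hex layout implies. "Add" and "OR" leave their result in AC. The shift instructions write the shifted value to memory and leave AC unchanged unless the hypothetical flag `shift_updates_ac` is set, because the op-code table does not say which.

```python
from typing import Dict, Tuple

MASK = 0xFFFF  # 16-bit words


def run(memory: Dict[int, int], pc: int, steps: int,
        shift_updates_ac: bool = False) -> Tuple[int, int, int]:
    """Fetch-decode-execute loop; returns the final (AC, IR, PC)."""
    ac = ir = 0
    for _ in range(steps):
        ir = memory[pc]                         # fetch:  IR <- M[PC]
        pc += 1                                 #         PC <- PC + 1
        opcode, addr = ir >> 12, ir & 0x0FFF    # decode: 4-bit opcode, 12-bit address
        if opcode == 0b0001:                    # load AC from memory
            ac = memory[addr]
        elif opcode == 0b0010:                  # store AC into memory
            memory[addr] = ac
        elif opcode == 0b0011:                  # add memory word to AC (assumed: sum in AC)
            ac = (ac + memory[addr]) & MASK
        elif opcode in (0b0100, 0b0101):        # shift AC right/left, put into memory
            shifted = ac >> 1 if opcode == 0b0100 else (ac << 1) & MASK
            memory[addr] = shifted
            if shift_updates_ac:                # assumption: AC unchanged by default
                ac = shifted
        elif opcode == 0b0110:                  # OR AC with memory word (assumed: result in AC)
            ac |= memory[addr]
        else:
            raise ValueError(f"undefined opcode {opcode:04b}")
    return ac, ir, pc


memory = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x3940, 0x303: 0x2941,
          0x940: 0x0010, 0x941: 0x0021}
ac, ir, pc = run(memory, pc=0x300, steps=4)
print(f"AC={ac:04X} IR={ir:04X} PC={pc:03X} M[941]={memory[0x941]:04X}")
# AC=0020 IR=2941 PC=304 M[941]=0020
```

Under the other reading, where a shift also leaves its result in AC, calling `run(..., shift_updates_ac=True)` on a fresh copy of the memory ends with AC and M[941] both equal to 0030.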
17. Show the state diagram of the instruction cycle with interrupts.
18. What classes of interrupts exist? Explain.
19. Give an example comparing the program flow of control with and without
interrupts. Also show the program timing.
20. What is an ISR (interrupt service routine)?
21. How can multiple interrupts be handled? Explain.
22. Give some examples of control signals. What is the purpose of the bus grant
and reset signals?
23. What is the function of the address bus? What is the maximum memory
capacity with 20 address lines?
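The rule behind Question 23 is that each additional address line doubles the number of locations that can be selected, so n lines address 2^n locations. In this minimal sketch, whether a "location" is a byte or a word depends on the machine. The question does not fix that, so the function simply counts addressable units.

```python
def max_addressable_units(address_lines: int) -> int:
    """n address lines carry 2**n distinct patterns, one per memory location."""
    return 1 << address_lines


units = max_addressable_units(20)
print(units, units // 2**20)   # 1048576 1  (i.e. 1M locations)
```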
24. Explain the bus operations used in sending and receiving data between
computer modules.
25. What are the problems associated with a single bus structure? What are the
solutions?
26. Give examples of I/O devices that might be attached to the expansion bus (in
the traditional bus architecture).
Answer: LAN, SCSI, Serial (printer and scanner), modem.
27. What is the advantage of the mezzanine architecture? Draw the arrangement.
Answer: The advantage is that the high-speed bus brings high-demand devices into
closer integration with the processor.
28. What are the bus types? What is the advantage of physical dedication? What
are the disadvantages of time multiplexing?
29. What is the purpose of bus arbitration? How is it classified? Compare.
30. What does the term bus timing refer to? Classify the use of the bus in terms of
timing. Show the timing diagram for a read operation. Also draw the timing diagram
for an asynchronous read.
31. Which is better, the synchronous bus or the asynchronous bus?
32. (T/F) The wider the data bus, the greater the number of bits transferred at one
time.
33. (T/F) The width of address bus has an impact on system capacity.
34. (T/F) The maximum memory space that can be addressed with a 27-bit address
bus is
35. (T/F) A 32-bit data bus corresponds to a 32-bit MAR.
Al-Isra University, Faculty of Science & IT, Department of Computer Science
Subject: Computer Architecture (Course No. 605352)
Second Exam, Second Semester 2005/2006 (9th May 2006). Time: 1 Hr.
Lecturer: Dr. Intisar Al-Shummari
Q1: What is the hit rate required to give an average access time of 20ns in a two-level
memory system if the access time of the top level is 10ns and that of the bottom level
is 60ns? [4 Marks]
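The hit ratio follows from solving T_avg = H * T1 + (1 - H) * T_miss for H, which gives H = (T_miss - T_avg) / (T_miss - T1). The result depends on how a miss is modelled, so this sketch supports both common conventions through a hypothetical flag. By default a miss costs T1 + T2, because level 1 is checked first and the word is then brought up from level 2. With the flag off, a miss costs T2 alone.

```python
def hit_ratio(t_avg: float, t1: float, t2: float, miss_includes_t1: bool = True) -> float:
    """Solve T_avg = H*T1 + (1 - H)*T_miss for the hit ratio H.

    miss_includes_t1=True  : T_miss = T1 + T2 (level 1 is checked before level 2)
    miss_includes_t1=False : T_miss = T2      (simpler model)
    """
    t_miss = t1 + t2 if miss_includes_t1 else t2
    h = (t_miss - t_avg) / (t_miss - t1)
    if not 0.0 <= h <= 1.0:
        raise ValueError("no hit ratio in [0, 1] gives this average access time")
    return h


print(round(hit_ratio(20, 10, 60), 3))                          # 0.833
print(round(hit_ratio(20, 10, 60, miss_includes_t1=False), 3))  # 0.8
```

The range check matters. An average access time below the access time of the top level cannot be achieved under either model, and the function reports this instead of returning a "hit ratio" greater than 1.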
Q2: Explain the cache operation. [4 Marks]
Q3: A digital computer has a main memory of 64K words and a cache that can
hold 2048 words from main memory. Each word of main memory is 11 bits long.
Compute the tag field and the cache size (in bits) when direct mapping is used. [6 Marks]
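For direct mapping, the main-memory address splits into tag, line, and word fields. The sketch below computes those widths and the storage cost. It assumes word-addressed memory, power-of-two sizes, and one tag stored alongside the data in each cache line, and it ignores valid and dirty bits. When each line holds a single word (`block_words=1`), the word field is empty and every cache entry is simply tag plus data.

```python
from typing import Dict


def log2_exact(n: int) -> int:
    """Number of bits needed to index n items; n must be a power of two."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of two")
    return n.bit_length() - 1


def direct_mapping(main_words: int, cache_words: int, word_bits: int,
                   block_words: int = 1) -> Dict[str, int]:
    """Address fields and storage cost of a direct-mapped cache.

    Assumes word-addressed memory and one tag stored per cache line
    (valid/dirty bits are ignored)."""
    address_bits = log2_exact(main_words)
    word_field = log2_exact(block_words)          # selects a word inside a block
    lines = cache_words // block_words
    line_field = log2_exact(lines)                # selects the cache line
    tag = address_bits - line_field - word_field  # remaining high-order bits
    line_bits = block_words * word_bits + tag     # data + tag stored per line
    return {"address": address_bits, "tag": tag, "line": line_field,
            "word": word_field, "lines": lines, "line_bits": line_bits,
            "cache_bits": lines * line_bits}


print(direct_mapping(64 * 1024, 2048, word_bits=11))
# {'address': 16, 'tag': 5, 'line': 11, 'word': 0, 'lines': 2048, 'line_bits': 16, 'cache_bits': 32768}
```

The same function handles blocked caches. With `main_words=4 * 2**20`, `cache_words=32 * 1024`, `block_words=16` and `word_bits=16` (the figures in the final exam's Q3), it gives a 7-bit tag and 2048 lines of 263 bits each, for 538,624 bits in total under the one-tag-per-line assumption.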
Q4: Encircle the correct answer for 7 of the following: [7 Marks]
1. The EPROM
a) Can be erased by UV
b) Can be used once
c) Is a volatile memory
d) None of the above
2. The internal memory includes _______
a) CPU
b) Main memory only
c) Main memory, control memory and caches
d) tapes
3. The external storage can be accessed directly by _______
a) CPU
b) I/O controller
c) user
d) system bus
4. The cost per bit is ________ in magnetic memories as compared to main
memory.
a) Increased
b) Decreased
c) The same
5. The data access method in semiconductor memories is _________
a) Direct
b) Sequential
c) Random
d) All the above
6. The static RAM has ________
a) Bits stored as charges
b) Larger size per bit compared to DRAM
c) A refreshment circuit
d) Slower performance compared to DRAM
7. The disadvantage of associative mapping is that _______
a) It needs long time to decode the address
b) It requires a complex comparison circuit
c) It is a non-volatile memory
8. The DRAM is _______
a) nonvolatile
b) suitable for main memory
c) suitable for use as cache
d) the fastest in the memory hierarchy
Q5: Explain two replacement algorithms used for cache block replacement. [4 Marks]
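Two of the usual policies, least recently used (LRU) and first-in-first-out (FIFO), can be compared by counting misses on the same reference string. This is a minimal sketch that models one fully associative cache (or one set) holding `capacity` blocks. The reference string is an arbitrary example. Direct mapping needs no replacement policy, because each block has only one possible line.

```python
from collections import OrderedDict, deque
from typing import Deque, Iterable, Set


def misses_lru(refs: Iterable[int], capacity: int) -> int:
    """Least Recently Used: evict the block unused for the longest time."""
    cache: "OrderedDict[int, None]" = OrderedDict()   # order = recency of use
    misses = 0
    for block in refs:
        if block in cache:
            cache.move_to_end(block)            # hit: now most recently used
        else:
            misses += 1
            if len(cache) == capacity:
                cache.popitem(last=False)       # evict least recently used
            cache[block] = None
    return misses


def misses_fifo(refs: Iterable[int], capacity: int) -> int:
    """First-In-First-Out: evict the block that has been in the cache longest."""
    queue: Deque[int] = deque()                 # order = arrival time
    resident: Set[int] = set()
    misses = 0
    for block in refs:
        if block not in resident:               # hits do not change FIFO order
            misses += 1
            if len(queue) == capacity:
                resident.remove(queue.popleft())
            queue.append(block)
            resident.add(block)
    return misses


refs = [1, 2, 3, 1, 4, 1, 2]
print(misses_lru(refs, 3), misses_fifo(refs, 3))   # 5 6
```

Block 1 is reused often. LRU keeps it resident, whereas FIFO evicts it simply because it arrived first, which costs one extra miss here.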
Al-Isra University, Faculty of Science & IT, Department of Computer Science
Subject: Computer Architecture (Course No. 605352)
Final Exam. Time: 2 Hrs.
Lecturer: Dr. Intisar Al-Shummari
Q1: Encircle the correct answer: [6 Marks]
1. Which of the following is true:
a) Random access method is used in CDs and DVDs
b) Random access method is used in all cache systems.
c) In associative access method, the access time is independent of location or
previous access.
2. In magnetic disks, the time required to position the moving head at the track is
called:
a) Seek time
b) Rotational time
c) Execute time
d) Cycle time
3. The transducer is used in external peripherals to:
a) Hold data temporarily
b) Control the data transfer
c) Interrupt CPU
d) Convert data to/from electrical form.
4. Reading an instruction from memory is called:
a) Execute
b) Direct
c) Fetch
d) Implied
5. The ______ the bus for data, the greater the number of bits transferred at one
time.
a) Slower
b) Wider
c) Simpler
d) Cheaper
6. In _________ bus, few lines are used but with complex control circuit for each
module.
a) Synchronized
b) Centralized
c) Multiplexed
d) Dedicated
Q2: The following instructions and data are stored (in hex) in memory: [7 Marks]
320 1941
321 4940
322 3942
323 5941
324 2942
.. ..
940 0010
941 0022
942 0008
What are the contents of AC, IR, PC, and M[942] after the program executes? Assume
that the initial value of PC is 320 and that the following operation codes (op codes,
in binary) are used:
0001 Load AC from memory
0010 Store AC into memory
0011 Add AC to memory word and put sum into AC
0100 Subtract memory word from AC and put into AC
0101 Shift AC right and put into memory
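This machine's op-code table differs from the one in the tutorial question, so here is a separate trace. It is organised as a dispatch table that maps each opcode to a handler. The sketch assumes 16-bit words (4-bit opcode, 12-bit address) and 16-bit wrap-around arithmetic. It also assumes that "Shift AC right and put into memory" writes the shifted value to memory and leaves AC unchanged, since the table does not say whether AC is also updated. The handler names are hypothetical.

```python
from typing import Callable, Dict, Tuple

Memory = Dict[int, int]
MASK = 0xFFFF  # 16-bit words


def op_load(ac: int, mem: Memory, addr: int) -> int:
    return mem[addr]                        # AC <- M[addr]


def op_store(ac: int, mem: Memory, addr: int) -> int:
    mem[addr] = ac                          # M[addr] <- AC
    return ac


def op_add(ac: int, mem: Memory, addr: int) -> int:
    return (ac + mem[addr]) & MASK          # AC <- AC + M[addr]


def op_subtract(ac: int, mem: Memory, addr: int) -> int:
    return (ac - mem[addr]) & MASK          # AC <- AC - M[addr], 16-bit wrap-around


def op_shift_right(ac: int, mem: Memory, addr: int) -> int:
    mem[addr] = ac >> 1                     # M[addr] <- AC shifted right one bit
    return ac                               # assumption: AC itself is unchanged


# Dispatch table: 4-bit opcode -> handler returning the new AC.
OPCODES: Dict[int, Callable[[int, Memory, int], int]] = {
    0b0001: op_load,
    0b0010: op_store,
    0b0011: op_add,
    0b0100: op_subtract,
    0b0101: op_shift_right,
}


def run(memory: Memory, pc: int, steps: int) -> Tuple[int, int, int]:
    """Fetch-decode-execute loop; returns the final (AC, IR, PC)."""
    ac = ir = 0
    for _ in range(steps):
        ir = memory[pc]                     # fetch:  IR <- M[PC]
        pc += 1                             #         PC <- PC + 1
        handler = OPCODES[ir >> 12]         # decode: top 4 bits select the operation
        ac = handler(ac, memory, ir & 0x0FFF)
    return ac, ir, pc


memory = {0x320: 0x1941, 0x321: 0x4940, 0x322: 0x3942, 0x323: 0x5941, 0x324: 0x2942,
          0x940: 0x0010, 0x941: 0x0022, 0x942: 0x0008}
ac, ir, pc = run(memory, pc=0x320, steps=5)
print(f"AC={ac:04X} IR={ir:04X} PC={pc:03X} M[942]={memory[0x942]:04X}")
# AC=001A IR=2942 PC=325 M[942]=001A
```

If the shift is instead taken to leave its result in AC as well, `op_shift_right` would return `ac >> 1`, and AC and M[942] would both end as 000D.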
Q3: A digital computer has a main memory of 4M words and a cache that can
hold 32K words from main memory. Each cache block holds 16 words, and each
word of main memory is 16 bits long. Compute the tag field, the number of blocks,
the cache word (line size) and the cache size (in bits) when direct mapping is used. [6 Marks]
Q4: Compute the time required to transfer 2K bits at a rate of 256 bits per second
from a memory system with an average access time of 2 seconds. [3 Marks]
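For memory that is not randomly accessed, the time to read N bits is the average access time plus the time to stream the bits at the transfer rate, T_N = T_A + N / R. A minimal sketch, assuming "2K bits" means 2 × 1024 bits:

```python
def transfer_time(n_bits: float, rate_bps: float, access_time_s: float) -> float:
    """T_N = T_A + N / R: reach the data first, then stream N bits at R bits/s."""
    return access_time_s + n_bits / rate_bps


print(transfer_time(2 * 1024, 256, 2))   # 10.0
```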
Q5: If the average access time in a two-level memory system is 20ns, what is the hit
ratio for a cache with an access time of 10ns if the access time of the RAM is 60ns? [3 Marks]
Q6: Explain the steps of the DMA process. How does DMA differ from interrupt-driven I/O? [6 Marks]
Q7: Complete the table for each of the addressing modes if the ADD instruction is
executed. Assume that initially AC=02, PC=50, R=85 and the index register=30. [6 Marks]
Addressing mode      Effective Address    Contents of AC
Direct
Immediate
Relative
Register
Register indirect
Indexed

Address    Memory contents
50         ADD to AC
51         70
52         Next address
..         ..
70         22
..         ..
85         44
..         ..
100        55
..         ..
122        33
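The table can be checked by computing each effective address and operand in turn. This is a minimal sketch under several assumptions. Values are decimal. The ADD instruction occupies addresses 50-51, with its address field (70) in word 51. In relative mode, PC has already advanced to 52 when the effective address is formed. Each row starts from the same initial AC=02. The memory contents and register values are taken from the question.

```python
from typing import Dict, Optional, Tuple

# Register values and memory image from Question 7 (decimal).
AC, R, XR = 2, 85, 30
memory: Dict[int, int] = {51: 70, 70: 22, 85: 44, 100: 55, 122: 33}
ADDRESS_FIELD = memory[51]   # address part of the two-word ADD instruction at 50-51
PC_AFTER_FETCH = 52          # assumption: PC already points past the instruction

EFFECTIVE_ADDRESS: Dict[str, Optional[int]] = {
    "Direct": ADDRESS_FIELD,                      # EA = address field
    "Immediate": 51,                              # operand is the address word itself
    "Relative": PC_AFTER_FETCH + ADDRESS_FIELD,   # EA = PC + address field
    "Register": None,                             # operand is in R; no memory access
    "Register indirect": R,                       # EA = contents of R
    "Indexed": XR + ADDRESS_FIELD,                # EA = index register + address field
}


def operand(mode: str) -> int:
    ea = EFFECTIVE_ADDRESS[mode]
    return R if ea is None else memory[ea]


# Each row starts from the same initial AC, as the question's table implies.
results: Dict[str, Tuple[Optional[int], int]] = {
    mode: (ea, AC + operand(mode)) for mode, ea in EFFECTIVE_ADDRESS.items()
}
for mode, (ea, ac) in results.items():
    ea_text = "-" if ea is None else str(ea)
    print(f"{mode:<18} EA={ea_text:>4}  AC={ac}")
```

The memory image supports these assumptions: addresses 100 and 122 are exactly where the indexed (30 + 70) and relative (52 + 70) modes point.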
Q8: Explain the disk layouts in CAV and multiple zoned recording. How is data
recorded on CDs? [5 Marks]