Intisar's Computer Organization and Architecture
Dr. Intisar Al-Shummari
Textbook: Computer Organization and Architecture, William Stallings, 6th edition, 2004
Internet resources: web site for the book, http://williamstallings.com/COA6e.html
Chapter 1
Introduction
The relation between architecture and organization is very close, and the two are often hard to distinguish.
Introduction
Computers are used in scientific calculations, commercial and business data
processing, air traffic control, space guidance, the educational field, and many other
areas. The most striking property of a digital computer is its generality.
• It can follow a sequence of instructions, called a program, that operates on given data.
• The user can specify and change programs and/or data according to the specific need.
Structure & Function
Function
All computer functions are:
Data processing
Data storage
Data movement
Control
Data can move into and out of the other three functions under the management of the control function, as shown diagrammatically below.
Functional view
Operations
Data movement: Transferring data from one peripheral to another
e.g. keyboard to screen
Structure - Top Level
The computer is an entity that interacts in some fashion with its external environment, as shown in simplified form below.
The memory unit stores programs as well as input, output, and intermediate
data.
The processor unit performs arithmetic and other data-processing tasks as
specified by a program.
The control unit supervises the flow of information between the various units.
The control unit retrieves the instructions, one by one, from the program
which is stored in memory.
A top level view of the Computer system shows four main structural
components:
Central Processing Unit (CPU): controls the operation of the computer and performs its data processing functions; often simply called the processor.
Main Memory (MM): stores data and programs.
Input / output, I/O: Moves data between the computer and its external
environment.
System interconnection: Some mechanism that provides for
communication among CPU, MM and I/O.
Structure - The CPU
The most interesting component is the CPU, which consists of:
Control Unit, CU: Controls the computer operations.
Arithmetic and Logic Unit, ALU: Performs data processing functions.
Internal Registers: Provide internal storage to the CPU.
Internal connection facilities: Some mechanism that provides communication among the CU, ALU and registers.
Structure - The Control Unit
In this scheme, the CU consists of:
Sequential Logic.
Control Unit Registers and Decoders.
Control memory.
Chapter 2
Computer Evolution and Performance
ENIAC - details
Decimal machine (not binary): i.e. number representations and operations
were performed in decimal.
20 accumulators of 10 digits
Programmed manually by switches
18,000 vacuum tubes
30 tons
15,000 square feet
140 kW power consumption
5,000 additions per second
The von Neumann (stored-program) machine consists of:
Main memory storing programs and data
ALU operating on binary data
Control unit interpreting instructions from memory and executing them
Input and output equipment operated by control unit
Structure of the von Neumann machine
The control unit operates by fetching instructions from the memory and executing
them one at a time.
Structure of IAS - detail
A detailed structure diagram of the IAS computer is shown next; it has the following set of registers (storage in the CPU):
Fetch-Execute Cycle
The system works by a cycle called the fetch-execute cycle, i.e.:
⌧ Each instruction is first brought from main memory into the IR (fetch cycle).
⌧ It is then decoded and executed (execute cycle), as shown below.
All instructions are taken sequentially, one after the other, unless a jump is executed.
⌧ Arithmetic: operations performed by ALU.
e.g. ADD M(X) : add M(X) to AC; put the result in AC
ADD |M(X)| : add |M(X)| to AC; put the result in AC
SUB M(X) : subtract M(X) from AC; put the result in AC
.
.
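The three arithmetic instructions listed above can be mimicked in a few lines. This is a minimal sketch, assuming a Python dict stands in for memory M and a plain integer for the accumulator AC; the function names are hypothetical, and the sketch ignores the IAS's fixed 40-bit word length.

```python
# Hypothetical model: memory M is a dict {address: value}, AC is an int.
def add(ac: int, memory: dict, x: int) -> int:
    """ADD M(X): add M(X) to AC; the result goes back into AC."""
    return ac + memory[x]

def add_abs(ac: int, memory: dict, x: int) -> int:
    """ADD |M(X)|: add the absolute value of M(X) to AC."""
    return ac + abs(memory[x])

def sub(ac: int, memory: dict, x: int) -> int:
    """SUB M(X): subtract M(X) from AC; the result goes back into AC."""
    return ac - memory[x]

M = {100: -7}
ac = 10
ac = add(ac, M, 100)      # 10 + (-7) = 3
ac = add_abs(ac, M, 100)  # 3 + 7 = 10
ac = sub(ac, M, 100)      # 10 - (-7) = 17
print(ac)  # 17
```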
Commercial Computers
1947 - Eckert-Mauchly Computer Corporation
UNIVAC I (Universal Automatic Computer)
US Bureau of Census 1950 calculations
Became part of Sperry-Rand Corporation
Late 1950s - UNIVAC II
Faster
More memory
IBM
Punched-card processing equipment
1953 - the 701
IBM’s first stored program computer
Scientific calculations
1955 - the 702
Business applications
Led to the 700/7000 series, which established IBM as the dominant computer manufacturer.
Transistors
Replaced vacuum tubes (1948)
Smaller
Cheaper
Less heat dissipation
Solid State device
Made from Silicon (Sand)
Invented 1947 at Bell Labs
William Shockley et al.
DEC - 1957
Produced PDP-1
Microelectronics
Literally - “small electronics”
A computer is made up of gates, memory cells and interconnections
These can be manufactured on a semiconductor
e.g. silicon wafer
Integrated Circuits , IC
Generations of Computer
Vacuum tube - 1946-1957
Transistor - 1958-1964 (separate devices)
Small scale integration (SSI) - 1965 on
Up to 100 devices on a chip, or
(up to 10 devices/cm²)
Medium scale integration (MSI) - to 1971
100 - 3,000 devices on a chip, or
(10 ~ 100 devices/cm²)
Large scale integration (LSI) - 1971-1977
3,000 - 100,000 devices on a chip, or
(100 ~ 1,000 devices/cm²)
Very large scale integration (VLSI) - 1978 to date
100,000 - 100,000,000 devices on a chip, or
(1,000 ~ 10,000 devices/cm²)
Ultra large scale integration (ULSI) - 1980s
Over 100,000,000 devices on a chip, or
(> 10,000 devices/cm²)
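A small helper can classify a chip by its device count using the ranges above. This is only a sketch: the ranges touch at their endpoints (e.g. 100 appears in both SSI and MSI), so the hypothetical `integration_scale` function assumes each boundary value belongs to the lower class.

```python
def integration_scale(devices: int) -> str:
    """Classify a chip by devices per chip; a boundary value goes to the lower class."""
    if devices <= 100:
        return "SSI"
    if devices <= 3_000:
        return "MSI"
    if devices <= 100_000:
        return "LSI"
    if devices <= 100_000_000:
        return "VLSI"
    return "ULSI"

print([integration_scale(n) for n in (50, 1_000, 50_000, 3_000_000, 200_000_000)])
# ['SSI', 'MSI', 'LSI', 'VLSI', 'ULSI']
```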
Moore’s Law
Increased density of components on chip
Gordon Moore - cofounder of Intel
Number of transistors on a chip will double every year
Since the 1970s, development has slowed a little
Number of transistors doubles every 18 months
Cost of a chip has remained almost unchanged
Higher packing density means shorter electrical paths, giving higher performance
Smaller size gives increased flexibility
Reduced power and cooling requirements
Fewer interconnections increase reliability
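The two doubling rates above compound very differently. A minimal sketch, assuming simple exponential growth N(t) = N0 × 2^(t/T) with doubling period T; the starting count of 1,000 transistors is a hypothetical value chosen for illustration.

```python
def transistor_count(n0: float, years: float, doubling_months: float) -> float:
    """N(t) = N0 * 2 ** (t / T), with t and T both measured in months."""
    return n0 * 2 ** (years * 12 / doubling_months)

start = 1_000   # hypothetical starting count
print(transistor_count(start, 6, 12))  # 64000.0 (doubling every year)
print(transistor_count(start, 6, 18))  # 16000.0 (doubling every 18 months)
```

Over six years, yearly doubling gives a factor of 64, while doubling every 18 months gives a factor of 16.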
Growth in CPU Transistor Count
Semiconductor Memory
1970
Fairchild
Size of a single core i.e. 1 bit of magnetic core storage
Holds 256 bits
Non-destructive read
Much faster than core
Capacity approximately doubles each year
Speeding it up
Pipelining
On board cache
On board L1 & L2 cache
Branch prediction
Data flow analysis
Speculative execution
Performance Mismatch
Processor speed increased
Memory capacity increased
Memory speed lags behind processor speed
Solutions
Increase number of bits retrieved at one time
Make DRAM “wider” rather than “deeper”
Change DRAM interface
Cache
Reduce frequency of memory access
More complex cache and cache on chip
Increase interconnection bandwidth
High speed buses
Hierarchy of buses
Chapter 3
A View of Computer Function and Interconnection
(System Buses)
Program Concept
Hardwired systems are inflexible
General purpose hardware can do different tasks, given correct control signals
Instead of re-wiring, supply a new set of control signals
What is a program?
A sequence of steps
For each step, an arithmetic or logical operation is done
For each operation, a different set of control signals is needed
We have a computer!
Computer Components
Virtually all contemporary computer designs are based on the concepts of the von Neumann architecture, which are:
Data and instructions are stored in a single R/W memory.
Contents of memory are addressable.
Execution occurs in a sequential fashion unless explicitly modified.
The Control Unit and the Arithmetic and Logic Unit constitute the Central
Processing Unit, CPU.
Input/output: data and instructions need to get into the system, and results out.
Main memory: temporary storage of code and results is needed; main memory holds data and instructions during execution.
Computer Components:
Top Level View
Instruction Cycle
Fetch Cycle
Program Counter (PC) holds address of next instruction to be fetched
Processor fetches instruction from memory location pointed to by PC
Increment PC ( i.e. PC = PC + 1 )
Unless told otherwise
Instruction loaded into Instruction Register (IR)
Processor interprets instruction and performs required actions
Execute Cycle
Processor-memory
data transfer between CPU and main memory
Processor-I/O
Data transfer between CPU and I/O module
Data processing
Some arithmetic or logical operation on data
Control
Alteration of sequence of operations
e.g. jump
Combination of above
Example
Suppose a hypothetical machine is used with the following instruction and
integer formats:
Suppose the PC is set to location 300. The processor fetches the instruction at location 300, and the PC changes to 301.
The content of location 300 is put into the IR.
This content is interpreted and the required action is carried out (e.g. opcode 0001 means "load the AC from memory", here from address 940).
The next step is therefore loading the AC with 003 (the content of location 940).
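The fetch-execute steps in this example can be traced with a tiny simulator. This is a sketch under stated assumptions: instructions are 16-bit words with a 4-bit opcode and a 12-bit address, addresses are written in hexadecimal, opcode 0001 (load AC) comes from the example, while opcodes 0010 (store AC) and 0101 (add to AC) and the contents of locations 301, 302 and 941 are hypothetical values added to extend the trace.

```python
LOAD, STORE, ADD = 0x1, 0x2, 0x5   # 0x1 from the example; 0x2 and 0x5 assumed

def run(memory: dict, pc: int, steps: int) -> tuple:
    """Execute `steps` instructions starting at `pc`; return the final (PC, AC)."""
    ac = 0
    for _ in range(steps):
        ir = memory[pc]                          # fetch: instruction at PC -> IR
        pc += 1                                  # PC = PC + 1
        opcode, address = ir >> 12, ir & 0xFFF   # decode: 4-bit opcode, 12-bit address
        if opcode == LOAD:                       # execute
            ac = memory[address]
        elif opcode == STORE:
            memory[address] = ac
        elif opcode == ADD:
            ac += memory[address]
        else:
            raise ValueError(f"unknown opcode {opcode:#x}")
    return pc, ac

mem = {0x300: 0x1940, 0x301: 0x5941, 0x302: 0x2941, 0x940: 0x0003, 0x941: 0x0002}
pc, ac = run(mem, 0x300, 3)
print(hex(pc), ac, mem[0x941])  # 0x303 5 5
```

After the first instruction alone, the PC is 301 and the AC holds 3, matching the steps described above.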
Instruction Cycle - State Diagram
Interrupts
Mechanism by which other modules (e.g. I/O) may interrupt normal sequence
of processing. Many classes of interrupt exist:
Program: generated by some condition that occurs as a result of instruction execution, such as arithmetic overflow, division by zero, or an attempt to execute an illegal machine instruction.
Timer: generated by a timer within the processor. This allows the OS to perform certain functions on a regular basis.
I/O: generated by an I/O controller to signal normal completion of an operation or to signal a variety of error conditions.
Hardware failure: generated by a failure, such as power failure or memory parity error.
Interrupts are used to improve processing efficiency, as shown in the following example:
(a) No interrupts: the processor waits idle for the whole of each write operation, which could be quite long.
The I/O program consists of:
• A sequence of operations to prepare for the actual I/O operation (including copying the data to the printer buffer and preparing the printer commands), labelled 4 in the figure.
When the I/O device is ready for more output, its I/O module sends an interrupt signal to the processor. The processor suspends the program, branches to the interrupt handler to service the I/O device, then resumes program execution, as in figure (b).
(c) Interrupts with a long I/O wait: the processor still executes user code concurrently with the I/O operation, but if the program reaches the next write before the previous I/O operation has finished, it must wait, as shown in the diagram.
Interrupt Cycle
Instruction Cycle (with Interrupts) - State Diagram
Multiple Interrupts
Disable interrupts
Processor will ignore further interrupts whilst processing one interrupt
Interrupts remain pending and are checked after first interrupt has been
processed
Interrupts handled in sequence as they occur
Define priorities
Low priority interrupts can be interrupted by higher priority interrupts
When higher priority interrupt has been processed, processor returns to
previous interrupt
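The two approaches to multiple interrupts can be compared by the order in which pending requests are serviced. This is a minimal sketch: the `Interrupt` record, the device names and the convention that a larger number means higher priority are all assumptions for illustration.

```python
from dataclasses import dataclass

@dataclass
class Interrupt:
    name: str
    arrival: int    # order in which the request was raised
    priority: int   # assumed convention: larger number = higher priority

def service_order(pending: list, policy: str) -> list:
    """Order in which pending interrupts are serviced under each approach."""
    if policy == "sequential":     # interrupts disabled: handled in arrival order
        ordered = sorted(pending, key=lambda irq: irq.arrival)
    elif policy == "priority":     # highest priority first, ties by arrival
        ordered = sorted(pending, key=lambda irq: (-irq.priority, irq.arrival))
    else:
        raise ValueError(f"unknown policy: {policy}")
    return [irq.name for irq in ordered]

def preempts(current: Interrupt, new: Interrupt, policy: str) -> bool:
    """Can `new` interrupt the handler that is currently servicing `current`?"""
    return policy == "priority" and new.priority > current.priority

irqs = [Interrupt("printer", 0, 2), Interrupt("disk", 1, 4), Interrupt("comm", 2, 5)]
print(service_order(irqs, "sequential"))  # ['printer', 'disk', 'comm']
print(service_order(irqs, "priority"))    # ['comm', 'disk', 'printer']
```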
Multiple Interrupts – Nested
Connecting
A computer is a network of basic modules. All the units must be connected by paths, collectively called the interconnection structure (buses), whose design depends on the exchanges that must be made between these modules.
Input / Output ( I/O Module):
Similar to memory from computer’s
viewpoint
Output
Receive data from computer
Send data to peripheral
Input
Receive data from peripheral
Send data to computer
Processor:
Reads in instruction and data
Writes out data (after processing)
Sends control signals to other units
Receives (& acts on) interrupts
Generally, the interconnection structure must support the following types of
transfer:
⌧ Memory to processor
⌧ Processor to memory
⌧ I/O to processor
⌧ Processor to I/O
⌧ I/O to/from memory
Buses
What is a Bus?
A bus is a communication pathway connecting two or more devices.
Shared transmission medium, with only one device transmitting at a time.
Usually broadcast
Often grouped
A number of channels in one bus
e.g. 32 bit data bus is 32 separate single bit channels instead of sending
it on one line serially.
Bus interconnections:
Computers may have different buses for various purposes; a bus that connects the major computer components (CPU, memory, I/O) is called a system bus. Such a bus carries address, data or control information and may consist of 50 ~ 100 lines, each with a particular function.
The lines can be classified into three functional groups:
Data
Address
Control
In addition, there may be power distribution lines that supply power to the attached modules.
Data Bus
Data lines: these provide the paths for moving data between modules. Collectively they form the data bus, which typically consists of 8, 16 or 32 separate lines; the number of lines is called the data bus width, and it affects the speed of data transfer.
Carries data
Remember that there is no difference between “data” and “instruction”
at this level
Width is a key determinant of performance
8, 16, 32, 64 bit
Address bus
Identify the source or destination address of data in memory or I/O module
e.g. CPU needs to read an instruction (data) from a given location in memory
Bus width determines maximum memory capacity of system
e.g. 8080 has 16 bit address bus giving 64k address space or maximum
memory size.
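The relationship between address bus width and maximum memory size is a power of two: n address lines can select 2^n locations. A minimal sketch (the function name is hypothetical):

```python
def address_space(address_bits: int) -> int:
    """Number of distinct locations that `address_bits` address lines can select."""
    return 2 ** address_bits

print(address_space(16), address_space(16) // 1024)  # 65536 64  (the 8080: 64K)
print(address_space(24))                            # 16777216 (16M)
```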
Control Bus
Used to carry control signals and timing information. Typical control lines include the following:
Memory read/write signal
Interrupt request
Clock signals
I/O read/write signal
Status information
Acknowledgement
Bus request
Bus granting
Interrupt acknowledgement
Reset
Single Bus Problems
This bus hierarchy introduces a cache memory between the processor and main memory, so main memory is isolated from the processor's local bus. As a result, transfers into and out of main memory (e.g. I/O transfers) do not interfere with processor activity.
Lower-speed devices such as modems, serial printers and fax machines can also be attached through an expansion bus interface.
Elements of Bus Design
1. Bus Types:
Dedicated
Permanently assigned bus lines to one function or physical subset of
computer components.
e.g. Separate data & address lines
Multiplexed
Data and address can be sent on the same bus (or shared lines) by using
an address valid or data valid control line.
⌧ Advantage - fewer lines are used, saving space & cost.
⌧ Disadvantages
• More complex control
• Possible reduction in ultimate performance, since events that share the same lines cannot take place in parallel
2. Bus Arbitration
Centralized arbitration
A single bus controller (or arbiter) is responsible for allocating time on the bus (bus access) to each module.
[May be part of the CPU or separate]
Decentralized arbitration
There is no single controller: each module contains its own access control logic, and the modules act together to share the bus. Only one module may control the bus at a time.
In both schemes, the module granted the bus acts as master (e.g. the CPU or a DMA controller) and can initiate a data transfer with another device, which acts as slave for that particular exchange.
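Centralized arbitration can be sketched as a function that grants the bus to one requesting module. The notes do not say how the arbiter chooses, so this sketch assumes a fixed-priority policy; the module names and their order are hypothetical.

```python
def arbitrate(requests: set, priority_order: list):
    """Grant the bus to the highest-priority requesting module, or None if idle."""
    for module in priority_order:
        if module in requests:
            return module
    return None

ORDER = ["DMA", "CPU"]                     # hypothetical: DMA outranks the CPU
print(arbitrate({"CPU", "DMA"}, ORDER))    # DMA
print(arbitrate({"CPU"}, ORDER))           # CPU
```

Only one module is ever returned, which mirrors the rule that only one module may control the bus at a time.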
3. Timing
Timing is the coordination of events on the bus; it can be synchronous or asynchronous.
Synchronous timing is simple, but all devices are tied to a fixed clock rate.
With asynchronous timing, old and new (or slow and fast) devices can be mixed on the same bus, but more complicated bus control is required.
Synchronous
Events determined by clock signals
The control bus includes a clock line, which carries a regular sequence of clock pulses that can be read by all devices
A single 1-0 transmission is a clock (bus) cycle
Usually sync on leading edge
Usually a single cycle for an event
Synchronous Timing Diagram
(for a read operation)
Asynchronous Timing
The occurrence of one event on the bus follows, and depends on, the occurrence of a previous event.
The asynchronous timing diagram for a WRITE operation is:
4. Bus width
The wider the data bus, the greater the number of bits transferred at one time.
The wider the address bus, the greater the range of locations that can be referenced.
e.g.
data bus: 8 lines means 8 bits/transfer
16 lines means 16 bits/transfer, etc.
5. Data Transfer Type
Write (multiplexed) operation
PCI Bus
Peripheral Component Interconnect
Released to the public domain by Intel in 1990
32 or 64 bit
50 lines
Further Reading:
www.pcguide.com/ref/mbsys/buses/
www.pcguide.com
Chapter 4 & 5
Characteristics:
Memories are characterized by:
Location
Capacity
Unit of transfer
Access method
Performance
Physical type
Physical characteristics
Organisation
Location
Internal (or CPU) memory: accessed directly by the processor
Main memory
Processor or CU registers
Cache
For internal memory, the capacity is expressed in bytes (8 bits) or words (8, 16, 32 or 64 bits)
Unit of Transfer
Internal: the number of data lines into and out of a memory module, often equal to the word length
Usually governed by the data bus width
External: the number of bits read out of or written into the memory at a time
Usually a block of data (2 ~ 4 kbytes), which is much larger than a word
Addressable unit:
The smallest location that can be uniquely addressed
Usually the word itself internally, but some systems allow byte-level addressing.
In either case, if an address is A bits long, the number N of addressable units is N = 2^A.
Access Methods
Sequential access
Start at the beginning and read through in order
Access time depends on location of data and previous location
e.g. tape
Direct access
Individual blocks have unique address
Access is by jumping to vicinity plus sequential search
Access time depends on location and previous location
e.g. disk
Random
Individual addresses identify locations exactly, (i.e. each has physical
wired-in mechanism).
Access time is independent of location or previous access, (i.e. same
for all locations)
e.g. RAM
Associative
Data is located by a comparison with contents of a portion of the store
Access time is independent of location or previous access
e.g. cache
Performance
Access time: The time it takes to perform a read or write operation. i.e.
Time between presenting the address and getting the valid data
Memory Cycle time
Time may be required for the memory to “recover” before next access
Cycle time = access time + recovery time
Transfer Rate
Rate at which data can be moved or transferred into or out of a
memory unit.
For random-access memory, transfer rate = 1/(cycle time).
For non-random-access memory, the following relationship holds:
TN = TA + (N/R)
Where
TN : Average time to read or write N bits
TA: Average access time
N: number of bits
R: Transfer rate, (bits/sec.)
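The transfer-time relationship TN = TA + N/R can be evaluated directly. A minimal sketch; the 10 ms access time, 4 KB block and 1 Mbit/s rate in the example are hypothetical figures, not values from the notes.

```python
def transfer_time(ta_s: float, n_bits: int, rate_bps: float) -> float:
    """TN = TA + N/R: average time (seconds) to read or write N bits (non-random-access)."""
    return ta_s + n_bits / rate_bps

def ram_transfer_rate(cycle_time_s: float) -> float:
    """For random-access memory, transfer rate = 1 / cycle time (transfers per second)."""
    return 1 / cycle_time_s

# Hypothetical figures: 10 ms average access, one 4 KB block, 1 Mbit/s transfer rate.
print(transfer_time(0.010, 4096 * 8, 1_000_000))  # roughly 0.0428 s
```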
Physical Types
Semiconductor
RAM
Magnetic
Disk & Tape
Optical
CD & DVD
Others
Bubble
Hologram
Physical Characteristics
Volatility: volatile memory loses its contents when the power is switched off, e.g. semiconductor RAM.
Non-volatile memory keeps its contents without power, e.g. magnetic memory and read-only memory (ROM).
Power consumption
Organisation
The physical arrangement of bits into words; a key design issue for a memory
e.g. interleaved
Memory Hierarchy
Roughly, it includes the following:
Registers
In CPU
External memory
Backing store
Hierarchy List
Registers
L1 Cache
L2 Cache
Main memory
Disk cache
Disk
Optical
Tape
Example
Suppose a computer system has two memory levels:
Level 1 (cache): access time T1 = 0.1 μs, capacity 1,000 words
Level 2 (RAM): access time T2 = 1 μs, capacity 100,000 words
A word that is not in level 1 is first transferred to level 1 and then accessed by the processor. (Ignore the time required to determine whether the word is in level 1 or level 2.)
Now if 95% of accessed words are found in the cache (hit ratio H = 0.95), then the average time to access a word is:
Ts = H × T1 + (1 - H) × (T1 + T2)
= (0.95)(0.1 μs) + (0.05)(0.1 μs + 1 μs)
= 0.095 μs + 0.055 μs = 0.15 μs
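The average access time in this example can be checked with a short function. A minimal sketch, assuming (as the example states) that a word not found in level 1 is first transferred to level 1 and then read, so each miss costs T1 + T2; times are in microseconds and the function name is hypothetical.

```python
def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
    """Ts = H*T1 + (1 - H)*(T1 + T2); a miss pays for level 2 and then level 1."""
    return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

print(round(average_access_time(0.95, 0.1, 1.0), 3))  # 0.15 (microseconds)
```

As the hit ratio approaches 1, the average approaches T1, which is why locality of reference makes a small cache effective.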
Locality of Reference
During the course of the execution of a program, memory references tend to cluster, e.g. in loops.
Semiconductor Memory
Main Memory: called Random Access Memory (RAM)
Built of semiconductor microelectronics, with the following features:
Misnamed, because all semiconductor memories are random access, not only main memory
Read/Write
Volatile
Temporary storage
Either (1) Dynamic OR (2) Static
Dynamic RAM
Bits stored as charge in capacitors
Charges leak
Need refreshing even when
powered
Simpler construction
Smaller per bit
Less expensive
Need refresh circuits
Slower
Suitable as Main memory
Static RAM
Bits stored as on/off switches
No charges to leak
No refreshing needed when
powered
More complex construction
Larger per bit
More expensive
Does not need refresh circuits
Faster
Suitable as Cache
Read Only Memory (ROM)
Permanent storage
Useful for Microprogramming and other applications like:
Library subroutines frequently used
Systems programs (BIOS)
Function tables
Types of ROM
Written during manufacture (simple ROM)
Very expensive for small runs
Programmable (once)
PROM: can be programmed electrically, once.
Cheap and convenient, BUT needs special equipment to program
Read "mostly": can be reprogrammed many times
Erasable Programmable (EPROM)
⌧ Erased by UV light
Electrically Erasable (EEPROM)
⌧ Takes much longer to write than read
Flash memory
⌧ Erased electrically, the whole memory or a block at a time
Organisation:
The basic element of a semiconductor memory is the memory cell, which:
Exhibits two stable states, representing 0 and 1.
Can be written into (at least once).
Can be sensed (read).
For example:
A 16 Mbit chip can be organised as 1M words of 16 bits
A bit-per-chip system has 16 lots of 1 Mbit chips, with bit 1 of each word in chip 1, and so on
A 16 Mbit chip can also be organised as a 2048 x 2048 x 4 bit array (four square arrays)
Reduces the number of address pins
⌧ Multiplex the row address and column address
⌧ 11 pins to address (2^11 = 2048)
⌧ Adding one more pin doubles the range of both row and column values, so capacity x4
Typical 16 Mb DRAM (4M x 4)
Notes
Address lines A0 ... A10 (11 lines) are half the number expected for a 2048 x 2048 array (22 lines), i.e. a saving in the number of pins.
First, 11 bits define the row address; then a second 11 bits define the column address.
These signals are accompanied by the Row Address Select (RAS) and Column Address Select (CAS) signals.
WE: Write enable
OE: Output enable (for reads)
A refresh counter is needed in DRAMs. It steps through all of the row values.
For each row, the output lines from the refresh counter are supplied to the row decoder and the RAS line is activated. This causes each cell in the row to be refreshed.
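The row/column multiplexing and refresh counter described above can be sketched as follows. This is an illustration only, assuming the 2048 x 2048 x 4 organisation, so a 22-bit address selects one 4-bit word; the function names are hypothetical and real DRAM timing (the RAS/CAS strobes themselves) is not modelled.

```python
ROW_BITS = COL_BITS = 11       # 2^11 = 2048 rows and 2048 columns

def split_address(cell: int) -> tuple:
    """Split a 22-bit address into (row, column): row sent first with RAS, then column with CAS."""
    if not 0 <= cell < 1 << (ROW_BITS + COL_BITS):
        raise ValueError("address out of range")
    return cell >> COL_BITS, cell & ((1 << COL_BITS) - 1)

def refresh_rows():
    """Refresh counter: steps through every row value in turn."""
    yield from range(1 << ROW_BITS)

print(split_address(2049), split_address((1 << 22) - 1))  # (1, 1) (2047, 2047)
print(sum(1 for _ in refresh_rows()))                       # 2048
```

Only 11 address pins are needed because the same pins carry the row half and then the column half of the address.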
Packaging
D1 - D4: in/out
A0 - A10, CAS, RAS, OE and WE: in only
Error Correction
Hard Failure
A permanent physical defect, so that the affected memory cells cannot reliably store data.
Caused by:
⌧ harsh environmental abuse,
⌧ manufacturing defects and
⌧ wear.
Soft Error
Random, non-destructive
No permanent damage to memory
Caused by:
⌧ Power supply problems
⌧ Alpha particles (from radioactive decay)
Both hard and soft errors are undesirable, so memory systems often include logic for detecting and correcting errors, e.g. a Hamming error-correcting code.
Error Correcting Code Function
Generally, this process has the following form (see next slide):
When M bits of data are to be written into the memory, a calculation, depicted as a function f, is performed on the data to produce a K-bit code.
Both the data and the code (M + K bits) are then stored.
When a word is read out, the code is used to detect and possibly correct errors: a new K-bit code is generated from the M data bits and compared with the K-bit code fetched from memory.
The result is one of three:
1- No error is detected.
2- An error is detected and can be corrected.
3- An error is detected but cannot be corrected.
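A concrete instance of the function f is a Hamming single-error-correcting code. This sketch assumes M = 8 data bits and K = 4 check bits, placed at the power-of-two positions 1, 2, 4 and 8 of a 12-bit word with even parity; it corrects any single-bit error but does not include the extra parity bit that would be needed to reliably detect double errors (outcome 3 above). The names are hypothetical.

```python
DATA_POSITIONS = [3, 5, 6, 7, 9, 10, 11, 12]   # non-power-of-two positions hold data
CHECK_POSITIONS = [1, 2, 4, 8]                 # K = 4 check bits

def encode(data: int) -> list:
    """Return a 13-element list; index 0 is unused, positions 1..12 hold the stored word."""
    word = [0] * 13
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
    for p in CHECK_POSITIONS:
        # check bit p covers every position whose index has bit p set (even parity)
        word[p] = sum(word[pos] for pos in range(1, 13) if pos & p and pos != p) % 2
    return word

def syndrome(word: list) -> int:
    """Compare recomputed checks with stored ones: 0 = no error, else the bad position."""
    s = 0
    for p in CHECK_POSITIONS:
        if sum(word[pos] for pos in range(1, 13) if pos & p) % 2:
            s |= p
    return s

def decode(word: list) -> tuple:
    """Correct a single-bit error (if any) and return (data, error_position)."""
    word = word[:]
    pos = syndrome(word)
    if pos > 12:
        raise ValueError("uncorrectable error")
    if pos:
        word[pos] ^= 1
    data = sum(word[p] << i for i, p in enumerate(DATA_POSITIONS))
    return data, pos

word = encode(0x39)
word[6] ^= 1                 # simulate a single-bit (soft) error at position 6
print(decode(word))          # (57, 6)
```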
Cache
Small amount of fast memory, giving a speed close to that of the fastest memories while providing a large memory size at the price of less expensive types.
Sits between normal main memory and CPU
May be located on CPU chip or module
Cache operation - overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of main memory is in each cache
slot
Cache Read Operation – Flowchart
Cache Design
Size
Mapping Function
Replacement Algorithm
Write Policy
Block Size
Number of Caches
Size does matter
Cost
More cache is expensive
Speed
More cache is faster (up to a point)
Checking cache for data takes time
Comparison of Cache Sizes
Mapping Function
Cache of 64 k Bytes
Cache block of 4 bytes
i.e. cache is 16k (2^14) lines of 4 bytes
16 M Bytes main memory
24 bit address
(2^24 = 16M)
Direct Mapping
Each block of main memory maps to only one cache line
i.e. if a block is in cache, it must be in one specific place
Address is in two parts
Least Significant w bits identify unique word
Most Significant s bits specify one memory block
The MSBs are split into a cache line field r and a tag of s-r (most significant)
Direct Mapping Address Structure
24 bit address
2 bit word identifier (4 byte block)
22 bit block identifier
8 bit tag (=22-14)
14 bit slot or line
No two blocks in the same line have the same Tag field
Check contents of cache by finding line and checking Tag
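The direct-mapping address split above (8-bit tag, 14-bit line, 2-bit word) can be expressed with shifts and masks. A minimal sketch; the function names and the example addresses are chosen for illustration.

```python
WORD_BITS, LINE_BITS, TAG_BITS = 2, 14, 8      # 4-byte blocks, 16k lines
assert WORD_BITS + LINE_BITS + TAG_BITS == 24  # 24-bit main memory address

def direct_map_fields(address: int) -> tuple:
    """Split a 24-bit address into (tag, line, word)."""
    word = address & ((1 << WORD_BITS) - 1)
    line = (address >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = address >> (WORD_BITS + LINE_BITS)
    return tag, line, word

def is_hit(cache_tags: list, address: int) -> bool:
    """A block can live in only one line, so check that line's stored tag."""
    tag, line, _ = direct_map_fields(address)
    return cache_tags[line] == tag

cache_tags = [None] * (1 << LINE_BITS)     # tag stored in each line (None = empty)
cache_tags[0x0CE7] = 0x16
print([hex(f) for f in direct_map_fields(0x16339C)])  # ['0x16', '0xce7', '0x0']
print(is_hit(cache_tags, 0x16339C), is_hit(cache_tags, 0x17339C))  # True False
```

The two example addresses map to the same line but carry different tags, so only one of them can be cached at a time.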
Direct Mapping Example
Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive
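For associative mapping with the same 24-bit address and 4-byte blocks, the tag is the whole 22-bit block number. This sketch, with a hypothetical `assoc_lookup` and a list of (tag, block) pairs standing in for the cache lines, compares the tag against every line; hardware performs these comparisons in parallel, which is what makes the search expensive.

```python
WORD_BITS = 2   # 4-byte blocks; the other 22 bits of the address form the tag

def assoc_lookup(lines: list, address: int):
    """Return the addressed byte on a hit, or None on a miss."""
    tag = address >> WORD_BITS
    word = address & ((1 << WORD_BITS) - 1)
    for line_tag, block in lines:      # every line's tag is examined
        if line_tag == tag:
            return block[word]
    return None

lines = [(0x058CE7, bytes([0x11, 0x22, 0x33, 0x44])), (0x000001, bytes(4))]
print(assoc_lookup(lines, 0x16339C), assoc_lookup(lines, 0x16339F))  # 17 68
print(assoc_lookup(lines, 0xFFFFFC))  # None
```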
Associative Mapping Example
Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
— e.g. Block B can be in any line of set i
• e.g. 2 lines per set
— 2 way associative mapping
— A given block can be in one of 2 lines in only one set
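Set-associative mapping can be sketched by assuming the 16k-line cache from the mapping example is organised as 2-way set associative, giving 8k sets: a 13-bit set field, a 9-bit tag and a 2-bit word field. The names and the example address are hypothetical.

```python
WORD_BITS = 2
NUM_LINES, WAYS = 1 << 14, 2                      # 16k lines, 2 lines per set (assumed)
NUM_SETS = NUM_LINES // WAYS                      # 8k sets
SET_BITS = NUM_SETS.bit_length() - 1              # 13 bits

def set_assoc_fields(address: int) -> tuple:
    """Split a 24-bit address into (tag, set, word)."""
    block = address >> WORD_BITS
    set_index = block % NUM_SETS                  # block number mod number of sets
    tag = block >> SET_BITS
    return tag, set_index, address & ((1 << WORD_BITS) - 1)

def lookup(sets: list, address: int) -> bool:
    """Only the WAYS tags held in one set are compared."""
    tag, set_index, _ = set_assoc_fields(address)
    return tag in sets[set_index]

sets = [[] for _ in range(NUM_SETS)]
sets[0x0CE7] = [0x02C, 0x1FF]
print([hex(f) for f in set_assoc_fields(0x16339C)])  # ['0x2c', '0xce7', '0x0']
print(lookup(sets, 0x16339C))  # True
```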
Replacement Algorithms
Direct mapping
• No choice
• Each block only maps to one line
• Replace that line
Write Policy
Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to
CPU) cache up to date
• Lots of traffic
• Slows down writes
Write back
• Updates initially made in cache only
• Update bit for cache slot is set when update occurs
• If block is to be replaced, write to main memory only if update bit
is set
• Other caches get out of sync
• I/O must access main memory through cache
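The difference between the two write policies shows up in how many writes reach main memory. A minimal sketch of a single cache slot; `CacheLine` is a hypothetical name, and the count mixes word writes (write through) with whole-block write-backs, so it illustrates traffic only roughly.

```python
class CacheLine:
    """One cache slot holding a single block, with an update (dirty) bit."""

    def __init__(self, policy: str):
        self.policy = policy          # "write-through" or "write-back"
        self.dirty = False            # the update bit
        self.memory_writes = 0        # writes that reached main memory

    def write(self) -> None:
        if self.policy == "write-through":
            self.memory_writes += 1   # every write also goes to main memory
        else:
            self.dirty = True         # write back: update the cache only, set update bit

    def evict(self) -> None:
        if self.policy == "write-back" and self.dirty:
            self.memory_writes += 1   # write the block back only if it was updated
        self.dirty = False

for policy in ("write-through", "write-back"):
    line = CacheLine(policy)
    for _ in range(10):
        line.write()
    line.evict()
    print(policy, line.memory_writes)
# write-through 10
# write-back 1
```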
Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – 8k using 16 byte lines and four way set associative
organization
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium III – L3 cache added off chip
• Pentium 4
— L1 caches
– 8k bytes
– 64 byte lines
– four way set associative
— L2 cache
– Feeding both L1 caches
– 256k
– 128 byte lines
– 8 way set associative
— L3 cache on chip
Pentium 4 Block Diagram
Chapter 6
External Memory
Magnetic Disk
Metal or plastic disk coated with magnetizable material (iron oxide…rust)
Range of packaging
Floppy
Winchester hard disk
Removable hard disk
Disk Data Layout
Disk Velocity
• Bit near centre of rotating disk passes fixed point slower than bit on outside of
disk
• Increase spacing between bits in different tracks
• Rotate disk at constant angular velocity (CAV)
— Gives pie shaped sectors and concentric tracks
— Individual tracks and sectors addressable
— Move head to given track and wait for given sector
— Waste of space on outer tracks
– Lower data density
• Can use zones to increase capacity
— Each zone has fixed bits per track
— More complex circuitry
Fixed/Movable Head Disk
Fixed head
One read write head per track
Heads mounted on a fixed rigid arm
Movable head
One read write head per side
Mounted on a movable arm
Removable or Not
Removable disk
Can be removed from drive and replaced with another disk
Provides unlimited storage capacity
Easy data transfer between systems
Non removable disk
Permanently mounted in the drive
Multiple Platter
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form cylinders
• Data is striped by cylinder
— reduces head movement
— Increases speed (transfer rate)
Tracks and Cylinders
Floppy Disk
8”, 5.25”, 3.5”
Small capacity
Up to 1.44Mbyte (2.88M never popular)
Slow
Universal
Cheap
Removable Hard Disk
ZIP
Cheap
Very common
Only 100M
JAZ
Not cheap
1G
L-120 (a: drive)
Also reads 3.5” floppy
Becoming more popular?
Finding Sectors
Must be able to identify start of track and sector
Format disk
Additional information not available to user
Marks tracks and sectors
Characteristics
Fixed (rare) or movable head
Removable or fixed
Single or double (usually) sided
Single or multiple platter
Head mechanism
Contact (Floppy)
Fixed gap
Flying (Winchester)
Speed
Seek time
Moving head to correct track
(Rotational) latency
Waiting for data to rotate under head
Access time = Seek + Latency
Transfer rate
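Access time = seek + latency can be estimated by taking the average rotational latency as half a revolution. A minimal sketch; the seek time and spindle speeds in the example are hypothetical.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """On average the wanted sector is half a revolution away from the head."""
    ms_per_revolution = 60_000 / rpm
    return ms_per_revolution / 2

def access_time_ms(avg_seek_ms: float, rpm: float) -> float:
    """Access time = seek + (rotational) latency."""
    return avg_seek_ms + avg_rotational_latency_ms(rpm)

print(access_time_ms(4.0, 6000))   # 9.0 (4 ms seek + 5 ms latency)
```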
Timing of Disk I/O Transfer
RAID
Redundant Array of Independent Disks
Redundant Array of Inexpensive Disks
6 levels in common use
Not a hierarchy
Set of physical disks viewed as single logical drive by O/S
Data distributed across physical drives
Can use redundant capacity to store parity information
RAID 0
No redundancy
Data striped across all disks
Round Robin striping
Increase speed
Multiple data requests probably not on same disk
Disks seek in parallel
A set of data is likely to be striped across multiple disks
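Round-robin striping in RAID 0 maps each logical strip to a disk and a position on that disk. A minimal sketch, assuming strips are numbered from 0 and all disks are the same size; the function name is hypothetical.

```python
def raid0_location(strip: int, num_disks: int) -> tuple:
    """Round robin: logical strip i goes to disk i mod n, at strip position i // n on that disk."""
    return strip % num_disks, strip // num_disks

print([raid0_location(i, 4) for i in range(6)])
# [(0, 0), (1, 0), (2, 0), (3, 0), (0, 1), (1, 1)]
```

Consecutive strips land on different disks, so a large request can be served by several disks seeking in parallel.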
RAID 1
Mirrored Disks
Data is striped across disks
2 copies of each stripe on separate disks
Read from either
Write to both
Recovery is simple
Swap faulty disk & re-mirror
No down time
Expensive
RAID 2
Disks are synchronized
Very small stripes
Often single byte/word
Error correction calculated across corresponding bits on disks
Multiple parity disks store Hamming code error correction in corresponding
positions
Lots of redundancy
Expensive
Not used
RAID 3
Similar to RAID 2
Only one redundant disk, no matter how large the array
Simple parity bit for each set of corresponding bits
Data on failed drive can be reconstructed from surviving data and parity info
Very high transfer rates
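The parity reconstruction used by RAID 3 (and, with block-sized strips, by RAID 4 and 5) is a bitwise XOR. A minimal sketch, assuming strips are equal-length byte strings; the function names and data are hypothetical.

```python
from functools import reduce

def parity(strips: list) -> bytes:
    """Bitwise XOR of corresponding bytes across all strips."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*strips))

def reconstruct(surviving: list, parity_strip: bytes) -> bytes:
    """A lost strip equals the XOR of the parity strip and every surviving strip."""
    return parity(surviving + [parity_strip])

data = [b"\x0f\xf0", b"\x33\x33", b"\x55\xaa"]   # three data disks
p = parity(data)                                  # stored on the parity disk
print(p.hex(), reconstruct([data[0], data[2]], p) == data[1])  # 6969 True
```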
RAID 4
Each disk operates independently
Good for high I/O request rate
Large stripes
Bit by bit parity calculated across stripes on each disk
Parity stored on parity disk
RAID 5
Like RAID 4
Parity striped across all disks
Round robin allocation for parity stripe
Avoids RAID 4 bottleneck at parity disk
Commonly used in network servers
RAID 6
• Two parity calculations
• Stored in separate blocks on different disks
• User requirement of N disks needs N+2
• High data availability
— Three disks need to fail for data loss
— Significant write penalty
RAID 0, 1, 2
RAID 3 & 4
RAID 5 & 6
CD-ROM Drive Speeds
Audio is single speed
Constant linear velocity
1.2 m/s
Track (spiral) is 5.27km long
Gives 4391 seconds = 73.2 minutes
Other speeds are quoted as multiples
e.g. 24x
The quoted figure is the maximum the drive can achieve
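The single-speed figures above follow from dividing the track length by the linear velocity. A minimal check; the `speed_multiple` parameter is an assumption used to show how a quoted multiple such as 24x would shorten an end-to-end scan (a best case, since the quoted figure is a maximum).

```python
TRACK_LENGTH_M = 5_270        # spiral track length, 5.27 km
SINGLE_SPEED_M_PER_S = 1.2    # audio (1x) constant linear velocity

def play_time_s(speed_multiple: float = 1) -> float:
    """Time to scan the whole spiral at a given multiple of single speed."""
    return TRACK_LENGTH_M / (SINGLE_SPEED_M_PER_S * speed_multiple)

seconds = play_time_s()
print(int(seconds), round(seconds / 60, 1))  # 4391 73.2
```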
CD-ROM Format
DVD - technology
Multi-layer
Very high capacity (4.7G per layer)
Full length movie on single disk
Using MPEG compression
Finally standardized (honest!)
Movies carry regional coding
Players only play correct region films
Can be “fixed”
CD and DVD
Magnetic Tape
Serial access
Slow
Very cheap
Backup and archive
Chapter 7
Input/Output
Input/Output Problems
Wide variety of peripherals
Delivering different amounts of data
At different speeds
In different formats
All slower than CPU and RAM
Need I/O modules
Input/Output Module
Interface to CPU and Memory through system bus
Interface to one or more peripherals
Contains logic for performing communication functions between the peripherals and the bus system. Why is a separate module needed?
1- It is not practical to build logic for every kind of peripheral into the processor.
2- Peripheral data transfer rates differ from, and do not match, those of the processor and memory.
3- Peripherals use different data formats.
GENERIC MODEL OF I/O
External Devices
An external device attached to an I/O module can be modelled as shown in the diagram:
Control signals: Read , Write
Data: bits to send or receive
from I/O module
Status: Ready, not Ready
Control logic: Controls device
operation
Transducer: converts data between electrical and other forms of energy
Buffer: for temporary storage
(size 8 to 16 bits)
External Devices
Human readable: suitable for communication with computer users
Screen, printer, keyboard
Machine readable: suitable for communication with equipment
Monitoring and control, disk, tape, CD, sensor
Communication: suitable for communication with remote devices.
Modem, Network Interface Card (NIC)
I/O Module Function (continue)
Data Buffering: needed because different devices have different data rates
Error Detection: the I/O module reports electrical and mechanical failures, such as a paper jam or a bad disk track, and transmission errors detected, e.g., by parity checking.
I/O Steps
CPU checks I/O module device status
I/O module returns status
If ready, CPU requests data transfer
I/O module gets data from device
I/O module transfers data to CPU
Variations for output, DMA, etc.
Data transferred to and from the module is buffered in one or more data registers.
There are also one or more status registers, which may also function as control
registers.
The I/O module contains logic specific to the interface with each device that it
controls.
Programmed I/O
CPU has direct control over I/O
Sensing status
Read/write commands
Transferring data
CPU waits for I/O module to complete operation
Wastes CPU time
CPU checks status bits periodically
I/O module does not inform CPU directly
I/O module does not interrupt CPU
CPU may wait or come back later
I/O Commands
CPU issues address
Identifies module (& device if >1 per module)
CPU issues command
Control - telling module what to do
– e.g. spin up disk
Test - check status
– e.g. power? Error?
Read/Write
– Module transfers data via buffer from/to device
I/O Mapping
Memory mapped I/O
Devices and memory share an address space
I/O looks just like memory read/write
No special commands for I/O
– Large selection of memory access commands available
Isolated I/O
Separate address spaces
Need I/O or memory select lines
Special commands for I/O
– Limited set
Interrupt Driven I/O Basic Operation
CPU issues read command
I/O module gets data from peripheral whilst CPU does other work
I/O module interrupts CPU
CPU requests data
I/O module transfers data
CPU Viewpoint
Issue read command
Do other work
Check for interrupt at end of each instruction cycle
If interrupted:-
Save context (registers)
Process interrupt
– Fetch data & store
Typical DMA Module Diagram
DMA Operation
• CPU tells DMA controller:-
— Read/Write
— Device address
— Starting address of memory block for data
— Amount of data to be transferred
• CPU carries on with other work
• DMA controller deals with transfer
• DMA controller sends interrupt when finished
DMA and Interrupt Breakpoints During an Instruction Cycle
DMA Configurations
Chapter 11
Instruction Sets:
Addressing Modes and Formats
Addressing Modes
• Immediate
• Direct
• Indirect
• Register
• Register Indirect
• Displacement (Indexed)
• Stack
Immediate Addressing
• Operand is part of instruction
• Operand = address field
• e.g. ADD 5
— Add 5 to contents of accumulator
— 5 is operand
• No memory reference to fetch data
• Fast
• Limited range
Direct Addressing
• Address field contains address of operand
• Effective address (EA) = address field (A)
• e.g. ADD A
— Add contents of cell A to accumulator
— Look in memory at address A for operand
• Single memory reference to access data
• No additional calculations to work out effective address
• Limited address space
Indirect Addressing
• Memory cell pointed to by address field contains the address of (pointer to)
the operand
• EA = (A)
— Look in A, find address (A) and look there for operand
• e.g. ADD (A)
— Add contents of cell pointed to by contents of A to accumulator
• Large address space
• 2^n, where n = word length
• May be nested, multilevel, cascaded
— e.g. EA = (((A)))
– Draw the diagram yourself
• Multiple memory accesses to find operand
• Hence slower
Indirect Addressing Diagram
Register Addressing
• Operand is held in the register named in the address field
• EA = R
• Limited number of registers
• Very small address field needed
— Shorter instructions
— Faster instruction fetch
• No memory access
• Very fast execution
• Very limited address space
• Multiple registers help performance
— Requires good assembly programming or compiler writing
— N.B. C programming
• register int a;
Register Indirect Addressing
• EA = (R)
• Operand is in memory cell pointed to by contents of register R
• Large address space (2^n)
• One fewer memory access than indirect addressing
Displacement Addressing
• EA = A + (R)
• Address field holds two values
— A = base value
— R = register that holds displacement
— or vice versa
Relative Addressing
• A version of displacement addressing
• R = Program counter, PC
• EA = A + (PC)
• i.e. the operand is A cells away from the current location pointed to by PC
Base-Register Addressing
• A holds displacement
• R holds pointer to base address
• R may be explicit or implicit
• e.g. segment registers in 80x86
Indexed Addressing
• A = base
• R = displacement
• EA = A + (R)
• Good for accessing arrays
— EA = A + (R)
— R++
Combinations
• Post index
• EA = (A) + (R)
• Pre index
• EA = (A+(R))
Stack Addressing
• Operand is (implicitly) on top of stack
• e.g.
— ADD Pop top two items from stack and add
Pentium Addressing Modes
• Virtual or effective address is offset into segment
— Starting address plus offset gives linear address
— This goes through page translation if paging enabled
• 12 addressing modes available
— Immediate
— Register operand
— Displacement
— Base
— Base with displacement
— Scaled index with displacement
— Base with index and displacement
— Base scaled index with displacement
— Relative
Instruction Formats
• Layout of bits in an instruction
• Includes opcode
• Includes (implicit or explicit) operand(s)
• Usually more than one instruction format in an instruction set
Instruction Length
• Affected by and affects:
— Memory size
— Memory organization
— Bus structure
— CPU complexity
— CPU speed
• Trade off between powerful instruction repertoire and saving space
Allocation of Bits
• Number of addressing modes
• Number of operands
• Register versus memory
• Number of register sets
• Address range
• Address granularity
Chapter 13
Reduced Instruction Set Computers (RISC)
• Key features
— Large number of general purpose registers
— or use of compiler technology to optimize register use
— Limited and simple instruction set
— Emphasis on optimising the instruction pipeline
RISC Characteristics
• One instruction per cycle
• Register to register operations
• Few, simple addressing modes
• Few, simple instruction formats
• Hardwired design (no microcode)
• Fixed instruction format
• More compile time/effort
Comparison of processors
Chapter 16
Micro-Operations
• A computer executes a program
• Fetch/execute cycle
• Each cycle has a number of steps
— see pipelining
• Called micro-operations
• Each step does very little
• Atomic operation of CPU
Fetch - 4 Registers
• Memory Address Register (MAR)
— Connected to address bus
— Specifies address for read or write op
• Memory Buffer Register (MBR)
— Connected to data bus
— Holds data to write or last data read
• Program Counter (PC)
— Holds address of next instruction to be fetched
• Instruction Register (IR)
— Holds last instruction fetched
Fetch Sequence (symbolic)
• t1: MAR <- (PC)
• t2: MBR <- (memory)
• PC <- (PC) +1
• t3: IR <- (MBR)
Or, equally valid:
• t1: MAR <- (PC)
• t2: MBR <- (memory)
• t3: PC <- (PC) +1
• IR <- (MBR)
Indirect Cycle
• MAR <- (IRaddress) - address field of IR
• MBR <- (memory)
• IRaddress <- (MBRaddress)
Interrupt Cycle
• t1: MBR <-(PC)
• t2: MAR <- save-address
• PC <- routine-address
• t3: memory <- (MBR)
• This is a minimum
— May be additional micro-ops to get addresses
— N.B. saving context is done by interrupt handler routine, not micro-ops
Instruction Cycle
• Each phase decomposed into sequence of elementary micro-operations
• E.g. fetch, indirect, and interrupt cycles
• Execute cycle
— One sequence of micro-operations for each opcode
• Need to tie sequences together
• Assume new 2-bit register
— Instruction cycle code (ICC) designates which part of the cycle the processor
is in
– 00: Fetch
– 01: Indirect
– 10: Execute
– 11: Interrupt
Types of Micro-operation
• Transfer data between registers
• Transfer data from register to external
• Transfer data from external to register
• Perform arithmetic or logical operations
Control Signals
• Clock
— One micro-instruction (or set of parallel micro-instructions) per clock
cycle
• Instruction register
— Op-code for current instruction
— Determines which micro-instructions are performed
• Flags
— State of CPU
— Results of previous operations
• From control bus
— Interrupts
— Acknowledgements
Control Signals - output
• Within CPU
— Cause data movement
— Activate specific functions
• Via control bus
— To memory
— To I/O modules
Hardwired Implementation
• Instruction register
— Op-code causes different control signals for each different instruction
— Unique logic for each op-code
— Decoder takes encoded input and produces single output
— n binary inputs and 2^n outputs
Problems With Hard Wired Designs
• Complex sequencing & micro-operation logic
• Difficult to design and test
• Inflexible design
• Difficult to add new instructions
Chapter 17
Micro-programmed Control
Control Unit Organization
Micro-programmed Control
• Use sequences of instructions (see earlier notes) to control complex operations
• Called micro-programming or firmware
Chapter 18
Parallel Processing
Taxonomy of Parallel Processor Architectures
Tutorial and Sample Exams
300 1940
301 5941
302 3940
303 2941
.. ..
940 0010
941 0021
What are the contents of AC, IR, PC and M[941] after executing? Assume that
the initial value of PC is 300 and the following operation codes (op codes) are
used:
19. Give an example comparing the program flow of control with and without
interrupts. Show also the program timing.
20. What is the ISR?
21. Explain how multiple interrupts can be handled.
22. Give some examples of control signals. What is the purpose of bus granting
and reset signals?
23. What is the function of the address bus? What is the maximum memory
capacity for 20 bit address lines?
24. Explain the bus operations used in sending and receiving data between
computer modules.
25. What are the problems associated with a single bus structure? What are the
solutions?
26. Give examples for I/O devices that might be attached to the expansion bus (in
traditional bus architecture).
Answer: LAN, SCSI, Serial (printer and scanner), modem.
27. What is the advantage of the mezzanine architecture? Draw the arrangement.
Answer: The advantage is that the high-speed bus brings high-demand devices into
closer integration with the processor.
28. What are the bus types? What is the advantage of physical dedication? What
are the disadvantages of time multiplexing?
29. What is the purpose of bus arbitration? How is it classified? Compare.
30. What does the term bus timing refer to? Classify the use of bus in terms of
timing. Show the timing diagram for read operation. Draw also the timing of
asynchronous read.
31. Which is better, the synchronous bus or asynchronous?
32. (T/F) The wider the data bus, the greater the number of bits transferred at one
time.
33. (T/F) The width of address bus has an impact on system capacity.
34. (T/F) The maximum memory space that can be addressed with 27 bits address
bus is
35. (T/F) A 32-bit data bus corresponds to a 32-bit MAR.
Q1: What is the hit rate required to give an average access time of 20 ns in a two-level
memory system if the access time of the top level is 10 ns and that of the bottom level
is 60 ns? [4 Marks]
Q3: A digital computer has a main memory size of 64K words and a cache that can
hold 2048 words from main memory. Each word of main memory is 11 bits long.
Compute the tag field and cache size (in bits) when direct mapping is used. [6 Marks]
Q4: Encircle the correct answer for 7 of the following: [7 Marks]
1. The EPROM
a) Can be erased by UV
b) Can be used once
c) Is a volatile memory
d) None of the above
2. The internal memory includes _______
a) CPU
b) Main memory only
c) Main memory, control memory and caches
d) tapes
3. The external storage can be accessed directly by_______
a) CPU
b) I/O controller
c) user
d) system bus
4. The cost per bit is ________ in magnetic memories as compared to main
memory.
a) Increased
b) Decreased
c) The same
5. The data access method in semiconductor memories is_________
a) Direct
b) Sequential
c) Random
d) All the above
6. The static RAM has________
a) Bits stored as charges
b) Larger size per bit compared to DRAM
c) A refreshment circuit
d) Slower performance compared to DRAM
7. The disadvantage of associative mapping is that_______
a) It needs long time to decode the address
b) It requires a complex comparison circuit
c) It is a non-volatile memory
8. The DRAM is _______
a) nonvolatile
b) suitable for main memory
c) suitable for use as cache
d) the fastest in the memory hierarchy
Q5: Explain two replacement algorithms used for cache block replacement. [4 Marks]
Q1: Encircle the correct answer: [6 Marks]
1. Which of the following is true:
a) Random access method is used in CDs and DVDs
b) Random access method is used in all cache systems.
c) In associative access method, the access time is independent of location or
previous access.
2. In magnetic disks, the time required to position the moving head at the track is
called:
a) Seek time
b) Rotational time
c) Execute time
d) Cycle time
3. The transducer is used in external peripherals to:
a) Hold data temporarily
b) Control the data transfer
c) Interrupt CPU
d) Convert data to/from electrical form.
4. Reading an instruction from memory is called:
a) execute
b) Direct
c) Fetch
d) Implied
5. The ______ the bus for data, the greater the number of bits transferred for one
time.
a) Slower
b) Wider
c) Simpler
d) Cheaper
6. In a _________ bus, fewer lines are used, but each module needs a more complex
control circuit.
a) Synchronized
b) Centralized
c) Multiplexed
d) dedicated
Q2: Given the following instructions and data are stored (hex) in memory: [7 Marks]
320 1941
321 4940
322 3942
323 5941
324 2942
.. ..
940 0010
941 0022
942 0008
What are the contents of AC, IR, PC, and M [942] after executing? Assume
that the initial value of PC is 320 and the following operation codes (op codes
in binary) are used:
0001 Load AC from memory
0010 Store AC into memory
0011 Add AC to memory word and put sum into AC
0100 Subtract memory word from AC and put into AC
0101 Shift AC right and put into memory
Q3: A digital computer has a main memory size of 4M words and a cache that can
hold 32K words from main memory. Each cache block holds 16 words, and each
word of main memory is 16 bits long. Compute the tag field, number of blocks, cache
word (line size) and cache size (in bits) when direct mapping is used. [6 Marks]
Q4: Compute the time required to transfer 2K bits at a rate of 256 bits per second
from a memory system with an average access time of 2 seconds. [3 Marks]
Q5: If the average access time of a two-level memory system is 10 ns, what is the hit
ratio for a cache with an access time of 20 ns, if the access time of the RAM is 60 ns? [3 Marks]
Q6: Explain the steps of DMA process. How does DMA differ from interrupt? [6
Marks]
Q7: Complete the table for each of the addressing modes if an ADD instruction is
executed. Assume initially that AC=02, PC=50, R=85, Index register=30. [6 Marks]
Addressing mode Effective Address Contents of AC
Direct
Immediate
Relative
Register
Register indirect
Indexed
Q8: Explain the disk layouts in CAV and multiple zoned recording. How is data
recorded on CDs? [5 Marks]