
REVIEW FOR ARCHITECTURE AND ORGANIZATIONS ICT 118

CHAPTER 1

Architecture is those attributes visible to the programmer
— Instruction set, number of bits used for data representation, I/O mechanisms, addressing techniques.
— e.g. Is there a multiply instruction?
— All Intel x86 family share the same basic architecture
— The IBM System/370 family share the same basic architecture

Organization is how features are implemented
— Control signals, interfaces, memory technology.
— e.g. Is there a hardware multiply unit or is it done by repeated addition?
— Organization differs between different versions

Structure is the way in which components relate to each other

Function is the operation of individual components as part of the structure
• All computer functions are:
— Data processing
— Data storage
— Data movement
— Control

Functional view

Operations
(1) Data movement
(2) Storage
(3) Processing from/to storage

Structure – Top Level
Four Components:
1. Main Memory
2. Central Processing Unit
3. System Interconnection
4. Input / Output

Structure - The CPU
1. Registers
2. Arithmetic and Logic Unit (ALU)
3. Internal CPU Interconnection
4. Control Unit (CU)

Structure - The Control Unit (CU)
1. Sequencing Logic
2. Control Unit Registers and Decoders
3. Control Memory
CHAPTER 2
ENIAC – background
• Electronic Numerical Integrator And Computer (ENIAC)
• Eckert and Mauchly
• University of Pennsylvania
• Trajectory tables for weapons
• Started 1943
• Finished 1946
— Too late for war effort
• Used until 1955

ENIAC – details
• Decimal (not binary)
• 20 accumulators of 10 digits
• Programmed manually by switches
• 18,000 vacuum tubes
• 30 tons
• 15,000 square feet
• 140 kW power consumption
• 5,000 additions per second

von Neumann/Turing
• Stored Program concept
• Main memory storing programs and data
• ALU operating on binary data
• Control unit interpreting instructions from memory and executing
• Input and output equipment operated by control unit
• Princeton Institute for Advanced Studies — IAS
• Completed 1952

Structure of von Neumann machine

Commercial Computers
• 1947 - Eckert-Mauchly Computer Corporation
• UNIVAC I (Universal Automatic Computer)
• US Bureau of Census 1950 calculations
• Became part of Sperry-Rand Corporation
• Late 1950s - UNIVAC II
— Faster
— More memory

IBM
• Punched-card processing equipment
• 1953 - the 701
— IBM’s first stored program computer
— Scientific calculations
• 1955 - the 702
— Business applications
• Led to 700/7000 series

Transistors
• Replaced vacuum tubes
• Smaller
• Cheaper
• Less heat dissipation
• Solid State device
• Made from Silicon (Sand)
• Invented 1947 at Bell Labs
• William Shockley et al.
IAS (Institute for Advanced Studies) – details
• 1000 x 40 bit words
— Binary number
— 2 x 20 bit instructions
• Set of registers (storage in CPU):
1. Memory Buffer Register (MBR) – holds a word to be stored in memory or sent to I/O, or a word just received from memory or I/O
2. Memory Address Register (MAR) – specifies the address in memory of the word to be written from or read into the MBR
3. Instruction Register (IR) – contains the opcode of the instruction currently being executed
4. Instruction Buffer Register (IBR) – temporarily holds the right-hand instruction from a word fetched from memory
5. Program Counter (PC) – contains the address of the next instruction to be fetched from memory
6. Accumulator (AC) – temporarily holds operands and results of ALU operations
7. Multiplier Quotient (MQ) – works with the AC to hold operands and results of arithmetic operations (e.g. a multiply result)

Structure of IAS – detail

Transistor Based Computers
• Second generation machines
• NCR & RCA produced small transistor machines
• IBM 7000
• DEC - 1957
— Produced PDP-1

Microelectronics
• Literally - “small electronics”
• A computer is made up of gates, memory cells and interconnections
• These can be manufactured on a semiconductor
• e.g. silicon wafer
Generations of Computer
• Vacuum tube - 1946-1957
• Transistor - 1958-1964
• Small scale integration - 1965 on
— Up to 100 devices on a chip
• Medium scale integration - to 1971
— 100-3,000 devices on a chip
• Large scale integration - 1971-1977
— 3,000-100,000 devices on a chip
• Very large scale integration - 1978 to date
— 100,000-100,000,000 devices on a chip
• Ultra large scale integration
— Over 100,000,000 devices on a chip

Semiconductor Memory
• 1970
• Fairchild
• Size of a single core
— i.e. 1 bit of magnetic core storage
• Holds 256 bits
• Non-destructive read
• Much faster than core
• Capacity approximately doubles each year
Moore’s Law
• Increased density of components on chip
• Gordon Moore - co-founder of Intel
• Number of transistors on a chip will double every year
• Since 1970’s development has slowed a little
— Number of transistors doubles every 18 months
• Cost of a chip has remained almost unchanged
• Higher packing density means shorter electrical paths, giving higher performance
• Smaller size gives increased flexibility
• Reduced power and cooling requirements
• Fewer interconnections increases reliability

Intel
• 1971 - 4004
— First microprocessor
— All CPU components on a single chip
— 4 bit
• Followed in 1972 by 8008
— 8 bit
— Both designed for specific applications
• 1974 - 8080
— Intel’s first general purpose microprocessor
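As a quick arithmetic check on the 18-month doubling rate quoted under Moore’s Law above, a minimal sketch (the 10-year horizon is an arbitrary illustration):

```python
# Transistor count growth under an 18-month doubling period (Moore's Law).
months = 10 * 12            # illustrative horizon: 10 years
doublings = months / 18     # one doubling every 18 months
growth = 2 ** doublings
print(f"{doublings:.1f} doublings -> x{growth:.0f} transistors")  # roughly a 100x increase
```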
Growth in CPU Transistor Count

Speeding it up
• Pipelining
• On board cache
• On board L1 & L2 cache
• Branch prediction
— unconditional branches always continue execution at the target; conditional branches depend on whether the condition is satisfied, so the outcome must be predicted
• Data flow analysis
• Speculative execution

Performance Mismatch
• Processor speed increased
• Memory capacity increased
• Memory speed lags behind processor speed

IBM 360 series
• 1964
• Replaced (& not compatible with) 7000 series
• First planned “family” of computers
— Similar or identical instruction sets
— Similar or identical O/S
— Increasing speed
— Increasing number of I/O ports (i.e. more terminals)
— Increased memory size
— Increased cost
• Multiplexed switch structure

DRAM (Dynamic Random Access Memory) and Processor Characteristics
DEC PDP-8
• 1964
• First minicomputer (after miniskirt!)
• Did not need air conditioned room
• Small enough to sit on a lab bench
• $16,000
— $100k+ for IBM 360
• Embedded applications & OEM
• Bus structure

Trends in DRAM use

DEC - PDP-8 Bus Structure

Solutions
• Increase number of bits retrieved at one time
— Make DRAM “wider” rather than “deeper”
• Change DRAM interface
— Cache
• Reduce frequency of memory access
— More complex cache and cache on chip
• Increase interconnection bandwidth
— High speed buses
— Hierarchy of buses

ACRONYMS
1. Memory Buffer Register (MBR)
2. Memory Address Register (MAR)
3. Instruction Register (IR)
4. Instruction Buffer Register (IBR)
5. Program Counter (PC)
6. Accumulator (AC)
7. Multiplier Quotient (MQ)
8. Arithmetic Logic Unit (ALU)
9. Program Control Unit (PCU)
10. Dynamic Random Access Memory (DRAM)
11. International Business Machines (IBM)
12. Control Unit (CU)
13. Association for Computing Machinery (ACM)
14. Institute of Electrical and Electronics Engineers (IEEE)
15. Central Processing Unit (CPU)
16. Computer Organization and Architecture (COA)
17. Computer Evolution and Performance (CEP)
18. Electronic Numerical Integrator And Computer (ENIAC)
19. Institute for Advanced Studies (IAS)
20. Arithmetic Logic Circuits (ALC)
21. Control Signals (CS)
22. Universal Automatic Computer (UNIVAC)
23. Moore’s Law (ML)
24. Transistors Per Chip (TPC)
25. Original Equipment Manufacturer (OEM)
26. Digital Equipment Corporation (DEC)
27. Charles Babbage Institute (CBI)
28. Intel Developer Home (IDH)

Pentium Evolution (1)
• 4004
• 8080
— first general purpose microprocessor
— 8 bit data path
— Used in first personal computer
– Altair
• 8086
— much more powerful
— 16 bit
— instruction cache, prefetch few instructions
— 8088 (8 bit external bus) used in first IBM PC
• 80286
— 16 Mbyte memory addressable
— up from 1Mb
• 80386
— 32 bit
— Support for multitasking

Pentium Evolution (2)


• 80486
—sophisticated powerful cache and instruction pipelining
—built in maths co-processor
• Pentium
—Superscalar
—Multiple instructions executed in parallel
• Pentium Pro
—Increased superscalar organization
—Aggressive register renaming
—branch prediction
—data flow analysis
—speculative execution
Pentium Evolution (3)
• Pentium II
—MMX technology
—graphics, video & audio processing
• Pentium III
—Additional floating point instructions for 3D graphics
• Pentium 4
—Note Arabic rather than Roman numerals
—Further floating point and multimedia enhancements
• Itanium
—64 bit
—see chapter 15
• See Intel web pages for detailed information on processors
CHAPTER 3: SYSTEM BUSES
What is a program?
• A sequence of steps
• For each step, an arithmetic or logical operation is done
• For each operation, a different set of control signals is needed

Function of Control Unit
• For each operation a unique code is provided
— e.g. ADD, MOVE
• A hardware segment accepts the code and issues the control signals
• We have a computer!

Components
• The Control Unit and the Arithmetic and Logic Unit constitute the Central Processing Unit
• Data and instructions need to get into the system and results out
— Input/output
• Temporary storage of code and results is needed
— Main memory

Example of Program Execution
Computer Components: Top Level View

Instruction Cycle
• Two steps:
— Fetch
— Execute

Instruction Cycle - State Diagram

Interrupts
• Mechanism by which other modules (e.g. I/O) may interrupt normal sequence of processing
• Program
— e.g. overflow, division by zero
• Timer
— Generated by internal processor timer
— Used in pre-emptive multi-tasking
• I/O
— from I/O controller
• Hardware failure
— e.g. memory parity error
Fetch Cycle
• Program Counter (PC) holds address of next instruction to fetch
• Processor fetches instruction from memory location pointed to by PC
• Increment PC
— Unless told otherwise
• Instruction loaded into Instruction Register (IR)
• Processor interprets instruction and performs required actions

Program Flow Control

Execute Cycle
• Processor-memory
— data transfer between CPU and main memory
• Processor I/O
— Data transfer between CPU and I/O module
• Data processing
— Some arithmetic or logical operation on data
• Control
— Alteration of sequence of operations
— e.g. jump
• Combination of above

Interrupt Cycle
• Added to instruction cycle
• Processor checks for interrupt
— Indicated by an interrupt signal
• If no interrupt, fetch next instruction
• If interrupt pending:
— Suspend execution of current program
— Save context
— Set PC to start address of interrupt handler routine
— Process interrupt
— Restore context and continue interrupted program
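A toy sketch tying the fetch and execute cycles above together (the two-instruction machine, its opcodes and memory layout are invented for illustration; interrupt checking is omitted):

```python
# Toy fetch-execute loop: PC points at the next instruction, which is
# fetched into IR, PC is incremented, then the instruction is interpreted.
memory = [("LOAD", 4), ("ADD", 5), ("HALT", 0), None, 7, 35]  # code + data
pc, ac = 0, 0                       # program counter, accumulator
while True:
    ir = memory[pc]                 # fetch: instruction -> IR
    pc += 1                         # increment PC (unless told otherwise)
    op, addr = ir                   # decode
    if op == "LOAD":
        ac = memory[addr]           # execute: processor-memory transfer
    elif op == "ADD":
        ac += memory[addr]          # execute: data processing
    elif op == "HALT":
        break
print(ac)                           # 42
```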
Transfer of Control via Interrupts

Multiple Interrupts
• Disable interrupts
— Processor will ignore further interrupts whilst processing one interrupt
— Interrupts remain pending and are checked after first interrupt has been processed
— Interrupts handled in sequence as they occur
• Define priorities
— Low priority interrupts can be interrupted by higher priority interrupts
— When higher priority interrupt has been processed, processor returns to previous interrupt
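A minimal sketch of the priority scheme above: pending requests are serviced highest-priority first, and a handler yields only to strictly higher-priority requests (device names and priorities are invented):

```python
# Nested interrupt handling by priority: a handler can itself be
# interrupted, but only by a strictly higher-priority request.
def service(pending, current_priority=0):
    while pending and max(p for p, _ in pending) > current_priority:
        pending.sort()                       # highest priority last
        prio, name = pending.pop()
        print(f"save context, handle {name} (priority {prio})")
        service(pending, prio)               # nested, higher-priority only
        print(f"restore context after {name}")

service([(2, "disk"), (5, "timer"), (1, "printer")])
```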
Instruction Cycle with Interrupts

Multiple Interrupts – Sequential

Multiple Interrupts – Nested

Program Timing: Short I/O Wait

Program Timing: Long I/O Wait

Time Sequence of Multiple Interrupts
Instruction Cycle (with Interrupts) - State Diagram

Connecting
• All the units must be connected
• Different type of connection for different type of unit
— Memory
— Input/Output
— CPU

Computer Modules

Control Bus
• Control and timing information
— Memory read/write signal
— Interrupt request
— Clock signals

Bus Interconnection Scheme

Memory Connection
• Receives and sends data
• Receives addresses (of locations)
• Receives control signals
— Read
— Write
— Timing

Input/Output Connection (1)
• Similar to memory from computer’s viewpoint
• Output
— Receive data from computer
— Send data to peripheral
• Input
— Receive data from peripheral
— Send data to computer

Input/Output Connection (2)
• Receive control signals from computer
• Send control signals to peripherals
— e.g. spin disk
• Receive addresses from computer
— e.g. port number to identify peripheral
• Send interrupt signals (control)

CPU Connection
• Reads instruction and data
• Writes out data (after processing)
• Sends control signals to other units
• Receives (& acts on) interrupts

Buses
• There are a number of possible interconnection systems
• Single and multiple BUS structures are most common
• e.g. Control/Address/Data bus (PC)
• e.g. Unibus (DEC-PDP)

Big and Yellow?
• What do buses look like?
— Parallel lines on circuit boards
— Ribbon cables
— Strip connectors on mother boards
– e.g. PCI
— Sets of wires

Single Bus Problems
• Lots of devices on one bus leads to:
— Propagation delays
– Long data paths mean that co-ordination of bus use can adversely affect performance
– If aggregate data transfer approaches bus capacity
• Most systems use multiple buses to overcome these problems

Traditional (ISA) (with cache)
What is a Bus?
• A communication pathway connecting two or more devices
• Usually broadcast
• Often grouped
— A number of channels in one bus
— e.g. 32 bit data bus is 32 separate single bit channels
• Power lines may not be shown

High Performance Bus

Data Bus
• Carries data
— Remember that there is no difference between “data” and “instruction” at this level
• Width is a key determinant of performance
— 8, 16, 32, 64 bit
Address bus
• Identify the source or destination of data
• e.g. CPU needs to read an instruction (data) from a given location in memory
• Bus width determines maximum memory capacity of system
— e.g. 8080 has 16 bit address bus giving 64k address space

Bus Types
• Dedicated
— Separate data & address lines
• Multiplexed
— Shared lines
— Address valid or data valid control line
— Advantage - fewer lines
— Disadvantages: more complex control, ultimate performance
Bus Arbitration
• More than one module controlling the bus
• e.g. CPU and DMA controller
• Only one module may control bus at one time
• Arbitration may be centralised or distributed

Centralised Arbitration
• Single hardware device controlling bus access
— Bus Controller
— Arbiter
• May be part of CPU or separate

Distributed Arbitration
• Each module may claim the bus
• Control logic on all modules

PCI Bus Lines (Optional)
• Interrupt lines
— Not shared
• Cache support
• 64-bit Bus Extension
— Additional 32 lines
— Time multiplexed
— 2 lines to enable devices to agree to use 64-bit transfer
• JTAG/Boundary Scan
— For testing procedures

PCI Commands
• Transaction between initiator (master) and target
• Master claims bus
• Determine type of transaction
— e.g. I/O read/write
• Address phase
• One or more data phases
Timing
• Co-ordination of events on bus
• Synchronous
— Events determined by clock signals
— Control Bus includes clock line
— A single 1-0 is a bus cycle
— All devices can read clock line
— Usually sync on leading edge
— Usually a single cycle for an event

Synchronous Timing Diagram

Asynchronous Timing – Read Diagram

PCI Bus
• Peripheral Component Interconnection
• Intel released to public domain
• 32 or 64 bit
• 50 lines

PCI Read Timing Diagram

PCI Bus Arbitration

PCI Bus Lines (required)


• Systems lines
—Including clock and reset
• Address & Data
—32 time mux lines for address/data
—Interrupt & validate lines
• Interface Control
• Arbitration
—Not shared
—Direct connection to PCI bus arbiter
• Error lines

Chapter 4: Cache Memory


Characteristics
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation

Location
• CPU
• Internal
• External

Access Methods (2)
• Random
— Individual addresses identify locations exactly
— Access time is independent of location or previous access
— e.g. RAM
• Associative
— Data is located by a comparison with contents of a portion of the store
— Access time is independent of location or previous access
— e.g. cache

Capacity
• Word size
— The natural unit of organisation
• Number of words
— or Bytes

Unit of Transfer
• Internal
— Usually governed by data bus width
• External
— Usually a block which is much larger than a word
• Addressable unit
— Smallest location which can be uniquely addressed
— Word internally
— Cluster on M$ disks

Memory Hierarchy
• Registers
— In CPU
• Internal or Main memory
— May include one or more levels of cache
— “RAM”
• External memory
— Backing store

Access Methods (1)
• Sequential
— Start at the beginning and read through in order
— Access time depends on location of data and previous location
— e.g. tape
• Direct
— Individual blocks have unique address
— Access is by jumping to vicinity plus sequential search
— Access time depends on location and previous location
— e.g. disk

Memory Hierarchy – Diagram

Performance
• Access time
— Time between presenting the address and getting the valid data
• Memory Cycle time
— Time may be required for the memory to “recover” before next access
— Cycle time is access + recovery
• Transfer Rate
— Rate at which data can be moved

Physical Types
• Semiconductor
— RAM
• Magnetic
— Disk & Tape
• Optical
— CD & DVD
• Others
— Bubble
— Hologram

Physical Characteristics
• Decay
• Volatility
• Erasable
• Power consumption

So you want fast?
• It is possible to build a computer which uses only static RAM (see later)
• This would be very fast
• This would need no cache
— How can you cache cache?
• This would cost a very large amount

Locality of Reference
• During the course of the execution of a program, memory references tend to cluster
• e.g. loops

Organisation
• Physical arrangement of bits into words
• Not always obvious
• e.g. interleaved

The Bottom Line
• How much?
— Capacity
• How fast?
— Time is money
• How expensive?

Cache
• Small amount of fast memory
• Sits between normal main memory and CPU
• May be located on CPU chip or module

Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape

Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which block of main memory is in each cache slot
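The cache operation overview above as a minimal sketch (the memory contents, block size and lookup-by-block-number are illustrative simplifications):

```python
# Cache lookup: hit -> serve from cache; miss -> fetch block from main
# memory into the cache (tagged by block number), then serve the CPU.
main_memory = {addr: addr * 10 for addr in range(64)}
cache = {}                                   # tag (block number) -> block data
BLOCK = 4                                    # words per block

def read(addr):
    tag = addr // BLOCK                      # which main-memory block
    if tag not in cache:                     # miss: read block into cache
        base = tag * BLOCK
        cache[tag] = [main_memory[a] for a in range(base, base + BLOCK)]
    return cache[tag][addr % BLOCK]          # deliver from cache to CPU

print(read(13), read(14))                    # second read hits the same block
```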
Cache Design
• Size
• Mapping Function
• Replacement Algorithm
• Write Policy
• Block Size
• Number of Caches

Direct Mapping Address Structure
• 24 bit address
• 2 bit word identifier (4 byte block)
• 22 bit block identifier
— 8 bit tag (= 22 − 14)
— 14 bit slot or line
• No two blocks in the same line have the same Tag field
• Check contents of cache by finding line and checking Tag

Size does matter
• Cost
— More cache is expensive
• Speed
— More cache is faster (up to a point)
— Checking cache for data takes time

Direct Mapping Cache Line Table
• Cache line 0 holds main memory blocks 0, m, 2m, 3m, …, 2^s − m
• Cache line 1 holds main memory blocks 1, m+1, 2m+1, …, 2^s − m + 1
• Cache line m−1 holds main memory blocks m−1, 2m−1, 3m−1, …, 2^s − 1

Typical Cache Organization

Direct Mapping Cache Organization

Mapping Function

Direct Mapping Example
• Cache of 64kByte
• Cache block of 4 bytes
— i.e. cache is 16k (2^14) lines of 4 bytes
• 16MBytes main memory
• 24 bit address
— (2^24 = 16M)
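For the 64 kByte example above, a 24-bit address splits into an 8-bit tag, a 14-bit line and a 2-bit word; a small sketch of that split (the sample address is arbitrary):

```python
# Split a 24-bit address for the example cache: 4-byte blocks (2-bit word),
# 16k lines (14-bit line number), remaining 8 bits are the tag.
def split(addr24):
    word = addr24 & 0x3            # bits 0-1
    line = (addr24 >> 2) & 0x3FFF  # bits 2-15
    tag = addr24 >> 16             # bits 16-23
    return tag, line, word

print([hex(x) for x in split(0x16339C)])   # ['0x16', '0xce7', '0x0']
```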

Direct Mapping
• Each block of main memory maps to only one cache line
— i.e. if a block is in cache, it must be in one specific place
• Address is in two parts
• Least Significant w bits identify unique word
• Most Significant s bits specify one memory block
• The MSBs are split into a cache line field r and a tag of s − r (most significant)

Direct Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = m = 2^r
• Size of tag = (s − r) bits
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
— If a program accesses 2 blocks that map to the same line repeatedly, cache misses are very high

Associative Mapping Address Structure
• 22 bit tag stored with each 32 bit block of data
• Compare tag field with tag entry in cache to check for hit
• Least significant 2 bits of address identify which 16 bit word is required from 32 bit data block
• e.g.
— Address FFFFFC: Tag FFFFFC, Data 24682468, Cache line 3FFF

Associative Mapping
• A main memory block can load into any line of cache
• Memory address is interpreted as tag and word
• Tag uniquely identifies block of memory
• Every line’s tag is examined for a match
• Cache searching gets expensive

Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^(s+w) / 2^w = 2^s
• Number of lines in cache = undetermined
• Size of tag = s bits

Fully Associative Cache Organization

Set Associative Mapping
• Cache is divided into a number of sets
• Each set contains a number of lines
• A given block maps to any line in a given set
— e.g. Block B can be in any line of set i
• e.g. 2 lines per set
— 2 way associative mapping
— A given block can be in one of 2 lines in only one set

Associative Mapping Example

Set Associative Mapping Example
• 13 bit set number
• Block number in main memory is modulo 2^13
• Addresses 000000, 008000, 010000, 018000 … map to the same set
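A sketch of the set computation above; with a 13-bit set field and a 2-bit word field, addresses 0x8000 apart share a set:

```python
# Two-way set associative example: 2-bit word, 13-bit set, 9-bit tag.
def set_number(addr24):
    block = addr24 >> 2            # 4-byte blocks
    return block % (1 << 13)       # set = block number modulo 2^13

for a in (0x000000, 0x008000, 0x010000, 0x018000):
    print(hex(a), "-> set", set_number(a))   # all map to set 0
```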

Two Way Set Associative Cache Organization

Set Associative Mapping Address Structure
• Use set field to determine cache set to look in
• Compare tag field to see if we have a hit
• e.g.
— Address 1FF 7FFC: Tag 1FF, Data 12345678, Set number 1FFF
— Address 001 7FFC: Tag 001, Data 11223344, Set number 1FFF

Write Policy
• Must not overwrite a cache block unless main memory is up to date
• Multiple CPUs may have individual caches
• I/O may address main memory directly

Two Way Set Associative Mapping Example

Write through
• All writes go to main memory as well as cache
• Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
• Lots of traffic
• Slows down writes
• Remember bogus write through caches!

Set Associative Mapping Summary
• Address length = (s + w) bits
• Number of addressable units = 2^(s+w) words or bytes
• Block size = line size = 2^w words or bytes
• Number of blocks in main memory = 2^s
• Number of lines in set = k
• Number of sets = v = 2^d
• Number of lines in cache = kv = k × 2^d
• Size of tag = (s − d) bits

Write back
• Updates initially made in cache only
• Update bit for cache slot is set when update occurs
• If block is to be replaced, write to main memory only if update bit is set
• Other caches get out of sync
• I/O must access main memory through cache
• N.B. 15% of memory references are writes
Replacement Algorithms (1) – Direct mapping
• No choice
• Each block only maps to one line
• Replace that line

Replacement Algorithms (2) – Associative & Set Associative
• Hardware implemented algorithm (speed)
• Least Recently Used (LRU)
• e.g. in 2 way set associative
— Which of the 2 blocks is LRU?
• First in first out (FIFO)
— replace block that has been in cache longest
• Least frequently used
— replace block which has had fewest hits
• Random

Pentium 4 Cache
• 80386 – no on chip cache
• 80486 – 8k using 16 byte lines and four way set associative organization
• Pentium (all versions) – two on chip L1 caches
— Data & instructions
• Pentium 4 – L1 caches
— 8k bytes
— 64 byte lines
— four way set associative
• L2 cache
— Feeding both L1 caches
— 256k
— 128 byte lines
— 8 way set associative
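A minimal sketch of the LRU policy from Replacement Algorithms (2) above, for a single two-way set (the LRUSet class and set size k = 2 are invented for illustration):

```python
# Least-recently-used replacement for one cache set: on a hit the line is
# marked most recent; on a miss with a full set, the LRU line is evicted.
from collections import OrderedDict

class LRUSet:
    def __init__(self, k=2):
        self.k, self.lines = k, OrderedDict()   # tag -> block data
    def access(self, tag):
        if tag in self.lines:                   # hit: mark most recent
            self.lines.move_to_end(tag)
            return "hit"
        if len(self.lines) >= self.k:           # full: evict LRU line
            self.lines.popitem(last=False)
        self.lines[tag] = "block"               # load new block
        return "miss"

s = LRUSet()
print([s.access(t) for t in (1, 2, 1, 3, 2)])   # miss, miss, hit, miss, miss
```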

Pentium 4 Diagram (Simplified)

Power PC Cache Organization
• 601 – single 32kb 8 way set associative
• 603 – 16kb (2 x 8kb) two way set associative
• 604 – 32kb
• 610 – 64kb
• G3 & G4
— 64kb L1 cache
– 8 way set associative
— 256k, 512k or 1M L2 cache
– two way set associative

Pentium 4 Core Processor
• Fetch/Decode Unit
— Fetches instructions from L2 cache
— Decode into micro-ops
— Store micro-ops in L1 cache
• Out of order execution logic
— Schedules micro-ops
— Based on data dependence and resources
— May speculatively execute
• Execution units
— Execute micro-ops
— Data from L1 cache
— Results in registers
• Memory subsystem
— L2 cache and systems bus

PowerPC G4
Pentium 4 Design Reasoning
• Decodes instructions into RISC like micro-ops before L1 cache
• Micro-ops fixed length
— Superscalar pipelining and scheduling
• Pentium instructions long & complex
• Performance improved by separating decoding from scheduling & pipelining
— (More later – ch14)
• Data cache is write back
— Can be configured to write through
• L1 cache controlled by 2 bits in register
— CD = cache disable
— NW = not write through
— 2 instructions to invalidate (flush) cache and write back then invalidate

Comparison of Cache Sizes

Chapter 5

Internal Memory

✓ Semiconductor Memory Types

✓ Semiconductor Memory
• RAM

o Misnamed as all semiconductor memory is random access

o Read/Write

o Volatile

o Temporary storage

o Static or dynamic

✓ Memory Cell Operation


✓ Dynamic RAM
• Bits stored as charge in capacitors
• Charges leak
• Need refreshing even when powered
• Simpler construction
• Smaller per bit
• Less expensive
• Needs refresh circuits
• Slower
• Main memory
• Essentially analogue
o Level of charge determines value

✓ DRAM Operation
• Address line active when bit read or written

o Transistor switch closed (current flows)

• Write

o Voltage to bit line

▪ High for 1 low for 0

o Then signal address line

▪ Transfers charge to capacitor

• Read

o Address line selected

▪ transistor turns on

o Charge from capacitor fed via bit line to sense amplifier

▪ Compares with reference value to determine 0 or 1

o Capacitor charge must be restored

✓ Static RAM
• Bits stored as on/off switches
• No charges to leak
• No refreshing needed when powered
• More complex construction
• Larger per bit
• More expensive
• Does not need refresh circuits
• Faster
• Cache
• Digital
o Uses flip-flops

✓ SRAM v DRAM
• Both volatile

o Power needed to preserve data

• Dynamic cell

o Simpler to build, smaller

o More dense

o Less expensive

o Needs refresh

o Larger memory units

• Static

o Faster

o Cache

✓ Read Only Memory (ROM)

• Permanent storage

— Nonvolatile

• Microprogramming (see later)

• Library subroutines

• Systems programs (BIOS)

• Function tables

✓ Types of ROM
• Written during manufacture

o Very expensive for small runs

• Programmable (once)

o PROM

o Needs special equipment to program

• Read “mostly”

o Erasable Programmable (EPROM)

▪ Erased by UV

o Electrically Erasable (EEPROM)

▪ Takes much longer to write than read

o Flash memory

▪ Erase whole memory electrically

✓ Packaging
✓ Error Correction
• Hard Failure

o Permanent defect

• Soft Error

o Random, non-destructive

o No permanent damage to memory

• Detected using Hamming error correcting code
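A minimal sketch of single-error correction in the Hamming style mentioned above, using the classic (7,4) code (the bit layout follows the standard parity-bits-at-power-of-two-positions construction):

```python
# Hamming (7,4): 3 check bits protect 4 data bits; for a single flipped
# bit, the syndrome (XOR of positions holding a 1) names the bad position.
def encode(d):                    # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]
    p2 = d[0] ^ d[2] ^ d[3]
    p3 = d[1] ^ d[2] ^ d[3]
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]   # positions 1..7

def correct(word):
    s = 0
    for i, bit in enumerate(word, start=1):
        if bit:
            s ^= i                # valid codewords give syndrome 0
    if s:                         # non-zero syndrome = error position
        word[s - 1] ^= 1
    return word

w = encode([1, 0, 1, 1])
w[4] ^= 1                         # inject a soft error at position 5
print(correct(w) == encode([1, 0, 1, 1]))   # True: error corrected
```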


✓ Synchronous DRAM (SDRAM)
• Access is synchronized with an external clock
• Address is presented to RAM
• RAM finds data (CPU waits in conventional DRAM)
• Since SDRAM moves data in time with system clock, CPU knows when data will be ready
• CPU does not have to wait, it can do something else
• Burst mode allows SDRAM to set up stream of data and fire it out in block
• DDR-SDRAM sends data twice per clock cycle (leading & trailing edge)

✓ RAMBUS
• Adopted by Intel for Pentium & Itanium
• Main competitor to SDRAM
• Vertical package – all pins on one side
• Data exchange over 28 wires < cm long
• Bus addresses up to 320 RDRAM chips at 1.6Gbps
• Asynchronous block protocol

o 480ns access time

o Then 1.6 Gbps

✓ RAMBUS Diagram
Chapter 6

External Memory

✓ Types of External Memory


1. Magnetic Disk
— RAID
— Removable
2. Optical
— CD-ROM
— CD-Recordable (CD-R)
— CD-R/W
— DVD
3. Magnetic Tape

1. Magnetic Disk
• Disk substrate coated with magnetizable material (iron oxide…rust)
• Substrate used to be aluminium
• Now glass

a. Improved surface uniformity

i. Increases reliability

b. Reduction in surface defects

i. Reduced read/write errors

c. Lower flight heights (See later)

d. Better stiffness

e. Better shock/damage resistance

✓ Read and Write Mechanisms


• Recording and retrieval via conductive coil called a head

• May be single read/write head or separate ones

• During read/write, head is stationary, platter rotates

• Write

— Current through coil produces magnetic field

— Pulses sent to head

— Magnetic pattern recorded on surface below

• Read (traditional)

— Magnetic field moving relative to coil produces current

— Coil is the same for read and write

• Read (contemporary)

— Separate read head, close to write head

— Partially shielded magneto resistive (MR) sensor

— Electrical resistance depends on direction of magnetic field

— High frequency operation

– Higher storage density and speed

✓ Inductive Write MR Read


✓ Data Organization and Formatting
• Concentric rings or tracks

o Gaps between tracks

o Reduce gap to increase capacity

o Same number of bits per track (variable packing density)

o Constant angular velocity

• Tracks divided into sectors


• Minimum block size is one sector

• May have more than one sector per block

✓ Disk Data Layout

✓ Disk Velocity
• Bit near centre of rotating disk passes fixed point slower than bit on outside of disk
• Increase spacing between bits in different tracks
• Rotate disk at constant angular velocity (CAV)

o Gives pie shaped sectors and concentric tracks

o Individual tracks and sectors addressable

o Move head to given track and wait for given sector

o Waste of space on outer tracks

▪ Lower data density

▪ Can use zones to increase capacity

o Each zone has fixed bits per track

▪ More complex circuitry

✓ Disk Layout Methods Diagram

✓ Finding Sectors
• Must be able to identify start of track and sector
• Format disk

o Additional information not available to user

o Marks tracks and sectors

✓ ST506 format (old!)

✓ Characteristics
• Fixed (rare) or movable head
• Removable or fixed
• Single or double (usually) sided
• Single or multiple platter
• Head mechanism
— Contact (Floppy)
— Fixed gap
— Flying (Winchester)

✓ Fixed/Movable Head Disk


• Fixed head
— One read write head per track
— Heads mounted on fixed ridged arm
• Movable head
— One read write head per side
— Mounted on a movable arm

✓ Removable or Not
• Removable disk
o Can be removed from drive and replaced with another disk
o Provides unlimited storage capacity
o Easy data transfer between systems
• Nonremovable disk
o Permanently mounted in the drive

✓ Multiple Platter
• One head per side
• Heads are joined and aligned
• Aligned tracks on each platter form cylinders
• Data is striped by cylinder
o reduces head movement
o Increases speed (transfer rate)

✓ Cylinders
✓ Floppy Disk
• 8”, 5.25”, 3.5”
• Small capacity
o Up to 1.44Mbyte (2.88M never popular)
• Slow
• Universal
• Cheap
• Obsolete?

✓ Winchester Hard Disk (1)


• Developed by IBM in Winchester (USA)
• Sealed unit
• One or more platters (disks)
• Heads fly on boundary layer of air as disk spins
• Very small head to disk gap
• Getting more robust
✓ Winchester Hard Disk (2)
• Universal
• Cheap
• Fastest external storage
• Getting larger all the time
o Multiple Gigabyte now usual

✓ Removable Hard Disk


• ZIP
o Cheap
o Very common
o Only 100M
• JAZ
o Not cheap
o 1G
• L-120 (a: drive)
o Also reads 3.5” floppy
o Becoming more popular?
• All obsoleted by CD-R and CD-R/W?
✓ Speed
• Seek time
o Moving head to correct track
• (Rotational) latency
o Waiting for data to rotate under head
• Access time = Seek + Latency
• Transfer rate
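A worked example of the access-time relation above (the 7200 rpm, 4 ms seek and 50 MB/s figures are invented for illustration):

```python
# Disk access time = seek + rotational latency (+ transfer of the block).
rpm, seek_ms, rate_mb_s = 7200, 4.0, 50.0
latency_ms = 0.5 * (60_000 / rpm)              # average wait: half a revolution
transfer_ms = (4 / 1024) / rate_mb_s * 1000    # one 4 KB block
print(f"access = {seek_ms + latency_ms + transfer_ms:.2f} ms")  # ~8.24 ms
```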

✓ Timing of Disk I/O Transfer

✓ RAID
• Redundant Array of Independent Disks
• Redundant Array of Inexpensive Disks
• 6 levels in common use
• Not a hierarchy
• Set of physical disks viewed as single logical drive by O/S
• Data distributed across physical drives
• Can use redundant capacity to store parity information

✓ RAID 0
• No redundancy
• Data striped across all disks
• Round Robin striping
• Increase speed
o Multiple data requests probably not on same disk
o Disks seek in parallel
o A set of data is likely to be striped across multiple disks
✓ RAID 1
• Mirrored Disks
• Data is striped across disks
• 2 copies of each stripe on separate disks
• Read from either
• Write to both
• Recovery is simple
o Swap faulty disk & re-mirror
o No down time
o Expensive
✓ RAID 2
• Disks are synchronized
• Very small stripes
o Often single byte/word
• Error correction calculated across corresponding bits on disks
• Multiple parity disks store Hamming code error correction in corresponding
positions
• Lots of redundancy
o Expensive
o Not used
✓ RAID 3
• Similar to RAID 2
• Only one redundant disk, no matter how large the array
• Simple parity bit for each set of corresponding bits
• Data on failed drive can be reconstructed from surviving data and parity info
• Very high transfer rates

✓ RAID 4
• Each disk operates independently
• Good for high I/O request rate
• Large stripes
• Bit by bit parity calculated across stripes on each disk
Parity stored on parity disk

✓ RAID 5
• Like RAID 4
• Parity striped across all disks
• Round robin allocation for parity stripe
• Avoids RAID 4 bottleneck at parity disk
• Commonly used in network servers
• N.B. DOES NOT MEAN 5 DISKS!!!!!

✓ RAID 6
• Two parity calculations
• Stored in separate blocks on different disks
• User requirement of N disks needs N+2
• High data availability
o Three disks need to fail for data loss
o Significant write penalty
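A sketch of the single-parity recovery that RAID 3-5 rely on: parity is the XOR across corresponding strips, so any one lost strip is the XOR of the survivors (strip contents are arbitrary):

```python
# XOR parity across data strips; rebuilding a failed strip from the rest.
from functools import reduce

strips = [b"\x12\x34", b"\xab\xcd", b"\x0f\xf0"]          # 3 data disks
parity = bytes(a ^ b ^ c for a, b, c in zip(*strips))     # parity disk

failed = strips[1]                                        # lose one disk
survivors = [strips[0], strips[2], parity]
rebuilt = bytes(reduce(lambda x, y: x ^ y, t) for t in zip(*survivors))
print(rebuilt == failed)                                  # True
```

RAID 6 extends this with a second, independent parity calculation, which is why it survives two simultaneous disk failures at the cost of N+2 disks.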

✓ RAID 0, 1, 2
✓ RAID 3 & 4

✓ RAID 5 & 6

✓ Data Mapping For RAID 0


✓ Optical Storage CD-ROM
• Originally for audio
• 650Mbytes giving over 70 minutes audio
• Polycarbonate coated with highly reflective coat, usually aluminium
• Data stored as pits
• Read by reflecting laser
• Constant packing density
• Constant linear velocity

✓ CD Operation

✓ CD-ROM Drive Speeds


• Audio is single speed
o Constant linear velocity
o 1.2 m/s
o Track (spiral) is 5.27 km long
o Gives 4391 seconds = 73.2 minutes
• Other speeds are quoted as multiples
• e.g. 24x
• Quoted figure is maximum drive can achieve
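A quick arithmetic check of the single-speed figures above:

```python
# 5.27 km spiral read at a constant linear velocity of 1.2 m/s.
seconds = 5270 / 1.2
print(round(seconds), "s =", round(seconds / 60, 1), "min")  # ~4392 s = 73.2 min
```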

✓ CD-ROM Format

• Mode 0=blank data field


• Mode 1=2048 byte data+error correction
• Mode 2=2336 byte data

✓ Random Access on CD-ROM
• Difficult
• Move head to rough position
• Set correct speed
• Read address
• Adjust to required location
• (Yawn!)

✓ CD-ROM for & against


• Large capacity (?)
• Easy to mass produce
• Removable
• Robust
• Expensive for small runs
• Slow
• Read only

✓ Other Optical Storage


• CD-Recordable (CD-R)
o WORM
o Now affordable
o Compatible with CD-ROM drives
• CD-RW
o Erasable
o Getting cheaper
o Mostly CD-ROM drive compatible
o Phase change
▪ Material has two different reflectivities in different phase states

✓ DVD - what’s in a name?


• Digital Video Disk
o Used to indicate a player for movies
▪ Only plays video disks
• Digital Versatile Disk
o Used to indicate a computer drive
▪ Will read computer disks and play video disks
• Dogs Veritable Dinner
• Officially - nothing!!!

✓ DVD – technology
• Multi-layer
• Very high capacity (4.7G per layer)
• Full length movie on single disk
o Using MPEG compression
• Finally standardized (honest!)
• Movies carry regional coding
• Players only play correct region films
• Can be “fixed”

✓ DVD – Writable
• Loads of trouble with standards
• First generation DVD drives may not read first generation DVD-W disks
• First generation DVD drives may not read CD-RW disks
• Wait for it to settle down before buying!

✓ CD and DVD

✓ Magnetic Tape
• Serial access
• Slow
• Very cheap
• Backup and archive

✓ Digital Audio Tape (DAT)


• Uses rotating head (like video)
• High capacity on small tape
o 4Gbyte uncompressed
o 8Gbyte compressed
o Backup of PC/network servers

CHAPTER 7
INPUT / OUTPUT
✓ Input/Output Problems
• Wide variety of peripherals
— Delivering different amounts of data
— At different speeds
— In different formats
• All slower than CPU and RAM
• Need I/O modules

✓ Input/Output Module
• Interface to CPU and Memory
• Interface to one or more peripherals
✓ Generic Model of I/O Module

✓ External Devices
• Human readable
— Screen, printer, keyboard
• Machine readable
— Monitoring and control
• Communication
— Modem
— Network Interface Card (NIC)

✓ External Device Block Diagram

✓ Typical I/O Data Rates

✓ I/O Module Function


• Control & Timing
• CPU Communication
• Device Communication
• Data Buffering
• Error Detection

✓ I/O Steps
• CPU checks I/O module device status
• I/O module returns status
• If ready, CPU requests data transfer
• I/O module gets data from device
• I/O module transfers data to CPU
• Variations for output, DMA, etc.
✓ I/O Module Diagram

✓ I/O Module Decisions


• Hide or reveal device properties to CPU
• Support multiple or single device
• Control device functions or leave for CPU
• Also O/S decisions
— e.g. Unix treats everything it can as a file
✓ Input Output Techniques
• Programmed
• Interrupt driven
• Direct Memory Access (DMA)
✓ Programmed I/O
• CPU has direct control over I/O
— Sensing status
— Read/write commands
— Transferring data
• CPU waits for I/O module to complete operation
• Wastes CPU time
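A toy sketch of the busy-wait polling that makes programmed I/O waste CPU time (the status-register model is invented):

```python
# Programmed I/O: the CPU repeatedly reads the module's status register,
# doing no useful work until the device reports ready.
import random

def status():                       # poll the I/O module's status register
    return random.choice(["busy", "busy", "ready"])

polls = 0
while status() != "ready":          # CPU is tied up in this loop
    polls += 1
print(f"device ready after {polls} wasted polls; transfer data word")
```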
✓ Programmed I/O – detail

✓ I/O Commands
• CPU issues address
— Identifies module (& device if >1 per module)
• CPU issues command
— Control - telling module what to do
– e.g. spin up disk
— Test - check status
– e.g. power? Error?
— Read/Write
– Module transfers data via buffer from/to device
✓ Addressing I/O Devices
• Under programmed I/O data transfer is very like memory access (CPU viewpoint)
• Each device given unique identifier
• CPU commands contain identifier (address)
✓ I/O Mapping
• Memory mapped I/O
— Devices and memory share an address space
— I/O looks just like memory read/write
— No special commands for I/O
– Large selection of memory access commands available
• Isolated I/O
— Separate address spaces
— Need I/O or memory select lines
— Special commands for I/O
– Limited set
✓ Interrupt Driven I/O
• Overcomes CPU waiting
• No repeated CPU checking of device
• I/O module interrupts when ready
✓ Interrupt Driven I/O – Basic Operation
• CPU issues read command
• I/O module gets data from peripheral whilst CPU does other work
• I/O module interrupts CPU
• CPU requests data
• I/O module transfers data
✓ CPU Viewpoint
• Issue read command
• Do other work
• Check for interrupt at end of each instruction cycle
• If interrupted:-
— Save context (registers)
— Process interrupt
– Fetch data & store
• See Operating Systems notes

✓ Design Issues
• How do you identify the module issuing the interrupt?
• How do you deal with multiple interrupts?
— i.e. an interrupt handler being interrupted
✓ Identifying Interrupting Module (1)
• Different line for each module
— PC
— Limits number of devices
• Software poll
— CPU asks each module in turn
— Slow
✓ Identifying Interrupting Module (2)
• Daisy Chain or Hardware poll
— Interrupt Acknowledge sent down a chain
— Module responsible places vector on bus
— CPU uses vector to identify handler routine
• Bus Master
— Module must claim the bus before it can raise interrupt
— e.g. PCI & SCSI
✓ Multiple Interrupts
• Each interrupt line has a priority
• Higher priority lines can interrupt lower priority lines
• If bus mastering only current master can interrupt
✓ Example - PC Bus
• 80x86 has one interrupt line
• 8086 based systems use one 8259A interrupt controller
• 8259A has 8 interrupt lines

✓ Sequence of Events
• 8259A accepts interrupts
• 8259A determines priority
• 8259A signals 8086 (raises INTR line)
• CPU Acknowledges
• 8259A puts correct vector on data bus
• CPU processes interrupt

✓ ISA Bus Interrupt System


• ISA bus chains two 8259As together
• Link is via interrupt 2
• Gives 15 lines
— 16 lines less one for link
• IRQ 9 is used to re-route anything trying to use IRQ 2
— Backwards compatibility
• Incorporated in chip set

✓ 82C59A Interrupt Controller
✓ Intel 82C55A Programmable Peripheral Interface

✓ Using 82C55A To Control Keyboard/Display

✓ Direct Memory Access


• Interrupt driven and programmed I/O require active CPU intervention
— Transfer rate is limited
— CPU is tied up
• DMA is the answer
✓ DMA Function
• Additional Module (hardware) on bus
• DMA controller takes over from CPU for I/O

✓ DMA Module Diagram


✓ DMA Operation
• CPU tells DMA controller:-
— Read/Write
— Device address
— Starting address of memory block for data
— Amount of data to be transferred
• CPU carries on with other work
• DMA controller deals with transfer
• DMA controller sends interrupt when finished
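A minimal sketch of the DMA sequence above: the CPU programs the controller and is only involved again at the completion interrupt (the DMAController class is invented for illustration):

```python
# DMA: the CPU supplies read/write, device, start address and count, then
# the controller moves the data itself and interrupts when finished.
class DMAController:
    def setup(self, op, device, mem, start, count):
        self.op, self.device = op, device
        self.mem, self.start, self.count = mem, start, count
    def run(self, on_done):                      # transfer without the CPU
        for i in range(self.count):
            self.mem[self.start + i] = self.device[i]
        on_done()                                # interrupt when finished

memory = [0] * 16
dma = DMAController()
dma.setup("read", device=[7, 8, 9], mem=memory, start=4, count=3)
dma.run(on_done=lambda: print("interrupt: DMA complete", memory[4:7]))
```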
✓ DMA Transfer – Cycle Stealing
• DMA controller takes over bus for a cycle
• Transfer of one word of data
• Not an interrupt
— CPU does not switch context
• CPU suspended just before it accesses bus
— i.e. before an operand or data fetch or a data write
• Slows down CPU but not as much as CPU doing transfer
✓ Aside
• What effect does caching memory have on DMA?
• Hint: how much are the system buses available?
✓ DMA Configurations (1)

• Single Bus, Detached DMA controller


• Each transfer uses bus twice
— I/O to DMA then DMA to memory
• CPU is suspended twice
✓ DMA Configurations (2)

• Single Bus, Integrated DMA controller


• Controller may support >1 device
• Each transfer uses bus once
— DMA to memory
• CPU is suspended once
✓ DMA Configurations (3)

• Separate I/O Bus


• Bus supports all DMA enabled devices
• Each transfer uses bus once
— DMA to memory
• CPU is suspended once
✓ I/O Channels
• I/O devices getting more sophisticated
• e.g. 3D graphics cards
• CPU instructs I/O controller to do transfer
• I/O controller does entire transfer
• Improves speed
— Takes load off CPU
— Dedicated processor is faster
✓ I/O Channel Architecture
✓ Interfacing
• Connecting devices together
• Bit of wire?
• Dedicated processor/memory/buses?
• E.g. FireWire, InfiniBand
✓ IEEE 1394 FireWire
• High performance serial bus
• Fast
• Low cost
• Easy to implement
• Also being used in digital cameras, VCRs and TV
✓ FireWire Configuration
• Daisy chain
• Up to 63 devices on single port
— Really 64 of which one is the interface itself
• Up to 1022 buses can be connected with bridges
• Automatic configuration
• No bus terminators
• May be tree structure
✓ Simple FireWire Configuration

✓ FireWire 3 Layer Stack


• Physical
— Transmission medium, electrical and signaling characteristics
• Link
— Transmission of data in packets
• Transaction
— Request-response protocol
✓ FireWire Protocol Stack

✓ FireWire - Physical Layer
• Data rates from 25 to 400Mbps


• Two forms of arbitration
— Based on tree structure
— Root acts as arbiter
— First come first served
— Natural priority controls simultaneous requests
– i.e. who is nearest to root
— Fair arbitration
— Urgent arbitration
✓ FireWire - Link Layer
• Two transmission types
— Asynchronous
– Variable amount of data and several bytes of transaction data transferred
as a packet
– To explicit address
– Acknowledgement returned
— Isochronous
– Variable amount of data in sequence of fixed size packets at regular
intervals
– Simplified addressing
– No acknowledgement
✓ FireWire Subactions
✓ InfiniBand
• I/O specification aimed at high end servers
— Merger of Future I/O (Cisco, HP, Compaq, IBM) and Next Generation I/O (Intel)
• Version 1 released early 2001
• Architecture and spec. for data flow between processor and intelligent I/O devices
• Intended to replace PCI in servers
• Increased capacity, expandability, flexibility
✓ InfiniBand Architecture
• Remote storage, networking and connection between servers
• Attach servers, remote storage, network devices to central fabric of switches and links
• Greater server density
• Scalable data centre
• Independent nodes added as required
• I/O distance from server up to
— 17m using copper
— 300m multimode fibre optic
— 10km single mode fibre
• Up to 30Gbps
✓ InfiniBand Switch Fabric

✓ InfiniBand Operation
• 16 logical channels (virtual lanes) per physical link
• One lane for management, rest for data
• Data in stream of packets
• Virtual lane dedicated temporarily to end to end transfer
• Switch maps traffic from incoming to outgoing lane
✓ InfiniBand Protocol Stack

✓ Foreground Reading
• Check out Universal Serial Bus (USB)
• Compare with other communication standards e.g. Ethernet

CHAPTER 8 : Operating System Support

• Convenience
—Making the computer easier to use
• Efficiency
—Allowing better use of computer resources
✓ Layers and Views of a Computer System
✓ Operating System Services
• Program creation
• Program execution
• Access to I/O devices
• Controlled access to files
• System access
• Error detection and response
• Accounting

✓ O/S as a Resource Manager

✓ Types of Operating System


• Interactive
• Batch
• Single program (Uni-programming)
• Multi-programming (Multi-tasking)
✓ Early Systems
• Late 1940s to mid 1950s
• No Operating System
• Programs interact directly with hardware
• Two main problems:
—Scheduling
—Setup time
✓ Simple Batch Systems
• Resident Monitor program
• Users submit jobs to operator
• Operator batches jobs
• Monitor controls sequence of events to process batch
• When one job is finished, control returns to Monitor which reads
next job
• Monitor handles scheduling
✓ Memory Layout for Resident Monitor

✓ Job Control Language


• Instructions to Monitor
• Usually denoted by $
• e.g.
—$JOB
—$FTN
—... Some Fortran instructions
—$LOAD
—$RUN
—... Some data
—$END
✓ Desirable Hardware Features
• Memory protection
—To protect the Monitor
• Timer
—To prevent a job monopolizing the system
• Privileged instructions
—Only executed by Monitor
—e.g. I/O
• Interrupts
—Allows for relinquishing and regaining control

✓ Multi-programmed Batch Systems


• I/O devices very slow
• When one program is waiting for I/O, another can use the CPU
✓ Single Program

✓ Multi-Programming with Two Programs
✓ Multi-Programming with Three Programs

✓ Utilization

✓ Time Sharing Systems


• Allow users to interact directly with the computer
—i.e. Interactive
• Multi-programming allows a number of users to interact with the
computer
✓ Scheduling
• Key to multi-programming
• Long term
• Medium term
• Short term
• I/O
✓ Long Term Scheduling
• Determines which programs are submitted for processing
• i.e. controls the degree of multi-programming
• Once submitted, a job becomes a process for the short term
scheduler
• (or it becomes a swapped out job for the medium term scheduler)
✓ Medium Term Scheduling
• Part of the swapping function (later…)
• Usually based on the need to manage multi-programming
• If no virtual memory, memory management is also an issue
✓ Short Term Scheduler
• Dispatcher
• Fine grained decisions of which job to execute next
• i.e. which job actually gets to use the processor in the next time
slot
✓ Process States
✓ Process Control Block
• Identifier
• State
• Priority
• Program counter
• Memory pointers
• Context data
• I/O status
• Accounting information

✓ PCB Diagram

✓ Key Elements of O/S


✓ Process Scheduling

✓ Memory Management
• Uni-program
—Memory split into two
—One for Operating System (monitor)
—One for currently executing program
• Multi-program
—“User” part is sub-divided and shared among active
processes

✓ Swapping
• Problem: I/O is so slow compared with CPU that even in multi-
programming system, CPU can be idle most of the time
• Solutions:
—Increase main memory
– Expensive
– Leads to larger programs
—Swapping
✓ What is Swapping?
• Long term queue of processes stored on disk
• Processes “swapped” in as space becomes available
• As a process completes it is moved out of main memory
• If none of the processes in memory are ready (i.e. all I/O blocked)
—Swap out a blocked process to intermediate queue
—Swap in a ready process or a new process
—But swapping is an I/O process...
✓ Partitioning
• Splitting memory into sections to allocate to processes (including
Operating System)
• Fixed-sized partitions
—May not be equal size
—Process is fitted into smallest hole that will take it (best fit)
—Some wasted memory
—Leads to variable sized partitions
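A minimal sketch of the best-fit placement described above (the hole list and request size are arbitrary):

```python
# Best-fit: the process goes into the smallest hole that will take it.
def best_fit(holes, size):
    fits = [h for h in holes if h[1] >= size]          # holes are (base, length)
    return min(fits, key=lambda h: h[1]) if fits else None

holes = [(0, 8), (20, 4), (40, 16)]
print(best_fit(holes, 3))    # (20, 4): smallest hole that fits, 1 unit wasted
```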
✓ Fixed Partitioning

✓ Variable Sized Partitions (1)


• Allocate exactly the required memory to a process
• This leads to a hole at the end of memory, too small to use
—Only one small hole - less waste
• When all processes are blocked, swap out a process and bring in
another
• New process may be smaller than swapped out process
• Another hole
✓ Variable Sized Partitions (2)
• Eventually have lots of holes (fragmentation)
• Solutions:
—Coalesce - Join adjacent holes into one large hole
—Compaction - From time to time go through memory and move all holes into one free block (c.f. disk de-fragmentation)
✓ Effect of Dynamic Partitioning

✓ Relocation
• No guarantee that process will load into the same place in memory
• Instructions contain addresses
—Locations of data
—Addresses for instructions (branching)
• Logical address - relative to beginning of program
• Physical address - actual location in memory (this time)
• Automatic conversion using base address
✓ Paging
• Split memory into equal sized, small chunks - page frames
• Split programs (processes) into equal sized small chunks - pages
• Allocate the required number of page frames to a process
• Operating System maintains list of free frames
• A process does not require contiguous page frames
• Use page table to keep track
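A minimal sketch of page-table translation for the paging scheme above (page size, table contents and addresses are illustrative):

```python
# Paging: logical address = (page number, offset); the page table maps
# pages to frames, which need not be contiguous. 4-word pages here.
PAGE = 4
page_table = {0: 5, 1: 2, 2: 7}       # page -> frame (scattered frames)

def translate(logical):
    page, offset = divmod(logical, PAGE)
    return page_table[page] * PAGE + offset

print([translate(a) for a in (0, 5, 9)])   # [20, 9, 29]
```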

✓ Logical and Physical Addresses – Paging

✓ Virtual Memory
• Demand paging
—Do not require all pages of a process in memory
—Bring in pages as required
• Page fault
—Required page is not in memory
—Operating System must swap in required page
—May need to swap out a page to make space
—Select page to throw out based on recent history
✓ Thrashing
• Too many processes in too little memory
• Operating System spends all its time swapping
• Little or no real work is done
• Disk light is on all the time
• Solutions
—Good page replacement algorithms
—Reduce number of processes running
—Fit more memory
✓ Bonus
• We do not need all of a process in memory for it to run
• We can swap in pages as required
• So - we can now run processes that are bigger than total memory
available!
• Main memory is called real memory
• User/programmer sees much bigger memory - virtual memory

✓ Page Table Structure

✓ Translation Lookaside Buffer


• Every virtual memory reference causes two physical memory accesses
—Fetch page table entry
—Fetch data
• Use special cache for page table
—TLB
✓ TLB Operation

✓ TLB and Cache Operation

✓ Segmentation
• Paging is not (usually) visible to the programmer
• Segmentation is visible to the programmer
• Usually different segments allocated to program and data
• May be a number of program and data segments
✓ Advantages of Segmentation
• Simplifies handling of growing data structures
• Allows programs to be altered and recompiled independently,
without re-linking and re-loading
• Lends itself to sharing among processes
• Lends itself to protection
• Some systems combine segmentation with paging

✓ Pentium II
• Hardware for segmentation and paging
• Unsegmented unpaged
—virtual address = physical address
—Low complexity
—High performance
• Unsegmented paged
—Memory viewed as paged linear address space
—Protection and management via paging
—Berkeley UNIX
• Segmented unpaged
—Collection of local address spaces
—Protection to single byte level
—Translation table needed is on chip when segment is in
memory
• Segmented paged
—Segmentation used to define logical memory partitions
subject to access control
—Paging manages allocation of memory within partitions
—Unix System V

✓ Pentium II Address Translation Mechanism


✓ Pentium II Segmentation
• Each virtual address is 16-bit segment and 32-bit offset
• 2 bits of segment are protection mechanism
• 14 bits specify segment
• Unsegmented virtual memory: 2^32 = 4 Gbytes
• Segmented: 2^46 = 64 terabytes
—Can be larger – depends on which process is active
—Half (8K segments of 4 Gbytes) is global
—Half is local and distinct for each process
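A sketch of the address split described above, following the notes' simplified view of the 16-bit selector as 14 segment bits plus 2 protection bits (the sample values are arbitrary):

```python
# Pentium II virtual address per the notes: 16-bit segment selector
# (14-bit segment index + 2 protection bits) plus a 32-bit offset.
def split(selector16, offset32):
    protection = selector16 & 0x3          # 2-bit privilege field
    segment = selector16 >> 2              # 14 bits -> 16K segments
    return segment, protection, offset32

print(split(0x001B, 0xDEADBEEF))           # (6, 3, 3735928559)
print(f"unsegmented space: {2**32 // 2**30} GB; segmented: 2^46 bytes")
```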

✓ Pentium II Protection
• Protection bits give 4 levels of privilege
—0 most protected, 3 least
—Use of levels software dependent
—Usually level 3 for applications, level 1 for O/S and level 0 for
kernel (level 2 not used)
—Level 2 may be used for apps that have internal security e.g.
database
—Some instructions only work in level 0
✓ Pentium II Paging
• Segmentation may be disabled
—In which case linear address space is used
• Two level page table lookup
—First, page directory
– 1024 entries max
– Splits 4G linear memory into 1024 page groups of 4Mbyte
– Each page table has 1024 entries corresponding to 4Kbyte
pages
– Can use one page directory for all processes, one per
process or mixture
– Page directory for current process always in memory
—Use TLB holding 32 page table entries
—Two page sizes available 4k or 4M
✓ PowerPC Memory Management Hardware
• 32 bit – paging with simple segmentation
—64 bit paging with more powerful segmentation
• Or, both do block address translation
—Map 4 large blocks of instructions & 4 of memory to bypass
paging
—e.g. OS tables or graphics frame buffers
• 32 bit effective address
—12 bit byte selector
– =4kbyte pages
—16 bit page id
– 64k pages per segment
—4 bits indicate one of 16 segment registers
– Segment registers under OS control
✓ PowerPC 32-bit Memory Management Formats

✓ PowerPC 32-bit Address Translation


✓ Required Reading
• Stallings chapter 8
• Stallings, W. Operating Systems, Internals and Design Principles,
Prentice Hall 1998
• Loads of Web sites on Operating Systems
