
INPUT/OUTPUT

CHAPTER # 6 Computer Organization & Architecture


External Devices

 Human readable
 Screen, printer, keyboard
 Machine readable
 Monitoring and control
 Communication
 Modem
 Network Interface Card (NIC)

Chapter # 6 Computer Organization & Architecture 2


Input/Output Problems

 Wide variety of peripherals


 Delivering different amounts of data
 At different speeds

 In different formats

 All slower than CPU and RAM


 Need I/O modules

Chapter # 6 Computer Organization & Architecture 3


Input/Output Module

 Interface to CPU and Memory


 Interface to one or more peripherals

Chapter # 6 Computer Organization & Architecture 4


Generic Model of I/O Module

Chapter # 6 Computer Organization & Architecture 5


External Device Block Diagram

Chapter # 6 Computer Organization & Architecture 6


I/O Module Function

 Control & Timing


 CPU Communication
 Device Communication
 Data Buffering
 Error Detection

Chapter # 6 Computer Organization & Architecture 7


I/O Steps

 CPU checks I/O module device status


 I/O module returns status
 If ready, CPU requests data transfer
 I/O module gets data from device
 I/O module transfers data to CPU
 Variations for output, DMA, etc.

Chapter # 6 Computer Organization & Architecture 8


I/O Module Diagram

Chapter # 6 Computer Organization & Architecture 9


I/O Module Decisions

 Hide or reveal device properties to CPU


 Support multiple or single device
 Control device functions or leave for CPU
 Also O/S decisions
 e.g. Unix treats everything it can as a file

Chapter # 6 Computer Organization & Architecture 10


Input Output Techniques

 Programmed I/O
 Interrupt driven I/O
 Direct Memory Access (DMA)

Chapter # 6 Computer Organization & Architecture 11


Three Techniques for Input of a Block of Data

Chapter # 6 Computer Organization & Architecture 12


Programmed I/O

 CPU has direct control over I/O


 Sensing status
 Read/write commands

 Transferring data

 CPU waits for I/O module to complete operation


 Wastes CPU time

Chapter # 6 Computer Organization & Architecture 13


Programmed I/O - Detail

 CPU requests I/O operation


 I/O module performs operation
 I/O module sets status bits
 CPU checks status bits periodically
 I/O module does not inform CPU directly
 I/O module does not interrupt CPU
 CPU may wait or come back later
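A minimal C sketch of this polling scheme, using simulated status and data registers (the register names and the ready bit are invented for illustration, not any real device interface):

#include <stdio.h>

/* Hypothetical device registers, simulated as plain variables so the
   sketch runs anywhere; on real hardware these would be I/O ports or
   memory-mapped locations. */
static unsigned char dev_status = 0;   /* bit 0 = data ready */
static unsigned char dev_data   = 0;

/* Pretend the device eventually produces a byte. */
static void device_tick(void) {
    static int countdown = 3;
    if (countdown-- == 0) { dev_data = 'X'; dev_status |= 1; }
}

int main(void) {
    /* CPU issues a read command, then repeatedly checks the status bits
       (programmed I/O): the CPU is busy for the whole transfer. */
    while ((dev_status & 1) == 0) {
        device_tick();                 /* stand-in for waiting on hardware */
    }
    printf("received byte: %c\n", dev_data);
    return 0;
}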

Chapter # 6 Computer Organization & Architecture 14


I/O Commands

 CPU issues address


 Identifies module (& device if >1 per module)
 CPU issues command
 Control - telling module what to do
 e.g. spin up disk
 Test - check status
 e.g. power? Error?
 Read/Write
 Module transfers data via buffer from/to device

Chapter # 6 Computer Organization & Architecture 15


Addressing I/O Devices

 Under programmed I/O, data transfer is very similar to


memory access (CPU viewpoint)
 Each device given unique identifier
 CPU commands contain identifier (address)

Chapter # 6 Computer Organization & Architecture 16


I/O Mapping

 Memory mapped I/O


 Devices and memory share an address space
 I/O looks just like memory read/write

 No special commands for I/O


 Large selection of memory access commands available
 Isolated I/O
 Separate address spaces
 Need I/O or memory select lines

 Special commands for I/O


 Limited set
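A rough sketch of the memory-mapped case in C: the device register is reached with ordinary loads and stores through a volatile pointer, so no special I/O instructions are needed. The register is simulated by a local variable here so the example runs; on real hardware the address would come from the device documentation.

#include <stdint.h>
#include <stdio.h>

/* With memory-mapped I/O a device register is read and written with the
   ordinary load/store instructions, so plain C pointer accesses work. */
static void write_ctrl(volatile uint32_t *ctrl_reg, uint32_t value) {
    *ctrl_reg = value;                /* compiles to a normal store */
}

static uint32_t read_status(volatile uint32_t *status_reg) {
    return *status_reg;               /* compiles to a normal load */
}

int main(void) {
    uint32_t fake_reg = 0;            /* simulated register so the demo runs */
    write_ctrl(&fake_reg, 0x1u);
    printf("status = %u\n", (unsigned)read_status(&fake_reg));
    /* Isolated I/O, by contrast, needs special instructions (e.g. x86 IN/OUT),
       which cannot be expressed in portable C. */
    return 0;
}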

Chapter # 6 Computer Organization & Architecture 17


Interrupt Driven I/O

 Overcomes CPU waiting


 No repeated CPU checking of device
 I/O module interrupts when ready

Chapter # 6 Computer Organization & Architecture 18


Interrupt Driven I/O Basic Operation

 CPU issues read command


 I/O module gets data from peripheral whilst CPU
does other work
 I/O module interrupts CPU
 CPU requests data
 I/O module transfers data
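A simplified C sketch of the same sequence, with an ordinary function standing in for the interrupt service routine and a counter standing in for the useful work the CPU performs while the I/O module is busy (all names are illustrative):

#include <stdio.h>

/* Simulated state for one interrupt-driven read.  On real hardware the
   handler would be registered with the interrupt controller; here it is
   simply called when the simulated device finishes. */
static volatile int data_ready = 0;
static volatile unsigned char dev_data = 0;

static void io_interrupt_handler(void) {    /* stands in for the ISR */
    data_ready = 1;                          /* CPU can now fetch the data */
}

int main(void) {
    printf("CPU: issue read command, then keep working\n");

    int useful_work = 0;
    while (!data_ready) {
        useful_work++;                       /* CPU does other work, no device polling */
        if (useful_work == 5) {              /* device finishes: module interrupts CPU */
            dev_data = 'Y';
            io_interrupt_handler();
        }
    }
    printf("CPU: handled interrupt after %d units of work, data=%c\n",
           useful_work, dev_data);
    return 0;
}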

Chapter # 6 Computer Organization & Architecture 19


Simple Interrupt Processing

Chapter # 6 Computer Organization & Architecture 20


CPU Viewpoint

 Issue read command


 Do other work
 Check for interrupt at end of each instruction cycle
 If interrupted
 Save context (registers)
 Process interrupt
 Fetch data & store

Chapter # 6 Computer Organization & Architecture 21


Changes in Memory and Registers for an Interrupt

Chapter # 6 Computer Organization & Architecture 22


Design Issues

 How do you identify the module issuing the


interrupt?
 How do you deal with multiple interrupts?
 i.e. an interrupt handler being interrupted

Chapter # 6 Computer Organization & Architecture 23


Identifying Interrupting Module

 Multiple Interrupt Lines


 Different line for each module
 Limits number of devices
 Software poll
 CPU asks each module in turn
 Slow
 Daisy Chain or Hardware poll
 Interrupt Acknowledge sent down a chain
 Module responsible places vector on bus
 CPU uses vector to identify handler routine
 Bus Master
 Module must claim the bus before it can raise interrupt
 e.g. PCI & SCSI
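A sketch of the software-poll option in C, assuming each module exposes a status word with a hypothetical interrupt-pending bit:

#include <stdio.h>

#define NUM_MODULES 4
#define IRQ_PENDING 0x1               /* hypothetical "I raised the interrupt" bit */

/* Simulated per-module status registers; module 2 is the interrupter here. */
static unsigned int module_status[NUM_MODULES] = { 0, 0, IRQ_PENDING, 0 };

/* Software poll: ask each module in turn whether it raised the interrupt.
   Simple, but slower than vectored (daisy-chain) identification. */
static int find_interrupting_module(void) {
    for (int m = 0; m < NUM_MODULES; m++)
        if (module_status[m] & IRQ_PENDING)
            return m;
    return -1;                        /* spurious interrupt */
}

int main(void) {
    printf("interrupting module: %d\n", find_interrupting_module());
    return 0;
}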
Chapter # 6 Computer Organization & Architecture 24
Multiple Interrupts

 Each interrupt line has a priority


 Higher priority lines can interrupt lower priority lines
 If bus mastering only current master can interrupt

Chapter # 6 Computer Organization & Architecture 25


Direct Memory Access

 Interrupt driven and programmed I/O require active


CPU intervention
 Transfer rate is limited
 CPU is tied up

 DMA is the answer

Chapter # 6 Computer Organization & Architecture 26


DMA Function

 Additional Module (hardware) on bus


 DMA controller takes over from CPU for I/O

Chapter # 6 Computer Organization & Architecture 27


Typical DMA Module Diagram

Chapter # 6 Computer Organization & Architecture 28


DMA Operation

 CPU tells DMA controller


 Read/Write
 Device address

 Starting address of memory block for data

 Amount of data to be transferred

 CPU carries on with other work


 DMA controller deals with transfer
 DMA controller sends interrupt when finished
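A hedged sketch of the hand-off in C: the CPU fills in a descriptor with the four items listed above and starts the controller; the transfer itself is simulated with memcpy so the example runs (the descriptor layout is invented, not any real controller's register map):

#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* Hypothetical DMA descriptor: the four things the CPU tells the controller. */
struct dma_request {
    int      write;        /* 0 = read from device, 1 = write to device   */
    uint32_t device_addr;  /* which device/port                           */
    void    *mem_addr;     /* starting address of the memory block        */
    size_t   count;        /* amount of data to transfer                  */
};

/* Simulated "device" buffer and a stand-in for the controller doing the
   transfer while the CPU gets on with other work. */
static const char device_buffer[] = "block read by DMA";

static void dma_start(const struct dma_request *req) {
    if (!req->write)
        memcpy(req->mem_addr, device_buffer, req->count);  /* transfer without CPU help */
    /* ...the controller would raise an interrupt here when the count reaches zero. */
}

int main(void) {
    char ram[32] = {0};
    struct dma_request req = { 0, 0x01u, ram, sizeof device_buffer };
    dma_start(&req);                  /* the CPU would now continue with other work */
    printf("memory after DMA: %s\n", ram);
    return 0;
}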

Chapter # 6 Computer Organization & Architecture 29


DMA Transfer Cycle Stealing

 DMA controller takes over bus for a cycle


 Transfer of one word of data
 Not an interrupt
 CPU does not switch context
 CPU suspended just before it accesses bus
 i.e. before an operand or data fetch or a data write
 Slows down CPU but not as much as CPU doing
transfer

Chapter # 6 Computer Organization & Architecture 30


DMA Configurations

 Single Bus, Detached DMA controller


 Each transfer uses bus twice
 I/O to DMA then DMA to memory
 CPU is suspended twice

Chapter # 6 Computer Organization & Architecture 31


DMA Configurations

 Single Bus, Integrated DMA controller


 Controller may support >1 device
 Each transfer uses bus once
 DMA to memory
 CPU is suspended once

Chapter # 6 Computer Organization & Architecture 32


DMA Configurations

 Separate I/O Bus


 Bus supports all DMA enabled devices
 Each transfer uses bus once
 DMA to memory
 CPU is suspended once

Chapter # 6 Computer Organization & Architecture 33


Intel 8237A DMA Controller
 Interfaces to 80x86 family and DRAM
 When DMA module needs buses it sends HOLD signal to processor
 CPU responds HLDA (hold acknowledge)
 DMA module can use buses
 E.g. transfer data from memory to disk
1. Device requests service of DMA by pulling DREQ (DMA request) high
2. DMA puts high on HRQ (hold request),
3. CPU finishes present bus cycle (not necessarily present instruction) and puts
high on HLDA (hold acknowledge). HOLD remains active for duration of DMA
4. DMA activates DACK (DMA acknowledge), telling device to start transfer
5. DMA starts transfer by putting address of first byte on address bus and
activating MEMR; it then activates IOW to write to peripheral. DMA
decrements counter and increments address pointer. Repeat until count
reaches zero
6. DMA deactivates HRQ, giving bus back to CPU

Chapter # 6 Computer Organization & Architecture 34


8237 DMA Usage of Systems Bus

Chapter # 6 Computer Organization & Architecture 35


Fly-By

 While DMA using buses, processor is idle


 Processor using bus, DMA is idle
 Known as fly-by DMA controller
 Data does not pass through and is not stored in DMA
chip
 8237 contains four DMA channels
 Programmed independently
 Any one active
 Numbered 0, 1, 2, and 3

Chapter # 6 Computer Organization & Architecture 36


I/O Channels

 I/O devices getting more sophisticated


 e.g. 3D graphics cards
 CPU instructs I/O controller to do transfer
 I/O controller does entire transfer
 Improves speed
 Takes load off CPU
 Dedicated processor is faster

Chapter # 6 Computer Organization & Architecture 37


I/O Channel Architecture

Chapter # 6 Computer Organization & Architecture 38


The Evolution of the I/O Function
 Programmed I/O
 The CPU directly controls a peripheral device
 Simple microprocessor controlled devices
 I/O Module
 A controller or I/O module is added
 The CPU uses programmed I/O without interrupts
 CPU somewhat divorced from the details of external device interfaces
 Interrupt Driven
 Now interrupts are employed
 The CPU need not spend time waiting for an I/O operation to be
performed
 thus increasing efficiency
 DMA
 The I/O module is given direct access to memory via DMA
 It can now move a block of data to or from memory without involving
the CPU, except at the beginning and end of the transfer
Chapter # 6 Computer Organization & Architecture 39
The Evolution of the I/O Function

 I/O Channel
 The I/O module is enhanced to become a processor in its own right,
with a specialized instruction set tailored for I/O
 The CPU directs the I/O processor to execute an I/O program in
memory
 The I/O processor fetches and executes these instructions without CPU
intervention
 I/O Processor
 The I/O module has a local memory of its own
 A large set of I/O devices can be controlled, with minimal CPU
involvement

Chapter # 6 Computer Organization & Architecture 40


Parallel and Serial I/O

 Parallel interface
 multiple lines connecting the
peripheral
 multiple bits are transferred
simultaneously
 Serial interface
 Only one line used to transmit
data
 bits must be transmitted one at a
time

Chapter # 6 Computer Organization & Architecture 41


Point-to-Point vs Multipoint interface

 Connection between an I/O module in a computer


system and external devices can be either:
 point-to-point
 multipoint
 Point-to-point interface
 provides a dedicated line between the I/O module and the
external device
 On small systems (PCs, workstations) typical point-to-point links
include those to the keyboard, printer, and external modem
 Multipoint external interfaces
 used to support external mass storage devices (disk and tape
drives) and multimedia devices (CD-ROMs, video, audio)

Chapter # 6 Computer Organization & Architecture 42


Thunderbolt
 Most recent and fastest peripheral connection technology to
become available for general-purpose use
 Developed by Intel with collaboration from Apple
 The technology combines data, video, audio, and power into a
single high-speed connection for peripherals such as hard drives,
RAID arrays, video-capture boxes, and network interfaces
 Provides up to 10 Gbps throughput in each direction and up to 10
Watts of power to connected peripherals
 A Thunderbolt-compatible peripheral interface is considerably
more complex than a simple USB device
 First generation products are primarily aimed at the professional-
consumer market such as audiovisual editors who want to be able
to move large volumes of data quickly between storage devices and
laptops
 Thunderbolt is a standard feature of Apple’s MacBook Pro laptop
and iMac desktop computers

Chapter # 6 Computer Organization & Architecture 43


Thunderbolt in a Computer Configuration

Chapter # 6 Computer Organization & Architecture 44


Thunderbolt Protocol Layers

Chapter # 6 Computer Organization & Architecture 45


InfiniBand

 Recent I/O specification aimed at the high-end server


market
 First version was released in early 2001
 Standard describes an architecture and specifications for
data flow among processors and intelligent I/O devices
 Has become a popular interface for storage area
networking and other large storage configurations
 Enables servers, remote storage, and other network
devices to be attached in a central fabric of switches and
links
 The switch-based architecture can connect up to 64,000
servers, storage systems, and networking devices

Chapter # 6 Computer Organization & Architecture 46


InfiniBand Switch Fabric

Chapter # 6 Computer Organization & Architecture 47


InfiniBand Links and Data Throughput Rates

Chapter # 6 Computer Organization & Architecture 48


zEnterprise 196
 Introduced in 2010
 IBM’s latest mainframe computer offering
 System is based on the use of the z196 chip
 5.2 GHz multi-core chip with four cores
 Can have a maximum of 24 processor chips (96 cores)
 Has a dedicated I/O subsystem that manages all I/O operations
 Of the 96 cores, up to 4 can be dedicated to I/O
use, creating 4 channel subsystems (CSS)
 Each CSS is made up of the following elements:
 System assist processor (SAP)
 Hardware system area (HSA)
 Logical partitions
 Subchannels
 Channel path
 Channel

Chapter # 6 Computer Organization & Architecture 49


INSTRUCTION SET
CHARACTERISTICS AND
FUNCTIONS
CHAPTER # 7 Computer Organization & Architecture
What is an Instruction Set?

 The complete collection of different instructions that


the processor can execute or understand is referred
to as the processor’s instruction set
 Machine Code
 Binary
 Usually represented by assembly codes

Chapter # 7 Computer Organization & Architecture 2


Elements of an Instruction

 Operation code (Op code)


 Specifies the operation to be performed
 Source Operand reference
 To this
 The operation may involve one or more source operands

 Result Operand reference


 Put the answer here
 Next Instruction Reference
 This tells the processor where to fetch the next instruction
after the execution of this instruction is complete

Chapter # 7 Computer Organization & Architecture 3


Instruction Cycle State Diagram

Chapter # 7 Computer Organization & Architecture 4


Where the Source and Result Operands can be

 Main or virtual memory


 As with next instruction references, the main or virtual
memory address must be supplied
 I/O device
 The instruction must specify the I/O module and device for
the operation
 Processor register
 A processor contains one or more registers that may be
referenced by machine instructions
 Immediate
 The value of the operand is contained in a field in the
instruction being executed

Chapter # 7 Computer Organization & Architecture 5


Instruction Representation

 In machine code each instruction has a unique bit


pattern
 For human consumption (well, programmers
anyway) a symbolic representation is used
 e.g. ADD, SUB, LOAD
 Operands can also be represented in this way
 ADD A,B

Chapter # 7 Computer Organization & Architecture 6


Simple Instruction Format

Chapter # 7 Computer Organization & Architecture 7


Instruction Types

 Data processing
 Data storage (main memory)
 Data movement (I/O)
 Program flow control

Chapter # 7 Computer Organization & Architecture 8


Number of Addresses

 3 addresses
 Operand 1, Operand 2, Result
 a = b + c;
 ADD a, b, c
 May be a fourth - next instruction address (usually implicit)
 Not common

 Needs very long words to hold everything

Chapter # 7 Computer Organization & Architecture 9


Number of Addresses

 2 addresses
 One address doubles as operand and result
 a=a+b
 ADD a, b
 Reduces length of instruction
 Requires some extra work
 Temporary storage to hold some results

Chapter # 7 Computer Organization & Architecture 10


Number of Addresses

 1 address
 Implicit second address
 ADD a
 Usually a register (accumulator)
 Common on early machines

Chapter # 7 Computer Organization & Architecture 11


Number of Addresses

 0 (zero) addresses
 All addresses implicit
 Uses a stack

 e.g. c = a + b
 push a
 push b
 Add
 pop c
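A tiny C model of the zero-address case: the same c = a + b sequence executed on an explicit operand stack, where the add instruction names no operands at all:

#include <stdio.h>

/* A small operand stack to model zero-address (stack) instructions. */
static int stack[16];
static int sp = 0;                      /* next free slot */

static void push(int v) { stack[sp++] = v; }
static int  pop(void)   { return stack[--sp]; }
static void add(void)   { int b = pop(), a = pop(); push(a + b); }  /* operands implicit */

int main(void) {
    int a = 2, b = 3, c;
    push(a);                            /* push a */
    push(b);                            /* push b */
    add();                              /* add    (no addresses in the instruction) */
    c = pop();                          /* pop c  */
    printf("c = %d\n", c);
    return 0;
}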

Chapter # 7 Computer Organization & Architecture 12


Number of Addresses

Chapter # 7 Computer Organization & Architecture 13


How Many Addresses

 More addresses
 More complex (powerful?) instructions
 More registers
 Inter-register operations are quicker
 Fewer instructions per program
 Fewer addresses
 Less complex (powerful?) instructions
 More instructions per program

 Faster fetch/execution of instructions

Chapter # 7 Computer Organization & Architecture 14


Instruction Set Design Decisions

 Operation range
 How many ops?
 What can they do?
 How complex are they?

 Data types
 The various types of data upon which operations are
performed
 Instruction formats
 Instruction length in bits
 Length of op code field
 Number of addresses
 Size of various fields, etc.

Chapter # 7 Computer Organization & Architecture 15


Instruction Set Design Decisions

 Registers
 Number of CPU registers available
 Which operations can be performed on which registers?

 Addressing modes
 RISC v CISC

Chapter # 7 Computer Organization & Architecture 16


Types of Operand

 Addresses
 Numbers
 Integer/floating point
 Characters
 ASCII etc.
 Logical Data
 Bits or flags

Chapter # 7 Computer Organization & Architecture 17


x86 Data Types

 8 bit Byte
 16 bit word
 32 bit double word
 64 bit quad word
 128 bit double quadword
 Addressing is by 8 bit unit
 Words do not need to align at even-numbered
addresses
 Data accessed across 32 bit bus in units of double
word read at addresses divisible by 4
 Little endian
Chapter # 7 Computer Organization & Architecture 18
x86 Numeric Data Formats

Chapter # 7 Computer Organization & Architecture 19


ARM Data Types
 8 (byte), 16 (halfword), 32 (word) bits
 Halfword and word accesses should be word aligned
 Unsigned integer interpretation supported for all
types
 Twos-complement signed integer interpretation
supported for all types
 Majority of implementations do not provide floating-
point hardware
 Saves power and area
 Floating-point arithmetic implemented in software
 Optional floating-point coprocessor
 Single- and double-precision IEEE 754 floating point data
types

Chapter # 7 Computer Organization & Architecture 20


Types of Operation

 Data Transfer
 Arithmetic
 Logical
 Conversion
 I/O
 System Control
 Transfer of Control

Chapter # 7 Computer Organization & Architecture 21


Types of Operation

Chapter # 7 Computer Organization & Architecture 22


Types of Operation

Chapter # 7 Computer Organization & Architecture 23


Data Transfer

 Specify
 Source
 Destination

 Amount of data

 May be different instructions for different


movements
 e.g. IBM 370
 Or one instruction and different addresses
 e.g. VAX

Chapter # 7 Computer Organization & Architecture 24


Arithmetic

 Add, Subtract, Multiply, Divide


 Signed Integer
 Floating point
 May include
 Increment (a++)
 Decrement (a--)

 Negate (-a)

Chapter # 7 Computer Organization & Architecture 25


Logical

 Bitwise operations
 AND
 OR

 NOT

 Test
 Compare

Chapter # 7 Computer Organization & Architecture 26


Conversion

 E.g. Binary to Decimal

Chapter # 7 Computer Organization & Architecture 27


Input/Output

 May be specific instructions


 May be done using data movement instructions
(memory mapped)
 May be done by a separate controller (DMA)
 Examples
 Input
 Output

 Start I/O

Chapter # 7 Computer Organization & Architecture 28


Systems Control

 Privileged instructions
 CPU needs to be in specific state
 Ring 0 on 80386+
 Kernel mode

 For operating systems use

Chapter # 7 Computer Organization & Architecture 29


Transfer of Control

 Branch
 e.g. branch to x if result is zero
 Skip
 e.g. increment and skip if zero
 ISZ Register1

 Branch xxxx

 ADD A

 Subroutine call
 c.f. interrupt call

Chapter # 7 Computer Organization & Architecture 30


Branch Instruction

Chapter # 7 Computer Organization & Architecture 31


x86 Operation Types

Chapter # 7 Computer Organization & Architecture 32


x86 Operation Types

Chapter # 7 Computer Organization & Architecture 33


x86 Single-Instruction, Multiple-Data (SIMD)
Instructions
 In 1996 Intel introduced MMX technology into its Pentium
product line
 MMX is a set of highly optimized instructions for multimedia
tasks
 Video and audio data are typically composed of large
arrays of small data types
 Three new data types are defined in MMX
 Packed byte
 Packed word
 Packed doubleword
 Each data type is 64 bits in length and consists of
multiple smaller data fields, each of which holds a fixed-
point integer
Chapter # 7 Computer Organization & Architecture 34
MMX Instruction Set

Chapter # 7 Computer Organization & Architecture 35


ARM Operation Types

 Load and store instructions


 Branch instructions
 Data-processing instructions
 Multiply instructions
 Parallel addition and subtraction instructions
 Extend instructions
 Status register access instructions

Chapter # 7 Computer Organization & Architecture 36


Byte Order (A portion of chips?)

 What order do we read numbers that occupy more


than one byte
 e.g. (numbers in hex to make it easy to read)
 12345678 can be stored in 4x8-bit locations as follows
 Address    Value (1)    Value (2)
 184        12           78
 185        34           56
 186        56           34
 187        78           12

Chapter # 7 Computer Organization & Architecture 37


Byte Order Names

 The problem is called Endian


 Big endian
 In big endian, you store the most significant byte in the
smallest address.
 Little-endian
 In little endian, you store the least significant byte in the
smallest address

Chapter # 7 Computer Organization & Architecture 38


Example of C Data Structure

Chapter # 7 Computer Organization & Architecture 39


Alternative View of Memory Map

Chapter # 7 Computer Organization & Architecture 40


Byte Order Standard

 Pentium (x86), VAX are little-endian


 IBM 370, Motorola 680x0 (Mac), and most RISC are
big-endian
 Internet is big-endian
 Makes writing Internet programs on PC more awkward!
 WinSock provides htonl/htons and ntohl/ntohs (host-to-network
and network-to-host) functions to convert
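A short C example of the conversion, using the standard htonl/ntohl functions (shown with the POSIX header; on Windows the same functions are declared in winsock2.h):

#include <arpa/inet.h>   /* htonl/ntohl; on Windows use <winsock2.h> instead */
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t host_value = 0x12345678u;

    /* Convert to big-endian network order before sending, and back after
       receiving.  On a big-endian host both calls are no-ops. */
    uint32_t wire_value = htonl(host_value);
    uint32_t back_again = ntohl(wire_value);

    printf("host 0x%08X -> network 0x%08X -> host 0x%08X\n",
           (unsigned)host_value, (unsigned)wire_value, (unsigned)back_again);
    return 0;
}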

Chapter # 7 Computer Organization & Architecture 41


INSTRUCTION ADDRESSING
MODES & FORMATS
CHAPTER # 8 Computer Organization & Architecture
Addressing Modes

 Immediate
 Direct
 Indirect
 Register
 Register Indirect
 Displacement (Indexed)
 Stack

Chapter # 8 Computer Organization & Architecture 2


Immediate Addressing

 Operand is part of instruction


 Operand = address field
 e.g. ADD 5
 Add 5 to contents of accumulator
 5 is operand

 No memory reference to fetch data


 Fast
 Limited range

Chapter # 8 Computer Organization & Architecture 3


Immediate Addressing Diagram

Instruction

Opcode Operand

Chapter # 8 Computer Organization & Architecture 4


Direct Addressing

 Address field contains address of operand


 Effective address (EA) = address field (A)
 e.g. ADD A
 Add contents of cell A to accumulator
 Look in memory at address A for operand

 Advantages:
 Single memory reference to access data
 No additional calculations to work out effective address

 Disadvantages:
 Limited address space

Chapter # 8 Computer Organization & Architecture 5


Direct Addressing Diagram

Instruction
Opcode Address A

Memory

Operand

Chapter # 8 Computer Organization & Architecture 6


Indirect Addressing

 Memory cell pointed to by address field contains the


address of (pointer to) the operand
 EA = (A)
 Look in A, find address (A) and look there for operand
 e.g. ADD (A)
 Add contents of cell pointed to by contents of A to
accumulator

Chapter # 8 Computer Organization & Architecture 7


Indirect Addressing

 Advantage:
 Large address space available
 2n where n = word length

 Disadvantage:
 Instruction execution requires two memory references to
fetch the operand
 One to get its address and a second to get its value

 Hence slower

 May be nested, multilevel, cascaded


 e.g. EA = (((A)))

Chapter # 8 Computer Organization & Architecture 8


Indirect Addressing Diagram
Instruction
Opcode Address A

Memory

Pointer to operand

Operand

Chapter # 8 Computer Organization & Architecture 9


Register Addressing

 Operand is held in register named in address field


 EA = R
 Advantages:
 Very fast execution
 Only a small address field is needed in the instruction
 Shorter instructions
 Faster instruction fetch
 No time-consuming memory references are required
 Disadvantage:
 The address space is very limited
 Limited number of registers

Chapter # 8 Computer Organization & Architecture 10


Register Addressing

 Multiple registers helps performance


 Requires good assembly programming or compiler writing
 C programming
 register int a;
 Also called Register Direct addressing

Chapter # 8 Computer Organization & Architecture 11


Register Addressing Diagram

Instruction
Opcode Register Address R

Registers

Operand

Chapter # 8 Computer Organization & Architecture 12


Register Indirect Addressing

 EA = (R)
 Operand is in memory cell pointed to by contents of
register R
 Large address space (2n)
 One fewer memory access than indirect addressing

Chapter # 8 Computer Organization & Architecture 13


Register Indirect Addressing Diagram

Instruction

Opcode Register Address R

Memory

Registers

Pointer to Operand Operand

Chapter # 8 Computer Organization & Architecture 14


Displacement Addressing
 Combines the capabilities of direct addressing and
register indirect addressing
 EA = A + (R)
 Requires that the instruction have two address fields, at
least one of which is explicit
 A = base value
 The value contained in one address field is used directly
 R = register that holds displacement
 The other address field’s contents are added to A to produce the
effective address
 or vice versa
 Most common uses:
 Relative addressing
 Base-register addressing
 Indexing
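A small C simulation of EA = A + (R), with an invented register file and memory array standing in for the real hardware:

#include <stdio.h>

#define MEM_WORDS 1024

static int memory[MEM_WORDS];            /* simulated main memory   */
static int regs[8];                      /* simulated register file */

/* Displacement addressing: effective address = A + (R), then the operand
   is fetched from memory at that address. */
static int fetch_displacement(int a, int r) {
    int ea = a + regs[r];
    return memory[ea];
}

int main(void) {
    regs[3] = 16;                        /* base or index value in register 3 */
    memory[100 + 16] = 42;               /* operand stored at address 116     */
    printf("operand = %d\n", fetch_displacement(100, 3));   /* EA = 100 + (R3) */
    return 0;
}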
Chapter # 8 Computer Organization & Architecture 15
Displacement Addressing Diagram

Instruction
Opcode Register R Address A

Memory

Registers

Pointer to Operand + Operand

Chapter # 8 Computer Organization & Architecture 16


Relative Addressing
 A version of displacement addressing
 R = Program counter, PC
 EA = A + (PC)
 i.e. get operand from A cells from current location pointed to by
PC
 The implicitly referenced register is the program counter
(PC)
 The next instruction address is added to the address field to
produce the EA
 Typically the address field is treated as a twos complement
number for this operation
 Thus the effective address is a displacement relative to the
address of the instruction
 Exploits the concept of locality

Chapter # 8 Computer Organization & Architecture 17


Base-Register Addressing

 A holds displacement
 R holds pointer to base address
 R may be explicit or implicit
 e.g. segment registers in 80x86
 Exploits the locality of memory references
 Convenient means of implementing segmentation
 In some implementations a single segment base register is
employed and is used implicitly
 In others the programmer may choose a register to hold
the base address of a segment and the instruction must
reference it explicitly

Chapter # 8 Computer Organization & Architecture 18


Indexed Addressing
 The method of calculating the EA is the same as for base-
register addressing
 An important use is to provide an efficient mechanism for
performing iterative operations
 Good for accessing arrays
 Indexing: EA = A + (R)
 A = base: the address field references a main memory address
 R = displacement: the referenced register contains a positive
displacement from that address
 Autoindexing
 Automatically increment or decrement the index register after each
reference to it
 EA = A + (R), followed by (R) ← (R) + 1

Chapter # 8 Computer Organization & Architecture 19


Indexed Addressing

 Postindexing
 Indexing is performed after the indirection
 EA = (A) + (R)

 Preindexing
 Indexing is performed before the indirection
 EA = (A + (R))

Chapter # 8 Computer Organization & Architecture 20


Stack Addressing

 A stack is a linear array of locations


 Sometimes referred to as a pushdown list or last-in-first-out
queue
 Items are appended to the top of the stack so that the block is
partially filled
 Associated with the stack is a pointer whose value is the
address of the top of the stack
 The stack pointer is maintained in a register
 Thus references to stack locations in memory are in fact register
indirect addresses
 Operand is (implicitly) on top of stack
 e.g.
 ADD Pop top two items from stack and add

Chapter # 8 Computer Organization & Architecture 21


Addressing Modes Summary

Chapter # 8 Computer Organization & Architecture 22


Addressing Modes Summary

Chapter # 8 Computer Organization & Architecture 23


x86 Addressing Modes

 Virtual or effective address is offset into segment


 Starting address plus offset gives linear address
 This goes through page translation if paging enabled
 12 addressing modes available
 Immediate
 Register operand
 Displacement
 Base
 Base with displacement
 Scaled index with displacement
 Base with index and displacement
 Base scaled index with displacement
 Relative

Chapter # 8 Computer Organization & Architecture 24


x86 Addressing Mode Calculation

Chapter # 8 Computer Organization & Architecture 25


x86 Addressing Mode Calculation

Chapter # 8 Computer Organization & Architecture 26


ARM Addressing Modes Load/Store
 Data processing instructions
 Use either register addressing or a mixture of register and
immediate addressing
 For register addressing the value in one of the register
operands may be scaled using one of the five shift
operators
 Branch instructions
 The only form of addressing for branch instructions is
immediate
 Instruction contains 24 bit value
 Shifted 2 bits left so that the address is on a word boundary
 Effective range +/-32MB from the program counter
 Indirectly through base register plus offset
 Offset added to or subtracted from base register contents
to form the memory address

Chapter # 8 Computer Organization & Architecture 27


ARM Addressing Modes Load/Store
 Preindex
 Memory address is formed as for offset addressing
 Memory address also written back to base register
 So base register value incremented or decremented by offset value
 Postindex
 Memory address is base register value
 Offset added or subtracted
 Result written back to base register
 Base register acts as index register for preindex and postindex
addressing
 Offset either immediate value in instruction or another
register
 If a register, scaled register addressing is available
 Offset register value scaled by shift operator
 Instruction specifies shift size

Chapter # 8 Computer Organization & Architecture 28


ARM Indexing Methods

Chapter # 8 Computer Organization & Architecture 29


Instruction Formats

 Layout of bits in an instruction


 Includes opcode
 Includes (implicit or explicit) operand(s)
 Usually more than one instruction format in an
instruction set

Chapter # 8 Computer Organization & Architecture 30


Instruction Length

 Affected by and affects


 Memory size
 Memory organization

 Bus structure

 CPU complexity

 CPU speed

 Trade off between powerful instruction repertoire


and saving space

Chapter # 8 Computer Organization & Architecture 31


Address Mapping

 Address modification
 Base, displacement
 Base, index, displacement

Chapter # 8 Computer Organization & Architecture 32


Allocation of Bits

 Number of addressing modes


 Number of operands
 Register versus memory
 Number of register sets
 Address range
 Address granularity

Chapter # 8 Computer Organization & Architecture 33


Improvements

 Use hexadecimal rather than binary


 Code as series of lines
 Hex address and memory contents
 Need to translate automatically using program
 Add symbolic names or mnemonics for instructions
 Three fields per line
 Location address
 Three letter opcode

 If memory reference: address

 Need more complex translation program

Chapter # 8 Computer Organization & Architecture 34


PDP-8 Instruction Format

Chapter # 8 Computer Organization & Architecture 35


PDP-10 Instruction Format

Chapter # 8 Computer Organization & Architecture 36


Variable-Length Instructions

 Variations can be provided efficiently and compactly


 Increases the complexity of the processor
 Does not remove the desirability of making all of the
instruction lengths integrally related to word length
 Because the processor does not know the length of the
next instruction to be fetched a typical strategy is to fetch a
number of bytes or words equal to at least the longest
possible instruction
 Sometimes multiple instructions are fetched

Chapter # 8 Computer Organization & Architecture 37


PDP-11 Instruction Format

Chapter # 8 Computer Organization & Architecture 38


VAX Instruction Examples

Chapter # 8 Computer Organization & Architecture 39


x86 Instruction Format

Chapter # 8 Computer Organization & Architecture 40


x86 Instruction Format

 Instruction prefixes
 Instruction prefix
 The instruction prefix, if present, consists of the LOCK prefix or one
of the repeat prefixes
 The LOCK prefix is used to ensure exclusive use of shared memory in
multiprocessor environments
 The repeat prefixes specify repeated operation of a string, which
enables the x86 to process strings much faster than with a regular
software loop
 There are five different repeat prefixes: REP, REPE, REPZ, REPNE, and
REPNZ

Chapter # 8 Computer Organization & Architecture 41


x86 Instruction Format

 Segment override
 Explicitly specifies which segment register an instruction should use,
overriding the default segment-register selection generated by the
x86 for that instruction
 Operand size
 An instruction has a default operand size of 16 or 32 bits, and the
operand prefix switches between 32-bit and 16-bit operands
 Address size
 The processor can address memory using either 16- or 32-bit
addresses
 The address size determines the displacement size in instructions
and the size of address offsets generated during effective address
calculation

Chapter # 8 Computer Organization & Architecture 42


x86 Instruction Format

 Opcode
 The opcode field is 1, 2, or 3 bytes in length
 The opcode may also include bits that specify if data is
byte- or full-size (16 or 32 bits depending on context),
direction of data operation (to or from memory), and
whether an immediate data field must be sign extended
 ModR/M
 This byte provides addressing information
 The ModR/M byte specifies whether an operand is in a
register or in memory
 If it is in memory, then fields within the byte specify the
addressing mode to be used
Chapter # 8 Computer Organization & Architecture 43
x86 Instruction Format

 SIB
 Certain encodings of the ModR/M byte specify the inclusion of
the SIB byte to fully specify the addressing mode
 The SIB byte consists of three fields
 Scale field (2 bits) specifies the scale factor for scaled indexing
 Index field (3 bits) specifies the index register
 Base field (3 bits) specifies the base register
 Displacement
 When the addressing-mode specifier indicates that a
displacement is used, an 8-, 16-, or 32-bit signed integer
displacement field is added
 Immediate
 Provides the value of an 8-, 16-, or 32-bit operand
Chapter # 8 Computer Organization & Architecture 44
ARM Instruction Formats

Chapter # 8 Computer Organization & Architecture 45


Thumb Instruction Set

Chapter # 8 Computer Organization & Architecture 46


PROCESSOR STRUCTURE &
FUNCTION
CHAPTER # 9 Computer Organization & Architecture
CPU Function

 CPU must
 Fetch instruction
 The processor reads an instruction from memory (register, cache, main
memory)
 Interpret instruction
 The instruction is decoded to determine what action is required
 Fetch data
 The execution of an instruction may require reading data from memory
or an I/O module
 Process data
 The execution of an instruction may require performing some
arithmetic or logical operation on data
 Write data
 The results of an execution may require writing data to memory or an
I/O module

Chapter # 9 Computer Organization & Architecture 2


CPU With Systems Bus

Chapter # 9 Computer Organization & Architecture 3


CPU Internal Structure

Chapter # 9 Computer Organization & Architecture 4


Registers

 CPU must have some working space (temporary


storage)
 Called registers
 Number and function vary between processor
designs
 One of the major design decisions
 Top level of memory hierarchy

Chapter # 9 Computer Organization & Architecture 5


Types of Registers

 User-visible registers
 Enable the machine or assembly language programmer to
minimize main memory references by optimizing use of
registers
 Control and status registers
 Used by the control unit to control the operation of the
processor and by privileged, operating system programs to
control the execution of programs

Chapter # 9 Computer Organization & Architecture 6


User Visible Registers

 General Purpose
 Data
 Address
 Condition Codes

Chapter # 9 Computer Organization & Architecture 7


General Purpose Registers

 May be true general purpose


 May be restricted
 May be used for data or addressing
 Data
 Accumulator
 Addressing
 Segment

Chapter # 9 Computer Organization & Architecture 8


General Purpose Registers

 Make them general purpose


 Increase flexibility and programmer options
 Increase instruction size & complexity

 Make them specialized


 Smaller (faster) instructions
 Less flexibility

Chapter # 9 Computer Organization & Architecture 9


How Many GP Registers?

 Between 8 - 32
 Fewer = more memory references
 More does not reduce memory references and takes
up processor real estate

Chapter # 9 Computer Organization & Architecture 10


How big?

 Large enough to hold full address


 Large enough to hold full word
 Often possible to combine two data registers
 C programming
 long long int a;  (a double-length integer)

Chapter # 9 Computer Organization & Architecture 11


Condition Code Registers

 Sets of individual bits


 e.g. result of last operation was zero
 Can be read (implicitly) by programs
 e.g. Jump if zero
 Can not (usually) be set by programs

Chapter # 9 Computer Organization & Architecture 12


Control & Status Registers

 Program counter (PC)


 Contains the address of an instruction to be fetched
 Instruction register (IR)
 Contains the instruction most recently fetched
 Memory address register (MAR)
 Contains the address of a location in memory
 Memory buffer register (MBR)
 Contains a word of data to be written to memory or the
word most recently read

Chapter # 9 Computer Organization & Architecture 13


Program Status Word
A set of bits which includes Condition Codes
 Sign
 Contains the sign bit of the result of the last arithmetic operation
 Zero
 Set when the result is 0
 Carry
 Set if an operation resulted in a carry
 Equal
 Set if a logical compare result is equality
 Overflow
 Used to indicate arithmetic overflow
 Interrupt Enable/Disable
 Used to enable or disable interrupts
 Supervisor
 Indicates whether the processor is executing in supervisor or user mode
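As an illustration, the sketch below recomputes four of these bits after an 8-bit add in C; the grouping into a struct is just for the example, not any particular machine's PSW layout:

#include <stdint.h>
#include <stdio.h>

/* Recompute a few status bits after an 8-bit add, the way a PSW would be
   updated after an arithmetic operation. */
struct psw { int sign, zero, carry, overflow; };

static uint8_t add8(uint8_t a, uint8_t b, struct psw *f) {
    uint16_t wide = (uint16_t)a + (uint16_t)b;
    uint8_t  r    = (uint8_t)wide;
    f->carry    = wide > 0xFF;                          /* carry out of bit 7     */
    f->zero     = (r == 0);                             /* result is zero         */
    f->sign     = (r & 0x80) != 0;                      /* sign bit of the result */
    f->overflow = (~(a ^ b) & (a ^ r) & 0x80) != 0;     /* signed overflow        */
    return r;
}

int main(void) {
    struct psw f;
    uint8_t r = add8(0x7F, 0x01, &f);    /* +127 + 1 overflows in signed terms */
    printf("result=%02X S=%d Z=%d C=%d V=%d\n", r, f.sign, f.zero, f.carry, f.overflow);
    return 0;
}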

Chapter # 9 Computer Organization & Architecture 14


Supervisor Mode

 Intel ring zero


 Kernel mode
 Allows privileged instructions to execute
 Used by operating system
 Not available to user programs

Chapter # 9 Computer Organization & Architecture 15


Other Registers

 May have registers pointing to


 Process control blocks
 Interrupt Vectors

 Page table

 CPU design and operating system design are closely


linked

Chapter # 9 Computer Organization & Architecture 16


Example Register Organizations

Chapter # 9 Computer Organization & Architecture 17


EFLAGS Register

Chapter # 9 Computer Organization & Architecture 18


Intel x86-64 Registers
 16 integer general-purpose registers
 64-bit
 RAX, RBX, RCX, RDX, RBP, RSI, RDI, RSP, R8 - R15
 8 floating-point registers, FPU x87
 80-bit
 ST0 - ST7
 8 Multimedia Extensions registers
 64-bit
 MM0 – MM7
 they share space with the registers ST0 - ST7
 16 SSE (Streaming SIMD Extensions) registers
 128-bit
 XMM0 - XMM15
 64-bit RIP pointer
 64-bit flag register RFLAGS
Chapter # 9 Computer Organization & Architecture 19
Intel x86-64 Registers
 64-bit x86 adds 8 more general-purpose registers,
named R8-R15
 It also introduces a new naming convention (R0–R15), in which
AH, CH, DH and BH have no equivalents
 R0 is RAX
 R1 is RCX
 R2 is RDX
 R3 is RBX
 R4 is RSP
 R5 is RBP
 R6 is RSI
 R7 is RDI

Chapter # 9 Computer Organization & Architecture 20


Intel x86-64 Registers

 R8, R9, R10, R11, R12, R13, R14, R15 are the new
registers and have no other names
 R0D – R15D are the lowermost 32 bits of each
register
 For example, R0D is EAX
 R0W – R15W are the lowermost 16 bits of each
register
 For example, R0W is AX
 R0L – R15L are the lowermost 8 bits of each register
 for example, R0L is AL

Chapter # 9 Computer Organization & Architecture 21


Intel x86-64 Registers

Chapter # 9 Computer Organization & Architecture 22


Simplified ARM Organization

Chapter # 9 Computer Organization & Architecture 23


ARM (32-bit) Processor Modes
 The ARM architecture supports seven execution modes
 User Mode
 Most application programs execute in user mode
 User program being executed is unable to access protected system
resources or to change mode
 Privilege Modes
 These modes are used to run system software and divided into two
categories
 System Mode
 This mode is not entered by any exception and uses the same registers available
in User mode
 The System mode is used for running certain privileged operating system tasks
 System mode tasks may be interrupted by any of the five exception categories
 Exception Modes
 The exception modes are entered when specific exceptions occur
 Each of these modes has some dedicated registers that substitute for some of
the user mode registers

Chapter # 9 Computer Organization & Architecture 24


ARM Exception Modes
 The exception modes are entered when specific exceptions occur
 Each of these modes has some dedicated registers that substitute for
some of the user mode registers, and which are used to avoid corrupting
User mode state information when the exception occurs
 The exception modes are as follows
 Supervisor mode
 Usually what the OS runs in and it is entered when the processor encounters a software interrupt
instruction
 Abort mode
 Entered in response to memory faults
 Undefined mode
 Entered when the processor attempts to execute an instruction that is supported neither by the
main integer core nor by one of the coprocessors
 Interrupt mode
 Entered whenever the processor receives an interrupt signal from any other interrupt source
 Fast interrupt mode
 Entered whenever the processor receives an interrupt signal from the designated fast interrupt
source
 A fast interrupt cannot be interrupted, but a fast interrupt may interrupt a normal interrupt

Chapter # 9 Computer Organization & Architecture 25


ARM Register Organization
 The ARM processor has a total of thirty seven 32-bit registers,
classified as follows
 31 registers referred to in the ARM manual as general-purpose
registers
 In fact, some of these, such as the program counter, have special purposes
 6 program status registers
 Registers are arranged in partially overlapping banks, with the current
processor mode determining which bank is available
 At any time, sixteen numbered registers and one or two program
status registers are visible, for a total of 17 or 18 software-visible
registers
 Registers R0 through R7, register R15 (the program counter) and the current
program status register (CPSR) are visible in and shared by all modes
 Registers R8 through R12 are shared by all modes except fast interrupt, which
has its own dedicated registers R8_fiq through R12_fiq
 All the exception modes have their own versions of registers R13 and R14
 All the exception modes have a dedicated saved program status register (SPSR)

Chapter # 9 Computer Organization & Architecture 26


ARM Register Organization

Chapter # 9 Computer Organization & Architecture 27


ARM Register Organization

Chapter # 9 Computer Organization & Architecture 28


ARM AArch64 Registers

Chapter # 9 Computer Organization & Architecture 29


Instruction Cycle

Chapter # 9 Computer Organization & Architecture 30


Indirect Cycle

 May require memory access to fetch operands


 Indirect addressing requires more memory accesses
 Can be thought of as additional instruction sub-cycle

Chapter # 9 Computer Organization & Architecture 31


Instruction Cycle with Indirect

Chapter # 9 Computer Organization & Architecture 32


Instruction Cycle State Diagram

Chapter # 9 Computer Organization & Architecture 33


Data Flow (Instruction Fetch)

 Depends on CPU design


 Fetch
 PC contains address of next instruction
 Address moved to MAR

 Address placed on address bus

 Control unit requests memory read

 Result placed on data bus, copied to MBR, then to IR

 Meanwhile PC incremented by 1
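The same register traffic written out as a C sketch, with a small array standing in for memory and made-up instruction words:

#include <stdint.h>
#include <stdio.h>

/* Registers involved in the fetch, plus a tiny simulated memory. */
static uint16_t PC = 0, MAR = 0, MBR = 0, IR = 0;
static uint16_t memory[8] = { 0x1A05, 0x2B06, 0x3C07 };   /* made-up instruction words */

static void fetch(void) {
    MAR = PC;              /* address of next instruction moved to MAR          */
    MBR = memory[MAR];     /* memory read: result arrives in MBR                */
    IR  = MBR;             /* instruction copied to IR for decoding             */
    PC  = PC + 1;          /* meanwhile PC incremented to the next instruction  */
}

int main(void) {
    for (int i = 0; i < 3; i++) {
        fetch();
        printf("fetched IR=%04X, next PC=%u\n", (unsigned)IR, (unsigned)PC);
    }
    return 0;
}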

Chapter # 9 Computer Organization & Architecture 34


Data Flow (Data Fetch)

 IR is examined
 If indirect addressing, indirect cycle is performed
 Rightmost N bits of MBR transferred to MAR
 Control unit requests memory read

 Result (address of operand) moved to MBR

Chapter # 9 Computer Organization & Architecture 35


Data Flow (Fetch Diagram)

Chapter # 9 Computer Organization & Architecture 36


Data Flow (Indirect Diagram)

Chapter # 9 Computer Organization & Architecture 37


Data Flow (Execute)

 May take many forms


 Depends on instruction being executed
 May include
 Memory read/write
 Input/Output

 Register transfers

 ALU operations

Chapter # 9 Computer Organization & Architecture 38


Data Flow (Interrupt)
 Simple
 Predictable
 Current PC saved to allow resumption after interrupt
 Contents of PC copied to MBR
 Special memory location (e.g. stack pointer) loaded to MAR
 MBR written to memory
 PC loaded with address of interrupt handling routine
 Next instruction (first of interrupt handler) can be fetched

Chapter # 9 Computer Organization & Architecture 39


Data Flow (Interrupt Diagram)

Chapter # 9 Computer Organization & Architecture 40


Prefetch

 Fetch accessing main memory


 Execution usually does not access main memory
 Can fetch next instruction during execution of
current instruction
 Called instruction prefetch

Chapter # 9 Computer Organization & Architecture 41


Improved Performance

 But not doubled


 Fetch usually shorter than execution
 Prefetch more than one instruction?
 Any jump or branch means that prefetched instructions are
not the required instructions
 Divide in more activities/stages to improve
performance
 Solution is processor pipelining

Chapter # 9 Computer Organization & Architecture 42


Pipelining is Natural
 Laundry Example
 Nazim, Botir, Babar, Temur each have one load of clothes (A, B, C, D)
to wash, dry, and fold
 “Washing” takes 30 minutes
 “Drying” takes 30 minutes
 “Folding” takes 30 minutes
 “Stashing” (putting clothes into drawers) takes 30 minutes

Chapter # 9 Computer Organization & Architecture 43


Sequential Laundry

[Timing diagram: loads A–D processed one after another in task order, from 6 PM to 2 AM in 30-minute stages]

 Sequential laundry takes 8 hours for 4 loads

Chapter # 9 Computer Organization & Architecture 44


Pipelined Laundry: Start work ASAP

[Timing diagram: loads A–D overlapped in the pipeline, from 6 PM to about 9:30 PM in 30-minute stages]

 Pipelined laundry takes 3.5 hours for 4 loads!

Chapter # 9 Computer Organization & Architecture 45


Processor Pipelining

 Fetch instruction
 Decode instruction
 Calculate operands address
 Fetch operands
 Execute instructions
 Write result

 Overlap these operations

Chapter # 9 Computer Organization & Architecture 46


Two Stage Instruction Pipeline

Chapter # 9 Computer Organization & Architecture 47


Timing Diagram for
Instruction Pipeline Operation

Chapter # 9 Computer Organization & Architecture 48


Why Pipeline?

 Suppose we execute 100 instructions


 Single Cycle Machine
 45 ns/cycle x 1 CPI x 100 inst = 4500 ns
 Multicycle Machine
 10 ns/cycle x 4.6 CPI (due to instr. mix) x 100 inst = 4600 ns
 Ideal pipelined machine
 10 ns/cycle x (1 CPI x 100 inst + 4 cycle drain) = 1040 ns
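The three figures can be reproduced with the formulas used on this slide (time = cycle time x CPI x instruction count, plus a pipeline drain of stages - 1 cycles for the ideal pipeline); a quick C check:

#include <stdio.h>

int main(void) {
    int n = 100;                                        /* instructions */

    double single_cycle = 45.0 * 1.0 * n;               /* 45 ns/cycle, CPI = 1       */
    double multi_cycle  = 10.0 * 4.6 * n;               /* 10 ns/cycle, CPI = 4.6     */
    double pipelined    = 10.0 * (1.0 * n + 4);         /* 10 ns/cycle, 4-cycle drain */

    printf("single-cycle : %.0f ns\n", single_cycle);   /* 4500 ns */
    printf("multi-cycle  : %.0f ns\n", multi_cycle);    /* 4600 ns */
    printf("pipelined    : %.0f ns\n", pipelined);      /* 1040 ns */
    return 0;
}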

Chapter # 9 Computer Organization & Architecture 49


The Effect of a Conditional Branch on
Instruction Pipeline Operation

Chapter # 9 Computer Organization & Architecture 50


Six Stage Instruction Pipeline

Chapter # 9 Computer Organization & Architecture 51


Alternative Pipeline Depiction

Chapter # 9 Computer Organization & Architecture 52


Speedup Factors with Instruction Pipelining

Chapter # 9 Computer Organization & Architecture 53


Pipeline Hazards

 Pipeline, or some portion of pipeline, must stall


 Also called pipeline bubble
 Types of hazards
 Resource
 Data

 Control

Chapter # 9 Computer Organization & Architecture 54


Resource Hazards
 Two (or more) instructions in pipeline need same resource
 Executed in serial rather than parallel for part of pipeline
 Also called structural hazard
 E.g. Assume simplified five-stage pipeline
 Each stage takes one clock cycle
 Ideal case is new instruction enters pipeline each clock cycle
 Assume main memory has single port
 Assume instruction fetches and data reads and writes performed one at a time
 Ignore the cache
 Operand read or write cannot be performed in parallel with instruction fetch
 Fetch instruction stage must idle for one cycle fetching I3

 E.g. multiple instructions ready to enter execute instruction phase


 Single ALU

 One solution: increase available resources


 Multiple main memory ports
 Multiple ALUs

Chapter # 9 Computer Organization & Architecture 55


Resource Hazard Diagram

Chapter # 9 Computer Organization & Architecture 56


Data Hazards
 Conflict in access of an operand location
 Two instructions to be executed in sequence
 Both access a particular memory or register operand
 If in strict sequence, no problem occurs
 If in a pipeline, operand value could be updated so as to produce different
result from strict sequential execution
 E.g. x86 machine instruction sequence:

 ADD EAX, EBX /* EAX = EAX + EBX */
 SUB ECX, EAX /* ECX = ECX - EAX */

 ADD instruction does not update EAX until end of stage 5, at clock cycle 5
 SUB instruction needs value at beginning of its stage 2, at clock cycle 4
 Pipeline must stall for two clocks cycles
 Without special hardware and specific avoidance algorithms, results in
inefficient pipeline usage
Chapter # 9 Computer Organization & Architecture 57
Data Hazard Diagram

Chapter # 9 Computer Organization & Architecture 58


Types of Data Hazard

 Read after write (RAW), or true dependency


 An instruction modifies a register or memory location
 Succeeding instruction reads data in that location
 Hazard if read takes place before write complete
 Write after read (WAR), or antidependency
 An instruction reads a register or memory location
 Succeeding instruction writes to location
 Hazard if write completes before read takes place
 Write after write (WAW), or output dependency
 Two instructions both write to same location
 Hazard if the writes take place in the reverse of the intended order
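The three cases written out on ordinary C variables; the comments note which hazard appears if a pipeline overlaps or reorders the accesses (the register-style names are illustrative):

#include <stdio.h>

int main(void) {
    int r1 = 1, r2 = 2, r3 = 3, r4 = 0, r5 = 5;

    r1 = r2 + r3;      /* I1 writes r1                                     */
    r4 = r1 + 1;       /* I2 reads r1  -> RAW (true dependency) on I1      */

    r5 = r4 + r2;      /* I3 reads r4                                      */
    r4 = r3 * 2;       /* I4 writes r4 -> WAR (antidependency) with I3     */

    r2 = r1 + r3;      /* I5 writes r2                                     */
    r2 = r5 - 1;       /* I6 writes r2 -> WAW (output dependency) with I5  */

    printf("%d %d %d %d %d\n", r1, r2, r3, r4, r5);
    return 0;
}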

Chapter # 9 Computer Organization & Architecture 59


Control Hazard

 Also known as branch hazard


 Pipeline makes wrong decision on branch prediction
 Brings instructions into pipeline that must
subsequently be discarded

Chapter # 9 Computer Organization & Architecture 60


Dealing with Branches

 Multiple Streams
 Prefetch Branch Target
 Loop buffer
 Branch prediction
 Delayed branching

Chapter # 9 Computer Organization & Architecture 61


Multiple Streams

 Have two pipelines


 Prefetch each branch into a separate pipeline
 Use appropriate pipeline
 Leads to bus & register contention
 Multiple branches lead to further pipelines being
needed

Chapter # 9 Computer Organization & Architecture 62


Prefetch Branch Target

 Target of branch is prefetched in addition to


instructions following branch
 Keep target until branch is executed
 Used by IBM 360/91

Chapter # 9 Computer Organization & Architecture 63


Loop Buffer

 Very fast memory


 Maintained by fetch stage of pipeline
 Check buffer before fetching from memory
 Very good for small loops or jumps
 c.f. cache
 Used by CRAY-1

Chapter # 9 Computer Organization & Architecture 64


Loop Buffer Diagram

Chapter # 9 Computer Organization & Architecture 65


Branch Prediction

 Predict never taken


 Assume that jump will not happen
 Always fetch next instruction

 68020 & VAX 11/780

 VAX will not prefetch after branch if a page fault would


result (O/S v CPU design)
 Predict always taken
 Assume that jump will happen
 Always fetch target instruction

Chapter # 9 Computer Organization & Architecture 66


Branch Prediction

 Predict by Opcode
 Some instructions are more likely to result in a jump than
others
 Can get up to 75% success

 Taken/Not taken switch


 Based on previous history
 Good for loops
 Refined by two-level or correlation-based branch history

 Correlation-based
 In more complex structures, branch direction correlates
with that of related branches
 Use recent branch history as well
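The taken/not-taken switch based on previous history is often a 2-bit saturating counter; a minimal C sketch of that predictor, run on a made-up outcome sequence:

#include <stdio.h>

/* 2-bit saturating counter: 0,1 predict not taken; 2,3 predict taken.
   Two wrong predictions in a row are needed to flip the prediction,
   which suits loop branches. */
static int counter = 2;                  /* start weakly taken */

static int  predict(void)     { return counter >= 2; }
static void update(int taken) {
    if (taken  && counter < 3) counter++;
    if (!taken && counter > 0) counter--;
}

int main(void) {
    int outcomes[] = { 1, 1, 1, 0, 1, 1, 1, 0 };   /* a loop taken 3 times, then exits */
    int hits = 0, n = (int)(sizeof outcomes / sizeof outcomes[0]);

    for (int i = 0; i < n; i++) {
        int p = predict();
        hits += (p == outcomes[i]);
        update(outcomes[i]);
    }
    printf("correct predictions: %d of %d\n", hits, n);
    return 0;
}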

Chapter # 9 Computer Organization & Architecture 67


Branch Prediction

 Delayed Branch
 Do not take jump until you have to
 Rearrange instructions

Chapter # 9 Computer Organization & Architecture 68


Branch Prediction State Diagram

Chapter # 9 Computer Organization & Architecture 69


Branch Prediction Flowchart

Chapter # 9 Computer Organization & Architecture 70


Dealing With Branches

Chapter # 9 Computer Organization & Architecture 71


Intel 80486 Pipelining
 Fetch
 From cache or external memory
 Put in one of two 16-byte prefetch buffers
 Fill buffer with new data as soon as old data consumed
 Average 5 instructions fetched per load
 Independent of other stages to keep buffers full
 Decode stage 1
 Opcode & address-mode info
 At most first 3 bytes of instruction
 Can direct D2 stage to get rest of instruction
 Decode stage 2
 Expand opcode into control signals
 Computation of complex address modes
 Execute
 ALU operations, cache access, register update
 Writeback
 Update registers & flags
 Results sent to cache & bus interface write buffers

Chapter # 9 Computer Organization & Architecture 72


80486 Instruction Pipeline Examples

Chapter # 9 Computer Organization & Architecture 73


Q.1
 Given the following memory and register values:
 Word 700 contains 740
 Word 710 contains 750
 Word 720 contains 710
 Word 730 contains 740
 Word 740 contains 700
 Word 750 contains 700
 AX Register contains 720
 BX Register contains 740
 CX Register contains 710
 DX Register contains 750
 Base Register contains 200
 What would be the result in the following cases?
 ADD DX, [BX]
 SUB [CX], BX
 MOV DX, 30
 ADD [AX], [700]

Q.2
 Consider different systems with and without pipelining. Each system has to
execute 1400 instructions. Calculate the total execution time for 1400
instructions in each of the following cases:
 Single-cycle machine
 It takes 40ns for each cycle
 Multi-cycle machine (without pipelining)
 It takes 6ns for each cycle
 It has 7 stages
 It consumes 7 clocks per instruction on average
 Ideal pipelined machine
 It takes 6ns for each cycle
 It has 7 stages
Chapter # 9 Computer Organization & Architecture 74
SYSTEM INTERCONNECTION
CHAPTER # 10 Computer Organization & Architecture
Connecting

 All the units must be connected


 Different type of connection for different type of unit
 Memory
 Input/Output

 CPU

Chapter # 10 Computer Organization & Architecture 2


Types of Transfers

 Interconnection structure must support the


following types of transfers
 Memory to processor
 Processor reads an instruction or a unit of data from memory
 Processor to memory
 Processor writes a unit of data to memory
 I/O to processor
 Processor reads data from an I/O device via an I/O module
 Processor to I/O
 Processor sends data to the I/O device
 I/O to or from memory
 An I/O module is allowed to exchange data directly with memory
without going through the processor using direct memory access

Chapter # 10 Computer Organization & Architecture 3


Computer Modules

Chapter # 10 Computer Organization & Architecture 4


Memory Connection

 Receives and sends data


 Receives addresses (of locations)
 Receives control signals
 Read
 Write

 Timing

Chapter # 10 Computer Organization & Architecture 5


Input/Output Connection

 Similar to memory from computer’s viewpoint


 Output
 Receive data from computer
 Send data to peripheral

 Input
 Receive data from peripheral
 Send data to computer

Chapter # 10 Computer Organization & Architecture 6


Input/Output Connection

 Receive control signals from computer


 Send control signals to peripherals
 e.g. spin disk
 Receive addresses from computer
 e.g. port number to identify peripheral
 Send interrupt signals (control)

Chapter # 10 Computer Organization & Architecture 7


CPU Connection

 Reads instruction and data


 Writes out data (after processing)
 Sends control signals to other units
 Receives (& acts on) interrupts

Chapter # 10 Computer Organization & Architecture 8


What is a Bus?

 A communication pathway connecting two or more devices
 Usually broadcast
 Signals transmitted by any one device are available for
reception by all other devices attached to the bus
 Often grouped
 A number of channels in one bus
 e.g. 32 bit data bus is 32 separate single bit channels

 Power lines may not be shown

Chapter # 10 Computer Organization & Architecture 9


Buses

 There are a number of possible interconnection systems
 Single and multiple BUS structures are most
common
 e.g. Control/Address/Data bus (PC)
 e.g. Unibus (DEC-PDP)

Chapter # 10 Computer Organization & Architecture 10


Data Bus

 Carries data
 Remember that there is no difference between “data” and
“instruction” at this level
 The number of lines determines how many bits can
be transferred at a time
 May consist of 8, 16, 32, 64, 128, or more separate lines
 Width is a key determinant of performance

Chapter # 10 Computer Organization & Architecture 11


Address bus

 Identify the source or destination of data


 e.g. CPU needs to read an instruction or data from a given
location in memory
 Bus width determines maximum memory capacity of
system
 e.g. 8080 has a 16-bit address bus giving a 64K address space
 Also used to address I/O ports
 The higher order bits are used to select a particular
module on the bus and the lower order bits select a
memory location or I/O port within the module
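
To make the last point concrete, here is a small sketch. The 16-bit address width matches the 8080 example above, but the 4/12 split between module-select bits and offset bits is purely an assumption for illustration:

```python
# Hedged sketch of address decoding on a shared bus (the 4/12 split is assumed).
ADDRESS_BITS = 16          # e.g. the 8080's address bus: 2**16 = 65,536 locations
MODULE_BITS  = 4           # assumed number of higher-order bits selecting a module
OFFSET_BITS  = ADDRESS_BITS - MODULE_BITS

addr   = 0xA034                            # an example 16-bit address
module = addr >> OFFSET_BITS               # higher-order bits pick the module
offset = addr & ((1 << OFFSET_BITS) - 1)   # lower-order bits pick a word or port inside it

print(hex(module), hex(offset))            # 0xa 0x34
```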

Chapter # 10 Computer Organization & Architecture 12


Control Bus

 Used to control the access and the use of the data and address lines
 Carries control and timing information
 Timing signals indicate the validity of data and address
information
 Command signals specify operations to be performed

 Generally used for


 Memory read/write signal
 Interrupt request

 Clock signals

Chapter # 10 Computer Organization & Architecture 13


Bus Interconnection Scheme

Chapter # 10 Computer Organization & Architecture 14


What do Buses Look Like?

 Has many shapes


 Parallel lines on circuit boards
 Ribbon cables

 Strip connectors on motherboards


 e.g. PCI
 Sets of wires

Chapter # 10 Computer Organization & Architecture 15


Single Bus Problems

 Lots of devices on one bus leads to:


 Propagation delays
 Long data paths mean that co-ordination of bus use can adversely
affect performance
 The bus may become a bottleneck if aggregate data transfer approaches bus capacity
 Most systems use multiple buses to overcome these
problems

Chapter # 10 Computer Organization & Architecture 16


Traditional Architecture

Chapter # 10 Computer Organization & Architecture 17


High Performance Bus

Chapter # 10 Computer Organization & Architecture 18


Elements of Bus Design

 Bus Type
 Dedicated
 Multiplexed
 Method of Arbitration
 Centralized
 Distributed
 Timing
 Synchronous
 Asynchronous
 Bus Width
 Address
 Data
 Data Transfer Type
 Read
 Write
 Read-modify-write
 Read-after-write
 Block

Chapter # 10 Computer Organization & Architecture 19


Bus Types

 Dedicated
 Separate data & address lines
 Multiplexed
 Shared lines
 Address valid or data valid control line

 Advantage
 Fewer lines
 Disadvantage
 More complex control

Chapter # 10 Computer Organization & Architecture 20


Bus Arbitration

 More than one module controlling the bus


 e.g. CPU and DMA controller
 Only one module may control bus at one time
 Arbitration may be centralised or distributed

Chapter # 10 Computer Organization & Architecture 21


Centralised or Distributed Arbitration

 Centralised
 Single hardware device controlling bus access
 Bus Controller
 Arbiter
 May be part of CPU or separate
 Distributed
 Each module may claim the bus
 Control logic on all modules
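
A minimal sketch of the centralised case, assuming a simple fixed-priority arbiter; the priority scheme and names are illustrative only (real bus controllers also use daisy chaining, polling, or rotating priority):

```python
# Hedged sketch of a centralised fixed-priority bus arbiter.
def grant(requests):
    """requests[i] is True if module i wants the bus; lower index = higher priority.
    Returns the index of the module granted the bus, or None if nobody asked."""
    for module, wants_bus in enumerate(requests):
        if wants_bus:
            return module
    return None

print(grant([False, True, True]))   # -> 1: module 0 is idle, so module 1 wins over module 2
```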

Chapter # 10 Computer Organization & Architecture 22


Timing of Synchronous Bus Operations

Chapter # 10 Computer Organization & Architecture 23


Timing of Asynchronous Bus Operations

Chapter # 10 Computer Organization & Architecture 24


Point-to-Point Interconnect

 In a conventional bus
 Over time, electrical constraints were encountered as the frequency of wide synchronous buses increased
 At higher and higher data rates it becomes increasingly difficult to perform the synchronization and arbitration functions in a timely fashion
 A shared bus on the same chip magnified the difficulties of increasing bus data rate and reducing bus latency to keep up with the processors
 All of this motivated a move away from the shared bus

 Point-to-Point Interconnect was introduced

 It has lower latency, higher data rate, and better scalability
Chapter # 10 Computer Organization & Architecture 25
Quick Path Interconnect

 QPI is a point-to-point interconnect


 Introduced in 2008
 Multiple direct connections
 Direct pairwise connections to other components eliminating
the need for arbitration found in shared transmission systems
 Layered protocol architecture
 These processor level interconnects use a layered protocol
architecture rather than the simple use of control signals found
in shared bus arrangements
 Packetized data transfer
 Data are sent as a sequence of packets each of which includes
control headers and error control codes

Chapter # 10 Computer Organization & Architecture 26


Multicore Configuration using QPI

Chapter # 10 Computer Organization & Architecture 27


QPI Layers
 Physical
 Consists of the actual wires carrying the signals
 The unit of transfer at the physical layer is 20 bits, which is called a Phit (physical unit)
 Link
 Responsible for reliable transmission and flow control
 The Link layer’s unit of transfer is an 80-bit Flit (flow control unit); see the check after this list
 Routing
 Provides the framework for directing packets through the fabric
 Protocol
 The high-level set of rules for exchanging packets of data between devices. A packet is composed of an integral number of Flits
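
Putting the unit sizes above together, a trivial check assuming the 20-bit phit and 80-bit flit quoted on this slide:

```python
PHIT_BITS = 20                    # physical-layer unit of transfer
FLIT_BITS = 80                    # link-layer unit of transfer
print(FLIT_BITS // PHIT_BITS)     # 4: each flit is carried as four phits
```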

Chapter # 10 Computer Organization & Architecture 28


Physical Interface of the Intel QPI Interconnect

Chapter # 10 Computer Organization & Architecture 29


Physical Interface of the Intel QPI Interconnect

 The QPI port consists of 84 individual links


 Each data path consists of a pair of wires referred to as a lane
 It transmits data one bit at a time
 There are 20 data lanes in each direction
 The 20-bit unit is referred to as a phit
 QPI can transmit 20 bits in parallel in each direction
 Typical signaling speed is 6.4 GT/s
 At 20 bits per transfer, that adds up to 16 GB/s
 Being bidirectional, the total capacity is 32 GB/s (see the check after this list)
 The lanes are grouped into four quadrants of 5 lanes each
 In some applications, the link can also operate at half or quarter widths
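
A back-of-the-envelope check of the figures above, using the quoted 6.4 GT/s and 20 bits per transfer (decimal gigabytes assumed):

```python
transfers_per_second = 6.4e9               # 6.4 GT/s signaling speed
bits_per_transfer    = 20                  # one 20-bit phit per transfer

per_direction_GBps = transfers_per_second * bits_per_transfer / 8 / 1e9
print(per_direction_GBps, 2 * per_direction_GBps)   # 16.0 GB/s each way, 32.0 GB/s total
```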

Chapter # 10 Computer Organization & Architecture 30


Peripheral Component Interconnect (PCI)
 A popular high bandwidth, processor independent bus
that can function as a mezzanine or peripheral bus
 Delivers better system performance for high speed I/O
subsystems
 PCI Special Interest Group (SIG)
 Created to develop further and maintain the compatibility of the
PCI specifications
 PCI Express (PCIe)
 Point-to-point interconnect scheme intended to replace bus-based schemes such as PCI
 Key requirement is high capacity to support the needs of higher
data rate I/O devices, such as Gigabit Ethernet
 Another requirement deals with the need to support time
dependent data streams

Chapter # 10 Computer Organization & Architecture 31


PCIe Configuration

Chapter # 10 Computer Organization & Architecture 32


PCIe Protocol Layers
 Physical
 Consists of the actual wires carrying the signals
 Data link
 Responsible for reliable transmission and flow control
 Data packets generated/consumed are called Data Link Layer Packets (DLLPs)
 Transaction
 Generates and consumes data packets used to implement load/store data transfer mechanisms
 Also manages the flow control of those packets between the two components on a link
 Data packets generated and consumed by the TL are called Transaction Layer Packets (TLPs)
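
As an illustration of the layering idea only (this is not the real PCIe packet format), a sketch in which a hypothetical transaction-layer packet is wrapped by the data-link layer with a sequence number and a CRC before being handed to the physical layer:

```python
import zlib

def transaction_layer(payload: bytes) -> bytes:
    # Stand-in TLP: an invented header plus the payload (format is assumed).
    return b"TLP" + payload

def data_link_layer(tlp: bytes, seq: int) -> bytes:
    # Data-link-layer framing: sequence number in front, CRC behind (illustrative only).
    crc = zlib.crc32(tlp).to_bytes(4, "big")
    return seq.to_bytes(2, "big") + tlp + crc

frame = data_link_layer(transaction_layer(b"read 0x1000"), seq=7)
print(frame.hex())
```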
Chapter # 10 Computer Organization & Architecture 33
