Human readable
Screen, printer, keyboard
Machine readable
Monitoring and control
Communication
Modem
Network Interface Card (NIC)
In different formats
Programmed I/O
Interrupt driven I/O
Direct Memory Access (DMA)
Transferring data
I/O Channel
The I/O module is enhanced to become a processor in its own right,
with a specialized instruction set tailored for I/O
The CPU directs the I/O processor to execute an I/O program in
memory
The I/O processor fetches and executes these instructions without CPU
intervention
I/O Processor
The I/O module has a local memory of its own
A large set of I/O devices can be controlled, with minimal CPU
involvement
Parallel interface
multiple lines connecting the
peripheral
multiple bits are transferred
simultaneously
Serial interface
Only one line used to transmit
data
bits must be transmitted one at a
time
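The difference between the two interfaces can be sketched in a few lines of Python. This is an illustration only: the function names and the choice to send the least significant bit first are assumptions, not something the slides specify.

```python
def serialize(byte):
    """Split one byte into the 8 bits a serial line would carry, one per step.

    Bit order (least significant bit first) is an assumption for this sketch.
    """
    return [(byte >> i) & 1 for i in range(8)]


def deserialize(bits):
    """Rebuild the byte at the receiving end from bits that arrived LSB first."""
    value = 0
    for position, bit in enumerate(bits):
        value |= bit << position
    return value


bits = serialize(0xA7)
print(len(bits))               # 8 steps on a serial line, versus 1 step on 8 parallel lines
print(hex(deserialize(bits)))  # 0xa7
```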
Data processing
Data storage (main memory)
Data movement (I/O)
Program flow control
3 addresses
Operand 1, Operand 2, Result
a = b + c;
ADD a, b, c
May be a fourth: next instruction (usually implicit)
Not common
2 addresses
One address doubles as operand and result
a=a+b
ADD a, b
Reduces length of instruction
Requires some extra work
Temporary storage to hold some results
1 address
Implicit second address
ADD a
Usually a register (accumulator)
Common on early machines
0 (zero) addresses
All addresses implicit
Uses a stack
e.g. c = a + b
push a
push b
Add
pop c
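The zero-address sequence above can be traced with a small stack machine. This is a minimal sketch: the tuple encoding of instructions and the dictionary used as memory are hypothetical choices made for the example.

```python
def run_stack_program(program, memory):
    """Run (operation, operand) pairs on a tiny zero-address machine."""
    stack = []
    for op, name in program:
        if op == "push":
            stack.append(memory[name])   # copy a memory value onto the stack
        elif op == "add":
            right = stack.pop()          # both operands are implicit:
            left = stack.pop()           # they are the top two stack entries
            stack.append(left + right)
        elif op == "pop":
            memory[name] = stack.pop()   # store the top of the stack to memory
        else:
            raise ValueError(f"unknown operation: {op}")
    return memory


memory = {"a": 2, "b": 3, "c": 0}
program = [("push", "a"), ("push", "b"), ("add", None), ("pop", "c")]
run_stack_program(program, memory)
print(memory["c"])  # 5
```

Note that only push and pop name a memory location; the add itself carries no address at all.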
More addresses
More complex (powerful?) instructions
More registers
Inter-register operations are quicker
Fewer instructions per program
Fewer addresses
Less complex (powerful?) instructions
More instructions per program
Operation range
How many ops?
What can they do?
How complex are they?
Data types
The various types of data upon which operations are
performed
Instruction formats
Instruction length in bits
Length of op code field
Number of addresses
Size of various fields, etc.
Registers
Number of CPU registers available
Which operations can be performed on which registers?
Addressing modes
RISC v CISC
Addresses
Numbers
Integer/floating point
Characters
ASCII etc.
Logical Data
Bits or flags
8 bit Byte
16 bit word
32 bit double word
64 bit quad word
128 bit double quadword
Addressing is by 8 bit unit
Words do not need to be aligned at even-numbered
addresses
Data is accessed across the 32-bit bus in units of double
words read at addresses divisible by 4
Little endian
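Little-endian storage means the least significant byte sits at the lowest address. A short sketch using Python's standard struct module shows the byte order; the value 0x12345678 is just an example.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # "<" = little endian, "I" = 32-bit unsigned
big = struct.pack(">I", value)     # ">" = big endian, shown for comparison

print(little.hex())  # 78563412
print(big.hex())     # 12345678

# Reading the little-endian bytes back recovers the original double word.
assert int.from_bytes(little, "little") == value
```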
Chapter # 7 Computer Organization & Architecture 18
x86 Numeric Data Formats
Data Transfer
Arithmetic
Logical
Conversion
I/O
System Control
Transfer of Control
Specify
Source
Destination
Amount of data
Negate (-a)
Bitwise operations
AND
OR
NOT
Test
Compare
Start I/O
Privileged instructions
CPU needs to be in specific state
Ring 0 on 80386+
Kernel mode
Branch
e.g. branch to x if result is zero
Skip
e.g. increment and skip if zero
ISZ Register1
Branch xxxx
ADD A
Subroutine call
cf. interrupt call
Immediate
Direct
Indirect
Register
Register Indirect
Displacement (Indexed)
Stack
Instruction
Opcode Operand
Advantages:
Single memory reference to access data
No additional calculations to work out effective address
Disadvantages:
Limited address space
Instruction
Opcode Address A
Memory
Operand
Advantage:
Large address space available
2^n where n = word length
Disadvantage:
Instruction execution requires two memory references to
fetch the operand
One to get its address and a second to get its value
Hence slower
Memory
Pointer to operand
Operand
Instruction
Opcode Register Address R
Registers
Operand
EA = (R)
Operand is in memory cell pointed to by contents of
register R
Large address space (2^n)
One fewer memory access than indirect addressing
Instruction
Memory
Registers
Instruction
Opcode Register R Address A
Memory
Registers
A holds displacement
R holds pointer to base address
R may be explicit or implicit
e.g. segment registers in 80x86
Exploits the locality of memory references
Convenient means of implementing segmentation
In some implementations a single segment base register is
employed and is used implicitly
In others the programmer may choose a register to hold
the base address of a segment and the instruction must
reference it explicitly
Postindexing
Indexing is performed after the indirection
EA = (A) + (R)
Preindexing
Indexing is performed before the indirection
EA = (A + (R))
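The effective-address (EA) rules for the memory-referencing modes can be compared side by side. The sketch below assumes a dictionary for memory and for the register file; the mode names and the sample contents are made up for illustration. Immediate and register modes are left out because they do not produce a memory address.

```python
def effective_address(mode, memory, registers, A=None, R=None):
    """Return the effective address for one addressing mode.

    A is the address field of the instruction, R names a register.
    """
    if mode == "direct":
        return A                          # EA = A
    if mode == "indirect":
        return memory[A]                  # EA = (A)
    if mode == "register_indirect":
        return registers[R]               # EA = (R)
    if mode == "displacement":
        return A + registers[R]           # EA = A + (R)
    if mode == "postindex":
        return memory[A] + registers[R]   # EA = (A) + (R)
    if mode == "preindex":
        return memory[A + registers[R]]   # EA = (A + (R))
    raise ValueError(f"unsupported mode: {mode}")


memory = {100: 500, 104: 700}
registers = {"R1": 4}

print(effective_address("indirect", memory, registers, A=100))           # 500
print(effective_address("postindex", memory, registers, A=100, R="R1"))  # 504
print(effective_address("preindex", memory, registers, A=100, R="R1"))   # 700
```

Postindexing and preindexing use the same two inputs but differ in whether the register is added after or before the memory lookup, which is why they give different addresses here.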
Bus structure
CPU complexity
CPU speed
Address modification
Base, displacement
Base, index, displacement
Instruction prefixes
Instruction prefix
The instruction prefix, if present, consists of the LOCK prefix or one
of the repeat prefixes
The LOCK prefix is used to ensure exclusive use of shared memory in
multiprocessor environments
The repeat prefixes specify repeated operation of a string, which
enables the x86 to process strings much faster than with a regular
software loop
There are five different repeat prefixes: REP, REPE, REPZ, REPNE, and
REPNZ
Segment override
Explicitly specifies which segment register an instruction should use,
overriding the default segment-register selection generated by the
x86 for that instruction
Operand size
An instruction has a default operand size of 16 or 32 bits, and the
operand prefix switches between 32-bit and 16-bit operands
Address size
The processor can address memory using either 16- or 32-bit
addresses
The address size determines the displacement size in instructions
and the size of address offsets generated during effective address
calculation
Opcode
The opcode field is 1, 2, or 3 bytes in length
The opcode may also include bits that specify if data is
byte- or full-size (16 or 32 bits depending on context),
direction of data operation (to or from memory), and
whether an immediate data field must be sign extended
ModR/M
This byte provides addressing information
The ModR/M byte specifies whether an operand is in a
register or in memory
If it is in memory, then fields within the byte specify the
addressing mode to be used
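A ModR/M byte can be split into its fields with shifts and masks. The text above does not give the field widths; the sketch uses the standard x86 layout (2-bit mod, 3-bit reg, 3-bit r/m), and the function name and dictionary keys are hypothetical.

```python
def decode_modrm(byte):
    """Split a ModR/M byte into its mod, reg and r/m fields."""
    mod = (byte >> 6) & 0b11    # bits 7-6: register operand or memory mode
    reg = (byte >> 3) & 0b111   # bits 5-3: register number or opcode extension
    rm = byte & 0b111           # bits 2-0: register or memory operand selector
    return {
        "mod": mod,
        "reg": reg,
        "rm": rm,
        "register_operand": mod == 0b11,  # mod = 11 means r/m names a register
    }


print(decode_modrm(0xC3))  # {'mod': 3, 'reg': 0, 'rm': 3, 'register_operand': True}
```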
Chapter # 8 Computer Organization & Architecture 43
x86 Instruction Format
SIB
Certain encodings of the ModR/M byte require a SIB byte
to fully specify the addressing mode
The SIB byte consists of three fields
Scale field (2 bits) specifies the scale factor for scaled indexing
Index field (3 bits) specifies the index register
Base field (3 bits) specifies the base register
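The three SIB fields can be extracted the same way. The field widths come from the list above; placing the scale field in the top two bits and interpreting it as a factor of 1, 2, 4 or 8 follows the standard x86 encoding. The function name is hypothetical.

```python
def decode_sib(byte):
    """Split a SIB byte into scale factor, index register and base register."""
    scale_bits = (byte >> 6) & 0b11   # 2-bit scale field
    index = (byte >> 3) & 0b111       # 3-bit index register number
    base = byte & 0b111               # 3-bit base register number
    return {"scale": 1 << scale_bits, "index": index, "base": base}


print(decode_sib(0x98))  # {'scale': 4, 'index': 3, 'base': 0}
```

The decoded fields feed the scaled-index address calculation: base register + index register * scale, plus any displacement.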
Displacement
When the addressing-mode specifier indicates that a
displacement is used, an 8-, 16-, or 32-bit signed integer
displacement field is added
Immediate
Provides the value of an 8-, 16-, or 32-bit operand
ARM Instruction Formats
CPU must
Fetch instruction
The processor reads an instruction from memory (register, cache, main
memory)
Interpret instruction
The instruction is decoded to determine what action is required
Fetch data
The execution of an instruction may require reading data from memory
or an I/O module
Process data
The execution of an instruction may require performing some
arithmetic or logical operation on data
Write data
The results of an execution may require writing data to memory or an
I/O module
User-visible registers
Enable the machine or assembly language programmer to
minimize main memory references by optimizing use of
registers
Control and status registers
Used by the control unit to control the operation of the
processor and by privileged, operating system programs to
control the execution of programs
General Purpose
Data
Address
Condition Codes
Between 8 - 32
Fewer = more memory references
More does not reduce memory references and takes
up processor real estate
Page table
R8, R9, R10, R11, R12, R13, R14, R15 are the new
registers and have no other names
R0D – R15D are the lowermost 32 bits of each
register
For example, R0D is EAX
R0W – R15W are the lowermost 16 bits of each
register
For example, R0W is AX
R0L – R15L are the lowermost 8 bits of each register
For example, R0L is AL
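The D, W and L names are views onto the low bits of the same 64-bit register, which can be modelled with bit masks. The register value below is made up, and the function is only an illustration of the masking.

```python
def sub_registers(value64):
    """Return the low 32, 16 and 8 bits of a 64-bit register value."""
    return {
        "D": value64 & 0xFFFFFFFF,  # lowermost 32 bits (e.g. R0D, i.e. EAX)
        "W": value64 & 0xFFFF,      # lowermost 16 bits (e.g. R0W, i.e. AX)
        "L": value64 & 0xFF,        # lowermost 8 bits  (e.g. R0L, i.e. AL)
    }


parts = sub_registers(0x1122334455667788)
print(hex(parts["D"]), hex(parts["W"]), hex(parts["L"]))  # 0x55667788 0x7788 0x88
```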
Meanwhile PC incremented by 1
IR is examined
If indirect addressing, indirect cycle is performed
Rightmost N bits of MBR transferred to MAR
Control unit requests memory read
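The fetch and indirect cycles described above can be followed register by register. This is a sketch under stated assumptions: the 16-bit instruction word, the 12-bit address field (N = 12) and the position of the indirect flag are all invented for the example.

```python
ADDRESS_BITS = 12                       # N, assumed for this sketch
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1
INDIRECT_FLAG = 1 << 15                 # assumed position of the indirect bit


def fetch_and_indirect(memory, pc):
    """Run one fetch cycle, then an indirect cycle if the instruction needs it."""
    regs = {"PC": pc}
    regs["MAR"] = regs["PC"]            # address of the next instruction
    regs["MBR"] = memory[regs["MAR"]]   # memory read
    regs["PC"] += 1                     # meanwhile PC is incremented by 1
    regs["IR"] = regs["MBR"]            # instruction moves to IR and is examined
    if regs["IR"] & INDIRECT_FLAG:
        regs["MAR"] = regs["MBR"] & ADDRESS_MASK  # rightmost N bits of MBR to MAR
        regs["MBR"] = memory[regs["MAR"]]         # memory read fetches the operand address
    return regs


memory = {0: INDIRECT_FLAG | 0x020, 0x020: 0x1F4}
regs = fetch_and_indirect(memory, pc=0)
print(hex(regs["MBR"]))  # 0x1f4
```

After the indirect cycle, MBR holds the direct address of the operand, so the execute cycle can proceed as if direct addressing had been used.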
Register transfers
ALU operations
[Figure: tasks A, B, C, D in task order against time, 6 PM to 2 AM, 16 steps of 30 minutes]
[Figure: tasks A, B, C, D in task order against time, 6 PM to 2 AM, 7 steps of 30 minutes]
Fetch instruction
Decode instruction
Calculate operand addresses
Fetch operands
Execute instruction
Write result
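The benefit of overlapping the six stages can be estimated with the usual idealized timing model. The sketch assumes every stage takes exactly one clock cycle and that there are no stalls or branches; the function names are hypothetical.

```python
def unpipelined_cycles(n_instructions, k_stages):
    """Each instruction runs all k stages before the next one starts."""
    return n_instructions * k_stages


def pipelined_cycles(n_instructions, k_stages):
    """The first instruction takes k cycles; each later one completes one cycle after."""
    return k_stages + (n_instructions - 1)


def speedup(n_instructions, k_stages):
    return unpipelined_cycles(n_instructions, k_stages) / pipelined_cycles(n_instructions, k_stages)


print(unpipelined_cycles(9, 6))  # 54
print(pipelined_cycles(9, 6))    # 14
```

As the number of instructions grows, the speedup approaches the number of stages, which is the ideal case that hazards then erode.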
Control
ADD instruction does not update EAX until the end of stage 5, at clock cycle 5
SUB instruction needs that value at the beginning of its stage 2, at clock cycle 4
Pipeline must stall for two clock cycles
Without special hardware and specific avoidance algorithms, such a data
hazard results in inefficient pipeline usage
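The two-cycle stall follows directly from the cycle numbers above. A minimal sketch of that arithmetic, assuming a value written by the end of one cycle can first be read at the beginning of the next (the function name is hypothetical):

```python
def stall_cycles(write_done_cycle, read_needed_cycle):
    """Cycles the consumer must wait for a value produced by an earlier instruction.

    The value is available from the start of the cycle after write_done_cycle.
    """
    earliest_read_cycle = write_done_cycle + 1
    return max(0, earliest_read_cycle - read_needed_cycle)


# ADD writes EAX by the end of cycle 5; SUB wants it at the start of cycle 4.
print(stall_cycles(5, 4))  # 2
```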
Chapter # 9 Computer Organization & Architecture 57
Data Hazard Diagram
Multiple Streams
Prefetch Branch Target
Loop buffer
Branch prediction
Delayed branching
Predict by Opcode
Some instructions are more likely to result in a jump than
others
Can get up to 75% success
Correlation-based
In more complex structures, branch direction correlates
with that of related branches
Use recent branch history as well
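One simple way to use recent branch history is a two-bit saturating counter per branch. The sketch below shows that scheme only; it is not the correlation-based predictor mentioned above, and the class name, starting state and sample branch trace are assumptions.

```python
class TwoBitPredictor:
    """States 0 and 1 predict not taken; states 2 and 3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes, predictor):
    """Fraction of branches predicted correctly over a trace of outcomes."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch taken nine times, then falling through once.
trace = [True] * 9 + [False]
result = accuracy(trace, TwoBitPredictor())
```

Because it takes two wrong guesses in a row to flip the prediction, a single loop exit does not disturb the prediction for the next run of the loop.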
Delayed Branch
Do not take jump until you have to
Rearrange instructions
CPU
Timing
Input
Receive data from peripheral
Send data to computer
Carries data
Remember that there is no difference between “data” and
“instruction” at this level
The number of lines determines how many bits can
be transferred at a time
May consist of 8, 16, 32, 64, 128, or more separate lines
Width is a key determinant of performance
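The effect of bus width on performance can be shown by counting transfers. The sketch assumes one transfer moves exactly one bus width of data and ignores addressing and arbitration overhead; the function name is hypothetical.

```python
def bus_transfers(total_bits, bus_width):
    """Number of bus transfers needed to move total_bits over bus_width data lines."""
    if bus_width <= 0:
        raise ValueError("bus width must be positive")
    return -(-total_bits // bus_width)  # ceiling division


for width in (8, 16, 32, 64):
    print(width, bus_transfers(64, width))
```

Moving a 64-bit item takes eight transfers on an 8-line data bus but only one on a 64-line bus.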
Clock signals
Timing
Synchronous
Asynchronous
Data transfer type
Read-modify-write
Read-after-write
Block
Dedicated
Separate data & address lines
Multiplexed
Shared lines
Address valid or data valid control line
Advantage
fewer lines
Disadvantages
More complex control
Centralised
Single hardware device controlling bus access
Bus Controller
Arbiter
May be part of CPU or separate
Distributed
Each module may claim the bus
Control logic on all modules
In conventional buses
Over time, electrical constraints were encountered when
increasing the frequency of wide synchronous buses
At higher and higher data rates it becomes increasingly
difficult to perform the synchronization and arbitration
functions in a timely fashion
Using a shared bus on the same chip magnified the difficulties
of increasing bus data rate and reducing bus latency to keep
up with the processors
All of this became the reason for a change in bus