
Computer Architecture and Organization

Lecture 4
CPU Structure and Function

1
Lecture Objectives:
After this lecture you will understand:
Processor organization
Register Roles
Register design issues
Instruction cycle with indirect addressing
Processor organization

 CPU must:
Fetch instructions
Interpret instructions
Fetch data
Process data
Write data

3
Processor Organization
To do these things, it needs a small internal memory: Registers
To perform computation: ALU
To control the movement of data and the operation of the ALU: Control Unit
For data transfer and control logic: Internal bus

4
Processor Organization

 The CPU with the system bus

5
Processor Organization
 Internal structure of the CPU

6
Register Organization
CPU must have some working space (temporary storage) called registers
 Number and function vary between processor designs
 One of the major design decisions
 Top level of memory hierarchy
 CPU registers perform two roles:
User-visible registers: enable the programmer to minimize main memory references by optimizing their use
Control and status registers: used by control unit to
control operation of the processor

7
Register Organization

User-visible registers: referenced by the machine language that the processor executes.
 Categories:
General purpose
Data
Address
Condition codes

8
Register Organization
General Purpose Registers:
Can be assigned to a variety of functions by the programmer
May be true general purpose: sometimes any general-purpose register can contain the operand for any opcode
May be restricted
May be used for data or addressing:
In some cases, general-purpose registers can be used for addressing functions (e.g., register indirect, displacement)
In other cases, there is a partial or clean separation between data registers and address registers
Register Organization

Data registers: used only to hold data and cannot be employed in the calculation of an operand address.
E.g. Accumulator
Address registers: may be devoted to a particular addressing mode
Examples:
Segment registers: hold the base address of a segment in segmented memory
Index registers: used for indexed addressing
Stack pointer: points to the top of the stack
This allows implicit addressing; that is, push, pop, and other stack instructions need not contain an explicit stack address (see the sketch below)

10
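To illustrate the point about implicit addressing, here is a minimal sketch (not tied to any particular ISA) of a stack pointer driving push and pop, assuming a downward-growing stack in a small word-addressable memory:

```python
# Hypothetical illustration: a stack pointer (SP) lets PUSH/POP omit an explicit address.
MEM_SIZE = 64
memory = [0] * MEM_SIZE          # word-addressable memory
SP = MEM_SIZE                    # stack grows downward from the top of memory

def push(value):
    """PUSH needs no address operand: SP implicitly names the location."""
    global SP
    SP -= 1                      # reserve the next free slot
    memory[SP] = value

def pop():
    """POP likewise reads the word SP points to, then releases it."""
    global SP
    value = memory[SP]
    SP += 1
    return value

push(10)
push(20)
print(pop(), pop())              # -> 20 10 (last in, first out)
```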
Register Organization

 Condition codes:
Hold condition codes (also referred to as flags)
Bits set by processor hardware as the result of operations
E.g. result of last operation was zero
Can be read (implicitly) by programs
e.g. Jump if zero
Cannot (usually) be set by programs

11
Register Organization

Design issues:
General purpose or specialized use:
Make them general purpose
Increase flexibility and programmer options
Increase instruction size & complexity
Make them specialized
Smaller (faster) instructions
Less flexibility

12
Register Organization

Number of registers:
Between 8 and 32 is optimum
Fewer = more memory references
More does not reduce memory references and takes up processor real estate
Register length:
Large enough to hold a full address
Large enough to hold a full word
Often possible to combine two data registers to hold a double-length value

13
Register Organization

Control and Status Registers:
Used to control the operation of the CPU
Most are not visible to the user
Four registers are essential to instruction execution:
Program Counter (PC): contains the address of the next instruction to be fetched
Instruction Register (IR): contains the instruction most recently fetched
Memory Address Register (MAR): contains the address of a location in memory
Memory Buffer Register (MBR): contains a word of data to be written to memory or the word most recently read
14
Register Organization

Program Status Word (PSW):
Contains condition codes plus other status information
Common fields or flags (a sketch of how they are set follows this slide):
Sign: contains the sign bit of the result of the last arithmetic operation
Zero: set when the result is 0
Carry: set if an operation resulted in a carry
Equal: set if a logical compare result is equality
Overflow: used to indicate arithmetic overflow
Interrupt Enable/Disable: used to enable or disable interrupts
Supervisor: indicates whether the processor is executing in supervisor or user mode
15
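To make the flag-setting concrete, here is a minimal sketch of an 8-bit add that updates Sign, Zero, Carry, and Overflow the way a simple ALU might; the 8-bit width and the flag encoding are assumptions for illustration, not a particular processor's PSW layout:

```python
def add8_with_flags(a, b):
    """Add two 8-bit values and return (result, flags) the way a simple ALU
    might update a PSW: Sign, Zero, Carry and Overflow bits."""
    result = (a + b) & 0xFF                      # keep the low 8 bits
    flags = {
        "S": (result >> 7) & 1,                  # sign bit of the result
        "Z": int(result == 0),                   # result was zero
        "C": int(a + b > 0xFF),                  # carry out of bit 7
        # overflow: operands have the same sign but the result's sign differs
        "V": int(((a ^ result) & (b ^ result) & 0x80) != 0),
    }
    return result, flags

print(add8_with_flags(0x7F, 0x01))   # 0x80: sign set, overflow set
print(add8_with_flags(0xFF, 0x01))   # 0x00: zero set, carry set
```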
Example Microprocessor Register Organization
 Two 16-bit microprocessors

16
Instruction Cycle
Recall that an instruction cycle includes the following stages:
 Fetch: Read the next instruction from memory into
the processor.
 Execute: Interpret the opcode and perform the
indicated operation.
 Interrupt: If interrupts are enabled and an interrupt
has occurred, save the current process state and
service the interrupt.

17
18
The Indirect Cycle
The execution of an instruction may involve one or more operands in memory, each of which requires a memory access.
Further, if indirect addressing is used, additional memory accesses are required.

19
Data Flow (Instruction Fetch)
 The exact sequence of events during an instruction cycle
depends on the design of the processor.
 In general:
 Fetch
PC contains address of next instruction
 Address moved to MAR
 Address placed on address bus
 Control unit requests memory read
 Result placed on data bus, copied to MBR, then to IR
 Meanwhile PC incremented by 1

20
Data Flow (Instruction Fetch)

21
Data Flow (Indirect Fetch)

The IR is examined
If indirect addressing is used, an indirect cycle is performed:
The rightmost N bits of the MBR (the address field) are transferred to the MAR
The control unit requests a memory read
The result (the address of the operand) is moved to the MBR (see the sketch below)

22
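A minimal register-transfer sketch of the fetch and indirect cycles just described; the memory contents, the one-word instruction size, and the 12-bit address field are illustrative assumptions, not a real instruction format:

```python
# Registers (all illustrative): PC, MAR, MBR, IR
memory = {100: 0x1F0C8, 200: 300}     # 100: an instruction word, 200: pointer to operand at 300
regs = {"PC": 100, "MAR": 0, "MBR": 0, "IR": 0}

def fetch():
    """Instruction fetch: MAR <- PC; memory read; MBR -> IR; PC incremented."""
    regs["MAR"] = regs["PC"]          # address of next instruction moved to MAR
    regs["MBR"] = memory[regs["MAR"]] # control unit requests a memory read
    regs["IR"] = regs["MBR"]          # instruction copied into IR
    regs["PC"] += 1                   # assume a one-word instruction

def indirect():
    """Indirect cycle: the address field of the fetched word names a pointer."""
    regs["MAR"] = regs["MBR"] & 0xFFF # rightmost N bits (here N = 12) to MAR
    regs["MBR"] = memory[regs["MAR"]] # read: MBR now holds the operand's address

fetch()
indirect()
print(regs["MBR"])                    # -> 300, the direct address of the operand
```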
Data Flow (Indirect Fetch)

23
Data Flow (Execute)
 May take many forms
 Depends on instruction being executed
 May include
 Memory read/write
 Input/output
 Register transfers
 ALU operations

24
Data Flow (Interrupt)

Simple and predictable
The current PC is saved so that the processor can resume after the interrupt:
Contents of the PC are copied to the MBR
A special memory location (e.g., the stack location pointed to by the stack pointer) is loaded into the MAR
The MBR is written to memory
The PC is loaded with the address of the interrupt-handling routine
The next instruction (the first of the interrupt handler) can then be fetched (see the sketch below)

25
Data Flow (Interrupt Diagram)

26
Questions?
Next:
Pipelining
27
Computer Architecture and Organization

Lecture 5
Instruction Pipelining

28
Lecture Objectives:

After this lecture you will understand:


Pipelining
Pipeline Performance
Pipeline Hazards
Pipelining Strategy

In pipelining, new inputs are accepted at one end before previously accepted inputs appear as outputs at the other end.
Instruction processing has two stages:
1. Fetch instruction
2. Execute instruction
Two-stage pipelining: the next instruction is fetched in parallel with the execution of the current one (prefetch).
In general, pipelining requires registers to store data between stages.

30
Instruction Pipelining
Consider subdividing instruction processing into two stages: fetch and execute
When main memory is not being accessed, fetch the next instruction in parallel with the execution of the current one
While the second stage is executing, the first stage fetches and buffers the next instruction
This is called instruction prefetch or fetch overlap
This speeds up instruction execution

31
Instruction Pipelining
 Two stage instruction pipeline

32
Instruction Pipelining
Improved performance
But not doubled: why?
The execution time will generally be longer than the fetch time, so the fetch stage may have to wait
A conditional branch instruction makes the address of the next instruction to be fetched unknown. Thus, the fetch stage must wait until it receives the next instruction address from the execute stage.
For greater speedup, the pipeline must have more stages.
The greater the number of stages in a pipeline, the faster the execution rate.
33
Instruction Pipelining
 Decomposition of instruction processing:
Fetch instruction (FI): read the next expected instruction into a buffer
Decode instruction (DI): determine the opcode and the operand specifiers
Calculate operands (CO): calculate the effective address of each source operand
Fetch operands (FO): fetch each operand from memory
Execute instruction (EI): perform the indicated operation and store the result
Write operand (WO): store the result in memory

34
Six Stage Instruction Pipeline

35
Six Stage Instruction Pipeline

A six-stage pipeline can reduce the execution time for 9 instructions from 54 time units to 14 time units.
The diagram assumes that:
Each instruction goes through all six stages of the pipeline.
All of the stages can be performed in parallel.
There are no memory conflicts, even though the FI, FO, and WO stages each involve a memory access (the diagram assumes these accesses can occur simultaneously).
A sketch that reproduces this timing follows.

36
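As referenced above, the 14-time-unit figure can be reproduced with a small sketch that prints an ideal space-time diagram (every instruction passes through all six stages, no conflicts, no branches):

```python
STAGES = ["FI", "DI", "CO", "FO", "EI", "WO"]

def space_time(n, stages=STAGES):
    """Print an ideal space-time diagram: instruction i occupies stage s at cycle i + s + 1."""
    k = len(stages)
    total = k + (n - 1)                         # cycle in which the last instruction finishes
    header = "      " + " ".join(f"{c:>3}" for c in range(1, total + 1))
    print(header)
    for i in range(n):
        row = ["   "] * total
        for s, name in enumerate(stages):
            row[i + s] = f"{name:>3}"           # one stage per cycle, no stalls
        print(f"I{i + 1:<4} " + " ".join(row))
    print(f"{n} instructions complete in {total} time units (vs {n * k} unpipelined)")

space_time(9)   # -> 14 time units, matching the diagram
```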
Instruction Pipelining

Several factors limit the performance enhancement:
Simultaneous memory access problem
If the six stages are not of equal duration
Conditional branch instructions: invalidate several instruction fetches
Interrupts

37
Instruction Pipelining
 The effect of conditional branch

38
Logic to account for branches and interrupts:

39
Alternative Pipeline Depiction

40
Pipeline Performance
 Some of the IBM S/360 designers pointed out two factors
that frustrate this seemingly simple pattern for high
performance design.
1. At each stage of the pipeline, there is some overhead
involved in moving data from buffer to buffer
2. The amount of control logic required to handle memory
and register dependencies and to optimize the use of the
pipeline increases enormously with the number of stages.

41
Pipeline Performance

Measures of pipeline performance and relative speedup:
Cycle time τ of an instruction pipeline: the time needed to advance a set of instructions one stage through the pipeline (each column):

τ = max[τi] + d = τm + d,   1 ≤ i ≤ k
42
Pipeline Performance

Where:
τi = time delay of the circuitry in the ith stage
τm = maximum stage delay (delay through the stage that experiences the largest delay)
k = number of stages in the instruction pipeline
d = time delay of a latch
In general, the time delay d is equivalent to a clock pulse and τm >> d

43
Pipeline Performance
 Now, suppose n instructions are processed, with NO
branches:
Total time Tk,n required for a pipeline with k stages to execute n instructions:

Tk,n = [k + (n − 1)]τ

The first instruction requires k cycles to complete
The remaining (n − 1) instructions require one cycle each, i.e., (n − 1) more cycles
E.g. the ninth instruction completes at time cycle 14: 14 = [6 + (9 − 1)]

44
Pipeline Performance

Now, consider a processor with equivalent functions but no pipeline:
Assume the instruction cycle time is kτ
For n instructions, the total execution time with no pipeline is T1,n = nkτ
Speedup factor Sk: the ratio of the execution time without the pipeline to the execution time with the pipeline, given as:

Sk = T1,n / Tk,n = nkτ / [k + (n − 1)]τ = nk / [k + (n − 1)]
45
Example

Consider a pipeline having 4 stages with delays of 60, 50, 90 and 80 ns. The latch delay is 10 ns. Calculate:
Pipeline cycle time
Non-pipeline execution time
Speed-up ratio
Solution
Four stage pipeline is used
Delay of stages = 60, 50, 90 and 80 ns
Latch delay or delay due to each register = 10 ns
Part-01: Pipeline Cycle Time-
Cycle time
= Maximum delay due to any stage + Delay due to its register
= Max { 60, 50, 90, 80 } + 10 ns
= 90 ns + 10 ns
= 100 ns
Example

Part-02: Non-Pipeline Execution Time-
Non-pipeline execution time for one instruction
= 60 ns + 50 ns + 90 ns + 80 ns
= 280 ns
Part-03: Speed Up Ratio-
Speed up
= Non-pipeline execution time / Pipeline execution time
= 280 ns / Cycle time
= 280 ns / 100 ns
= 2.8
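The same arithmetic can be packaged as a small helper that follows the example's conventions (cycle time = maximum stage delay + latch delay; non-pipelined time = sum of the stage delays per instruction). Note that the 2.8 figure is the speedup approached for a long stream of instructions; for small n the pipeline fill time still matters:

```python
def pipeline_metrics(stage_delays_ns, latch_delay_ns, n):
    """Metrics from the slides: cycle time tau = max stage delay + latch delay,
    pipelined time = [k + (n - 1)] * tau, non-pipelined time = n * sum(stage delays)."""
    k = len(stage_delays_ns)
    tau = max(stage_delays_ns) + latch_delay_ns
    t_pipe = (k + (n - 1)) * tau
    t_nonpipe = n * sum(stage_delays_ns)
    return tau, t_nonpipe, t_pipe, round(t_nonpipe / t_pipe, 2)

# Worked example: 4 stages of 60, 50, 90, 80 ns with a 10 ns latch delay.
print(pipeline_metrics([60, 50, 90, 80], 10, n=1000))
# -> (100, 280000, 100300, 2.79): for large n the speedup approaches 280/100 = 2.8
```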
Exercise:
1. Draw a space-time diagram for a five-stage
pipeline showing the time it takes to process
six tasks. Also determine the clock cycle with
which the last instruction completes.
2. For the above pipelined processor, determine
the speedup factor for 200 instructions.

48
Pipeline Hazards

A pipeline hazard occurs when the pipeline, or some portion of the pipeline, must stall because conditions do not permit continued execution.
Such a pipeline stall is also referred to as a pipeline bubble.
There are three types of hazards: resource, data and control.
1. Resource Hazards: occur when two or more instructions that are already in the pipeline need the same resource.
The result is that the instructions must be executed in serial rather than in parallel for a portion of the pipeline.
49
Pipeline Hazards
Example: Assume a simplified five-stage pipeline, in which each stage takes one clock cycle.
Assume that main memory has a single port and that all instruction fetches and data reads and writes must be performed one at a time.
Another example of a resource conflict is a situation in which multiple instructions are ready to enter the execute phase and there is a single ALU.
One solution to such resource hazards is to increase the available resources, such as having multiple ports into main memory and multiple ALU units.
Pipeline Hazards

51
Pipeline Hazards
 2. Data Hazard: occurs when there is a conflict in the
access of an operand location.
 Two instructions in a program are to be executed in
sequence and both access a particular memory or register
operand.
Example: ADD A, B ; A=A+B
SUB C, A ; C=C-A
 To maintain correct operation, the pipeline must stall for
two clock cycles.

52
Pipeline Hazards

[Five-stage pipeline timing diagram for the sequence ADD A, B; SUB C, A; I3; I4]
 The ADD instruction does not update register A until the end of stage
5, which occurs at clock cycle 5.
 But the SUB instruction needs that value at the beginning of its stage
3, which occurs at clock cycle 4.
To maintain correct operation, the pipeline must stall for two clock cycles (see the sketch below).
53
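A minimal sketch of the stall arithmetic above, assuming the simplified five-stage pipeline in which a result is written at the end of stage 5 and a source operand is needed at the start of stage 3; the stage numbers are parameters so other pipelines can be tried:

```python
def raw_stall_cycles(distance, write_stage=5, read_stage=3):
    """Stalls needed when an instruction reads a register written by one issued
    `distance` instructions earlier (distance = 1 means the very next instruction)."""
    # Without stalls the consumer reaches read_stage at cycle distance + read_stage,
    # but the value only becomes available after the producer finishes write_stage.
    return max(0, write_stage - (distance + read_stage - 1))

print(raw_stall_cycles(1))   # ADD A,B followed immediately by SUB C,A -> 2 stalls
print(raw_stall_cycles(3))   # three instructions apart -> 0 stalls
```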
Pipeline Hazards

There are three types of data hazards (see the sketch after this slide):
1. Read after write (RAW), or true dependency: a hazard occurs if the read takes place before the write operation is complete.
2. Write after read (WAR), or anti-dependency: a hazard occurs if the write operation completes before the read operation takes place.
3. Write after write (WAW), or output dependency: two instructions both write to the same location. A hazard occurs if the write operations take place in the reverse of the intended order.
54
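As referenced on the previous slide, here is a minimal sketch that classifies the dependency between two instructions, each described as a (destination register, set of source registers) pair; the tiny encoding is purely illustrative:

```python
def classify_hazards(first, second):
    """Return the data-hazard types between two instructions given as
    (destination_register, set_of_source_registers) pairs."""
    hazards = []
    if first[0] in second[1]:
        hazards.append("RAW")   # second reads what first writes (true dependency)
    if second[0] in first[1]:
        hazards.append("WAR")   # second writes what first reads (anti-dependency)
    if first[0] == second[0]:
        hazards.append("WAW")   # both write the same location (output dependency)
    return hazards

# ADD A, B  (A = A + B)  followed by  SUB C, A  (C = C - A)
print(classify_hazards(("A", {"A", "B"}), ("C", {"C", "A"})))   # -> ['RAW']
```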
Pipeline Hazards

3. Control Hazards: also known as branch hazards, occur when the pipeline makes the wrong decision on a branch prediction and therefore brings instructions into the pipeline that must subsequently be discarded.
Dealing with Branches
Until the instruction is actually executed, it is impossible to determine whether a branch will be taken or not.
A variety of approaches have been taken for dealing with conditional branches:
Multiple streams
Prefetch branch target
Loop buffer (Reading Assignment)
Branch prediction
Delayed branching
Questions?
Next: RISC vs CISC
56
