
Slide Set 6 for Lecture Section 01

for ENCM 369 Winter 2017

Steve Norman, PhD, PEng

Electrical & Computer Engineering


Schulich School of Engineering
University of Calgary

February 2017
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 2/97

Contents
Introduction to Chapter 7 of the textbook
Components of synchronous, sequential logic systems
The PC, Register File, and Memory Units
A model for Register File internals
Textbook Section 7.2: Performance Analysis
Single-cycle processor: Overview
Details of datapaths for the single-cycle machine
Control for the single-cycle machine
Single-cycle timing example: LW instruction
More instructions, and next steps

Introduction to Chapter 7 of the textbook

This chapter is called “Microarchitecture”. It’s about how
computer processors actually read and execute instructions.
So it’s about hardware: digital logic circuits.
It’s not about programming.

Chapter 7 MIPS Instruction Subset

Much of the chapter focuses on several different logic circuit
designs capable of running programs that use a small subset of
the MIPS instruction set.

The instructions in the subset are:
- 5 “R-type” instructions: ADD, SUB, AND, OR, SLT.
  (They’re called R-type because all of their operands are
  Registers.)
- 3 other instructions: LW, SW, BEQ.

Some of the designs are later extended to support
additional instructions, such as ADDI and J.

An outline of the first part of Chapter 7


7.1: Introduction. Includes descriptions of some key
elements of processor designs: Program Counter (PC),
Instruction Memory, Register File, Data Memory.
7.2: Performance Analysis. Brief discussion of how to
measure and report computer system performance.
7.3: Single-Cycle Processor. A MIPS-subset processor that
handles one instruction per long clock cycle, finishing each
instruction before starting the next one.
7.4: Multi-Cycle Processor. A MIPS-subset processor that
takes several short clock cycles to handle each instruction,
again finishing each instruction before starting the next one.
This slide set is related to Sections 7.1–7.3.

Take time to do assigned reading in Chapter 7 carefully!

Many students may have been successful in ENCM 369 up to
now without using the textbook very much.
Things change with Chapters 7 and 8!
It will be really hard to follow lecture, lab and tutorial material
on processor designs without studying Chapter 7 of your
textbook!

Components of synchronous, sequential logic systems

The simple computer designs we’ll study are synchronous,
sequential logic systems.
Review of two important words from ENEL 353:
- Sequential: A system in which the output depends not
  just on current input, but also past input.
- Synchronous: A system in which changes to the state
  bits occur all at the same time, in response to an
  active edge of a clock signal.

Let’s look at some important components of synchronous,
sequential processor designs.

Review of clock signals


[Figure: clock waveform plotted against time, with alternating rising and falling edges; the period TC and the high and low intervals tH and tL are marked.]

We’ll use the same model for a clock signal that we used in
ENEL 353. Things to note:
- A rising edge, also called a positive edge, is a
  low-to-high transition.
- A falling edge, also called a negative edge, is a
  high-to-low transition.
- The clock period is TC = tH + tL; the frequency is 1/TC.
- tH is not necessarily equal to tL.
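
As a quick numerical check of the relationships above, here is a minimal sketch. The values of tH and tL are made up for illustration and do not come from the course or the textbook.

```python
# Assumed example values (not from the slides): tH = 0.4 ns, tL = 0.6 ns.
t_high = 0.4e-9   # seconds the clock is high in each cycle (tH)
t_low = 0.6e-9    # seconds the clock is low in each cycle (tL)

t_c = t_high + t_low    # clock period: TC = tH + tL
frequency = 1.0 / t_c   # clock frequency: 1 / TC, in Hz

print(f"TC = {t_c * 1e9:.1f} ns, frequency = {frequency / 1e9:.1f} GHz")
```

Note that tH and tL are deliberately unequal here; only their sum matters for the period.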



About the clock signal model


[Figure: clock waveform with rising and falling edges, TC, tH and tL marked.]

A model is a simplified description of a component that helps you
understand and predict behaviour of a system that uses that
component.
The clock model we use is fine for ENCM 369 but not good
enough for integrated circuit designers! Real-world clock edges do
not arrive at exactly the same time to all elements with clock
inputs—that is called clock skew, which we covered in ENEL 353
but will ignore in ENCM 369. Also, real-world clock signals are not
perfectly periodic.

An essential element: The D Flip-Flop


Positive-edge-triggered D flip-flop:
[Symbol: D flip-flop with inputs CLK and D, and output Q.]
The state Q copies the input D on each rising clock edge.

Negative-edge-triggered D flip-flop:
[Symbol: D flip-flop with a “bubble” on the CLK input, input D, and output Q.]
The state Q copies the input D on each falling clock edge.
(The “bubble” symbol indicates inversion of a signal.)
All of the DFFs we saw in ENEL 353 in Fall 2016 were
positive-edge-triggered. We’ll see in Section 7.5 that sometimes
it’s useful to have state updates on negative clock edges.

D flip-flop: Example behaviour and a useful definition

For a few clock cycles with one positive-edge-triggered D
flip-flop and one negative-edge-triggered D flip-flop, let’s see
how the Q outputs respond to changes in D inputs . . .

Definition: Active clock edge means
- rising clock edge, for positive-edge-triggered DFFs;
- falling clock edge, for negative-edge-triggered DFFs.



D flip-flops: What’s the point?


This is important! If you’re not clear about this point, you
will not really understand any of the circuits in Chapter 7 of
the textbook!
A clock cycle is a span of time from one active edge of a
clock to the next active edge.

A D flip-flop captures the value of the
input bit D at the end of a clock cycle,
and makes that captured bit value
available on Q throughout the next clock
cycle.
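
The capture-then-hold behaviour can be sketched in a few lines of Python. This is only a behavioural model under simplifying assumptions (no setup or hold times, no propagation delay); the class name `DFlipFlop` and the D sequence are invented for illustration.

```python
class DFlipFlop:
    """Positive-edge-triggered D flip-flop, behavioural model only."""

    def __init__(self, initial_q=0):
        self.q = initial_q  # state bit, available on Q for a whole clock cycle

    def rising_edge(self, d):
        """Active clock edge: capture D; Q then holds it until the next edge."""
        self.q = d


dff = DFlipFlop()
d_at_end_of_cycle = [1, 1, 0, 1, 0]  # assumed value of D just before each active edge
q_trace = []
for d in d_at_end_of_cycle:
    q_trace.append(dff.q)  # Q during this cycle: the bit captured at the previous edge
    dff.rising_edge(d)     # end of the cycle: capture D for use in the next cycle

print(q_trace)  # Q follows D, one clock cycle late
```

The trace makes the key point visible: the bit seen on Q during a cycle is whatever D was at the end of the previous cycle.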

D flip-flops: Applications

Typical use in ENEL 353: Two or three D flip-flops are used to
hold the state of a synchronous finite state machine.
Typical use in ENCM 369: A group of D flip-flops (usually but
not always 32 of them) is used to form a register, the state
of which can only be updated on an active clock edge.

A 32-bit register

This gets updated once per clock cycle on positive clock edges.
Each DFF receives the same CLK input.

[Figure: 32 D flip-flops sharing one CLK input, with inputs D31 to D0 and outputs Q31 to Q0.]

That diagram shows the structure but is awkward to draw,
so we’ll use this compact symbol:

[Symbol: register box with CLK input, 32-bit input D31:0 and 32-bit output Q31:0.]

Should a given register do a state update at the
end of every single clock cycle?

Looking ahead a little . . . The answer is definitely yes for the
PC register as used in Section 7.3, but no for the PC in
Section 7.4, and not quite always for the PC in Section 7.5.
Looking ahead some more . . . yes for “pipeline registers”
introduced in Section 7.5.
What about updates for the MIPS GPRs (general-purpose
registers)? Assume that we are designing a machine to handle
one instruction per clock cycle.

Wires
A wire connects an output bit of some element to the input
bit(s) of one or more elements.
To keep things simple, we’ll model signalling over wires as
happening without delay. But keep in mind that in
real-world design of high-speed circuits, accounting for wire
delays can be very important.

Let’s sketch some conventions for drawing wires and groups of
wires.
Note: To reduce clutter, the textbook often uses a solid
thick line for a multi-wire bus, instead of the usual
bus notation (a line marked with a slash and a bit count).

The PC, Register File, and Memory Units


PC (program counter): A special-purpose register used to
hold an instruction address.
Register File: Contains the 32 MIPS GPRs (general-purpose
registers). This is not at all like a file in the sense of files in
folders in a file system!
Memory Units: So far in ENCM 369, our model for a
computer has a single memory array, holding both
instructions and data. But the simplest possible design for our
MIPS subset requires split memory:
- Instruction Memory, a container for machine code
  instructions
- Data Memory, written to by store instructions, and
  read from by load instructions

The PC
[Symbol: register box with CLK input, 32-bit input PC' and 32-bit output PC.]

This is a simple 32-bit register. PC, the current value, is
made up of the Q outputs of 32 DFFs. PC', the next value,
is a 32-bit signal applied to the D inputs of those DFFs.
Not shown:
- a reset input, to force the PC into a known state on
  system power-up
- an enable input, for systems in which PC updates
  happen on some but not all rising edges of CLK

Instruction Memory
[Symbol: Instruction Memory box with 32-bit address input A and 32-bit read data output RD.]

A is a 32-bit address input. RD is a 32-bit read data output.


Most real computers allow users to modify the programs that
the computer can run. That’s not allowed by this simple
ROM (read-only memory) circuit.
To change the program in our simple computer, you would
have to pull a ROM chip out of the instruction memory
socket, and replace it with a different ROM chip containing
different instructions.

Register File: Inputs and Outputs (1)


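[Symbol: Register File box with CLK input; 5-bit address inputs A1, A2 and A3; 1-bit write enable input WE3; 32-bit write data input WD3; 32-bit read data outputs RD1 and RD2.]

Note the CLK input. This is a synchronous sequential
element. State updates can happen only on rising edges of
CLK.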
A1, A2, and A3 are “address” inputs. They could also be
called “register select” inputs. Why are they each 5 bits wide?

But the prof said, “Registers don’t have addresses!”

Let me be more precise: Registers do not have main
memory addresses.
You cannot make a C pointer point to a MIPS GPR.
You cannot use MIPS GPRs for array elements, because the
necessary address arithmetic won’t work.
However, a 5-bit address of a MIPS GPR really is an address,
in the sense that it is a number that selects the GPR out of
the set of 32 GPRs.

Register File: Inputs and Outputs (2)


[Symbol: Register File box with CLK, A1, A2, A3, WE3, WD3, RD1 and RD2, as on the first Register File slide.]

RD1 and RD2 are 32-bit outputs: read data ports.
WD3 is a 32-bit input: a write data port.
What is the relationship between the three data ports RD1,
RD2 and WD3, the address inputs A1, A2, and A3, and the
1-bit write enable signal WE3?

Data Memory
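[Symbol: Data Memory box with CLK input, 1-bit write enable input WE, 32-bit address input A, 32-bit write data input WD, and 32-bit read data output RD.]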

Again, note the CLK input. This, like the Register File, is a
synchronous sequential element. State updates can happen
only on rising edges of CLK.
Let’s make some notes about how this element behaves.

A slightly fancier Data Memory

Your instructor would prefer the Data Memory element to have
two control inputs, EN (enable) and R/W (read/not-write) . . .

[Symbol: “SN’s Data Memory” box with CLK, EN and R/W inputs, 32-bit address input A, 32-bit write data input WD, and 32-bit read data output RD.]

EN   R/W   action
0    X     none
1    0     write to address A
1    1     read from address A

That would have avoided waste of energy in handling instructions
that are neither loads nor stores. But we’ll follow the textbook;
the authors had the very reasonable goal of minimal clutter in
schematics.
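
The EN and R/W table can be restated as a small behavioural sketch. It is not the textbook's Data Memory, and it ignores clock timing entirely; the class name `DataMemory`, the method `access`, and the choice to return 0 for a never-written address are all assumptions made for illustration.

```python
class DataMemory:
    """Behavioural sketch of a memory with EN and R/W control inputs."""

    def __init__(self):
        self.words = {}  # address A -> stored 32-bit word

    def access(self, en, r_w, a, wd=0):
        """Return the word read, or None when no read takes place."""
        if en == 0:
            return None                       # EN = 0: no action, R/W is a don't-care
        if r_w == 0:
            self.words[a] = wd & 0xFFFF_FFFF  # EN = 1, R/W = 0: write to address A
            return None
        return self.words.get(a, 0)           # EN = 1, R/W = 1: read from address A


dmem = DataMemory()
dmem.access(en=1, r_w=0, a=0x1000, wd=0x2A)  # a store
loaded = dmem.access(en=1, r_w=1, a=0x1000)  # a load
idle = dmem.access(en=0, r_w=1, a=0x1000)    # neither a load nor a store
print(loaded, idle)
```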

Abbrevs

To express ideas quickly and concisely, I will use the following
abbreviations from time to time in slides and lecture notes . . .
- I-Mem: Instruction Memory
- R-File: Register File
- D-Mem: Data Memory



A model for Register File internals

Section 5.5.5 of the textbook suggests that register files are
usually built as SRAM (static RAM) arrays.
An SRAM-based design is small and efficient, but its operation
is hard to explain in the context of year 2 ENEL and ENSF
curricula. (ENEL students: See ENCM 467 for SRAM details.)
So I’ll present a design in which registers are made of enabled
D flip-flops.

Pros and cons of the R-File model of this slide set


Pros:
- It’s built entirely from components presented thoroughly
  in ENEL 353: decoders, muxes, and DFFs with enable
  inputs.
- It accurately suggests that decoders are essential parts of
  R-File designs.
Cons:
- As mentioned before, bits in real R-Files are likely held in
  SRAM cells, not DFFs. (A DFF is much larger than an
  SRAM cell and consumes much more energy.)
- The model uses 32-bit 32:1 bus multiplexers. Those
  muxes would work perfectly in theory, but in practice
  would tend to be unreasonably large and slow.

32-bit register with EN input

[Figure: 32 enabled D flip-flops, D31/Q31 down to D0/Q0, sharing one CLK input; every EN input is driven by the signal registerEN.]

Each DFF receives the same CLK input. On positive clock edges,
- all the DFFs copy D to Q if registerEN = 1;
- all the DFFs keep their old Q values if registerEN = 0.

Compact symbol:
[Symbol: register box with CLK input, 32-bit input D31:0, 32-bit output Q31:0, and an EN input driven by registerEN.]
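
The two cases for registerEN can be written out as a short behavioural sketch. The class name `EnabledRegister32` and the data values are invented for illustration; the model ignores all timing details.

```python
MASK32 = 0xFFFF_FFFF  # keeps values within 32 bits


class EnabledRegister32:
    """32-bit register with an enable input, behavioural model only."""

    def __init__(self):
        self.q = 0  # Q31:0

    def rising_edge(self, d, register_en):
        """Positive clock edge: copy D to Q only when registerEN = 1."""
        if register_en == 1:
            self.q = d & MASK32
        # registerEN = 0: every DFF keeps its old Q value


reg = EnabledRegister32()
reg.rising_edge(0x1234_5678, register_en=1)
after_enabled_edge = reg.q
reg.rising_edge(0xDEAD_BEEF, register_en=0)
after_disabled_edge = reg.q
print(hex(after_enabled_edge), hex(after_disabled_edge))
```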

The Register File has 32 GPRs

[Symbol: 32-bit enabled register with CLK input, D31:0, Q31:0, and EN driven by registerEN.]

How many of the above 32-bit enabled registers will we need?
How many enabled D flip-flops is that?
How can we build GPR 0 ($zero)?

Register file building block: 5-to-32 decoder

[Symbol: decoder with inputs A4 to A0 and EN, and outputs Y31 to Y0.]

If EN (“enable”) is 0, then all 32 output bits are 0.
If EN is 1, then one of the outputs is 1, as selected by the
5-bit number A4 A3 A2 A1 A0, and the other 31 outputs are 0.
This is just a larger version of the decoder-with-enable
circuits (2-to-4, 3-to-8, 4-to-16) we saw several times in
ENEL 353.

5-to-32 decoder truth table


EN    bits A4–A0    bits Y31–Y0
0     XXXXX         0000 0000 0000 0000 0000 0000 0000 0000
1     00000         0000 0000 0000 0000 0000 0000 0000 0001
1     00001         0000 0000 0000 0000 0000 0000 0000 0010
1     00010         0000 0000 0000 0000 0000 0000 0000 0100
...   ...           ...
1     01111         0000 0000 0000 0000 1000 0000 0000 0000
1     10000         0000 0000 0000 0001 0000 0000 0000 0000
...   ...           ...
1     11110         0100 0000 0000 0000 0000 0000 0000 0000
1     11111         1000 0000 0000 0000 0000 0000 0000 0000

XXXXX in the first row means this: If EN is 0, it doesn’t matter
what A4–A0 are.
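
The truth table can be reproduced with a one-line computation. The sketch below packs the 32 outputs into a single integer, with bit i standing for Yi; the function name `decoder_5_to_32` is invented for illustration.

```python
def decoder_5_to_32(en, a):
    """Return outputs Y31..Y0 packed into one integer (bit i is Yi)."""
    if not 0 <= a <= 31:
        raise ValueError("A4-A0 must form a 5-bit number")
    # EN = 1: exactly one output, the one selected by A, is 1.
    # EN = 0: all outputs are 0, whatever A is.
    return (1 << a) if en == 1 else 0


for en, a in [(0, 0b10101), (1, 0b00000), (1, 0b00010), (1, 0b11111)]:
    print(en, f"{a:05b}", f"{decoder_5_to_32(en, a):032b}")
```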

Register file write logic, supported by a 5-to-32 decoder

[Symbol: Register File box with CLK, A1, A2, A3, WE3, WD3, RD1 and RD2.]

How should the decoder inputs be driven?
Where will the 32 bits of output from the decoder go?
Those choices will result in the R-File write logic shown on the
next slide . . .

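[Figure: R-File write logic. The R-File WD3 input drives the 32-bit D inputs of the enabled registers GPR01 through GPR31, which all share CLK. Decoder output Y1 drives the EN input of GPR01, Y2 drives the EN input of GPR02, and so on up to Y31 and GPR31.]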

Register file read logic (1)

With the goal of saving some time, we’re not going to go into
detail here.
One possible arrangement is to use two (large!) 32-bit
32:1 bus multiplexers. The 5-bit select inputs to the bus
muxes would be the A1 and A2 R-File inputs. The first bus
mux would use A1 to select one of 32 32-bit GPR values to
copy to the RD1 output of the R-File, and the second bus mux
would do the same thing with A2 and RD2.

Register file read logic (2)


[Symbol: Register File box with CLK, A1, A2, A3, WE3, WD3, RD1 and RD2.]

Reminder:
- The write logic is sequential: a GPR update can only
  happen in response to an active clock edge.
- The read logic is combinational: when A1 or A2
  changes, RD1 or RD2 will change without waiting for a
  clock edge.
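
The split between combinational reads and sequential writes can be sketched as follows. This is a behavioural model only, not the decoder-and-mux circuit; the class and method names are invented, and treating GPR 0 as never written is one assumed way to keep $zero equal to 0.

```python
class RegisterFile:
    """Behavioural sketch of the R-File: 32 GPRs, two read ports, one write port."""

    def __init__(self):
        self.gpr = [0] * 32

    def read(self, a1, a2):
        """Combinational read: RD1 and RD2 follow A1 and A2, no clock edge needed."""
        return self.gpr[a1], self.gpr[a2]

    def rising_edge(self, we3, a3, wd3):
        """Sequential write: only on an active clock edge, and only if WE3 = 1."""
        if we3 == 1 and a3 != 0:  # assumption: GPR 0 ($zero) is never overwritten
            self.gpr[a3] = wd3 & 0xFFFF_FFFF


rf = RegisterFile()
rf.rising_edge(we3=1, a3=8, wd3=42)  # this edge updates GPR 8
rf.rising_edge(we3=0, a3=8, wd3=99)  # WE3 = 0, so GPR 8 is unchanged
print(rf.read(8, 0))
```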

Textbook Section 7.2: Performance Analysis

The textbook presents this equation for the execution time of
a program:

execution time = IC × CPI × TC

IC is instruction count, the number of instructions executed
in a program run. IC is not the size of a program! A single
instruction can count many times in IC. For example, if a
10-instruction loop runs 450 times, the loop makes a
4500-instruction contribution to IC.
TC is the processor clock period.
We’ll look at CPI in more detail on other slides . . .
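
Here is a small worked example of the execution time equation, using the 10-instruction loop from this slide. The CPI and TC values are assumptions chosen only to make the arithmetic easy; they are not measurements of any real processor.

```python
def execution_time(ic, cpi, t_c):
    """execution time = IC x CPI x TC, with TC in seconds."""
    return ic * cpi * t_c


loop_ic = 10 * 450  # a 10-instruction loop that runs 450 times contributes 4500 to IC
assumed_cpi = 1.0   # assumption: one clock cycle per instruction
assumed_t_c = 1e-9  # assumption: 1 ns clock period

loop_time = execution_time(loop_ic, assumed_cpi, assumed_t_c)
print(f"IC contribution = {loop_ic}, time = {loop_time * 1e6:.2f} microseconds")
```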

More about IC: Instruction count

For a program written in a language such as C, the factors
affecting IC are:
- ISA (instruction set architecture);
- how good the compiler is at translating pieces of
  high-level language into efficient sequences of instructions;
- what the program input is.

Microarchitecture can’t do much about IC, but has a major
impact on both CPI and TC . . .

CPI: Clock cycles per instruction (1)


Low CPI is good for processor performance, and high CPI is
bad.
Microarchitectures (arrangements of registers, memory
systems, and arithmetic/logic circuits) have a major influence
on CPI. Textbook Chapter 7 looks at three kinds of
microarchitecture:
- single-cycle, with a CPI of 1;
- multi-cycle, with a CPI of approximately 4;
- pipelined, with a CPI just a little greater than 1.

But CPI is not a number determined exactly and entirely by
microarchitecture . . .

CPI: Clock cycles per instruction (2)

CPI is not determined entirely by microarchitecture.
In fact, CPI is also program- and data-dependent:
- Certain programs, with certain inputs, result in execution
  of mostly “easy”, low-CPI instructions.
- Other programs, and/or other inputs, result in execution
  of a higher concentration of “hard”, high-CPI instructions.
So CPI is a useful concept, but not a number you can precisely
specify for any particular processor design.

TC: Processor clock period

Here’s a review of the performance equation:

execution time = IC × CPI × TC

We’ll see that there is a tradeoff between CPI and TC:
- design ideas that reduce CPI tend to increase TC;
- design ideas that reduce TC tend to increase CPI.

Another unwelcome consideration is that decreasing TC
increases clock frequency (1/TC), which increases average
power consumption of a processor.
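
To see why the tradeoff matters, compare two hypothetical designs running the same program. Every number below is invented purely for illustration (the IC, both clock periods, and the exact CPI of 4); none comes from the textbook.

```python
def execution_time(ic, cpi, t_c):
    """execution time = IC x CPI x TC, with TC in seconds."""
    return ic * cpi * t_c


ic = 1_000_000_000  # assumed instruction count, the same for both designs

# Hypothetical design A: low CPI, long clock period.
time_a = execution_time(ic, cpi=1.0, t_c=900e-12)
# Hypothetical design B: short clock period, higher CPI.
time_b = execution_time(ic, cpi=4.0, t_c=300e-12)

print(f"design A: {time_a:.2f} s, design B: {time_b:.2f} s")
```

With these made-up numbers, cutting TC to a third does not pay off, because CPI quadruples. Different numbers could reverse the outcome, which is why both factors have to be considered together.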

Single-cycle processor: Overview


Let’s start with definitions of datapath and control.
Datapath: A collection of circuit elements, connected in a way
that will generate a result for some category of instructions.
For example, the datapath for an LW instruction will include
PC, I-Mem, R-File, D-Mem and a few other important
elements.
Control: A circuit element designed to send signals to
datapath elements, to tell those elements what to do and
sometimes when to do it.
Example: A control circuit for our MIPS subset will need to
turn on the WE input of D-Mem for SW, but turn it off for all
other instructions.

One clock cycle in the single-cycle machine

This sketch shows how every instruction will work . . .

[Figure: one CLK cycle between two active edges. At the first active edge: “Update to PC, maybe to R-File or D-Mem, from previous instruction.” During the cycle: “Datapath generates result(s) of current instruction”, followed by “Result(s) ready”. At the second active edge: “Update to PC, maybe to R-File or D-Mem, from current instruction.”]

The width of the “Result(s) ready” time interval will differ
between instructions, but must always be greater than some
kind of setup time for safe operation.

Details of datapaths for the single-cycle machine

The first datapath we’ll look at is the datapath for LW. After
that we’ll move on to SW, R-type instructions, and BEQ.
Before we start on LW, we’ll need a few more datapath
elements—32-bit adders, a 16-to-32-bit sign-extend unit, and
a 32-bit ALU (arithmetic/logic unit).

32-bit adder circuit for Chapter 7


Here is the symbol:
[Symbol: adder with 32-bit inputs A31:0 and B31:0 and 32-bit output Y31:0.]

Things to note:
- The carry-in to the LSB is 0.
- There is no carry-out-from-MSB output.
- As explained in previous lectures, this circuit works for
  both unsigned and signed addition, without any sort of
  input to indicate which of signed or unsigned
  computation is desired.
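
The last point can be checked with a short sketch: one 32-bit addition, read two ways. The helper names `add32` and `as_signed` and the operand values are invented for illustration.

```python
MASK32 = 0xFFFF_FFFF


def add32(a, b):
    """32-bit adder: carry-in is 0 and the carry out of the MSB is discarded."""
    return (a + b) & MASK32


def as_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x


a, b = 0xFFFF_FFFE, 0x0000_0005
y = add32(a, b)

# The same output bits are correct under both interpretations:
print("unsigned:", a, "+", b, "wraps around to", y)
print("signed:  ", as_signed(a), "+", as_signed(b), "=", as_signed(y))
```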

16-to-32-bit sign extend circuit

[Symbol: Sign Extend box with a 16-bit input and a 32-bit output.]

The above symbol can be (conceptually) implemented as
shown below. (A practical circuit would probably include some
buffers so that a single input wire would not have to drive 17
output wires.)

[Figure: input15 drives output31 through output16 as well as output15; input14 through input0 drive output14 through output0 directly.]
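
The wiring described above amounts to copying bit 15 of the input into the upper 16 bits of the output. A minimal sketch, with an invented function name:

```python
def sign_extend_16_to_32(value):
    """Copy input bit 15 into output bits 31..16; pass bits 15..0 through."""
    value &= 0xFFFF                 # keep only the 16 input bits
    if value & 0x8000:              # input15 is 1: upper 16 output bits are all 1
        return value | 0xFFFF_0000
    return value                    # input15 is 0: upper 16 output bits are all 0


for pattern in (0x0004, 0xFFFC, 0x8000):
    print(f"{pattern:#06x} -> {sign_extend_16_to_32(pattern):#010x}")
```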

32-bit ALU (arithmetic/logic unit)

This combinational element has 67 input wires and
33 output wires:

[Symbol: ALU with 32-bit inputs A31:0 and B31:0, 3-bit input ALUControl, 32-bit output Y31:0 and 1-bit output Zero.]

[Symbol: ALU with inputs A31:0, B31:0 and ALUControl, and outputs Y31:0 and Zero.]

5 of the 8 possible ALUControl input bit patterns will matter
for our MIPS-subset processor designs:

ALUControl (base two)   Y       notes . . .
000                     A & B   bitwise AND
001                     A | B   bitwise OR
010                     A + B   addition
110                     A − B   subtraction
111                     A < B   0 for false, 1 for true

What important aspect of the set-on-less-than comparison
does the table NOT specify?
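
The table can be turned into a small behavioural sketch. Two things here are assumptions rather than statements from this slide: the A < B comparison is treated as a signed (two's-complement) comparison, which is how the MIPS SLT instruction compares, and Zero is taken to be 1 exactly when Y is all zeros. The function names are invented.

```python
MASK32 = 0xFFFF_FFFF


def as_signed(x):
    """Interpret a 32-bit pattern as a two's-complement signed integer."""
    return x - (1 << 32) if x & 0x8000_0000 else x


def alu(a, b, alu_control):
    """Return (Y, Zero) for the five ALUControl patterns in the table."""
    if alu_control == 0b000:
        y = a & b
    elif alu_control == 0b001:
        y = a | b
    elif alu_control == 0b010:
        y = (a + b) & MASK32
    elif alu_control == 0b110:
        y = (a - b) & MASK32
    elif alu_control == 0b111:
        y = 1 if as_signed(a) < as_signed(b) else 0  # assumption: signed compare
    else:
        raise ValueError("ALUControl pattern not used by the MIPS subset")
    zero = 1 if y == 0 else 0  # assumption: Zero = 1 exactly when Y is all zeros
    return y, zero


print(alu(7, 7, 0b110))        # subtraction of equal values
print(alu(0xF0, 0x3C, 0b000))  # bitwise AND
```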

[Symbol: ALU with inputs A31:0, B31:0 and ALUControl, and outputs Y31:0 and Zero.]

The 1-bit Zero signal has a confusing name! Sometimes the
ALU will make Zero = 0 and sometimes the ALU will make
Zero = 1.
Let’s write down the rules for how the Zero signal is computed.
Looking ahead . . . For our MIPS subset, for which instruction
will the Zero signal be useful?

ALU examples
[Symbol: ALU with inputs A31:0, B31:0 and ALUControl, and outputs Y31:0 and Zero.]

For each of the examples in the table, what will the outputs
Y and Zero be?
example A B ALUControl
(1) 0x0000_0002 0x0000_0003 001
(2) 0x0000_0002 0x0000_0003 010
(3) 0xffff_ffff 0x0000_0000 111
(4) 0x0000_002a 0x0000_002a 110

Details of ALU design

See textbook Section 5.2.4.


ENCM 369 will not cover the details of ALU design down to
the level of logic gates.

Back to the datapath for LW . . .

We have looked at all of the necessary datapath elements.
Before we try to organize those elements, let’s review the
machine code formats for LW and SW . . .

LW
bits:    31:26     25:21          20:16        15:0
field:   100011    pointer GPR    dest. GPR    offset

SW
bits:    31:26     25:21          20:16         15:0
field:   101011    pointer GPR    source GPR    offset
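
The field boundaries can be exercised with shifts and masks. The sketch below decodes one hand-assembled word; the function name, the dictionary keys and the example instruction (lw $s0, 8($t0), with $t0 as GPR 8 and $s0 as GPR 16) are chosen for illustration.

```python
def decode_lw_sw(instr):
    """Split a 32-bit LW or SW instruction word into its four fields."""
    return {
        "opcode": (instr >> 26) & 0x3F,       # bits 31:26
        "pointer_gpr": (instr >> 21) & 0x1F,  # bits 25:21
        "data_gpr": (instr >> 16) & 0x1F,     # bits 20:16: dest. (LW) or source (SW)
        "offset": instr & 0xFFFF,             # bits 15:0
    }


# Hand-assembled example: lw $s0, 8($t0)
# opcode 100011, pointer GPR 8, dest. GPR 16, offset 8
fields = decode_lw_sw(0x8D10_0008)
print(f"{fields['opcode']:06b}", fields["pointer_gpr"],
      fields["data_gpr"], fields["offset"])
```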

LW datapath: Instruction fetch


[Figure: PC, clocked by CLK, supplies the 32-bit instruction address to the A input of Instruction Memory. The 32-bit RD output is Instr, which fans out into the instruction fields 31:26, 5:0, 25:21, 20:16, 20:16 (a second copy), 15:11 and 15:0.]

Instr is short for “instruction”, obviously.
How many wires are there in total for “instruction fields”,
and why is that number so much greater than 32?

LW datapath: GPR read and address calculation


[Figure: Instr25:21 drives the R-File A1 input, and RD1 is one ALU input. Instr15:0 passes through Sign Extend (16 bits in, 32 bits out) to the other ALU input. The ALU has a 3-bit ALUControl input and produces Zero and the 32-bit ALUResult.]

Which signal is the data memory address? What bit pattern
should be applied to ALUControl? Why use an ALU at all?
Wouldn’t it be easier to use an adder?

LW datapath: D-Mem read and R-File update


[Figure: the 32-bit signal from the ALU drives the D-Mem A input; the D-Mem RD output drives the R-File WD3 input; Instr20:16 drives the R-File A3 input. The R-File and D-Mem both receive CLK.]

Note the role of instruction bits Instr20:16!
What are the correct values for the WE input to the D-Mem
and the WE3 input to the R-File?

LW datapath: PC update
At the same time LW is doing its job of copying a word from
Data Memory to the Register File, an update to the PC must
be generated. What does the symbol 4 mean in this
schematic?
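[Figure: the PC register, clocked by CLK, sends its 32-bit output PC to I-Mem and to one input of a 32-bit adder. The other adder input is the constant 4. The 32-bit adder output is PC', which drives the D inputs of the PC register.]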

RTFT: Read The Fine Textbook

Section 7.3.1 of the textbook explains the single-cycle
datapaths for LW, SW, R-type and BEQ instructions in clear
and careful detail, with schematics that are very difficult to
squish into legible lecture slides.
Please read this textbook material carefully! For the same
reason, please be ready to carefully read other recommended
sections of Chapter 7!

Historical note: “RTFM”, or “Read The F****** Manual”, is
advice that experienced programmers have been handing out for
decades.

[Figure: single-cycle MIPS processor schematic. A Control Unit with inputs Op (Instr31:26) and Funct (Instr5:0) produces MemtoReg, MemWrite, Branch, ALUControl2:0, ALUSrc, RegDst and RegWrite. The datapath contains the PC register, Instruction Memory, Register File, Sign Extend, ALU, Data Memory, two adders, a <<2 shifter and 2:1 muxes, with signals PC', PC, Instr, SrcA, SrcB, Zero, ALUResult, WriteData, ReadData, WriteReg4:0, SignImm, PCPlus4, PCBranch, PCSrc and Result.]

Image is Figure 7.11 from Harris D. M. and Harris S. L., Digital Design
and Computer Architecture, 2nd ed., © 2013, Elsevier, Inc.

SW datapath: Instruction fetch


[Figure: PC, clocked by CLK, supplies the 32-bit instruction address to the A input of Instruction Memory. The 32-bit RD output is Instr, which fans out into the instruction fields 31:26, 5:0, 25:21, 20:16, 20:16 (a second copy), 15:11 and 15:0.]

This is exactly the same as instruction fetch for LW!
In fact, instruction fetch is the same for all instructions: how
an instruction gets copied out of I-Mem does not depend on
what kind of instruction it is!

SW datapath: GPR read and address calculation


[Figure: Instr25:21 drives the R-File A1 input, and RD1 is one ALU input. Instr15:0 passes through Sign Extend (16 bits in, 32 bits out) to the other ALU input. The ALU has a 3-bit ALUControl input and produces Zero and the 32-bit ALUResult. In addition, Instr20:16 drives the A2 input, and RD2 goes to the D-Mem WD input.]

The address calculation is exactly the same as in LW.
But two registers must be read for SW: one to compute the
address, and a second to supply the data to be stored.
Signals involved in transferring that data are shown in red
in the schematic.

SW datapath: Data memory and PC updates

A schematic for the data memory update can be sketched
quickly by hand, so let’s do that, and write down a few notes.

Is the PC update for SW different in any way from the PC
update for LW?

2:1 bus multiplexers


Make sure you understand what these circuit elements do.
We’ll use them as key components in creating datapaths for
R-type and BEQ instructions.
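32-bit 2:1 bus mux: 32-bit inputs A31:0 (input 0) and B31:0 (input 1),
1-bit select S, 32-bit output F31:0.
5-bit 2:1 bus mux: 5-bit inputs C4:0 (input 0) and D4:0 (input 1),
1-bit select S, 5-bit output G4:0.

F31:0 = A31:0 if S = 0,   B31:0 if S = 1
G4:0  = C4:0  if S = 0,   D4:0  if S = 1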
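
The two definitions above have the same shape, so a single sketch covers both widths. The function name `mux2` and the input values are invented for illustration.

```python
def mux2(s, input0, input1):
    """2:1 bus mux: the output is input0 when S = 0 and input1 when S = 1."""
    return input1 if s == 1 else input0


# 32-bit use: F = A if S = 0, B if S = 1
f_when_s0 = mux2(0, 0x1111_1111, 0x2222_2222)
f_when_s1 = mux2(1, 0x1111_1111, 0x2222_2222)

# 5-bit use: G = C if S = 0, D if S = 1
g_when_s1 = mux2(1, 0b01000, 0b10000)

print(hex(f_when_s0), hex(f_when_s1), f"{g_when_s1:05b}")
```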

R-type instructions

The R-type instructions in our MIPS subset are ADD, SUB,
AND, OR, and SLT.
Why are they called R-type?
The instruction format for R-type instructions is . . .

bits:    31:26     25:21           20:16           15:11        10:6     5:0
field:   000000    source GPR 1    source GPR 2    dest. GPR    00000    “funct” field

The “funct” field is the part of the instruction that identifies
which of ADD, SUB, AND, OR, or SLT should be performed.

A datapath for R-type instructions

Our goal will be to build a datapath for R-type instructions
that is compatible with the datapath already set up for LW
and SW.
To do that, we’ll need to use multiplexers to solve problems
like these . . .
• In LW and SW, one ALU input is a GPR value and the
  other is a sign-extended offset. What should the ALU
  inputs be for R-type instructions?
• In LW, the R-File A3 input is Instr20:16. What should the
  R-File A3 input be for R-type instructions?
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 71/97

GPR read and ALU use for LW, SW, and R-type
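[Schematic: Instr25:21 and Instr20:16 drive the Register File address inputs A1 and A2. RD1 is one ALU input. A 2:1 mux controlled by ALUSrc selects the other ALU input: RD2 (input 0) or the sign-extended Instr15:0 (input 1). RD2 also goes to the D-Mem WD input. The ALU, controlled by the 3-bit ALUControl, produces ALUResult and Zero.]

What should ALUSrc and ALUControl be for LW? For SW?
For R-type instructions?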
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 72/97

R-File update for LW and R-type instructions

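[Schematic: a 5-bit 2:1 mux controlled by RegDst selects the Register File A3 input: Instr20:16 (input 0) or Instr15:11 (input 1). A 32-bit 2:1 mux controlled by MemtoReg selects the Register File WD3 input: ALUResult (input 0) or the RD output from D-Mem (input 1).]

What should RegDst and MemtoReg be for LW? For R-type
instructions? What about SW?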
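The three muxes introduced on this slide and the previous one can be sketched in software as three selections. This is only an illustrative model: the function names and the arguments `rd2`, `alu_result`, and `read_data` are assumptions standing in for the RD2 output, ALUResult, and the D-Mem RD output. The mux input numbering (0 or 1) follows the schematics, and the control values used in the example call for LW are taken from the Main Decoder table later in this slide set.

```python
def mux2(d0: int, d1: int, s: int) -> int:
    """2:1 mux: d0 when s == 0, d1 when s == 1."""
    return d1 if s else d0


def sign_extend16(value: int) -> int:
    """Sign-extend a 16-bit field to a 32-bit two's-complement bit pattern."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value


def select_signals(instr, rd2, alu_result, read_data, alu_src, reg_dst, mem_to_reg):
    """Return (second ALU input, R-File A3 input, R-File WD3 input)."""
    src_b = mux2(rd2, sign_extend16(instr), alu_src)              # ALUSrc mux
    a3 = mux2((instr >> 16) & 0x1F, (instr >> 11) & 0x1F, reg_dst)  # RegDst mux
    wd3 = mux2(alu_result, read_data, mem_to_reg)                  # MemtoReg mux
    return src_b, a3, wd3


# LW $s1, 0x1234($s0) with made-up data values on RD2, ALUResult, and RD.
lw = 0b100011_10000_10001_0001001000110100
src_b, a3, wd3 = select_signals(lw, rd2=0xAAAA, alu_result=0x10001234,
                                read_data=0xCAFE, alu_src=1, reg_dst=0,
                                mem_to_reg=1)
print(hex(src_b), a3, hex(wd3))
```

For LW the second ALU input is the offset, the destination register number comes from Instr20:16, and the value written back comes from D-Mem rather than from the ALU.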
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 73/97

PC update for R-type instructions

Is the PC update for R-type instructions different in any way
from the PC update for LW or SW?
We have now covered seven instructions from our
eight-instruction subset: LW, SW and five R-type instructions.
The last instruction to cover is BEQ. As you may have
suspected, handling BEQ adds complexity to the PC update
logic!
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 74/97

BEQ instruction format and behaviour


Instruction format:

Bits 31:26   Bits 25:21     Bits 20:16     Bits 15:0
000100       source GPR 1   source GPR 2   offset

Behaviour:
if source GPRs are equal
    PC′ = (PC + 4) + 4 × sign-extended offset
else
    PC′ = PC + 4
We already have a datapath to compute PC + 4. We’ll need to
add features to get (PC + 4) + 4 × sign-extended offset.
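The BEQ behaviour above can be checked with a few lines of arithmetic. The sketch below is a software model only, not the hardware: `next_pc` and `sign_extend16` are names invented here, and the PC value 0x00400020 is an arbitrary example address. The instruction word is the BEQ $t9, $zero example (offset field 0xFFFA, which is −6) from later in this slide set.

```python
def sign_extend16(value: int) -> int:
    """Interpret the low 16 bits of `value` as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def next_pc(pc: int, instr: int, gprs_equal: bool) -> int:
    """Next PC value for a BEQ instruction located at address `pc`."""
    pc_plus4 = (pc + 4) & 0xFFFFFFFF
    if gprs_equal:
        # Multiplying the offset by 4 is the same as shifting it left by 2.
        return (pc_plus4 + (sign_extend16(instr) << 2)) & 0xFFFFFFFF
    return pc_plus4


beq = 0b000100_11001_00000_1111111111111010   # offset field = 0xFFFA = -6
pc = 0x00400020                               # arbitrary example address

print(hex(next_pc(pc, beq, gprs_equal=True)))   # branch taken
print(hex(next_pc(pc, beq, gprs_equal=False)))  # branch not taken
```

With this example the taken branch lands at (0x00400020 + 4) − 24 = 0x0040000C, and the not-taken case simply moves on to 0x00400024.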
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 75/97

Datapath for BEQ instruction: design ideas

We can multiply by 4 by doing a shift-left-2 (<< 2) of the
Sign Extend output.
Q1: How can we use the ALU to decide whether or not a
branch should be taken? Q2: What does that say about the
values of ALUSrc and ALUControl for a BEQ instruction?
Q3: If a signal called Branch is 1 for BEQ, but 0 for LW, SW
and R-type instructions, how can we use that signal to ensure
correct PC updates for BEQ and all the other instructions?
These ideas lead to the schematic on the next slide . . .
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 76/97

Datapath for BEQ instruction: schematic


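[Schematic: BEQ datapath. Elements shown: the PC, clocked by CLK, with a 2:1 mux on its input (PC′); an adder computing PC + 4; the Register File, read through A1 = Instr25:21 and A2 = Instr20:16; the ALUSrc mux choosing between RD2 (input 0) and the sign-extended Instr15:0 (input 1); the ALU with its Zero output; Sign Extend on Instr15:0 followed by << 2 and a second adder; and the control signals Branch, ALUSrc, and ALUControl (3 bits).]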
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 77/97

Datapath for BEQ instruction: mux for input to PC


To understand how BEQ is handled it may help to “zoom in”
on a small but critical part of the schematic from the previous
slide:
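[Schematic: a 32-bit 2:1 mux drives PC′, the input of the PC register, which is clocked by CLK. The two data inputs (0 and 1) and the select input of the mux are each marked “?”.]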
For each of the mux inputs, let’s write a brief but precise
description.
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 78/97


Control for the single-cycle machine

A review of definitions for datapath and control . . .

Datapath: A collection of circuit elements, connected in a way
that will generate a result for some category of instructions.
For example, the datapath for an LW instruction will include
PC, I-Mem, R-File, D-Mem and a few other important
elements.
Control: A circuit element designed to send signals to
datapath elements, to tell those elements what to do and
sometimes when to do it.
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 80/97

Single-cycle control: Inputs and outputs (1)

The control unit for our single-cycle processor will be
combinational logic.
The inputs to the control unit describe what kind of
instruction is being executed.
Which bits from the current instruction must be supplied as
inputs to the control unit?
The outputs of the control unit will be six 1-bit signals—
MemtoReg, MemWrite, Branch, ALUSrc, RegDst, and
RegWrite—and one 3-bit signal—ALUControl.
Let’s make some notes about all of the control unit outputs.
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 81/97

Single-cycle control: Inputs and outputs (2)


[Block diagram: the Control Unit takes Instr31:26 (opcode) and Instr5:0 (funct) as inputs and produces MemtoReg, MemWrite, Branch, ALUControl (3 bits), ALUSrc, RegDst, and RegWrite.]

If we implemented this as a ROM circuit, what would be the
dimensions of the ROM array?
(See textbook Section 5.5.6 to review ROMs.)
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 82/97

Single-cycle control: Split into two parts

[Block diagram: the Main Decoder takes Instr31:26 and produces MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp. The ALU Decoder takes ALUOp and Instr5:0 and produces the 3-bit ALUControl.]

Let’s write some rules for the 2-bit ALUOp signal.

What are the dimensions for each part, if each of the two parts is a ROM?
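One possible set of rules for the ALU Decoder can be sketched as a small lookup. This is an assumption-laden sketch, not the answer worked out in lecture: the ALUOp meanings (00 for LW and SW, 01 for BEQ, 10 for R-type) match the Main Decoder table later in this slide set, but the 3-bit ALUControl encodings below are assumed (they follow a common convention for this ALU) and the funct codes are the standard MIPS ones.

```python
# Assumed 3-bit ALUControl encodings; these are not given on the slides.
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Standard MIPS funct codes for the five R-type instructions in the subset.
FUNCT_TO_ALU = {
    0b100000: ALU_ADD,
    0b100010: ALU_SUB,
    0b100100: ALU_AND,
    0b100101: ALU_OR,
    0b101010: ALU_SLT,
}


def alu_decoder(alu_op: int, funct: int) -> int:
    """Map the 2-bit ALUOp and the 6-bit funct field to a 3-bit ALUControl."""
    if alu_op == 0b00:   # LW, SW: add base register and offset
        return ALU_ADD
    if alu_op == 0b01:   # BEQ: subtract, then look at the Zero output
        return ALU_SUB
    if alu_op == 0b10:   # R-type: the funct field decides
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("ALUOp value 11 is not used in this sketch")


print(f"{alu_decoder(0b10, 0b101010):03b}")  # R-type with funct = SLT
print(f"{alu_decoder(0b00, 0b110100):03b}")  # LW: funct bits are ignored
```

The point of the split is that the Main Decoder never needs to see the funct field, and the ALU Decoder only consults it when ALUOp says the instruction is R-type.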
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 83/97

SLT $t1, $t0, $zero machine code is . . .


000000 01000 00000 01001 00000 101010

[Block diagram: Main Decoder (input Instr31:26; outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp) and ALU Decoder (inputs ALUOp and Instr5:0; output the 3-bit ALUControl).]

For this example SLT instruction, what does the Main Decoder do?

What does the ALU Decoder do?
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 84/97

LW $s1, 0x1234($s0) machine code is . . .


100011 10000 10001 0001 0010 0011 0100

[Block diagram: Main Decoder (input Instr31:26; outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp) and ALU Decoder (inputs ALUOp and Instr5:0; output the 3-bit ALUControl).]

For this example LW instruction, what does the Main Decoder do?

What does the ALU Decoder do?
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 85/97

BEQ $t9, $zero, [6 instructions back] machine code is . . .


000100 11001 00000 1111 1111 1111 1010

[Block diagram: Main Decoder (input Instr31:26; outputs MemtoReg, MemWrite, Branch, ALUSrc, RegDst, RegWrite, and the 2-bit ALUOp) and ALU Decoder (inputs ALUOp and Instr5:0; output the 3-bit ALUControl).]

For this example BEQ instruction, what does the Main Decoder do?

What does the ALU Decoder do?
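The three machine-code examples on this slide and the two before it can be reproduced by packing fields into a 32-bit word. The helper names `encode_r` and `encode_i` are invented for this sketch; the register numbers in `REG` are the standard MIPS numbers for the registers used in the examples, and the opcodes and funct value are read directly off the bit patterns on the slides.

```python
# Standard MIPS register numbers for the registers used in the examples.
REG = {"$zero": 0, "$t0": 8, "$t1": 9, "$s0": 16, "$s1": 17, "$t9": 25}


def encode_r(rs: int, rt: int, rd: int, funct: int) -> int:
    """Pack an R-type word: opcode and bits 10:6 are all zeros."""
    return (rs << 21) | (rt << 16) | (rd << 11) | funct


def encode_i(opcode: int, rs: int, rt: int, imm: int) -> int:
    """Pack a word with a 16-bit immediate; negative values wrap to 16 bits."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


examples = {
    # SLT $t1, $t0, $zero: sources $t0 and $zero, destination $t1
    "SLT": encode_r(REG["$t0"], REG["$zero"], REG["$t1"], 0b101010),
    # LW $s1, 0x1234($s0): base register $s0, destination $s1
    "LW": encode_i(0b100011, REG["$s0"], REG["$s1"], 0x1234),
    # BEQ $t9, $zero, [6 instructions back]: offset field is -6
    "BEQ": encode_i(0b000100, REG["$t9"], REG["$zero"], -6),
}

for name, word in examples.items():
    print(f"{name:4s} {word:032b}  0x{word:08X}")
```

Comparing the printed bit strings with the slides is a quick way to confirm which register number goes in which field, in particular that the LW destination register sits in bits 20:16 rather than bits 15:11.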
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 86/97

Complete specification for Main Decoder within
the Control Unit of the Figure 7.11 computer

Instruction   RegWrite   RegDst   ALUSrc   Branch   MemWrite   MemtoReg   ALUOp
R-type        1          1        0        0        0          0          10
LW            1          0        1        0        0          1          00
SW            0          X        1        0        1          X          00
BEQ           0          X        0        1        0          X          01
Exercise: Make a blank version of this table, then fill it in by
looking at Figure 7.11 and deciding what all the signal values
should be.
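Because the Main Decoder is combinational, it can be modelled as a plain lookup from opcode to output values. The sketch below reads the table columns as RegWrite, RegDst, ALUSrc, Branch, MemWrite, MemtoReg, ALUOp, uses `None` for the don't-care (X) entries, and assumes the standard MIPS opcode for SW (101011), which does not appear on these slides. The other opcodes are the ones shown in the machine-code examples earlier.

```python
SIGNALS = ("RegWrite", "RegDst", "ALUSrc", "Branch", "MemWrite", "MemtoReg", "ALUOp")
X = None  # don't care

# opcode -> (name, output values in the order given by SIGNALS)
MAIN_DECODER = {
    0b000000: ("R-type", (1, 1, 0, 0, 0, 0, 0b10)),
    0b100011: ("LW",     (1, 0, 1, 0, 0, 1, 0b00)),
    0b101011: ("SW",     (0, X, 1, 0, 1, X, 0b00)),  # opcode assumed (standard MIPS)
    0b000100: ("BEQ",    (0, X, 0, 1, 0, X, 0b01)),
}


def main_decoder(instr: int) -> dict:
    """Return the Main Decoder outputs for a 32-bit instruction word."""
    opcode = (instr >> 26) & 0x3F
    name, values = MAIN_DECODER[opcode]
    return {"Instruction": name, **dict(zip(SIGNALS, values))}


# LW $s1, 0x1234($s0)
print(main_decoder(0b100011_10000_10001_0001001000110100))
```

A useful sanity check on the table: only SW has MemWrite = 1, only BEQ has Branch = 1, and RegWrite is 0 exactly for the two instructions that do not update a GPR.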
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 87/97

Sketch of timing for LW $s1, 0x1234($s0)

[Timing diagram, to be filled in, with one row per signal: CLK, PC output, Instruction, main decoder outputs, R-File outputs, ALU decoder outputs, ALU result, D-Mem RD output, $s1 contents. A second copy of the diagram marks time points 1 to 7.]
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 90/97

What happens as we adjust the clock period?

Which clock speeds work for LW, and which ones do not?

[Timing diagram: three clock waveforms labelled “fast” clock, “slow” clock, and “medium” clock, each with time points 1 to 6 marked.]
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 91/97

Detailed timing analysis for the single-cycle
machine

To study this material, it may be useful to review textbook
Sections 2.9 and 3.5 on timing.
We’re going to follow the notation and presentation of
textbook Section 7.3.4, and make the following assumptions:
• reading the R-File ($t_{RFread}$) takes longer than sign-extend
  and a mux combined (as stated in the textbook);
• reading the R-File ($t_{RFread}$) takes longer than generating
  Control Unit outputs (assumed but not actually stated).
We’ll look at the critical path for an LW instruction.
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 92/97
After a rising clock edge, there is a delay of up to $t_{pcq\_PC}$
(PC clock-to-Q propagation delay) until the PC output is
ready.
Once the PC is ready, the critical path for LW will run through
5 units: I-Mem, R-File, ALU, D-Mem, and the mux controlled
by MemtoReg.
The R-File and D-Mem act like combinational logic when
they’re read, so the overall propagation delay through the
5 units is just the sum of 5 individual delays:

$$t_{mem} + t_{RFread} + t_{ALU} + t_{mem} + t_{mux}$$

It’s assumed that I-Mem and D-Mem have the same delay,
$t_{mem}$, so the overall combinational delay simplifies to

$$2t_{mem} + t_{RFread} + t_{ALU} + t_{mux}$$


ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 93/97

It’s assumed that R-File updates will work correctly if its WD3
(write data) input is ready no later than $t_{RFsetup}$ (R-File setup
time) in advance of a rising clock edge.
So for safe operation of an LW instruction:

$$T_C - t_{RFsetup} \ge t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{ALU} + t_{mux}$$

$$T_C \ge t_{pcq\_PC} + 2t_{mem} + t_{RFread} + t_{ALU} + t_{mux} + t_{RFsetup}$$

If that isn’t clear, please study Section 7.3.4 carefully.


There will be a lab exercise or two to help get you comfortable
with this kind of timing analysis.
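To see how the LW inequality turns into a number, the sketch below plugs in a set of delays. The delay values are hypothetical, chosen only for illustration; they are not taken from these slides. The function name `min_clock_period_lw` and the parameter names are likewise made up, with each parameter standing for the symbol of the same name in the inequality.

```python
def min_clock_period_lw(t_pcq_pc, t_mem, t_rf_read, t_alu, t_mux, t_rf_setup):
    """Smallest safe clock period for LW: the sum along the critical path.

    All arguments and the result are in the same time unit (here, picoseconds).
    """
    return t_pcq_pc + 2 * t_mem + t_rf_read + t_alu + t_mux + t_rf_setup


# Hypothetical delays in picoseconds, for illustration only.
delays_ps = {
    "t_pcq_pc": 30,     # PC clock-to-Q propagation delay
    "t_mem": 250,       # I-Mem or D-Mem read
    "t_rf_read": 150,   # R-File read
    "t_alu": 200,       # ALU
    "t_mux": 25,        # MemtoReg mux
    "t_rf_setup": 20,   # R-File setup time
}

t_c = min_clock_period_lw(**delays_ps)
f_max_mhz = 1e6 / t_c   # a period of 1 ps corresponds to 1e6 MHz

print(f"T_C >= {t_c} ps, so the clock can run at up to about {f_max_mhz:.0f} MHz")
```

With these made-up numbers the two memory reads account for 500 ps of the total, which is why the memory delay appears twice in the formula and dominates the result.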
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 94/97


More instructions, and next steps

Textbook Section 7.3.3 looks at adding support for ADDI and
J instructions to the single-cycle design.
We won’t spend lecture time on that, but we’ll look at
supporting ADDI, J, and perhaps some other instructions, in
lab exercises.
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 96/97

Moving on

What determines the minimum safe clock period in the
single-cycle machine?
We’ve just seen that, roughly speaking, it’s the SUM of the
response times of several datapath elements.
Idea: Could there be a different design that does not require
such a long cascade of events in a single clock cycle?
Could that allow a much shorter clock period?
ENCM 369 Winter 2017 Slide Set 6 for Lecture Section 01 slide 97/97

Final comments on the single-cycle processor


Would it work if you built it?
Yes! The only missing detail from Section 7.3 of the textbook
is logic to properly initialize the PC when the system is
powered up. The PC must start with a specific instruction
address, not some random bit pattern in its flip-flops.
It’s quite cool to realize that between last September, at the
start of ENEL 353, and now, you’ve learned enough about
digital logic, assembly language and machine code to truly
understand a simple but real computer design!
However, the simplicity of the single-cycle design has some
disadvantages, so in the next slide set we’re going to look at a
more complex design.
