# The University of Texas at Dallas

Erik Jonsson School of Engineering and
Computer Science

The CPU Control Unit
• We now have a fairly good picture of the logic circuits
in the CPU.
• Having “designed” the ALU or datapath so that it can
perform the necessary instructions, we now have to do
the same thing for the control unit, which decodes
instructions and provides direction to the CPU.
• The MIPS control unit decodes the six bits on either
end of the 32-bit instruction word, that is, the op code
and function code* fields, to determine each instruction
sequence.
* On R-R instructions.

Lecture # 19: Control Unit Design and Multicycle Implementation


The Central Processor Unit (CPU)
ALU
Registers

Control Unit
Instruction Fetch/Decode

• In lecture 19, we covered processing elements (blue),
including the registers, ALU, and data buses.
• We now address the control unit circuitry (red).
– Instruction decoding.
– Control signals to ALU and other control elements.


Functionality of Control Unit
| Op. code (from bit 31) | \$t0 | \$t1 | \$t2 | Shift amt. | Fn. code (to bit 0) |
| --- | --- | --- | --- | --- | --- |
| 00 0000 | 0 1000 | 0 1001 | 0 1010 | 0 0000 | 10 0000 |

• The control unit determines ALU functions in each
instruction and selects operands for the ALU.
• The operation code (the left six bits of the instruction)
determines the type of operation and in some cases (such
as jump instructions) the actual instruction itself.
• In the case of register-register instructions, the function
code determines the instruction (for example, in the R/R
instruction above, the function code 0x20 means “add”).
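The field split described above can be sketched in a few lines of Python. This is only an illustration: the function name `decode_fields` and the dictionary keys are hypothetical, and the instruction word is the bit pattern shown on the slide (op code 0, registers \$t0, \$t1, \$t2, function code 0x20).

```python
def decode_fields(word):
    """Split a 32-bit MIPS R-format instruction word into its six fields."""
    return {
        "op":    (word >> 26) & 0x3F,  # bits 26-31: operation code
        "rs":    (word >> 21) & 0x1F,  # bits 21-25: first source register
        "rt":    (word >> 16) & 0x1F,  # bits 16-20: second source register
        "rd":    (word >> 11) & 0x1F,  # bits 11-15: destination register
        "shamt": (word >> 6) & 0x1F,   # bits 6-10: shift amount
        "funct": word & 0x3F,          # bits 0-5: function code
    }


# Bit pattern from the slide: 000000 01000 01001 01010 00000 100000
WORD = 0b000000_01000_01001_01010_00000_100000

fields = decode_fields(WORD)
print(hex(WORD), fields)
```

In hardware each field goes to its own decoder in parallel; the shifts and masks here simply stand in for the wiring that routes each group of bits.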

Functionality of Control Unit (2)
[Figure: instruction fields, bit 31 to bit 0, and where each is routed]
00 0000 → To operation code decoder
0 1000 → To source 1 reg. decoder
0 1001 → To source 2 reg. decoder
0 1010 → To dest. reg. decoder
0 0000 → To shift amount decoder
10 0000 → To function code decoder

• As mentioned before, the control unit is a collection of decoders and multiplexers.
• The decoded instruction fields tell (1) the ALU what function to perform, (2) what operands to use.

© N. B. Dodge 9/15

Current Architecture
• The ALU control uses instruction bits 0-5 to obtain information about the ALU operation in register-to-register instructions.
• The ALU control also has input control lines from the operation code decoder, which decodes bits 26-31 and which will be shown later.
• Note in the following diagram that some of the decoding is done in the register block, which has the decoding mechanism (as discussed in lecture 8, slides 8 and 9) that identifies source and destination registers in load/store and register/register operations.

ALU Design with ALU Control Block Shown
[Figure: single-cycle datapath with the ALU control block (function code/op code decoder) added. Elements shown: PC, +4 adder, instruction memory (instruction bits 0-31), register block (Rs, Rt, Rd, Write Data, Read Data 1, Read Data 2), sign extend (bits 0-15), left shift 2, branch adder, ALU, ALU control (bits 0-5), data memory, and the selecting MUXes.]
After David A. Patterson and John L. Hennessy, Computer Organization and Design, 2nd Edition

ALU Control Block
[Figure: ALU Control Block Detail. Inputs: ALU operation bits from the op code control unit, and the function code (instruction bits 0-5). Output: 3-bit ALU control bus.]
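The ALU control block above combines the ALU operation bits with the function code to drive the 3-bit control bus. A minimal sketch follows, assuming the encoding used in the Patterson and Hennessy textbook (ALUOp 00 = add for address calculation, 01 = subtract for branch compare, 10 = use the function code). The specific 3-bit codes are not given on the slide and are an assumption here; the names `alu_control` and `FUNCT_TO_ALU` are hypothetical.

```python
# 3-bit ALU control codes (textbook encoding, assumed rather than taken from the slide)
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB, ALU_SLT = 0b000, 0b001, 0b010, 0b110, 0b111

# Function code (instruction bits 0-5) to ALU operation, for R-type instructions
FUNCT_TO_ALU = {
    0x20: ALU_ADD,  # add
    0x22: ALU_SUB,  # sub
    0x24: ALU_AND,  # and
    0x25: ALU_OR,   # or
    0x2A: ALU_SLT,  # slt
}


def alu_control(alu_op, funct):
    """Return the 3-bit ALU control value for the given ALUOp lines and function code."""
    if alu_op == 0b00:   # load/store: ALU adds to form the memory address
        return ALU_ADD
    if alu_op == 0b01:   # branch on equal: ALU subtracts to compare
        return ALU_SUB
    if alu_op == 0b10:   # register-register: the function code decides
        return FUNCT_TO_ALU[funct]
    raise ValueError("unsupported ALUOp value")


print(format(alu_control(0b10, 0x20), "03b"))
```

Note that the function code is ignored unless ALUOp selects it, which is why the op code control block only needs two lines to the ALU control block.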

Single-Cycle ALU Design with Full Control Blocks
[Figure: single-cycle datapath with the op code control block (instruction bits 26-31) and the ALU control block (bits 0-5) in place. Control lines shown: Reg. Dest., Branch, Mem. Read, Mem. To Reg., ALU Op., Mem. Write, ALU Srce., Reg. Write.]

Op Code Control Block Signal Identifier
Outputs of the op code decode/control block:
• “RegDst” – Register destination control (MUX input)
• “Branch” – Activates branch address change function
• “MemRead” – Signals read cycle to data memory circuits
• “MemtoReg” – Selects ALU result or memory data for the write to register
• “ALUOp” (2 lines) – Used with function code in ALU control
• “MemWrite” – Signals write cycle to data memory circuits
• “ALUSrc” – Selects register or immediate ALU operand
• “RegWrite” – Activates write function to register block

Function of Op Code Control Signals

| Signal Name | When Signal = 1 | When Signal = 0 |
| --- | --- | --- |
| RegDst | Write reg. = \$rd (bits 11-15) | Write reg. = \$rt (bits 16-20) |
| Branch | ALU branch compare activated | No branch activated |
| MemRead | Memory data → write register | No data read from memory |
| MemtoReg | Memory data → write register | ALU results → write register |
| ALUOp | NA, lines go to ALU control block | NA, lines go to ALU control block |
| MemWrite | ALU or register data → memory | No data written to memory |
| ALUSrc | 2nd ALU operand is immediate (sign-extended instr. bits 0-15) | 2nd ALU operand is from \$rt (instruction bits 16-20) |
| RegWrite | Memory/ALU data → write reg. | No input to register block |
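To make the table concrete, the sketch below shows one way the op code could select these signals for four instruction classes. The signal values follow the Patterson and Hennessy single-cycle control table and the op codes are the standard MIPS encodings (R-type 0x00, lw 0x23, sw 0x2B, beq 0x04); neither is spelled out in the table above, so treat both as assumptions. Don't-care outputs are written as 0, and the helper `write_register` is hypothetical.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch", "ALUOp")

# One row per op code; don't-care entries (RegDst, MemtoReg for sw/beq) are set to 0.
CONTROL = {
    0x00: dict(zip(SIGNALS, (1, 0, 0, 1, 0, 0, 0, 0b10))),  # R-type
    0x23: dict(zip(SIGNALS, (0, 1, 1, 1, 1, 0, 0, 0b00))),  # lw
    0x2B: dict(zip(SIGNALS, (0, 1, 0, 0, 0, 1, 0, 0b00))),  # sw
    0x04: dict(zip(SIGNALS, (0, 0, 0, 0, 0, 0, 1, 0b01))),  # beq
}


def write_register(ctrl, rt, rd):
    """Model the RegDst MUX: 1 selects $rd (bits 11-15), 0 selects $rt (bits 16-20)."""
    return rd if ctrl["RegDst"] else rt


for opcode, ctrl in CONTROL.items():
    print(hex(opcode), ctrl)
```

For example, `write_register(CONTROL[0x00], rt=9, rd=10)` picks register 10 for an R-type instruction, while the same call with `CONTROL[0x23]` picks register 9, since a load writes to \$rt.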

Op Code Control Block Circuitry
[Figure: op code control block circuitry. Instruction bits 26-31 are decoded for op codes “0”, “23”, “2a”, and “04”, with outputs to the ALU control block.]

Instruction Disposition Showing Destination Units
[Figure: instruction fields, bit 31 to bit 0, and their destination units]
00 0000 → Op Code Control Block
0 1000 → Register Block, \$rs source register decoder
0 1001 → Register Block, \$rt source register decoder
0 1010 → Register Block, \$rd destination register decoder
0 0000 → shift amount decoder (ALU)
10 0000 → ALU Control Block

Data/control Signal Flow Examples
• The following diagrams illustrate the flow of control signals and data in some example MIPS instructions in the single cycle implementation.
• The “single cycle” implementation is just a stepping stone to the final MIPS design, but this simpler example has all the features of the more complex final design in terms of data routing and the way in which the control signals determine the specific operation for each given instruction.
• Note the data flow in these instructions, and pay special attention to the way some of the combinational devices we studied, such as the decoder and multiplexer, are utilized extensively.

Start of R-Type Instruction
[Figure: single-cycle datapath with full control blocks, fetch path highlighted.]
Instruction is fetched.

Next Step of R-Type Instruction
[Figure: single-cycle datapath with full control blocks, register read path highlighted.]
Registers are identified and needed operands are routed to the ALU and the PC update circuit.

Third Step of R-Type Instruction
[Figure: single-cycle datapath with full control blocks, ALU path highlighted.]
ALU operation is selected and performed. The ALU output is routed to write select MUX.

Completion of R-Type Instruction
[Figure: single-cycle datapath with full control blocks, writeback path highlighted.]
With the write register line active, data is written back to the destination register.

Jump Instruction Flow
[Figure: single-cycle datapath extended for jumps. Instruction bits 0-25 are left-shifted 2 and combined with PC+4 bits 28-31 to form the PC jump data; a Jump control line from the control block selects between the PC+4 data and the PC jump data for the PC update.]

Exercise 1
• On the next slide is a diagram of the complete “single-cycle” preliminary MIPS architecture. On a copy of that diagram:
– Circle every element that contains a decoder.
– Circle the device that allows a 16-bit number to be successfully added to a 32-bit number.
– Highlight the line that controls the content of the data written back to the destination register.

[Figure: complete single-cycle MIPS architecture diagram, including the jump path and all control lines.]
Print out a copy of this diagram and bring to class.

Making the ALU More Efficient
• We have now completed “design” of the basic MIPS R-2000 CPU.
• Although a good basic design, it has a serious drawback:
– Our processor is designed so that all instructions complete in one clock cycle.
– While this assures that there is sufficient time to complete any instruction, it also means that one clock period must be long enough to accommodate the longest and most complicated instruction.
– Thus, ALL instructions take as long as the longest instruction.
• Since many (most!) instructions in the MIPS architecture take less time to execute than the longest instructions (which are usually the lw memory reference instructions), this means that we are slowing execution of our CPU a large part of the time to accommodate instructions that occur substantially less frequently.

Comparative Instruction Timing
[Figure: timeline of events within one clock cycle (= one instruction cycle), in order:]
– PC update time
– Instructions accessed and inputs to register and op decoders stable
– Op and function codes decoded and stable
– Register outputs (operands) stable
– ALU processing complete
– Memory accessed and data storage or read complete
– Data written to write (destination) register if necessary
[Markers partway along the timeline: “Jumps and branches completed about here” and “Register-register instructions done about here”.]

Multicycle Implementation
• A solution to the single-cycle problem is stated as follows:
– Each instruction has several phases, such as fetch/decode, register selection, ALU processing, etc.
– Instead of using a single clock cycle for the whole instruction, run the clock much faster, and have a single clock cycle for each of the elements or phases of the instruction process.
– Many instructions take fewer phases (for example, jump, branch [the fewest phases], register-register or store instructions), so these instructions execute much faster.
– As most instructions execute faster than the longest instructions (such as lw), the average instruction time will be reduced substantially.

MIPS Multicycle Concept
• Split the processing into five processing segments.
• Do one instruction segment per clock cycle.
• Run the clock much faster (essentially 5X faster!).
• PC updates, branches and jumps take 3 processing segments; since they are simpler, they run much faster.
• Register-register instructions do not require memory access. They take four instruction segments and finish in about 30% more time than jumps and branches.
• Only load memory-access instructions take a full five processing segments (store takes only four), but do not slow down the other instructions to their speed.
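The segment counts above can be turned into a rough average. The sketch below uses the cycle counts from the slide (load 5, store 4, register-register 4, branch and jump 3) together with an instruction mix that is purely hypothetical, chosen only for illustration. It also takes the slide's idealization that one single-cycle period equals five of the faster clock periods.

```python
# Clock cycles per instruction class, as stated on the slide
CYCLES = {"load": 5, "store": 4, "r_type": 4, "branch": 3, "jump": 3}

# Hypothetical instruction mix in percent (an assumption, not measured data)
MIX = {"load": 25, "store": 10, "r_type": 45, "branch": 15, "jump": 5}


def average_cycles(cycles, mix):
    """Weighted average number of clock cycles per instruction for a given mix."""
    total = sum(mix.values())
    return sum(cycles[name] * share for name, share in mix.items()) / total


avg = average_cycles(CYCLES, MIX)
# Single-cycle design: every instruction costs the equivalent of 5 fast cycles.
saving = 1 - avg / 5

print(f"average cycles per instruction: {avg:.2f}")
print(f"time saved versus single-cycle: {saving:.0%}")
```

A mix with fewer loads and more branches would push the saving higher, which is why the benefit depends on the program being run.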

Multicycle Implementation
[Figure: the five processing segments, one clock cycle each:]
– Instruction Fetch (PC, Instruction Memory)
– Instruction Decode/Register Fetch (Register Block, Op/Fn)
– ALU Execution (ALU)
– Memory Access, if required (Data Memory): skipped for reg.-reg. inst.
– Data Writeback (Register Block): skipped for store inst.
[Memory Access and Data Writeback are both skipped for jump and branch instructions.]
All five segments (five clock cycles) required only for load instructions.

Instruction Cycle Times

| Instruction Step | R-Type | Memory Reference | Branches | Jumps |
| --- | --- | --- | --- | --- |
| Instruction Fetch | Fetch instruction at address [PC]; [PC]→[PC+4] (all types) | | | |
| Instruction Decode/Register Fetch | Register operands fetched. Instruction decoded and control units activate appropriate ALU components (all types) | | | |
| Instruction Execution | ALU output = result of operation on register operand(s) | ALU output is mem. address for load/store | If condition met, [PC]→address output of ALU | PC address is instr. [0-25] (shifted 2 places) + [PC 28-31] |
| Memory Access or ALU Data Writeback | Logical/shift/math operation result written to dest. reg. | Reg. data stored or data accessed for load to dest. reg. | --- | --- |
| Memory Data Writeback | --- | Data written to destination register | --- | --- |
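The jump entry in the table (instruction bits 0-25 shifted two places, combined with PC bits 28-31) can be sketched as follows. The function name and the sample addresses are hypothetical; the example word uses the standard MIPS `j` op code (000010), and the upper bits are taken from PC+4 as in the jump instruction flow diagram.

```python
def jump_target(pc_plus_4, instr):
    """Form the jump address from PC+4 bits 28-31 and instruction bits 0-25."""
    upper = pc_plus_4 & 0xF0000000        # keep PC bits 28-31
    lower = (instr & 0x03FFFFFF) << 2     # bits 0-25, shifted left 2 (word aligned)
    return upper | lower


# Hypothetical example: a jump word whose 26-bit field is 0x0100008
JUMP_WORD = 0x08100008
target = jump_target(0x00400004, JUMP_WORD)
print(hex(target))
```

The left shift by two restores the two low-order zero bits that every word-aligned instruction address has, so the 26-bit field reaches 28 bits of address.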

Multicycle Advantages
• For most instructions, as mentioned earlier, we save 20-40% in clock cycles and our processor is much faster.
• Since different parts of the circuit are active only for one cycle at a time, we can use less circuitry because parts of the computer can be reused in different cycles.
– The CPU now needs only one ALU, since it can do the PC update functions prior to the ALU processing.
– Since we access memory for data and instructions in different clock cycles, we only need one path to memory.

Major Impediment to Multicycle Implementation
• Our multicycle processor takes up to 5 clock cycles to complete an instruction.
• Each time the clock ticks, part of the instruction is completed, not all of it.
• That means that at the end of each clock cycle, we have partial instruction results, but no place to store them!
• A first concern is therefore a way to store intermediate data as the instruction winds its way through the various segments of processing.

First-Pass Register Placement
[Figure: multicycle datapath with intermediate registers added. A single memory (Memory Address, Write Data, Memory Out) feeds the Instruction Register (instruction 0-31; instr. 21-25 → Rs, instr. 16-20 → Rt, instr. 11-15 → Rd, instr. 0-15) and the Memory Data Register. Register block outputs Read Data 1 and Read Data 2 are held in registers A and B, and the ALU result is held in ALU Out.]
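The role of these intermediate registers can be illustrated with a small simulation. The sketch below steps one register-register add through four clock cycles, latching partial results in IR, A, B and ALUOut between cycles. It is a simplified model under stated assumptions: memory is a hypothetical dictionary keyed by address, only `add` is handled, and control signals are implied by the order of the steps rather than modeled.

```python
def run_add_multicycle(memory, regs, pc):
    """Step one R-type add through four cycles, latching results between steps."""
    state = {"PC": pc, "IR": None, "A": None, "B": None, "ALUOut": None}
    trace = []

    # Cycle 1: instruction fetch. IR latches the word; PC advances by 4.
    state["IR"] = memory[state["PC"]]
    state["PC"] += 4
    trace.append(dict(state))

    # Cycle 2: decode/register fetch. A and B latch the register operands.
    ir = state["IR"]
    rs, rt, rd = (ir >> 21) & 0x1F, (ir >> 16) & 0x1F, (ir >> 11) & 0x1F
    state["A"], state["B"] = regs[rs], regs[rt]
    trace.append(dict(state))

    # Cycle 3: execution. ALUOut latches the sum, wrapped to 32 bits.
    state["ALUOut"] = (state["A"] + state["B"]) & 0xFFFFFFFF
    trace.append(dict(state))

    # Cycle 4: writeback. The latched ALUOut is written to the destination register.
    regs[rd] = state["ALUOut"]
    trace.append(dict(state))
    return trace


memory = {0x00400000: 0x01095020}   # add $t2, $t0, $t1 at a hypothetical address
regs = [0] * 32
regs[8], regs[9] = 7, 5             # $t0 = 7, $t1 = 5

trace = run_add_multicycle(memory, regs, 0x00400000)
for cycle, snapshot in enumerate(trace, start=1):
    print(cycle, snapshot)
```

Each snapshot shows what survives to the next clock tick; without the latches, the operands read in cycle 2 would be gone by the time the ALU needs them in cycle 3.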

Preliminary Multicycle Design Without Control
[Figure: multicycle datapath with a single memory, Instruction Register, Memory Data Register, register block with A and B registers, sign extend (instr. 0-15), left shift 2, a constant 4 input, input MUXes, ALU, and ALU Out. Notes on the figure: memory data path simplified; ALU now serves jump/branch PC update function.]
• Our preliminary multicycle processing design is more compact.

Multicycle Design with ALU Control and PC
[Figure: the preliminary multicycle datapath with the PC (PC 0-31) and a MUX on the memory address added, plus the ALU control block driven by instr. 0-5.]

Completed Multicycle Design
[Figure: complete multicycle datapath with the control block driven by instr. 26-31. Control lines shown: PC Source, PC Write Cond., PC Write, Inst./Data, Mem. Read, Mem. Write, Mem. To Reg., Inst. Reg. Write, ALU Op., ALU Srce. A, ALU Srce. B, Reg. Write, Reg. Dest. The jump path combines instr. 0-25 (left-shifted 2) with PC 28-31.]

Multicycle Summary
[Figure: simplified block diagram with blocks labeled Cont., PC, Memory, Reg., A, B, ALU, and ALU Op.]
• We have redesigned the MIPS CPU to accommodate a 5-segment instruction partition with each segment taking one clock cycle.
• In doing so, instruction execution time was decreased ~ 30-40% and greater efficiency was obtained by reducing the circuitry.
• In the next lecture, we will take the final step in the MIPS design and complete the R2000 architecture.

Exercise 2
• On the Complete Multicycle Design Diagram on the next page, do the following:
1. Summarize the inputs into the lower MUX to the ALU.
2. Why is the 16-bit sign-extender input directly to the ALU in one case and left-shifted two places in the other?
3. Why is the output of the ALU sent directly to the MUX that inputs an instruction address into the PC?

[Figure: the Completed Multicycle Design diagram, repeated for the exercise.]
Print out a copy of this diagram and bring to class.