The University of Texas at Dallas

Department of Electrical and Computer Engineering

Midterm Test

Student Name: ________________________________ Student ID: ____________________

Question Points Score


1 16
2 16
3 16
4 16
Total 64

Subject : Computer Architecture Subject Code : EE(CE) 6304


Session : Semester 2, 2016/2017 Date : 24 Feb 2017
Time Allowed : 1 hour Time : 4:00-5:00pm
Lecturer : B. Carrion Schaefer

This question paper has 9 pages including this cover page and attachments

Instruction to Candidates: This question paper contains FOUR (4) questions


Answer ALL questions
You may write on the back of each page if needed.
You may find the information in the attachments helpful in
answering your questions.
Be concise. You will be penalized for verbosity
Write legibly.

DO NOT TURN OVER THIS PAGE UNTIL YOU ARE TOLD TO DO SO


WRITE YOUR NAME AND STUDENT NUMBER ON EACH PAGE

This study source was downloaded by 100000851289071 from CourseHero.com on 10-13-2022 20:16:55 GMT -05:00

https://www.coursehero.com/file/25689555/EE6304-exam-2017-midterm-spdf/

Question 1
(a) Enumerate and describe the 3 main steps that every microprocessor executes.
(2 marks)
Solution:
Fetch → Decode → Execute.
The processor fetches an instruction from memory, decodes the fetched instruction, and then executes it.

(b) A program spends 75% of its time doing multiply instructions. If the multiplier is sped up by
3x, how much faster does the application run?

Solution:
The program is 2x faster. The multiplies account for 75% of the time and become 3x faster, so
they now take 1/3 of that time, or only 25% of the original execution time. This eliminates 50%
of the total execution time, so the program is 2x faster.
Tnew = Told x (0.25 + 0.75/3) = 0.5 Told → Speedup = Told/Tnew = 2
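
The calculation above is an instance of Amdahl's law. As a quick check of the arithmetic, here is a minimal Python sketch; the function name `amdahl_speedup` is illustrative and not part of the exam.

```python
def amdahl_speedup(fraction_enhanced, enhancement_factor):
    """Overall speedup when a fraction of the run time is sped up by a given factor."""
    # New time, normalised to an original time of 1.0:
    # the untouched part plus the enhanced part divided by its speedup.
    new_time = (1.0 - fraction_enhanced) + fraction_enhanced / enhancement_factor
    return 1.0 / new_time

# 75% of the time is spent in multiplies, and the multiplier becomes 3x faster.
speedup = amdahl_speedup(0.75, 3.0)
print(speedup)  # 2.0
```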

(c) How many memory accesses does the following code require? Explain them. (INC=increment
instruction)
(2 marks)
INC (r1)

Solution:
This instruction involves two memory accesses: the first reads the data stored in the data segment,
and the second stores the incremented value back to the same address in the data segment.

(d) What do VLIW, superscalar, and array processing concepts have in common?
(2 marks)
Solution:
All three execute multiple operations per cycle.

(e) A microprocessor manufacturer decides to advertise its newest chip based only on the
metric IPC (Instructions per cycle). Is this a good metric? Why or why not? (Use less than
20 words)
(2 marks)

Solution:
No, because the metric does not take into account frequency or number of executed
instructions, both of which affect execution time.

(f) If you were the chief architect for another company and were asked to design a chip to
compete based solely on this metric, what important design decision would you make (in less
than 20 words)?
(2 marks)
Solution:
Make the cycle time as long as possible and process many instructions per cycle.

(g) Assume that the stack starts out empty. Write a stack-based program that computes
((10 x 8) + (4 - 7))^2
(4 marks)

Solution:
Because the processor does not provide an instruction to compute the square of a value, you need
to compute (10 x 8) + (4 - 7) twice.

PUSH 10
PUSH 8
MUL
PUSH 4
PUSH 7
SUB
ADD (at this point, the stack contains the first result)
PUSH 10
PUSH 8
MUL
PUSH 4
PUSH 7
SUB
ADD (at this point, the stack contains the second result)
MUL
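
The program above can be checked with a tiny stack-machine interpreter. This is a minimal sketch, not the exam's reference machine: the function `run_stack_program` is hypothetical, and it assumes that SUB pops the top of the stack as the right-hand operand, so that PUSH 4, PUSH 7, SUB yields 4 - 7.

```python
def run_stack_program(program):
    """Execute PUSH/ADD/SUB/MUL instructions and return the final stack."""
    # Assumption: binary operations compute (second-from-top) OP (top).
    operations = {
        "ADD": lambda a, b: a + b,
        "SUB": lambda a, b: a - b,
        "MUL": lambda a, b: a * b,
    }
    stack = []
    for line in program:
        parts = line.split()
        if parts[0] == "PUSH":
            stack.append(int(parts[1]))
        else:
            right = stack.pop()   # top of stack
            left = stack.pop()    # element below the top
            stack.append(operations[parts[0]](left, right))
    return stack

# (10 x 8) + (4 - 7), computed twice, then multiplied together.
half = ["PUSH 10", "PUSH 8", "MUL", "PUSH 4", "PUSH 7", "SUB", "ADD"]
program = half + half + ["MUL"]

result = run_stack_program(program)
print(result)  # [5929], since 77 * 77 = 5929
```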


Question 2.
(a) The performance of two computers needs to be benchmarked. For this purpose, a set of different
benchmark programs is used. Table I shows the characteristics of the benchmarks in terms of
number of instructions. Compute the SPEC rating and the speed-up factor of the fastest machine
over the other. Machine A runs at 1.0 GHz and has an average Cycles Per Instruction (CPI) of 2.5.
Machine B runs at 1.3 GHz and its CPI is 3.

Table I. Benchmark characteristics in number of Instructions


                Bench 1       Bench 2       Bench 3
Instructions    10,000,000    15,000,000    35,000,000
(10 marks)
Solution:

Execution time = (# instructions x CPI) / frequency
(instruction counts below are in millions and frequencies in MHz)

Machine A:
Bench 1 exec time = (10 x 2.5)/1000 = 0.025 s
Bench 2 exec time = (15 x 2.5)/1000 = 0.0375 s
Bench 3 exec time = (35 x 2.5)/1000 = 0.0875 s

Machine B:
Bench 1 exec time = (10 x 3)/1300 = 0.0231 s
Bench 2 exec time = (15 x 3)/1300 = 0.0346 s
Bench 3 exec time = (35 x 3)/1300 = 0.0808 s

SPEC_A = (0.025 x 0.0375 x 0.0875)^(1/3) = 0.0435
SPEC_B = (0.0231 x 0.0346 x 0.0808)^(1/3) = 0.0401

Machine B is faster: speedup = 0.0435/0.0401 ≈ 1.08, i.e. about 8% faster.
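
The same numbers can be reproduced without intermediate rounding. The sketch below follows this solution's convention of taking the geometric mean of the raw execution times; the names `exec_times` and `geometric_mean` are illustrative. Carrying full precision gives a ratio of about 1.083, slightly below the figure obtained from the rounded values.

```python
import math

# Instruction counts from Table I.
instructions = [10_000_000, 15_000_000, 35_000_000]

# Clock frequency (Hz) and average CPI of each machine, from the question.
machines = {
    "A": {"freq_hz": 1.0e9, "cpi": 2.5},
    "B": {"freq_hz": 1.3e9, "cpi": 3.0},
}

def exec_times(freq_hz, cpi):
    """Execution time in seconds of each benchmark: instructions * CPI / frequency."""
    return [count * cpi / freq_hz for count in instructions]

def geometric_mean(values):
    """n-th root of the product of n values."""
    return math.prod(values) ** (1.0 / len(values))

ratings = {name: geometric_mean(exec_times(**m)) for name, m in machines.items()}
speedup_b_over_a = ratings["A"] / ratings["B"]

for name, rating in ratings.items():
    print(f"Machine {name}: geometric mean time = {rating:.4f} s")
print(f"Speedup of B over A = {speedup_b_over_a:.3f}")
```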


(b) Consider a processor with the following instructions, each encoded using 16 bits.

Instruction     Opcode   16-bit encoding                                                                                        Function
MOV r1, d       0000     Opcode (4 bits) | Destination register (4 bits) | Address (8 bits)                                     r1 ← d
MOV d, r1       0001     Opcode (4 bits) | Source register (4 bits) | Address (8 bits)                                          d ← r1
ADD r1,r2,r3    0010     Opcode (4 bits) | Destination register (4 bits) | Source register (4 bits) | Source register (4 bits)   r1 ← r2 + r3
MOV r1,#c       0011     Opcode (4 bits) | Destination register (4 bits) | Constant (8 bits)                                    r1 ← c
SUB r1,r2,r3    0100     Opcode (4 bits) | Destination register (4 bits) | Source register (4 bits) | Source register (4 bits)   r1 ← r2 - r3
JMP r1,X        1010     Opcode (4 bits) | Source register (4 bits) | Offset (8 bits)                                           if (r1 == 0) PC ← PC + offset

Assume that you want to augment this ISA to support 20 additional and unique instructions (e.g.
MUL, AND, OR, etc.), while still keeping the instruction encoding as 16 bits. How will the
execution and encoding of the ADD instruction be affected? (Other instructions could be affected
too, but you just need to comment on how the ADD instruction will be impacted.)
(6 marks)
Solution:
With 6 + 20 = 26 instructions, 4 opcode bits (16 encodings) are no longer enough, so the number of
bits dedicated to the opcode will need to increase from 4 to 5. As such, there will be one less bit
to encode the number of the destination register or the number of one of the source registers.
Given this change:
• An extra bit will be needed to specify the opcode
• The result of the ADD must be written to registers 0 through 7 OR
• One of the source registers will only be allowed to be register 0 through 7.
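
The bit budget behind this answer can be worked out mechanically. The following is a small sketch under the assumptions stated in the question (6 existing instructions, 20 new ones, 16-bit encoding, three register fields for ADD); the variable names are illustrative.

```python
INSTRUCTION_BITS = 16
existing_instructions = 6
additional_instructions = 20

total_instructions = existing_instructions + additional_instructions

# Smallest number of bits that can give every instruction a unique opcode.
opcode_bits = (total_instructions - 1).bit_length()

# ADD has three register fields that must share the remaining bits.
remaining_bits = INSTRUCTION_BITS - opcode_bits
full_width_fields, leftover = divmod(remaining_bits, 4)
if leftover:
    # Not all three fields can keep 4 bits: one field shrinks.
    full_width_fields = 2
    short_field_bits = remaining_bits - 4 * full_width_fields
else:
    short_field_bits = 4

print(total_instructions, opcode_bits, remaining_bits, short_field_bits)
print("registers addressable by the short field:", 2 ** short_field_bits)
```

With 5 opcode bits, 11 bits remain, so two register fields keep 4 bits and one drops to 3 bits, which is why that operand is limited to registers 0 through 7.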


Question 3
(a) Assume that to spell check a large file, 820,000,000 instructions are needed. The instructions
in the program are broken down into 4 different classes, and each class requires N clock cycles
to execute. Specific information is given in the table below.
Instruction Class    Clock cycles per instruction    Number of Instructions
Branch               3                               150,000,000
Store                4                               185,000,000
Load                 5                               260,000,000
ALU                  4                               225,000,000

If the total execution time for this program is found to be 1.57 seconds, what is the clock cycle
time of the computer on which it was run? Show your calculations.
(6 marks)
Solution:
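Applying the CPU time formula:
CPU time = time/program = instructions/program x cycles/instruction x time/cycle = 1.57 s

$$
820{,}000{,}000 \times \left( 3\cdot\frac{150{,}000{,}000}{820{,}000{,}000} + 4\cdot\frac{185{,}000{,}000}{820{,}000{,}000} + 5\cdot\frac{260{,}000{,}000}{820{,}000{,}000} + 4\cdot\frac{225{,}000{,}000}{820{,}000{,}000} \right) \times N = 1.57\ \text{s}
$$

where N is the clock cycle time. The left-hand side simplifies to 3,390,000,000 N, so
N = 1.57 s / 3,390,000,000.

Thus, the clock cycle time is 4.63 x 10^-10 s, which corresponds to a clock rate of about 2.16 GHz.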
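
The cycle count and the resulting cycle time can be verified with a few lines of Python. This is a minimal sketch using the instruction mix from the table in the question; the dictionary name `classes` is illustrative.

```python
# (cycles per instruction, number of instructions) for each class, from the question.
classes = {
    "Branch": (3, 150_000_000),
    "Store": (4, 185_000_000),
    "Load": (5, 260_000_000),
    "ALU": (4, 225_000_000),
}

total_instructions = sum(count for _, count in classes.values())
total_cycles = sum(cpi * count for cpi, count in classes.values())

execution_time_s = 1.57
cycle_time_s = execution_time_s / total_cycles
clock_rate_ghz = 1.0 / cycle_time_s / 1e9

print(f"instructions = {total_instructions:,}")
print(f"cycles       = {total_cycles:,}")
print(f"cycle time   = {cycle_time_s:.3e} s")
print(f"clock rate   = {clock_rate_ghz:.2f} GHz")
```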

(b) Assume that as part of the 820,000,000-instruction spell check, 25% of all load instructions are
immediately followed by an ALU-type instruction that uses the data that was just loaded. To
speed this program up, you are thinking about adding a new type of instruction: an ALU
instruction where one of the source operands is a value from memory. In particular:
• This new instruction will replace the previous 2-instruction sequence
• It will take 7 clock cycles
Will this change offer any speedup over the original design? If so, how much?
You may assume that the clock rate does not change and that your answer to this question does not
depend on your answer to question 3(a).
(10 marks)

Solution:


We need to apply the CPU time formula again, but first need to calculate the new number of load,
ALU and “new type” instructions:
The # of branches remains constant                              150,000,000
The # of stores remains constant                                185,000,000
The new # of loads is (260,000,000 x 0.75)                      195,000,000
The new # of ALU instructions is (225,000,000 - 65,000,000)     160,000,000
The # of new instructions is (260,000,000 x 0.25)                65,000,000
Total                                                           755,000,000

$$
755{,}000{,}000 \times \left( 3\cdot\frac{150{,}000{,}000}{755{,}000{,}000} + 4\cdot\frac{185{,}000{,}000}{755{,}000{,}000} + 5\cdot\frac{195{,}000{,}000}{755{,}000{,}000} + 4\cdot\frac{160{,}000{,}000}{755{,}000{,}000} + 7\cdot\frac{65{,}000{,}000}{755{,}000{,}000} \right) \times N = 3{,}260{,}000{,}000\, N
$$

The corresponding expression in question 3(a) is 3,390,000,000 N.

Thus, the new design reduces execution time by 1 - 3,260,000,000/3,390,000,000 ≈ 4%, which is a
speedup of 3,390,000,000/3,260,000,000 ≈ 1.04.
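
The before/after cycle counts can be cross-checked as follows. This is a sketch under the question's assumptions (25% of loads fuse with the following ALU instruction into a 7-cycle instruction, clock rate unchanged); the name "LoadALU" for the new instruction class is hypothetical.

```python
# (cycles per instruction, number of instructions) for the original design.
original = {
    "Branch": (3, 150_000_000),
    "Store": (4, 185_000_000),
    "Load": (5, 260_000_000),
    "ALU": (4, 225_000_000),
}

# 25% of the loads are fused with the ALU instruction that follows them.
fused = original["Load"][1] * 25 // 100

modified = {
    "Branch": original["Branch"],
    "Store": original["Store"],
    "Load": (5, original["Load"][1] - fused),
    "ALU": (4, original["ALU"][1] - fused),
    "LoadALU": (7, fused),  # hypothetical name for the new instruction class
}

def total_cycles(mix):
    """Total clock cycles for an instruction mix."""
    return sum(cpi * count for cpi, count in mix.values())

old_cycles = total_cycles(original)
new_cycles = total_cycles(modified)

# The clock rate is unchanged, so the time ratio equals the cycle ratio.
speedup = old_cycles / new_cycles
time_reduction = 1.0 - new_cycles / old_cycles

print(f"old cycles = {old_cycles:,}, new cycles = {new_cycles:,}")
print(f"speedup = {speedup:.3f}, time reduction = {time_reduction:.1%}")
```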


Question 4
(a) Given an unpipelined processor with a 10 ns cycle time, and pipeline latches with 0.5 ns latency,
what are the cycle times of pipelined versions of the processor with 2, 4, 8 and 16 stages if the
datapath logic is evenly distributed among the pipeline stages? Also, what is the latency of
each of the pipelined versions of the processor?
(10 marks)

Solution:

$$
\text{Cycle Time}_{\text{pipelined}} = \frac{\text{Cycle Time}_{\text{unpipelined}}}{\text{Number of Pipeline Stages}} + \text{Pipeline Latch Latency}
$$

Applying this formula gives cycle times of 5.5, 3, 1.75, and 1.125 ns, showing the diminishing
returns of pipelining as the pipeline latch latency becomes a significant part of the overall cycle
time.

To compute the latency of each processor, simply multiply the cycle time by the number of
pipeline stages, giving latencies of 11, 12, 14, and 18 ns.
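
The cycle times and latencies above follow directly from the formula. A minimal sketch, with the illustrative function name `pipelined_cycle_time`:

```python
def pipelined_cycle_time(unpipelined_ns, stages, latch_ns):
    """Cycle time when the logic is split evenly and each stage adds one latch delay."""
    return unpipelined_ns / stages + latch_ns

UNPIPELINED_NS = 10.0
LATCH_NS = 0.5

results = {}
for stages in (2, 4, 8, 16):
    cycle_ns = pipelined_cycle_time(UNPIPELINED_NS, stages, LATCH_NS)
    latency_ns = cycle_ns * stages  # an instruction passes through every stage
    results[stages] = (cycle_ns, latency_ns)
    print(f"{stages:2d} stages: cycle = {cycle_ns} ns, latency = {latency_ns} ns")
```

Note how the latency grows with the stage count: every extra stage adds another 0.5 ns latch delay to the path of a single instruction.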

(b) How long would the given code sequence and the renamed sequence take to issue on an out-of-
order superscalar processor with 4 execution units, each of which can execute any operation?
Assume all instructions have latencies of 1 cycle, use the greedy scheduling assumption, and
assume that the processor’s instruction window is large enough to cover the entire code
sequence.
(10 marks)

LD r1, (r2)
ADD r3, r4, r1
SUB r4, r5, r6
MUL r7, r4, r8
ASH r8, r9, r10
SUB r11, r8, r12
DIV r12, r13, r14
ST (r15), r12

Solution:
Without register renaming, the sequence takes 5 cycles to issue, because instructions with a WAR
dependency can issue in the same cycle, but not out of order:


Cycle 1: LD r1, (r2)
Cycle 2: ADD r3, r4, r1      SUB r4, r5, r6
Cycle 3: MUL r7, r4, r8      ASH r8, r9, r10
Cycle 4: SUB r11, r8, r12    DIV r12, r13, r14
Cycle 5: ST (r15), r12

With register renaming, the sequence can be issued in 2 cycles, because we can issue instructions
that originally had WAR dependencies out of order:

Cycle 1: LD hw1, (hw2)        SUB hw16, hw5, hw6     ASH hw17, hw9, hw10      DIV hw18, hw13, hw14
Cycle 2: ADD hw3, hw4, hw1    MUL hw7, hw16, hw8     SUB hw11, hw17, hw12     ST (hw15), hw18
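
Both schedules can be reproduced with a small greedy issue simulator. This is a sketch under stated assumptions, not a model of any real processor: an instruction may issue once every earlier writer of its sources (RAW) or of its destination (WAW) has issued in a previous cycle, and once every earlier reader of its destination (WAR) has issued in the same or a previous cycle. The functions `issue_cycles` and `rename`, and the fresh register names `p1`, `p2`, ..., are hypothetical.

```python
# Each instruction is (opcode, destination register or None, source registers).
program = [
    ("LD",  "r1",  ("r2",)),
    ("ADD", "r3",  ("r4", "r1")),
    ("SUB", "r4",  ("r5", "r6")),
    ("MUL", "r7",  ("r4", "r8")),
    ("ASH", "r8",  ("r9", "r10")),
    ("SUB", "r11", ("r8", "r12")),
    ("DIV", "r12", ("r13", "r14")),
    ("ST",  None,  ("r15", "r12")),
]

def issue_cycles(code, width=4):
    """Greedily assign an issue cycle to each instruction (1-cycle latencies)."""
    cycle_of = [None] * len(code)
    cycle = 0
    while None in cycle_of:
        cycle += 1
        issued = 0
        for i, (_, dest, srcs) in enumerate(code):
            if cycle_of[i] is not None or issued == width:
                continue
            ready = True
            for j in range(i):
                _, earlier_dest, earlier_srcs = code[j]
                when = cycle_of[j]
                # RAW / WAW: the earlier writer must have issued in a previous cycle.
                if earlier_dest is not None and (earlier_dest in srcs or earlier_dest == dest):
                    if when is None or when >= cycle:
                        ready = False
                # WAR: the earlier reader must have issued by this cycle.
                if dest is not None and dest in earlier_srcs:
                    if when is None or when > cycle:
                        ready = False
            if ready:
                cycle_of[i] = cycle
                issued += 1
    return cycle_of

def rename(code):
    """Give every destination a fresh register, removing WAR and WAW dependencies."""
    mapping, renamed = {}, []
    for number, (op, dest, srcs) in enumerate(code, start=1):
        new_srcs = tuple(mapping.get(s, s) for s in srcs)
        new_dest = None
        if dest is not None:
            new_dest = f"p{number}"  # hypothetical fresh physical register name
            mapping[dest] = new_dest
        renamed.append((op, new_dest, new_srcs))
    return renamed

without_renaming = issue_cycles(program)
with_renaming = issue_cycles(rename(program))
print(without_renaming, max(without_renaming))  # [1, 2, 2, 3, 3, 4, 4, 5] 5
print(with_renaming, max(with_renaming))        # [1, 2, 1, 2, 1, 2, 1, 2] 2
```

Unlike the hand-written renamed sequence, this sketch renames every destination rather than only the ones involved in WAR dependencies; the resulting schedule is the same.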
