Ca Mid1 2017

The University of Texas at Dallas
Department of Electrical and Computer Engineering
Midterm Test
Student Name: ________________________________ Student ID: ____________________
Question   Points   Score
1          16
2          16
3          16
4          16
Total      64
Subject : Computer Architecture Subject Code : EE(CE) 6304

Session : Semester 2, 2016/2017 Date : 24 Feb 2017
Time Allowed : 1 hour Time : 4:00-5:00pm
Lecturer : B. Carrion Schaefer
This question paper has 9 pages including this cover page and attachments
Instructions to Candidates: This question paper contains FOUR (4) questions.

Answer ALL questions.
You may write on the back of each page if needed.
You may find the information in the attachments helpful in answering the questions.
Be concise. You will be penalized for verbosity.
Write legibly.
DO NOT TURN OVER THIS PAGE UNTIL YOU ARE TOLD TO DO SO

WRITE YOUR NAME AND STUDENT NUMBER ON EACH PAGE

Question 1
(a) Enumerate and describe the 3 main steps that every microprocessor executes.
(2 marks)
Solution:
Fetch → Decode → Execute.
The processor fetches an instruction from memory, decodes the fetched instruction to determine the operation and its operands, and executes it.
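As an illustration only, the three steps can be sketched as the main loop of an interpreter for a hypothetical one-register accumulator machine. The opcodes `LOADI`, `ADDI` and `HALT` and the tuple encoding are invented for this sketch and are not part of the exam.

```python
def run(program):
    """Run a list of (opcode, operand) tuples on a hypothetical accumulator machine."""
    acc = 0   # the single accumulator register
    pc = 0    # program counter: index of the next instruction
    while pc < len(program):
        instr = program[pc]          # 1. fetch the instruction the PC points to
        pc += 1
        opcode, operand = instr      # 2. decode it into an opcode and an operand
        if opcode == "LOADI":        # 3. execute the decoded operation
            acc = operand
        elif opcode == "ADDI":
            acc += operand
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    return acc


print(run([("LOADI", 5), ("ADDI", 3), ("HALT", None)]))  # prints 8
```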
(b) A program spends 75% of its time doing multiply instructions. If the multiplier is sped up by
3x, how much faster does the application run?
Solution:
The program is 2x faster. (75% of the time is 3x faster, so that 75% now takes up 1/3 of that
time, or only 25% of the original execution time. We have effectively eliminated 50% of the
total execution time so the program is 2x faster.)
$$T_{new} = T_{old}\left(0.25 + \frac{0.75}{3}\right) = 0.5\,T_{old} \;\Rightarrow\; \text{Speedup} = \frac{T_{old}}{T_{new}} = 2$$
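The same calculation is Amdahl's law. A minimal sketch, where `fraction` is the share of the original run time that is accelerated and `factor` is how much faster that part becomes (both names chosen for this example):

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of the original time is sped up by `factor`."""
    new_time = (1 - fraction) + fraction / factor   # normalized to T_old = 1
    return 1 / new_time


print(amdahl_speedup(0.75, 3))  # prints 2.0
```

Note that even an infinitely fast multiplier could not do better than 1 / 0.25 = 4x here, because the remaining 25% of the time is untouched.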
(c) How many memory accesses does the following code require? Explain them. (INC=increment
instruction)
(2 marks)
INC (r1)
Solution:
This instruction involves two memory accesses: the first reads the operand stored in the data segment at the address held in r1, and the second stores the incremented value back to the same address in the data segment.
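The read-modify-write behaviour of `INC (r1)` can be sketched with a data memory that counts its accesses. `CountingMemory` and `inc_indirect` are hypothetical helpers for illustration; like the answer above, the sketch counts only data accesses and does not model the instruction fetch.

```python
class CountingMemory:
    """Data memory that counts reads and writes (instruction fetch not modelled)."""

    def __init__(self, size):
        self.cells = [0] * size
        self.reads = 0
        self.writes = 0

    def load(self, addr):
        self.reads += 1
        return self.cells[addr]

    def store(self, addr, value):
        self.writes += 1
        self.cells[addr] = value


def inc_indirect(mem, regs, r):
    """INC (r): increment the memory word whose address is held in register r."""
    addr = regs[r]
    value = mem.load(addr)       # access 1: read the operand
    mem.store(addr, value + 1)   # access 2: write the result back


mem = CountingMemory(16)
mem.cells[4] = 41
inc_indirect(mem, {"r1": 4}, "r1")
print(mem.cells[4], mem.reads, mem.writes)  # prints 42 1 1
```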
(d) What do VLIW, superscalar, and array processing concepts have in common?
(2 marks)
Solution:
All three execute multiple operations per cycle.
(e) A microprocessor manufacturer decides to advertise its newest chip based only on the
metric IPC (instructions per cycle). Is this a good metric? Why or why not? (Use fewer than
20 words.)
(2 marks)

Solution:
No, because the metric does not take into account frequency or number of executed
instructions, both of which affect execution time.
(f) If you were the chief architect for another company and were asked to design a chip to
compete based solely on this metric, what important design decision would you make (in fewer
than 20 words)?
(2 marks)
Solution:
Make the cycle time as long as possible and process many instructions per cycle.
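A small numeric sketch of why IPC alone is misleading: CPU time depends on instruction count, IPC and clock frequency together. The two chips below are hypothetical, with numbers chosen only for illustration; the one with the higher IPC (the kind of design suggested in (f), with a long cycle time) still takes longer to run the same program.

```python
def exec_time(instructions, ipc, freq_hz):
    """CPU time = instruction count / (IPC x clock frequency)."""
    return instructions / (ipc * freq_hz)


# Hypothetical chips running the same 1e9-instruction program.
t_high_ipc = exec_time(1e9, ipc=4, freq_hz=1e9)  # IPC 4 at 1 GHz -> 0.25 s
t_low_ipc = exec_time(1e9, ipc=2, freq_hz=3e9)   # IPC 2 at 3 GHz -> about 0.167 s
print(t_high_ipc, t_low_ipc)
```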
(g) Assume that the stack starts out empty. Write a stack-based program that computes
((10×8)+(4−7))²
(4 marks)
Solution:
Because the processor does not provide an instruction to compute the square of a value, you need
to compute (10×8)+(4−7) twice and multiply the two results.
PUSH 10
PUSH 8
MUL
PUSH 4
PUSH 7
SUB
ADD (at this point the stack contains the first result)
PUSH 10
PUSH 8
MUL
PUSH 4
PUSH 7
SUB
ADD (at this point, the stack contains the second result)
MUL
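The program above can be checked with a tiny stack-machine interpreter. This is a sketch under one stated assumption: a binary instruction pops the top entry as its second operand and the entry below it as its first, so `PUSH 4, PUSH 7, SUB` yields 4 − 7 = −3, which is the convention the solution relies on.

```python
import operator

# Binary operators: each pops the top two entries and pushes one result.
OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_stack(program):
    """Execute stack-machine instructions on an initially empty stack."""
    stack = []
    for instr in program:
        op, *args = instr.split()
        if op == "PUSH":
            stack.append(int(args[0]))
        elif op in OPS:
            top = stack.pop()     # second operand (pushed last)
            below = stack.pop()   # first operand
            stack.append(OPS[op](below, top))  # e.g. SUB computes below - top
        else:
            raise ValueError(f"unknown instruction: {instr}")
    return stack


HALF = ["PUSH 10", "PUSH 8", "MUL", "PUSH 4", "PUSH 7", "SUB", "ADD"]
PROGRAM = HALF + HALF + ["MUL"]  # compute (10x8)+(4-7) twice, then multiply
print(run_stack(PROGRAM))  # prints [5929], i.e. 77 squared
```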

Question 2.
(a) The performance of two computers needs to be benchmarked. For this purpose, a set of
benchmark programs is used. Table I shows the characteristics of the benchmarks in terms of
number of instructions. Compute the SPEC rating and the speed-up factor of the faster machine
over the other. Machine A runs at 1.0 GHz and has an average cycles per instruction (CPI) of 2.5.
Machine B runs at 1.3 GHz and its CPI is 3.
Table I. Benchmark characteristics in number of instructions

                Bench 1       Bench 2       Bench 3
Instructions    10,000,000    15,000,000    35,000,000
(10 marks)
Solution:
Execution time = (number of instructions × CPI) / frequency
(Below, instruction counts are in millions and frequencies in MHz, so times are in seconds.)

Machine A:
Bench 1 exec time = (10 × 2.5)/1000 = 0.025 s
Bench 2 exec time = (15 × 2.5)/1000 = 0.0375 s
Bench 3 exec time = (35 × 2.5)/1000 = 0.0875 s
Machine B:
Bench 1 exec time = (10 × 3)/1300 = 0.0231 s
Bench 2 exec time = (15 × 3)/1300 = 0.0346 s
Bench 3 exec time = (35 × 3)/1300 = 0.0808 s
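Here the SPEC rating is taken as the geometric mean of the execution times, so lower is better:

$$\text{SPEC}_A = \sqrt[3]{0.025 \times 0.0375 \times 0.0875} \approx 0.0435$$

$$\text{SPEC}_B = \sqrt[3]{0.0231 \times 0.0346 \times 0.0808} \approx 0.0401$$

Machine B is faster: Speed-up = SPEC_A / SPEC_B = 0.04345/0.04011 ≈ 1.083, so machine B is about 8.3% faster. (Because each benchmark has the same instruction count on both machines, the ratio is exactly (2.5/1.0)/(3/1.3) = 1.083.)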
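The SPEC computation can be reproduced with a short script. As in the solution, the "SPEC rating" here is assumed to be the geometric mean of the raw execution times (no reference machine), so a lower value means a faster machine. `math.prod` requires Python 3.8 or later.

```python
from math import prod

INSTRUCTIONS = [10_000_000, 15_000_000, 35_000_000]  # Bench 1-3, Table I


def exec_times(cpi, freq_hz):
    """Execution time of each benchmark: instructions * CPI / frequency."""
    return [n * cpi / freq_hz for n in INSTRUCTIONS]


def geometric_mean(values):
    return prod(values) ** (1 / len(values))


spec_a = geometric_mean(exec_times(cpi=2.5, freq_hz=1.0e9))
spec_b = geometric_mean(exec_times(cpi=3.0, freq_hz=1.3e9))
speedup = spec_a / spec_b  # > 1 means machine B is faster
print(f"SPEC_A = {spec_a:.5f} s, SPEC_B = {spec_b:.5f} s, speed-up = {speedup:.3f}")
```

Computing the ratio from unrounded values avoids the small rounding drift that appears when dividing the 4-digit SPEC values.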
(b) Consider a processor with the following instructions, each encoded in 16 bits (field widths in bits):

Instruction      Opcode   16-bit encoding                                                        Function
MOV r1, d        0000     opcode (4) | destination register (4) | address (8)                    R1 ← d
MOV d, r1        0001     opcode (4) | source register (4) | address (8)                         d ← R1
ADD r1, r2, r3   0010     opcode (4) | destination register (4) | source register (4) | source register (4)   R1 ← r2 + r3
MOV r1, #c       0011     opcode (4) | destination register (4) | constant (8)                   R1 ← c
SUB r1, r2, r3   0100     opcode (4) | destination register (4) | source register (4) | source register (4)   R1 ← r2 − r3
JMP r1, X        1010     opcode (4) | source register (4) | offset (8)                          if (r1 == 0) PC ← PC + offset
Assume that you want to augment this ISA to support 20 additional, unique instructions (e.g.,
MUL, AND, OR), while still keeping the instruction encoding at 16 bits. How will the
execution and encoding of the ADD instruction be affected? (Other instructions could be affected
too, but you only need to comment on how the ADD instruction will be impacted.)
(6 marks)
Solution:
The ISA will need 6 + 20 = 26 distinct opcodes. Four opcode bits can encode only 2^4 = 16 opcodes,
so the number of bits dedicated to the opcode must increase from 4 to 5 (2^5 = 32). Because the
instruction stays 16 bits long, ADD has only 11 bits left for its three register fields instead of 12, so
there will be one fewer bit to encode the number of the destination register or of one of the source
registers. Given this change:
• An extra bit will be needed to specify the opcode.
• Either the result of the ADD can only be written to registers 0 through 7, OR
• One of the source registers can only be register 0 through 7.
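A sketch of the encoding change. The original ADD layout follows the table above; the 5-bit layout is a hypothetical choice that takes the extra bit from the destination field (one of the two options listed), and keeping ADD's opcode value as 2 is also an assumption.

```python
def opcode_bits(num_instructions):
    """Smallest opcode width that gives every instruction a unique opcode."""
    return max(1, (num_instructions - 1).bit_length())


def encode_add_4bit(opcode, rd, rs1, rs2):
    """Original format: [opcode:4 | rd:4 | rs1:4 | rs2:4]."""
    if opcode >= 16 or max(rd, rs1, rs2) >= 16:
        raise ValueError("field does not fit")
    return (opcode << 12) | (rd << 8) | (rs1 << 4) | rs2


def encode_add_5bit(opcode, rd, rs1, rs2):
    """Hypothetical format: [opcode:5 | rd:3 | rs1:4 | rs2:4]; rd limited to r0-r7."""
    if opcode >= 32 or rd >= 8 or max(rs1, rs2) >= 16:
        raise ValueError("field does not fit")
    return (opcode << 11) | (rd << 8) | (rs1 << 4) | rs2


print(opcode_bits(6 + 20))                     # prints 5
print(hex(encode_add_4bit(0b0010, 1, 2, 3)))   # prints 0x2123
print(hex(encode_add_5bit(0b00010, 1, 2, 3)))  # prints 0x1123
```

With the new layout, `ADD r8, r2, r3` can no longer be encoded, which is exactly the restriction described in the answer.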

Question 3
(a) Assume that spell checking a large file requires 820,000,000 instructions. The instructions
in the program are divided into 4 classes, and each class requires a fixed number of clock cycles
to execute. Specific information is given in the table below.
Instruction class   Clock cycles per instruction   Number of instructions
Branch              3                              150,000,000
Store               4                              185,000,000
Load                5                              260,000,000
ALU                 4                              225,000,000
If the total execution time for this program is found to be 1.57 seconds, what is the clock cycle
time of the computer on which it was run? Show your calculations.
(6 marks)
Solution:
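Applying the CPU time formula, with N denoting the clock cycle time (time per cycle):

$$\text{CPU time} = \frac{\text{instructions}}{\text{program}} \times \frac{\text{cycles}}{\text{instruction}} \times \frac{\text{time}}{\text{cycle}} = 1.57\ \text{s}$$

$$820{,}000{,}000 \times \left(3\cdot\frac{150{,}000{,}000}{820{,}000{,}000} + 4\cdot\frac{185{,}000{,}000}{820{,}000{,}000} + 5\cdot\frac{260{,}000{,}000}{820{,}000{,}000} + 4\cdot\frac{225{,}000{,}000}{820{,}000{,}000}\right) \times N = 3{,}390{,}000{,}000\,N = 1.57\ \text{s}$$

Thus, the clock cycle time is N = 1.57 s / 3.39×10⁹ ≈ 4.63×10⁻¹⁰ s, corresponding to a clock rate of about 2.16 GHz.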
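The same arithmetic as a short script: total cycles are the sum of cycles-per-instruction times instruction count over the four classes, and the cycle time follows from dividing the measured 1.57 s by that total.

```python
# Instruction class: (clock cycles per instruction, number of instructions).
CLASSES = {
    "Branch": (3, 150_000_000),
    "Store": (4, 185_000_000),
    "Load": (5, 260_000_000),
    "ALU": (4, 225_000_000),
}

total_cycles = sum(cpi * count for cpi, count in CLASSES.values())
cycle_time = 1.57 / total_cycles   # seconds per cycle
clock_rate = 1 / cycle_time        # Hz
print(total_cycles, f"{cycle_time:.3e} s", f"{clock_rate / 1e9:.2f} GHz")
```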
(b) Assume that, in the 820,000,000-instruction spell check, 25% of all load instructions are
immediately followed by an ALU-type instruction that uses the data that was just loaded. To
speed this program up, you are thinking about adding a new type of instruction: an ALU
instruction where one of the source operands is a value from memory. In particular:
• This new instruction will replace the previous two-instruction sequence
• It will take 7 clock cycles
Will this change offer any speedup over the original design? If so, how much?
You may assume that the clock rate does not change, and your answer to this question does not
depend on your answer to question 3(a).
(10 marks)
Solution:

We need to apply the CPU time formula again, but first we calculate the new numbers of load,
ALU and new-type instructions:
The number of branches remains constant: 150,000,000
The number of stores remains constant: 185,000,000
The new number of loads is 260,000,000 × 0.75 = 195,000,000
The new number of ALU instructions is 225,000,000 − 65,000,000 = 160,000,000
The number of new instructions is 260,000,000 × 0.25 = 65,000,000
Total: 755,000,000

$$755{,}000{,}000 \times \left(3\cdot\frac{150{,}000{,}000}{755{,}000{,}000} + 4\cdot\frac{185{,}000{,}000}{755{,}000{,}000} + 5\cdot\frac{195{,}000{,}000}{755{,}000{,}000} + 4\cdot\frac{160{,}000{,}000}{755{,}000{,}000} + 7\cdot\frac{65{,}000{,}000}{755{,}000{,}000}\right) \times N = 3{,}260{,}000{,}000\,N$$

The corresponding expression in question 3(a) gives 3,390,000,000 N. Since the clock rate is unchanged, the speedup is 3,390,000,000 / 3,260,000,000 ≈ 1.04, so the new design is about 4% faster (its execution time is about 3.8% shorter).
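A sketch that recomputes the instruction mix and the speedup. Because the clock rate is assumed unchanged, the speedup is simply the ratio of total cycle counts; the class name "Load-ALU" for the new instruction is a label chosen for this example.

```python
# Instruction class: (clock cycles per instruction, number of instructions).
ORIGINAL = {"Branch": (3, 150_000_000), "Store": (4, 185_000_000),
            "Load": (5, 260_000_000), "ALU": (4, 225_000_000)}

fused = ORIGINAL["Load"][1] * 25 // 100  # load+ALU pairs replaced: 65,000,000
NEW = {"Branch": (3, 150_000_000), "Store": (4, 185_000_000),
       "Load": (5, 260_000_000 - fused), "ALU": (4, 225_000_000 - fused),
       "Load-ALU": (7, fused)}


def total_cycles(mix):
    return sum(cpi * count for cpi, count in mix.values())


# Same clock rate, so the speedup is the ratio of total cycle counts.
speedup = total_cycles(ORIGINAL) / total_cycles(NEW)
print(total_cycles(NEW), round(speedup, 3))  # prints 3260000000 1.04
```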

Question 4
(a) Given an unpipelined processor with a 10 ns cycle time and pipeline latches with 0.5 ns latency,
what are the cycle times of pipelined versions of the processor with 2, 4, 8 and 16 stages if the
datapath logic is evenly distributed among the pipeline stages? Also, what is the latency of
each of the pipelined versions of the processor?
(10 marks)
Solution:
$$\text{Cycle time}_{\text{pipelined}} = \frac{\text{Cycle time}_{\text{unpipelined}}}{\text{Number of pipeline stages}} + \text{Pipeline latch latency}$$
Applying this formula gives cycle times of 5.5, 3, 1.75 and 1.125 ns for 2, 4, 8 and 16 stages,
respectively, showing the diminishing returns of pipelining as the pipeline latch latency becomes a
significant part of the overall cycle time.
To compute the latency of each processor, multiply the cycle time by the number of
pipeline stages, giving latencies of 11, 12, 14 and 18 ns.
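The formula and the latency calculation, assuming (as the question states) that the 10 ns of logic splits evenly across the stages and each stage adds one 0.5 ns latch:

```python
def pipeline_timing(unpipelined_ns, latch_ns, stages):
    """Return (cycle time, latency) in ns for an evenly divided pipeline."""
    cycle = unpipelined_ns / stages + latch_ns
    return cycle, cycle * stages


for k in (2, 4, 8, 16):
    cycle, latency = pipeline_timing(10.0, 0.5, k)
    print(f"{k:2d} stages: cycle time {cycle} ns, latency {latency} ns")
```

Latency grows by 0.5 ns per added stage, while the cycle time can never drop below the 0.5 ns latch overhead.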
(b) How long would the code sequence below take to issue, both without and with register renaming,
on an out-of-order superscalar processor with 4 execution units, each of which can execute any operation?
Assume all instructions have latencies of 1 cycle, use the greedy scheduling assumption, and
assume that the processor's instruction window is large enough to cover the entire code
sequence.
LD r1, (r2)
ADD r3, r4, r1
SUB r4, r5, r6
MUL r7, r4, r8
ASH r8, r9, r10
SUB r11, r8, r12
DIV r12, r13, r14
ST (r15), r12
Solution:
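Without register renaming, the sequence takes 5 cycles to issue: an instruction with a RAW
dependency must wait one cycle for its producer, and an instruction with a WAR dependency can
issue in the same cycle as the earlier reader, but not before it (not out of order):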
Cycle 1: LD r1, (r2)
Cycle 2: ADD r3, r4, r1 | SUB r4, r5, r6
Cycle 3: MUL r7, r4, r8 | ASH r8, r9, r10
Cycle 4: SUB r11, r8, r12 | DIV r12, r13, r14
Cycle 5: ST (r15), r12
With register renaming, the sequence can be issued in 2 cycles, because we can issue instructions
that originally had WAR dependencies out of order:
Cycle 1: LD hw1, (hw2) | SUB hw16, hw5, hw6 | ASH hw17, hw9, hw10 | DIV hw18, hw13, hw14
Cycle 2: ADD hw3, hw4, hw1 | MUL hw7, hw16, hw8 | SUB hw11, hw17, hw12 | ST (hw15), hw18
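Both schedules can be reproduced with a small greedy list scheduler. This is a sketch under stated assumptions: memory operands are assumed not to alias (so LD and ST are independent), a WAW pair (absent in this code) is assumed to issue in strictly later cycles, and renaming is modelled by keeping only RAW dependences, with each reader taking its value from the latest earlier writer in program order.

```python
from collections import defaultdict

# (text, destination register or None, source registers)
CODE = [
    ("LD r1, (r2)", "r1", ("r2",)),
    ("ADD r3, r4, r1", "r3", ("r4", "r1")),
    ("SUB r4, r5, r6", "r4", ("r5", "r6")),
    ("MUL r7, r4, r8", "r7", ("r4", "r8")),
    ("ASH r8, r9, r10", "r8", ("r9", "r10")),
    ("SUB r11, r8, r12", "r11", ("r8", "r12")),
    ("DIV r12, r13, r14", "r12", ("r13", "r14")),
    ("ST (r15), r12", None, ("r15", "r12")),
]


def issue_cycles(code, units=4, latency=1, renaming=False):
    """Greedy issue: in program order, place each instruction in the earliest
    cycle that respects its dependences and has a free execution unit."""
    last_write = {}               # register -> issue cycle of its latest writer
    last_read = defaultdict(int)  # register -> latest issue cycle of a reader
    busy = defaultdict(int)       # cycle -> execution units already in use
    cycles = []
    for _, dst, srcs in code:
        earliest = 1
        for reg in srcs:          # RAW: wait until the producer's result is ready
            if reg in last_write:
                earliest = max(earliest, last_write[reg] + latency)
        if dst is not None and not renaming:
            # WAR: same cycle as the earlier reader is allowed, but not before it
            earliest = max(earliest, last_read[dst])
            # WAW (assumption): issue strictly after the earlier writer
            if dst in last_write:
                earliest = max(earliest, last_write[dst] + 1)
        cycle = earliest
        while busy[cycle] >= units:  # all units taken: try the next cycle
            cycle += 1
        busy[cycle] += 1
        cycles.append(cycle)
        for reg in srcs:
            last_read[reg] = max(last_read[reg], cycle)
        if dst is not None:
            last_write[dst] = cycle
    return cycles


print(max(issue_cycles(CODE)), max(issue_cycles(CODE, renaming=True)))  # prints 5 2
```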