
University of Southern California

Department of Electrical Engineering
EE557 Fall 2K16
Instructor: Michel Dubois
Section: 30630R and 30628D
Project #3, Due: 5 PM, Tuesday, November 29th

I. Extending Project 2
Project 3 builds on the experience you gained in Project 2 configuring architectural simulators. In this
project your goal is to redesign the baseline processor by changing several micro-architectural blocks,
such as branch predictors, register update units, etc., to improve the performance of the baseline
processor. You will iteratively search for the best design choice for each of the
micro-architectural blocks by exploring the design space using simulations. Again, this task can be
accomplished without modifying the code, simply (and intelligently) by changing
the simulation parameters in the configuration file, as you already did in Project 2.
Unless otherwise stated, every detail in Project 3 stays the same as in Project 2. In particular, the
simulator and the benchmark locations, baseline configuration, and all other project environments are
identical to Project 2.

II. Project Description
In this project you are given a MAXIMUM transistor and area budget. Your goal is to change any
combination of the micro-architectural blocks listed below to achieve the best performance for four
benchmark programs. We will measure the performance as:
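\[
\frac{\sum_{\text{benchmarks}} \left(\#\text{ of committed instructions}\right)}
     {\sum_{\text{benchmarks}} \left(\#\text{ of cycles} \times \text{clock cycle period}\right)}
\quad \text{(in MIPS)}
\]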

The four benchmark programs are bitcnt, equake, bzip2, and art, as described in the Project Environment
section. For instance, if each benchmark commits 1 million instructions, the four benchmarks take
1, 2, 3, and 4 million cycles respectively, and the clock cycle time is 1 ns, then performance is
computed as follows:
\[
\frac{(1 + 1 + 1 + 1)\ \text{million instructions}}
     {(1 + 2 + 3 + 4)\ \text{million cycles} \times 1\ \text{ns}}
= \frac{4\ \text{million instructions}}{10 \times 10^{-3}\ \text{seconds}}
= 400\ \text{MIPS}
\]
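The same calculation can be scripted so that each design iteration is scored the same way. The following is only a minimal sketch: the function name `aggregate_mips` and its argument layout are assumptions made for illustration, and the instruction and cycle counts must be copied by hand from your own simulation result files.

```python
def aggregate_mips(instructions, cycles, clock_period_ns):
    """Overall MIPS: total committed instructions divided by total execution time.

    instructions    -- committed instruction count per benchmark
    cycles          -- simulated cycle count per benchmark (same order)
    clock_period_ns -- clock cycle period in nanoseconds
    """
    if len(instructions) != len(cycles):
        raise ValueError("need exactly one cycle count per benchmark")
    total_instructions = sum(instructions)
    total_time_ns = sum(cycles) * clock_period_ns
    # 1 instruction per ns equals 1000 million instructions per second.
    return total_instructions / total_time_ns * 1e3


# Worked example from the text: 1 million instructions per benchmark,
# 1, 2, 3 and 4 million cycles, 1 ns clock period.
example = aggregate_mips(
    [1_000_000] * 4,
    [1_000_000, 2_000_000, 3_000_000, 4_000_000],
    1.0,
)
print(f"{example:.1f} MIPS")  # prints "400.0 MIPS"
```

Note that this is a ratio of sums, not the average of the four per-benchmark MIPS values. In the example the per-benchmark rates are 1000, 500, about 333, and 250 MIPS, whose plain average is about 521 MIPS, so the slowest benchmark weighs more heavily in the metric actually used.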

The transistor count (including every component) and the area budget are given below. Your design is
NOT allowed to exceed either of them. The budget will be measured by the Real Estate Estimator tool.
Designs over the budget will receive 0 points.
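Because an over-budget design scores zero, it may help to check every candidate before simulating it. This is a small sketch only: it uses the limits stated in this section (200 million transistors, 25 mm2), the helper name `budget_violations` is hypothetical, and the inputs are assumed to be typed in by hand from the Real Estate Estimator spreadsheet.

```python
MAX_TRANSISTORS = 200_000_000  # transistor limit stated in this handout
MAX_AREA_MM2 = 25.0            # area limit stated in this handout


def budget_violations(transistors, area_mm2):
    """Return a list of human-readable budget violations (empty if within budget)."""
    problems = []
    # The design may not exceed either limit; meeting a limit exactly is allowed.
    if transistors > MAX_TRANSISTORS:
        problems.append(f"transistor count {transistors} exceeds {MAX_TRANSISTORS}")
    if area_mm2 > MAX_AREA_MM2:
        problems.append(f"area {area_mm2} mm2 exceeds {MAX_AREA_MM2} mm2")
    return problems


# Hypothetical estimator outputs, for illustration only.
print(budget_violations(180_000_000, 24.1))  # prints []
print(budget_violations(210_000_000, 24.1))
```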

Transistor count: 200 million
Area: 25 mm2

You are allowed to change only the following micro-architectural blocks:

Dynamic Branch Predictor (note 1)
Branch Target Buffer
Size of Return Address Stack
Machine Width (issue/decode/commit per cycle)
Instruction Fetch Queue Size
Register Update Unit Size (notes 2, 3) (must be equal to or larger than 32 entries)
Load/Store Queue Size
Number of Integer ALUs and Multiplier/Divider Units
Number of Floating-point ALUs and Multiplier/Divider Units
Number of Memory Ports
Caches (Size, Associativity, Replacement Algorithm, Block Size) (notes 2, 4)

Notes:
1 The perfect branch predictor is not allowed.
2 Remember that when you change your RUU or cache structures, the number of read and write ports will
be affected. So, each time you change one of those, you need to check the estimator tool for any change
in the number of ports.
3 The RUU size must be equal to or larger than 32 entries. Any number under 32 is NOT allowed.
4 The address space is assumed to be 42 bits, and the number of bits per tag (Nr. of Bits per Tag in
CACTI) should be calculated based on the cache size and structure, and then use CACTI to compute access
time and latencies.

Basic Project Steps
Here are the new steps for doing this project:
1. First, repeat steps 1-6 of Project 2. We will use CACTI, SimpleScalar and the Real Estate Estimator
that we already used in Project 2.
2. In this step you will look at the result files generated by the SimpleScalar simulation tool and
decide which one of the allowed micro-architectural blocks you want to change. Since the SimpleScalar
result file contains various block access counts, cache misses, hits, etc., there is no need to change
the code. For instance, you can increase or decrease the sizes of the components, change the cache
associativities, or change the cache replacement policies, as necessary. Please keep in mind that as
you increase or decrease some of the sizes, your CPU clock period and the access time of your memory
structures will be affected. Obviously accessing a 16KB L1 cache should be much faster than accessing a
1MB L1 cache! So you should appropriately adjust the latency of any structure that is affected. Again,
use the CACTI tool to come up with latency estimates. Keep in mind that you cannot exceed the area and
the transistor count limits specified above when you increase the structure sizes. So be clever about
which structure to change and by how much.
3. Once you change one or more micro-architectural parameters, you redo steps 1 through 6 of
Project #1. Also, make sure that the clock cycle latency is appropriately adjusted to reflect the new
structure sizes. Look at the new MIPS rating of the processor with your enhanced processor
configuration. Compare it with all prior configurations. Iterate the steps till you think you have the
world's best processor. Finally, you will generate a report that shows how you iterated through the
design space and why you made those design choices. Support your arguments with charts and compelling
arguments.

III. Project Environment
The project environment is the same as that of Project 2, except that you need to copy the additional
benchmarks and inputs from the class directory. Please copy the following benchmarks with all other
necessary files into your directory, in addition to bzip2 and art that were used in Project 2:

Executables                 Input Files   Commands
bitcnt (/ee557d/mibench)    --            bitcnts 1125000
equake (/ee557d/spec2k)     equake.in     equake < equake.in

For all benchmarks, we will limit all our simulations to only 50 million instructions. We will
fast-forward through the first 300 million instructions. To do this, set the following parameters in
your configuration files or include them in your command line parameters.

IV. Project Submission
You must submit your final configuration file (FirstnameLastname_Proj3.conf), your Excel sheet of the
Real Estate Estimator tool (FirstnameLastname_Proj3.xls), and an electronic copy of your project report
(FirstnameLastname_Proj3.pdf) that includes the following, by the due date, on the Den class-page:

Section 1. Front page
a) Title: EE557 Fall 2016 Project #3 Report, b) Name: <your name>, c) your email address,
d) affiliation (optional) - 1 pt.

Section 2. Design Process
Description and discussion of your design process (your iteration process): for example, what design
progress and iteration you made to approach your final design, based on what results you observed and
how that observation affected your next step of design iteration - at least a half page. 2 pts.

Section 3. Intermediate Results
a) Intermediate average MIPS rates in a graph, b) RUU access times estimated from CACTI with converted
cycle times in a table, c) transistor count and area estimates from Real Estate Estimator in two
graphs, d) cache miss rates for all caches in a table - for 3 intermediate iterations. 2 pts.

Section 4. Final Design
a) MIPS rate, b) cycle time, c) area from Real Estate Estimator, d) transistor count from Real Estate
Estimator, e) cache latencies, f) cache miss rates - in a table. 1 pt.

Please keep all of your shell scripts and SimpleScalar config files, as they might be required to be
submitted or asked to be run by the TA.

V. Grading
Your final design will be evaluated based on the following criteria:
1. Performance (4 pts.) This part is evaluated by ranking the overall MIPS of all students. 0 points
will be given to designs over the transistor count or the area budget. 0 points will be given for a
mismatch between a reported MIPS and a MIPS from running a config file.
2. Report (6 pts.) From the report pdf.

Like other assignments, this project must be done INDIVIDUALLY! Similar designs will be scrutinized.
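Worked sketch for the cache tag-width note and the latency adjustment: the handout asks for the number of tag bits to be derived from a 42-bit address space and for CACTI access times to be converted into cycle latencies. The code below is a sketch under stated assumptions, not the course's official method. It assumes the usual set-associative address breakdown (tag = address bits - index bits - block offset bits) and assumes that an access time is rounded up to a whole number of clock cycles; follow the Project 2 instructions if they define either step differently. The function names are hypothetical.

```python
import math

ADDRESS_BITS = 42  # address-space width given in the handout


def _log2_exact(value, name):
    """Return log2(value), requiring value to be a positive power of two."""
    if value <= 0 or value & (value - 1):
        raise ValueError(f"{name} must be a power of two, got {value}")
    return value.bit_length() - 1


def tag_bits(cache_bytes, block_bytes, associativity, address_bits=ADDRESS_BITS):
    """Tag width, assuming tag = address bits - set index bits - block offset bits."""
    if cache_bytes % (block_bytes * associativity):
        raise ValueError("cache size must be a multiple of block size * associativity")
    num_sets = cache_bytes // (block_bytes * associativity)
    offset_bits = _log2_exact(block_bytes, "block size")
    index_bits = _log2_exact(num_sets, "number of sets")
    return address_bits - index_bits - offset_bits


def latency_cycles(access_time_ns, clock_period_ns):
    """Assumed conversion: round the access time up to whole cycles, minimum 1."""
    return max(1, math.ceil(access_time_ns / clock_period_ns))


# 16 KB, 32-byte blocks, 4-way: 128 sets -> 7 index bits, 5 offset bits.
print(tag_bits(16 * 1024, 32, 4))      # prints 30
# 1 MB, 64-byte blocks, 8-way: 2048 sets -> 11 index bits, 6 offset bits.
print(tag_bits(1024 * 1024, 64, 8))    # prints 25
# Hypothetical 1.4 ns access time with a 0.5 ns clock.
print(latency_cycles(1.4, 0.5))        # prints 3
```

Raising the associativity at a fixed cache size reduces the number of sets, so the tag grows by one bit for each doubling; the tag width therefore has to be recomputed for every cache configuration before it is entered into CACTI.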