Computer Architecture EE6304 Project #1

PROJECT #1
Computer Architecture
EE6304
Due Date: 3/8/2012
Team Number:22

Submitted By:
Kaur Guneet
Patel Jayendra

Index
Objective

General description for Alpha 21264 EV6 configuration

Part1:Preparation

Part2:Find CPI

Part3:Optimize CPI For Each Benchmark

Procedure followed for each Benchmark

Plots for benchmarks: GCC, Anagram, GO

Data tables for lowest CPIs for benchmarks: GCC, Anagram, GO

Part4:Define Cost Function

Part5:Optimize Cache for Performance/Cost

References

Objective
In this project, we have been asked to fine-tune the cache hierarchy of an Alpha
microprocessor for three individual benchmarks. The cache design parameters we can modify
are:
Cache levels: One or two levels, for data and instruction caches.
Unified caches: Selection of separate vs. unified instruction/data caches. For example,
you can have separate L1 caches and a unified L2 cache.
Size: Cache size, one of the most important choices.
Associativity: Selection of cache associativity (e.g. direct mapped, 2-way set associative,
etc.).
Block size: Block size of the cache, usually 64 or 32 bytes.
Block replacement policy: Selection between FIFO, LRU and Random.
While larger caches generally mean better performance, they also come at a greater cost. Thus,
sensible design choices and trade-offs are required. At the end of this project, we also
define a cost function and use it to identify the optimal configuration.

General description for Alpha 21264 EV6 configuration


Architecture
Alpha architecture is a 64-bit load and store RISC architecture designed with particular emphasis
on speed, multiple instruction issue, multiple processors, and software migration from many
operating systems.
All registers are 64 bits long and all operations are performed between 64-bit registers.
All instructions are 32 bits long. Memory operations are either load or store operations.
All data manipulation is done between registers.
The Alpha architecture supports the following data types:
8-, 16-, 32-, and 64-bit integers
IEEE 32-bit and 64-bit floating-point formats
VAX architecture 32-bit and 64-bit floating-point formats
Addressing
The basic addressable unit in the Alpha architecture is the 8-bit byte. The 21264 supports a 48-bit or 43-bit virtual address (selectable under IPR control).
Virtual addresses as seen by the program are translated into physical memory addresses
by the memory-management mechanism. The 21264 supports a 44-bit physical address.

Alpha 21264 Microarchitecture


The 21264 microprocessor is a high-performance third-generation implementation of the
Compaq Alpha architecture. The 21264 consists of the following sections, as shown in the
figure below.
Instruction fetch, issue, and retire unit (Ibox)
Integer execution unit (Ebox)
Floating-point execution unit (Fbox)
Onchip caches (Icache and Dcache)
Memory reference unit (Mbox)
External cache and system interface unit (Cbox)
Pipeline operation sequence

Figure:Alpha 21264 Microarchitecture

Part1:Preparation
We followed all the steps given in Part1 of the project document to set up SimpleScalar.
The benchmarks were downloaded and verified to match the data given in the project handout.

Part2:Find CPI
Given:

Cache levels: Two levels.


Unified caches: Separate L1 data and instruction cache, unified L2 cache.
Size: 64K Separate L1 data and instruction caches, 1MB unified L2 cache.
Associativity: Two-way set-associative L1 caches, Direct-mapped L2 cache.
Block size: 64 bytes.
Block replacement policy: FIFO
L1 Miss Penalty=5 Cycles
L2 Miss Penalty=40 Cycles
Cache hit/Instruction execution=1 Cycle

Table showing data from running all three benchmarks

Metric                    CC1          GO           Anagram
Number of instructions    337353464    545823063    25729033
il1 accesses              337353464    545823063    25729033
il1 misses                1589871      716280       504
il1 miss rate             0.0047       0.0013       0.00000
dl1 accesses              122134327    211701947    9295564
dl1 misses                1287463      176428       18357
dl1 miss rate             0.0105       0.0008       0.002
dl2 accesses              3265974      955573       24107
dl2 misses                401908       69334        5252
dl2 miss rate             0.1231       0.0726       0.2179

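Calculating CPI:

$$\mathrm{CPI} = 1 + \frac{\left(A_{il1}\, m_{il1} + A_{dl1}\, m_{dl1}\right) P_{L1} + A_{L2}\, m_{L2}\, P_{L2}}{N}$$

where $A$ is the number of accesses and $m$ the miss rate of each cache, $P_{L1} = 5$ and $P_{L2} = 40$ are the miss penalties in cycles, and $N$ is the number of instructions.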

Benchmark    CPI
CC1          1.06634195
GO           1.01313547
ANAGRAM      1.01177939
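
The CPI values can be cross-checked with a short script. This is a minimal sketch, assuming the CPI model is one base cycle plus the miss penalties weighted by accesses per instruction and by the rounded miss rates from the Part2 table; the function name `cpi` and the dictionary layout are illustrative and are not taken from the project's own script.

```python
# Miss penalties given in Part2 (cycles).
L1_PENALTY = 5
L2_PENALTY = 40

# (accesses, miss rate) per cache, copied from the Part2 table.
BENCHMARKS = {
    "CC1": {
        "instructions": 337353464,
        "il1": (337353464, 0.0047),
        "dl1": (122134327, 0.0105),
        "l2": (3265974, 0.1231),
    },
    "GO": {
        "instructions": 545823063,
        "il1": (545823063, 0.0013),
        "dl1": (211701947, 0.0008),
        "l2": (955573, 0.0726),
    },
    "Anagram": {
        "instructions": 25729033,
        "il1": (25729033, 0.0),
        "dl1": (9295564, 0.002),
        "l2": (24107, 0.2179),
    },
}


def cpi(stats, l1_penalty=L1_PENALTY, l2_penalty=L2_PENALTY):
    """Base CPI of 1 plus stall cycles per instruction (assumed model)."""
    stall_cycles = 0.0
    for cache, penalty in (("il1", l1_penalty), ("dl1", l1_penalty), ("l2", l2_penalty)):
        accesses, miss_rate = stats[cache]
        stall_cycles += accesses * miss_rate * penalty
    return 1.0 + stall_cycles / stats["instructions"]


for name, stats in BENCHMARKS.items():
    print(f"{name}: {cpi(stats):.6f}")
```

Under these assumptions the script reproduces the reported GO and Anagram values. For CC1 the same model gives about 1.090 with the stated 40-cycle L2 penalty; the reported 1.06634195 is matched only when the L2 term uses 20 cycles (`cpi(BENCHMARKS["CC1"], l2_penalty=20)`), so that entry may be worth rechecking.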

Part3: Optimize CPI for each Benchmark


Procedure followed for each Benchmark[a]
Below are the parameters and the combinations of their values that were selected for
the optimization runs.

L1 Cache: 128KB
L2 Cache : 1MB
Block size: 128B, 64B, 32B
Associativity: 8, 2, 1
Block Replacement Policy: LRU, FIFO and Random

Nsets can then be calculated using the formula:

$$\text{Nsets} = \frac{\text{Cache size}}{\text{Block size} \times \text{Associativity}}$$
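
The set counts that appear in the configuration labels can be reproduced with a few lines of Python. This is a minimal sketch that takes the number of sets to be the cache size divided by (block size × associativity), all in bytes; the helper name `nsets` is illustrative.

```python
KB = 1024
MB = 1024 * KB


def nsets(cache_bytes, block_bytes, assoc):
    """Number of sets for a cache of the given total size."""
    bytes_per_set = block_bytes * assoc
    if cache_bytes % bytes_per_set != 0:
        raise ValueError("cache size must be a multiple of block size * associativity")
    return cache_bytes // bytes_per_set


print(nsets(128 * KB, 128, 8))  # 128   (128KB L1, 128B blocks, 8-way)
print(nsets(1 * MB, 128, 8))    # 1024  (1MB L2, 128B blocks, 8-way)
print(nsets(128 * KB, 32, 1))   # 4096  (128KB L1, 32B blocks, direct mapped)
```

These values correspond to labels such as L1:128:128:8 and L2:1024:128:8 used in the plots and tables.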

In addition, we chose three combinations of L1 cache with L2 cache for each of the three
benchmarks as shown below:

L1 separate, L2 separate
L1 separate, L2 unified
L1 unified, L2 unified

Data was collected for all these parameters using a Python script (it can be provided on request
but has not been included, as it was not listed as one of the deliverables). The script was run with
all the above parameters for each of the benchmarks to obtain data for the three L1 and L2 cache
combinations listed above.
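
The original script is not reproduced here. As an illustration only, the sketch below enumerates the same parameter grid (3 block sizes × 3 associativities × 3 replacement policies = 27 points) for the unified 128KB L1 / 1MB L2 case and builds labels in the notation used in this report. It does not invoke the simulator, and all names in it are hypothetical.

```python
from itertools import product

L1_BYTES = 128 * 1024       # L1 size from the parameter list
L2_BYTES = 1024 * 1024      # L2 size from the parameter list
BLOCK_SIZES = (128, 64, 32)
ASSOCS = (8, 2, 1)
POLICIES = ("l", "f", "r")  # LRU, FIFO, Random


def label(block, assoc, policy):
    """Build an L1/L2 label; both levels share block size, associativity and policy."""
    l1_sets = L1_BYTES // (block * assoc)
    l2_sets = L2_BYTES // (block * assoc)
    return f"L1:{l1_sets}:{block}:{assoc}:{policy}_L2:{l2_sets}:{block}:{assoc}:{policy}"


configs = [label(b, a, p) for b, a, p in product(BLOCK_SIZES, ASSOCS, POLICIES)]
print(len(configs))  # 27
print(configs[0])    # L1:128:128:8:l_L2:1024:128:8:l
```

For the separate-cache cases the same loop would be run with the per-cache sizes halved (an assumption inferred from the set counts in the labels, for example L1:64:128:8 and L2:512:128:8).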
Below are the plots showing the CPI for each benchmark for the three L1 and L2 cache
combinations listed above. The chart titles specify the benchmark and the L1 and
L2 cache combination. The y-axis shows the CPI and the x-axis shows the L1 and L2
configuration corresponding to each CPI value.
The data in these plots are arranged in ascending order of CPI, from the lowest to the
highest. Each CPI can be traced to the corresponding point on the x-axis, which gives
the L1 and L2 cache configuration for that value.
Please note that the data tables and the graphs will follow the following notation:
L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block
size):(associativity):(replacement policy)
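
To make the notation concrete, the following sketch parses one label and recovers the cache sizes it implies. The class and function names are illustrative assumptions; the policy letters l, f and r stand for LRU, FIFO and Random.

```python
from dataclasses import dataclass

POLICY_NAMES = {"l": "LRU", "f": "FIFO", "r": "Random"}


@dataclass(frozen=True)
class CacheConfig:
    level: str
    nsets: int
    block: int
    assoc: int
    policy: str

    @property
    def size_bytes(self):
        # Total capacity = sets * block size * ways.
        return self.nsets * self.block * self.assoc


def parse_label(label):
    """Split 'L1:...:..._L2:...:...' into one CacheConfig per level."""
    configs = []
    for part in label.split("_"):
        level, nsets, block, assoc, policy = part.split(":")
        configs.append(
            CacheConfig(level, int(nsets), int(block), int(assoc), POLICY_NAMES[policy])
        )
    return configs


l1, l2 = parse_label("L1:1024:64:2:l_L2:8192:64:2:l")
print(l1.size_bytes // 1024, "KB", l1.policy)  # 128 KB LRU
print(l2.size_bytes // 1024, "KB", l2.policy)  # 1024 KB LRU
```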

Plots for benchmarks: GCC, Anagram, GO


GCC benchmark

[Figure: GCC Benchmark: CPI - versus - L1separate_L2separate cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

[Figure: GCC Benchmark: CPI - versus - L1separate_L2unified cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

[Figure: GCC Benchmark: CPI - versus - L1unified_L2unified cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

Each of the plots above shows that a unified L1 cache with a unified L2 cache gives the best
possible CPI.
More of the data analysis is done in the following section.

Anagram benchmark

[Figure: Anagram Benchmark: CPI - versus - L1separate_L2separate cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

[Figure: Anagram Benchmark: CPI - versus - L1separate_L2unified cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

[Figure: Anagram Benchmark: CPI - versus - L1unified_L2unified cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

Each of the plots above shows that a unified L1 cache with a unified L2 cache gives the best
possible CPI.
More of the data analysis is done in the following section.

GO benchmark

[Figure: GO Benchmark: CPI - versus - L1separate_L2separate cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

[Figure: GO Benchmark: CPI - versus - L1separate_L2unified cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

[Figure: GO Benchmark: CPI - versus - L1unified_L2unified cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration in the notation above.]

Each of the plots above shows that a unified L1 cache with a unified L2 cache gives the best
possible CPI.
More of the data analysis is done in the following section.

Data tables for lowest CPIs for benchmarks: GCC, Anagram, GO


Data below has been chosen to contain the six lowest CPI values from the three L1 and L2 cache
configurations from each of the benchmarks. They have been listed in the below tables along
with their corresponding charts.
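
Selecting the lowest CPI values per cache configuration amounts to a grouped sort. A minimal sketch is shown below, assuming the results are available as (configuration, label, CPI) tuples; the sample rows are a small subset of the GO values reported in this section, and the function and variable names are hypothetical.

```python
from collections import defaultdict
from heapq import nsmallest

results = [
    ("L1unified_L2unified", "L1:512:128:2:l_L2:4096:128:2:l", 1.000804),
    ("L1unified_L2unified", "L1:512:128:2:r_L2:4096:128:2:r", 1.000892),
    ("L1unified_L2unified", "L1:1024:128:1:l_L2:8192:128:1:l", 1.000816),
    ("L1separate_L2separate", "L1:64:128:8:l_L2:512:128:8:l", 1.004956),
    ("L1separate_L2separate", "L1:64:128:8:r_L2:512:128:8:r", 1.004668),
]


def lowest_cpis(rows, k=6):
    """Return the k lowest (CPI, label) pairs per configuration, ascending."""
    grouped = defaultdict(list)
    for organisation, label, cpi in rows:
        grouped[organisation].append((cpi, label))
    return {org: nsmallest(k, entries) for org, entries in grouped.items()}


best = lowest_cpis(results, k=2)
for organisation, entries in best.items():
    for cpi, label in entries:
        print(organisation, label, cpi)
```

With the full result set, `k=6` would yield the six rows per configuration listed in the tables.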
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)

GCC benchmark

GCC                      L1 and L2 cache config              CPI
L1unified_L2unified      L1:128:128:8:l_L2:1024:128:8:l      1.001753557
L1unified_L2unified      L1:128:128:8:f_L2:1024:128:8:f      1.001758036
L1unified_L2unified      L1:128:128:8:r_L2:1024:128:8:r      1.002152616
L1unified_L2unified      L1:256:64:8:l_L2:2048:64:8:l        1.002231389
L1unified_L2unified      L1:256:64:8:f_L2:2048:64:8:f        1.002353557
L1unified_L2unified      L1:256:64:8:r_L2:2048:64:8:r        1.002748137
L1separate_L2separate    L1:64:128:8:l_L2:512:128:8:l        1.026271362
L1separate_L2separate    L1:128:64:8:l_L2:1024:64:8:l        1.028479706
L1separate_L2separate    L1:64:128:8:f_L2:512:128:8:f        1.028799969
L1separate_L2separate    L1:64:128:8:r_L2:512:128:8:r        1.028815025
L1separate_L2separate    L1:128:64:8:f_L2:1024:64:8:f        1.031192022
L1separate_L2separate    L1:128:64:8:r_L2:1024:64:8:r        1.03282593
L1separate_L2unified     L1:64:128:8:l_L2:1024:128:8:l       1.034493517
L1separate_L2unified     L1:64:128:8:f_L2:1024:128:8:f       1.039331023
L1separate_L2unified     L1:64:128:8:r_L2:1024:128:8:r       1.040554121
L1separate_L2unified     L1:128:64:8:l_L2:2048:64:8:l        1.040779169
L1separate_L2unified     L1:256:128:2:l_L2:4096:128:2:l      1.045580361
L1separate_L2unified     L1:128:64:8:f_L2:2048:64:8:f        1.046203326
*listed in ascending order

[Figure: GCC benchmark, CPI versus L1_L2 cache config for the values listed above. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration.]

ANAGRAM benchmark

Anagram                  L1 and L2 cache config              CPI
L1unified_L2unified      L1:128:128:8:l_L2:1024:128:8:l      1.000004509
L1unified_L2unified      L1:512:128:2:l_L2:4096:128:2:l      1.000004509
L1unified_L2unified      L1:256:64:8:l_L2:2048:64:8:l        1.000006355
L1unified_L2unified      L1:1024:64:2:l_L2:8192:64:2:l       1.000006355
L1unified_L2unified      L1:512:32:8:l_L2:4096:32:8:l        1.000008563
L1unified_L2unified      L1:2048:32:2:l_L2:16384:32:2:l      1.000008563
L1separate_L2unified     L1:512:128:1:l_L2:8192:128:1:l      1.000321074
L1separate_L2unified     L1:512:128:1:f_L2:8192:128:1:f      1.000321074
L1separate_L2unified     L1:512:128:1:r_L2:8192:128:1:r      1.000321074
L1separate_L2separate    L1:512:128:1:l_L2:4096:128:1:l      1.000333232
L1separate_L2separate    L1:512:128:1:f_L2:4096:128:1:f      1.000333232
L1separate_L2separate    L1:512:128:1:r_L2:4096:128:1:r      1.000333232
L1separate_L2unified     L1:1024:64:1:l_L2:16384:64:1:l      1.000342179
L1separate_L2unified     L1:1024:64:1:f_L2:16384:64:1:f      1.000342179
L1separate_L2unified     L1:1024:64:1:r_L2:16384:64:1:r      1.000342179
L1separate_L2separate    L1:1024:64:1:l_L2:8192:64:1:l       1.000348435
L1separate_L2separate    L1:1024:64:1:f_L2:8192:64:1:f       1.000348435
L1separate_L2separate    L1:1024:64:1:r_L2:8192:64:1:r       1.000348435
*listed in ascending order

[Figure: Anagram Benchmark: CPI - versus - L1_L2 cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration.]

GO benchmark

GO                       L1 and L2 cache config              CPI
L1unified_L2unified      L1:512:128:2:l_L2:4096:128:2:l      1.000804
L1unified_L2unified      L1:1024:128:1:l_L2:8192:128:1:l     1.000816
L1unified_L2unified      L1:1024:128:1:f_L2:8192:128:1:f     1.000816
L1unified_L2unified      L1:1024:128:1:r_L2:8192:128:1:r     1.000816
L1unified_L2unified      L1:512:128:2:f_L2:4096:128:2:f      1.000855
L1unified_L2unified      L1:512:128:2:r_L2:4096:128:2:r      1.000892
L1separate_L2separate    L1:64:128:8:r_L2:512:128:8:r        1.004668
L1separate_L2separate    L1:64:128:8:l_L2:512:128:8:l        1.004956
L1separate_L2separate    L1:64:128:8:f_L2:512:128:8:f        1.005102
L1separate_L2separate    L1:256:128:2:l_L2:2048:128:2:l      1.005156
L1separate_L2separate    L1:256:128:2:r_L2:2048:128:2:r      1.005336
L1separate_L2separate    L1:256:128:2:f_L2:2048:128:2:f      1.005828
L1separate_L2unified     L1:64:128:8:r_L2:1024:128:8:r       1.008772
L1separate_L2unified     L1:64:128:8:l_L2:1024:128:8:l       1.008818
L1separate_L2unified     L1:64:128:8:f_L2:1024:128:8:f       1.009194
L1separate_L2unified     L1:256:128:2:l_L2:4096:128:2:l      1.009255
L1separate_L2unified     L1:256:128:2:r_L2:4096:128:2:r      1.009582
L1separate_L2unified     L1:256:128:2:f_L2:4096:128:2:f      1.010047
*listed in ascending order

[Figure: GO Benchmark: CPI - versus - L1_L2 cache config. Y-axis: Cycles per Instr; x-axis: L1 and L2 cache configuration.]

Data Analysis
From analysis of this data, we can see that unified L1 with unified L2 gives the
lowest CPI values for all three benchmarks.
The trend followed by the L1 and L2 cache configurations for the three benchmarks, going
from lower CPI to higher CPI, is:
GCC Benchmark
L1 unified, L2 unified
L1 separate, L2 separate
L1 separate, L2 unified
ANAGRAM Benchmark
L1 unified, L2 unified
L1 separate, L2 separate
(back and forth)
L1 separate, L2 unified
GO Benchmark
L1 unified, L2 unified
L1 separate, L2 separate
L1 separate, L2 unified

We can now reach the following conclusions for the lowest CPI:

L1 unified and L2 unified for all three benchmarks
Block size of 128 for all three benchmarks
Associativity of 8 for GCC and Anagram; associativity of 2 for GO
Replacement policy is LRU ("l") for all three benchmarks

GCC Benchmark (selected cache config for the best six CPIs)
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)

GCC                                 CPI
L1:128:128:8:l_L2:1024:128:8:l      1.001753557
L1:128:128:8:f_L2:1024:128:8:f      1.001758036
L1:128:128:8:r_L2:1024:128:8:r      1.002152616
L1:256:64:8:l_L2:2048:64:8:l        1.002231389
L1:256:64:8:f_L2:2048:64:8:f        1.002353557
L1:256:64:8:r_L2:2048:64:8:r        1.002748137

It has been inferred that the CPI values, from lowest to highest, follow a trend ordered
by block size->replacement->associativity.
For example:
128:8:l will have a better CPI than 128:8:f
128:8:f will have a better CPI than 128:8:r
128:8:r will have a better CPI than 64:8:l
64:8:l will have a better CPI than 64:8:f
64:8:f will have a better CPI than 64:8:r
Notation:
Block_size:associativity:replacement_policy

ANAGRAM Benchmark (selected cache config for the best six CPIs)
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)

Anagram                             CPI
L1:128:128:8:l_L2:1024:128:8:l      1.000004509
L1:512:128:2:l_L2:4096:128:2:l      1.000004509
L1:256:64:8:l_L2:2048:64:8:l        1.000006355
L1:1024:64:2:l_L2:8192:64:2:l       1.000006355
L1:512:32:8:l_L2:4096:32:8:l        1.000008563
L1:2048:32:2:l_L2:16384:32:2:l      1.000008563

The CPI values, from lowest to highest, follow a trend ordered by
associativity->block size->replacement.
For example:
128:8:l and 128:2:l have the same CPI, which is better than that of 64:8:l
64:8:l and 64:2:l have the same CPI, which is better than that of 32:8:l
32:8:l and 32:2:l have the same CPI
Notation:
Block_size:associativity:replacement_policy

GO Benchmark (selected cache config for the best six CPIs)


NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)

GO                                  CPI
L1:512:128:2:l_L2:4096:128:2:l      1.000804
L1:1024:128:1:l_L2:8192:128:1:l     1.000816
L1:1024:128:1:f_L2:8192:128:1:f     1.000816
L1:1024:128:1:r_L2:8192:128:1:r     1.000816
L1:512:128:2:f_L2:4096:128:2:f      1.000855
L1:512:128:2:r_L2:4096:128:2:r      1.000892

The CPI values, from lowest to highest, follow a trend ordered by
replacement->associativity->block size.
For example:
128:2:l will have a better CPI than 128:1:l
128:1:l, 128:1:f and 128:1:r have the same CPI, since the replacement policy has no effect in a direct-mapped cache
128:1:r will have a better CPI than 128:2:f
128:2:f will have a better CPI than 128:2:r

Notation:
Block_size:associativity:replacement_policy
However, for the optimized CPI, we select the configuration with the lowest CPI
among the six values listed above.

The choices below were made to obtain the lowest possible CPI, and hence the
best possible performance, under the assumptions listed at the start of Part3 of this report[a].
CPI is a function of block size, associativity and replacement policy. We also consider the L1 and L2
cache configuration combinations. As can be seen in the table below, unified L1 and unified L2
caches, a bigger block size, higher associativity and the LRU replacement policy help us satisfy the
objective of this part.
The most optimized L1 and L2 configuration is chosen to be:
L1 unified and L2 unified

Benchmark    Cache Config              Block size    Associativity    Replacement
GCC          L1 unified, L2 unified    128           8                "l"
Anagram      L1 unified, L2 unified    128           8                "l"
GO           L1 unified, L2 unified    128           2                "l"

However, there are tradeoffs for considering the best possible CPI. We will look into this in
Part4 of this report.

Part 4: Define Cost Function


Since we are given fixed L1 and L2 cache sizes, we consider the cost to be a function of:

Associativity:
Cost increases as the associativity is increased. Therefore, it was assumed that an 8-way
set-associative cache costs more to implement than a 2-way cache, which in turn costs more
than a 1-way (direct-mapped) cache.

Replacement policy:
Cost was taken to increase by 2% from Random to FIFO and by 5% from FIFO to LRU.

L1, L2 cache configuration:

- Unified or Separate:
Separate instruction and data caches cost more than a unified cache because of
the hardware overhead of the separate caches.
- L1 is more expensive than L2:
Since the L1 cache is internal, L1 has been taken to be more expensive than the L2
cache.

The above choices have been considered to completely define the cost function for caches in
terms of area overhead and performance.
Data supporting the above factors influencing the cost function follows in Part5 of
this report.
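
The qualitative rules above can be turned into a small cost model. The sketch below is illustrative only: the 2% and 5% replacement-policy steps come from the text (compounded here, which is an assumption), while every other weight is a hypothetical placeholder chosen only to respect the stated orderings. It is not the cost function actually used for the values reported in Part5 and does not reproduce them.

```python
# From the text: Random -> FIFO is +2%, FIFO -> LRU is a further +5%.
# ASSUMPTION: the two steps compound (LRU = 1.02 * 1.05 relative to Random).
REPLACEMENT_FACTOR = {"r": 1.00, "f": 1.02, "l": 1.02 * 1.05}

# ASSUMPTION: placeholder weights, chosen only so that 8-way > 2-way > 1-way.
ASSOC_WEIGHT = {1: 1.0, 2: 2.0, 8: 4.0}

# ASSUMPTION: placeholder factor for the overhead of split I/D caches.
SEPARATE_FACTOR = 1.10

# ASSUMPTION: placeholder weights expressing that L1 costs more than L2.
LEVEL_WEIGHT = {"L1": 2.0, "L2": 1.0}


def level_cost(level, assoc, policy, separate):
    """Relative cost of one cache level under the placeholder weights."""
    cost = LEVEL_WEIGHT[level] * ASSOC_WEIGHT[assoc] * REPLACEMENT_FACTOR[policy]
    return cost * SEPARATE_FACTOR if separate else cost


def total_cost(assoc, policy, l1_separate, l2_separate):
    """Relative cost of an L1 + L2 hierarchy sharing associativity and policy."""
    return (level_cost("L1", assoc, policy, l1_separate)
            + level_cost("L2", assoc, policy, l2_separate))


print(total_cost(2, "l", False, False))  # unified L1/L2, 2-way, LRU
print(total_cost(8, "f", False, False))  # unified L1/L2, 8-way, FIFO
```

Only the ordering of the results is meaningful here; the absolute numbers depend entirely on the placeholder weights.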

Part 5: Optimize Cache for Performance/Cost


The factors that improve CPI also trade off against cost, as stated in Part4
of this report. The following observations have been made from the analyzed data:

- As the cost increases, the CPI improves, getting closer to CPI ideal = 1, since better
performance is obtained by using higher associativity and a better replacement policy and cache
configuration.

- It is observed from the plot (Fig. 1) that as the cost increases, the configurations give CPI
values that are closer to CPI ideal = 1 than when the cost is at the lower end. This is also
depicted by the narrowing of the CPI curve as the cost increases.

- The stability also increases with the use of a cache configuration that gives a desirable
CPI, but nothing comes for free, so we end up paying a higher cost.

- Hence we decide to choose values in the right half of the plots, which guarantee a better CPI.

GCC Benchmark
NOTATION: L1:(No. of sets):(block
size):(associativity):(replacement policy)

size):(associativity):(replacement

policy)_L2:(No.

of

sets):(block

The above plot shows the GCC benchmark, with cost on the left vertical axis and CPI on the
right vertical axis. Both are plotted against a common x-axis, which shows the L1 and L2
cache configuration.
The arrows on the plot mark the chosen CPIs and cache configurations for optimized cost.
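
The configuration labels on the x-axis can be decoded with a few lines of code. The sketch below follows the NOTATION given above and assumes the block size is in bytes, so that capacity is sets × block size × associativity. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheLevel:
    name: str        # "L1" or "L2"
    sets: int        # number of sets
    block_size: int  # block size (assumed to be in bytes)
    assoc: int       # associativity (ways)
    policy: str      # replacement policy code: l, f or r

    @property
    def size_bytes(self) -> int:
        """Total capacity = sets * block size * associativity."""
        return self.sets * self.block_size * self.assoc


def parse_config(label: str) -> list[CacheLevel]:
    """Split a label such as 'L1:1024:64:2:l_L2:8192:64:2:l' into levels."""
    levels = []
    for part in label.split("_"):
        name, sets, block, assoc, policy = part.split(":")
        levels.append(CacheLevel(name, int(sets), int(block), int(assoc), policy))
    return levels


l1, l2 = parse_config("L1:1024:64:2:l_L2:8192:64:2:l")
print(l1.name, l1.size_bytes // 1024, "KB")  # 1024 * 64 * 2 bytes = 128 KB
print(l2.name, l2.size_bytes // 1024, "KB")  # 8192 * 64 * 2 bytes = 1024 KB
```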


Fig. 1
The configuration chosen to balance cost and performance is shown in Fig. 2 with its CPI, cost
and cache configuration. It was chosen because it gives a good CPI value for its cost. An
alternative with a better CPI is shown in Fig. 3, but it costs 49.5, which is a relatively
high increase in cost for a small change in CPI toward CPI ideal. Hence we choose the
configuration in Fig. 2 as the optimum.
GCC (L1unified_L2unified)
Configuration                     CPI            Cost
L1:1024:64:2:l_L2:8192:64:2:l     1.003789425    27

Fig. 2 (Optimal Configuration)


GCC (L1unified_L2unified)
Configuration                     CPI            Cost
L1:128:128:8:f_L2:1024:128:8:f    1.001758036    49.5

Fig. 3
GCC Optimal Configuration: L1 unified, L2 unified Cache; Block size: 64, Associativity: 2,
Replacement Policy: LRU
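
The trade-off between the two GCC candidates can be quantified as relative changes. The sketch below uses only the CPI and cost values from Fig. 2 and Fig. 3; the helper name is hypothetical.

```python
def relative_change(old: float, new: float) -> float:
    """Fractional change when moving from old to new."""
    return (new - old) / old


chosen = {"cpi": 1.003789425, "cost": 27.0}       # Fig. 2
alternative = {"cpi": 1.001758036, "cost": 49.5}  # Fig. 3

cost_increase = relative_change(chosen["cost"], alternative["cost"])
cpi_reduction = -relative_change(chosen["cpi"], alternative["cpi"])

print(f"cost increase: {cost_increase:.1%}")
print(f"CPI reduction: {cpi_reduction:.3%}")
```

The cost rises by roughly 83% while the CPI drops by about 0.2%, which is the imbalance that motivates keeping the Fig. 2 configuration.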

Anagram Benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)

The arrows on the plot mark the chosen CPIs and cache configurations for optimized cost.
The configuration chosen to balance cost and performance is shown in Fig. 4 with its CPI, cost
and cache configuration. It was chosen because it gives a good CPI value for its cost. An
alternative is shown in Fig. 5, but it costs 48 and its CPI (1.000008519) is slightly higher
than that of the chosen configuration (1.000006355), so the extra cost brings no benefit.
Hence we choose the configuration in Fig. 4 as the optimum.
Anagram (L1unified_L2unified)
Configuration                     CPI            Cost
L1:1024:64:2:l_L2:8192:64:2:l     1.000006355    27

Fig. 4 (Optimal Configuration)



Anagram (L1unified_L2unified)
Configuration                     CPI            Cost
L1:512:32:8:r_L2:4096:32:8:r      1.000008519    48

Fig. 5
ANAGRAM Optimal Configuration: L1 unified, L2 unified Cache; Block size: 64,
Associativity: 2, Replacement Policy: LRU
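
Whether one configuration is strictly preferable to another can be checked with a simple dominance test on (CPI, cost). The sketch below applies it to the Anagram values from Fig. 4 and Fig. 5; the function name is hypothetical, and lower is treated as better for both quantities.

```python
def dominates(a: dict, b: dict) -> bool:
    """True if a is no worse than b in CPI and cost, and better in at least one."""
    no_worse = a["cpi"] <= b["cpi"] and a["cost"] <= b["cost"]
    better = a["cpi"] < b["cpi"] or a["cost"] < b["cost"]
    return no_worse and better


fig4 = {"cpi": 1.000006355, "cost": 27}  # L1:1024:64:2:l_L2:8192:64:2:l
fig5 = {"cpi": 1.000008519, "cost": 48}  # L1:512:32:8:r_L2:4096:32:8:r

print(dominates(fig4, fig5))
```

For Anagram the Fig. 4 configuration has both the lower cost and the lower CPI, so it dominates the Fig. 5 alternative.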

GO Benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)

The arrows on the plot mark the chosen CPIs and cache configurations for optimized cost.


The configuration chosen to balance cost and performance is shown in Fig. 6 with its CPI, cost
and cache configuration. It was chosen because it gives a good CPI value for its cost. An
alternative is shown in Fig. 7, but it costs 48 and its CPI (1.001156244) is slightly higher
than that of the chosen configuration (1.001065804), so the extra cost brings no benefit.
Hence we choose the configuration in Fig. 6 as the optimum.
GO (L1separate_L2unified)
Configuration                     CPI            Cost
L1:1024:64:2:l_L2:8192:64:2:l     1.001065804    27

Fig. 6 (Optimal Configuration)


GO (L1separate_L2unified)
Configuration                     CPI            Cost
L1:128:128:8:r_L2:1024:128:8:r    1.001156244    48

Fig. 7
GO Optimal Configuration: L1 separate, L2 unified Cache; Block size: 64, Associativity: 2,
Replacement Policy: LRU
Below is a table comparing all of the optimized cost and performance based results from the
three benchmarks.


OPTIMAL CONFIGURATION
Among the three benchmarks, Anagram achieves the lowest optimized CPI at the same cost of 27
(Fig. 8).
Benchmark   Cache type              Configuration                     CPI            Cost
GCC         L1unified_L2unified     L1:1024:64:2:l_L2:8192:64:2:l     1.003789425    27
Anagram     L1unified_L2unified     L1:1024:64:2:l_L2:8192:64:2:l     1.0000064      27
GO          L1separate_L2unified    L1:1024:64:2:l_L2:8192:64:2:l     1.001065804    27

Fig. 8
The best configuration for all the benchmarks taken together, in terms of average CPI, is
shown in Fig. 10, as asked in the question. The average CPI and cost are given in Fig. 9.
Average CPI     Average Cost
1.001620543     27

Fig. 9
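
The averages in Fig. 9 can be reproduced from the per-benchmark values in Fig. 8. The sketch below uses the values exactly as listed in that table, including the rounded Anagram CPI of 1.0000064; the variable names are hypothetical.

```python
from statistics import mean

# (CPI, cost) per benchmark, as listed in Fig. 8
results = {
    "GCC": (1.003789425, 27),
    "Anagram": (1.0000064, 27),
    "GO": (1.001065804, 27),
}

avg_cpi = mean([cpi for cpi, _ in results.values()])
avg_cost = mean([cost for _, cost in results.values()])

print(f"Average CPI:  {avg_cpi:.9f}")
print(f"Average cost: {avg_cost}")
```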
Optimal Configurations:
Optimal Config for all Benchmarks (in terms of avg CPI)
Benchmark   Configuration
GCC         L1:1024:64:2:l_L2:8192:64:2:l
Anagram     L1:1024:64:2:l_L2:8192:64:2:l
GO          L1:1024:64:2:l_L2:8192:64:2:l

Fig. 10


