PROJECT #1
Computer Architecture
EE6304
Due Date: 3/8/2012
Team Number:22
Submitted By:
Kaur Guneet
Patel Jayendra
Index
Objective
Part1:Preparation
Part2:Find CPI
References
Objective
In this project, we have been asked to fine-tune the cache hierarchy of an Alpha
microprocessor for three individual benchmarks. The cache design parameters we can modify
are:
Cache levels: One or two levels, for data and instruction caches.
Unified caches: Selection of separate vs. unified instruction/data caches. For example,
you can have separate L1 caches and a unified L2 cache.
Size: Cache size, one of the most important choices.
Associativity: Selection of cache associativity (e.g. direct mapped, 2-way set associative,
etc.).
Block size: Block size of the cache, usually 64 or 32 bytes.
Block replacement policy: Selection between FIFO, LRU and Random.
While larger caches generally mean better performance, they also come at a greater cost. Thus,
sensible design choices and trade-offs are required. At the end of this project we also define a
cost function and use it to identify the optimal configuration.
Part1:Preparation
We followed all the steps given in Part1 of the project document to set up SimpleScalar.
The benchmarks were downloaded and verified to match the data given in the project handout.
Part2:Find CPI
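Given:
Benchmark        CC1          GO           Anagram
Instructions     337353464    545823063    25729033
IL1 accesses     337353464    545823063    25729033
IL1 misses       1589871      716280       504
IL1 miss rate    0.0047       0.0013       0.00000
DL1 accesses     122134327    211701947    9295564
DL1 misses       1287463      176428       18357
DL1 miss rate    0.0105       0.0008       0.002
L2 accesses      3265974      955573       24107
L2 misses        401908       69334        5252
L2 miss rate     0.1231       0.0726       0.2179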
Calculating CPI:
[Equation: CPI = 1 + two miss-penalty terms; the terms themselves were not recoverable.]
Benchmark    CPI
CC1          1.06634195
GO           1.01313547
ANAGRAM      1.01177939
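As a rough illustration of how a CPI of this form can be computed from the miss counts given above, the following Python sketch adds miss stall cycles per instruction to a base CPI of 1. The function name and the penalty values (6 and 40 cycles) are placeholders assumed purely for illustration. They are not the values from the project handout, so the result is not expected to reproduce the CPI figures reported here.

```python
def cpi_from_misses(instructions, il1_misses, dl1_misses, l2_misses,
                    l1_penalty, l2_penalty, base_cpi=1.0):
    """Base CPI plus cache-miss stall cycles per instruction.

    l1_penalty and l2_penalty are stall cycles per miss (assumed values).
    """
    l1_stalls = (il1_misses + dl1_misses) * l1_penalty
    l2_stalls = l2_misses * l2_penalty
    return base_cpi + (l1_stalls + l2_stalls) / instructions


# CC1 counts from the given data; the two penalties are hypothetical.
cc1_cpi = cpi_from_misses(337353464, 1589871, 1287463, 401908,
                          l1_penalty=6, l2_penalty=40)
print(f"CC1 CPI with assumed penalties: {cc1_cpi:.6f}")
```

Keeping the penalties as parameters makes it easy to substitute the handout's actual values once they are known.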
L1 Cache: 128KB
L2 Cache : 1MB
Block size: 128B, 64B, 32B
Associativity: 8, 2, 1
Block Replacement Policy: LRU, FIFO and Random
In addition, we chose three combinations of L1 and L2 caches for each of the three
benchmarks, as shown below:
L1 separate, L2 separate
L1 separate, L2 unified
L1 unified, L2 unified
Data was collected for all of these parameters using a Python script (available on request; it
is not included because it was not listed as one of the deliverables). The script was run over
all of the above parameters for each benchmark to obtain data for the three L1 and L2 cache
combinations listed above.
Below are plots of the CPI for each benchmark under each of the three L1 and L2 cache
combinations listed above. The chart titles give the benchmark and the L1 and L2 cache
combination. The y-axis shows the CPI and the x-axis shows the L1 and L2 configuration
corresponding to each CPI value.
The data in each plot is sorted in ascending order of CPI, from lowest to highest. The L1 and
L2 cache configuration for any CPI value can be read from the corresponding point on the
x-axis.
Please note that the data tables and the graphs use the following notation:
L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
The replacement policy is abbreviated as l (LRU), f (FIFO) or r (Random).
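The sweep described above can be sketched in Python as follows. This is not the script used for the report. It is a minimal illustration that assumes the number of sets is the cache capacity divided by (block size x associativity), and that L1 and L2 share the same block size, associativity and replacement policy, as every label in this report does. The function names and the 128KB/1MB capacities passed in the example are assumptions for the unified case.

```python
from itertools import product

KB = 1024
BLOCK_SIZES = (128, 64, 32)      # bytes
ASSOCIATIVITIES = (8, 2, 1)
POLICIES = ("l", "f", "r")       # LRU, FIFO, Random


def num_sets(cache_bytes, block_size, assoc):
    """Number of sets for a cache of the given capacity."""
    return cache_bytes // (block_size * assoc)


def config_name(l1_bytes, l2_bytes, block_size, assoc, policy):
    """Build a label in the L1:..._L2:... notation used in this report."""
    l1 = f"L1:{num_sets(l1_bytes, block_size, assoc)}:{block_size}:{assoc}:{policy}"
    l2 = f"L2:{num_sets(l2_bytes, block_size, assoc)}:{block_size}:{assoc}:{policy}"
    return f"{l1}_{l2}"


def sweep(l1_bytes, l2_bytes):
    """All block size / associativity / policy combinations for one L1/L2 pair."""
    return [config_name(l1_bytes, l2_bytes, b, a, p)
            for b, a, p in product(BLOCK_SIZES, ASSOCIATIVITIES, POLICIES)]


unified = sweep(128 * KB, 1024 * KB)
print(len(unified), unified[0])
```

Each of the three L1/L2 combinations then corresponds to one call to `sweep` with the capacities of that combination.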
[Figures: GCC benchmark. CPI (cycles per instruction) versus L1/L2 cache configuration, sorted by ascending CPI, one plot for each L1/L2 cache combination.]
The plots above show that a unified L1 cache with a unified L2 cache gives the lowest CPI.
Further data analysis follows in the next section.
Anagram benchmark
[Figures: Anagram benchmark. CPI (cycles per instruction) versus L1/L2 cache configuration, sorted by ascending CPI, one plot for each L1/L2 cache combination.]
The plots above show that a unified L1 cache with a unified L2 cache gives the lowest CPI.
Further data analysis follows in the next section.
GO benchmark
[Figures: GO benchmark. CPI (cycles per instruction) versus L1/L2 cache configuration, sorted by ascending CPI, one plot for each L1/L2 cache combination.]
The plots above show that a unified L1 cache with a unified L2 cache gives the lowest CPI.
Further data analysis follows in the next section.
GCC benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
[Figure: GCC benchmark. CPI (cycles per instruction) for six configurations from each L1/L2 combination (L1unified_L2unified, L1separate_L2separate, L1separate_L2unified).]
ANAGRAM benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
Anagram                            CPI            Cache organization
L1:128:128:8:l_L2:1024:128:8:l     1.000004509    L1unified_L2unified
L1:512:128:2:l_L2:4096:128:2:l     1.000004509    L1unified_L2unified
L1:256:64:8:l_L2:2048:64:8:l       1.000006355    L1unified_L2unified
L1:1024:64:2:l_L2:8192:64:2:l      1.000006355    L1unified_L2unified
L1:512:32:8:l_L2:4096:32:8:l       1.000008563    L1unified_L2unified
L1:2048:32:2:l_L2:16384:32:2:l     1.000008563    L1unified_L2unified
L1:512:128:1:l_L2:8192:128:1:l     1.000321074    L1separate_L2unified
L1:512:128:1:f_L2:8192:128:1:f     1.000321074    L1separate_L2unified
L1:512:128:1:r_L2:8192:128:1:r     1.000321074    L1separate_L2unified
L1:512:128:1:l_L2:4096:128:1:l     1.000333232    L1separate_L2separate
L1:512:128:1:f_L2:4096:128:1:f     1.000333232    L1separate_L2separate
L1:512:128:1:r_L2:4096:128:1:r     1.000333232    L1separate_L2separate
L1:1024:64:1:l_L2:16384:64:1:l     1.000342179    L1separate_L2unified
L1:1024:64:1:f_L2:16384:64:1:f     1.000342179    L1separate_L2unified
L1:1024:64:1:r_L2:16384:64:1:r     1.000342179    L1separate_L2unified
L1:1024:64:1:l_L2:8192:64:1:l      1.000348435    L1separate_L2separate
L1:1024:64:1:f_L2:8192:64:1:f      1.000348435    L1separate_L2separate
L1:1024:64:1:r_L2:8192:64:1:r      1.000348435    L1separate_L2separate
*listed in ascending order
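The capacity implied by each label can be checked by decoding the notation, since a cache's capacity is its number of sets times its block size times its associativity. The short Python sketch below does this for one label from the table. The helper names are illustrative and not part of the report's own tooling.

```python
def cache_bytes(spec):
    """Capacity in bytes of one cache spec such as 'L1:512:128:1:l'."""
    _name, sets, block_size, assoc, _policy = spec.split(":")
    return int(sets) * int(block_size) * int(assoc)


def capacities(config):
    """Return (L1 bytes, L2 bytes) for a full 'L1:..._L2:...' label."""
    l1_spec, l2_spec = config.split("_")
    return cache_bytes(l1_spec), cache_bytes(l2_spec)


l1, l2 = capacities("L1:512:128:1:l_L2:8192:128:1:l")
print(l1 // 1024, "KB L1,", l2 // 1024, "KB L2")
```

Decoding a label this way is a quick check that a row belongs to the cache organization it is listed under.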
[Figure: Anagram benchmark. CPI (cycles per instruction) for the configurations in the table above.]
GO benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
GO                                 CPI         Cache organization
L1:512:128:2:l_L2:4096:128:2:l     1.000804    L1unified_L2unified
L1:1024:128:1:l_L2:8192:128:1:l    1.000816    L1unified_L2unified
L1:1024:128:1:f_L2:8192:128:1:f    1.000816    L1unified_L2unified
L1:1024:128:1:r_L2:8192:128:1:r    1.000816    L1unified_L2unified
L1:512:128:2:f_L2:4096:128:2:f     1.000855    L1unified_L2unified
L1:512:128:2:r_L2:4096:128:2:r     1.000892    L1unified_L2unified
L1:64:128:8:r_L2:512:128:8:r       1.004668    L1separate_L2separate
L1:64:128:8:l_L2:512:128:8:l       1.004956    L1separate_L2separate
L1:64:128:8:f_L2:512:128:8:f       1.005102    L1separate_L2separate
L1:256:128:2:l_L2:2048:128:2:l     1.005156    L1separate_L2separate
L1:256:128:2:r_L2:2048:128:2:r     1.005336    L1separate_L2separate
L1:256:128:2:f_L2:2048:128:2:f     1.005828    L1separate_L2separate
L1:64:128:8:r_L2:1024:128:8:r      1.008772    L1separate_L2unified
L1:64:128:8:l_L2:1024:128:8:l      1.008818    L1separate_L2unified
L1:64:128:8:f_L2:1024:128:8:f      1.009194    L1separate_L2unified
L1:256:128:2:l_L2:4096:128:2:l     1.009255    L1separate_L2unified
L1:256:128:2:r_L2:4096:128:2:r     1.009582    L1separate_L2unified
L1:256:128:2:f_L2:4096:128:2:f     1.010047    L1separate_L2unified
*listed in ascending order
[Figure: GO benchmark. CPI (cycles per instruction) for the configurations in the table above.]
Data Analysis
From this data, the L1 unified, L2 unified organization gives the lowest CPI values for all
three benchmarks.
For each benchmark, the L1 and L2 cache organizations rank as follows, from lower CPI to
higher CPI:
GCC Benchmark
L1 unified, L2 unified
L1 separate, L2 separate
L1 separate, L2 unified
ANAGRAM Benchmark
L1 unified, L2 unified
L1 separate, L2 separate and L1 separate, L2 unified (alternating back and forth)
GO Benchmark
L1 unified, L2 unified
L1 separate, L2 separate
L1 separate, L2 unified
From this we reach the following conclusions for the lowest CPI:
GCC Benchmark (selected cache config for the best six CPIs)
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
GCC                                CPI
L1:128:128:8:l_L2:1024:128:8:l     1.001753557
L1:128:128:8:f_L2:1024:128:8:f     1.001758036
L1:128:128:8:r_L2:1024:128:8:r     1.002152616
L1:256:64:8:l_L2:2048:64:8:l       1.002231389
L1:256:64:8:f_L2:2048:64:8:f       1.002353557
L1:256:64:8:r_L2:2048:64:8:r       1.002748137
We infer that the CPI values, from lowest to highest, follow the ordering
block size->replacement->associativity.
For example:
128:8:l has a better CPI than 128:8:f
128:8:f has a better CPI than 128:8:r
128:8:r has a better CPI than 64:8:l
64:8:l has a better CPI than 64:8:f
64:8:f has a better CPI than 64:8:r
Notation:
Block_size:associativity:replacement_policy
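The ordering in the examples above can be checked by sorting the six GCC configurations by their CPI values and printing each one in the short Block_size:associativity:replacement_policy notation. The Python sketch below uses the CPI values from the GCC table. The dictionary is deliberately entered out of order so that the sort does the work, and the helper name is illustrative.

```python
# CPI values for the six best GCC configurations, entered in arbitrary order.
gcc_cpi = {
    "L1:256:64:8:r_L2:2048:64:8:r": 1.002748137,
    "L1:128:128:8:f_L2:1024:128:8:f": 1.001758036,
    "L1:256:64:8:l_L2:2048:64:8:l": 1.002231389,
    "L1:128:128:8:r_L2:1024:128:8:r": 1.002152616,
    "L1:256:64:8:f_L2:2048:64:8:f": 1.002353557,
    "L1:128:128:8:l_L2:1024:128:8:l": 1.001753557,
}


def short_label(config):
    """Reduce a full label to Block_size:associativity:replacement_policy."""
    _name, _sets, block_size, assoc, policy = config.split("_")[0].split(":")
    return f"{block_size}:{assoc}:{policy}"


# Sort by CPI (lowest first) and keep only the short labels.
ranking = [short_label(cfg) for cfg, _cpi in
           sorted(gcc_cpi.items(), key=lambda item: item[1])]
print(" < ".join(ranking))
```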
ANAGRAM Benchmark (selected cache config for the best six CPIs)
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
Anagram                            CPI
L1:128:128:8:l_L2:1024:128:8:l     1.000004509
L1:512:128:2:l_L2:4096:128:2:l     1.000004509
L1:256:64:8:l_L2:2048:64:8:l       1.000006355
L1:1024:64:2:l_L2:8192:64:2:l      1.000006355
L1:512:32:8:l_L2:4096:32:8:l       1.000008563
L1:2048:32:2:l_L2:16384:32:2:l     1.000008563
The CPI values, from lowest to highest, follow the ordering
associativity->block size->replacement. In the table above, the 8-way and 2-way
configurations tie at each block size.
For example:
128:8:l has a CPI at least as good as 128:2:l
128:2:l has a better CPI than 64:8:l
64:8:l has a CPI at least as good as 64:2:l
64:2:l has a better CPI than 32:8:l
32:8:l has a CPI at least as good as 32:2:l
Notation:
Block_size:associativity:replacement_policy
GO Benchmark (selected cache config for the best six CPIs)
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
GO                                 CPI
L1:512:128:2:l_L2:4096:128:2:l     1.000804
L1:1024:128:1:l_L2:8192:128:1:l    1.000816
L1:1024:128:1:f_L2:8192:128:1:f    1.000816
L1:1024:128:1:r_L2:8192:128:1:r    1.000816
L1:512:128:2:f_L2:4096:128:2:f     1.000855
L1:512:128:2:r_L2:4096:128:2:r     1.000892
The CPI values, from lowest to highest, follow the ordering
replacement->associativity->block size. In the table above, the three direct-mapped
configurations tie at 1.000816.
For example:
128:1:l has a CPI at least as good as 128:1:f
128:1:f has a CPI at least as good as 128:1:r
128:2:f has a better CPI than 128:2:r
Notation:
Block_size:associativity:replacement_policy
However, for optimized CPI, we select the configuration with the lowest CPI among the six
lowest CPI values listed above.
The choices below were made to obtain the lowest possible CPI, and hence the best possible
performance, under the assumptions listed at the start of Part3 of this report. CPI is a
function of block size, associativity and replacement policy. We also consider the L1 and L2
cache configuration combinations. As can be seen in the table below, unified L1 and L2
caches, a bigger block size, higher associativity and the LRU replacement policy help us
satisfy the objective of this part.
The most optimized L1 and L2 configuration is chosen to be:
L1 unified and L2 unified
                 GCC                       Anagram                   GO
Cache Config     L1 unified, L2 unified    L1 unified, L2 unified    L1 unified, L2 unified
Block size       128                       128                       128
Associativity
Replacement      "l"                       "l"                       "l"
However, there are trade-offs in aiming for the best possible CPI. We look into these in
Part4 of this report.
Associativity:
Cost increases with associativity. We therefore assumed that an 8-way set-associative cache
costs more to implement than a 2-way cache, which in turn costs more than a direct-mapped
(1-way) cache.
Replacement policy:
We assumed a 2% increase in cost from Random to FIFO and a 5% increase from FIFO to LRU.
The above choices together define the cost function for caches in terms of area overhead and
performance.
Data supporting these cost factors follows in Part5 of this report.
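The two cost factors described above can be sketched as multipliers on a base cost, as in the Python snippet below. Only the 2% and 5% replacement-policy steps and the ordering 8-way > 2-way > 1-way come from this report. The associativity weights, the base cost, the function name and the decision to compound the 5% step on top of the 2% step are all assumptions made for illustration, so the numbers produced here are not the cost values reported in Part5.

```python
# Replacement policy factors: +2% from Random to FIFO, then +5% from FIFO to LRU
# (compounding the two steps is an assumption).
REPLACEMENT_FACTOR = {"r": 1.00, "f": 1.02, "l": 1.02 * 1.05}

# Hypothetical associativity weights; only the ordering 8 > 2 > 1 is from the report.
ASSOC_WEIGHT = {1: 1.0, 2: 1.5, 8: 3.0}


def relative_cost(base_cost, assoc, policy):
    """Scale a base cost by the associativity and replacement policy factors."""
    return base_cost * ASSOC_WEIGHT[assoc] * REPLACEMENT_FACTOR[policy]


for assoc in (1, 2, 8):
    for policy in ("r", "f", "l"):
        print(f"{assoc}-way, {policy}: {relative_cost(10.0, assoc, policy):.3f}")
```

Keeping each factor in its own table makes it straightforward to swap in the real weights for cache size, block size and associativity.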
It is observed from the above plot that, as the cost increases, the configurations give CPI
values that are closer to CPI ideal = 1 than when the cost is at the lower end. This is also
depicted by the narrowing of the CPI curve as the cost increases (Fig. 1).
Stability also improves with a cache configuration that gives a desirable CPI, but nothing
comes for free, so we end up paying a higher cost.
Hence we choose values from the right half of the plots, which give a better CPI.
GCC Benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
The above plot shows the GCC benchmark, with cost on the left vertical axis and CPI on the
right vertical axis, both plotted against a common x-axis. The x-axis shows the L1 and L2
cache configuration.
The arrows on the plot mark the chosen CPIs and cache configurations for optimized cost.
Fig. 1
The configuration chosen to balance cost and performance is shown below with its CPI, cost
and cache configuration. It was chosen because it gives a good CPI value for its cost. A
better-performing alternative is shown in Fig. 3, but it has a cost of 49.5, which is a
relatively large increase in cost for a small move in CPI towards CPI ideal. Hence we choose
the configuration in Fig. 2.
GCC, L1unified_L2unified
Configuration                     CPI            Cost
L1:1024:64:2:l_L2:8192:64:2:l     1.003789425    27
Fig. 2
Cost
49.5
Fig. 3
GCC Optimal Configuration: L1 unified, L2 unified Cache; Block size: 64, Associativity: 2,
Replacement Policy: LRU
Anagram Benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
The arrows on the plot mark the chosen CPIs and cache configurations for optimized cost.
The configuration chosen to balance cost and performance is shown below with its CPI, cost
and cache configuration. It was chosen because it gives a good CPI value for its cost. An
alternative is shown in Fig. 5, but it has a cost of 48, which is a relatively large increase
in cost for only a small change in CPI. Hence we choose the configuration in Fig. 4.
Anagram, L1unified_L2unified
Configuration                     CPI            Cost
L1:1024:64:2:l_L2:8192:64:2:l     1.000006355    27
Fig. 4
CPI            Cost
1.000008519    48
Fig. 5
ANAGRAM Optimal Configuration: L1 unified, L2 unified Cache; Block size: 64,
Associativity: 2, Replacement Policy: LRU
GO Benchmark
NOTATION: L1:(No. of sets):(block size):(associativity):(replacement policy)_L2:(No. of sets):(block size):(associativity):(replacement policy)
The arrows on the plot mark the chosen CPIs and cache configurations for optimized cost.
The configuration chosen to balance cost and performance is shown below with its CPI, cost
and cache configuration. It was chosen because it gives a good CPI value for its cost. An
alternative is shown in Fig. 7, but it has a cost of 48, which is a relatively large increase
in cost for a small move in CPI towards CPI ideal. Hence we choose the configuration in Fig. 6.
GO, L1separate_L2unified
Configuration                     CPI            Cost
L1:1024:64:2:l_L2:8192:64:2:l     1.001065804    27
Fig. 6
Cost
48
Fig. 7
GO Optimal Configuration: L1 separate, L2 unified Cache; Block size: 64, Associativity: 2,
Replacement Policy: LRU
Below is a table comparing the cost-optimized and performance-based results from the three
benchmarks.
OPTIMAL CONFIGURATION
Of the three benchmarks, Anagram gives the lowest CPI at the optimized cost (Fig. 8).
Benchmark    Cache organization      Configuration                     CPI            Cost
GCC          L1unified_L2unified     L1:1024:64:2:l_L2:8192:64:2:l     1.003789425    27
Anagram      L1unified_L2unified     L1:1024:64:2:l_L2:8192:64:2:l     1.0000064      27
GO           L1separate_L2unified    L1:1024:64:2:l_L2:8192:64:2:l     1.001065804    27
Fig. 8
The best configuration across all the benchmarks, judged by average CPI, is shown in
Fig. 10, as asked in the question.
Average CPI     1.001620543
Average Cost    27
Fig. 9
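The average CPI above is the arithmetic mean of the three chosen CPI values. The short Python check below reproduces it, using the Anagram value as rounded in Fig. 8 (1.0000064). The variable names are illustrative.

```python
from statistics import mean

# Chosen CPI per benchmark, as listed in Fig. 8.
chosen_cpi = {
    "GCC": 1.003789425,
    "Anagram": 1.0000064,
    "GO": 1.001065804,
}

average_cpi = mean(list(chosen_cpi.values()))
print(f"Average CPI: {average_cpi:.9f}")
```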
Optimal Configurations:
Optimal Config for all Benchmarks (in terms of avg CPI)
GCC        L1:1024:64:2:l_L2:8192:64:2:l
Anagram    L1:1024:64:2:l_L2:8192:64:2:l
GO         L1:1024:64:2:l_L2:8192:64:2:l
Fig. 10
References