Focus of this tutorial
  Types:
    High Level, Hybrid High/Low level
  A priori knowledge:
    White Box
  Scope:
    Miss count & rate
    Miss cost
    Bandwidth usage

Program
  8:30 – 8:45: Introduction
  8:45 – 9:00: Capturing temporal locality behavior
  9:00 – 9:30: Modeling cache sharing
  9:30 – 10:00: Modeling cache replacement policy
  10:00 – 10:30: Coffee Break
  10:30 – 11:30: Analysis of the effects of miss clustering on the cost of a cache miss
  11:30 – 12:30: Interaction of Caching and Bandwidth Pressure
Stack Distance Profiling [Mattson’70]
  Early attempt to capture temporal reuse behavior
  Models LRU stack with a counter for each stack position
  Example: fully associative cache with 8-entry stack
    C1: incremented whenever the MRU block is accessed
    C2: incremented whenever the 2nd MRU block is accessed
    C3: incremented whenever the 3rd MRU block is accessed
    …
    C8: incremented whenever the 8th MRU (or LRU) block is accessed

Typical Shape
  Empirical observation ⇒ Geometric or exponential sequence
  Due to temporal locality
  $C_{i+1} = C_i \times r$, where $0 < r < 1$ is the common ratio
  [Figure: percent of accesses per counter. Bars are annotated "Incremented on access to MRU line", "Incremented on access to 2nd MRU line", "Incremented on access to 3rd MRU line", and the last bar "Incremented on a miss".]
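The counter scheme described for stack distance profiling can be sketched in a few lines of Python. This is a minimal illustration, not the tutorial's own tooling: the function names `stack_distance_profile` and `predicted_misses` are hypothetical, and the trace is assumed to be a sequence of block addresses that map to one fully associative LRU stack.

```python
def stack_distance_profile(trace, assoc):
    """Return (counters, misses) for an LRU stack with `assoc` entries.

    counters[i] is C_(i+1): the number of accesses that hit the block
    at stack position i+1 (position 1 = MRU, position assoc = LRU).
    """
    counters = [0] * assoc
    misses = 0
    stack = []  # index 0 is the MRU position
    for addr in trace:
        if addr in stack:
            pos = stack.index(addr)
            counters[pos] += 1      # hit at stack position pos+1
            stack.pop(pos)
        else:
            misses += 1             # block not on the stack
            if len(stack) == assoc:
                stack.pop()         # evict the LRU block
        stack.insert(0, addr)       # accessed block becomes MRU
    return counters, misses


def predicted_misses(counters, misses, smaller_assoc):
    """Misses for a smaller associativity: hits deeper than the new
    associativity turn into misses."""
    return misses + sum(counters[smaller_assoc:])


counters, misses = stack_distance_profile("ABCDAEEB", assoc=8)
print(counters, misses)
print(predicted_misses(counters, misses, 4))
```

A single profile collected with a large stack can be reused for every smaller associativity, which is the use case the slide on limitations refers to.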
Limitations of Stack Distance Profile
  Useful only for predicting cache misses across different cache associativities
  For other purposes, we need to capture temporal reuse patterns in greater detail
  So, use Circular Sequence Profile [Chandra’05]
    Extends stack distance profiling
    Counts the occurrence of cseq(d,n)

Definitions
  seq(d,n) = sequence of n accesses to d distinct addresses (in a cache set)
  cseq(d,n) (circular sequence) = a sequence in which the first and the last accesses are to the same address
  Example, for the access stream A B C D A E E B:
    seq(5,8): the whole stream A B C D A E E B
    cseq(4,5): A B C D A
    cseq(1,2): E E
    cseq(5,7): B C D A E E B
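The definitions of seq(d,n) and cseq(d,n) can be checked mechanically. The sketch below is an illustration only: the helper name `circular_sequences` is hypothetical, and it treats each reuse of an address as closing a circular sequence that starts at the previous access to the same address.

```python
from collections import Counter


def circular_sequences(trace):
    """Return the (d, n) pair of every circular sequence in `trace`.

    A circular sequence ends at an access whose address was seen before
    and starts at the previous access to that same address.
    """
    last_seen = {}
    found = []
    for i, addr in enumerate(trace):
        if addr in last_seen:
            window = trace[last_seen[addr]:i + 1]
            found.append((len(set(window)), len(window)))  # (d, n)
        last_seen[addr] = i
    return found


pairs = circular_sequences("ABCDAEEB")
profile = Counter(pairs)  # circular sequence profile: count per (d, n)
print(pairs)
```

For the stream A B C D A E E B this reports the three circular sequences named on the slide.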
Collecting Circular Sequence Profile
  [Figure, four animation frames: a 4-entry LRU stack (positions 1 to 4, MRU to LRU) with an access counter per entry. Frame 1: stack A, counters 1. Frame 2: stack B A, counters 1 2. Frame 3: stack B C A, counters 1 2 4. Frame 4: stack B C A, counters 2 3 5, labeled "Found a Circular Sequence!".]
Shared Cache Challenge
  [Figure: threads sharing an L2 cache.]

Impact of Cache Space Contention
  [Figure: L2 cache misses of mcf running alone and co-scheduled as mcf+art, mcf+gzip, mcf+mst, and mcf+swim; the axis extends to 400%.]
Circular Sequence Properties
  Thread X runs alone in the system:
    Given a circular sequence cseqX(dX,nX), the last access is a cache miss iff dX > Assoc
  Thread X shares the cache with thread Y:
    During cseqX(dX,nX)’s lifetime, if there is a sequence of intervening accesses seqY(dY,nY), the last access of thread X is a miss iff dX+dY > Assoc

Example
  Assume a 4-way associative cache:
    X’s circular sequence cseqX(2,3): A B A
    Y’s intervening access sequence (during the lifetime of cseqX): U V V W
  No cache sharing: A is a cache hit
  Cache sharing: is A a cache hit or miss?
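Applying the two rules to the example is a one-line computation. The sketch below is a hedged illustration (the function name `last_access_misses` is hypothetical); it simply evaluates dX + dY > Assoc, with dY = 0 when thread X runs alone.

```python
def last_access_misses(d_x, assoc, d_y=0):
    """True if the last access of cseqX(d_x, n_x) misses, given d_y
    distinct intervening addresses from another thread (0 if alone)."""
    return d_x + d_y > assoc


x_seq = "ABA"           # cseqX(2,3)
y_seq = "UVVW"          # thread Y's intervening accesses
d_x = len(set(x_seq))   # 2 distinct addresses
d_y = len(set(y_seq))   # 3 distinct addresses (U, V, W)

print(last_access_misses(d_x, assoc=4))           # X alone
print(last_access_misses(d_x, assoc=4, d_y=d_y))  # X sharing with Y
```

Under the stated rule, the shared case gives 2 + 3 = 5 > 4, so the final access to A becomes a miss, while it is a hit when X runs alone.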
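Computing P(seq(dY, nY))
  Basic idea: seq(d,n) is formed in one of two ways:
    seq(d-1,n-1) + 1 access to a new address (edge probability A)
    seq(d,n-1) + 1 access to an already-seen address, i.e. forming a circular sequence with 1..d distinct addresses (edge probability B)
  This is a Markov process with 3 states and 2 edges:
    $P(seq(d,n)) = A \times P(seq(d-1,n-1)) + B \times P(seq(d,n-1))$
  For an edge leaving a state with d distinct addresses:
    $B = \dfrac{\sum_{i=1}^{d} C_i}{\sum_{i=1}^{\infty} C_i}$ and $A = 1 - B$

Overall Formula
  Define:
    $P(d^-) = \dfrac{\sum_{i=1}^{d} C_i}{\sum_{i=1}^{\infty} C_i}$ and $P(d^+) = 1 - P(d^-)$
  P(seq(d,n)) is computed by:
    \[
    P(seq(d,n)) =
    \begin{cases}
    1 & d = n = 1 \\
    P((d-1)^+) \times P(seq(d-1,n-1)) & d = n > 1 \\
    P(1^-) \times P(seq(1,n-1)) & n > d = 1 \\
    P(d^-) \times P(seq(d,n-1)) + P((d-1)^+) \times P(seq(d-1,n-1)) & n > d > 1
    \end{cases}
    \]
  [Figure: recursion tree in which seq(1,2) and seq(2,2) are each reached from seq(1,1), whose probability is 1, with edge probabilities P(1−) and P(1+); the edges leaving seq(1,2) and seq(2,2) are labeled P(1+) and P(2−).]
  Predict the total misses for thread X (here A denotes the cache associativity, not the edge probability above):
    \[
    miss_X = oldmiss_X + \sum_{d_X=1}^{A} P_{miss}(cseq_X(d_X, n_X)) \times C_{d_X}
    \]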
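The piecewise recursion for P(seq(d,n)) translates directly into a memoized function. The sketch below is an assumption-laden illustration: the name `make_seq_prob` is hypothetical, the infinite sum over C_i is approximated by the recorded stack counters plus the miss counter, and the counter values in the usage example are made up for demonstration.

```python
from functools import lru_cache


def make_seq_prob(counters, misses):
    """Build P(seq(d, n)) from stack distance counters C_1..C_k and a
    miss counter standing in for all deeper stack positions."""
    total = sum(counters) + misses

    def p_minus(d):
        # P(d-): the next access reuses one of the d most recent addresses
        return sum(counters[:d]) / total

    def p_plus(d):
        # P(d+): the next access goes to an address outside the first d
        return 1.0 - p_minus(d)

    @lru_cache(maxsize=None)
    def p_seq(d, n):
        if d < 1 or d > n:
            return 0.0
        if d == 1 and n == 1:
            return 1.0
        if d == n:                       # every access is new
            return p_plus(d - 1) * p_seq(d - 1, n - 1)
        if d == 1:                       # the same address repeated
            return p_minus(1) * p_seq(1, n - 1)
        return (p_minus(d) * p_seq(d, n - 1)
                + p_plus(d - 1) * p_seq(d - 1, n - 1))

    return p_seq


# Illustrative counters only (not measured data).
p_seq = make_seq_prob(counters=[4, 2, 1, 1], misses=2)
print(p_seq(2, 3))
print(sum(p_seq(d, 3) for d in range(1, 4)))
```

For a fixed n, the probabilities over all d sum to 1, because every state splits its probability between the "already-seen" edge and the "new address" edge.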
Observations
  Applications differ in how vulnerable they are to cache sharing:
    Some are highly vulnerable
    Some are not vulnerable
    Many are somewhat / sometimes vulnerable
  Insights:
    Traditional characterizations are not indicative of the impact of sharing:
      Low vs. High IPC
      Int vs. Floating-Point
      High Miss Rate vs. Low Miss Rate
    Rather, the interaction of temporal reuse behavior determines the impact of cache sharing

Modeling Replacement Policy Performance
Motivation
  Cache design critical to performance
    Memory wall: a cache miss costs hundreds of processor cycles
    Capacity pressure: Multi-core design, Virtual Machine
  Important parameters: size, associativity, block size, and replacement policy

Motivation
  Performance variation due to replacement policy is significant
    [Figure: L2 miss rates and normalized execution time for art, ammp, and cg under LRU vs. Rand-MRUskw; bar annotations in the original: 18%, 32%, 13%, 19%, 67%, 47%.]
  No agreement on the “best implementation”
    Intel Pentium: LRU
    Intel XScale: FIFO
    IBM Power 4: tree-based pseudo-LRU
    Others: round robin, random, replacement hints, etc.
Motivation
  No analytical model; past models assume
    LRU [Cascaval03, Chandra05, Ghosh97, Quong94, Sen02, Singh92, and Suh01]
    or Random [Agarwal89, Berg04, Ladner99]
  LRU/Random simplifies modeling, but
    Ignores performance variation due to replacement policy
    Inaccurate for highly associative caches

Would be useful to model replacement policies
  Inputs:
    Circular Seq Profiling of App 1 ... App N
    Replacement Probability Function (RPF) of RP 1 ... RP M
  Prediction Model
  Output: predicted miss rate for each app on each replacement policy (a MissRate table with one row per policy RP 1 ... RP M and one column per application App 1 ... App N)
Outline
  Input of the Model
  Replacement Policy Model
  Case Study

Input of the Model
  Replacement Probability Function (RPF)
  Circular Sequence Profiles
Replacement Prob Function (RPF)
  RPF, denoted as Prepl(.) = a probability function, where Prepl(i) is the probability that a cache block on the ith stack position is replaced on a cache miss.
  [Figure: Prepl(i) versus stack position i (1 to 8) for five policies: LRU, NMRU1, NMRU4, Rand-MRUskw, and Rand-LRUskw.]

Outline
  Input of the Model
  Replacement Policy Model
    Markov states
    Markov state transitions
  Case Study
  Conclusions
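An RPF is just a probability vector indexed by stack position, which makes it easy to represent. The sketch below is illustrative only: the function names are hypothetical, and it covers only the two policies whose RPFs follow directly from their definitions (LRU always evicts the last stack position; uniformly random replacement evicts every position with equal probability). The shapes of the NMRU and skewed policies are shown in the figure and are not reproduced here.

```python
import random


def lru_rpf(assoc):
    """LRU: the block at the last stack position is always replaced."""
    return [0.0] * (assoc - 1) + [1.0]


def random_rpf(assoc):
    """Uniform random replacement: every position is equally likely."""
    return [1.0 / assoc] * assoc


def is_valid_rpf(rpf, tol=1e-9):
    """An RPF must be non-negative and sum to 1."""
    return all(p >= 0.0 for p in rpf) and abs(sum(rpf) - 1.0) <= tol


def pick_victim(rpf, rng=random):
    """Draw the stack position (1-based) to replace on a miss."""
    return rng.choices(range(1, len(rpf) + 1), weights=rpf, k=1)[0]


print(lru_rpf(8))
print(pick_victim(lru_rpf(8)))  # always the LRU position
```

Keeping the policy as data rather than code is what lets one model cover several replacement policies.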
Illustration
  [Figure sequence: the model state (d, n, p) after each step.]
  Initial state: (d=4, n=5, p=∞)
  Current state: (d=3, n=4, p=1)
  Current state: (d=2, n=3, p=2)
  Current state: (d=1, n=2, p=3)
  Current state: (d=0, n=1, p=?)
  Final state: (d=0, n=1, p=∞)
State Transitions

State Transitions Diagram
  [Figure, built up over several slides: transitions out of state (d,n,p).]
  Transition labels:
    2: Dist, Miss, NoRp, Shift
    3: Dist, Miss, Rp
    4: NoDist, Miss, NoRp, NoShift
    5: NoDist, Miss, NoRp, Shift
    6: NoDist, Miss, Rp
    7: Dist, Hit
  Successor states shown: (d-1,n-1,p+1), (d,n-1,p+1), and End of State
Case Study: Using Model Only
  Goal: When is LRU pathological?
    Hard to pinpoint with simulations because of many possible contributing factors
    Isolate the impact of temporal reuse pattern
  Synthetic Stack Distance Profiles:

Case Study: Unimodal
  Case 1: Unimodal Working Set
  [Figure: number of accesses versus stack position, with a single peak at position A.]
  [Figure: L2 miss rate (0% to 100%) versus peak stack position (1 to 31) for NMRU4, NMRU1, Rand-LRUskw, Rand-MRUskw, and LRU.]
  [Figure: profile with two peaks (Peak1 and Peak2): L2 miss rate (0% to 30%) under LRU versus Peak2's stack position (9 to 16), two panels.]

Modeling can reveal non-obvious insights:
  Cache miss rates due to shared cache space contention
    Not capturable by simple metrics: low IPC vs. high IPC, low miss rates vs. high miss rates, int vs. floating-point
    Interaction of temporal reuse patterns of several applications
How to Use ACAPP

Introduction
  Analytical CAche Performance Prediction (ACAPP) tool suite
    Prediction for different cache associativities
    Prediction for different cache replacement policies
    Prediction for cache contention when two threads share the cache
    Adding new replacement policies with user-specified RPFs
  Input
    Circular Sequence profile of each application
    Generated by any simulator that follows a certain format
    Providing extension code for SimpleScalar
  Released
    Available for download at
    http://www.ece.ncsu.edu/arpers
Prediction under varying cache associativity
  acapp -a <assoc> [<min assoc> <max assoc>] -f1 <profile1>
  EXAMPLE
    #./acapp -a 4 7 -f1 ./csq/benchmark1.csq
  OUTPUT
    The CSQ file is generated using the following cache parameters:
    Sets: 1024
    Associativity: 4
    Block size: 64
    Original miss rate: 0.768043
    Miss rate for A = 4: 0.768043
    Miss rate for A = 5: 0.733912
    Miss rate for A = 6: 0.689150
    Miss rate for A = 7: 0.607186
  Note: benchmark1.csq represents the L2 profile of swim (ref input set).
        benchmark2.csq represents the L2 profile of apsi (ref input set).
        benchmark3.csq represents the L2 profile of ammp (ref input set).

Print supported replacement policies
  acapp -log
  EXAMPLE
    #./acapp -log
  OUTPUT
    * * * * SUPPORTED REPLACEMENT POLICIES LOGFILE * * * *
    1 - NMRU4
    2 - NMRU1
    3 - LRUskw
    4 - MRUskw
  [Figure: Prepl(i) versus stack position i (1 to 8) for LRU, NMRU1, NMRU4, Rand-MRUskw, and Rand-LRUskw.]
Prediction under varying cache replacement policies
  acapp -p <rpindex> -f1 <profile1>
  EXAMPLE
    #./acapp -p 2 -f1 ./csq/benchmark1.csq
  OUTPUT
    The CSQ file is generated using the following cache parameters:
    Sets: 1024
    Associativity: 4
    Block size: 64
    Prediction Result for NMRU1
    LRU: 0.768043
    Pred: 0.739310

Prediction for all supported replacement policies
  acapp -pA -f1 <profile1>
  EXAMPLE
    #./acapp -pA -f1 ./csq/benchmark3.csq
  OUTPUT
    The CSQ file is generated using the following cache parameters:
    Sets: 1024
    Associativity: 8
    Block size: 64
    1 - NMRU4
    Prediction Result for NMRU4
    LRU: 0.702653
    Pred: 0.406974
    *******************************
    2 - NMRU1
    Prediction Result for NMRU1
    LRU: 0.702653
    Pred: 0.331667
    *******************************
    3 - LRUskw
    Prediction Result for LRUskw
    LRU: 0.702653
    Pred: 0.376921
    *******************************
    4 - MRUskw
    Prediction Result for MRUskw
    LRU: 0.702653
    Pred: 0.201158
    *******************************
Prediction under cache contention
  acapp -c -f1 <profile1> -f2 <profile2>
  EXAMPLE
    #./acapp -c -f1 ./csq/benchmark1.csq -f2 ./csq/benchmark2.csq
  OUTPUT
    The CSQ file is generated using the following cache parameters:
    Sets: 1024
    Associativity: 4
    Block size: 64
    ******** RESULTS ********
                          ./csq/benchmark1.csq   ./csq/benchmark2.csq
    Accesses:             68182266               11805186
    Predicted miss rate:  0.868560               0.338920
    Original miss rate:   0.768043               0.244366

Adding new replacement policy
  Requires a “usr_rp.in” file:
    #This is the configuration file of user specified replacement policy.
    #Please do not change the format of this file and the NAME, ASSOC or PROB keywords. Only
    #the values of each line can be changed by the user.
    NAME newRp
    ASSOC 8
    PROB
    0.2 0.1 0.05 0.25 0.15 0.07 0.03 0.15
  acapp -n (default)
  EXAMPLE
    #./acapp -n
  OUTPUT
    Creating new replacement policy...
    Coefficient file(s) added to: ./fine/newRp/d_0
    Coefficient file(s) added to: ./fine/newRp/d_1
    …
    …
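Before handing a user-defined policy to the tool, it can help to sanity-check the file. The sketch below is not part of ACAPP and does not reflect how ACAPP itself reads the file: `parse_user_rp` is a hypothetical helper that assumes the layout shown on the slide (comment lines starting with `#`, then `NAME`, `ASSOC`, and `PROB` followed by one probability per stack position) and assumes the probabilities must sum to 1, since an RPF is a probability function.

```python
def parse_user_rp(text):
    """Parse a usr_rp.in style description into (name, assoc, probs)."""
    name, assoc, probs = None, None, []
    reading_probs = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blanks and comments
        parts = line.split()
        if parts[0] == "NAME":
            name = parts[1]
        elif parts[0] == "ASSOC":
            assoc = int(parts[1])
        elif parts[0] == "PROB":
            reading_probs = True
            probs.extend(float(x) for x in parts[1:])
        elif reading_probs:
            probs.extend(float(x) for x in parts)
    if assoc is None or len(probs) != assoc:
        raise ValueError("expected one probability per stack position")
    if abs(sum(probs) - 1.0) > 1e-6:
        raise ValueError("replacement probabilities must sum to 1")
    return name, assoc, probs


example = """#user specified replacement policy
NAME newRp
ASSOC 8
PROB
0.2 0.1 0.05 0.25 0.15 0.07 0.03 0.15
"""
print(parse_user_rp(example))
```

The eight values from the slide sum to 1, so the example passes both checks.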
The Effects of Miss Clustering on the Cost of a Cache Miss
  Jim Mitchell
  Phil Emma
  Allan Hartstein
  Thomas R. Puzak
  Viji Srinivasan
  Dept: Systems Technology and Microarchitecture
  IBM T. J. Watson Research Center

Acknowledgments
  Arthur Nadas
  Jane Bartik
  Dan Prener
  Peter Oden
  Doug Logan
  John Griswell
  C R Attanasio
  Danny Lynch
  Moin Qureshi
[Figures: miss spectrograms (percent of misses versus miss cost in cycles, 0 to 900) for Cluster Size = 1, 2, 3, and 4, with L2 = 15 Cycles, L3 = 75 Cycles, Memory = 300 Cycles; a further Cluster Size = 4 panel covers 540 to 1260 cycles.]
[Figure annotation: $\binom{C+L}{C} = N$]
[Figure: tool flow. Input: Decode/Endop times, Miss Info. Report Output: Miss Spectrogram, Cost/Analysis Report.]
Miss/Cost Report
  Cluster  Size  Cost  Infimum  Supremum  Miss      Inst      Inst
  Num                  Inst     Inst      Address   Address   Number
  2797     3     45    348389   348417    20B1FA20  000668EC  348389
  2797     3     45    348389   348417    20B1FB40  000668EC  348401
  2797     3     45    348389   348417    20B1FC00  000668EC  348409

Cost Analysis Report
  Highest count items:
    ASID  Inst Addr  Count  % of Total
    012E  93DAA      4410   0.49%
    0134  9CDAA      3854   0.43%
    0146  3B46E      3023   0.34%
    0146  9724EA6    2596   0.29%
    012E  3CF3DD62   2444   0.27%
  Highest cost items:
    ASID  Inst Addr  Total Cost  % of Total
    0146  9724EA6    144045      0.33%
    0146  3BE10A3E   142423      0.32%
    0146  3BE10A02   139804      0.32%
    0146  96F2A68    136345      0.31%
    012E  93DAA      125253      0.28%
Cluster Size Analysis
  [Figure: fraction of misses (0 to 0.3) versus cluster size (0 to 21).]

How do you use a spectrogram?
  Analyze Prefetching algorithm
  Hardware or Software

Cycles Per Miss Versus Cluster Size
  [Figure: cycles per miss versus cluster size.]
Theory:
  Can we predict the shape of a spectrogram (cluster size = 3, 4, or 5) from analyzing smaller miss clusters?
  [Figure: spectrograms for Cluster Size = 1, 2, 3, and 4 (percent versus cycles, 0 to 450), with L2 = 15 cycles and Memory = 100 cycles.]

Observations:
  For cluster size = 1, we can have a Hit or Miss in the L2:
    H or M
  For cluster size = 2, we can have $2^2 = 4$ possible outcomes:
    HH, HM, MH, MM
  For cluster size = 3, we can have $2^3 = 8$ possible outcomes:
    HHH, HHM, HMH, HMM, MHH, MHM, MMH, MMM
  For cluster size = 4, we have $2^4 = 16$ possible outcomes:
    HHHH, ..., MMMM
  For a cluster of size C there are $2^C$ possible outcomes.
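Determine unique sums for N items choosing C

Number of Peaks in spectrogram with C clusters and N levels
  \[
  \text{Peaks} = \binom{C+N}{C}
  \]
  Hit/Miss Combinations $= 2^C$, where C = Cluster Size
  [Figure: number of peaks versus cluster size (1 to 6) for L = 1, 2, and 3 levels, compared with the $2^C$ hit/miss combinations.]

  \[
  S_N = \left(\!\!\binom{N}{0}\!\!\right) + \left(\!\!\binom{N}{1}\!\!\right) + \left(\!\!\binom{N}{2}\!\!\right) + \left(\!\!\binom{N}{3}\!\!\right) + \cdots + \left(\!\!\binom{N}{C}\!\!\right)
      = \sum_{i=0}^{C} \left(\!\!\binom{N}{i}\!\!\right)
  \]
  Using the relation
  \[
  \left(\!\!\binom{N}{K}\!\!\right) = \binom{N+K-1}{K}
  \]
  this becomes
  \[
  S_N = \binom{N-1}{0} + \binom{N}{1} + \binom{N+1}{2} + \binom{N+2}{3} + \cdots + \binom{N+C-1}{C}
  \]
  We can combine the first two terms using
  \[
  \binom{N}{K} = \binom{N-1}{K-1} + \binom{N-1}{K} \qquad \text{(1a)}
  \]
  \[
  S_N = \binom{N+1}{1} + \binom{N+1}{2} + \binom{N+2}{3} + \cdots + \binom{N+C-1}{C}
  \]
  Applying 1a repeatedly, the series collapses to $\binom{N+C}{C}$.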
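The collapse of the series can be checked numerically for small values. The sketch below is a hedged illustration with hypothetical function names: it counts, by brute force, the multisets of size 0 to C drawn from N items (the terms of the series $S_N$) and compares the total with the closed form $\binom{N+C}{C}$.

```python
from itertools import combinations_with_replacement
from math import comb


def peaks_closed_form(c, n):
    """Closed form from the slide: (C+N choose C)."""
    return comb(c + n, c)


def peaks_by_enumeration(c, n):
    """Sum over i = 0..C of the number of size-i multisets of N items."""
    return sum(
        1
        for i in range(c + 1)
        for _ in combinations_with_replacement(range(n), i)
    )


for n in (1, 2, 3):
    row = [peaks_closed_form(c, n) for c in range(1, 7)]
    print(f"levels={n}: peaks for cluster size 1..6 = {row}")
    print(f"          2^C for comparison        = {[2 ** c for c in range(1, 7)]}")
```

With one level the count is C + 1, and in all three cases it grows polynomially in C rather than as $2^C$.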
[Figures: individual miss spectrograms (percent versus cycles) for Cluster = 2 and Cluster = 3 hit/miss patterns, with the measured overlap splits:]
  HM Cluster Overlap = 34/66
  HH Cluster Overlap = 45/55
  HHH Cluster Overlap = 12 48 40
  HMH Cluster Overlap = 21 46 33
What is a?
  Let Prob[M] = P. Then Prob[Hit] = (1-P).
  Let $X_i$ represent the ith hit or miss, and let 0 represent a hit and 1 represent a miss; then $X_i = H$ or $X_i = M$ is equivalent to $X_i = 0$ or $X_i = 1$.
  By definition the correlation coefficient is $\rho = \mathrm{COV}(X_1, X_2) / (\sigma_{X_1} \sigma_{X_2})$.
  If Hit and Miss Probabilities are independent then

  Miss following a miss:
    Note the function $(1-a+aP)$ has the properties we desire. For $0 \le a \le 1$ and $P \le 1$, $(1-a+aP) \ge P$.
    Also for $a = (1-\rho)$, as $\rho \to 1$ (the correlation between $X_i$ and $X_{i+1}$ becomes stronger), $a \to 0$ and
    $\Pr[X_2 = M \mid X_1 = M] = (1-a+aP) \to 1$. Perfect correlation.
    For $\rho \to 0$ (the correlation between $X_i$ and $X_{i+1}$ becomes weaker), $a \to 1$ and
    $\Pr[X_2 = M \mid X_1 = M] = (1-a+aP) \to P$. As if they are independent.

  Hit following a hit:
    Note the function $(1-aP)$ has the properties we desire. For $0 \le a \le 1$ and $P \le 1$, $(1-aP) \ge (1-P)$.
    Also for $a = (1-\rho)$, as $\rho \to 1$, $a \to 0$ and
    $\Pr[X_2 = H \mid X_1 = H] = (1-aP) \to 1$. Perfect correlation.
    For $\rho \to 0$, $a \to 1$ and
    $\Pr[X_2 = H \mid X_1 = H] = (1-aP) \to (1-P)$. As if they are independent.

  Transition probabilities from $X_i$ (rows) to $X_{i+1}$ (columns):
                Hit       Miss
    Hit         1-aP      aP
    Miss        a-aP      1-a+aP

  The probability of a hit/miss sequence is
    $(1-P)(1-aP)^w (aP)^x (a-aP)^y (1-a+aP)^z$ if it starts with a hit, and
    $(P)(1-aP)^w (aP)^x (a-aP)^y (1-a+aP)^z$ if it starts with a miss,
  where w, x, y, and z count the H→H, H→M, M→H, and M→M transitions in the sequence.
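The transition table and the sequence-probability product can be exercised with a short script. This is a sketch under stated assumptions: the function names are hypothetical, the first outcome of a sequence is assumed to be a miss with probability P, and the values P = 0.4 and a = 0.7 in the usage example are arbitrary illustrations rather than measured data.

```python
from itertools import product


def transition_matrix(p_miss, a):
    """Pr[X_(i+1) | X_i] for the correlated hit/miss model."""
    return {
        ("H", "H"): 1 - a * p_miss,
        ("H", "M"): a * p_miss,
        ("M", "H"): a - a * p_miss,
        ("M", "M"): 1 - a + a * p_miss,
    }


def sequence_probability(seq, p_miss, a):
    """Probability of a hit/miss string such as 'HMH'."""
    t = transition_matrix(p_miss, a)
    prob = p_miss if seq[0] == "M" else 1 - p_miss
    for prev, cur in zip(seq, seq[1:]):
        prob *= t[(prev, cur)]   # one factor per transition
    return prob


p_miss, a = 0.4, 0.7             # illustrative values only
for outcome in ("".join(s) for s in product("HM", repeat=2)):
    print(outcome, round(sequence_probability(outcome, p_miss, a), 4))
```

Setting a = 1 recovers independent outcomes, and a = 0 makes every cluster all hits or all misses, matching the two limits discussed on the slide.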
[Figure: Kolmogorov-Smirnov Test For Predicted Hit/Miss Probabilities For Clusters = 2, 3, 4, and 5. Histograms of number of workloads versus maximum difference (0.01 to 0.19).]

[Figure: Individual Miss Spectrogram for Cluster Size = 2, L1=64KB, L2=256KB 15 Cycle Latency, L3 = 100 Cycles Latency, Data for OLTP. Four panels (percent versus cycles) with the measured overlap splits:]
  MM Cluster Overlap = 35/65
  MH Cluster Overlap = 62/38
  HM Cluster Overlap = 34/66
  HH Cluster Overlap = 45/55
Cluster Size = 2
  Constructing a Miss Spectrogram. HH, MM, HM, MH, and overlap probabilities for Cluster = 2 Spectrogram. Data is for OLTP workload.

  Pattern  Prob   Overlap (OVL)                     No overlap (NO OVL)
                  prob  penalty  product            prob  penalty  product
  HH       .306   .45   15       .306X.45 = .138    .55   30       .306X.55 = .168
  HM       .188   .34   100      .188X.34 = .064    .66   115      .188X.66 = .124
  MH       .175   .62   100      .175X.62 = .109    .38   115      .175X.38 = .067
  MM       .331   .35   100      .331X.35 = .116    .65   200      .331X.65 = .215

  Resulting spectrogram peaks:
    Penalty      15    30    100   115   200
    Probability  .138  .168  .288  .191  .215

Cluster Size = 3
  Overlap/No-Overlap Probability Tree For HHH and MMH. Data For OLTP.

  X1X2X3 = HHH (= .193): X1X2 splits OVL .45 / NO OVL .55 (penalty 15 / 30); X2X3 splits OVL .45 / NO OVL .55
    Leaf penalties: 15, 30, 30, 45
    .45X.45X.193 = .039
    .45X.55X.193 = .048
    .55X.45X.193 = .048
    .55X.55X.193 = .059

  X1X2X3 = MMH (= .117): X1X2 splits OVL .35 / NO OVL .65 (penalty 100 / 200); X2X3 splits OVL .62 / NO OVL .38
    Leaf penalties: 100, 115, 200, 215
    .35X.62X.117 = .025
    .35X.38X.117 = .016
    .65X.62X.117 = .047
    .65X.65X.117 = .049
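The construction of the Cluster = 2 spectrogram is a weighted sum over the four hit/miss patterns. The sketch below is illustrative (the names `PATTERNS` and `spectrogram` are hypothetical); the pattern probabilities, overlap probabilities, and penalties are the OLTP values quoted on the slide, and the no-overlap probability is taken as one minus the overlap probability.

```python
from collections import defaultdict

# pattern: (probability, P(overlap), penalty if overlap, penalty if no overlap)
PATTERNS = {
    "HH": (0.306, 0.45, 15, 30),
    "HM": (0.188, 0.34, 100, 115),
    "MH": (0.175, 0.62, 100, 115),
    "MM": (0.331, 0.35, 100, 200),
}


def spectrogram(patterns):
    """Accumulate probability mass at each miss penalty (in cycles)."""
    peaks = defaultdict(float)
    for prob, p_ovl, pen_ovl, pen_no_ovl in patterns.values():
        peaks[pen_ovl] += prob * p_ovl
        peaks[pen_no_ovl] += prob * (1 - p_ovl)
    return dict(sorted(peaks.items()))


for penalty, mass in spectrogram(PATTERNS).items():
    print(f"{penalty:>4} cycles: {mass:.3f}")
```

Patterns that share a penalty (HM, MH, and MM at 100 cycles; HM and MH at 115 cycles) add up to a single peak, which is why four patterns with two cases each produce only five peaks.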
[Figure: Predicted Spectrogram for Clusters = 3 and 4. L1=64KB, L2=256KB 15 Cycles Latency, L3=100 Cycle Latency, Data For OLTP. Two panels, Cluster = 3 and Cluster = 4: percent of misses versus miss penalty in cycles.]

[Figure: Kolmogorov-Smirnov Test For Predicted Hit/Miss Probabilities For Clusters = 2, 3, 4, and 5. Four panels, one per cluster size: number of workloads versus maximum difference (0.01 to 0.19).]
Conclusions
Agreement between the theory and the experiment is very good.