Caches: A Review
Assignment 3: Paper Review
  Tullsen et al., "Simultaneous Multithreading: Maximizing On-Chip Parallelism," ISCA 1995
  Will be there on LMS by Thursday, November 8, 2012 (12:00 pm)
  1-2 page summary highlighting the key idea, the performance improvement claimed, and a critical analysis of the proposed approach
  Late submission: 25% mark reduction per day
  Submission through LMS only
  Late submission: through email (adeel.pasha@lums.edu.pk)
EE/CS520 Comp.Archi.
11/7/2012
Memory Organization
Motivation and Principle:
  Increasing complexity of applications
  Increase in data volume to be memorized
  Increase in memory size
Evolution:
  In 1980, a few KBs of main memory
  Now, at least a few hundred MBs is necessary for systems
The memory must cover 2 constraints:
  Large size
  Short access time
  Contradiction
Memory Organization
Large Gap
  [Figure: the performance gap between CPU and Memory]
Memory Organization
Data Locality
  Temporal: if one data item is needed now, it is likely to be needed again in the near future
  Spatial: if one data item is needed now, nearby data is likely to be needed in the near future
Exploiting Locality: Caches
  Keep recently used data in fast memory close to the processor
  Also bring nearby data there
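As a rough illustration of why locality matters, the sketch below counts misses for three access patterns in a tiny direct-mapped cache. The function name `count_misses`, the 64 B block size, and the 8-line cache are assumptions chosen for the example, not values from the slides.

```python
def count_misses(addresses, block_size=64, num_lines=8):
    """Count misses in a small direct-mapped cache (illustrative model only)."""
    lines = [None] * num_lines          # each line remembers which block it holds
    misses = 0
    for addr in addresses:
        block = addr // block_size      # block that contains this byte
        idx = block % num_lines         # the single line this block may use
        if lines[idx] != block:
            misses += 1
            lines[idx] = block          # bring the whole block in
    return misses

sequential = range(0, 1024)             # spatial locality: neighbouring bytes
strided = range(0, 1024 * 64, 64)       # no spatial locality: one byte per block
repeated = [0, 8, 16] * 100             # temporal locality: same data reused

print(count_misses(sequential))  # 16 misses for 1024 accesses
print(count_misses(strided))     # 1024 misses for 1024 accesses
print(count_misses(repeated))    # 1 miss for 300 accesses
```

Under these assumptions, the sequential and repeated patterns miss only when a new block is first touched, while the strided pattern misses on every access.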
Memory Organization
Basic idea:
  Implement a memory hierarchy:
    Small size, fast, close to processor
    Large size, slow, far from processor
[Figure: the hierarchy from the large, slow end (Capacity +) to the small, fast end (Speed +)]
  Disk
  Main Memory
  L3 Cache
  L2 Cache
  Instruction Cache and ITLB, Data Cache and DTLB
  Register File
  Bypass Network
Memory Latency is Long
  60-100 ns not totally uncommon
What does that mean?
  2.0 GHz CPU -> 0.5 ns cycle time
  100 ns memory -> 200 clock cycles (cc) of memory latency!
Solution: Caches
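The arithmetic behind these numbers can be checked with a short sketch. The function names are hypothetical; the inputs (2.0 GHz, 100 ns) are the ones quoted on the slide.

```python
def cycle_time_ns(freq_ghz):
    """Clock period in nanoseconds for a clock frequency given in GHz."""
    return 1.0 / freq_ghz

def latency_in_cycles(latency_ns, freq_ghz):
    """Memory latency expressed in CPU clock cycles (ns * cycles per ns)."""
    return latency_ns * freq_ghz

print(cycle_time_ns(2.0))           # 0.5
print(latency_in_cycles(100, 2.0))  # 200.0
```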
Caches: The Library Analogy
[Figure labels: Processor, Register, Cache, Cache Line, Main Memory, Library, Secondary Memory (Disk)]
Caches: The Library Analogy
The Student:
  Works on the pages placed in front
  Can rapidly access files present on his desk
  Since desk size is small, only files under reading are available
  Processes one by one the pages present in the file
[Figure labels: Processor, ALU, Registers, Cache Memory]
Caches: The Library Analogy
The Bookshelf:
  Contains the files
  Access is slower
  Is accessed per file
Caches: The Library Analogy
The Library:
  Contains the files
  Access time is very long
  Is accessed per N files and not file by file
[Figure labels: Main memory, Library, Secondary memory (disk)]
Cache Basics
Terminology
  Cache Hit
    Hit Time
  Cache Miss: bring in entire block (not just one word/byte)
    Miss Penalty
    Miss rate
Fundamental decisions
  Placement: where in the cache can a block go?
  Lookup: how do we find a block in cache?
  Replacement: what to move out to make room in cache?
  Write policy: what to do about stores?
Cache Basics
Cache consists of block-sized lines
  Line size normally power of 2
  Typically 16 to 128 bytes in size
Example
  Suppose block size is 128 B
    Lowest 7 bits determine offset within block
  Read data at address A = 0x7fffa3f4
    Block begins at base address 0x7fffa380
[Figure: memory divided into 128 B blocks: xx000-xx07F, xx080-xx0FF, xx100-xx17F, xx180-xx1FF, xx200-xx27F, xx280-xx2FF, xx300-xx37F, xx380-xx3FF]
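The block-address example above can be reproduced with a few bit operations. This is a minimal sketch; `split_block_address` is a hypothetical helper name, and it assumes the block size is a power of two as the slide states.

```python
BLOCK_SIZE = 128  # bytes, as in the example above

def split_block_address(addr, block_size=BLOCK_SIZE):
    """Return (block base address, offset within block, number of offset bits)."""
    offset_bits = block_size.bit_length() - 1   # log2 of a power of two
    offset = addr & (block_size - 1)            # keep the lowest offset bits
    base = addr & ~(block_size - 1)             # clear the lowest offset bits
    return base, offset, offset_bits

base, offset, bits = split_block_address(0x7fffa3f4)
print(hex(base), hex(offset), bits)  # 0x7fffa380 0x74 7
```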
Cache Placement
Placement
  Which memory blocks are allowed into which cache lines
Placement Policies
  Direct-mapped (a block can go to only one line)
    (Block Address) MOD (No. of Lines in Cache)
  Fully Associative (block can go to any line)
  Set-associative (block can go to one of N lines)
    Each group of N lines is called a Set
    (Block Address) MOD (No. of Sets in Cache)
    E.g., if N = 4, the cache is 4-way set-associative
    Previous two policies are extreme cases of this policy
    (E.g., if N = 1 we get a direct-mapped cache)
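The three placement policies can be expressed with one function, since direct-mapped and fully associative are the N = 1 and N = (number of lines) cases. The sketch below is illustrative: the name `candidate_lines` and the assumption that the lines of a set are numbered contiguously are choices made for the example.

```python
def candidate_lines(block_addr, num_lines, ways):
    """Return the cache line numbers a block may occupy.

    ways == 1          -> direct-mapped
    ways == num_lines  -> fully associative
    Assumes the lines of set s are numbered s*ways .. s*ways + ways - 1.
    """
    assert num_lines % ways == 0
    num_sets = num_lines // ways
    set_index = block_addr % num_sets           # (Block Address) MOD (No. of Sets)
    first = set_index * ways
    return list(range(first, first + ways))

print(candidate_lines(13, num_lines=8, ways=1))  # [5]
print(candidate_lines(13, num_lines=8, ways=2))  # [2, 3]
print(candidate_lines(13, num_lines=8, ways=8))  # [0, 1, 2, 3, 4, 5, 6, 7]
```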
Cache Identification/Lookup
When address is referenced, need to find:
  Whether its data is in the cache
  If it is, find where in the cache
  This is called a cache lookup
Each cache line must have
  Valid bit (1 if line has data, 0 if line empty)
  Tag to identify which block is in the line (if line is valid)
Physical Address fields used in the lookup
  Tag is compared with the tag stored in the line
  Index selects the set
  Block offset selects desired data from block
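A lookup along these lines might be sketched as follows. The `Line` class, the `lookup` function, and the tiny geometry in the usage lines (4 sets, 2 ways, 16 B blocks) are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class Line:
    valid: bool = False   # 1 if the line holds data, 0 if empty
    tag: int = 0          # identifies which block is in the line

def lookup(sets, addr, offset_bits, index_bits):
    """Return (way, block offset) on a hit, or None on a miss."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)   # selects the set
    tag = addr >> (offset_bits + index_bits)
    for way, line in enumerate(sets[index]):
        if line.valid and line.tag == tag:                    # valid bit and tag match
            return way, offset
    return None

# Hypothetical cache: 4 sets x 2 ways, 16 B blocks (4 offset bits, 2 index bits).
sets = [[Line(), Line()] for _ in range(4)]
print(lookup(sets, 0x1234, 4, 2))   # None: the cache is empty
sets[3][1] = Line(valid=True, tag=0x48)
print(lookup(sets, 0x1234, 4, 2))   # (1, 4): hit in way 1, byte offset 4
```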
Cache Lookup: Example
The Opteron data cache: 64 KB data, 2-way set-associative, 64 B cache line
[Figure: lookup datapath with the field widths left blank as an exercise. The 40-bit physical address is split into Tag (? bits), Index (? bits), and Block offset (? bits). Each of the two banks (labeled SET1 and SET2) holds Valid, Tag, and Data (64 B). In each bank the stored tag is compared (=?) with the address tag and a ?:1 MUX driven by Word Select picks ? B of data. A final ?:1 MUX selects the data sent to the processor.]
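Cache Lookup: Example
The Opteron data cache: 64 KB data, 2-way set-associative, 64 B cache line
[Figure: lookup datapath. The 40-bit physical address is split into Tag (A39-A15, 25 bits), Index (A14-A6, 9 bits), and Block offset (A5-A0, 6 bits). The index selects one of 512 entries in each of the two banks (labeled SET1 and SET2), each entry holding Valid, Tag, and Data (64 B). In each bank the stored tag is compared (=?) with the address tag and an 8:1 MUX driven by Word Select (A5-A3) picks 8 B of data. A final 2:1 MUX selects the data sent to the processor.]
1) Block size = 64 B -> Block offset = 6 bits
2) No. of sets = 64 KB / (2 x 64 B) = 512 -> Index = 9 bits
3) Tag = 40 - 9 - 6 = 25 bits
4) Each block is divided into 8 x 8 B words -> word select = 3 bits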
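The field widths in this example follow from the cache parameters, as the sketch below shows. The helper names `cache_geometry` and `split_address` are hypothetical, and the sample address is constructed purely for illustration.

```python
def cache_geometry(cache_bytes, ways, block_bytes, addr_bits):
    """Return (number of sets, offset bits, index bits, tag bits)."""
    num_sets = cache_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1      # log2(block size)
    index_bits = num_sets.bit_length() - 1          # log2(number of sets)
    tag_bits = addr_bits - index_bits - offset_bits
    return num_sets, offset_bits, index_bits, tag_bits

def split_address(addr, offset_bits, index_bits):
    """Split a physical address into (tag, index, block offset)."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# Opteron data cache from the slide: 64 KB, 2-way, 64 B lines, 40-bit address.
sets, off_b, idx_b, tag_b = cache_geometry(64 * 1024, 2, 64, 40)
print(sets, off_b, idx_b, tag_b)   # 512 6 9 25

# Made-up address with tag = 1, index = 3, block offset = 0x2A.
tag, index, offset = split_address((1 << 15) | (3 << 6) | 0x2A, off_b, idx_b)
word = offset >> 3                 # Word Select = A5-A3 for 8 B words
print(tag, index, offset, word)    # 1 3 42 5
```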
Cache Replacement
Need to free a line to insert new block
  Which block should we kick out?
Several strategies
  Random (randomly selected line)
  FIFO (line that has been in cache the longest)
  LRU (least recently used line)
  LRU Approximations
    NMRU (not most recently used)
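To make the difference between FIFO and LRU concrete, here is a minimal sketch of a single cache set that tracks tags only. The class name `CacheSet`, the `run` helper, and the sample trace are assumptions for illustration; real hardware typically uses cheaper approximations of LRU.

```python
from collections import OrderedDict

class CacheSet:
    """One set with `ways` lines; policy is "LRU" or "FIFO" (tags only)."""

    def __init__(self, ways, policy="LRU"):
        self.ways = ways
        self.policy = policy
        self.lines = OrderedDict()   # ordered from next victim to last victim

    def access(self, tag):
        """Return (hit, evicted_tag)."""
        if tag in self.lines:
            if self.policy == "LRU":
                self.lines.move_to_end(tag)   # a hit refreshes recency under LRU only
            return True, None
        victim = None
        if len(self.lines) == self.ways:
            victim, _ = self.lines.popitem(last=False)   # evict the front entry
        self.lines[tag] = None
        return False, victim

def run(policy, trace, ways=2):
    s = CacheSet(ways, policy)
    return [s.access(t) for t in trace]

trace = ["A", "B", "A", "C", "B"]
print(run("LRU", trace))   # C evicts B, then B evicts A
print(run("FIFO", trace))  # C evicts A, then B hits
```

On this trace the two policies choose different victims for C, because the hit on A updates the recency order under LRU but leaves the FIFO insertion order unchanged.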
Write Policy
Do we allocate cache lines on a write?
  Write-allocate
    A write miss brings block into cache
  No-write-allocate
    A write miss leaves cache as it was and only writes in main memory
Do we update memory on writes?
  Write-through
    Memory immediately updated on each write
  Write-back
    Memory updated when line replaced
Write buffer
  Lower latency for caches than writing directly to main memory

                            Write in the Cache: Yes           Write in the Cache: No
Write in Main Memory: Yes   Write-through, write-allocate     Write-through, no-write-allocate
Write in Main Memory: No    Write-back                        -
Write-Back Caches
Need a Dirty bit for each line
  A dirty line has more recent data than memory
Line starts as clean (not dirty)
Line becomes dirty on first write to it
  Memory not updated yet, cache has the only up-to-date copy
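The effect of the dirty bit on memory traffic can be sketched with a deliberately tiny model: a single-line, write-allocate cache that counts writes to main memory. The function name `memory_writes` and the sample trace are assumptions for illustration, and dirty data still in the cache at the end of the trace is not counted.

```python
def memory_writes(trace, write_back):
    """Count main-memory writes for a trace of (op, block) pairs.

    Models one cache line with write-allocate. With write_back=True the
    line uses a dirty bit; otherwise every store is written through.
    """
    tag, dirty, writes = None, False, 0
    for op, block in trace:
        if block != tag:                  # miss: the line is replaced
            if write_back and dirty:
                writes += 1               # dirty victim is written back
            tag, dirty = block, False     # new line starts clean
        if op == "SD":
            if write_back:
                dirty = True              # memory not updated yet
            else:
                writes += 1               # write-through: update memory now
    return writes

trace = [("SD", 1), ("SD", 1), ("SD", 1), ("LD", 2)]
print(memory_writes(trace, write_back=False))  # 3
print(memory_writes(trace, write_back=True))   # 1
```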
Write Policy: Example
Fully associative cache, initially empty
Find out no. of hits and misses
Compare write-allocate/no-write-allocate
Access sequence:
  SD MEM[100]
  SD MEM[100]
  LD MEM[200]
  SD MEM[200]
  SD MEM[100]
No-write-allocate scheme
  1st SD @100: a miss (as cache empty @100)
  2nd SD @100: a miss (as we don't write cache on write miss)
  3rd LD @200: a miss (as cache empty @200)
  4th SD @200: a hit (as cache is loaded on read miss @200)
  5th SD @100: a miss (same as 2nd SD)
  Total = 4 misses, 1 hit
Write-allocate scheme
  1st SD @100: a miss (as cache empty @100)
  2nd SD @100: a hit (as we write cache on write miss)
  3rd LD @200: a miss (as cache empty @200)
  4th SD @200: a hit (as cache is loaded on read miss @200)
  5th SD @100: a hit (same as 1st hit)
  Total = 2 misses, 3 hits
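The hit and miss counts in this example can be reproduced with a short simulation. This is a sketch that assumes the fully associative cache is large enough that nothing is ever evicted; `simulate` is a hypothetical function name.

```python
def simulate(trace, write_allocate):
    """Return (hits, misses) for a trace of (op, address) pairs."""
    cache = set()     # fully associative, assumed never full
    hits = misses = 0
    for op, addr in trace:
        if addr in cache:
            hits += 1
        else:
            misses += 1
            # Loads always allocate; stores allocate only under write-allocate.
            if op == "LD" or write_allocate:
                cache.add(addr)
    return hits, misses

trace = [("SD", 100), ("SD", 100), ("LD", 200), ("SD", 200), ("SD", 100)]
print(simulate(trace, write_allocate=False))  # (1, 4)
print(simulate(trace, write_allocate=True))   # (3, 2)
```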
The 3 Cs of Cache Misses
Compulsory (Cold Start/First Reference)
  The very first access to a block is not in the cache
  The block must be brought into the cache.
  Misses in even an infinite-entry cache.
Capacity
  If the cache cannot contain all the blocks needed during program execution
  Occur due to blocks being discarded and later retrieved.
  Misses in fully associative cache
Conflict (Collision/Interference)
  If block-placement strategy is set-associative or direct-mapped
  Occur because a block can be discarded and later retrieved, if too many blocks map to its set.
  Misses in N-way associative
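One common way to separate the three kinds of misses is to run the same block trace through the cache under study and through a fully associative cache with the same number of lines. The sketch below follows that idea under stated assumptions: both caches use LRU replacement, and the names `lru_access` and `classify_misses` are hypothetical.

```python
from collections import OrderedDict

def lru_access(lines, capacity, key):
    """Access `key` in an LRU-ordered OrderedDict; return True on a hit."""
    if key in lines:
        lines.move_to_end(key)
        return True
    if len(lines) == capacity:
        lines.popitem(last=False)       # evict the least recently used entry
    lines[key] = None
    return False

def classify_misses(block_trace, num_lines, ways):
    """Count hits and compulsory/capacity/conflict misses for a block trace."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]   # cache under study
    fully = OrderedDict()                             # same size, fully associative
    seen = set()                                      # blocks referenced so far
    counts = {"hit": 0, "compulsory": 0, "capacity": 0, "conflict": 0}
    for block in block_trace:
        hit = lru_access(sets[block % num_sets], ways, block)
        fa_hit = lru_access(fully, num_lines, block)
        if hit:
            counts["hit"] += 1
        elif block not in seen:
            counts["compulsory"] += 1   # first reference to this block
        elif not fa_hit:
            counts["capacity"] += 1     # misses even when fully associative
        else:
            counts["conflict"] += 1     # would have hit if fully associative
        seen.add(block)
    return counts

# Direct-mapped, 4 lines: blocks 0 and 4 fight over the same line.
print(classify_misses([0, 4, 0, 4], num_lines=4, ways=1))
# Fully associative, 2 lines: three blocks do not fit.
print(classify_misses([0, 1, 2, 0], num_lines=2, ways=2))
```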
Cache Performance
Average Memory Access Time
  AMAT = hit time + miss rate * miss penalty
Memory stall cycles (out-of-order processor)
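A worked instance of the AMAT formula, as a sketch: the input values (1-cycle hit time, 5% miss rate, 200-cycle miss penalty) are assumptions chosen for illustration, not measurements from the slides.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# Assumed values: 1-cycle hit, 5% miss rate, 200-cycle miss penalty.
print(amat(1, 0.05, 200))   # about 11 cycles per access
```

Even with a 95% hit rate, the average access under these assumptions costs roughly eleven times the hit time, which is why the miss rate and miss penalty terms matter so much.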
Improving Cache Performance
AMAT = hit time + miss rate * miss penalty
  Reduce miss penalty
  Reduce miss rate
  Reduce hit time
Cycles_MemoryStall = Cache Misses x (Miss Latency_Total - Miss Latency_Overlapped)
  Increase overlapped miss latency
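The stall-cycle expression for an out-of-order processor can be evaluated the same way. The numbers below are hypothetical and only show how overlapping part of the miss latency reduces stalls.

```python
def memory_stall_cycles(cache_misses, total_miss_latency, overlapped_miss_latency):
    """Stall cycles = misses * (total miss latency - overlapped miss latency)."""
    return cache_misses * (total_miss_latency - overlapped_miss_latency)

# Assumed values: 1000 misses, 200-cycle total miss latency.
print(memory_stall_cycles(1000, 200, 0))    # 200000: nothing overlapped
print(memory_stall_cycles(1000, 200, 50))   # 150000: 50 cycles hidden per miss
```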
Basic Cache Optimizations
6 Basic Cache Optimizations
Reducing Miss Penalty
  1. Giving Reads Priority over Writes
     Read could complete before earlier writes in write buffer
  2. Multilevel Caches
Reducing Miss Rate
  3. Larger Block size
  4. Larger Cache size
  5. Higher Associativity
Reducing hit time
  6. Avoiding address translation during cache indexing
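To see how a multilevel cache reduces the miss penalty, the AMAT formula can be applied recursively: the L1 miss penalty becomes the average access time of L2. The sketch below uses this standard two-level form; every number in it is an assumption for illustration, and the L2 miss rate is the local rate (L2 misses divided by L2 accesses).

```python
def amat_two_level(l1_hit, l1_miss_rate, l2_hit, l2_local_miss_rate, mem_penalty):
    """AMAT = L1 hit + L1 miss rate * (L2 hit + L2 local miss rate * memory penalty)."""
    l1_miss_penalty = l2_hit + l2_local_miss_rate * mem_penalty
    return l1_hit + l1_miss_rate * l1_miss_penalty

# Assumed values: 1-cycle L1 hit, 5% L1 miss rate, 10-cycle L2 hit,
# 20% local L2 miss rate, 200-cycle main-memory penalty.
with_l2 = amat_two_level(1, 0.05, 10, 0.2, 200)
without_l2 = 1 + 0.05 * 200
print(with_l2, without_l2)   # about 3.5 cycles versus about 11 cycles
```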