
Lecture 19

Caches: A Review

Assignment 3: Paper Review
Tullsen et al., "Simultaneous Multithreading: Maximizing On-Chip Parallelism," ISCA 1995
Will be available on LMS by Thursday, November 8, 2012 (12:00pm)

1-2 page summary highlighting the key idea, the performance improvement claimed, and a critical analysis of the proposed approach

Times New Roman, 11 pt; default margins; line spacing = 1 line

More than 2 pages: 5% marks reduction per extra line
Deadline: November 15, 2012 (12:00pm)

Late submission: 25% marks reduction per day

Submission through LMS only

Late submission: through email (adeel.pasha@lums.edu.pk)

EE/CS520 Comp. Archi.

11/7/2012

Memory Organization
Motivation and Principle:
Increasing complexity of applications
Increase in the volume of data to be stored
Increase in memory size
Evolution:
In 1980, a few KBs of main memory
Now, at least a few hundred MBs are necessary for systems
The memory must satisfy 2 constraints:
Large size
Short access time

Contradiction

Memory Organization

Large Gap
[Figure: The performance gap between CPU and Memory]


Memory Organization
Data Locality
Temporal: if one data item is needed now, it is likely to be needed again in the near future
Spatial: if one data item is needed now, nearby data is likely to be needed in the near future
Exploiting Locality: Caches
Keep recently used data in fast memory close to the processor
Also bring nearby data there
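To make spatial locality concrete, here is a small Python sketch (not from the lecture). It walks a hypothetical 64 x 64 array of 8-byte elements in row order and in column order and, assuming a toy "cache" that holds just one 64B block, reports how often an access falls in the same block as the access before it.

```python
def block_hit_rate(addresses, block_size):
    """Fraction of accesses that land in the same block as the previous access.

    Models a hypothetical one-block cache, so only spatial locality
    (consecutive accesses sharing a block) can produce hits.
    """
    hits = 0
    current_block = None
    for addr in addresses:
        block = addr // block_size      # block number containing this byte address
        if block == current_block:
            hits += 1
        current_block = block           # the one-block cache now holds this block
    return hits / len(addresses)


N, ELEM = 64, 8   # hypothetical 64 x 64 array of 8-byte elements, stored row by row
row_order = [(i * N + j) * ELEM for i in range(N) for j in range(N)]  # walk along rows
col_order = [(i * N + j) * ELEM for j in range(N) for i in range(N)]  # walk down columns

print(block_hit_rate(row_order, 64))  # 0.875: 7 of every 8 accesses reuse the block
print(block_hit_rate(col_order, 64))  # 0.0: each access jumps 512 B to a new block
```

Walking along a row reuses each 64B block for eight consecutive elements, while walking down a column jumps 512B at every step and never reuses a block.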


Memory Organization
Basic idea:
Implement a memory hierarchy:
Small size, fast, close to processor
Large size, slow, far from processor
[Figure: memory hierarchy, from Register File and Bypass Network through Instruction Cache with ITLB, Data Cache with DTLB, L2 Cache, L3 Cache and Main Memory down to Disk; capacity increases toward Disk, speed increases toward the Register File]


Memory Latency is Long
60-100 ns is not totally uncommon
What does that mean?
2.0 GHz CPU → 0.5 ns cycle time
100 ns memory → 200 clock-cycle memory latency!

Solution: Caches
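The cycle count above is simply the memory latency divided by the CPU cycle time. A minimal Python check, where the function name is chosen only for illustration:

```python
def latency_in_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a memory latency in ns into CPU clock cycles (illustrative helper)."""
    cycle_time_ns = 1.0 / clock_ghz   # 2.0 GHz -> 0.5 ns per cycle
    return latency_ns / cycle_time_ns


print(latency_in_cycles(100, 2.0))  # 200.0
print(latency_in_cycles(60, 2.0))   # 120.0
```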


Caches: The Library Analogy
[Figure: Register, Processor, Cache (Cache Line), Main Memory and Secondary Memory (Disk), drawn beside the Library]

Caches: The Library Analogy
The Student:
Works on the pages placed in front of him
Can rapidly access files present on his desk
Since the desk is small, only the files being read are available
Processes the pages in the file one by one
[Figure: Processor (ALU, Registers) and Cache Memory, with labels Register and Cache]

Caches: The Library Analogy
The Bookshelf:
Contains the files
Access is slower
Is accessed per file and not per page


Caches: The Library Analogy
The Library:
Contains the files
Access time is very long
Is accessed per N files and not file by file
[Figure labels: Main memory; Library; Secondary memory (disk)]

Cache Basics
Terminology
Cache Hit
Hit Time
Cache Miss: bring in the entire block (not just one word/byte)
Miss Penalty
Miss Rate
Fundamental decisions
Placement: where in the cache can a block go?
Lookup: how do we find a block in the cache?
Replacement: what to move out to make room in the cache?
Write policy: what to do about stores?


Cache Basics
The cache consists of block-sized lines
Line size is normally a power of 2
Typically 16 to 128 bytes in size
Example
Suppose the block size is 128B
The lowest 7 bits determine the offset within the block
Read data at address A = 0x7fffa3f4
The block containing A begins at base address 0x7fffa380
[Figure: memory divided into 128B blocks: xx000-xx07F, xx080-xx0FF, xx100-xx17F, xx180-xx1FF, xx200-xx27F, xx280-xx2FF, xx300-xx37F, xx380-xx3FF (the block holding A)]
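Because the block size is a power of 2, the offset and the block's base address can be extracted with bit masks. A minimal Python sketch of the example above (the helper name `split_block_address` is hypothetical):

```python
BLOCK_SIZE = 128                            # bytes per block (power of 2)
OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(128) = 7


def split_block_address(addr: int) -> tuple[int, int]:
    """Hypothetical helper: return (block base address, offset within block)."""
    offset = addr & (BLOCK_SIZE - 1)   # keep the lowest 7 bits
    base = addr & ~(BLOCK_SIZE - 1)    # clear the lowest 7 bits
    return base, offset


base, offset = split_block_address(0x7FFFA3F4)
print(hex(base), hex(offset))  # 0x7fffa380 0x74
```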


Cache Placement
Placement
Which memory blocks are allowed into which cache lines

Placement Policies
Direct mapped (a block can go to only one line)
Line = (Block Address) MOD (No. of Lines in Cache)
Fully Associative (a block can go to any line)
Set associative (a block can go to one of N lines)
Each group of N lines is called a Set
Set = (Block Address) MOD (No. of Sets in Cache)
E.g., if N = 4, the cache is 4-way set associative
The previous two policies are extreme cases of this policy
(E.g., if N = 1 we get a direct-mapped cache; if N = No. of Lines, a fully associative cache)
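The three placement policies differ only in how many lines form a set. The sketch below is a hypothetical Python helper; it assumes, purely for illustration, that lines are numbered set by set (set s holds lines s*N to s*N + N - 1), which real hardware need not do.

```python
def candidate_lines(block_addr: int, num_lines: int, ways: int) -> list[int]:
    """Hypothetical helper: cache lines a memory block may occupy.

    ways == 1         -> direct mapped
    ways == num_lines -> fully associative
    otherwise         -> ways-way set associative
    """
    num_sets = num_lines // ways
    s = block_addr % num_sets          # (Block Address) MOD (No. of Sets in Cache)
    return list(range(s * ways, s * ways + ways))  # assumed set-by-set numbering


print(candidate_lines(12, 8, 1))  # [4]: direct mapped, 12 mod 8 = 4
print(candidate_lines(12, 8, 2))  # [0, 1]: 4 sets, 12 mod 4 = 0
print(candidate_lines(12, 8, 8))  # [0, 1, 2, 3, 4, 5, 6, 7]: fully associative
```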


Cache Identification/Lookup
When an address is referenced, we need to find:
Whether its data is in the cache
If it is, where in the cache it is

This is called a cache lookup

Each cache line must have
Valid bit (1 if line has data, 0 if line empty)
Tag to identify which block is in the line (if line is valid)
The physical address is divided into Tag | Index | Block offset
Index selects the set in which the block may reside
Block offset selects the desired data from the block
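A lookup can be sketched in Python: split the address, pick the set with the index, and compare the tag against every valid line in that set. This is a simplified model; the `Line`, `split`, and `lookup` names and the 4-set, 2-way, 64B-block geometry are assumptions for illustration, not a description of any particular processor.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Line:
    valid: bool = False   # False: line is empty
    tag: int = 0


def split(addr: int, block_size: int, num_sets: int) -> Tuple[int, int, int]:
    """Split an address into (tag, index, offset); sizes must be powers of 2."""
    offset_bits = block_size.bit_length() - 1
    index_bits = num_sets.bit_length() - 1
    offset = addr & (block_size - 1)                  # selects data within the block
    index = (addr >> offset_bits) & (num_sets - 1)    # selects the set
    tag = addr >> (offset_bits + index_bits)          # identifies the block
    return tag, index, offset


def lookup(cache: List[List[Line]], addr: int, block_size: int) -> Optional[int]:
    """Return the way that holds addr's block, or None on a miss."""
    tag, index, _ = split(addr, block_size, len(cache))
    for way, line in enumerate(cache[index]):         # search only the selected set
        if line.valid and line.tag == tag:            # hit needs valid bit AND tag match
            return way
    return None


# Hypothetical cache: 4 sets, 2 ways, 64B blocks
cache = [[Line(), Line()] for _ in range(4)]
tag, index, _ = split(0x1234, 64, 4)
cache[index][1] = Line(valid=True, tag=tag)           # place the block in way 1
print(lookup(cache, 0x1234, 64))   # 1: hit in way 1
print(lookup(cache, 0x1334, 64))   # None: same set, different tag
```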

Cache Lookup: Example
The Opteron data cache: 64KB data, 2-way set associative, 64B cache line
[Figure: 40-bit physical address split into Tag (? bits), Index (? bits) and Block offset (? bits); two ways (SET 1, SET 2), each line holding Valid, Tag and Data (64B); per way a tag comparator (=?) and a Word Select (??) ?:1 MUX; a final ?:1 MUX sends ?B of data to the processor]


Cache Lookup: Example
The Opteron data cache: 64KB data, 2-way set associative, 64B cache line
[Figure: 40-bit physical address split into Tag (A39-A15, 25 bits), Index (A14-A6, 9 bits) and Block offset (A5-A0, 6 bits); two ways (SET 1, SET 2) of 512 lines, each line holding Valid, Tag and Data (64B); per way a tag comparator (=?) and an 8:1 MUX driven by Word Select (A5-A3) produce 8B of data; a 2:1 MUX sends 8B of data to the processor]

1) Block size = 64B → Block offset = 6 bits
2) No. of sets = 64KB / (2 × 64B) = 512 → Index = 9 bits
3) Tag = 40 - 9 - 6 = 25 bits
4) Each block is divided into 8 × 8B words → Word select = 3 bits
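The field-width calculation above generalizes to any power-of-2 cache geometry. A small Python sketch (the function names are hypothetical) that reproduces the Opteron numbers:

```python
def log2_exact(n: int) -> int:
    """log2 of a power of two; raises ValueError otherwise."""
    if n <= 0 or n & (n - 1):
        raise ValueError(f"{n} is not a power of 2")
    return n.bit_length() - 1


def field_widths(cache_bytes: int, ways: int, line_bytes: int,
                 addr_bits: int, word_bytes: int) -> dict:
    """Hypothetical helper: address field widths for a set-associative cache."""
    num_sets = cache_bytes // (ways * line_bytes)   # lines / ways
    offset = log2_exact(line_bytes)                  # block offset bits
    index = log2_exact(num_sets)                     # set index bits
    return {
        "sets": num_sets,
        "offset": offset,
        "index": index,
        "tag": addr_bits - index - offset,           # whatever bits are left over
        "word_select": log2_exact(line_bytes // word_bytes),
    }


print(field_widths(64 * 1024, ways=2, line_bytes=64, addr_bits=40, word_bytes=8))
# {'sets': 512, 'offset': 6, 'index': 9, 'tag': 25, 'word_select': 3}
```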


Cache Replacement
Need to free a line to insert a new block
Which block should we kick out?

Several strategies
Random (randomly selected line)
FIFO (line that has been in the cache the longest)
LRU (least recently used line)
LRU Approximations
NMRU (not most recently used)
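LRU and FIFO can both be modelled with an ordered list of tags per set; they differ only in whether a hit refreshes a line's position. A minimal Python sketch of one set, using a hypothetical `ReplacementSet` class (a software model, not a hardware design):

```python
from collections import OrderedDict


class ReplacementSet:
    """Hypothetical model of one N-way set; evicts the line at the front.

    policy="lru":  order = recency of use -> front is least recently used
    policy="fifo": order = insertion time -> front is the oldest line
    """

    def __init__(self, ways: int, policy: str = "lru"):
        self.ways = ways
        self.policy = policy
        self.lines = OrderedDict()   # tag -> None; only the order matters

    def access(self, tag: int) -> bool:
        if tag in self.lines:
            if self.policy == "lru":
                self.lines.move_to_end(tag)    # a hit makes it most recently used
            return True                        # hit
        if len(self.lines) == self.ways:
            self.lines.popitem(last=False)     # evict the front (LRU or oldest)
        self.lines[tag] = None
        return False                           # miss


trace = [1, 2, 1, 3, 2]
lru, fifo = ReplacementSet(2, "lru"), ReplacementSet(2, "fifo")
print([lru.access(t) for t in trace])    # [False, False, True, False, False]
print([fifo.access(t) for t in trace])   # [False, False, True, False, True]
```

On this trace the policies diverge at the last access: LRU has already evicted block 2 because block 1 was used more recently, while FIFO evicted block 1 because it was inserted first.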


Write Policy
Do we allocate cache lines on a write?

Write-allocate
A write miss brings the block into the cache
No-write-allocate
A write miss leaves the cache as it was and writes only to main memory

Do we update memory on writes?

Write-through
Memory is immediately updated on each write
Write-back
Memory is updated when the line is replaced
Write buffer
Pending writes are queued in a buffer so the cache need not wait for main memory, giving lower latency than writing directly to main memory

Write in the Cache? | Write in Main Memory? | Policy
Yes | Yes | Write-through, write-allocate
No | Yes | Write-through, no-write-allocate
Yes | No | Write-back


Write-Back Caches
Need a Dirty bit for each line
A dirty line has more recent data than memory

A line starts as clean (not dirty)
A line becomes dirty on the first write to it
Memory is not updated yet; the cache has the only up-to-date copy of the data for a dirty line
Replacing a dirty line
Must write the data back to memory (write-back)
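The dirty-bit rules can be sketched with a toy direct-mapped, write-allocate cache in Python. Assumptions for illustration only: one word per block, a dict standing in for main memory, and hypothetical class names.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Line:
    tag: int
    data: int
    dirty: bool = False            # a line starts clean


class WriteBackCache:
    """Hypothetical direct-mapped write-back, write-allocate cache (one word per block)."""

    def __init__(self, num_lines: int, memory: Dict[int, int]):
        self.lines: List[Optional[Line]] = [None] * num_lines
        self.memory = memory           # dict standing in for main memory
        self.writebacks = 0

    def _fill(self, addr: int) -> Line:
        n = len(self.lines)
        idx, tag = addr % n, addr // n
        line = self.lines[idx]
        if line is not None and line.tag != tag:
            if line.dirty:             # replacing a dirty line: write it back first
                self.memory[line.tag * n + idx] = line.data
                self.writebacks += 1
            line = None                # a clean victim is simply dropped
        if line is None:
            line = Line(tag, self.memory.get(addr, 0))
            self.lines[idx] = line
        return line

    def read(self, addr: int) -> int:
        return self._fill(addr).data

    def write(self, addr: int, value: int) -> None:
        line = self._fill(addr)
        line.data = value
        line.dirty = True              # memory is now stale for this block


mem: Dict[int, int] = {}
cache = WriteBackCache(num_lines=4, memory=mem)
cache.write(1, 42)
print(mem.get(1))                  # None: memory not updated yet
cache.read(5)                      # 5 maps to the same line as 1, evicting the dirty line
print(mem[1], cache.writebacks)    # 42 1
```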


Write Policy: Example
Fully associative cache, initially empty
Find out the no. of hits and misses
Compare the write-allocate and no-write-allocate schemes

SD MEM[100]
SD MEM[100]
LD MEM[200]
SD MEM[200]
SD MEM[100]

No-write-allocate scheme
1st SD @100: a miss (as the cache is empty @100)
2nd SD @100: a miss (as we don't write to the cache on a write miss)
3rd LD @200: a miss (as the cache is empty @200)
4th SD @200: a hit (as the cache is loaded on the read miss @200)
5th SD @100: a miss (same as the 2nd SD)
Total = 4 misses, 1 hit

Write-allocate scheme
1st SD @100: a miss (as the cache is empty @100)
2nd SD @100: a hit (as the block was brought into the cache on the write miss)
3rd LD @200: a miss (as the cache is empty @200)
4th SD @200: a hit (as the cache is loaded on the read miss @200)
5th SD @100: a hit (same as the 2nd SD)
Total = 2 misses, 3 hits
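The hit/miss counts above can be checked with a short Python simulation. It assumes, as in the example, that the fully associative cache is large enough that nothing is ever evicted; the `simulate` helper is hypothetical.

```python
def simulate(trace, write_allocate):
    """Hypothetical helper: (misses, hits) for a fully associative cache
    that never needs to evict (true for this 2-block example)."""
    cache = set()               # block addresses currently in the cache
    hits = misses = 0
    for op, addr in trace:
        if addr in cache:
            hits += 1
        else:
            misses += 1
            # Loads always allocate; stores allocate only under write-allocate.
            if op == "LD" or write_allocate:
                cache.add(addr)
    return misses, hits


trace = [("SD", 100), ("SD", 100), ("LD", 200), ("SD", 200), ("SD", 100)]
print(simulate(trace, write_allocate=False))  # (4, 1)
print(simulate(trace, write_allocate=True))   # (2, 3)
```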


The 3 Cs of Cache Misses
Compulsory (Cold Start / First Reference)
The very first access to a block cannot find it in the cache
The block must be brought into the cache
These misses occur even in a cache with infinitely many entries

Capacity
The cache cannot contain all the blocks needed during program execution
Occur due to blocks being discarded and later retrieved
Misses that occur even in a fully associative cache of the same size

Conflict (Collision / Interference)
The block placement strategy is set-associative or direct-mapped
Occur because a block can be discarded and later retrieved if too many blocks map to its set
Extra misses in an N-way associative cache compared with a fully associative one
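One way to label misses in practice follows the definitions above: a miss is compulsory on the first reference to a block, capacity if a fully associative LRU cache of the same size would also miss, and conflict otherwise. A Python sketch for a direct-mapped cache, assuming the trace is given as block numbers rather than byte addresses to keep it short; `classify_misses` is a hypothetical helper.

```python
from collections import OrderedDict


def classify_misses(blocks, num_lines):
    """Hypothetical helper: classify each miss of a direct-mapped cache with
    num_lines lines as compulsory, capacity, or conflict, using a same-size
    fully associative LRU cache as the reference model."""
    seen = set()
    direct = [None] * num_lines   # direct-mapped cache: one block per line
    fa = OrderedDict()            # fully associative LRU: block -> None
    counts = {"compulsory": 0, "capacity": 0, "conflict": 0}
    for b in blocks:
        # Reference model: fully associative cache of the same size with LRU
        fa_hit = b in fa
        if fa_hit:
            fa.move_to_end(b)
        else:
            if len(fa) == num_lines:
                fa.popitem(last=False)
            fa[b] = None
        # The actual direct-mapped cache
        idx = b % num_lines
        if direct[idx] != b:
            direct[idx] = b
            if b not in seen:
                counts["compulsory"] += 1   # first reference ever
            elif not fa_hit:
                counts["capacity"] += 1     # fully associative would miss too
            else:
                counts["conflict"] += 1     # only the mapping caused this miss
        seen.add(b)
    return counts


# Blocks 0 and 4 collide in a 4-line direct-mapped cache
print(classify_misses([0, 4, 0, 4], num_lines=4))
# {'compulsory': 2, 'capacity': 0, 'conflict': 2}
```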

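Cache Performance
Average Memory Access Time

$$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$

Memory stall cycles (out-of-order processor)

$$\text{CPU time} = \text{Cycle time} \times (\text{Cycles}_{\text{Exec}} + \text{Cycles}_{\text{MemoryStall}})$$

$$\text{Cycles}_{\text{MemoryStall}} = \text{Cache Misses} \times (\text{MissLatency}_{\text{Total}} - \text{MissLatency}_{\text{Overlapped}})$$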
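Both formulas are straightforward to evaluate. The numbers in this Python sketch are hypothetical, chosen only to show the arithmetic; the function names are also illustrative.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """AMAT = hit time + miss rate * miss penalty (times in cycles here)."""
    return hit_time + miss_rate * miss_penalty


def cpu_time(cycle_time: float, cycles_exec: int, cache_misses: int,
             miss_latency_total: int, miss_latency_overlapped: int) -> float:
    """CPU time = cycle time * (exec cycles + memory stall cycles)."""
    stall_cycles = cache_misses * (miss_latency_total - miss_latency_overlapped)
    return cycle_time * (cycles_exec + stall_cycles)


# Hypothetical: 1-cycle hit, 5% miss rate, 200-cycle miss penalty
print(amat(1, 0.05, 200))                         # about 11 cycles
# Hypothetical: 0.5 ns cycle, 10^6 exec cycles, 10^4 misses,
# 200-cycle miss latency of which 50 cycles overlap with execution
print(cpu_time(0.5, 1_000_000, 10_000, 200, 50))  # 1250000.0 (ns)
```

Overlapping more of each miss with useful work shrinks the stall term directly; with all 200 cycles overlapped, the stall term drops to zero.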

Improving Cache Performance

$$\text{AMAT} = \text{Hit time} + \text{Miss rate} \times \text{Miss penalty}$$

Reduce miss penalty
Reduce miss rate
Reduce hit time

$$\text{Cycles}_{\text{MemoryStall}} = \text{Cache Misses} \times (\text{MissLatency}_{\text{Total}} - \text{MissLatency}_{\text{Overlapped}})$$

Increase overlapped miss latency


Basic Cache Optimizations


6 Basic Cache Optimizations

Reducing Miss Penalty
1. Giving Reads Priority over Writes
A read can complete before earlier writes in the write buffer
2. Multilevel Caches

Reducing Miss Rate
3. Larger Block Size
4. Larger Cache Size
5. Higher Associativity

Reducing Hit Time
6. Avoiding Address Translation during Cache Indexing
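Optimization 1 can be sketched as a write buffer that lets a load go ahead of earlier buffered stores. The sketch assumes the buffer is searched for a matching address on every load, so the read never returns stale data from memory; the class and method names are hypothetical and the model ignores timing.

```python
from collections import deque
from typing import Deque, Dict, Tuple


class WriteBuffer:
    """Hypothetical model: stores wait here before draining to memory."""

    def __init__(self, memory: Dict[int, int]):
        self.memory = memory
        self.pending: Deque[Tuple[int, int]] = deque()   # (addr, value), oldest first

    def store(self, addr: int, value: int) -> None:
        self.pending.append((addr, value))                # the write waits in the buffer

    def load(self, addr: int) -> int:
        # Read priority: do not wait for the buffer to drain, but check it first
        # so the read still sees the newest pending store to the same address.
        for a, v in reversed(self.pending):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def drain_one(self) -> None:
        addr, value = self.pending.popleft()              # oldest write goes to memory
        self.memory[addr] = value


mem = {8: 1}
wb = WriteBuffer(mem)
wb.store(8, 5)
print(wb.load(8), mem[8])   # 5 1: the read is served before the write reaches memory
wb.drain_one()
print(mem[8])               # 5
```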
