
07/03/2017 The changing memory hierarchy

One of the main ways to increase system performance is minimising how far down the memory hierarchy
one has to go to manipulate data. It's not just system level programmers that need to be aware of these
issues, as most systems have a time/cost requirement, be it how fast your web application responds, or
how many racks you need in your data center.

In this era of multiple CPUs per system this can be further complicated for programmers due to memory
contention between each CPU. Also, virtualization introduces further complications. Consider the
following diagram which shows the memory hierarchy currently in a 4 socket by 4 core system, which
Ulrich Drepper mentions is going to be a common system in his excellent paper on computer memory.

[Update Sep 2010: Note the organisation of cache levels in multicore CPUs can vary quite a bit]

http://www.pixelbeat.org/docs/memory_hierarchy/ 1/3

[Update Oct 2010: hwloc is a handy tool for automatically generating diagrams like these]

Up until lately we've just had incremental improvements to the performance (not size), of RAM and
mechanical hard disks, and CPU performance has diverged from them a lot. So changes to the memory
hierarchy would both speed systems up a lot, and simplify software running on the CPU. It's these
exciting changes that are happening now and in the next few years that I'm focusing on here.

[Update Oct 2015: As stated above, the divergence in speed between main memory and CPUs implies much
more performance for efficient use of the CPU caches. This is demonstrated in profiling hardware events,
where adjusting the memory size and access pattern reduces the access depth in the memory hierarchy,
thus greatly increasing performance. Now often it's not possible or practical to adjust all memory
accesses, and so Intel, as of the Broadwell microarchitecture (Sep 2014), has made CAT (Cache
Allocation Technology) available (in certain Xeon processors to start), which allows one to dynamically
partition the shared cache, to limit what part of the cache can be written to by a core. In this way,
restricting VMs/containers/apps/... to a core will restrict them to evicting only part of the cache
shared across cores, resulting in more efficient utilization of the system. This is well explained in
Dan Luu's summary of CAT advantages. Partitioning functionality like this will also improve security
isolation, and protect against side channel attacks. In future, dynamic cache allocation will probably
become available on most CPUs and across more cache levels.]
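
To make the partitioning idea a bit more concrete, here is a minimal sketch of how a shared cache
could be carved up into per-group masks of cache "ways". It assumes a 20 way cache and two made up
group names, and it assumes each group must get one unbroken run of bits, which is the shape CAT
capacity bitmasks take as far as I know. The function names are hypothetical and nothing here talks
to real hardware.

```python
def contiguous_masks(total_ways, shares):
    """Split `total_ways` cache ways into non-overlapping contiguous bitmasks.

    `shares` maps a (hypothetical) group name to the number of ways it gets.
    """
    if sum(shares.values()) > total_ways:
        raise ValueError("requested more ways than the cache has")
    masks = {}
    next_bit = 0
    for name, ways in shares.items():
        if ways < 1:
            raise ValueError("each group needs at least one way")
        # A run of `ways` one-bits, shifted above the ways already handed out.
        masks[name] = ((1 << ways) - 1) << next_bit
        next_bit += ways
    return masks


def is_contiguous(mask):
    """True if the set bits of `mask` form one unbroken run."""
    if mask == 0:
        return False
    low = mask & -mask  # lowest set bit
    # Adding the lowest bit carries through a single run and clears it entirely.
    return ((mask + low) & mask) == 0


# Assumed example: 20 ways, 12 for a latency sensitive app and 8 for batch jobs.
masks = contiguous_masks(20, {"latency_sensitive": 12, "batch": 8})
for name, mask in masks.items():
    print(f"{name}: {mask:05x}")
# prints:
# latency_sensitive: 00fff
# batch: ff000
```

Because the two masks don't overlap, code restricted to the "batch" group could only evict lines in
its own 8 ways. On Linux, masks of this shape are what the kernel's resctrl interface takes in a
group's schemata file, though the number of ways and the minimum mask width vary by CPU.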

[Update Sep 2015: Note cache coherence is a big limitation to the number of cores possible, and a new
"tardis" cache coherence model promises to remove the linear increase in cache accounting memory per
core. It works by tagging the operations with a counter to order reads/writes, thus allowing cores to
operate on older data if that suffices. Generation counters are useful for relative ordering rather
than trying to synchronize with the universe with timestamps or something. I proposed on lkml (and
still stand by) a similar mechanism for relative ordering of files within a file system. Distributed
cores/file systems can use higher level methods for coherence, but within the "system" counters have
an advantage.]
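
The general idea of generation counters can be sketched in a few lines. Note this is not the tardis
protocol itself, just a toy single threaded illustration of relative ordering with a logical counter.
The class and parameter names (GenerationStore, CachedReader, min_generation) are made up for the
example.

```python
import itertools


class GenerationStore:
    """Key/value store that stamps each write with a generation number."""

    def __init__(self):
        self._counter = itertools.count(1)  # logical clock, not wall time
        self._data = {}                     # key -> (generation, value)

    def write(self, key, value):
        gen = next(self._counter)
        self._data[key] = (gen, value)
        return gen

    def read(self, key):
        return self._data[key]


class CachedReader:
    """Keeps local copies and reuses them while they are new enough."""

    def __init__(self, store):
        self._store = store
        self._cache = {}                    # key -> (generation, value)

    def read(self, key, min_generation=0):
        cached = self._cache.get(key)
        if cached is not None and cached[0] >= min_generation:
            return cached[1]                # older data suffices, no refetch
        gen, value = self._store.read(key)
        self._cache[key] = (gen, value)
        return value


store = GenerationStore()
reader = CachedReader(store)

store.write("x", "old")
print(reader.read("x"))                     # fetches and caches "old"
g2 = store.write("x", "new")
print(reader.read("x"))                     # still "old", which is acceptable here
print(reader.read("x", min_generation=g2))  # must be at least g2, so refetches "new"
```

The reader only pays for a refetch when it actually needs data at or after a given generation, and
at no point does anything need to agree on what time it is.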

Solid State Disks
Consider for example how SSDs affect processing of a large file on a multicore system. Because random
seeks are of no extra cost on SSDs compared to mechanical disks, it's sensible for multiple cores to
process separate portions of a file directly. With mechanical disks each core would just be fighting
over the mechanical disk head, and slow down a lot compared to just a single core processing the file.
In other words, data partitioning to take advantage of multiple cores is much more complicated for
mechanical disks than for SSDs, requiring more complex logic and arrays of disks to achieve
parallelization. Note for certain operations like sorting, one has to take RAM size into account, so
the cores should process chunks of the file in parallel where each chunk is
((ram size / num cpus) - a bit). For other operations like searching for example, RAM size is not a
factor, and one can just split the file into chunks of (file size / num cpus). [Update Dec 2012: Given
the widening disparity between traditional disks and SSDs, they're separating out to distinct layers
in the memory hierarchy. To take advantage of this, hybrid drives are becoming available, as is
software to transparently combine separate drives, like SRT or Linux solutions like bcache.] [Update
Jan 2016: ACM Queue discussion on faster non volatile storage: "it is rare that the performance
assumptions that we make about an underlying hardware component change by 1,000x".]
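
The two chunk size rules above can be sketched as follows. This is only an illustration under some
assumptions: the function names are made up, the "search" is simplified to counting a single byte so
that matches can't straddle a chunk boundary, and threads are used to keep the example short (CPU
bound work in Python would want processes instead).

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(file_size, num_workers):
    """Return (offset, length) pairs covering the file, about one per worker."""
    if file_size == 0:
        return []
    chunk = -(-file_size // num_workers)  # ceiling of file size / num workers
    return [(off, min(chunk, file_size - off))
            for off in range(0, file_size, chunk)]


def sort_chunk_size(ram_size, num_cpus, headroom):
    """Chunk size for sorting: (ram size / num cpus) minus a bit of headroom."""
    return ram_size // num_cpus - headroom


def count_byte(path, offset, length, needle, block=1 << 20):
    """Count occurrences of a single byte within one region of the file."""
    total = 0
    with open(path, "rb") as f:           # each worker uses its own file handle
        f.seek(offset)
        while length > 0:
            data = f.read(min(block, length))
            if not data:
                break
            total += data.count(needle)
            length -= len(data)
    return total


def parallel_count(path, needle, num_workers=None):
    """Count `needle` across the whole file, one region per worker."""
    num_workers = num_workers or os.cpu_count() or 1
    ranges = split_ranges(os.path.getsize(path), num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        counts = pool.map(
            lambda r: count_byte(path, r[0], r[1], needle), ranges)
        return sum(counts)
```

For example, parallel_count("big.log", b"\n") would count the lines in a (hypothetical) big.log with
each worker seeking to its own region, which is exactly the access pattern that is cheap on an SSD
and painful on a mechanical disk. Searching for longer patterns would also need the chunks to overlap
a little, or to be aligned on record boundaries.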

2 Transistor DRAM
2T DRAM, currently being developed by Intel, has the potential to enhance caches in CPUs at least. You
can see in the diagram above that the level 2 cache can be used both to speed access to the relatively
slow RAM and to speed up communication between cores in a single processor. When this memory wall is
lowered it again gives the opportunity to use different algorithms, especially on multicore systems.
Tian Tian of Intel has written a good article on how shared caches enhance a multicore system and how
programmers can take further advantage of them. There is also another good ACM article on optimizing
application performance in the presence of caches, and this excellent presentation on lock free
algorithms that take the current memory hierarchy into consideration. [Update Dec 2008: I noticed an
IEEE reference to a Sandia National Laboratories simulation, which showed that for many applications,
the memory wall with current architectures causes performance to decline with greater than 8
processors, so it looks like technology like 2T DRAM will be required in the near future.]

MRAM and Memristors
These technologies have the potential to be the biggest game changers. They're essentially very fast
non volatile memory, and so will affect both current RAM and flash technologies.

MRAM has been in development for a while, but while being about twice as fast as current RAM
technologies, it's much more expensive. However researchers in Germany have recently figured out how
to make it 10 times faster again!

Memristors have recently been created by HP labs and again they have the potential to be a fast, dense,
cheap, non volatile memory. The memristor was first theorized in 1971 by Leon Chua, being a fourth
fundamental circuit element, having properties that cannot be achieved by any combination of the other
three elements (resistor, inductor, capacitor). [Update Sep 2010: Memristors will be available by 2014
apparently.] [Update Nov 2011: You can apparently make home made memristors :)] [Update Jun 2014:
Informative memristor info and roadmap from HP] Interesting times...

[Update Jul 2015: 3D XPoint was announced by Intel/Micron to be available in 2016. Mostly marketing for
now, but as a transistor-less non volatile technology, it has the potential to be another level in the
hierarchy under DRAM at first, and eventually to replace it altogether.]

Aug 19 2008
