
CS152

Computer Architecture and Engineering


Lecture 21

Memory Systems (recap)


Caches

November 11th, 2001


John Kubiatowicz (http.cs.berkeley.edu/~kubitron)

lecture slides: http://www-inst.eecs.berkeley.edu/~cs152/

CS152 / Kubiatowicz
11/14/01 ©UCB Fall 2001
Recall: The Big Picture: Where are We Now?

° The Five Classic Components of a Computer

[Figure: Input and Output connect to the Processor (Control + Datapath) and to Memory]

° Today’s Topics:
• Recap last lecture
• Simple caching techniques
• Many ways to improve cache performance
• Virtual memory?

Recall: Who Cares About the Memory Hierarchy?

Processor-DRAM Memory Gap (latency)

[Plot, 1980-2000, log-scale performance from 1 to 1000: CPU (µProc) performance grows 60%/yr ("Moore's Law", 2X/1.5yr) while DRAM performance grows 9%/yr ("Less' Law?", 2X/10 yrs); the Processor-Memory Performance Gap grows 50% / year]
Recap: Memory Hierarchy of a Modern Computer System
° By taking advantage of the principle of locality:
• Present the user with as much memory as is available in the
cheapest technology.
• Provide access at the speed offered by the fastest technology.

[Figure: Processor (datapath + control, with on-chip registers and on-chip cache), backed by a second-level cache (SRAM), main memory (DRAM), secondary storage (disk), and tertiary storage (tape)]

  Level:        Registers  Cache  Main Memory  Disk                  Tape
  Speed (ns):   1s         10s    100s         10,000,000s (10s ms)  10,000,000,000s (10s sec)
  Size (bytes): 100s       Ks     Ms           Gs                    Ts
Recap: Memory Hierarchy: Why Does it Work?
Locality!

[Figure: probability of reference plotted across the address space, 0 to 2^n - 1]

° Temporal Locality (Locality in Time):
  => Keep most recently accessed data items closer to the processor

° Spatial Locality (Locality in Space):
  => Move blocks consisting of contiguous words to the upper levels

[Figure: blocks Blk X and Blk Y moving between the upper-level memory (to/from the processor) and the lower-level memory]
Recap: Cache Performance
Execution_Time = Instruction_Count x Cycle_Time x
    (Ideal CPI + Memory_Stalls/Inst + Other_Stalls/Inst)

Memory_Stalls/Inst =
    Instruction Miss Rate x Instruction Miss Penalty +
    Loads/Inst x Load Miss Rate x Load Miss Penalty +
    Stores/Inst x Store Miss Rate x Store Miss Penalty

Average Memory Access Time (AMAT) =
    Hit Time_L1 + (Miss Rate_L1 x Miss Penalty_L1) =
    (Hit Rate_L1 x Hit Time_L1) + (Miss Rate_L1 x Miss Time_L1)
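As a sanity check on the formulas above, here is a minimal Python sketch; the miss rates and penalties plugged in are made-up example numbers for illustration, not measurements from these slides.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average Memory Access Time = Hit Time + Miss Rate x Miss Penalty."""
    return hit_time + miss_rate * miss_penalty

def memory_stalls_per_inst(imiss_rate, imiss_penalty,
                           loads_per_inst, load_miss_rate, load_penalty,
                           stores_per_inst, store_miss_rate, store_penalty):
    """Memory stall cycles per instruction, per the slide's decomposition."""
    return (imiss_rate * imiss_penalty
            + loads_per_inst * load_miss_rate * load_penalty
            + stores_per_inst * store_miss_rate * store_penalty)

# Hypothetical numbers: 1-cycle hit, 5% miss rate, 40-cycle miss penalty
print(amat(1, 0.05, 40))   # 3.0 cycles

# Hypothetical: 1% I-miss rate, 0.3 loads and 0.1 stores per inst, 5% data miss rate
print(memory_stalls_per_inst(0.01, 40, 0.3, 0.05, 40, 0.1, 0.05, 40))  # ~1.2 cycles/inst
```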

Recall: Static RAM Cell

[Figure: 6-Transistor SRAM Cell — cross-coupled inverters storing 0/1, a word line (row select), and complementary bit lines bit and bit-bar; in some designs the load transistors are replaced with pullups to save area]

° Write:
  1. Drive bit lines (bit=1, bit-bar=0)
  2. Select row
° Read:
  1. Precharge bit and bit-bar to Vdd or Vdd/2 => make sure equal!
  2. Select row
  3. Cell pulls one line low
  4. Sense amp on column detects difference between bit and bit-bar
Recall: 1-Transistor Memory Cell (DRAM)

[Figure: one transistor gated by the row select line, connecting a storage capacitor to the bit line]

° Write:
  1. Drive bit line
  2. Select row
° Read:
  1. Precharge bit line to Vdd/2
  2. Select row
  3. Cell and bit line share charges
     - Very small voltage changes on the bit line
  4. Sense (fancy sense amp)
     - Can detect changes of ~1 million electrons
  5. Write: restore the value
° Refresh
  1. Just do a dummy read to every cell.
Recall: Classical DRAM Organization (square)

[Figure: a square RAM cell array; a row decoder drives the word (row) select lines, each intersection of a word line and a bit (data) line is a 1-T DRAM cell, and sense amps plus a column selector / I/O block (fed by the column address) sit below the array]

° Row and Column Address Select 1 bit at a time
° Act of reading refreshes one complete row
  • Sense amps detect slight variations from Vdd/2 and amplify them
Recall: DRAM Read Timing

[Timing diagram: a 256K x 8 DRAM with 9 multiplexed address lines and control signals RAS_L, CAS_L, WE_L, OE_L; the row address and then the column address are presented across one DRAM read cycle, and data (D) goes from High-Z to valid Data Out after the read access time plus the output-enable delay]

° Every DRAM access begins at the assertion of RAS_L
° 2 ways to read: early or late OE_L relative to CAS
  • Early Read Cycle: OE_L asserted before CAS_L
  • Late Read Cycle: OE_L asserted after CAS_L
Recall: Fast Page Mode Operation

° Regular DRAM Organization:
  • N rows x N columns x M bits
  • Read & Write M bits at a time
  • Each M-bit access requires a RAS / CAS cycle
° Fast Page Mode DRAM:
  • Adds an N x M "SRAM" register to save a row
° After a row is read into the register:
  • Only CAS is needed to access other M-bit blocks on that row
  • RAS_L remains asserted while CAS_L is toggled

[Timing diagram: one RAS_L assertion with a row address, followed by four CAS_L pulses, each with a new column address, for the 1st through 4th M-bit accesses]
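A toy Python model shows why Fast Page Mode wins; the RAS and CAS cycle counts below are assumptions for illustration, not real device timings.

```python
RAS_CYCLES = 3   # assumed cost of the row-access (RAS) portion
CAS_CYCLES = 2   # assumed cost of the column-access (CAS) portion

def regular_dram(n_accesses):
    # every M-bit access pays a full RAS + CAS cycle
    return n_accesses * (RAS_CYCLES + CAS_CYCLES)

def fast_page_mode(n_accesses):
    # one RAS for the row, then only a CAS per access on that row
    return RAS_CYCLES + n_accesses * CAS_CYCLES

print(regular_dram(4))    # 20 cycles for four accesses
print(fast_page_mode(4))  # 11 cycles for the same four accesses
```

The savings grow with the number of accesses to the same row, which is why cache-line fills benefit so much.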
SDRAM: Synchronous DRAM?
° More complicated, on-chip controller
  • Operations synchronized to a clock
    - So, give the row address one cycle
    - Column address some number of cycles later (say 3)
    - Data comes out later still (say 2 cycles after that)
  • Burst modes
    - Typical might be a 1, 2, 4, 8, or 256-length burst
    - Thus, only give RAS and CAS once for all of these accesses
  • Multi-bank operation (on-chip interleaving)
    - Lets you overlap the startup latency (5 cycles above) of two banks
° Careful of timing specs!
  • A 10ns SDRAM may still require 50ns to get the first data!
  • A 50ns DRAM means first data out in 50ns
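The "10ns SDRAM" warning follows directly from the cycle counts above: 10ns is the clock period, and the slide's example startup latency is 5 cycles. A two-line check:

```python
CLOCK_NS = 10        # "10ns SDRAM": 10 ns is the clock period, not the latency
ROW_TO_COL = 3       # slide's example: column address ~3 cycles after the row
COL_TO_DATA = 2      # slide's example: data ~2 cycles after the column address

first_data_ns = (ROW_TO_COL + COL_TO_DATA) * CLOCK_NS
print(first_data_ns)   # 50 -- a "10ns" part still takes 50ns to first data
```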

Memory Systems: Delay is more than the raw DRAM

[Figure: the memory controller and timing logic drive an n-bit address through a DRAM controller to a bank of 2^n x 1 DRAM chips; w bits of data return through bus drivers]

Tc = Tcycle + Tcontroller + Tdriver

Main Memory Performance

° Simple:
  • CPU, Cache, Bus, Memory all the same width (32 bits)
° Wide:
  • CPU/Mux 1 word; Mux/Cache, Bus, Memory N words
    (Alpha: 64 bits & 256 bits)
° Interleaved:
  • CPU, Cache, Bus 1 word; Memory N modules
    (4 modules); example is word interleaved
Main Memory Performance

[Timing diagram: the access time is the initial portion of the longer cycle time]

° DRAM (Read/Write) Cycle Time >> DRAM (Read/Write) Access Time
  • ~2:1; why?
  • Compare with core memory: access time 750 ns, cycle time 1500-3000 ns!
° DRAM (Read/Write) Cycle Time:
  • How frequently can you initiate an access? 1/CycleTime
  • Analogy: a little kid can only ask his father for money on Saturday
° DRAM (Read/Write) Access Time:
  • How quickly will you get what you want once you initiate an access?
  • Analogy: as soon as he asks, his father will give him the money

Increasing Bandwidth - Interleaving

Access Pattern without Interleaving:
  [CPU issues Start Access for D1 to a single memory; it must wait until D1 is available before starting the access for D2]

Access Pattern with 4-way Interleaving:
  [CPU accesses Memory Bank 0, then Bank 1, Bank 2, and Bank 3 on successive cycles; by the time Bank 3's access has started, Bank 0 can be accessed again]
Main Memory Performance

° Timing model:
  • 1 cycle to send the address
  • 4 cycles access time, 10 cycles cycle time, 1 cycle to return data
  • Cache block is 4 words
° Simple miss penalty      = 4 x (1 + 10 + 1) = 48
° Wide miss penalty        = 1 + 10 + 1 = 12
° Interleaved miss penalty = 1 + 10 + 1 + 3 = 15

[Figure: words 0-15 laid out across four banks, word i in bank i mod 4 — Bank 0 holds 0, 4, 8, 12; Bank 1 holds 1, 5, 9, 13; Bank 2 holds 2, 6, 10, 14; Bank 3 holds 3, 7, 11, 15]
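The three miss penalties above can be reproduced directly from the timing model in a few lines of Python:

```python
# Slide's timing model: 1 cycle to send the address, 10-cycle memory cycle
# time, 1 cycle to return data; the cache block is 4 words.
SEND_ADDR, CYCLE, RETURN = 1, 10, 1
WORDS = 4

simple      = WORDS * (SEND_ADDR + CYCLE + RETURN)      # one word at a time
wide        = SEND_ADDR + CYCLE + RETURN                # whole block at once
interleaved = SEND_ADDR + CYCLE + RETURN + (WORDS - 1)  # banks overlap; one word returns per cycle

print(simple, wide, interleaved)   # 48 12 15
```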

Independent Memory Banks

° How many banks?
  number of banks >= number of clocks to access a word in a bank
  • For sequential accesses; otherwise you will return to the original bank
    before it has the next word ready
  • For strided accesses (access every N words), want a prime number of
    banks!!! (Address computation is easy if the prime is of the form 2^x - 1)
° Increasing DRAM capacity => fewer chips => harder to have many banks
  • Growth in bits/chip of DRAM: 50%-60%/yr
  • Nathan Myhrvold (Microsoft): mature software growth
    (33%/yr for NT) ~ growth in MB/$ of DRAM (25%-30%/yr)
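A quick Python experiment illustrates why a prime number of banks helps strided access: with the usual bank = address mod n_banks mapping, a stride that shares a factor with the bank count keeps hitting the same few banks.

```python
def banks_touched(stride, n_banks, n_accesses=32):
    """How many distinct banks does a strided access pattern reach?"""
    return len({(i * stride) % n_banks for i in range(n_accesses)})

# Stride 4 with 8 banks: only banks 0 and 4 are ever used -> constant conflicts
print(banks_touched(stride=4, n_banks=8))   # 2

# Stride 4 with 7 (prime) banks: all banks get used -> accesses spread out
print(banks_touched(stride=4, n_banks=7))   # 7
```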

Fewer DRAMs/System over Time
(from Pete MacWilliams, Intel)

Minimum PC memory size vs. DRAMs per system, by DRAM generation:

  Min PC memory | '86 1 Mb | '89 4 Mb | '92 16 Mb | '96 64 Mb | '99 256 Mb | '02 1 Gb
  4 MB          |    32    |    8     |           |           |            |
  8 MB          |          |   16     |     4     |           |            |
  16 MB         |          |          |     8     |     2     |            |
  32 MB         |          |          |     4     |     1     |            |
  64 MB         |          |          |           |     8     |     2      |
  128 MB        |          |          |           |     4     |     1      |
  256 MB        |          |          |           |           |     8      |    2

Memory per DRAM grows @ 60% / year, while memory per system grows @ 25%-30% / year, so the number of DRAMs per system keeps shrinking.
Administrative Issues
° Should be reading Chapter 7 of your book
° Second midterm: Wednesday November 28th
• Pipelining
- Hazards, branches, forwarding, CPI calculations
- (may include something on dynamic scheduling)
• Memory Hierarchy
• Possibly something on I/O (see where we get in lectures)
• Possibly something on power (Broderson Lecture)
° Lab 6: Hopefully all is well!
  • Sorry about the fact that burst writes don't work: will fix this!
  • Cache/DRAM Bus options:
    - #0 (no extra credit): 32-bit bus, 1 DRAM, 4 RAS/CAS per cache line
    - #1 (extra credit 2a): 64-bit bus, 2 DRAMs, 2 RAS/CAS per line
    - #2 (extra credit 2b): 32-bit bus, 1 DRAM, 1 RAS/CAS, 3 CAS per line
  • Be careful with option #1: a single-word write must only modify 1 DRAM!
    - This would happen with a write-through policy!
The Art of Memory System Design

[Figure: a workload or benchmark program runs on the processor, producing a reference stream
  <op,addr>, <op,addr>, <op,addr>, <op,addr>, . . .
  op: i-fetch, read, write
that flows through the memory system (cache $ backed by MEM)]

Goal: optimize the memory system organization to minimize the average memory access time for typical workloads.
Example: 1 KB Direct Mapped Cache with 32 B Blocks
° For a 2^N byte cache:
  • The uppermost (32 - N) bits are always the Cache Tag
  • The lowest M bits are the Byte Select (Block Size = 2^M)
  • On a cache miss, pull in a complete "Cache Block" (or "Cache Line")

Block address layout (1 KB cache, 32 B blocks):
  bits [31:10] Cache Tag (example: 0x50), bits [9:5] Cache Index (ex: 0x01),
  bits [4:0] Byte Select (ex: 0x00)

[Figure: 32 cache entries, each with a Valid Bit, a Cache Tag (stored as part of the cache "state"), and 32 bytes of Cache Data — Byte 0 ... Byte 31 in entry 0, Byte 32 ... Byte 63 in entry 1, up through Byte 992 ... Byte 1023 in entry 31; the example tag 0x50 is stored at index 1]
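The tag/index/byte-select split for this 1 KB, 32 B-block cache can be written out as a small Python sketch (the address used below is constructed to match the slide's example fields):

```python
CACHE_BYTES, BLOCK_BYTES = 1024, 32
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1                   # 5 byte-select bits
INDEX_BITS = (CACHE_BYTES // BLOCK_BYTES).bit_length() - 1   # 32 entries -> 5 index bits

def split(addr):
    """Split a 32-bit address into (tag, index, byte_select)."""
    byte_select = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, byte_select

# Address with tag 0x50, index 0x01, byte select 0x00 (the slide's example)
addr = (0x50 << 10) | (0x01 << 5) | 0x00
print(split(addr))   # (0x50, 0x01, 0x00), printed in decimal as (80, 1, 0)
```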
Set Associative Cache
° N-way set associative: N entries for each Cache Index
  • N direct mapped caches operating in parallel
° Example: Two-way set associative cache
  • Cache Index selects a "set" from the cache
  • The two tags in the set are compared to the input in parallel
  • Data is selected based on the tag comparison result

[Figure: the Cache Index selects one set; each way has a Valid bit, Cache Tag, and Cache Data (Cache Block 0 of each way shown); the address tag (Adr Tag) feeds a comparator per way, the compare outputs Sel1/Sel0 drive a mux that picks the Cache Block, and their OR produces the Hit signal]
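A minimal sketch of the two-way lookup in Python (hardware compares both ways' tags in parallel; here they are simply both checked, and the hit signal is effectively the OR of the two compares):

```python
class TwoWaySet:
    """One set of a 2-way set associative cache: two (valid, tag, data) ways."""
    def __init__(self):
        self.ways = [{"valid": False, "tag": None, "data": None},
                     {"valid": False, "tag": None, "data": None}]

    def lookup(self, tag):
        for way in self.ways:
            if way["valid"] and way["tag"] == tag:   # per-way comparator
                return True, way["data"]             # mux selects the matching way
        return False, None                           # neither compare matched

s = TwoWaySet()
s.ways[1] = {"valid": True, "tag": 0x50, "data": "block A"}
print(s.lookup(0x50))   # (True, 'block A')
print(s.lookup(0x51))   # (False, None)
```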
Disadvantage of Set Associative Cache
° N-way Set Associative Cache versus Direct Mapped
Cache:
• N comparators vs. 1
• Extra MUX delay for the data
• Data comes AFTER Hit/Miss decision and set selection
° In a direct mapped cache, Cache Block is available
BEFORE Hit/Miss:
• Possible to assume a hit and continue. Recover later if miss.

Example: Fully Associative Cache
° Fully Associative Cache
  • Forget about the Cache Index
  • Compare the Cache Tags of all cache entries in parallel
  • Example: with a Block Size of 32 B, we need N 27-bit comparators
° By definition: Conflict Misses = 0 for a fully associative cache

[Figure: the address splits into a 27-bit Cache Tag (bits [31:5]) and a Byte Select (bits [4:0], ex: 0x01); each entry has its own comparator (=) matching its stored Cache Tag and Valid Bit against the address tag, alongside 32 bytes of Cache Data per entry]
A Summary on Sources of Cache Misses
° Compulsory (cold start or process migration, first reference):
  first access to a block
  • "Cold" fact of life: not a whole lot you can do about it
  • Note: if you are going to run "billions" of instructions,
    Compulsory Misses are insignificant
° Capacity:
  • Cache cannot contain all blocks accessed by the program
  • Solution: increase cache size
° Conflict (collision):
  • Multiple memory locations mapped to the same cache location
  • Solution 1: increase cache size
  • Solution 2: increase associativity
° Coherence (Invalidation): other process (e.g., I/O) updates memory
Design options at constant cost

                    Direct Mapped   N-way Set Associative   Fully Associative
  Cache Size        Big             Medium                  Small
  Compulsory Miss   Same            Same                    Same
  Conflict Miss     High            Medium                  Zero
  Capacity Miss     Low             Medium                  High
  Coherence Miss    Same            Same                    Same

Note: if you are going to run "billions" of instructions, Compulsory Misses are insignificant (except for streaming-media types of programs).

Recap: Four Questions for Caches and Memory Hierarchy

° Q1: Where can a block be placed in the upper level?
  (Block placement)
° Q2: How is a block found if it is in the upper level?
  (Block identification)
° Q3: Which block should be replaced on a miss?
  (Block replacement)
° Q4: What happens on a write?
  (Write strategy)

Q1: Where can a block be placed in the upper level?
° Block 12 placed in an 8-block cache:
  • Fully associative, direct mapped, 2-way set associative
  • S.A. Mapping = Block Number Modulo Number of Sets
° Fully associative: block 12 can go anywhere
° Direct mapped: block 12 can go only into block 4 (12 mod 8)
° Set associative: block 12 can go anywhere in set 0 (12 mod 4)

[Figure: three 8-block caches (block numbers 0-7), the set-associative one divided into sets 0-3; below, memory block-frame addresses 0-31 with block 12 highlighted]
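The three placement rules above can be written down directly; this sketch just enumerates the legal frames for block 12 in the slide's 8-block cache.

```python
N_BLOCKS = 8

def direct_mapped(block):
    # exactly one legal frame: block number mod number of frames
    return [block % N_BLOCKS]

def set_associative(block, n_sets):
    # anywhere within set (block mod n_sets)
    s = block % n_sets
    ways = N_BLOCKS // n_sets
    return [s * ways + w for w in range(ways)]

def fully_associative(block):
    # anywhere in the cache
    return list(range(N_BLOCKS))

print(direct_mapped(12))         # [4]      -- 12 mod 8
print(set_associative(12, 4))    # [0, 1]   -- set 0 of a 2-way cache
print(fully_associative(12))     # [0, 1, 2, 3, 4, 5, 6, 7]
```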
Q2: How is a block found if it is in the upper level?

Block Address = | Tag | Index | Block offset |
  (the Index drives Set Select; the Block offset drives Data Select)

° Direct indexing (using index and block offset), tag compares,
  or a combination
° Increasing associativity shrinks the index, expands the tag
Q3: Which block should be replaced on a miss?

° Easy for Direct Mapped
° Set Associative or Fully Associative:
  • Random
  • LRU (Least Recently Used)

Miss rates, LRU vs. Random:

  Associativity:   2-way            4-way            8-way
  Size             LRU     Random   LRU     Random   LRU     Random
  16 KB            5.2%    5.7%     4.7%    5.3%     4.4%    5.0%
  64 KB            1.9%    2.0%     1.5%    1.7%     1.4%    1.5%
  256 KB           1.15%   1.17%    1.13%   1.13%    1.12%   1.12%
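For concreteness, here is a tiny LRU sketch for one set: a hit moves the tag to the most-recently-used end, and a miss evicts the least-recently-used entry when the set is full.

```python
from collections import OrderedDict

class LRUSet:
    """One set of an N-way cache with LRU replacement."""
    def __init__(self, ways):
        self.ways, self.blocks = ways, OrderedDict()

    def access(self, tag):
        """Return True on hit, False on miss (filling/evicting as needed)."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)       # hit: now most recently used
            return True
        if len(self.blocks) == self.ways:
            self.blocks.popitem(last=False)    # evict the LRU entry
        self.blocks[tag] = True
        return False

s = LRUSet(2)
hits = [s.access(t) for t in ["A", "B", "A", "C", "B"]]
print(hits)   # [False, False, True, False, False] -- C evicted B, the LRU block
```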

Q4: What happens on a write?
° Write through—the information is written to both the block in the cache
  and the block in the lower-level memory.
° Write back—the information is written only to the block in the cache.
  The modified cache block is written to main memory only when it is replaced.
  • Is the block clean or dirty?
° Pros and Cons of each?
  • WT: read misses cannot result in writes
  • WB: no writes of repeated writes
° WT is always combined with write buffers so that the processor doesn't
  wait for the lower-level memory
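The "WB: no writes of repeated writes" point can be made concrete by counting lower-level memory writes for repeated stores to one cached block (a simplified model: the block stays resident and write-back pays one write at eviction):

```python
def write_through_traffic(n_writes):
    # every store also goes to lower-level memory
    return n_writes

def write_back_traffic(n_writes):
    # the dirty block is written once, when it is eventually replaced
    return 1 if n_writes > 0 else 0

print(write_through_traffic(100))   # 100 memory writes
print(write_back_traffic(100))      # 1 memory write (at replacement)
```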

Write Buffer for Write Through

[Figure: Processor <-> Cache, with a Write Buffer between the processor/cache and DRAM]

° A Write Buffer is needed between the Cache and Memory
  • Processor: writes data into the cache and the write buffer
  • Memory controller: writes contents of the buffer to memory
° Write buffer is just a FIFO:
  • Typical number of entries: 4
  • Must handle bursts of writes
  • Works fine if: store frequency (w.r.t. time) << 1 / DRAM write cycle

Write Buffer Saturation

[Figure: Processor <-> Cache <-> Write Buffer <-> DRAM]

° Store frequency (w.r.t. time) > 1 / DRAM write cycle
  • If this condition exists for a long period of time (CPU cycle time too
    quick and/or too many store instructions in a row):
    - The store buffer will overflow no matter how big you make it
    - The CPU is effectively limited to the DRAM write cycle time
° Solutions for write buffer saturation:
  • Use a write back cache
  • Install a second-level (L2) cache (does this always work?)

[Figure: Processor <-> Cache <-> Write Buffer <-> L2 Cache <-> DRAM]
RAW Hazards from the Write Buffer!
° Write-Buffer Issues: could introduce a RAW hazard with memory!
  • The write buffer may contain the only copy of valid data =>
    reads to memory may get the wrong result if we ignore the write buffer
° Solutions:
  • Simply wait for the write buffer to empty before servicing reads:
    - Might increase the read miss penalty (old MIPS 1000 by 50%)
  • Check write buffer contents before a read ("fully associative"):
    - If no conflicts, let the memory access continue
    - Else grab the data from the buffer
° Can a Write Buffer help with Write Back?
  • Read miss replacing a dirty block:
    - Copy the dirty block to the write buffer while starting the read to memory
  • CPU stalls less since it restarts as soon as the read is done
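A minimal sketch of the second solution, checking write-buffer contents before the read goes to memory (the buffer match is the "fully associative" lookup; names and structure here are illustrative, not a real controller design):

```python
from collections import deque

write_buffer = deque()   # FIFO of (address, data) waiting to drain to DRAM

def read(addr, memory):
    """Service a read, forwarding from the write buffer on an address match."""
    for a, d in reversed(write_buffer):   # newest matching entry wins
        if a == addr:
            return d                      # forward from buffer: RAW hazard avoided
    return memory.get(addr, 0)            # no conflict: memory access continues

memory = {0x100: 1}
write_buffer.append((0x100, 42))          # store to 0x100 still in flight
print(read(0x100, memory))                # 42, not the stale 1 from memory
```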

Write-miss Policy: Write Allocate versus Not Allocate
° Assume: a 16-bit write to memory location 0x0 causes a miss
  • Do we allocate space in the cache and possibly read in the block?
    - Yes: Write Allocate
    - No: Not Write Allocate

[Figure: the same 1 KB direct-mapped cache as before; the write address splits into Cache Tag (example: 0x00, bits [31:10]), Cache Index (ex: 0x00, bits [9:5]), and Byte Select (ex: 0x00, bits [4:0]), missing against the stored tag 0x50]
Impact of Memory Hierarchy on Algorithms
° Today CPU time is a function of (ops, cache misses)
° What does this mean to compilers, data structures, algorithms?
  • Quicksort: fastest comparison-based sorting algorithm when keys fit in memory
  • Radix sort: also called "linear time" sort; for keys of fixed length and
    fixed radix, a constant number of passes over the data is sufficient,
    independent of the number of keys
° "The Influence of Caches on the Performance of Sorting" by A. LaMarca and
  R.E. Ladner. Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete
  Algorithms, January 1997, 370-379.
  • For an AlphaStation 250, 32-byte blocks, direct-mapped 2 MB L2 cache,
    8-byte keys, from 4,000 to 4,000,000 keys

Quicksort vs. Radix as we vary the number of keys: Instructions

[Plot: instructions/key vs. job size in keys, for quicksort and radix sort]
Quicksort vs. Radix as we vary the number of keys: Instructions & Time

[Plot: instructions/key and time/key vs. job size in keys, for quicksort and radix sort]
Quicksort vs. Radix as we vary the number of keys: Cache misses

[Plot: cache misses/key vs. job size in keys, for quicksort and radix sort]

What is the proper approach to fast algorithms?
Summary #1 / 2:

° The Principle of Locality:
  • A program is likely to access a relatively small portion of the address
    space at any instant of time.
    - Temporal Locality: Locality in Time
    - Spatial Locality: Locality in Space
° Three (+1) Major Categories of Cache Misses:
  • Compulsory Misses: sad facts of life. Example: cold start misses.
  • Conflict Misses: increase cache size and/or associativity.
    Nightmare scenario: ping-pong effect!
  • Capacity Misses: increase cache size
  • Coherence Misses: caused by external processors or I/O devices
° Cache Design Space
  • total size, block size, associativity
  • replacement policy
  • write-hit policy (write-through, write-back)
  • write-miss policy
Summary #2 / 2: The Cache Design Space

° Several interacting dimensions
  • cache size
  • block size
  • associativity
  • replacement policy
  • write-through vs write-back
  • write allocation
° The optimal choice is a compromise
  • depends on access characteristics
    - workload
    - use (I-cache, D-cache, TLB)
  • depends on technology / cost
° Simplicity often wins

[Figure: the design space sketched along axes of Cache Size, Associativity, and Block Size, with a Good/Bad tradeoff curve between Factor A and Factor B as a parameter varies from Less to More]