
cache 1

Mapping of CACHE
Main memory: 256 B, 8 address bits for byte access, block size 8 B, 5 address bits for block access, 32 blocks in total.
Cache memory: 128 B, cache line size 8 B, 16 cache lines.
Mapping of the CPU generated logical address in the given cache
Fully associative (16-way set associative), one set only
Address format: Tag 5 bits | Index 0 bits | Offset 3 bits

Index   CPU generated address (hex)   Tag      Valid bit   Cache line
Set 0   -                             -        0           -
        -                             -        0           -
        48                            0 1001   1           Block 9
        -                             -        0           -
        -                             -        0           -
        F4                            1 1110   1           Block 30
        -                             -        0           -
        -                             -        0           -
        -                             -        0           -
        D3                            1 1010   1           Block 26
        DC                            1 1011   1           Block 27
        06                            0 0000   1           Block 0
        -                             -        0           -
        -                             -        0           -
        E2                            1 1100   1           Block 28
        1A                            0 0011   1           Block 3

Example of a block of 8 bytes

Block address: 00000
Byte address within the block: 00000 xxx
So for the 0th to 7th byte, block 0 is selected:

00000 000
00000 001
00000 010
00000 011
00000 100
00000 101
00000 110
00000 111
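
The split of a byte address into a block address and a byte offset can be sketched in a few lines. This is a minimal illustration, assuming the 8-bit addresses and 8-byte blocks of these notes; the function name `split_address` is a hypothetical helper, not something the notes define.

```python
OFFSET_BITS = 3  # 8-byte blocks, so 3 offset bits (as in the notes)

def split_address(addr: int) -> tuple[int, int]:
    """Return (block number, byte offset) for an 8-bit byte address."""
    block = addr >> OFFSET_BITS                 # upper 5 bits
    offset = addr & ((1 << OFFSET_BITS) - 1)    # lower 3 bits
    return block, offset

for addr in (0x00, 0x07, 0x48, 0xF4):
    block, offset = split_address(addr)
    print(f"{addr:02X} -> block {block:2d} ({block:05b}), offset {offset:03b}")
```

Addresses 00 and 07 both land in block 0, while 48 falls in block 9 and F4 in block 30, which matches the cache table above.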


Main memory

Block      5-bit tag value   Block address (hex)
Block 0    0 0000            00 (00-07)
Block 1    0 0001            08 (08-0F)
Block 2    0 0010            10 (10-17)
Block 3    0 0011            18 (18-1F)
Block 4    0 0100            20 (20-27)
Block 5    0 0101            28 (28-2F)
Block 6    0 0110            30 (30-37)
Block 7    0 0111            38 (38-3F)
Block 8    0 1000            40 (40-47)
Block 9    0 1001            48 (48-4F)
Block 10   0 1010            50 (50-57)
Block 11   0 1011            58 (58-5F)
Block 12   0 1100            60 (60-67)
Block 13   0 1101            68 (68-6F)
Block 14   0 1110            70 (70-77)
Block 15   0 1111            78 (78-7F)
Block 16   1 0000            80 (80-87)
Block 17   1 0001            88 (88-8F)
Block 18   1 0010            90 (90-97)
Block 19   1 0011            98 (98-9F)
Block 20   1 0100            A0 (A0-A7)
Block 21   1 0101            A8 (A8-AF)
Block 22   1 0110            B0 (B0-B7)
Block 23   1 0111            B8 (B8-BF)
Block 24   1 1000            C0 (C0-C7)
Block 25   1 1001            C8 (C8-CF)
Block 26   1 1010            D0 (D0-D7)
Block 27   1 1011            D8 (D8-DF)
Block 28   1 1100            E0 (E0-E7)
Block 29   1 1101            E8 (E8-EF)
Block 30   1 1110            F0 (F0-F7)
Block 31   1 1111            F8 (F8-FF)

Example: the valid bits of all cache lines are 0 (empty cache).
The CPU generates the following addresses: 48, F4, 06, E2, F4, DC, 1A, DC, E2.
As the cache is fully associative, any block can go anywhere in the cache.
The first references to the six distinct blocks (48, F4, 06, E2, DC, 1A) are compulsory misses. Each block is brought into the cache as a result of the miss, and its valid bit is set.
The remaining three references (the second F4, DC and E2) are hits, because the block is already in the cache.
The nine references therefore give 3 hits and 6 misses.

Example: the cache already holds the memory blocks as shown.
The CPU generates the following addresses: 48, F4, 06, E2, F4, DC, 1A, DC, E2.
As the cache is fully associative, any block can go anywhere in the cache.
All the requested blocks are present, so the result is 9 hits.
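
The two examples can be checked with a short simulation. This is a sketch under the geometry of these notes (8-byte lines, 16 lines, fully associative); the function `simulate` and its `preloaded_blocks` parameter are hypothetical names, and no replacement policy is modelled because the trace never needs more than 16 distinct blocks.

```python
OFFSET_BITS = 3   # 8-byte cache lines
NUM_LINES = 16    # 128-byte cache

def simulate(addresses, preloaded_blocks=()):
    """Count (hits, misses) for a fully associative cache without eviction."""
    cached = set(preloaded_blocks)       # block numbers currently in the cache
    hits = misses = 0
    for addr in addresses:
        block = addr >> OFFSET_BITS
        if block in cached:
            hits += 1
        else:
            misses += 1
            cached.add(block)            # the miss brings the block in
            assert len(cached) <= NUM_LINES, "a replacement policy would be needed"
    return hits, misses

trace = [0x48, 0xF4, 0x06, 0xE2, 0xF4, 0xDC, 0x1A, 0xDC, 0xE2]
print(simulate(trace))                                              # (3, 6)
print(simulate(trace, preloaded_blocks={9, 30, 26, 27, 0, 28, 3}))  # (9, 0)
```

Starting from an empty cache the trace gives 3 hits and 6 misses; starting from the cache contents shown in the table it gives 9 hits.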


Set associative (8-way set associative), two sets
Address format: Tag 4 bits | Set (index) 1 bit | Offset 3 bits

Index   CPU generated address (hex)   Tag     Valid bit   Cache line
Set 0   77                            0 111   1           Block 14
        47                            0 100   1           Block 8
        03                            0 000   1           Block 0
        F3                            1 111   1           Block 30
        E2                            1 110   1           Block 28
        D3                            1 101   1           Block 26
Set 1   DC                            1 101   1           Block 27
        18                            0 001   1           Block 3
        7A                            0 111   1           Block 15
        99                            1 001   1           Block 19

The remaining cache lines (two in set 0, four in set 1) have valid bit 0.

Main memory

4-bit tag value   Set 0 block   Block address (hex)   Set 1 block   Block address (hex)
0 000             Block 0       00                    Block 1       08
0 001             Block 2       10                    Block 3       18
0 010             Block 4       20                    Block 5       28
0 011             Block 6       30                    Block 7       38
0 100             Block 8       40                    Block 9       48
0 101             Block 10      50                    Block 11      58
0 110             Block 12      60                    Block 13      68
0 111             Block 14      70                    Block 15      78
1 000             Block 16      80                    Block 17      88
1 001             Block 18      90                    Block 19      98
1 010             Block 20      A0                    Block 21      A8
1 011             Block 22      B0                    Block 23      B8
1 100             Block 24      C0                    Block 25      C8
1 101             Block 26      D0                    Block 27      D8
1 110             Block 28      E0                    Block 29      E8
1 111             Block 30      F0                    Block 31      F8

Set associative (4-way set associative), four sets
Address format: Tag 3 bits | Set (index) 2 bits | Offset 3 bits

Index        CPU generated address (hex)   Tag    Valid bit   Cache line
Set 0 (00)   -                             -      0           -
             23                            0 01   1           Block 4
             83                            1 00   1           Block 16
             03                            0 00   1           Block 0
Set 1 (01)   -                             -      0           -
             4B                            0 10   1           Block 9
             E9                            1 11   1           Block 29
             -                             -      0           -
Set 2 (10)   -                             -      0           -
             F2                            1 11   1           Block 30
             30                            0 01   1           Block 6
             B1                            1 01   1           Block 22
Set 3 (11)   -                             -      0           -
             79                            0 11   1           Block 15
             DC                            1 10   1           Block 27
             BE                            1 01   1           Block 23

[Figure: lookup hardware. The CPU generated address is split into TAG, index and word offset. The index drives a decoder that selects a set. 4 comparators and 4 AND gates compare the TAG with the stored tags of the selected set. HIT: logic selects the requested cache line, and a decoder driven by the word offset selects the word sent to the CPU. MISS (either the TAG is not matched or the valid bit is 0): the block is read from the next level memory.]

Main memory

3-bit tag value   Set 0 (00)      Set 1 (01)      Set 2 (10)      Set 3 (11)
0 00              Block 0 (00)    Block 1 (08)    Block 2 (10)    Block 3 (18)
0 01              Block 4 (20)    Block 5 (28)    Block 6 (30)    Block 7 (38)
0 10              Block 8 (40)    Block 9 (48)    Block 10 (50)   Block 11 (58)
0 11              Block 12 (60)   Block 13 (68)   Block 14 (70)   Block 15 (78)
1 00              Block 16 (80)   Block 17 (88)   Block 18 (90)   Block 19 (98)
1 01              Block 20 (A0)   Block 21 (A8)   Block 22 (B0)   Block 23 (B8)
1 10              Block 24 (C0)   Block 25 (C8)   Block 26 (D0)   Block 27 (D8)
1 11              Block 28 (E0)   Block 29 (E8)   Block 30 (F0)   Block 31 (F8)

Each entry gives the block and, in parentheses, its block address in hex.

Working
1. The CPU generates a logical address. It is interpreted as tag, index and word offset according to the cache organization in hardware.
2. The index bits are applied to a decoder to select one set of tags and the corresponding valid bits.
3. The tags of the selected set are compared simultaneously with the input tag, for the lines whose valid bit is set.
If the requested block is not in the cache:
4. There is no match between the requested tag and any stored tag. This is known as a miss.
5. In case of a miss, the memory management unit reads the requested block from the next level memory and places the new block either in a cache line whose valid bit is 0 or in a target cache line chosen by one of the replacement methods.
Else (the requested block is present in the cache):
4. The comparison results in a match. This is known as a hit.
5. The corresponding cache line, along with the hit signal and the word offset bits, is applied to a decoder.
6. The word offset selects the desired word from the cache line.
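
The steps above can be sketched in software. This is only an illustration of the control flow, not a model of the hardware: the class names `Line` and `SetAssociativeCache` are hypothetical, the comparison loop stands in for the parallel comparators, and the victim choice (first invalid line, otherwise line 0) is a placeholder assumption rather than a real replacement method.

```python
from dataclasses import dataclass, field

@dataclass
class Line:
    valid: bool = False
    tag: int = 0
    data: bytes = b""

@dataclass
class SetAssociativeCache:
    index_bits: int
    offset_bits: int
    ways: int
    sets: list = field(init=False)

    def __post_init__(self):
        self.sets = [[Line() for _ in range(self.ways)]
                     for _ in range(1 << self.index_bits)]

    def split(self, addr):
        """Step 1: interpret the address as (tag, index, word offset)."""
        offset = addr & ((1 << self.offset_bits) - 1)
        index = (addr >> self.offset_bits) & ((1 << self.index_bits) - 1)
        tag = addr >> (self.offset_bits + self.index_bits)
        return tag, index, offset

    def read(self, addr, memory):
        tag, index, offset = self.split(addr)
        lines = self.sets[index]                    # step 2: select the set
        for line in lines:                          # step 3: compare tags
            if line.valid and line.tag == tag:
                return "hit", line.data[offset]     # hit: word offset selects the byte
        # Miss: fetch the block from the next level (here, a bytes object).
        base = addr - offset
        victim = next((l for l in lines if not l.valid), lines[0])  # placeholder policy
        victim.valid, victim.tag = True, tag
        victim.data = bytes(memory[base:base + (1 << self.offset_bits)])
        return "miss", victim.data[offset]

memory = bytes(range(256))                          # byte at address a holds value a
cache = SetAssociativeCache(index_bits=2, offset_bits=3, ways=4)
print(cache.read(0x23, memory))   # first access to block 4
print(cache.read(0x27, memory))   # same block, different offset
```

With the 4-way geometry of these notes, the first read of 23 misses and loads block 4, and the read of 27 then hits because it lies in the same block.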

Set associative (2-way set associative), eight sets
Address format: Tag 2 bits | Set (index) 3 bits | Offset 3 bits

Index   CPU generated address (hex)   Tag   Valid bit   Cache line
Set 0   03                            00    1           Block 0
        C7                            11    1           Block 24
Set 1   88                            10    1           Block 17
Set 2   14                            00    1           Block 2
Set 3   DA                            11    1           Block 27
Set 4   65                            01    1           Block 12
        27                            00    1           Block 4
Set 5   AB                            10    1           Block 21
Set 6   B2                            10    1           Block 22
Set 7   FF                            11    1           Block 31
        BC                            10    1           Block 23

In sets 1, 2, 3, 5 and 6 the other cache line has valid bit 0.

Main memory

Index (3-bit)   Tag 00          Tag 01          Tag 10          Tag 11
Set 0 (000)     Block 0 (00)    Block 8 (40)    Block 16 (80)   Block 24 (C0)
Set 1 (001)     Block 1 (08)    Block 9 (48)    Block 17 (88)   Block 25 (C8)
Set 2 (010)     Block 2 (10)    Block 10 (50)   Block 18 (90)   Block 26 (D0)
Set 3 (011)     Block 3 (18)    Block 11 (58)   Block 19 (98)   Block 27 (D8)
Set 4 (100)     Block 4 (20)    Block 12 (60)   Block 20 (A0)   Block 28 (E0)
Set 5 (101)     Block 5 (28)    Block 13 (68)   Block 21 (A8)   Block 29 (E8)
Set 6 (110)     Block 6 (30)    Block 14 (70)   Block 22 (B0)   Block 30 (F0)
Set 7 (111)     Block 7 (38)    Block 15 (78)   Block 23 (B8)   Block 31 (F8)

Each entry gives the block and, in parentheses, its block address in hex. The offset is 3 bits.

Example: a purely sequential programme occupies the address space 80-CB in main memory
(assume each instruction takes one byte-sized address; the cache line size is 8 bytes).
The CPU executes the programme once (each instruction is executed once, in sequence).
Calculate the number of hits and misses.
- The cache is 2-way set associative: two blocks can occupy a set at a time.
The walk-through follows the references to blocks 16, 17, 24 and 25 (addresses 80-8F and C0-CB):
Address 80h: compulsory miss; block 16 is loaded from MM into set 0 of the cache (second row).
The next 7 addresses (81h to 87h, in the second row of set 0) are hits.
Address 88h: compulsory miss; block 17 is loaded from MM into set 1 of the cache (into the empty cache line, first row).
The next 7 addresses (89h to 8Fh, in the first row of set 1) are hits.
Address C0h: conflict miss; block 24 is loaded from MM into set 0 of the cache (one of the cache lines is overwritten, say LRU is used, second row).
The next 7 addresses (C1h to C7h, in the second row of set 0) are hits.
Address C8h: conflict miss; block 25 is loaded from MM into set 1 of the cache (one of the cache lines is overwritten, say LRU is used, first row).
The next 3 addresses (C9h to CBh, in the first row of set 1) are hits.
Total references: 28, of which 4 are misses and 24 are hits.

Block (addresses)   Miss   Hit
Block 16 (80-87)    1      7
Block 17 (88-8F)    1      7
Block 24 (C0-C7)    1      7
Block 25 (C8-CB)    1      3
Total               4      24
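
A small simulation makes the counting easy to check. This sketch assumes a cold (empty) 2-way cache with 8 sets, 8-byte lines and LRU replacement, so every miss it reports is a compulsory miss; the walk-through above instead starts from a populated cache. The function name `run` is hypothetical.

```python
from collections import OrderedDict

OFFSET_BITS, INDEX_BITS, WAYS = 3, 3, 2   # 8-byte lines, 8 sets, 2-way

def run(addresses):
    """Return (hits, misses) for a cold 2-way set-associative LRU cache."""
    # One OrderedDict per set, keyed by tag, least recently used first.
    sets = [OrderedDict() for _ in range(1 << INDEX_BITS)]
    hits = misses = 0
    for addr in addresses:
        block = addr >> OFFSET_BITS
        index = block & ((1 << INDEX_BITS) - 1)
        tag = block >> INDEX_BITS
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)          # mark as most recently used
        else:
            misses += 1
            if len(lines) == WAYS:
                lines.popitem(last=False)   # evict the least recently used line
            lines[tag] = None
    return hits, misses

walkthrough = list(range(0x80, 0x90)) + list(range(0xC0, 0xCC))
print(run(walkthrough))          # (24, 4): blocks 16, 17, 24, 25
print(run(range(0x80, 0xCC)))    # (66, 10): every address from 80 to CB
```

For the four blocks of the walk-through this gives 4 misses and 24 hits out of 28 references. If every address from 80 to CB is referenced, ten blocks (16 to 25) are touched, giving 10 misses and 66 hits out of 76 references.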

Direct mapped (1-way set associative), sixteen sets
Address format: Set (index) 4 bits | Tag 1 bit | Offset 3 bits

Index    CPU generated address (hex)   Tag   Valid bit   Cache line
Set 0    03                            0     1           Block 0
Set 1    19                            1     1           Block 3
Set 2    23                            0     1           Block 4
Set 3    -                             -     0           -
Set 4    -                             -     0           -
Set 5    58                            1     1           Block 11
Set 6    6B                            1     1           Block 13
Set 7    -                             -     0           -
Set 8    89                            1     1           Block 17
Set 9    9A                            1     1           Block 19
Set 10   AB                            1     1           Block 21
Set 11   -                             -     0           -
Set 12   -                             -     0           -
Set 13   D3                            0     1           Block 26
Set 14   -                             -     0           -
Set 15   F3                            0     1           Block 30

Here the index bits and the tag are interchanged: the index is taken from the four most significant address bits and the tag from the bit below them. The logical interpretation also changes.

Main memory

Index (4-bit)    Tag 0           Tag 1
Set 0 (0000)     Block 0 (00)    Block 1 (08)
Set 1 (0001)     Block 2 (10)    Block 3 (18)
Set 2 (0010)     Block 4 (20)    Block 5 (28)
Set 3 (0011)     Block 6 (30)    Block 7 (38)
Set 4 (0100)     Block 8 (40)    Block 9 (48)
Set 5 (0101)     Block 10 (50)   Block 11 (58)
Set 6 (0110)     Block 12 (60)   Block 13 (68)
Set 7 (0111)     Block 14 (70)   Block 15 (78)
Set 8 (1000)     Block 16 (80)   Block 17 (88)
Set 9 (1001)     Block 18 (90)   Block 19 (98)
Set 10 (1010)    Block 20 (A0)   Block 21 (A8)
Set 11 (1011)    Block 22 (B0)   Block 23 (B8)
Set 12 (1100)    Block 24 (C0)   Block 25 (C8)
Set 13 (1101)    Block 26 (D0)   Block 27 (D8)
Set 14 (1110)    Block 28 (E0)   Block 29 (E8)
Set 15 (1111)    Block 30 (F0)   Block 31 (F8)

Each entry gives the block and, in parentheses, its block address in hex. The offset is 3 bits.

Please refer to the class notes, Spring 2009.
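
The direct-mapped layout in this section takes the index from the upper four address bits and the tag from the bit below them. The sketch below contrasts that with the more usual layout (tag on top, index below it). Both function names are hypothetical, and the 8-bit address with a 3-bit offset is the geometry assumed throughout these notes.

```python
OFFSET_BITS, TAG_BITS, INDEX_BITS = 3, 1, 4

def decode_interchanged(addr):
    """Index from bits 7-4, tag from bit 3, offset from bits 2-0."""
    offset = addr & 0b111
    tag = (addr >> OFFSET_BITS) & 0b1
    index = addr >> (OFFSET_BITS + TAG_BITS)
    return index, tag, offset

def decode_conventional(addr):
    """Tag from bit 7, index from bits 6-3, offset from bits 2-0."""
    offset = addr & 0b111
    index = (addr >> OFFSET_BITS) & 0b1111
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return index, tag, offset

for addr in (0x03, 0x19, 0xD3):
    print(f"{addr:02X}: interchanged {decode_interchanged(addr)}, "
          f"conventional {decode_conventional(addr)}")
```

With the interchanged layout, neighbouring blocks such as 0 and 1 share a set, so a sequential run through memory replaces a line on every new block. The conventional layout spreads consecutive blocks over consecutive sets.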

cache 2
Cache memory
A portion of main memory is copied to a faster memory that is closer to the processor.
Main memory of 64 KB and cache of 1 KB.
The CPU generates an effective address (logical address), say 16 bits:
Address format: 16-bit byte address

If the block size (cache line) is 64 B, then log2 64 = 6 bits are required to select one of the 64 bytes in the block (cache line). These 6 bits are the offset bits.
Address format: 10-bit block address | 6-bit offset

There are 2^10 blocks of main memory, each of 64 bytes.

Logical view of main memory

Block number (A15-6)   Byte offset within block (A5-0)   Block
00 0000 0000           00 0000 ... 11 1111               0th block
00 0000 0001           00 0000 ... 11 1111               1st block
...                    ...                               ...
11 1111 1110           00 0000 ... 11 1111               1022nd block
11 1111 1111           00 0000 ... 11 1111               1023rd block

As the cache requires 10 bits (1 KB cache) to access a byte, of which 6 are offset bits, there are 2^4 cache lines, each of 64 bytes.

Logical view of cache

Cache line number   Byte offset within cache line   Cache line
0000                00 0000 ... 11 1111             0th cache line
0001                00 0000 ... 11 1111             1st cache line
...                 ...                             ...
1110                00 0000 ... 11 1111             14th cache line
1111                00 0000 ... 11 1111             15th cache line

Fully associative cache (16-way set associative cache)
The CPU address is interpreted as:
Address format: 10-bit block address | Offset
The 10-bit block address is termed the tag; it identifies the block that holds the requested byte:
Address format: 10-bit TAG | Offset
Any 16 blocks of main memory can be stored in the cache.
Any block can be stored in any cache line.
As the TAG differentiates block IDs, the tag of each block is also stored in a tag line (for the respective cache line).

[Figure: 16 cache lines, numbered 0000 (0th) to 1111 (15th), each with byte offsets 00 0000 to 11 1111 within the line, a 10-bit tag line and a comparator that outputs H (hit) or M (miss).]

At most one comparator can generate HIT: the one on the cache line where the block holding the requested byte is mapped.
If none of the comparators generates HIT, the block holding the requested byte has not been read from main memory into the cache.
The valid bit and dirty bit are not included in this example.
There are 16 cache lines where a block can be mapped without any restriction. There are 16 ways a block can be placed in the cache: a 16-way set associative cache.

Index bits = log2 (No. of cache lines / No. of ways)
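
The index formula can be turned into a small calculation of all three field widths. This is a sketch assuming power-of-two sizes and the geometry of this section (16-bit address, 1 KB cache, 64 B lines); the function name `field_widths` is hypothetical.

```python
def field_widths(address_bits, cache_bytes, line_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits); sizes must be powers of two."""
    lines = cache_bytes // line_bytes
    offset_bits = line_bytes.bit_length() - 1        # log2(line size)
    index_bits = (lines // ways).bit_length() - 1    # log2(lines / ways)
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits

for ways in (16, 8, 4, 1):
    tag, index, offset = field_widths(16, 1024, 64, ways)
    print(f"{ways:2d}-way: tag {tag} bits, index {index} bits, offset {offset} bits")
```

For 16, 8, 4 and 1 ways this yields tags of 10, 9, 8 and 6 bits with 0, 1, 2 and 4 index bits, the same splits used for the fully associative, 8-way, 4-way and direct-mapped cases in these notes.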

Set associative cache (8-way set associative cache)
The CPU address is interpreted as:
Address format: 9-bit block address | 1-bit index | Offset
Of the 10-bit block No., a 9-bit tag and a 1-bit index are used to identify the block holding the requested byte:
Address format: 9-bit TAG | 1-bit index | Offset
The cache lines are placed in 2^1 groups, and one of these groups is selected by a decoder.
The index bits are used to select one of the 2^1 groups. This is implemented through a decoder (k index bits select 1 of 2^k groups).
Logical view of the main memory (the one LSB of the 10-bit block No. is used as the index bit):

Tag number (A15-7)   Index bit (A6) = 0   Index bit (A6) = 1   Byte No. within block (A5-0)
0 0000 0000          0th block            1st block            00 0000 ... 11 1111
0 0000 0001          2nd block            3rd block            00 0000 ... 11 1111
...                  ...                  ...                  ...
1 1111 1110          1020th block         1021st block         00 0000 ... 11 1111
1 1111 1111          1022nd block         1023rd block         00 0000 ... 11 1111

Block numbers in group 0 (index bit 0): 0, 2, 4, 6, ..., 1020, 1022
Block numbers in group 1 (index bit 1): 1, 3, 5, 7, ..., 1021, 1023
Each is one group of 512 blocks.

Blocks with the same index No. map to the same group of the cache. Since the tag has 9 bits, 512 blocks go to one group of 8 cache lines.
Set associative cache (4-way set associative cache)
The CPU address is interpreted as:
Address format: 8-bit block address | 2-bit index | Offset
Of the 10-bit block No., an 8-bit tag and a 2-bit index are used to identify the block holding the requested byte:
Address format: 8-bit TAG | 2-bit index | Offset
The cache lines are placed in 2^2 groups, and one of these groups is selected by a decoder.
The index bits are used to select one of the 2^2 groups. This is implemented through a decoder (k index bits select 1 of 2^k groups).
Logical view of the main memory (the two LSBs of the 10-bit block No. are used as index bits):

Tag number (A15-8)   Index (A7-6) = 00   Index = 01     Index = 10     Index = 11
0000 0000            0th block           1st block      2nd block      3rd block
0000 0001            4th block           5th block      6th block      7th block
...                  ...                 ...            ...            ...
1111 1111            1020th block        1021st block   1022nd block   1023rd block

Each block has a 6-bit byte offset (00 0000 to 11 1111). Each index value selects one group of 256 blocks.

Blocks with the same index No. map to the same group of the cache. Since the tag has 8 bits, 256 blocks go to one group of 4 cache lines.
(If a combination of block address bits is taken as index bits: ???)

Set associative cache (1-way set associative cache): direct mapped
The CPU address is interpreted as:
Address format: 6-bit block address | 4-bit index | Offset
Of the 10-bit block No., a 6-bit tag and a 4-bit index are used to identify the block holding the requested byte:
Address format: 6-bit TAG | 4-bit index | Offset
The cache lines are placed in 2^4 groups, and one of these groups is selected by a decoder.
The index bits are used to select one of the 2^4 groups. This is implemented through a decoder (k index bits select 1 of 2^k groups).
Logical view of the main memory (the 4 LSBs of the 10-bit block No. are used as index bits):

Index bits (A9-6)   Tag 0000 00   Tag 0000 01   ...   Tag 1111 11
0000                0th block     16th block    ...   1008th block
0001                1st block     17th block    ...   1009th block
0010                2nd block     18th block    ...   1010th block
0011                3rd block     19th block    ...   1011th block
0100                4th block     20th block    ...   1012th block
...                 ...           ...           ...   ...
1011                11th block    27th block    ...   1019th block
1100                12th block    28th block    ...   1020th block
1101                13th block    29th block    ...   1021st block
1110                14th block    30th block    ...   1022nd block
1111                15th block    31st block    ...   1023rd block

Each block has a 6-bit byte offset (00 0000 to 11 1111). Each index value selects one group of 64 blocks.

Blocks with the same index No. map to one group of the cache. Since the tag has 6 bits, 64 blocks go to one group of 1 cache line.
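
The grouping of main memory blocks by index can be enumerated directly. This sketch assumes the 10-bit block number of this section and an index taken from its least significant bits; the function name `blocks_in_group` is hypothetical.

```python
def blocks_in_group(index, index_bits, block_bits=10):
    """List the block numbers whose low index_bits equal the given index."""
    step = 1 << index_bits                       # number of groups
    return list(range(index, 1 << block_bits, step))

eight_way = blocks_in_group(0, index_bits=1)
print(eight_way[:4], "...", eight_way[-2:], len(eight_way))
direct = blocks_in_group(0, index_bits=4)
print(direct[:3], "...", direct[-1], len(direct))
```

Group 0 of the 8-way cache holds the 512 even-numbered blocks 0, 2, ..., 1022, and index 0000 of the direct-mapped cache holds the 64 blocks 0, 16, ..., 1008.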

Cache Latency
The time to return the requested data
(assuming a fully associative cache).
T1: time to access the tag array
T2: time to perform the tag comparison
T3: time to access the cache data array
T4: time to return the selected data or report a miss
T1 + T2 and T3 proceed simultaneously.
So the latency is T1 + T2 + T4 or T3 + T4, whichever is greater.
So a hit and a miss both take the same time.
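
The latency rule can be written as a one-line function. This is a minimal sketch; the function name `cache_latency` and the sample timings are hypothetical values chosen only to show which path dominates.

```python
def cache_latency(t1, t2, t3, t4):
    """Tag path (t1 + t2) and data path (t3) run in parallel; t4 follows both."""
    return max(t1 + t2, t3) + t4

print(cache_latency(1, 1, 3, 1))   # data array access dominates
print(cache_latency(2, 2, 3, 1))   # tag access plus comparison dominates
```

In the first call the data array access is the slower path (3 + 1 = 4 time units); in the second the tag access and comparison are slower (2 + 2 + 1 = 5 time units).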


cache 3
Memory access hierarchy

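[Figure: flowchart of a CPU request passing through LEVEL I, LEVEL II, LEVEL III and LEVEL IV. Each level has HIT and MISS outcomes; a miss passes the request on to the next level. Update steps are labelled Update LEVEL I, Update LEVEL II and Update LEVEL III, and the final hit is marked "At level n". The steps in the figure are numbered 0 to 8.]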
