Mapping of CACHE
Main memory: 256 B, 8 address bits for byte access, block size 8 B, 5 address bits for block access, 32 blocks in total
Cache memory: 128 B, cache line size 8 B, 16 cache lines
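The tag, set and offset widths used in the mappings that follow can be derived from these sizes. A minimal Python sketch, assuming the usual layout (tag in the high bits, set index in the middle, byte offset in the low bits); the function name `field_widths` is illustrative and not part of the notes:

```python
def field_widths(mem_bytes, cache_bytes, line_bytes, ways):
    """Return (tag_bits, index_bits, offset_bits) for a set-associative cache.

    All sizes are assumed to be powers of two.
    """
    address_bits = (mem_bytes - 1).bit_length()   # 256 B memory -> 8 bits
    offset_bits = (line_bytes - 1).bit_length()   # 8 B line     -> 3 bits
    lines = cache_bytes // line_bytes             # 128 B / 8 B  -> 16 lines
    sets = lines // ways                          # lines grouped into sets
    index_bits = (sets - 1).bit_length()          # a single set needs 0 bits
    tag_bits = address_bits - index_bits - offset_bits
    return tag_bits, index_bits, offset_bits


# 16-way (fully associative) down to 1-way (direct mapped)
for ways in (16, 8, 4, 2, 1):
    print(ways, field_widths(256, 128, 8, ways))
```

For the 256 B memory and 128 B cache above this gives tag/set/offset widths of 5/0/3, 4/1/3, 3/2/3, 2/3/3 and 1/4/3 bits for 16, 8, 4, 2 and 1 ways.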
Mapping of CPU generated logical address in given cache
Tag 5-bit
Offset-3 bit
Fully associative (16-way set associative) , one set only
CPU generated
0-bit
Address (hex)
Tag
Valid bit
Cache line
Index
Set 0
0
Set 0
0
48
0 1001
1
Block 9
0
0
F4
1 1110
1
Block 30
0
0
0
D3
1 1010
1
Block 26
DC
1 1011
1
Block 27
06
0 0000
1
Block 0
0
0
E2
1 1100
1
Block 28
1A
0 0011
1
Block 3
Byte addresses within Block 0 (5-bit tag, 3-bit offset): 00000 000, 00000 001, 00000 010, 00000 011, 00000 100, 00000 101, 00000 110, 00000 111
Main memory
Block      5-bit tag value   Block address (hex)
Block 0    0 0000            00 (00-07)
Block 1    0 0001            08 (08-0F)
Block 2    0 0010            10 (10-17)
Block 3    0 0011            18 (18-1F)
Block 4    0 0100            20 (20-27)
Block 5    0 0101            28 (28-2F)
Block 6    0 0110            30 (30-37)
Block 7    0 0111            38 (38-3F)
Block 8    0 1000            40 (40-47)
Block 9    0 1001            48 (48-4F)
Block 10   0 1010            50 (50-57)
Block 11   0 1011            58 (58-5F)
Block 12   0 1100            60 (60-67)
Block 13   0 1101            68 (68-6F)
Block 14   0 1110            70 (70-77)
Block 15   0 1111            78 (78-7F)
Block 16   1 0000            80 (80-87)
Block 17   1 0001            88 (88-8F)
Block 18   1 0010            90 (90-97)
Block 19   1 0011            98 (98-9F)
Block 20   1 0100            A0 (A0-A7)
Block 21   1 0101            A8 (A8-AF)
Block 22   1 0110            B0 (B0-B7)
Block 23   1 0111            B8 (B8-BF)
Block 24   1 1000            C0 (C0-C7)
Block 25   1 1001            C8 (C8-CF)
Block 26   1 1010            D0 (D0-D7)
Block 27   1 1011            D8 (D8-DF)
Block 28   1 1100            E0 (E0-E7)
Block 29   1 1101            E8 (E8-EF)
Block 30   1 1110            F0 (F0-F7)
Block 31   1 1111            F8 (F8-FF)
cache 1
Example: the valid bits of all cache lines are 0 (empty cache)
CPU generates the following addresses: 48, F4, 06, E2, F4, DC, 1A, DC, E2
As the cache is fully associative, any block can go anywhere in the cache.
The first references to 48, F4, 06, E2, DC and 1A are six compulsory misses; each block is brought into the cache as a result of the miss and its valid bit is set.
The repeated references to F4, DC and E2 are hits, as the block is already in the cache.
The 9 references therefore result in 3 hits and 6 misses.
Example: the cache already holds the memory blocks shown
CPU generates the following addresses: 48, F4, 06, E2, F4, DC, 1A, DC, E2
As the cache is fully associative, any block can go anywhere in the cache.
All the requested blocks are present, which results in 9 hits.
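Both fully associative examples can be checked with a short simulation. This is a sketch under stated assumptions: the function name and the `preloaded` parameter are illustrative, and no replacement policy is modelled because the trace touches fewer distinct blocks than the 16 lines available.

```python
def simulate_fully_associative(trace, num_lines=16, block_size=8, preloaded=()):
    """Count (hits, misses) for a trace of byte addresses.

    Assumes the cache never fills up, so no replacement policy is needed.
    """
    cached_blocks = set(preloaded)        # blocks whose valid bit is 1
    hits = misses = 0
    for address in trace:
        block = address // block_size     # drop the 3 offset bits -> 5-bit tag
        if block in cached_blocks:
            hits += 1
        else:
            misses += 1
            if len(cached_blocks) >= num_lines:
                raise RuntimeError("cache full: a replacement policy is needed")
            cached_blocks.add(block)      # fill a free line and set its valid bit
    return hits, misses


trace = [0x48, 0xF4, 0x06, 0xE2, 0xF4, 0xDC, 0x1A, 0xDC, 0xE2]
print(simulate_fully_associative(trace))                                   # empty cache
print(simulate_fully_associative(trace, preloaded={9, 30, 26, 27, 0, 28, 3}))
```

Starting from an empty cache the trace gives 3 hits and 6 misses; with blocks 9, 30, 26, 27, 0, 28 and 3 already cached, all 9 references hit.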
Set associative (8-way set associative) , two sets
CPU generated
Address (hex)
Tag
Set 0
77
0 111
47
0 100
03
F3
E2
0 000
1 111
1 110
D3
DC
18
1 101
1 101
0 001
7A
99
0 111
1 001
Set 1
Tag 4-bit
Valid bit
0
1
1
0
0
1
1
1
0
1
1
1
0
0
1
1
Set-1 bit
Cache line
Offset-3 bit
1-bit
Index
Set 0
0
Block 14
Block 8
Block 0
Block 30
Block 28
Block 26
Block 27
Block 3
Block 15
Block 19
Set 1
Main memory
Block 0
Block 2
Block 4
Block 6
Block 8
Block 10
Block 12
Block 14
Block 16
Block 18
Block 20
Block 22
Block 24
Block 26
Block 28
Block 30
4-bit
Tag value
0 000
0 001
0 010
0 011
0 100
0 101
0 110
0 111
1 000
1 001
1 010
1 011
1 100
1 101
1 110
1 111
Block
Address
00
10
20
30
40
50
60
70
80
90
A0
B0
C0
D0
E0
F0
Block 1
Block 3
Block 5
Block 7
Block 9
Block 11
Block 13
Block 15
Block 17
Block 19
Block 21
Block 23
Block 25
Block 27
Block 29
Block 31
0 000
0 001
0 010
0 011
0 100
0 101
0 110
0 111
1 000
1 001
1 010
1 011
1 100
1 101
1 110
1 111
08
18
28
38
48
58
68
78
88
98
A8
B8
C8
D8
E8
F8
Set associative (4-way set associative) , four sets
CPU generated
Address (hex)
Tag
Set 0
23
0 01
83
1 00
03
0 00
Tag 3-bit
Valid bit
0
1
1
1
Set 1
4B
E9
Set 2
F2
30
B1
1 11
0 01
1 01
79
DC
BE
0 11
1 10
1 01
Cache line
Set 3
TAG
3
Decoder to select a
SET
4 comparators and
4 AND gates
Miss
either TAG not matched or
valid bit 0
5
2-bit
Index
Set 0 0 0
Block 9
Block 29
0
1
1
1
Block 30
Block 6
Block 22
0
1
1
1
Block 15
Block 27
Block 23
index
Offset-3 bit
Block 4
Block 16
Block 0
0
1
1
0
0 10
1 11
Set-2 bit
Logic to
select
requested
cache line
Word
offset
4
HIT
Block 1
Block 5
Block 9
Block 13
Block 17
Block 21
Block 25
Block 29
0 00
0 01
0 10
0 11
1 00
1 01
1 10
1 11
08
28
48
68
88
A8
C8
E8
Set 2
10
Block 2
Block 6
Block 10
Block 14
Block 18
Block 22
Block 26
Block 30
0 00
0 01
0 10
0 11
1 00
1 01
1 10
1 11
10
30
50
70
90
B0
D0
F0
Set 3
11
Block 3
Block 7
Block 11
Block 15
Block 19
Block 23
Block 27
Block 31
0 00
0 01
0 10
0 11
1 00
1 01
1 10
1 11
18
38
58
78
98
B8
D8
F8
Decoder to select a
word
Block
Address
00
20
40
60
80
A0
C0
E0
01
Word to CPU
3-bit
Tag value
0 00
0 01
0 10
0 11
1 00
1 01
1 10
1 11
Set 1
Decoder to select a
SET
Main memory
Block 0
Block 4
Block 8
Block 12
Block 16
Block 20
Block 24
Block 28
Working
1. The CPU generates a logical address. This is interpreted as tag, index and word offset as per the cache organization in hardware.
2. The index bits are applied to a decoder to select one set of tags and the corresponding valid bits.
3. The tags of the selected set are compared simultaneously with the input tag, provided the corresponding valid bit is set.
If the requested block is not in the cache:
4. There is no match between the requested tag and any stored tag. This is known as a miss.
5. In case of a miss, the cache controller reads the requested block from the next level of memory and places the new block either in a cache line whose valid bit is 0 or in a target cache line chosen by one of the replacement methods.
Else (the requested block is present in the cache):
4. The comparison results in a match. This is known as a hit.
5. The corresponding cache line, along with the hit signal and the word offset bits, is applied to a decoder.
6. The word offset selects the desired word from the cache line.
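The steps above can be sketched in software. The class below is an illustrative model, not the hardware described in the notes: the class and method names are hypothetical, LRU is assumed as the replacement method, and a tag being present in a set's list stands in for a valid bit of 1.

```python
class SetAssociativeCache:
    """Toy model of a set-associative cache that tracks tags only (no data)."""

    def __init__(self, num_sets, ways, block_size):
        self.num_sets, self.ways, self.block_size = num_sets, ways, block_size
        # One list of tags per set, least recently used first.
        # A tag in the list means "valid bit = 1" for that line.
        self.sets = [[] for _ in range(num_sets)]

    def access(self, address):
        """Return True on a hit, False on a miss (the block is then loaded)."""
        block = address // self.block_size     # step 1: strip the word offset
        index = block % self.num_sets          # step 2: index selects one set
        tag = block // self.num_sets
        lines = self.sets[index]
        if tag in lines:                       # step 3: compare with valid tags
            lines.remove(tag)
            lines.append(tag)                  # hit: mark as most recently used
            return True
        if len(lines) == self.ways:            # miss and no line with valid bit 0
            lines.pop(0)                       # evict the least recently used tag
        lines.append(tag)                      # load the block from the next level
        return False


cache = SetAssociativeCache(num_sets=4, ways=4, block_size=8)
print(cache.access(0x4B), cache.access(0x4B))  # first access misses, second hits
```

In hardware the tag comparisons of step 3 happen in parallel, one comparator per way; the sequential `in` test here only models the outcome.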
Set associative (2-way set associative) , eight sets
Tag 2-bit
CPU generated
Address (hex)
Tag
Valid bit
03
00
Set 0
1
C7
11
1
88
10
14
00
DA
11
Set-3 bit
Cache line
Block 0
Block 24
Set 1
1
0
Set 2
0
1
Block 2
Block 27
Set 3
Block 17
65
27
01
00
Set 4
1
1
Block 12
Block 4
AB
10
Set 5
1
0
Block 21
Set 6
B2
10
0
1
Block 22
FF
BC
11
10
1
1
Block 31
Block 23
Set 7
Offset-3 bit
3-bit
Index Main memory
Set 0 0 0 0 Block 0
Block 8
Block 16
Block 24
2-bit
Tag value
00
01
10
11
Block
Address
00
40
80
C0
Set 1 0 0 1 Block 1
Block 9
Block 17
Block 25
00
01
10
11
08
48
88
C8
Set 2 0 1 0 Block 2
Block 10
Block 18
Block 26
00
01
10
11
10
50
90
D0
Set 3 0 1 1 Block 3
Block 11
Block 19
Block 27
00
01
10
11
18
58
98
D8
Set 4 1 0 0 Block 4
Block 12
Block 20
Block 28
00
01
10
11
20
60
A0
E0
Set 5 1 0 1 Block 5
Block 13
Block 21
Block 29
00
01
10
11
28
68
A8
E8
Set 6 1 1 0 Block 6
Block 14
Block 22
Block 30
00
01
10
11
30
70
B0
F0
Set 7 1 1 1 Block 7
Block 15
Block 23
Block 31
00
01
10
11
38
78
B8
F8
Example: A purely sequential programme occupies address space 80-CB in main memory
(assume each instruction takes one address, i.e. one byte, and the cache line size is 8 bytes)
CPU executes the programme once (each instruction executed once in sequence).
Calculate the number of hits and misses, assuming the cache is initially empty.
- It is a 2-way set associative cache: two blocks can occupy each set at a time.
- Addresses 80h to CBh are 76 references spread over blocks 16 to 25 (ten blocks).
First instruction, address 80h: compulsory miss, loads block 16 from MM into set 0 of the cache.
Next 7 addresses (81h to 87h) are hits.
Address 88h: compulsory miss, loads block 17 from MM into set 1 of the cache (in an empty cache line).
Next 7 addresses (89h to 8Fh) are hits.
Blocks 18 to 23 (addresses 90h to BFh) behave the same way in sets 2 to 7: 1 compulsory miss and 7 hits each.
Address C0h: compulsory miss, loads block 24 from MM into set 0 of the cache. Block 16 already occupies one line of set 0, so block 24 goes into the other line; nothing is overwritten.
Next 7 addresses (C1h to C7h) are hits.
Address C8h: compulsory miss, loads block 25 from MM into the free line of set 1.
Next 3 addresses (C9h to CBh) are hits.
Total references 76: 10 cause misses and 66 are hits.
Blocks 16 to 24: Miss 1, Hit 7 each (9 blocks: Miss 9, Hit 63)
Block 25: Miss 1, Hit 3
Total: Miss 10, Hit 66
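A short simulation can be used to check the hit and miss count for a sequential run through addresses 80h to CBh. This is a sketch that assumes an initially empty cache and LRU replacement; the function name `run_trace` is illustrative.

```python
from collections import OrderedDict


def run_trace(trace, num_sets=8, ways=2, block_size=8):
    """Return (hits, misses) for an initially empty LRU set-associative cache."""
    sets = [OrderedDict() for _ in range(num_sets)]   # tag -> None, LRU first
    hits = misses = 0
    for address in trace:
        block = address // block_size
        index, tag = block % num_sets, block // num_sets
        lines = sets[index]
        if tag in lines:
            hits += 1
            lines.move_to_end(tag)            # refresh LRU order on a hit
        else:
            misses += 1
            if len(lines) == ways:
                lines.popitem(last=False)     # evict the least recently used line
            lines[tag] = None
    return hits, misses


program = range(0x80, 0xCB + 1)               # 76 one-byte instructions
print(run_trace(program))
```

Under these assumptions the 76 references produce 10 misses (one per block, blocks 16 to 25) and 66 hits, and no line is evicted because each set receives at most two blocks.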
Direct mapped (1-way set associative) , sixteen sets
Set 4-bit
CPU generated
Address (hex)
Tag
Valid bit
03
0
Set 0
1
Tag-1 bit
Cache line
Block 0
19
Set 1
Block 3
23
Set 2
Block 4
Set 3
Set 4
58
Set 5
Block 11
6B
Set 6
Block 13
Set 7
89
Set 8
Block 17
9A
Set 9
Block 19
AB
Set 10
Block 21
Set 11
Set 12
Set 13
Set 14
Set 15
D3
F3
Offset-3 bit
4-bit
Index Main memory
Set 0 0 0 0 0 Block 0
Block 1
1-bit
Tag value
0
1
Block
Address
00
08
Set 1 0 0 0 1 Block 2
Block 3
0
1
10
18
Set 2 0 0 1 0 Block 4
Block 5
0
1
20
28
Set 3 0 0 1 1 Block 6
Block 7
0
1
30
38
Set 4 0 1 0 0 Block 8
Block 9
0
1
40
48
Set 5 0 1 0 1 Block 10
Block 11
0
1
50
58
Set 6 0 1 1 0 Block 12
Block 13
0
1
60
68
Set 7 0 1 1 1 Block 14
Block 15
0
1
70
78
Set 8 1 0 0 0 Block 16
Block 17
0
1
80
88
Set 9 1 0 0 1 Block 18
Block 19
0
1
90
98
Set 10 1 0 1 0 Block 20
Block 21
0
1
A0
A8
Set 11 1 0 1 1 Block 22
Block 23
0
1
B0
B8
Set 12 1 1 0 0 Block 24
Block 25
0
1
C0
C8
Set 13 1 1 0 1 Block 26
Block 27
0
1
D0
D8
Block 26
Block 30
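Here the index bits and the tag are interchanged: the 4 index bits are the high-order address bits (A7-4) and the 1-bit tag is A3. The logical interpretation also changes: consecutive blocks (0 and 1, 2 and 3, ...) now share a set.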
Please refer class notes. Spring 2009
Set 14 1 1 1 0 Block 28
Block 29
0
1
E0
E8
Set 15 1 1 1 1 Block 30
Block 31
0
1
F0
F8
cache 2
Cache memory
A portion of main memory is copied to a faster memory that is closer to the processor.
Main memory of 64 KB and cache of 1 KB
CPU generates an effective address (logical address), say 16-bit.
If the block size (cache line) is 64 B, then log2(64) = 6 bits are required to select one of the 64 bytes in the block (cache line); these 6 bits are the offset bits.
The 16-bit byte address therefore splits into a 10-bit block address and a 6-bit byte offset within the block (A5-0).
[Figure: main memory drawn as blocks 0 to 1023 (block addresses 00 0000 0000 to 11 1111 1111), each with byte offsets 00 0000 to 11 1111, next to the cache drawn as cache lines 0 to 15 (line numbers 0000 to 1111).]
Fully associative cache (16-way set associative cache)
CPU address is interpreted as
10-bit block address, 6-bit offset
The 10-bit block address is termed the tag; it identifies the block holding the requested byte.
10-bit TAG, 6-bit offset
Any 16 blocks of main memory can be stored in the cache.
Any block can be stored in any cache line.
As the TAG differentiates block IDs, the tag of each block is also stored in the tag line of the respective cache line.
[Figure: 16 cache lines (line numbers 0000 to 1111), each with a 10-bit tag line, byte offsets 00 0000 to 11 1111 within the cache line, and a comparator producing hit (H) or miss (M).]
At most one comparator can generate HIT: the one on the cache line where the block holding the requested byte is mapped.
If none of the comparators generates HIT, the block holding the requested byte has not been read from main memory into the cache.
Valid bit and dirty bit are not included in this example.
There are 16 cache lines where a block can be mapped without any restriction. There are 16 ways a block can be placed in the cache: a 16-way set associative cache.
Set associative cache (8-way set associative cache)
CPU address is interpreted as
9-bit block address, 1-bit index, 6-bit offset
Of the 10-bit block number, a 9-bit tag and a 1-bit index are used to identify the block holding the requested byte.
9-bit TAG (A15-7), 1-bit index (A6), 6-bit offset (A5-0)
Cache lines are placed in 2^1 groups, and one of these groups is selected by a decoder.
The index bits are used to select one of the 2^1 groups. This is implemented through a decoder with k index inputs and 2^k outputs (here k = 1).
Logical view of the main memory (the LSB of the 10-bit block number is used as the index bit)
[Figure: main memory split by index bit A6 into group 0 (blocks 0, 2, ..., 1020, 1022) and group 1 (blocks 1, 3, ..., 1021, 1023); within a group, blocks are numbered by the 9-bit tag 0 0000 0000 to 1 1111 1111 (A15-7), each with byte offsets 00 0000 to 11 1111 (A5-0).]
Block numbers in group 0 (index bit 0)
0, 2, 4, 6, ..., 1020, 1022
Blocks with the same index map to the same group of the cache. Since the tag is 9 bits, 512 blocks go to one group of 8 cache lines.
Set associative cache (4-way set associative cache)
CPU address is interpreted as
8-bit block address, 2-bit index, 6-bit offset
Of the 10-bit block number, an 8-bit tag and a 2-bit index are used to identify the block holding the requested byte.
8-bit TAG, 2-bit index (A7-6), 6-bit offset
Cache lines are placed in 2^2 groups, and one of these groups is selected by a decoder.
The index bits are used to select one of the 2^2 groups. This is implemented through a decoder with k index inputs and 2^k outputs (here k = 2).
Logical view of the main memory (the two LSBs of the 10-bit block number are used as the index bits)
[Figure: main memory split by index bits A7-6 into four groups: index 00 holds blocks 0, 4, ..., 1020; index 01 holds blocks 1, 5, ..., 1021; index 10 holds blocks 2, 6, ..., 1022; index 11 holds blocks 3, 7, ..., 1023. Within a group, blocks are numbered by the 8-bit tag 0000 0000 to 1111 1111, each with a 6-bit byte offset 00 0000 to 11 1111.]
Blocks with the same index map to the same group of the cache. Since the tag is 8 bits, 256 blocks go to one group of 4 cache lines.
(What if a combination of block address bits is taken as the index bits?)
[Figure: logical view of the main memory with one cache line per group (direct mapped). A 4-bit index, 0000 to 1111, splits the blocks into 16 groups: group 0 holds blocks 0, 16, ..., 1008; group 1 holds blocks 1, 17, ..., 1009; and so on up to group 15 with blocks 15, 31, ..., 1023. Each block has a 6-bit byte offset 00 0000 to 11 1111.]
Blocks with the same index map to one group of the cache. Since the tag is 6 bits, 64 blocks go to one group of 1 cache line.
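The tag and index splits for the 64 KB memory, 1 KB cache and 64 B lines can be tabulated with a few lines of Python. This is a sketch that assumes the index is taken from the low bits of the 10-bit block number; the function name `split_address` and the sample address 0xFFC1 are illustrative.

```python
def split_address(address, index_bits, offset_bits=6):
    """Split a 16-bit byte address into (tag, index, offset)."""
    offset = address & ((1 << offset_bits) - 1)   # A5-0: byte within the block
    block = address >> offset_bits                # 10-bit block number
    index = block & ((1 << index_bits) - 1)       # low bits of the block number
    tag = block >> index_bits                     # remaining high bits
    return tag, index, offset


# index_bits: 0 = fully associative, 1 = 8-way, 2 = 4-way, 3 = 2-way, 4 = direct mapped
for index_bits in range(5):
    tag_bits = 10 - index_bits
    blocks_per_group = 2 ** tag_bits              # blocks competing for one group
    print(index_bits, tag_bits, blocks_per_group, split_address(0xFFC1, index_bits))
```

The blocks-per-group column comes out as 1024, 512, 256, 128 and 64. Address 0xFFC1 lies in block 1023 at byte offset 1, so it always falls in the highest-numbered group.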
Cache Latency
Time to return the requested data.
(assuming fully associative)
T1: time to access the tag array
T2: time to perform the tag comparison
T3: time to access the cache data array
T4: time to return the selected data or report a miss
T1 + T2 and T3 proceed simultaneously.
So the latency is T1 + T2 + T4 or T3 + T4, whichever is greater, i.e. max(T1 + T2, T3) + T4.
So a hit and a miss both take the same time.
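The latency rule can be written as a one-line function. This is a sketch; the function name and the timing values are made up for illustration and do not come from the notes.

```python
def cache_latency(t1, t2, t3, t4):
    """Latency when the tag path (t1 + t2) overlaps the data-array access (t3)."""
    return max(t1 + t2, t3) + t4


# Hypothetical timings in nanoseconds: the data array is the slower path here.
print(cache_latency(1.0, 0.5, 2.0, 0.5))   # 2.5
```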
cache 3
Memory access hierarchy
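[Figure: memory access hierarchy flowchart. The CPU request goes to LEVEL I; on a HIT the data is returned, on a MISS the request passes to LEVEL II, then LEVEL III, then LEVEL IV. After a HIT at level n, the levels above it are updated (Update LEVEL I, Update LEVEL II).]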