Professional Documents
Culture Documents
Software Systems
N=8
Address 0 1 2 3 4 5 6 7
Contents V[0] V[1] V[2] V[3] V[4] V[5] V[6] V[7]
Access Order 1 2 3 4 5 6 7 8
Stride k – Byte Addressable
memory and word
reference pattern length is 1 byte
i Address i Address
0 0000 0 0000
Stride 1
1 0001 1 0001
2 0002 2 0002 Stride 2
3 0003 3 0003
4 0004 4 0004
Address difference
5 0005 5 0005
Stride=----------------------
6 0006 Word Length 6 0006
7 0007 7 0007
8 0008 8 0008
9 0009 9 0009
10 000A 10 000A
11 000B 11 000B
12
7 000C 12 000C
Stride k – reference pattern
Byte Addressable
memory and word
length is 2 bytes
i Address i Address
0 0000 0 0000
Stride 1
1 0002 1 0002 Stride 2
2 0004 2 0004
3 0006 3 0006
4 0008 4 0008
Address difference
5 000A 5 000A
Stride=----------------------
6 000C Word Length 6 000C
7 000E 7 000E
8 0010 8 0010
9 0012 9 0012
10 0014 10 0014
11 0016 11 0016
12
8 0018 12 0018
Revisiting Locality of reference
int i, j, sum = 0; L1 W5
W6
for (i = 0; i < 4; i++)
W7
for (j = 0; j < 4; j++) W8
sum += a[i][j]; A[i][j] J=0 J=1 J=2 J=3 W9
return sum; i=0 W10
} W11
i=1
W12
i=2 W13
W14
I=3
W15
Example 2
int sumarraycols(int a[4][4])
{
int i, j, sum = 0;
for (j = 0; j < 4; j++)
for (i = 0; i < 4; i++)
sum += a[i][j];
return sum;
}
Assumption:
• The cache has a block size of 4 words each, 2 cache lines
• Word size 4 bytes.
• C stores arrays in row-major order
W0
Example 2(Contd..) W1
W2
int sum_array(int a[4][4])
W3
{ W4
L0
int i, j, sum = 0; L1 W5
Program 1: Program 2:
for (int i = 0; i < n; i++) { for (int i = 0; i < n; i++) {
z[i] = x[i] – y[i]; z[i] = x[i] – y[i];
}
z[i] = z[i] * z[i];
for (int i = 0; i < n; i++) {
}
z[i] = z[i] * z[i];
}
Today’s Session