
Memory Interleaving

&
Performance Evaluation

Submitted by:

Sumit Chansoliya - 804167
Sumit Kumar - 804168
Sumit Singh - 804169
Swapnasshu - 804175
Introduction.
 Interleaving is an advanced technique used by high-end
motherboards/chipsets to improve memory performance.

 Memory interleaving increases bandwidth by allowing simultaneous


access to more than one chunk of memory. This improves performance
because the processor can transfer more information to/from memory in
the same amount of time, and helps alleviate the processor-memory
bottleneck that is a major limiting factor in overall performance.

 Interleaving works by dividing the system memory into multiple blocks.
Cont…
 Each block of memory is accessed using
different sets of control lines, which are
merged together on the memory bus.
 When a read or write is begun to one block, a
read or write to other blocks can be overlapped
with the first one. The more blocks, the more
that overlapping can be done.
Cont…
 As an analogy, consider eating a plate of food
with a fork. Two-way interleaving would mean
dividing the food onto two plates and eating
with both hands, using two forks. (Four-way
interleaving would require two more hands. :^)
) Remember that here the processor is doing
the "eating" and it is much faster than the forks
(memory) "feeding" it (unlike a person whose
hands are generally faster.)
Cont…
 In order to get the best performance from this type of
memory system, consecutive memory addresses are
spread over the different blocks of memory. In other
words, if you have 4 blocks of interleaved memory, the
system doesn't fill the first block, and then the second
and so on. It uses all 4 blocks, spreading the memory
around so that the interleaving can be exploited.
 Interleaving is an advanced technique that is not
generally supported by most PC motherboards, most
likely due to cost. It is most helpful on high-end
systems, especially servers, that have to process a great
deal of information quickly. The Intel Orion chipset is one chipset that
supports interleaving.
 However the memory space is split up among the banks, as long as
requests are sent to two different banks they can be handled
simultaneously. The processor can request a transfer from one location,
call it X, on one cycle, and on the next cycle request information from
another location Y.
Cont…
 If X and Y are in different banks, the information will be returned on
successive cycles. Note that the latency of the request, i.e. the number
of cycles the processor has to wait before receiving the contents of
location X, is not affected. However, the bandwidth is improved: if
there are enough banks, the memory system can potentially send
information at a rate of one word per processor cycle, regardless of
what the memory cycle time is.
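The bandwidth effect described above can be sketched in a short simulation. This is an illustrative model only (the function name and the one-request-per-bus-cycle assumption are mine, not from the text): m banks, each taking several bus cycles per access, with consecutive addresses spread across the banks.

```python
# Minimal sketch: streaming reads over interleaved banks vs. a single bank.
def cycles_to_stream(n_words, m_banks, bank_cycle_time):
    """Bus cycles to read addresses 0..n_words-1 with low-order interleaving."""
    free_at = [0] * m_banks          # bus cycle at which each bank is next free
    cycle = 0                        # current bus cycle
    for addr in range(n_words):
        bank = addr % m_banks        # low-order bits pick the bank
        start = max(cycle, free_at[bank])   # wait if the bank is still busy
        free_at[bank] = start + bank_cycle_time
        cycle = start + 1            # at most one request issued per bus cycle
    return max(free_at)              # cycle when the last word arrives

# 4 banks with a 4-cycle access time stream 16 words in 19 cycles
# (about one word per cycle); a single identical bank needs 16 * 4 = 64.
assert cycles_to_stream(16, 4, 4) == 19
assert cycles_to_stream(16, 1, 4) == 64
```

Latency per word is unchanged (4 cycles either way); only the rate at which words arrive improves, matching the point above.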
Segmentation.
 Segmentation is a technique that requires all of a program's code and
data to be resident in RAM at run-time. For a given system, this limits
the number of programs that can be run simultaneously. Segment sizes can
differ from program to program, which means the operating system must
dedicate considerable time to managing the memory system.
 The most common problem associated with segmented
memory is fragmentation. This occurs when running
programs release their segmented space, but this space is
spread out over the entire address range. Thus, there could be
1Mb of free RAM, but it consists of many small blocks
scattered over a 4Mb address range.
Cont…
 A program requiring 500k to run could not be loaded,
as segmentation requires the memory to be contiguous
(one large block). In this instance, the program could
not run even though there is sufficient memory.

 To overcome this drawback, the operating system


employs a technique called compaction, which
involves relocating existing segments so as to combine
all the small free blocks into larger blocks, enabling
waiting programs to be run.
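The compaction step above can be sketched as follows. This is a minimal model under stated assumptions: segments are simple (name, base, size) records in a flat address space, and the segment names and sizes are hypothetical.

```python
# Sketch: slide every live segment down toward address 0, merging the
# scattered free blocks into one contiguous region at the top of memory.
def compact(segments, total_memory):
    """Relocate segments to be contiguous from address 0."""
    new_map = []
    next_base = 0
    for name, _old_base, size in sorted(segments, key=lambda s: s[1]):
        new_map.append((name, next_base, size))   # segment copied to new base
        next_base += size
    free = (next_base, total_memory - next_base)  # one large free block remains
    return new_map, free

# Fragmented 2048-word memory: 1048 words are free in total, but the
# largest single hole is only 400 words, so a 500-word program cannot load.
segs = [("A", 0, 200), ("B", 600, 300), ("C", 1200, 500)]
compacted, free = compact(segs, 2048)
assert free == (1000, 1048)   # after compaction, the 500-word program fits
```

Real compaction also has to update every base register or segment table entry to the new base, which is why it is expensive.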
Performance.
 With large memories, many memory chips must be assembled together to
make one memory system. One issue to be addressed is interleaving.

 Suppose the memory is broken up into m physically separate components
called modules,
 with m chosen so that the access time of a module is m bus cycles.

 To illustrate the role of interleaving, suppose we wish to set up a memory


system of 256M words, consisting of four modules of 64M words each.
Denote our memory modules by M0, M1, M2 and M3. Since 256M is equal to
2^28, our system bus would have address lines A0-A27. Since 64M is equal
to 2^26, each memory module would have address pins A0-A25.
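The bit counts above follow directly from powers of two and can be checked mechanically:

```python
# Address-line arithmetic for the 256M-word, four-module example.
total_words  = 256 * 2**20    # 256M words
module_words = 64 * 2**20     # 64M words per module

assert total_words == 2**28               # bus address lines A0-A27 (28 lines)
assert module_words == 2**26              # module address pins A0-A25 (26 pins)
assert total_words // module_words == 4   # the remaining 2 bits select the module
```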
There are two types of interleaving:

1. High order interleaving


2. Low order interleaving
HIGH ORDER INTERLEAVING:-Arguably the most “natural” arrangement
would be to use bus lines A26-A27 as the module determiner. In other words, we
would feed these two lines into a 2-to-4 decoder, the outputs of which would be
connected to the Chip Select pins of the four memory modules.

If we were to do this, the physical placement of our system addresses would be


as follows:
ADDRESS MODULE
0-64M 0
64-128M 1
128-192M 2
192-256M 3
Note that this means consecutive addresses are stored within the same module,
except at the boundary. The above arrangement is called high-order interleaving,
because it uses the high-order, i.e. most significant, bits of the address to
determine which module the word is stored in.
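A minimal sketch of this decoding, assuming the 28-bit address space of the example above (the function name is illustrative):

```python
# High-order interleaving: the top two address bits select the module.
def high_order_decode(addr):
    module = addr >> 26              # A26-A27 feed the 2-to-4 decoder
    offset = addr & (2**26 - 1)      # A0-A25 go to the module's address pins
    return module, offset

M = 2**20                            # 1M words
assert high_order_decode(0)[0] == 0          # addresses 0-64M    -> module 0
assert high_order_decode(64 * M)[0] == 1     # addresses 64-128M  -> module 1
assert high_order_decode(200 * M)[0] == 3    # addresses 192-256M -> module 3
```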
Low-Order Interleaving:-

An alternative would be to use the low bits for that purpose. In


our example here, for instance, this would entail feeding bus lines A0-A1
into the decoder, with bus lines A2-A27 being tied to the address pins of
the memory modules. This would mean the following storage pattern:

Address Module
0 0
1 1
2 2
3 3
4 0
5 1
6 2
etc.

In other words, consecutive addresses are stored in consecutive


modules, with the understanding that this is mod 4, i.e. we wrap back to
M0 after M3.
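The same kind of sketch for the low-order scheme, again assuming the 28-bit example (illustrative function name):

```python
# Low-order interleaving: the bottom two address bits select the module.
def low_order_decode(addr):
    module = addr & 0b11             # A0-A1 feed the decoder
    offset = addr >> 2               # A2-A27 drive the module address pins
    return module, offset

# Consecutive addresses rotate through the modules mod 4, as in the table:
assert [low_order_decode(a)[0] for a in range(6)] == [0, 1, 2, 3, 0, 1]
```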
The Role of Interleaving in Shared-Memory Multiprocessors:-

High-order interleaving is useful in shared-memory multiprocessor


systems. Here the goal is to minimize the number of times two or more
processors need to use the same module at the same time, a situation which
causes delay while processors wait for each other. If the system is
configured for high-order interleaving,
we can write our application software in a way to minimize such conflicts.
In matrix applications, for instance, we can partition the matrix into blocks,
and have different processors work on different blocks.
In image processing applications, we can have different processors work on
different parts of the image. Such partitioning almost never works perfectly
—e.g. computation for one part of an image may need information from
another part—but if we are careful we can get good results.
Refinements to Low-Order Interleaving:-
Address Skewing:-
One refinement to low-order interleaving involves skewing of the distribution of
addresses to the memory modules. Let’s again use the case of four modules as an
example, with the following address assignment pattern:

Address Module
0 0
1 1
2 2
3 3
4 1
5 2
6 3
7 0
8 2
9 3
10 0
11 1
etc.
Comparing this table with the earlier, unskewed one: there, each group
of four consecutive addresses beginning at a multiple of 4 was assigned
to the modules in the order 0,1,2,3.

Cont…

Address Group Module order


0,1,2,3 0,1,2,3
4,5,6,7 0,1,2,3
8,9,10,11 0,1,2,3
12,13,14,15 0,1,2,3
etc 0,1,2,3
But in the skewed arrangement, each such group is
shifted rightward from the previous one:

Address Group Module Order


0,1,2,3 0,1,2,3
4,5,6,7 1,2,3,0
8,9,10,11 2,3,0,1
12,13,14,15 3,0,1,2
16,17,18,19 0,1,2,3
etc.
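One way to generate the skewed assignment above is to add the group number (address // 4) before reducing mod 4; this is a sketch of one common skewing formula, consistent with the table:

```python
# Skewed low-order interleaving: each group of m addresses is rotated
# one module position to the right relative to the previous group.
def skewed_module(addr, m=4):
    return (addr + addr // m) % m

assert [skewed_module(a) for a in range(4)] == [0, 1, 2, 3]
assert [skewed_module(a) for a in range(4, 8)] == [1, 2, 3, 0]
assert [skewed_module(a) for a in range(8, 12)] == [2, 3, 0, 1]
```

Skewing helps strided access: with plain low-order interleaving, addresses 0, 4, 8, 12 all land in module 0, while under this skew they land in modules 0, 1, 2, 3.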
Superinterleaving:- Up to this point, we have assumed that the number
of memory modules is equal to the access time, in bus cycles, of one
module. The concept of superinterleaving relaxes that assumption.
For instance, in our examples above with m = 4, suppose we instead use
m = 8 modules while the module access time is still 4 bus cycles.
Because of the latter constraint, we can never achieve more than one
memory access per bus cycle. However, we can reduce the number of
conflicts.
For example, with a stride of 4, we could not get any speedup at all in a
system for which m = 4, since all requested words would be in the same
module. But with m = 8, we have fewer such clashes, and thus can
keep two modules busy at the same time, doubling effective access speed.
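The stride effect can be checked with a small sketch (plain low-order interleaving assumed; the function name is illustrative):

```python
# Count the distinct modules hit by a strided access stream under
# plain low-order interleaving with m modules.
def modules_touched(stride, m, n_accesses=16):
    return len({(i * stride) % m for i in range(n_accesses)})

assert modules_touched(4, 4) == 1    # stride 4, m = 4: all requests hit one module
assert modules_touched(4, 8) == 2    # stride 4, m = 8: two modules work in parallel
assert modules_touched(1, 4) == 4    # unit stride uses every module
```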
Summary…
•Multi-threaded programming is hard
– Existing shared-memory programming model exposes too many
legal interleavings to the runtime
– Most interleavings remain untested in production code

•Interleaving constrained shared-memory multiprocessor


– Avoids untested (rare) interleavings to avoid concurrency bugs

•Predecessor Set interleaving constraints


– 15/17 concurrency bugs are avoidable
– Acceptable performance and space overhead
Thank You
