
Memory Hierarchy

The term memory hierarchy is used in the theory of computation when discussing performance
issues in computer architectural design, algorithm predictions, and lower-level programming
constructs involving locality of reference. A 'memory hierarchy' in computer storage
distinguishes each level in the 'hierarchy' by response time. Since response time, complexity, and
capacity are related,[1] the levels may also be distinguished by their controlling technology.

The many trade-offs in designing for high performance include the structure of the memory
hierarchy, i.e. the size and technology of each component. The various components can thus be
viewed as forming a hierarchy of memories (m1, m2, ..., mn) in which each member mi is in a sense
subordinate to the next higher member mi-1 of the hierarchy. To limit waiting by the higher levels, a
lower level responds by filling a buffer and then signaling to activate the transfer.
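
As a rough illustration of this layering, the sketch below (plain C; the level names, latencies, and
capacities are invented for the example, not measurements) models a lookup that tries the fastest
member of the hierarchy first and falls back to the next, slower member on a miss.

    #include <stdio.h>
    #include <stddef.h>

    /* A toy model of a hierarchy of memories (m1, m2, ..., mn): each level
     * has an access latency and a capacity, and anything not resident at a
     * level must be fetched from the next, slower level.  All numbers are
     * illustrative only. */
    struct level {
        const char *name;
        unsigned    latency_cycles;
        size_t      capacity_bytes;   /* addresses below this are "resident" here */
    };

    static const struct level hierarchy[] = {
        { "registers",   1,   256                  },
        { "L1 cache",    4,   32UL * 1024          },
        { "L2 cache",    12,  512UL * 1024         },
        { "main memory", 200, 1024UL * 1024 * 1024 },
    };

    /* Accumulate latency from the fastest level downward until some level
     * "holds" the address (modelled crudely here as addr < capacity). */
    static unsigned access_cost(size_t addr)
    {
        unsigned total = 0;
        for (size_t i = 0; i < sizeof hierarchy / sizeof hierarchy[0]; i++) {
            total += hierarchy[i].latency_cycles;
            if (addr < hierarchy[i].capacity_bytes)
                break;
        }
        return total;
    }

    int main(void)
    {
        size_t samples[] = { 16, 4096, 100UL * 1024, 10UL * 1024 * 1024 };
        for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
            printf("address %10zu -> ~%u cycles\n", samples[i], access_cost(samples[i]));
        return 0;
    }

In real hardware the faster levels are also refilled on a miss; this sketch only accounts for the
accumulated latency of walking down the hierarchy.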

There are four major storage levels.[1]

 1. Internal – Processor registers and cache.
 2. Main – the system RAM and controller cards.
 3. On-line mass storage – Secondary storage.
 4. Off-line bulk storage – Tertiary and off-line storage.

This is a general structuring of the memory hierarchy; many other structures are useful. For
example, a paging algorithm may be considered as a level for virtual memory when designing a
computer architecture.
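
To make that example concrete, the sketch below is a minimal least-recently-used (LRU)
page-replacement simulation in C; the fixed frame count and the reference string are invented for
the illustration, and real virtual-memory systems are considerably more elaborate.

    #include <stdio.h>

    #define FRAMES 3   /* physical frames available in this toy example */

    /* Minimal LRU page replacement: frames[] holds resident page numbers,
     * age[] counts how long since each frame was last used. */
    static int frames[FRAMES];
    static int age[FRAMES];
    static int page_faults = 0;

    static void reference(int page)
    {
        int victim = 0;
        for (int i = 0; i < FRAMES; i++)
            age[i]++;
        for (int i = 0; i < FRAMES; i++) {
            if (frames[i] == page) {      /* hit: page already resident */
                age[i] = 0;
                return;
            }
            if (age[i] > age[victim])     /* remember least recently used frame */
                victim = i;
        }
        frames[victim] = page;            /* miss: evict the LRU frame */
        age[victim] = 0;
        page_faults++;
    }

    int main(void)
    {
        for (int i = 0; i < FRAMES; i++) { frames[i] = -1; age[i] = 1 << 20; }
        int refs[] = { 1, 2, 3, 1, 4, 2, 5, 1, 2, 3 };
        for (unsigned i = 0; i < sizeof refs / sizeof refs[0]; i++)
            reference(refs[i]);
        printf("page faults: %d\n", page_faults);
        return 0;
    }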

Application of the concept


The memory hierarchy in most computers is:

 Processor registers – fastest possible access (usually 1 CPU cycle), only hundreds of bytes in size
 Level 1 (L1) cache – often accessed in just a few cycles, usually tens of kilobytes
 Level 2 (L2) cache – higher latency than L1 by 2× to 10×, often 512 KiB or more
 Level 3 (L3) cache – higher latency than L2, often 2048 KiB or more
 Main memory – may take hundreds of cycles, but can be multiple gigabytes; access times may not be uniform, as in the case of a NUMA machine
 Disk storage – millions of cycles of latency if not cached, but very large
 Tertiary storage – several seconds of latency, can be huge

Note that a hobbyist who reads "L1 cache" on a computer's specification sheet is reading about
the 'internal' level of the memory hierarchy.
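
A rough way to observe these levels on a real machine is to time repeated passes over buffers of
increasing size and watch the time per access step upward as the working set outgrows each cache.
The sketch below is one such microbenchmark in C; the 64-byte line size is an assumption,
clock_gettime is POSIX-specific, and the absolute numbers depend entirely on the machine and
compiler settings.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Touch every (assumed 64-byte) cache line of a buffer many times and
     * report nanoseconds per access.  As the working set outgrows L1, L2,
     * L3 and finally fits only in main memory, the time per access
     * typically steps upward.  Results are machine-dependent. */
    int main(void)
    {
        const size_t line = 64;
        volatile unsigned char sink = 0;   /* volatile keeps the loop from being deleted */

        for (size_t kib = 4; kib <= 64 * 1024; kib *= 2) {
            size_t bytes = kib * 1024;
            unsigned char *buf = malloc(bytes);
            if (!buf)
                return 1;
            for (size_t i = 0; i < bytes; i++)
                buf[i] = (unsigned char)i;

            size_t accesses = 0;
            struct timespec t0, t1;
            clock_gettime(CLOCK_MONOTONIC, &t0);
            for (int pass = 0; pass < 64; pass++)
                for (size_t i = 0; i < bytes; i += line)
                    sink += buf[i], accesses++;
            clock_gettime(CLOCK_MONOTONIC, &t1);

            double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
            printf("%8zu KiB: %.2f ns/access\n", kib, ns / accesses);
            free(buf);
        }
        return 0;
    }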

Most modern CPUs are so fast that, for most program workloads, the bottleneck is the locality of
reference of memory accesses and the efficiency of the caching and memory transfer between the
different levels of the hierarchy.[citation needed] As a result, the CPU spends much of its time idling,
waiting for memory I/O to complete. This is sometimes called the space cost, as a larger memory
object is more likely to overflow a small/fast level and require use of a larger/slower level.
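
To illustrate how much locality of reference can matter, the sketch below (plain C; the matrix size
is arbitrary) sums the same row-major matrix twice, once along rows and once along columns. The
row-order pass walks memory sequentially and typically runs several times faster than the
column-order pass, which strides across cache lines.

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N 4096   /* matrix dimension; arbitrary */

    /* Sum a row-major N x N matrix two ways.  Row order visits adjacent
     * addresses (good spatial locality); column order jumps N elements
     * between accesses and misses the cache far more often. */
    int main(void)
    {
        double *a = malloc((size_t)N * N * sizeof *a);
        if (!a)
            return 1;
        for (size_t i = 0; i < (size_t)N * N; i++)
            a[i] = 1.0;

        double sum = 0.0;
        clock_t t0 = clock();
        for (size_t i = 0; i < N; i++)           /* row-major traversal */
            for (size_t j = 0; j < N; j++)
                sum += a[i * N + j];
        clock_t t1 = clock();
        for (size_t j = 0; j < N; j++)           /* column-major traversal */
            for (size_t i = 0; i < N; i++)
                sum += a[i * N + j];
        clock_t t2 = clock();

        printf("rows: %.3f s, columns: %.3f s (sum %.0f)\n",
               (double)(t1 - t0) / CLOCKS_PER_SEC,
               (double)(t2 - t1) / CLOCKS_PER_SEC, sum);
        free(a);
        return 0;
    }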

Modern programming languages mainly assume two levels of memory, main memory and disk
storage, though in assembly language and inline assemblers in languages such as C, registers can
be directly accessed. Taking optimal advantage of the memory hierarchy requires the
cooperation of programmers, hardware, and compilers (as well as underlying support from the
operating system):

 Programmers are responsible for moving data between disk and memory through file I/O.
 Hardware is responsible for moving data between memory and caches.
 Optimizing compilers are responsible for generating code that, when executed, will cause the hardware to use caches and registers efficiently.
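
One classic way programmers and compilers cooperate with the hardware is loop tiling (blocking),
which reshapes a loop nest so the data it touches at any moment fits in cache. The sketch below
shows a hand-tiled matrix multiply in C; the matrix size and the 64-element block size are arbitrary
choices for the example, not tuned values.

    #include <stdio.h>
    #include <string.h>

    #define N 512    /* matrix dimension */
    #define B 64     /* block (tile) size, chosen so the active tiles fit in cache */

    static double A[N][N], Bm[N][N], C[N][N];

    /* Blocked (tiled) matrix multiply: C += A * Bm, computed one B x B tile
     * at a time so that the tiles being worked on stay resident in cache. */
    static void matmul_tiled(void)
    {
        for (int ii = 0; ii < N; ii += B)
            for (int kk = 0; kk < N; kk += B)
                for (int jj = 0; jj < N; jj += B)
                    for (int i = ii; i < ii + B; i++)
                        for (int k = kk; k < kk + B; k++) {
                            double aik = A[i][k];
                            for (int j = jj; j < jj + B; j++)
                                C[i][j] += aik * Bm[k][j];
                        }
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++) { A[i][j] = 1.0; Bm[i][j] = 1.0; }
        memset(C, 0, sizeof C);
        matmul_tiled();
        printf("C[0][0] = %.0f (expected %d)\n", C[0][0], N);
        return 0;
    }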

Many programmers assume one level of memory. This works fine until the application hits a
performance wall, at which point the memory hierarchy is assessed during code refactoring.

What a memory hierarchy is not


Although a file server appears as a virtual device on the computer, it was not a design consideration
of the computer architect; network-attached storage (NAS) devices optimize their own memory
hierarchy (to optimize their own physical design). "On-line" (secondary storage) here refers to
storage that, from the CPU's point of view, is "on" the computer's "line": the hard drive. "Off-line"
(tertiary storage) here refers to effectively "infinite" latency (waiting for manual intervention), and
it marks the boundary of the realm of the 'memory hierarchy'.

Memory hierarchy refers to a CPU-centric latency (delay), the primary criterion for placing a
storage device within a memory hierarchy, which fits the storage device into design considerations
concerning the operators of the computer. It is used only incidentally in operations; its primary use
is in thinking about abstract machines. Network architects use the term latency directly, rather than
memory hierarchy, because they do not design what will become on-line storage. A NAS may hold
this article under "computer data storage", but the memory hierarchy of the computers which read
and edit it does not care how.

Theory of Computation

The theory of computation or computer theory is the branch of computer science and
mathematics that deals with whether and how efficiently problems can be solved on a model of
computation, using an algorithm. The field is divided into two major branches: computability
theory and complexity theory, but both branches deal with formal models of computation.

In order to perform a rigorous study of computation, computer scientists work with a
mathematical abstraction of computers called a model of computation. There are several models
in use, but the most commonly examined is the Turing machine. Computer scientists study the
Turing machine because it is simple to formulate, can be analyzed and used to prove results, and
because it represents what many consider the most powerful possible "reasonable" model of
computation. It might seem that the potentially infinite memory capacity is an unrealizable
attribute, but any decidable problem solved by a Turing machine will always require only a finite
amount of memory. So in principle, any problem that can be solved (decided) by a Turing
machine can be solved by a computer that has a bounded amount of memory.
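
As a concrete, if toy-sized, illustration of the model, the sketch below is a tiny Turing-machine
simulator in C. The transition table and the specific machine (which sweeps right over a block of
1s, erasing them, and halts on the first blank) are invented for the example, and the fixed-size tape
merely stands in for the unbounded tape of the formal model.

    #include <stdio.h>
    #include <string.h>

    #define TAPE 64   /* a real Turing machine has an unbounded tape; this is a stand-in */

    /* One transition: in state `state` reading `read`, write `write`,
     * move the head (+1 right, -1 left, 0 stay) and enter `next`. */
    struct rule { int state; char read; char write; int move; int next; };

    enum { HALT = -1 };

    /* A made-up machine: sweep right over a block of 1s, replacing each
     * with 0, and halt on the first blank ('_'). */
    static const struct rule rules[] = {
        { 0, '1', '0', +1, 0    },
        { 0, '_', '_',  0, HALT },
    };

    int main(void)
    {
        char tape[TAPE];
        memset(tape, '_', sizeof tape);
        memcpy(tape, "1111", 4);                 /* initial tape contents */

        int state = 0, head = 0;
        while (state != HALT) {
            const struct rule *r = NULL;
            for (size_t i = 0; i < sizeof rules / sizeof rules[0]; i++)
                if (rules[i].state == state && rules[i].read == tape[head])
                    r = &rules[i];
            if (!r)                              /* no applicable rule: halt */
                break;
            tape[head] = r->write;
            head += r->move;
            state = r->next;
        }
        printf("final tape: %.16s\n", tape);     /* prints 0000 followed by blanks */
        return 0;
    }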
