Purpose:
This is a document designed to (hopefully) answer some frequently asked questions about the NUMA architecture.

Frequently Asked Questions:
1. What does NUMA stand for?
2. OK, So what does Non-Uniform Memory Access really mean?
3. What is the difference between NUMA and SMP?
4. What is the difference between NUMA and ccNUMA?
5. What is a node?
6. What is meant by local and remote memory?
7. What do you mean by distance?
8. Could you give a real-world analogy of the NUMA architecture to help understand all these terms?
9. Why should I use NUMA? What are the benefits of NUMA?
10. What are the peculiarities of NUMA?
11. What are some alternatives to NUMA?
12. Could you give a brief description of the main NUMA architecture implementations?

Frequently Given Answers:
1. What does NUMA stand for?
NUMA stands for Non-Uniform Memory Access.

2. OK, so what does Non-Uniform Memory Access really mean to me?
Non-Uniform Memory Access means that it will take longer to access some regions of memory than others. This is because some regions of memory are on physically different busses from other regions. As a result, programs that are not NUMA-aware can perform poorly. It also introduces the concept of local and remote memory. For a more visual description, please refer to the section on NUMA architecture implementations. Also, see the real-world analogy for the NUMA architecture.

3. What is the difference between NUMA and SMP?
The NUMA architecture was designed to surpass the scalability limits of the SMP architecture. With SMP, which stands for Symmetric Multi-Processing, all memory accesses are posted to the same shared memory bus. This works fine for a relatively small number of CPUs, but the problem with the shared bus appears when you have dozens, even hundreds, of CPUs competing for access to the shared memory bus. NUMA alleviates these bottlenecks by limiting the number of CPUs on any one memory bus, and connecting the various nodes by means of a high speed interconnect.

4. What is the difference between NUMA and ccNUMA?
The difference is almost nonexistent at this point. ccNUMA stands for Cache-Coherent NUMA, but NUMA and ccNUMA have really come to be synonymous. The applications for non-cache-coherent NUMA machines are almost nonexistent, and they are a real pain to program for, so unless specifically stated otherwise, NUMA actually means ccNUMA.
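To make the answer to question 2 a little more concrete, here is a minimal Python sketch of how the mix of fast and slow memory accesses affects the average access time. The latency figures (100 ns and 300 ns) and the function name average_access_ns are assumptions made up for illustration; they are not measurements of any real NUMA machine.

```python
# Hypothetical latencies, chosen only for illustration (not measurements).
LOCAL_NS = 100   # assumed cost of reaching memory on the nearby bus
REMOTE_NS = 300  # assumed cost of reaching memory on a different bus


def average_access_ns(local_fraction, local_ns=LOCAL_NS, remote_ns=REMOTE_NS):
    """Mean access time when `local_fraction` of accesses hit nearby memory."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


if __name__ == "__main__":
    for fraction in (1.0, 0.9, 0.5, 0.0):
        average = average_access_ns(fraction)
        print(f"{fraction:.0%} nearby -> {average:.0f} ns on average")
```

Under these assumed numbers, a program that pays no attention to where its memory lives drifts toward the slow end of the range, which is the sense in which programs that are not NUMA-aware can perform poorly.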

5. What is a node?
One of the problems with describing NUMA is that there are many different ways to implement this technology. This has led to a plethora of "definitions" for node. A fairly technically correct and also fairly ugly definition of a node is: a region of memory in which every byte has the same distance from each CPU. A more common definition is: a block of memory and the CPUs, I/O, etc. physically on the same bus as the memory. Some architectures do not have memory, CPUs, and I/O all on the same physical bus, so the second definition does not truly hold. In many cases, the less technical definition should be sufficient, but often the technical definition is more correct.

6. What is meant by local and remote memory?
The terms local memory and remote memory are typically used in reference to a currently running process. That said, local memory is typically defined to be the memory that is on the same node as the CPU currently running the process. Any memory that does not belong to the node on which the process is currently running is then, by that definition, remote. Local and remote memory can also be used in reference to things other than the currently running process. When in interrupt context, there technically is no currently executing process, but memory on the node containing the CPU handling the interrupt is still called local memory. Also, you could use local and remote memory in terms of a disk. For example, if there was a disk (attached to node 1) doing a DMA, the memory it is reading or writing would be called remote if it were located on another node (i.e., node 0).

7. What do you mean by distance?
NUMA-based architectures necessarily introduce a notion of distance between system components (i.e., CPUs, memory, I/O busses, etc). The metric used to determine a distance varies, but hops is a popular metric, along with latency and bandwidth. These terms all mean essentially the same thing that they do when used in a networking context (mostly because a NUMA machine is not all that different from a very tightly coupled cluster). So when used to describe a node, we could say that a particular range of memory is 2 hops (busses) from CPUs 0..3 and SCSI Controller 0. Thus, CPUs 0..3 and the SCSI Controller are a part of the same node.

8. Could you give a real-world analogy of the NUMA architecture to help understand all these terms?
Imagine that you are baking a cake. You have a group of ingredients (=memory pages) that you need to complete the recipe (=process). Some of the ingredients you may have in your cabinet (=local memory), but some of the ingredients you might not have, and have to ask a neighbor for (=remote memory). The general idea is to try and have as many of the ingredients in your own cabinet as possible, since this reduces your time and effort in making the cake. You also have to remember that your cabinets can only hold a fixed amount of ingredients (=physical nodal memory). If you try and buy more, but you have no room to store it, you may have to ask your neighbor to keep it in his/her cabinet until you need it (=local memory full, so allocate pages remotely). A bit of a strange example, I'll admit, but I think it works. If you have a better analogy, I'm all ears!

9. Why should I use NUMA? What are the benefits of NUMA?
The main benefit of NUMA is, as mentioned above, scalability. It is extremely difficult to scale SMP past 8-12 CPUs. At that number of CPUs, the memory bus is under heavy contention. NUMA is one way of reducing the number of CPUs competing for access to a shared memory bus. This is accomplished by having several memory busses and only having a small number of CPUs on each of those busses. There are other ways of building massively multiprocessor machines, but this is a NUMA FAQ, so we'll leave the discussion of other methods to other FAQs.

10. What are the peculiarities of NUMA?
CPU and/or node caches can result in NUMA effects. For example, the CPUs on a particular node will have a higher bandwidth and/or a lower latency to access the memory and CPUs on that same node. Due to this, you can see things like lock starvation under high contention. This is because if CPU x in the node requests a lock already held by another CPU y in the node, its request will tend to beat out a request from a remote CPU z.

11. What are some alternatives to NUMA?
Splitting memory up and (possibly arbitrarily) assigning it to groups of CPUs can give some performance benefits similar to actual NUMA. A setup like this would be like a regular NUMA machine where the line between local and remote memory is blurred, since all the memory is actually on the same bus. The PowerPC Regatta system is an example of this. You can achieve some NUMA-like performance by using clusters as well. A cluster is very similar to a NUMA machine, where each individual machine in the cluster becomes a node in our virtual NUMA machine. The only real difference is the nodal latency. In a clustered environment, the latency and bandwidth on the internodal links are likely to be much worse.

12. Could you give a brief description of the main NUMA architecture implementations?
Sure! The main types are IBM NUMA-Q, Compaq Wildfire, and SGI MIPS64. Click here for descriptions and diagrams of the above system types, and also a standard SMP system for comparison.

Last updated: 1/04/02
Any problems, additions, etc., please send email to this page's maintainer.
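As an illustration of the technical definition of a node from question 5 and the hop metric from question 7, here is a minimal Python sketch that groups memory ranges into nodes by comparing their distance to every CPU. The memory range names, the hop counts, and the helper names group_into_nodes and local_cpus are all hypothetical; they do not describe any particular machine or operating system interface.

```python
from collections import defaultdict

# Hypothetical hop counts from each memory range to CPUs 0..7.
# The range names and the numbers are made up for illustration.
DISTANCES = {
    "mem_a": (1, 1, 1, 1, 2, 2, 2, 2),
    "mem_b": (1, 1, 1, 1, 2, 2, 2, 2),
    "mem_c": (2, 2, 2, 2, 1, 1, 1, 1),
}


def group_into_nodes(distances):
    """Group memory ranges whose distance to every CPU is identical."""
    nodes = defaultdict(list)
    for mem_range, hops in distances.items():
        nodes[tuple(hops)].append(mem_range)
    return sorted(sorted(ranges) for ranges in nodes.values())


def local_cpus(hops):
    """Return the CPUs at the minimum distance from a memory range."""
    nearest = min(hops)
    return [cpu for cpu, hop in enumerate(hops) if hop == nearest]


if __name__ == "__main__":
    print(group_into_nodes(DISTANCES))
    print(local_cpus(DISTANCES["mem_c"]))
```

In this made-up table, mem_a and mem_b have the same distance from each CPU, so they fall into one node, while mem_c forms a second node whose nearest CPUs are 4..7. From the point of view of a process running on CPU 5, mem_c would be local memory and mem_a remote.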
