Kernel build configuration, help with maximum NUMA nodes
I'm building an i7-based machine and I've been looking at kernel optimizations for it. One kernel option that I'm sketchy on, and haven't been able to find a good resource for, is MAXIMUM NUMA NODES. The default is 6, but I'm not sure if that's best for my particular hardware, and I'm not exactly sure what this option is all about.
As best I can figure, both the i7 920 processor and the X58 chipset support NUMA. I could be wrong. It's not the easiest topic to find layman's information for.
Quote:
As best I can figure, both the i7 920 processor and the X58 chipset support NUMA.
Yes, both chips can be used in a NUMA system.
However, if your mainboard has only one processor (socket), then you have only one memory controller, and you do not have a NUMA system.
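If you want to double-check what the kernel actually sees at runtime, a small program against libnuma will tell you. This is a minimal sketch, assuming libnuma (from the numactl package) is installed; build it with gcc numa_check.c -o numa_check -lnuma:
Code:
/* Ask the kernel how many NUMA nodes it sees. */
#include <numa.h>
#include <stdio.h>

int main(void)
{
    if (numa_available() < 0) {
        /* NUMA support is not compiled into this kernel */
        printf("NUMA is not available here\n");
        return 0;
    }
    /* On a one-socket i7 board this should report node 0 only. */
    printf("highest node number: %d\n", numa_max_node());
    printf("configured nodes:    %d\n", numa_num_configured_nodes());
    return 0;
}
On a one-socket board you should see a single node, which matches the point above.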
Quote:
It's not the easiest topic to find layman's information for.
This is what is confusing me: since the memory controller is physically on the CPU with the i7, and QPI replaces the front-side bus, then a NUMA-enabled processor with a NUMA-enabled chipset SHOULD be all that's required for NUMA?
Quote:
Coherency Leaps Forward at Intel
CSI is a switched fabric and a natural fit for cache coherent non-uniform memory architectures (ccNUMA). However, simply recycling Intel’s existing MESI protocol and grafting it onto a ccNUMA system is far from efficient. The MESI protocol complements Intel’s older bus-based architecture and elegantly enforces coherency. But in a ccNUMA system, the MESI protocol would send many redundant messages between different nodes, often with unnecessarily high latency. In particular, when a processor requests a cache line that is stored in multiple locations, every location might respond with the data. However, the requesting processor only needs a single copy of the data, so the system is wasting a bit of bandwidth.
Intel's solution to this issue is rather elegant. They adapted the standard MESI protocol to include an additional state, the Forwarding (F) state, and changed the role of the Shared (S) state. In the MESIF protocol, only a single instance of a cache line may be in the F state and that instance is the only one that may be duplicated [3]. Other caches may hold the data, but it will be in the shared state and cannot be copied. In other words, the cache line in the F state is used to respond to any read requests, while the S state cache lines are now silent. This makes the line in the F state a first amongst equals, when responding to snoop requests. By designating a single cache line to respond to requests, coherency traffic is substantially reduced when multiple copies of the data exist.
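If I'm reading that right, the F-state rule works out to something like this toy model (just an illustration of the rule as the article describes it, nothing like real coherency hardware):
Code:
/* Toy model of the MESIF read-snoop rule: of all the caches holding a
 * line, only the single M/E/F owner answers a read request, while
 * Shared (S) copies stay silent. Purely illustrative. */
#include <stdio.h>

enum mesif { INVALID, SHARED, EXCLUSIVE, MODIFIED, FORWARD };

/* Index of the cache that supplies the data for a read snoop,
 * or -1 if every copy is silent and memory must answer. */
static int responder(const enum mesif state[], int ncaches)
{
    for (int i = 0; i < ncaches; i++)
        if (state[i] == FORWARD || state[i] == MODIFIED ||
            state[i] == EXCLUSIVE)
            return i;
    return -1;
}

int main(void)
{
    /* Three caches hold the same line; only cache 2 is in F state,
     * so it alone responds even though caches 0 and 1 have the data. */
    enum mesif line[3] = { SHARED, SHARED, FORWARD };
    printf("cache %d answers the snoop\n", responder(line, 3));
    return 0;
}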
Accesses from CPU1 to memory1 are fast, as are accesses from CPU2 to memory2.
Accesses from CPU1 to memory2 are slower, as are accesses from CPU2 to memory1.
This is what is meant by "non-uniform".
If you have only one processor, all memory accesses go through the same memory controller and have the same speed, so you do not have NUMA in this case.
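To make the local/remote difference concrete, here is a rough sketch with libnuma. The node numbers 0 and 1 assume a two-socket box, and the actual timing is left to the reader, so treat it as an illustration only (build with -lnuma):
Code:
/* Pin the current thread to node 0, then place one buffer on the local
 * node and one on the remote node. Touching the remote buffer crosses
 * the interconnect (QPI on i7-era parts), so it is slower. */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define SZ (64 * 1024 * 1024)

int main(void)
{
    if (numa_available() < 0 || numa_max_node() < 1) {
        printf("need a NUMA kernel and at least two nodes\n");
        return 1;
    }
    numa_run_on_node(0);                      /* run on node 0's CPUs */
    char *local  = numa_alloc_onnode(SZ, 0);  /* memory on our node   */
    char *remote = numa_alloc_onnode(SZ, 1);  /* memory on the other  */
    if (!local || !remote)
        return 1;
    memset(local, 0, SZ);   /* fast: same node as the running CPU */
    memset(remote, 0, SZ);  /* slower: goes over the interconnect */
    numa_free(local, SZ);
    numa_free(remote, SZ);
    return 0;
}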
On systems where the memory controller is integrated in each CPU, the number of NUMA nodes is the same as the number of processors.
The maximum number of NUMA nodes is a separate configuration option because in other systems, multiple CPUs can share a memory controller.
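As an aside, the config value is an exponent, not a node count. In the kernel source (include/linux/numa.h, lightly paraphrased) the limit comes out as a power of two, so the default of 6 allows up to 64 nodes:
Code:
/* The configured value is a power-of-two shift:
 * CONFIG_NODES_SHIFT=6 means up to 1 << 6 = 64 nodes. */
#ifdef CONFIG_NODES_SHIFT
#define NODES_SHIFT     CONFIG_NODES_SHIFT
#else
#define NODES_SHIFT     0
#endif

#define MAX_NUMNODES    (1 << NODES_SHIFT)
For a one-socket board you would normally leave CONFIG_NUMA off entirely, and then this option does not appear at all.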
Thank you for taking the time to post that. I've read about this multiple times and just kept missing the basic picture. Only after you put it that simply, and after I read the first line of the Wikipedia article, did it actually sink into my thick skull what it was really all about.
In the end, it doesn't seem the problem was FINDING the information, but the PERSON who found the information making use of it.