According to numactl, this dual CPU Opteron box is UMA rather than the expected NUMA:
$ numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 65534 MB
node 0 free: 381 MB
node distances:
node 0
0: 10
I think it should be NUMA because there are four 4-core CPU dies*. As I understand it, each die has its own memory channels; if a core needs to access memory on a non-local node, it must go over HyperTransport, which takes longer than accessing memory on the local node. AMD explains it here.
The motherboard has 16 RAM slots, 8 for each CPU. All 16 are populated with 4GB each for a total of 64GB. Some more particulars:
- Processors: 2 x AMD 6128 ("Magny-Cours")
- Mobo: Supermicro H8DG6/i(-F)
- BIOS: AMI v02.68 -- mobo/BIOS manual (pdf)
- Linux: 2.6.32
- OS: Debian "wheezy"
- BIOS memory config:
  - bank interleaving: auto
  - node interleaving: auto
  - channel interleaving: auto
  - CS sparing: disabled
  - bank swizzle mode: enabled
Why is numactl reporting that this box is UMA?
*There are two CPU dies per package, so the motherboard only has two CPU sockets.
The BIOS is hiding the NUMA reality behind the node interleaving setting. Setting that to Disabled will give you a true NUMA system as far as the OS is concerned. Not many systems really use NUMA effectively, which is why motherboard manufacturers default to making everything equally slow rather than letting the OS figure out what needs fast and slow access.
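To confirm the change after a reboot, you can rerun numactl --hardware, or count the node directories the kernel exposes in sysfs. The following is only a minimal sketch: the function name count_numa_nodes and its optional directory argument are made up for illustration, and it assumes the usual Linux sysfs layout under /sys/devices/system/node.

```shell
#!/bin/sh
# Count the NUMA nodes the kernel exposes under sysfs.
# count_numa_nodes is a hypothetical helper; the optional argument lets
# it be pointed at another directory (defaults to the usual sysfs path).
count_numa_nodes() {
    node_dir="${1:-/sys/devices/system/node}"
    count=0
    for d in "$node_dir"/node[0-9]*; do
        # An unmatched glob stays literal, so check that it is a directory.
        if [ -d "$d" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

nodes=$(count_numa_nodes)
if [ "$nodes" -le 1 ]; then
    echo "kernel sees $nodes node(s): memory is presented as UMA"
else
    echo "kernel sees $nodes nodes: NUMA is exposed to the OS"
fi
```

With node interleaving disabled, this box should report more than one node, presumably one per die, and numactl --hardware should then list a distance table with values above 10 for the non-local nodes.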