[10] RFR (S) 8175813: PPC64: "mbind: Invalid argument" when -XX:+UseNUMA is used

Gustavo Romero
Hi,

Could the following webrev be reviewed please?

It improves the numa node detection when non-consecutive or memory-less nodes
exist in the system.

webrev: http://cr.openjdk.java.net/~gromero/8175813/v2/
bug   : https://bugs.openjdk.java.net/browse/JDK-8175813

Currently, although no problem exists when the JVM detects numa nodes that are
consecutive and have memory, for example in a numa topology like:

available: 2 nodes (0-1)
node 0 cpus: 0 8 16 24 32
node 0 size: 65258 MB
node 0 free: 34 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 65320 MB
node 1 free: 150 MB
node distances:
node   0   1
  0:  10  20
  1:  20  10,

it fails to detect the numa nodes to be used by the Parallel GC in a numa
topology like:

available: 4 nodes (0-1,16-17)
node 0 cpus: 0 8 16 24 32
node 0 size: 130706 MB
node 0 free: 7729 MB
node 1 cpus: 40 48 56 64 72
node 1 size: 0 MB
node 1 free: 0 MB
node 16 cpus: 80 88 96 104 112
node 16 size: 130630 MB
node 16 free: 5282 MB
node 17 cpus: 120 128 136 144 152
node 17 size: 0 MB
node 17 free: 0 MB
node distances:
node   0   1  16  17
  0:  10  20  40  40
  1:  20  10  40  40
 16:  40  40  10  20
 17:  40  40  20  10,

where node 16 does not follow node 1 consecutively, and nodes 1 and 17 have
no memory.

If a topology like that exists, os::numa_make_local() will receive as a hint a
local group id that is not available to be bound in the system (it will
receive every node id from 0 to 17), causing a proliferation of
"mbind: Invalid argument" messages:

http://cr.openjdk.java.net/~gromero/logs/jdk10_pristine.log
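To make the failure mode concrete, here is a minimal C sketch of a simplified model, not HotSpot's actual code. The hypothetical count_unbindable_ids() treats every id from 0 to the highest node as a node, as described above, and counts the ids that are not configured nodes. mbind(2) fails with EINVAL for such ids: in the four-node example, ids 2 to 15 do not exist and nodes 1 and 17 have no memory. The bitmask argument is an assumption standing in for what libnuma would report.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical model: bit n of configured_mask is set when node n is
 * online and has memory (a "configured" node in libnuma terms). */

/* Naive scheme: treat every id from 0 to highest_node as a usable node
 * and count how many of those ids could not actually be bound. */
static int count_unbindable_ids(unsigned long configured_mask, int highest_node)
{
    assert(highest_node >= 0 &&
           highest_node < (int)(CHAR_BIT * sizeof(unsigned long)));
    int unbindable = 0;
    for (int id = 0; id <= highest_node; id++) {
        if (!(configured_mask & (1UL << id))) {
            unbindable++;  /* a bind hint with this id would be rejected */
        }
    }
    return unbindable;
}
```

For the four-node example (memory only on nodes 0 and 16, highest node 17), 16 of the 18 ids handed out are unbindable, while the consecutive two-node topology yields none.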

This change improves the detection by making the JVM numa API aware of numa
nodes that are non-consecutive between 0 and the highest node number, and also
of memory-less nodes, i.e. nodes that are not, in libnuma terms, configured
nodes. Hence only the configured nodes will be available:

http://cr.openjdk.java.net/~gromero/logs/jdk10_numa_patched.log
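The filtering idea can be sketched in C as follows, again as a simplified model rather than the patch itself. The hypothetical collect_configured_nodes() keeps only the ids whose bit is set in a mask of configured nodes, so the resulting list is dense (index 0, 1, ...) while each entry holds the real node id. In a real implementation the mask would come from libnuma; which bitmask to consult is an assumption here (numa_all_nodes_ptr tested with numa_bitmask_isbitset() is one candidate), and a plain unsigned long is used to keep the example self-contained.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical sketch: bit n of configured_mask is set when node n is a
 * configured node (online and with memory). Writes the ids of those
 * nodes, in ascending order, to ids[] and returns how many were stored. */
static size_t collect_configured_nodes(unsigned long configured_mask,
                                       int highest_node,
                                       int *ids, size_t capacity)
{
    assert(highest_node >= 0 &&
           highest_node < (int)(CHAR_BIT * sizeof(unsigned long)));
    size_t count = 0;
    for (int id = 0; id <= highest_node && count < capacity; id++) {
        if (configured_mask & (1UL << id)) {
            ids[count++] = id;  /* gaps and memory-less nodes are skipped */
        }
    }
    return count;
}
```

With bits 0 and 16 set the list is {0, 16}; for the consecutive two-node topology it is {0, 1}, so topologies without gaps or memory-less nodes see no change.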

The change has no effect on numa topologies where the problem does not occur,
i.e. there is no change in the number of nodes and no change in the cpu to node
map. On numa topologies where memory-less nodes exist (like in the last example
above), cpus from a memory-less node can't bind locally, so they are mapped to
the closest node; otherwise they would not be associated with any node and
MutableNUMASpace::cas_allocate() would pick a node randomly, compromising
performance.
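The "closest node" mapping can be illustrated with a minimal C sketch that picks, for a given node, the node with memory at the smallest distance in the node distance table. This is an assumption about the approach, not the patch's code: closest_node_with_memory() and NNODES are hypothetical names, and the table is indexed by position in the node list (0, 1, 16, 17 become positions 0 to 3) rather than by node id.

```c
#include <assert.h>
#include <limits.h>

#define NNODES 4  /* hypothetical: number of nodes listed by numactl */

/* Return the position of the node with memory that is closest to node
 * `from`, or -1 if no node has memory. Ties go to the lower position.
 * Assuming, as in the tables above, that the self-distance is the
 * smallest value in each row, a node with memory maps to itself. */
static int closest_node_with_memory(int from,
                                    const int dist[NNODES][NNODES],
                                    const int has_memory[NNODES])
{
    assert(from >= 0 && from < NNODES);
    int best = -1;
    int best_dist = INT_MAX;
    for (int to = 0; to < NNODES; to++) {
        if (has_memory[to] && dist[from][to] < best_dist) {
            best_dist = dist[from][to];
            best = to;
        }
    }
    return best;
}
```

With the distance table of the four-node example, the memory-less node 1 maps to node 0 and node 17 maps to node 16, both at distance 20, so their cpus stay on the nearest node that can actually hold memory.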

I found no regressions on x64 for the following numa topology:

available: 2 nodes (0-1)
node 0 cpus: 0 1 2 3 8 9 10 11
node 0 size: 24102 MB
node 0 free: 19806 MB
node 1 cpus: 4 5 6 7 12 13 14 15
node 1 size: 24190 MB
node 1 free: 21951 MB
node distances:
node   0   1
  0:  10  21
  1:  21  10

I understand that fixing the current numa detection is a prerequisite to enable
UseNUMA by default [1] and to extend the numa-aware allocation to the G1 GC [2].

Thank you.


Best regards,
Gustavo

[1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable NUMA Mode by Default When Appropriate)
[2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC: NUMA-Aware Allocation)

Loading...