RFR: 8264136: Active processor count may be underreported

Jaroslav Bachorik-3
## Current situation

In cgroups environments, the available CPU resources are described by a minimal guaranteed amount (CPU shares) and a maximal allowed amount (CPU quota); see, e.g., [this post](https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling#:~:text=CPU%20shares%20provide%20tasks%20in,total%20number%20of%20shares%20available.).

The current `active_processor_count` computation assumes that the minimal guaranteed amount of CPU resources translates to the number of available CPUs reported by the container. Unfortunately, this is not entirely accurate: a container is free to use whatever CPUs are available, so its actual CPU usage can be higher than the reported number of available CPUs.

Just for the record, the algorithm is a bit more involved: it computes both values, one based on the minimal guaranteed amount (if specified) and one based on the maximal allowed amount (again, if specified), and then takes the lesser of the two. In practice, when both are set, the minimal guaranteed amount will always be less than or equal to the maximal allowed amount, so, as a simplification, we can consider the minimal guaranteed amount to be the basis for the available CPU count calculation whenever it is set.
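To make the description above concrete, here is a minimal sketch of the calculation as described in this post. It is not the HotSpot source: the class and method names are hypothetical, and the conversion of 1024 shares per CPU, the rounding up, and the use of `-1` for "not specified" are assumptions made for illustration.

```java
// Sketch of the calculation as described in the text; not the HotSpot code.
final class CurrentCpuCountSketch {
    // Assumed convention: 1024 CPU shares correspond to one CPU.
    static final int SHARES_PER_CPU = 1024;

    // Integer division rounded up, for positive divisors.
    static int ceilDiv(long a, long b) {
        return (int) ((a + b - 1) / b);
    }

    // A value of -1 means "not specified" (hypothetical marker).
    static int activeProcessorCount(int hostCpus, long shares,
                                    long quotaUs, long periodUs) {
        int limit = hostCpus;
        if (shares > -1) {
            // Count based on the minimal guaranteed amount.
            limit = Math.min(limit, ceilDiv(shares, SHARES_PER_CPU));
        }
        if (quotaUs > -1 && periodUs > 0) {
            // Count based on the maximal allowed amount.
            limit = Math.min(limit, ceilDiv(quotaUs, periodUs));
        }
        // Never report fewer than one CPU.
        return Math.max(1, limit);
    }

    public static void main(String[] args) {
        // 8 host CPUs, 2048 shares (2 CPUs guaranteed), quota worth 4 CPUs.
        System.out.println(activeProcessorCount(8, 2048, 400_000, 100_000)); // prints 2
    }
}
```

With both values set, the shares-based count wins, which is the simplification mentioned above.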

## Problematic behavior

For systems with an 'elastic' setup, where the minimal guaranteed amount and the maximal allowed amount are not equal, this definition of available CPUs can lead to misconfiguration of anything that relies on the reported number of cores, e.g. the number of GC threads, the number of compiler threads, or the fork-join pool size.
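As an illustration of how the reported count propagates, the sketch below prints the value the JVM reports and the common fork-join pool parallelism, which is derived from it by default. The class name and the `workerThreadsFor` sizing rule are hypothetical and only stand in for application code that sizes a pool from the reported count.

```java
import java.util.concurrent.ForkJoinPool;

// Illustrative only: shows where the reported CPU count ends up being used.
final class ProcessorCountConsumers {
    // Hypothetical sizing rule: one worker per reported CPU, at least one.
    static int workerThreadsFor(int reportedCpus) {
        return Math.max(1, reportedCpus);
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("availableProcessors = " + cpus);
        System.out.println("common pool parallelism = "
                + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("worker threads = " + workerThreadsFor(cpus));
    }
}
```

If a container with a guaranteed amount of 2 CPUs and a quota of 4 CPUs reports 2, a pool sized this way gets 2 workers even though the container is allowed to use up to 4 CPUs.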

## Proposed fix

The proposed fix is to disregard the minimal guaranteed amount in the calculation when the `PreferContainerQuotaForCPUCount` JVM flag is set to `true` (currently the default). The original calculation based on the minimal guaranteed amount would remain available as a fallback by specifying `-XX:-PreferContainerQuotaForCPUCount`.
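A rough sketch of the proposed behavior follows. It is not the patch itself: the names are hypothetical, the 1024-shares-per-CPU conversion and `-1` for "not specified" are assumptions, and the fallback to the host CPU count when only shares are set is a guess at what "disregard" implies.

```java
// Sketch of the proposed calculation; see the linked PR for the real change.
final class ProposedCpuCountSketch {
    // Assumed convention: 1024 CPU shares correspond to one CPU.
    static final int SHARES_PER_CPU = 1024;

    // Integer division rounded up, for positive divisors.
    static int ceilDiv(long a, long b) {
        return (int) ((a + b - 1) / b);
    }

    // preferQuota models the PreferContainerQuotaForCPUCount flag.
    // A value of -1 means "not specified" (hypothetical marker).
    static int activeProcessorCount(int hostCpus, long shares, long quotaUs,
                                    long periodUs, boolean preferQuota) {
        int limit = hostCpus;
        if (quotaUs > -1 && periodUs > 0) {
            // The maximal allowed amount always applies.
            limit = Math.min(limit, ceilDiv(quotaUs, periodUs));
        }
        if (!preferQuota && shares > -1) {
            // Original behavior: also cap by the minimal guaranteed amount.
            limit = Math.min(limit, ceilDiv(shares, SHARES_PER_CPU));
        }
        return Math.max(1, limit);
    }

    public static void main(String[] args) {
        // 8 host CPUs, 2048 shares (2 CPUs guaranteed), quota worth 4 CPUs.
        System.out.println(activeProcessorCount(8, 2048, 400_000, 100_000, true));  // prints 4
        System.out.println(activeProcessorCount(8, 2048, 400_000, 100_000, false)); // prints 2
    }
}
```

With the flag enabled the quota alone determines the count; with it disabled the result matches the shares-based count from the original calculation.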

-------------

Commit messages:
 - Initial attempt at fixing reported cpu count in cgroups

Changes: https://git.openjdk.java.net/jdk/pull/3177/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3177&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264136
  Stats: 52 lines in 2 files changed: 26 ins; 7 del; 19 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3177.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3177/head:pull/3177

PR: https://git.openjdk.java.net/jdk/pull/3177

Re: RFR: 8264136: Active processor count may be underreported

Severin Gehwolf-3
On Wed, 24 Mar 2021 17:08:45 GMT, Jaroslav Bachorik <[hidden email]> wrote:

> ## Current situation
>
> In cgroups environments, the available CPU resources are described by a minimal guaranteed amount (CPU shares) and a maximal allowed amount (CPU quota); see, e.g., [this post](https://engineering.squarespace.com/blog/2017/understanding-linux-container-scheduling#:~:text=CPU%20shares%20provide%20tasks%20in,total%20number%20of%20shares%20available.).
>
> The current `active_processor_count` computation assumes that the minimal guaranteed amount of CPU resources translates to the number of available CPUs reported by the container. Unfortunately, this is not entirely accurate: a container is free to use whatever CPUs are available, so its actual CPU usage can be higher than the reported number of available CPUs.
>
> Just for the record, the algorithm is a bit more involved: it computes both values, one based on the minimal guaranteed amount (if specified) and one based on the maximal allowed amount (again, if specified), and then takes the lesser of the two. In practice, when both are set, the minimal guaranteed amount will always be less than or equal to the maximal allowed amount, so, as a simplification, we can consider the minimal guaranteed amount to be the basis for the available CPU count calculation whenever it is set.
>
> ## Problematic behavior
>
> For systems with an 'elastic' setup, where the minimal guaranteed amount and the maximal allowed amount are not equal, this definition of available CPUs can lead to misconfiguration of anything that relies on the reported number of cores, e.g. the number of GC threads, the number of compiler threads, or the fork-join pool size.
>
> ## Proposed fix
>
> The proposed fix is to disregard the minimal guaranteed amount in the calculation when the `PreferContainerQuotaForCPUCount` JVM flag is set to `true` (currently the default). The original calculation based on the minimal guaranteed amount would remain available as a fallback by specifying `-XX:-PreferContainerQuotaForCPUCount`.

@jbachorik I've added a few comments and questions to the bug (with some background info): https://bugs.openjdk.java.net/browse/JDK-8264136?focusedCommentId=14409876&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-14409876

-------------

PR: https://git.openjdk.java.net/jdk/pull/3177