RFR: 8226575: OperatingSystemMXBean should be made container aware

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8226575: OperatingSystemMXBean should be made container aware

Andrew Azores
Hi all,

Please review this change that addresses JDK-8226575 according to a
previous discussion on this list [0][1]. The webrev has been kindly
uploaded on my behalf by Severin Gehwolf, since I am not an author.

The initial problem was that the
com.sun.management.OperatingSystemMXBean was inconsistent about its
awareness of applicable container limits. On the one hand, the
(inherited) getAvailableProcessors() return value is consistent with
the container-aware Runtime.availableProcessors(). But on the other
hand, c.s.m.OperatingSystemMXBean's getTotalPhysicalMemorySize(),
getFreePhysicalMemorySize(), getTotalSwapSpaceSize(), and
getFreeSwapSpaceSize() returned values reflecting the physical host
machine with no awareness of container limits. getSystemCpuLoad() has
also been updated to use cgroup-accounted load calculation.

The fix applied is to use the JDK internal Metrics API to determine
container memory limits and CPU accounting and use these values
instead, if available, otherwise falling back on the pre-existing
native implementations.

bug:
https://bugs.openjdk.java.net/browse/JDK-8226575

webrev:
http://cr.openjdk.java.net/~sgehwolf/webrevs/aazores/JDK-8226575/01/webrev/

[0]
http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-June/028439.html
[1]
http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028709.html

Thanks,

--
Andrew Azores
Software Engineer, OpenJDK Team
Red Hat

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

Severin Gehwolf
Hi Andrew,

On Wed, 2019-07-17 at 14:45 -0400, Andrew Azores wrote:

> Hi all,
>
> Please review this change that addresses JDK-8226575 according to a
> previous discussion on this list [0][1]. The webrev has been kindly
> uploaded on my behalf by Severin Gehwolf, since I am not an author.
>
> The initial problem was that the
> com.sun.management.OperatingSystemMXBean was inconsistent about its
> awareness of applicable container limits. On the one hand, the
> (inherited) getAvailableProcessors() return value is consistent with
> the container-aware Runtime.availableProcessors(). But on the other
> hand, c.s.m.OperatingSystemMXBean's getTotalPhysicalMemorySize(),
> getFreePhysicalMemorySize(), getTotalSwapSpaceSize(), and
> getFreeSwapSpaceSize() returned values reflecting the physical host
> machine with no awareness of container limits. getSystemCpuLoad() has
> also been updated to use cgroup-accounted load calculation.
>
> The fix applied is to use the JDK internal Metrics API to determine
> container memory limits and CPU accounting and use these values
> instead, if available, otherwise falling back on the pre-existing
> native implementations.
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8226575
>
> webrev:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/aazores/JDK-8226575/01/webrev/

src/jdk.management/share/classes/module-info.java

I don't think you need to explicitly require java.base. Everything
depends on java.base. Is there a specific reason why that was needed?

I've noticed that this test fails with your patch:

FAILED: com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java

Fails with:
java.lang.RuntimeException: getSystemCpuLoad() returns 173995.48501979 which is not in the [0.0,1.0] interval
        at GetSystemCpuLoad.main(GetSystemCpuLoad.java:43)

This makes me wonder what the effect of this patch is on Linux
*outside* a container. Have you verified that Metric values correspond
to what whas previously returned via native methods? Could you verify,
please and tell us the before/after values? Thanks!

I wonder if we should split the OperatingSystemMXBean tests into it's
own (and not piggy-back on
TestCPUAwareness.java/TestMemoryAwareness.java). Have you considered
that?

Thanks,
Severin

> [0]
> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-June/028439.html
> [1]
> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028709.html
>
> Thanks,
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

Andrew Azores
Hi Severin,

On Thu, 2019-07-18 at 18:37 +0200, Severin Gehwolf wrote:
> Hi Andrew,
>
> src/jdk.management/share/classes/module-info.java
>
> I don't think you need to explicitly require java.base. Everything
> depends on java.base. Is there a specific reason why that was needed?
>

Ah, you're right that that's not needed. I figured java.base should be
implicitly depended upon, but the first run through failed due to
module visibility. The actual fix needed was the other module-info
change in the patch, ie:

+    exports jdk.internal.platform to
+        jdk.management;

in the java.base module-info. I've gone ahead and removed the
unnecessary addition.

> I've noticed that this test fails with your patch:
>
> FAILED:
> com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java
>
> Fails with:
> java.lang.RuntimeException: getSystemCpuLoad() returns
> 173995.48501979 which is not in the [0.0,1.0] interval
> at GetSystemCpuLoad.main(GetSystemCpuLoad.java:43)

What did you run to get this test to execute? I hadn't hit that failure
in running `make test TEST=tier1`, but I can verify the failure if I
run specifically that class, so I'm unsure what larger suite or tier
it's included in.

I'll need to go back and rethink how this system load calculation can
be done.

>
> This makes me wonder what the effect of this patch is on Linux
> *outside* a container. Have you verified that Metric values
> correspond
> to what whas previously returned via native methods? Could you
> verify,
> please and tell us the before/after values? Thanks!

Other than the system load, the other memory-related values appear to
be correct both on bare metal and in Docker. Manually verifying the
tests in com/sun/management/OperatingSystemMXBean with the "trace"
argument and the patch applied (system load change removed), the values
printed match my host machine's /proc/meminfo. Running
jtreg:test/hotspot/jtreg/containers/docker tests also shows that the
Docker tests succeed and the reported values within a container are as
expected as well.

I can't see that there are any tests to verify the Metrics
implementation itself. Am I just missing them or have none been
written?

>
> I wonder if we should split the OperatingSystemMXBean tests into it's
> own (and not piggy-back on
> TestCPUAwareness.java/TestMemoryAwareness.java). Have you considered
> that?
>

Sure, I could do that for the next review round.

Thanks,
--
Andrew Azores
Software Engineer, OpenJDK Team
Red Hat

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

David Holmes
In reply to this post by Andrew Azores
Hi Andrew,

Sorry - this whole discussion slipped past me while I was traveling in
the US in late June (and have then been on vacation). As I just wrote in
the bug report:

There are two OperatingSystemMXBean interfaces:
- java.lang.management.OperatingSystemMXBean
- com.sun.management.OperatingSystemMXBean (a subtype of the former)

What the methods of these two interface report depends on exactly how
the method has been specified. If it refers to something "available to
the JVM (or current process)" then it needs to be container aware. If it
refers to the "whole system" or something "physical", then that should
report information for the whole system.

Arguably some of the whole-system/physical APIs are misguided and
reflect a history where containers and virtualized environments were not
envisaged. But changing them is a compatibility issue and so it may be
better to add new methods that report more interesting/relevant
information for the process's environment. Either way a CSR request will
need to be filed and the changes examined from a compatibility perspective.

Some of the APIs may not be "well formed" in the first place - for
example getSystemLoadAverage():

"Returns the system load average for the last minute. The system load
average is the sum of the number of runnable entities queued to the
available processors and the number of runnable entities running on the
available processors averaged over a period of time. "

This refers to the "system" but specifies the calculation is based on
"available processors" - which is a contradiction as "available" !=
"system".

So I think we need to take a step back at the moment and look at exactly
which methods can be container-aware based on their current
specification, and which should be container-aware but can't be because
of their current specification, and so what new methods may be needed.

David
-----

On 18/07/2019 4:45 am, Andrew Azores wrote:

> Hi all,
>
> Please review this change that addresses JDK-8226575 according to a
> previous discussion on this list [0][1]. The webrev has been kindly
> uploaded on my behalf by Severin Gehwolf, since I am not an author.
>
> The initial problem was that the
> com.sun.management.OperatingSystemMXBean was inconsistent about its
> awareness of applicable container limits. On the one hand, the
> (inherited) getAvailableProcessors() return value is consistent with
> the container-aware Runtime.availableProcessors(). But on the other
> hand, c.s.m.OperatingSystemMXBean's getTotalPhysicalMemorySize(),
> getFreePhysicalMemorySize(), getTotalSwapSpaceSize(), and
> getFreeSwapSpaceSize() returned values reflecting the physical host
> machine with no awareness of container limits. getSystemCpuLoad() has
> also been updated to use cgroup-accounted load calculation.
>
> The fix applied is to use the JDK internal Metrics API to determine
> container memory limits and CPU accounting and use these values
> instead, if available, otherwise falling back on the pre-existing
> native implementations.
>
> bug:
> https://bugs.openjdk.java.net/browse/JDK-8226575
>
> webrev:
> http://cr.openjdk.java.net/~sgehwolf/webrevs/aazores/JDK-8226575/01/webrev/
>
> [0]
> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-June/028439.html
> [1]
> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028709.html
>
> Thanks,
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

Severin Gehwolf
In reply to this post by Andrew Azores
Hi Andrew,

Note that we should first get the CSR and - what the CSR should entail
- in order before we can pursue this further. I'll help you with that.
More below.

On Thu, 2019-07-18 at 16:28 -0400, Andrew Azores wrote:

> Hi Severin,
>
> On Thu, 2019-07-18 at 18:37 +0200, Severin Gehwolf wrote:
> > Hi Andrew,
> >
> > src/jdk.management/share/classes/module-info.java
> >
> > I don't think you need to explicitly require java.base. Everything
> > depends on java.base. Is there a specific reason why that was needed?
> >
>
> Ah, you're right that that's not needed. I figured java.base should be
> implicitly depended upon, but the first run through failed due to
> module visibility. The actual fix needed was the other module-info
> change in the patch, ie:
>
> +    exports jdk.internal.platform to
> +        jdk.management;
>
> in the java.base module-info. I've gone ahead and removed the
> unnecessary addition.

OK, cool.

> > I've noticed that this test fails with your patch:
> >
> > FAILED:
> > com/sun/management/OperatingSystemMXBean/GetSystemCpuLoad.java
> >
> > Fails with:
> > java.lang.RuntimeException: getSystemCpuLoad() returns
> > 173995.48501979 which is not in the [0.0,1.0] interval
> > at GetSystemCpuLoad.main(GetSystemCpuLoad.java:43)
>
> What did you run to get this test to execute? I hadn't hit that failure
> in running `make test TEST=tier1`, but I can verify the failure if I
> run specifically that class, so I'm unsure what larger suite or tier
> it's included in.

I've run:
$ jtreg -verbose:summary -jdk:./build/linux-x86_64-server-release/images/jdk test/jdk/com/sun/management/OperatingSystemMXBean test/jdk/com/sun/management/UnixOperatingSystemMXBean test/jdk/java/lang/management/OperatingSystemMXBean

> I'll need to go back and rethink how this system load calculation can
> be done.

Sure.

> > This makes me wonder what the effect of this patch is on Linux
> > *outside* a container. Have you verified that Metric values
> > correspond
> > to what whas previously returned via native methods? Could you
> > verify,
> > please and tell us the before/after values? Thanks!
>
> Other than the system load, the other memory-related values appear to
> be correct both on bare metal and in Docker. Manually verifying the
> tests in com/sun/management/OperatingSystemMXBean with the "trace"
> argument and the patch applied (system load change removed), the values
> printed match my host machine's /proc/meminfo. Running
> jtreg:test/hotspot/jtreg/containers/docker tests also shows that the
> Docker tests succeed and the reported values within a container are as
> expected as well.

OK.

> I can't see that there are any tests to verify the Metrics
> implementation itself. Am I just missing them or have none been
> written?

There is MetricsTester.java in the test library which is beeing used by
TestCgroupMetrics.java and TestSystemMetrics.java

> > I wonder if we should split the OperatingSystemMXBean tests into it's
> > own (and not piggy-back on
> > TestCPUAwareness.java/TestMemoryAwareness.java). Have you considered
> > that?
> >
>
> Sure, I could do that for the next review round.

Great, thanks!

Cheers,
Severin

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

Mandy Chung
In reply to this post by David Holmes


On 7/18/19 3:49 PM, David Holmes wrote:

> Hi Andrew,
>
> Sorry - this whole discussion slipped past me while I was traveling in
> the US in late June (and have then been on vacation). As I just wrote
> in the bug report:
>
> There are two OperatingSystemMXBean interfaces:
> - java.lang.management.OperatingSystemMXBean
> - com.sun.management.OperatingSystemMXBean (a subtype of the former)
>
> What the methods of these two interface report depends on exactly how
> the method has been specified. If it refers to something "available to
> the JVM (or current process)" then it needs to be container aware. If
> it refers to the "whole system" or something "physical", then that
> should report information for the whole system.
>
> Arguably some of the whole-system/physical APIs are misguided and
> reflect a history where containers and virtualized environments were
> not envisaged. But changing them is a compatibility issue and so it
> may be better to add new methods that report more interesting/relevant
> information for the process's environment. Either way a CSR request
> will need to be filed and the changes examined from a compatibility
> perspective.
>

Agree.

When the APIs were introduced, it was defined for the whole system and
the spec should be made clear.   If the APIs should be made
container-aware, the spec should be clarified of the new behavior.

Monitoring the whole system still provides a useful view even on a
container and virtualized environment.   One idea to consider would be
to refactor com.sun.management.OperatingSystemMXBean and extract the
container-aware methods in a new MXBean which OSMXBean extends and
create one new singleton instance of the new MXBean to report
container-specific metrics.

> Some of the APIs may not be "well formed" in the first place - for
> example getSystemLoadAverage():
>
> "Returns the system load average for the last minute. The system load
> average is the sum of the number of runnable entities queued to the
> available processors and the number of runnable entities running on
> the available processors averaged over a period of time. "
>
> This refers to the "system" but specifies the calculation is based on
> "available processors" - which is a contradiction as "available" !=
> "system".
>

The spec should be updated to "system" in this case.

> So I think we need to take a step back at the moment and look at
> exactly which methods can be container-aware based on their current
> specification, and which should be container-aware but can't be
> because of their current specification, and so what new methods may be
> needed.
>

Bob's suggestion [1] to make the total/free physical memory and swap
space size container-aware sounds reasonable.

Refactoring c.s.m.OperatingSystemMXBean could be one option to explore.

Mandy
[1]
http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028710.html
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

Severin Gehwolf
Hi All,

I've filed a CSR and would like to solicit feedback at this point.
There are some contradicting issues involved and IMHO there is no
clear-cut answer to a couple of key points.

https://bugs.openjdk.java.net/browse/JDK-8228428

This is still a Work-In-Progress :)

Comments inline.

On Fri, 2019-07-19 at 08:27 -0700, Mandy Chung wrote:

>
> On 7/18/19 3:49 PM, David Holmes wrote:
> > Hi Andrew,
> >
> > Sorry - this whole discussion slipped past me while I was traveling in
> > the US in late June (and have then been on vacation). As I just wrote
> > in the bug report:
> >
> > There are two OperatingSystemMXBean interfaces:
> > - java.lang.management.OperatingSystemMXBean
> > - com.sun.management.OperatingSystemMXBean (a subtype of the former)
> >
> > What the methods of these two interface report depends on exactly how
> > the method has been specified. If it refers to something "available to
> > the JVM (or current process)" then it needs to be container aware. If
> > it refers to the "whole system" or something "physical", then that
> > should report information for the whole system.
> >
> > Arguably some of the whole-system/physical APIs are misguided and
> > reflect a history where containers and virtualized environments were
> > not envisaged. But changing them is a compatibility issue and so it
> > may be better to add new methods that report more interesting/relevant
> > information for the process's environment. Either way a CSR request
> > will need to be filed and the changes examined from a compatibility
> > perspective.
> >
>
> Agree.
>
> When the APIs were introduced, it was defined for the whole system and
> the spec should be made clear.   If the APIs should be made
> container-aware, the spec should be clarified of the new behavior.
>
> Monitoring the whole system still provides a useful view even on a
> container and virtualized environment.

Right, but going to the extreme, when a JVM runs inside a full
virtualized VM it reports it's view from *within* the VM, not the whole
system (the host the vm runs under). While it might be useful, I'm not
sure what the approach of "least surprises" would be. The JVM in a
container with a memory limit won't use the hosts' full memory
resources.

> One idea to consider would be
> to refactor com.sun.management.OperatingSystemMXBean and extract the
> container-aware methods in a new MXBean which OSMXBean extends and
> create one new singleton instance of the new MXBean to report
> container-specific metrics.

It would be platform specific, though. My reading of JDK-8199944
suggests this is a dead end.

> > Some of the APIs may not be "well formed" in the first place - for
> > example getSystemLoadAverage():
> >
> > "Returns the system load average for the last minute. The system load
> > average is the sum of the number of runnable entities queued to the
> > available processors and the number of runnable entities running on
> > the available processors averaged over a period of time. "
> >
> > This refers to the "system" but specifies the calculation is based on
> > "available processors" - which is a contradiction as "available" !=
> > "system".
> >
>
> The spec should be updated to "system" in this case.

You mean "available processors averaged" =>
"system processors averaged"? If so, should the method name of
getAvailableProcessors() be changed too?

> > So I think we need to take a step back at the moment and look at
> > exactly which methods can be container-aware based on their current
> > specification, and which should be container-aware but can't be
> > because of their current specification, and so what new methods may be
> > needed.
> >
>
> Bob's suggestion [1] to make the total/free physical memory and swap
> space size container-aware sounds reasonable.

+1

It makes a lot of sense given how the JDK works otherwise in those
quota'ed environments.

Thanks,
Severin

> Refactoring c.s.m.OperatingSystemMXBean could be one option to explore.
>
> Mandy
> [1]
> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028710.html

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8226575: OperatingSystemMXBean should be made container aware

David Holmes
Hi Severin, Andrew,

On 20/07/2019 2:07 am, Severin Gehwolf wrote:
> Hi All,
>
> I've filed a CSR and would like to solicit feedback at this point.
> There are some contradicting issues involved and IMHO there is no
> clear-cut answer to a couple of key points.
>
> https://bugs.openjdk.java.net/browse/JDK-8228428
>
> This is still a Work-In-Progress :)

I've added a lengthy comment to the CSR. I don't think we can
meaningfully try to retrofit this onto the existing MXbean and
attempting to do so will just open a can worms when it comes to
compatibility. A new MXbean that exposes the desired information for
what is available to the VM may be a simpler path forward.

Cheers,
David
-----

> Comments inline.
>
> On Fri, 2019-07-19 at 08:27 -0700, Mandy Chung wrote:
>>
>> On 7/18/19 3:49 PM, David Holmes wrote:
>>> Hi Andrew,
>>>
>>> Sorry - this whole discussion slipped past me while I was traveling in
>>> the US in late June (and have then been on vacation). As I just wrote
>>> in the bug report:
>>>
>>> There are two OperatingSystemMXBean interfaces:
>>> - java.lang.management.OperatingSystemMXBean
>>> - com.sun.management.OperatingSystemMXBean (a subtype of the former)
>>>
>>> What the methods of these two interface report depends on exactly how
>>> the method has been specified. If it refers to something "available to
>>> the JVM (or current process)" then it needs to be container aware. If
>>> it refers to the "whole system" or something "physical", then that
>>> should report information for the whole system.
>>>
>>> Arguably some of the whole-system/physical APIs are misguided and
>>> reflect a history where containers and virtualized environments were
>>> not envisaged. But changing them is a compatibility issue and so it
>>> may be better to add new methods that report more interesting/relevant
>>> information for the process's environment. Either way a CSR request
>>> will need to be filed and the changes examined from a compatibility
>>> perspective.
>>>
>>
>> Agree.
>>
>> When the APIs were introduced, it was defined for the whole system and
>> the spec should be made clear.   If the APIs should be made
>> container-aware, the spec should be clarified of the new behavior.
>>
>> Monitoring the whole system still provides a useful view even on a
>> container and virtualized environment.
>
> Right, but going to the extreme, when a JVM runs inside a full
> virtualized VM it reports it's view from *within* the VM, not the whole
> system (the host the vm runs under). While it might be useful, I'm not
> sure what the approach of "least surprises" would be. The JVM in a
> container with a memory limit won't use the hosts' full memory
> resources.
>
>> One idea to consider would be
>> to refactor com.sun.management.OperatingSystemMXBean and extract the
>> container-aware methods in a new MXBean which OSMXBean extends and
>> create one new singleton instance of the new MXBean to report
>> container-specific metrics.
>
> It would be platform specific, though. My reading of JDK-8199944
> suggests this is a dead end.
>
>>> Some of the APIs may not be "well formed" in the first place - for
>>> example getSystemLoadAverage():
>>>
>>> "Returns the system load average for the last minute. The system load
>>> average is the sum of the number of runnable entities queued to the
>>> available processors and the number of runnable entities running on
>>> the available processors averaged over a period of time. "
>>>
>>> This refers to the "system" but specifies the calculation is based on
>>> "available processors" - which is a contradiction as "available" !=
>>> "system".
>>>
>>
>> The spec should be updated to "system" in this case.
>
> You mean "available processors averaged" =>
> "system processors averaged"? If so, should the method name of
> getAvailableProcessors() be changed too?
>
>>> So I think we need to take a step back at the moment and look at
>>> exactly which methods can be container-aware based on their current
>>> specification, and which should be container-aware but can't be
>>> because of their current specification, and so what new methods may be
>>> needed.
>>>
>>
>> Bob's suggestion [1] to make the total/free physical memory and swap
>> space size container-aware sounds reasonable.
>
> +1
>
> It makes a lot of sense given how the JDK works otherwise in those
> quota'ed environments.
>
> Thanks,
> Severin
>
>> Refactoring c.s.m.OperatingSystemMXBean could be one option to explore.
>>
>> Mandy
>> [1]
>> http://mail.openjdk.java.net/pipermail/serviceability-dev/2019-July/028710.html
>