RFR: 8146115 - Improve docker container detection and resource configuration usage

classic Classic list List threaded Threaded
49 messages Options
123
Reply | Threaded
Open this post in threaded view
|

RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette
Please review these changes that improve on docker container detection and the
automatic configuration of the number of active CPUs and total and free memory
based on the containers resource limitation settings and metric data files.

http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>

These changes are enabled with -XX:+UseContainerSupport.

You can enable logging for this support via -Xlog:os+container=trace.

Since the dynamic selection of CPUs based on cpusets, quotas and shares
may not satisfy every users needs, I’ve added an additional flag to allow the
number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.


Bob.



Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

Hohensee, Paul
Hi Bob,

Looks functionally ok, but I’m not an expert on that, so most of this review is sw engineering comments.

In os_linux.cpp:

Is ActiveProcessorCount used to constrain the container? The only use I can see is in active_processor_count(), which just returns the value specified on the command line. Seems to be a nop otherwise.

In available_memory(), is the cast to julong in “avail_mem = (julong)(mem_limit – mem_usage);” needed? Maybe to get rid of a compiler warning? Otherwise it’s strange.

In print_container_info(), “Running in a container: “ is common to both if and else. You could replace the first call to st->print() with
    st->print(“container: “; if (!OSContainer::is_containerized()) st->print(“not “); st->print(“running in a container.”)
I’d remove the “OSContainer::” prefix string, it’s a bit verbose, plus you probably want an st->cr() at the end of the method.
A jlong is a long on some platforms and a long long on others, so I’d replace “0L” with just ‘0’, since ‘0’ will get widened properly.
Minor formatting nits: ‘else’ goes on the same line as the ‘}’ closing the ‘then’; if an ‘else’ clause isn’t a block, then it usually goes on the same line as the ‘else’.

In osContainer.hpp/cpp:

If is_containerized() is frequently called, you might want to inline it. Also, init() is called only from create_java_vm(), so it’s not multi-thread safe, which means that checking _is_initialized and calling init() if it’s not set is confusing. I’d remove _is_initiatialized.

Getters in Hotspot don’t have “get_” prefixes (at least, they never used to!), so replace “get_container_type” with “container_type” and “pd_get_container_type” with “pd_container_type”.

In osContainer_linux.hpp/cpp:

In the CgroupSubsystem, you may want to use the os:: versions of strdup and free.
MAXBUF is used as the maximum length of various file paths. Use MAXPATHLEN+1 instead?
Change get_subsystem_path() to subsystem_path()?

Is it really necessary to have a separate GEN_CONTAINER_GET_INFO_STR macro? You could just pass NULL into GEN_CONTAINER_GET_INFO and have it check for scan_fmt == NULL.

Minor formatting nits: ‘else goes on the same line as the ‘}’ closing the ‘then’; ‘{‘ s/b at the end of the line defining cpuset_cpus_to_count().

The GET_CONTAINER_INFO macro should bracket the code with a block: {}.

Manifest constant 9223372036854771712 should be ‘static const julong UNLIMITED_MEM = 0x7ffffffffffff000;’ or ‘#define UNLIMITED_MEM 0x7ffffffffffff000’.

Thanks,

Paul

On 9/22/17, 7:28 AM, "hotspot-dev on behalf of Bob Vandette" <[hidden email] on behalf of [hidden email]> wrote:

    Please review these changes that improve on docker container detection and the
    automatic configuration of the number of active CPUs and total and free memory
    based on the containers resource limitation settings and metric data files.
   
    http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
   
    These changes are enabled with -XX:+UseContainerSupport.
   
    You can enable logging for this support via -Xlog:os+container=trace.
   
    Since the dynamic selection of CPUs based on cpusets, quotas and shares
    may not satisfy every users needs, I’ve added an additional flag to allow the
    number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
   
   
    Bob.
   
   
   
   

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes
In reply to this post by Bob Vandette
Hi Bob,

I think there are some high-level decisions to be made regarding Linux
only or seemingly shared support for containers - see below - but
overall I don't have any major issues. I can't comment on the details of
the actual low-level cgroup queries etc.

Comments below ...

Bob writes:
> Please review these changes that improve on docker container detection and the
> automatic configuration of the number of active CPUs and total and free memory
> based on the containers resource limitation settings and metric data files.
>
> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>
> These changes are enabled with -XX:+UseContainerSupport.

If this is all confined to Linux only then this should be a linux-only
flag and all the changes should be confined to linux code. No shared
osContainer API is needed as it can be defined as a nested class of
os::Linux, and will only be called from os_linux.cpp.

>
> You can enable logging for this support via -Xlog:os+container=trace.

I'm not sure that is the right tagging for all situations. For example
in os::Linux::available_memory you might log the actual amount for
os+memory, regardless of whether containerized on not. You can of course
also have container specific logging in the same function.

> Since the dynamic selection of CPUs based on cpusets, quotas and shares
> may not satisfy every users needs, I’ve added an additional flag to allow the
> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.

I would suggest that ActiveProcessorCount be constrained to being >1 -
this is in line with our plans to get rid of AssumeMP/os::is_MP and
always build in MP support. Otherwise a count of 1 today won't behave
the same as a count of 1 in the future.

Also you have defined this globally but only accounted for it on Linux.
I think it makes sense to support this flag on all platforms (a
generalization of AssumeMP). Otherwise it needs to be defined as a
Linux-only flag in the pd_globals.hpp file

---

src/os/linux/vm/os_linux.cpp

In os::Linux::available_memory you might want more container specific
logging to track why you don't use container info even if containerized.
It might help expose a misconfigured container.

In os::Linux::print_container_info I'd probably just print nothing if
not containerized.

General style issue: we're not constrained to only declare local
variables at the start of a block. You should be able to declare the
variable at, or close to, the point of initialization.

Style issue:

2121     if (i < 0) st->print("OSContainer::active_processor_count()
failed");
2122     else

and elsewhere. Please move the st->print to its own line. Others may
argue for always using blocks ({}) in if/else.

2128       st->print("OSContainer::memory_limit_in_bytes: %ld", j);

And elsewhere: %ld should be JLONG_FORMAT when printing a jlong.

5024   // User has overridden the number of active processors
5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
5026     log_trace(os)("active_processor_count: "
5027                   "active processor count set by user : %d",
5028                   (int)ActiveProcessorCount);
5029     return ActiveProcessorCount;
5030   }

We don't normally check flags in runtime code like this - this will be
executed on every call, and you will see that logging each time. This
should be handled during initialization (os::Posix::init()? - if
applying this flag globally) - with logging occurring once. The above
should just reduce to:

if (ActiveProcessorCount > 0) {
   return ActiveProcessorCount; // explicit user control of number of cpus
}

Even then I do get concerned about having to always check for the least
common cases before the most common one. :(

---

The osContainer_<os>.hpp files seem to be unnecessary as they are all empty.

---

osContainer.hpp:

The pd_* methods should be private.

You've exposed all of the potential cpu container functions as-if they
might be used directly, rather than just internally when calculating
available-processors. That's okay but you then need to document all the
possible return values - in particular the distinction between -2, -1, 0
and > 0.

---

osContainer.cpp

   41 bool OSContainer::is_containerized() {
   42   if (!_is_initialized) OSContainer::init();
   43   return _is_containerized;
   44 }

As OSContainer::init() is called at VM initialization the above should
simply assert that _is_initialized is true - else it means your call to
::init is in the wrong place and serves no purpose. That allows
is_containerized to be trivially inlined and placed in the .hpp file.

---

osContainer_linux.cpp

As Paul commented you should check whether you should be using
os::strdup, os::malloc, os::free etc to interact with NMT correctly -
and also respond to OOM appropriately.

34 class CgroupSubsystem: CHeapObj<mtInternal> {

You defined this class as CHeapObj and added a destructor to free a few
things, but I can't see where the instances of this class will
themselves ever be freed.

62     void set_subsystem_path(char *cgroup_path) {

If this takes a "const char*" will it save you from casting string
literals to "char*" elsewhere?

170 GEN_CONTAINER_GET_INFO(jlong, jlong, "%ld")

%ld should be JLONG_FORMAT

417   if (memlimit == 9223372036854771712)

Large constants should be defined using the CONST64 macro. And they
should only be defined once as Paul suggested.

  485  * Algorythm:

Typo: -> Algorithm

509   cpus = OSContainer::cpu_cpuset_cpus();
516     log_error(os,container)("Error getting cpuset_cpucount");

That seems to be the wrong error message for the function called.

522     share_count = ceilf((float)share / 1024.0f);

The cast to float is not needed as you are dividing by a float.

  509   cpus = OSContainer::cpu_cpuset_cpus();
  510   if (cpus != (char *)CONTAINER_ERROR) {

This is invalid code - you can't cast -2 to char*. The only way to
report an error from a char* function is an actual error message, or
else NULL.

509   cpus = OSContainer::cpu_cpuset_cpus();

As you are inside a pd_* implementation, I don't think you want or need
to call the public versions of the other OSContainer functions - you can
just call the local pd_ variants directly. Nothing written at the
OSContainer level should be allowed to affect what you calculate here
inside your implementation.

555  * Return the number of miliseconds per period

Typo: -> milliseconds

525   else share_count = cpu_count;

Style (various locations): separate line and possibly {}

Thanks,
David
-----

>
> Bob.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette
David,  Thank you for taking the time and providing a detailed review of these changes.

Where I haven’t responded, I’ll update the implementation based on your comments.

> On Sep 26, 2017, at 1:19 AM, David Holmes <[hidden email]> wrote:
>
> Hi Bob,
>
> I think there are some high-level decisions to be made regarding Linux only or seemingly shared support for containers - see below - but overall I don't have any major issues. I can't comment on the details of the actual low-level cgroup queries etc.
>
> Comments below ...
>
> Bob writes:
>> Please review these changes that improve on docker container detection and the
>> automatic configuration of the number of active CPUs and total and free memory
>> based on the containers resource limitation settings and metric data files.
>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>> These changes are enabled with -XX:+UseContainerSupport.
>
> If this is all confined to Linux only then this should be a linux-only flag and all the changes should be confined to linux code. No shared osContainer API is needed as it can be defined as a nested class of os::Linux, and will only be called from os_linux.cpp.

I received feedback on my other Container work where I was asked to make sure it was possible to support
other container technologies.  The addition of the shared osContainer API is to prepare for this and recognize that
this will eventually be supported other platforms.

>
>> You can enable logging for this support via -Xlog:os+container=trace.
>
> I'm not sure that is the right tagging for all situations. For example in os::Linux::available_memory you might log the actual amount for os+memory, regardless of whether containerized on not. You can of course also have container specific logging in the same function.
>
>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>> may not satisfy every users needs, I’ve added an additional flag to allow the
>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>
> I would suggest that ActiveProcessorCount be constrained to being >1 - this is in line with our plans to get rid of AssumeMP/os::is_MP and always build in MP support. Otherwise a count of 1 today won't behave the same as a count of 1 in the future.
What if I return true for is_MP anytime ActiveProcessorCount is set.  I’d like to provide the ability of specifying a single processor.

>
> Also you have defined this globally but only accounted for it on Linux. I think it makes sense to support this flag on all platforms (a generalization of AssumeMP). Otherwise it needs to be defined as a Linux-only flag in the pd_globals.hpp file
Good idea.

>
> ---
>
> src/os/linux/vm/os_linux.cpp
>
> In os::Linux::available_memory you might want more container specific logging to track why you don't use container info even if containerized. It might help expose a misconfigured container.
>
> In os::Linux::print_container_info I'd probably just print nothing if not containerized.
>
> General style issue: we're not constrained to only declare local variables at the start of a block. You should be able to declare the variable at, or close to, the point of initialization.
>
> Style issue:
>
> 2121     if (i < 0) st->print("OSContainer::active_processor_count() failed");
> 2122     else
>
> and elsewhere. Please move the st->print to its own line. Others may argue for always using blocks ({}) in if/else.

There doesn’t seem to be consistency on this issue.

>
> 2128       st->print("OSContainer::memory_limit_in_bytes: %ld", j);
>
> And elsewhere: %ld should be JLONG_FORMAT when printing a jlong.

I found a few places already where I needed to add this.  I’ll look for all of them.

>
> 5024   // User has overridden the number of active processors
> 5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
> 5026     log_trace(os)("active_processor_count: "
> 5027                   "active processor count set by user : %d",
> 5028                   (int)ActiveProcessorCount);
> 5029     return ActiveProcessorCount;
> 5030   }
>
> We don't normally check flags in runtime code like this - this will be executed on every call, and you will see that logging each time. This should be handled during initialization (os::Posix::init()? - if applying this flag globally) - with logging occurring once. The above should just reduce to:
>
> if (ActiveProcessorCount > 0) {
>  return ActiveProcessorCount; // explicit user control of number of cpus
> }
>
> Even then I do get concerned about having to always check for the least common cases before the most common one. :(

This is not in a highly used function so it should be ok.

>
> ---
>
> The osContainer_<os>.hpp files seem to be unnecessary as they are all empty.

I’ll remove them.  I wasn’t sure if there was a convention to move more of osContainer_linux.cpp -> osContainer_linux.hpp.

For example: class CgroupSubsystem

>
> ---
>
> osContainer.hpp:
>
> The pd_* methods should be private.
>
> You've exposed all of the potential cpu container functions as-if they might be used directly, rather than just internally when calculating available-processors. That's okay but you then need to document all the possible return values - in particular the distinction between -2, -1, 0 and > 0.
>
> ---
>
> osContainer.cpp
>
>  41 bool OSContainer::is_containerized() {
>  42   if (!_is_initialized) OSContainer::init();
>  43   return _is_containerized;
>  44 }
>
> As OSContainer::init() is called at VM initialization the above should simply assert that _is_initialized is true - else it means your call to ::init is in the wrong place and serves no purpose. That allows is_containerized to be trivially inlined and placed in the .hpp file.
>
> ---
>
> osContainer_linux.cpp
>
> As Paul commented you should check whether you should be using os::strdup, os::malloc, os::free etc to interact with NMT correctly - and also respond to OOM appropriately.
>
> 34 class CgroupSubsystem: CHeapObj<mtInternal> {
>
> You defined this class as CHeapObj and added a destructor to free a few things, but I can't see where the instances of this class will themselves ever be freed

What’s the latest thinking on freeing CHeap Objects on termination?  Is it really worth wasting cpu cycles when our
process is about to terminate?  If not, I’ll just remove the destructors.

>
> 62     void set_subsystem_path(char *cgroup_path) {
>
> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere?

I tried several different ways of declaring the container accessor functions and
always ended up with warnings due to scanf not being able to validate arguments
since the format string didn’t end up being a string literal.  I originally was using templates
and then ended up with the macros.  I tried several different casts but could resolve the problem.

>
> 170 GEN_CONTAINER_GET_INFO(jlong, jlong, "%ld")
>
> %ld should be JLONG_FORMAT
>
> 417   if (memlimit == 9223372036854771712)
>
> Large constants should be defined using the CONST64 macro. And they should only be defined once as Paul suggested.
>
> 485  * Algorythm:
>
> Typo: -> Algorithm
>
> 509   cpus = OSContainer::cpu_cpuset_cpus();
> 516     log_error(os,container)("Error getting cpuset_cpucount");
>
> That seems to be the wrong error message for the function called.
>
> 522     share_count = ceilf((float)share / 1024.0f);
>
> The cast to float is not needed as you are dividing by a float.
>
> 509   cpus = OSContainer::cpu_cpuset_cpus();
> 510   if (cpus != (char *)CONTAINER_ERROR) {
>
> This is invalid code - you can't cast -2 to char*. The only way to report an error from a char* function is an actual error message, or else NULL.

I was trying to make error reporting consistent for all types.  I’ll change the error for strings to NULL.

>
> 509   cpus = OSContainer::cpu_cpuset_cpus();
>
> As you are inside a pd_* implementation, I don't think you want or need to call the public versions of the other OSContainer functions - you can just call the local pd_ variants directly. Nothing written at the OSContainer level should be allowed to affect what you calculate here inside your implementation.
>
> 555  * Return the number of miliseconds per period
>
> Typo: -> milliseconds
>
> 525   else share_count = cpu_count;
>
> Style (various locations): separate line and possibly {}
>
> Thanks,
> David
> ——

Thanks,
Bob.

>
>> Bob.
>

Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette
In reply to this post by Hohensee, Paul
Thanks for the review Paul.  I’ll take care of all of your suggested changes where I haven’t commented below.

> On Sep 25, 2017, at 5:34 PM, Hohensee, Paul <[hidden email]> wrote:
>
> Hi Bob,
>
> Looks functionally ok, but I’m not an expert on that, so most of this review is sw engineering comments.
>
> In os_linux.cpp:
>
> Is ActiveProcessorCount used to constrain the container? The only use I can see is in active_processor_count(), which just returns the value specified on the command line. Seems to be a nop otherwise.
Yes, it returns ActiveProcessorCount in the os:active_processor_count() function rather than trying to determine the
number of active processors by other mechanisms.  This is not a noop.

>
> In available_memory(), is the cast to julong in “avail_mem = (julong)(mem_limit – mem_usage);” needed? Maybe to get rid of a compiler warning? Otherwise it’s strange.

Yes, compiler warning.

>
> In print_container_info(), “Running in a container: “ is common to both if and else. You could replace the first call to st->print() with
>    st->print(“container: “; if (!OSContainer::is_containerized()) st->print(“not “); st->print(“running in a container.”)
> I’d remove the “OSContainer::” prefix string, it’s a bit verbose, plus you probably want an st->cr() at the end of the method.
> A jlong is a long on some platforms and a long long on others, so I’d replace “0L” with just ‘0’, since ‘0’ will get widened properly.

> Minor formatting nits: ‘else’ goes on the same line as the ‘}’ closing the ‘then’; if an ‘else’ clause isn’t a block, then it usually goes on the same line as the ‘else’.

>
> In osContainer.hpp/cpp:
>
> If is_containerized() is frequently called, you might want to inline it. Also, init() is called only from create_java_vm(), so it’s not multi-thread safe, which means that checking _is_initialized and calling init() if it’s not set is confusing. I’d remove _is_initiatialized.

>
> Getters in Hotspot don’t have “get_” prefixes (at least, they never used to!), so replace “get_container_type” with “container_type” and “pd_get_container_type” with “pd_container_type”.

There are many examples of get_xxx in hotspot now.

>
> In osContainer_linux.hpp/cpp:
>
> In the CgroupSubsystem, you may want to use the os:: versions of strdup and free.
> MAXBUF is used as the maximum length of various file paths. Use MAXPATHLEN+1 instead?

> Change get_subsystem_path() to subsystem_path()?
>
> Is it really necessary to have a separate GEN_CONTAINER_GET_INFO_STR macro? You could just pass NULL into GEN_CONTAINER_GET_INFO and have it check for scan_fmt == NULL.
I was trying to avoid additional runtime testing since I already have a lot of required error checking to do.

>
> Minor formatting nits: ‘else goes on the same line as the ‘}’ closing the ‘then’; ‘{‘ s/b at the end of the line defining cpuset_cpus_to_count().
>
> The GET_CONTAINER_INFO macro should bracket the code with a block: {}.
>
> Manifest constant 9223372036854771712 should be ‘static const julong UNLIMITED_MEM = 0x7ffffffffffff000;’ or ‘#define UNLIMITED_MEM 0x7ffffffffffff000’.
>

Thanks,
Bob.

> Thanks,
>
> Paul
>
> On 9/22/17, 7:28 AM, "hotspot-dev on behalf of Bob Vandette" <[hidden email] on behalf of [hidden email]> wrote:
>
>    Please review these changes that improve on docker container detection and the
>    automatic configuration of the number of active CPUs and total and free memory
>    based on the containers resource limitation settings and metric data files.
>
>    http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>
>    These changes are enabled with -XX:+UseContainerSupport.
>
>    You can enable logging for this support via -Xlog:os+container=trace.
>
>    Since the dynamic selection of CPUs based on cpusets, quotas and shares
>    may not satisfy every users needs, I’ve added an additional flag to allow the
>    number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>
>
>    Bob.
>
>
>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

Hohensee, Paul
Hadn’t known about the existence of ‘get_’ getter names. That means there’s a mix currently, which is confusing, but a matter for another RFE or a change to the style guide. See

    https://wiki.openjdk.java.net/display/HotSpot/StyleGuide

which says

    Getter accessor names are noun phrases, with no "get_" noise word. Boolean getters can also begin with "is_" or "has_".

Re GEN_CONTAINER_GET_INFO_STR, on Intel processors the compare-and-branch generated for scan_fmt == NULL is a single cycle and probably similarly cheap on other platforms. Not a lot of overhead in exchange for de-duplication.

Thanks,

Paul

On 9/27/17, 9:10 AM, "Bob Vandette" <[hidden email]> wrote:

    Thanks for the review Paul.  I’ll take care of all of your suggested changes where I haven’t commented below.
   
    > On Sep 25, 2017, at 5:34 PM, Hohensee, Paul <[hidden email]> wrote:
    >
    > Hi Bob,
    >
    > Looks functionally ok, but I’m not an expert on that, so most of this review is sw engineering comments.
    >
    > In os_linux.cpp:
    >
    > Is ActiveProcessorCount used to constrain the container? The only use I can see is in active_processor_count(), which just returns the value specified on the command line. Seems to be a nop otherwise.
    Yes, it returns ActiveProcessorCount in the os:active_processor_count() function rather than trying to determine the
    number of active processors by other mechanisms.  This is not a noop.
   
    >
    > In available_memory(), is the cast to julong in “avail_mem = (julong)(mem_limit – mem_usage);” needed? Maybe to get rid of a compiler warning? Otherwise it’s strange.
   
    Yes, compiler warning.
   
    >
    > In print_container_info(), “Running in a container: “ is common to both if and else. You could replace the first call to st->print() with
    >    st->print(“container: “; if (!OSContainer::is_containerized()) st->print(“not “); st->print(“running in a container.”)
    > I’d remove the “OSContainer::” prefix string, it’s a bit verbose, plus you probably want an st->cr() at the end of the method.
    > A jlong is a long on some platforms and a long long on others, so I’d replace “0L” with just ‘0’, since ‘0’ will get widened properly.
   
    > Minor formatting nits: ‘else’ goes on the same line as the ‘}’ closing the ‘then’; if an ‘else’ clause isn’t a block, then it usually goes on the same line as the ‘else’.
   
    >
    > In osContainer.hpp/cpp:
    >
    > If is_containerized() is frequently called, you might want to inline it. Also, init() is called only from create_java_vm(), so it’s not multi-thread safe, which means that checking _is_initialized and calling init() if it’s not set is confusing. I’d remove _is_initiatialized.
   
    >
    > Getters in Hotspot don’t have “get_” prefixes (at least, they never used to!), so replace “get_container_type” with “container_type” and “pd_get_container_type” with “pd_container_type”.
   
    There are many examples of get_xxx in hotspot now.
   
    >
    > In osContainer_linux.hpp/cpp:
    >
    > In the CgroupSubsystem, you may want to use the os:: versions of strdup and free.
    > MAXBUF is used as the maximum length of various file paths. Use MAXPATHLEN+1 instead?
   
    > Change get_subsystem_path() to subsystem_path()?
    >
    > Is it really necessary to have a separate GEN_CONTAINER_GET_INFO_STR macro? You could just pass NULL into GEN_CONTAINER_GET_INFO and have it check for scan_fmt == NULL.
    I was trying to avoid additional runtime testing since I already have a lot of required error checking to do.
   
    >
    > Minor formatting nits: ‘else goes on the same line as the ‘}’ closing the ‘then’; ‘{‘ s/b at the end of the line defining cpuset_cpus_to_count().
    >
    > The GET_CONTAINER_INFO macro should bracket the code with a block: {}.
    >
    > Manifest constant 9223372036854771712 should be ‘static const julong UNLIMITED_MEM = 0x7ffffffffffff000;’ or ‘#define UNLIMITED_MEM 0x7ffffffffffff000’.
    >
   
    Thanks,
    Bob.
   
    > Thanks,
    >
    > Paul
    >
    > On 9/22/17, 7:28 AM, "hotspot-dev on behalf of Bob Vandette" <[hidden email] on behalf of [hidden email]> wrote:
    >
    >    Please review these changes that improve on docker container detection and the
    >    automatic configuration of the number of active CPUs and total and free memory
    >    based on the containers resource limitation settings and metric data files.
    >
    >    http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
    >
    >    These changes are enabled with -XX:+UseContainerSupport.
    >
    >    You can enable logging for this support via -Xlog:os+container=trace.
    >
    >    Since the dynamic selection of CPUs based on cpusets, quotas and shares
    >    may not satisfy every users needs, I’ve added an additional flag to allow the
    >    number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
    >
    >
    >    Bob.
    >
    >
    >
    >
    >
   
   

Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

John Rose-3
"There are counter examples to the prevailing style" does not lead to "therefore I can diverge from the prevailing style whenever I want."  You need another reason, such as "I'm modifying an API where an alternative style is the local norm" or "these identifiers gain special value from their special style."

– John

On Sep 27, 2017, at 12:59 PM, Hohensee, Paul <[hidden email]> wrote:

>> Getters in Hotspot don’t have “get_” prefixes (at least, they never used to!), so replace “get_container_type” with “container_type” and “pd_get_container_type” with “pd_container_type”.
>
>    There are many examples of get_xxx in hotspot now.

Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette
I don’t have a problem with following the approved style.

Whenever I’m adding new code in a large body of code I usually look around and try to adapt to
the norm assuming that others that came before me were properly reviewed for style conformance.
I assumed that the style guide had been updated but didn’t check.

I’ll just point out that there are many many offenders that have been in the source tree a looong time.

Since I am more or less extending the os class, I followed os.hpp’s lead.  

os.hpp:  static const char*    get_temp_directory();
os.hpp:  static const char*    get_current_directory(char *buf, size_t buflen);
os.hpp:  static int get_loaded_modules_info(LoadedModulesCallbackFunc callback, void *param);
os.hpp:  static void* get_default_process_handle();
os.hpp:  static bool get_host_name(char* buf, size_t buflen);
os.hpp:  static int get_last_error();
os.hpp:  static frame get_sender_for_C_frame(frame *fr);
os.hpp:  static int get_signal_number(const char* signal_name);
os.hpp:  static int get_native_stack(address* stack, int size, int toSkip = 0);

Here are a few more examples:

thread.hpp:  const char* get_thread_name() const;
thread.hpp:  const char* get_thread_name_string(char* buf = NULL, int buflen = 0) const;
thread.hpp:  const char* get_threadgroup_name() const;
thread.hpp:  const char* get_parent_name() const;

sharedRuntime.hpp:  static address get_handle_wrong_method_stub() {
sharedRuntime.hpp:  static address get_handle_wrong_method_abstract_stub() {
sharedRuntime.hpp:  static address get_resolve_opt_virtual_call_stub() {
sharedRuntime.hpp:  static address get_resolve_virtual_call_stub() {

threadService.hpp:  static jlong get_total_thread_count()       { return _total_threads_count->get_value(); }
threadService.hpp:  static jlong get_peak_thread_count()        { return _peak_threads_count->get_value(); }
threadService.hpp:  static jlong get_live_thread_count()        { return _live_threads_count->get_value() - _exiting_threads_count; }
threadService.hpp:  static jlong get_daemon_thread_count()      { return _daemon_threads_count->get_value() - _exiting_daemon_threads_count; }

memoryService.hpp:  static MemoryPool*    get_memory_pool(instanceHandle pool);
memoryService.hpp:  static MemoryManager* get_memory_manager(instanceHandle mgr);
memoryService.hpp:  static MemoryPool* get_memory_pool(int index) {
memoryService.hpp:  static MemoryManager* get_memory_manager(int index) {
memoryService.hpp:  static bool get_verbose() { return log_is_enabled(Info, gc); }
memoryService.hpp:  static const GCMemoryManager* get_minor_gc_manager() {
memoryService.hpp:  static const GCMemoryManager* get_major_gc_manager() {

memTracker.cpp:  return MallocTracker::get_base(memblock);
memTracker.hpp:  static inline Tracker get_virtual_memory_uncommit_tracker() { return Tracker(); }
memTracker.hpp:  static inline Tracker get_virtual_memory_release_tracker() { return Tracker(); }
memTracker.hpp:      return MallocTracker::get_header_size(memblock);
memTracker.hpp:  static inline Tracker get_virtual_memory_uncommit_tracker() {
memTracker.hpp:  static inline Tracker get_virtual_memory_release_tracker() {
memTracker.hpp:  static inline MemBaseline& get_baseline() {


I’ll update my changes,

Bob.



> On Sep 27, 2017, at 4:05 PM, John Rose <[hidden email]> wrote:
>
> "There are counter examples to the prevailing style" does not lead to "therefore I can diverge from the prevailing style whenever I want."  You need another reason, such as "I'm modifying an API where an alternative style is the local norm" or "these identifiers gain special value from their special style."
>
> – John
>
> On Sep 27, 2017, at 12:59 PM, Hohensee, Paul <[hidden email]> wrote:
>
>>> Getters in Hotspot don’t have “get_” prefixes (at least, they never used to!), so replace “get_container_type” with “container_type” and “pd_get_container_type” with “pd_container_type”.
>>
>>   There are many examples of get_xxx in hotspot now.
>

Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

Hohensee, Paul
I’ve filed https://bugs.openjdk.java.net/browse/JDK-8188068 to fix the getter method name problem and assigned it to myself.

Thanks,

Paul

On 9/27/17, 1:22 PM, "Bob Vandette" <[hidden email]> wrote:

    I don’t have a problem with following the approved style.
   
    Whenever I’m adding new code in a large body of code I usually look around and try to adapt to
    the norm assuming that others that came before me were properly reviewed for style conformance.
    I assumed that the style guide had been updated but didn’t check.
   
    I’ll just point out that there are many many offenders that have been in the source tree a looong time.
   
    Since I am more or less extending the os class, I followed os.hpp’s lead.  
   
    os.hpp:  static const char*    get_temp_directory();
    os.hpp:  static const char*    get_current_directory(char *buf, size_t buflen);
    os.hpp:  static int get_loaded_modules_info(LoadedModulesCallbackFunc callback, void *param);
    os.hpp:  static void* get_default_process_handle();
    os.hpp:  static bool get_host_name(char* buf, size_t buflen);
    os.hpp:  static int get_last_error();
    os.hpp:  static frame get_sender_for_C_frame(frame *fr);
    os.hpp:  static int get_signal_number(const char* signal_name);
    os.hpp:  static int get_native_stack(address* stack, int size, int toSkip = 0);
   
    Here are a few more examples:
   
    thread.hpp:  const char* get_thread_name() const;
    thread.hpp:  const char* get_thread_name_string(char* buf = NULL, int buflen = 0) const;
    thread.hpp:  const char* get_threadgroup_name() const;
    thread.hpp:  const char* get_parent_name() const;
   
    sharedRuntime.hpp:  static address get_handle_wrong_method_stub() {
    sharedRuntime.hpp:  static address get_handle_wrong_method_abstract_stub() {
    sharedRuntime.hpp:  static address get_resolve_opt_virtual_call_stub() {
    sharedRuntime.hpp:  static address get_resolve_virtual_call_stub() {
   
    threadService.hpp:  static jlong get_total_thread_count()       { return _total_threads_count->get_value(); }
    threadService.hpp:  static jlong get_peak_thread_count()        { return _peak_threads_count->get_value(); }
    threadService.hpp:  static jlong get_live_thread_count()        { return _live_threads_count->get_value() - _exiting_threads_count; }
    threadService.hpp:  static jlong get_daemon_thread_count()      { return _daemon_threads_count->get_value() - _exiting_daemon_threads_count; }
   
    memoryService.hpp:  static MemoryPool*    get_memory_pool(instanceHandle pool);
    memoryService.hpp:  static MemoryManager* get_memory_manager(instanceHandle mgr);
    memoryService.hpp:  static MemoryPool* get_memory_pool(int index) {
    memoryService.hpp:  static MemoryManager* get_memory_manager(int index) {
    memoryService.hpp:  static bool get_verbose() { return log_is_enabled(Info, gc); }
    memoryService.hpp:  static const GCMemoryManager* get_minor_gc_manager() {
    memoryService.hpp:  static const GCMemoryManager* get_major_gc_manager() {
   
    memTracker.cpp:  return MallocTracker::get_base(memblock);
    memTracker.hpp:  static inline Tracker get_virtual_memory_uncommit_tracker() { return Tracker(); }
    memTracker.hpp:  static inline Tracker get_virtual_memory_release_tracker() { return Tracker(); }
    memTracker.hpp:      return MallocTracker::get_header_size(memblock);
    memTracker.hpp:  static inline Tracker get_virtual_memory_uncommit_tracker() {
    memTracker.hpp:  static inline Tracker get_virtual_memory_release_tracker() {
    memTracker.hpp:  static inline MemBaseline& get_baseline() {
   
   
    I’ll update my changes,
   
    Bob.
   
   
   
    > On Sep 27, 2017, at 4:05 PM, John Rose <[hidden email]> wrote:
    >
    > "There are counter examples to the prevailing style" does not lead to "therefore I can diverge from the prevailing style whenever I want."  You need another reason, such as "I'm modifying an API where an alternative style is the local norm" or "these identifiers gain special value from their special style."
    >
    > – John
    >
    > On Sep 27, 2017, at 12:59 PM, Hohensee, Paul <[hidden email]> wrote:
    >
    >>> Getters in Hotspot don’t have “get_” prefixes (at least, they never used to!), so replace “get_container_type” with “container_type” and “pd_get_container_type” with “pd_container_type”.
    >>
    >>   There are many examples of get_xxx in hotspot now.
    >
   
   
   

Reply | Threaded
Open this post in threaded view
|

Re: 8146115 - Improve docker container detection and resource configuration usage

John Rose-3
In reply to this post by Bob Vandette
On Sep 27, 2017, at 1:22 PM, Bob Vandette <[hidden email]> wrote:

> I don’t have a problem with following the approved style.

Of course!  For the record, I certainly didn't intend to imply
any such problems.  My previous note was perhaps short
enough to carry an abrupt tone.  I think I wrote it on my phone.
Sorry.  :-)

I'm going to take this opportunity to expand on my reasoning,
for the record.  I'll also address your counterexamples.

So, I wrote the style guide for Hotspot to help preserve some
pleasing (at least, useful) regularities in the style of the code base.
The guide is successful (at least, partly) if it gives a clear nudge
towards a reasonable norm we can all share.

Under the assumption that judgment and taste are important
in our creative activity of software engineering, the style guide
states clearly that there can always be reasons to depart from the
rules.  That makes it relatively weak as a process component, and
that is intentional.  It is guidance not mandate, and as such dependent
on everyone's judgment and taste.

> Since I am more or less extending the os class, I followed os.hpp’s lead.  

This argument ("I'm modifying an API where an alternative style
is the local norm.") is warmly supported by the style guide.  See:

  https://wiki.openjdk.java.net/display/HotSpot/StyleGuide#StyleGuide-Counterexamples

Some details: os.hpp has 900 LOC and a crude grep shows about
40 nullary non-boolean access methods plus a non-grepped
number of non-nullary access methods.

$ grep ' static .*( *)' os.hpp | grep -v 'static *void *[^*]' | grep -v 'static *bool'

Of these nine have "get_" pattern in their names.  So it's a
mixed bag.  Choosing how to add new names is not an exact
science, but I'd say there is a "tilt" here towards avoiding
"get_", but not a decisive one.  Is there a local convention
or just blind bit-evolution?  I think it's probably just evolution.
But I won't be upset if someone points out a reason for
some of those "get_" names.

Also, Paul's bug for un-mixing some of the various bags is
good as a low-frequency cleanup activity.  When he calls for
review, folks can take a look and give reasons why local conventions
are important and should not be "cleaned up".  If we want
to agree to leave things as they are, for some durable reason,
we can capture an explanatory comment and move on.
We might even decide, "please don't ever change this bit
because it will cause a maintenance burden", in which case
that too should be recorded clearly.

I hope this helps.

— John
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes
In reply to this post by Bob Vandette
Hi Bob,

On 28/09/2017 1:45 AM, Bob Vandette wrote:
> David,  Thank you for taking the time and providing a detailed review of
> these changes.
>
> Where I haven’t responded, I’ll update the implementation based on your
> comments.

Okay. I've trimmed below to only leave things I have follow up on.

>> If this is all confined to Linux only then this should be a linux-only
>> flag and all the changes should be confined to linux code. No shared
>> osContainer API is needed as it can be defined as a nested class of
>> os::Linux, and will only be called from os_linux.cpp.
>
> I received feedback on my other Container work where I was asked to
> make sure it was possible to support other container technologies.
> The addition of the shared osContainer API is to prepare for this and
> recognize that this will eventually be supported other platforms.

The problem is that the proposed osContainer API is totally cgroup
centric. That API might not make sense for a different container
technology. Even if Docker is used on different platforms, does it use
cgroups on those other platforms? Until we have at least two examples we
want to support we don't know how to formulate a generic API. So in my
opinion we should initially keep this Linux specific as a
proof-of-concept for future more general container support.

>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>> may not satisfy every users needs, I’ve added an additional flag to
>>> allow the
>>> number of CPUs to be overridden.  This flag is named
>>> -XX:ActiveProcessorCount=xx.
>>
>> I would suggest that ActiveProcessorCount be constrained to being >1 -
>> this is in line with our plans to get rid of AssumeMP/os::is_MP and
>> always build in MP support. Otherwise a count of 1 today won't behave
>> the same as a count of 1 in the future.
> What if I return true for is_MP anytime ActiveProcessorCount is set.
>   I’d like to provide the ability of specifying a single processor.

If I make the AssumeMP change for 18.3 as planned then this won't be an
issue. I'd better get onto that :)

>>
>> Also you have defined this globally but only accounted for it on
>> Linux. I think it makes sense to support this flag on all platforms (a
>> generalization of AssumeMP). Otherwise it needs to be defined as a
>> Linux-only flag in the pd_globals.hpp file
> Good idea.

You could even factor this out as a separate issue/task independent of
the container work.

>> Style issue:
>>
>> 2121     if (i < 0) st->print("OSContainer::active_processor_count()
>> failed");
>> 2122     else
>>
>> and elsewhere. Please move the st->print to its own line. Others may
>> argue for always using blocks ({}) in if/else.
>
> There doesn’t seem to be consistency on this issue.

No there's no consistency :( And this isn't in the hotspot style guide
AFAICS. But I'm sure it's in some other coding guidelines ;-)

>> 5024   // User has overridden the number of active processors
>> 5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
>> 5026     log_trace(os)("active_processor_count: "
>> 5027                   "active processor count set by user : %d",
>> 5028                   (int)ActiveProcessorCount);
>> 5029     return ActiveProcessorCount;
>> 5030   }
>>
>> We don't normally check flags in runtime code like this - this will be
>> executed on every call, and you will see that logging each time. This
>> should be handled during initialization (os::Posix::init()? - if
>> applying this flag globally) - with logging occurring once. The above
>> should just reduce to:
>>
>> if (ActiveProcessorCount > 0) {
>>  return ActiveProcessorCount; // explicit user control of number of cpus
>> }
>>
>> Even then I do get concerned about having to always check for the
>> least common cases before the most common one. :(
>
> This is not in a highly used function so it should be ok.

I really don't like seeing the FLAG_IS_DEFAULT in there - and you need
to move the logging anyway.

>>
>> The osContainer_<os>.hpp files seem to be unnecessary as they are all
>> empty.
>
> I’ll remove them.  I wasn’t sure if there was a convention to move more
> of osContainer_linux.cpp -> osContainer_linux.hpp.
>
> For example: classCgroupSubsystem

The header is only needed to expose an API for other code to use.
Locally defined classes can be kept in the .cpp file.

>> 34 class CgroupSubsystem: CHeapObj<mtInternal> {
>>
>> You defined this class as CHeapObj and added a destructor to free a
>> few things, but I can't see where the instances of this class will
>> themselves ever be freed
>
> What’s the latest thinking on freeing CHeap Objects on termination?  Is
> it really worth wasting cpu cycles when our
> process is about to terminate?  If not, I’ll just remove the destructors.

Philosophically I prefer new APIs to play nice with the invocation API,
even if existing API's don't play nice. But that's just me.

>>
>> 62     void set_subsystem_path(char *cgroup_path) {
>>
>> If this takes a "const char*" will it save you from casting string
>> literals to "char*" elsewhere?
>
> I tried several different ways of declaring the container accessor
> functions and
> always ended up with warnings due to scanf not being able to validate
> arguments
> since the format string didn’t end up being a string literal.  I
> originally was using templates
> and then ended up with the macros.  I tried several different casts but
> could resolve the problem.

Sounds like something Kim Barrett should take a look at :)

Thanks,
David
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette

> On Sep 27, 2017, at 9:20 PM, David Holmes <[hidden email]> wrote:
>
> Hi Bob,
>
> On 28/09/2017 1:45 AM, Bob Vandette wrote:
>> David,  Thank you for taking the time and providing a detailed review of these changes.
>> Where I haven’t responded, I’ll update the implementation based on your comments.
>
> Okay. I've trimmed below to only leave things I have follow up on.
>
>>> If this is all confined to Linux only then this should be a linux-only flag and all the changes should be confined to linux code. No shared osContainer API is needed as it can be defined as a nested class of os::Linux, and will only be called from os_linux.cpp.
>> I received feedback on my other Container work where I was asked to
>> make sure it was possible to support other container technologies.
>> The addition of the shared osContainer API is to prepare for this and
>> recognize that this will eventually be supported other platforms.
>
> The problem is that the proposed osContainer API is totally cgroup centric. That API might not make sense for a different container technology. Even if Docker is used on different platforms, does it use cgroups on those other platforms? Until we have at least two examples we want to support we don't know how to formulate a generic API. So in my opinion we should initially keep this Linux specific as a proof-of-concept for future more general container support.

I was trying to prepare for the JEP implementation where M&M and JFR hooks will need a shared API to call.
I was expecting to return a not supported error code on platforms that didn’t have the os specific implementations.
I did take a look at a few other types of containers (VMWare’s SDK for example) and they all had similar types of functions
for retrieving the number of cpus and quotas along with the memory limits, swap and free space.  I assumed that we
could clean up the shared APIs once we did the second container support.

In any case that work can be done by the JEP integration so I’m ok with making this os/linux specific but I still would
like to keep this support in it’s own file (osContainer_linux.cpp and osContainer_linux.hpp) so all the cgroup processing
is kept separate and these files don’t have to move later.  This would make it easier to support  alternate types of containers.
I also wanted to avoid adding lots more size to os_linux.cpp.  It’s already too big.

Bob.

>
>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>> may not satisfy every users needs, I’ve added an additional flag to allow the
>>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>>
>>> I would suggest that ActiveProcessorCount be constrained to being >1 - this is in line with our plans to get rid of AssumeMP/os::is_MP and always build in MP support. Otherwise a count of 1 today won't behave the same as a count of 1 in the future.
>> What if I return true for is_MP anytime ActiveProcessorCount is set.   I’d like to provide the ability of specifying a single processor.
>
> If I make the AssumeMP change for 18.3 as planned then this won't be an issue. I'd better get onto that :)
>
>>>
>>> Also you have defined this globally but only accounted for it on Linux. I think it makes sense to support this flag on all platforms (a generalization of AssumeMP). Otherwise it needs to be defined as a Linux-only flag in the pd_globals.hpp file
>> Good idea.
>
> You could even factor this out as a separate issue/task independent of the container work.
>
>>> Style issue:
>>>
>>> 2121     if (i < 0) st->print("OSContainer::active_processor_count() failed");
>>> 2122     else
>>>
>>> and elsewhere. Please move the st->print to its own line. Others may argue for always using blocks ({}) in if/else.
>> There doesn’t seem to be consistency on this issue.
>
> No there's no consistency :( And this isn't in the hotspot style guide AFAICS. But I'm sure it's in some other coding guidelines ;-)
>
>>> 5024   // User has overridden the number of active processors
>>> 5025   if (!FLAG_IS_DEFAULT(ActiveProcessorCount)) {
>>> 5026     log_trace(os)("active_processor_count: "
>>> 5027                   "active processor count set by user : %d",
>>> 5028                   (int)ActiveProcessorCount);
>>> 5029     return ActiveProcessorCount;
>>> 5030   }
>>>
>>> We don't normally check flags in runtime code like this - this will be executed on every call, and you will see that logging each time. This should be handled during initialization (os::Posix::init()? - if applying this flag globally) - with logging occurring once. The above should just reduce to:
>>>
>>> if (ActiveProcessorCount > 0) {
>>>  return ActiveProcessorCount; // explicit user control of number of cpus
>>> }
>>>
>>> Even then I do get concerned about having to always check for the least common cases before the most common one. :(
>> This is not in a highly used function so it should be ok.
>
> I really don't like seeing the FLAG_IS_DEFAULT in there - and you need to move the logging anyway.
>
>>>
>>> The osContainer_<os>.hpp files seem to be unnecessary as they are all empty.
>> I’ll remove them.  I wasn’t sure if there was a convention to move more of osContainer_linux.cpp -> osContainer_linux.hpp.
>> For example: classCgroupSubsystem
>
> The header is only needed to expose an API for other code to use. Locally defined classes can be kept in the .cpp file.
>
>>> 34 class CgroupSubsystem: CHeapObj<mtInternal> {
>>>
>>> You defined this class as CHeapObj and added a destructor to free a few things, but I can't see where the instances of this class will themselves ever be freed
>> What’s the latest thinking on freeing CHeap Objects on termination?  Is it really worth wasting cpu cycles when our
>> process is about to terminate?  If not, I’ll just remove the destructors.
>
> Philosophically I prefer new APIs to play nice with the invocation API, even if existing API's don't play nice. But that's just me.
>
>>>
>>> 62     void set_subsystem_path(char *cgroup_path) {
>>>
>>> If this takes a "const char*" will it save you from casting string literals to "char*" elsewhere?
>> I tried several different ways of declaring the container accessor functions and
>> always ended up with warnings due to scanf not being able to validate arguments
>> since the format string didn’t end up being a string literal.  I originally was using templates
>> and then ended up with the macros.  I tried several different casts but could resolve the problem.
>
> Sounds like something Kim Barrett should take a look at :)
>
> Thanks,
> David

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Robbin Ehn
In reply to this post by Bob Vandette
Hi Bob,

As I said in your presentation for RT.
If kernel if configured with cgroup this should always be read (otherwise we get wrong values).
E.g. fedora have had cgroups default on several years (I believe most distros have it on).

- No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off?
- log target container would make little sense since almost all linuxes run with croups on.
- For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that
   If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's.

Thanks for trying to fixing this!

/Robbin

On 09/22/2017 04:27 PM, Bob Vandette wrote:

> Please review these changes that improve on docker container detection and the
> automatic configuration of the number of active CPUs and total and free memory
> based on the containers resource limitation settings and metric data files.
>
> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>
> These changes are enabled with -XX:+UseContainerSupport.
>
> You can enable logging for this support via -Xlog:os+container=trace.
>
> Since the dynamic selection of CPUs based on cpusets, quotas and shares
> may not satisfy every users needs, I’ve added an additional flag to allow the
> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>
>
> Bob.
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes
Hi Robbin,

I have some views on this :)

On 3/10/2017 6:20 AM, Robbin Ehn wrote:

> Hi Bob,
>
> As I said in your presentation for RT.
> If kernel if configured with cgroup this should always be read
> (otherwise we get wrong values).
> E.g. fedora have had cgroups default on several years (I believe most
> distros have it on).
>
> - No option is needed at all: right now we have wrong values your fix
> will provide right ones, why would you ever what to turn that off?

It's not that you would want to turn that off (necessarily) but just
because cgroups capability exists it doesn't mean they have actually
been enabled and configured - in which case reading all the cgroup info
is unnecessary startup overhead. So for now this is opt-in - as was the
experimental cgroup support we added. Once it becomes clearer how this
needs to be used we can adjust the defaults. For now this is enabling
technology only.

> - log target container would make little sense since almost all linuxes
> run with croups on.

Again the capability is present but may not be enabled/configured.

> - For cpuset, the processes affinity mask already reflect cgroup setting
> so you don't need to look into cgroup for that
>    If you do, you would miss any processes specific affinity mask. So
> _cpu_count() should already be returning the right number of CPU's.

While the process affinity mask reflect cpusets (and we already use it
for that reason), it doesn't reflect shares and quotas. And if
shares/quotas are enforced and someone sets a custom affinity mask, what
is it all supposed to mean? That's one of the main reasons to allow the
number of cpu's to be hardwired via a flag. So it's better IMHO to read
everything from the cgroups if configured to use cgroups.

Cheers,
David

>
> Thanks for trying to fixing this!
>
> /Robbin
>
> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>> Please review these changes that improve on docker container detection
>> and the
>> automatic configuration of the number of active CPUs and total and
>> free memory
>> based on the containers resource limitation settings and metric data
>> files.
>>
>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ 
>> <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>
>> These changes are enabled with -XX:+UseContainerSupport.
>>
>> You can enable logging for this support via -Xlog:os+container=trace.
>>
>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>> may not satisfy every users needs, I’ve added an additional flag to
>> allow the
>> number of CPUs to be overridden.  This flag is named
>> -XX:ActiveProcessorCount=xx.
>>
>>
>> Bob.
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Robbin Ehn
Hi David,

On 10/03/2017 12:46 AM, David Holmes wrote:

> Hi Robbin,
>
> I have some views on this :)
>
> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>> Hi Bob,
>>
>> As I said in your presentation for RT.
>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values).
>> E.g. fedora have had cgroups default on several years (I believe most distros have it on).
>>
>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off?
>
> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in
> which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer
> how this needs to be used we can adjust the defaults. For now this is enabling technology only.

If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem.
Therefore the flag make no sense.

>
>> - log target container would make little sense since almost all linuxes run with croups on.
>
> Again the capability is present but may not be enabled/configured.

The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem.

>
>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that
>>    If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's.
>
> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone
> sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to
> read everything from the cgroups if configured to use cgroups.

I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count.

Here is the bug:

[rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc
[0.002s][debug][os] Initial active processor count set to 4
^C
[rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
[0.003s][debug][os] Initial active processor count set to 32
^C

_cpu_count already does the right thing.

Thanks, Robbin


>
> Cheers,
> David
>
>>
>> Thanks for trying to fixing this!
>>
>> /Robbin
>>
>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>> Please review these changes that improve on docker container detection and the
>>> automatic configuration of the number of active CPUs and total and free memory
>>> based on the containers resource limitation settings and metric data files.
>>>
>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>
>>> These changes are enabled with -XX:+UseContainerSupport.
>>>
>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>
>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>> may not satisfy every users needs, I’ve added an additional flag to allow the
>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>>
>>>
>>> Bob.
>>>
>>>
>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes
On 3/10/2017 6:00 PM, Robbin Ehn wrote:

> Hi David,
>
> On 10/03/2017 12:46 AM, David Holmes wrote:
>> Hi Robbin,
>>
>> I have some views on this :)
>>
>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>> Hi Bob,
>>>
>>> As I said in your presentation for RT.
>>> If kernel if configured with cgroup this should always be read
>>> (otherwise we get wrong values).
>>> E.g. fedora have had cgroups default on several years (I believe most
>>> distros have it on).
>>>
>>> - No option is needed at all: right now we have wrong values your fix
>>> will provide right ones, why would you ever what to turn that off?
>>
>> It's not that you would want to turn that off (necessarily) but just
>> because cgroups capability exists it doesn't mean they have actually
>> been enabled and configured - in which case reading all the cgroup
>> info is unnecessary startup overhead. So for now this is opt-in - as
>> was the experimental cgroup support we added. Once it becomes clearer
>> how this needs to be used we can adjust the defaults. For now this is
>> enabling technology only.
>
> If cgroup are mounted they are on and the only way to know the
> configuration (such as no limits) is to actual read the cgroup filesystem.
> Therefore the flag make no sense.

No that is exactly why it is opt-in! Why should we have to waste startup
time reading a bunch of cgroup values just to determine that cgroups are
not actually being used!

>>
>>> - log target container would make little sense since almost all
>>> linuxes run with croups on.
>>
>> Again the capability is present but may not be enabled/configured.
>
> The capability is on if cgroup are mount and the only way to know the
> configuration is to read the cgroup filesystem.
>
>>
>>> - For cpuset, the processes affinity mask already reflect cgroup
>>> setting so you don't need to look into cgroup for that
>>>    If you do, you would miss any processes specific affinity mask. So
>>> _cpu_count() should already be returning the right number of CPU's.
>>
>> While the process affinity mask reflect cpusets (and we already use it
>> for that reason), it doesn't reflect shares and quotas. And if
>> shares/quotas are enforced and someone sets a custom affinity mask,
>> what is it all supposed to mean? That's one of the main reasons to
>> allow the number of cpu's to be hardwired via a flag. So it's better
>> IMHO to read everything from the cgroups if configured to use cgroups.
>
> I'm not taking about shares and quotes, they should be read of course,
> but cpuset should be checked such as in _cpu_count.
>
> Here is the bug:
>
> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp .
> ForEver | grep proc
> [0.002s][debug][os] Initial active processor count set to 4
> ^C
> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java
> -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
> [0.003s][debug][os] Initial active processor count set to 32
> ^C
>
> _cpu_count already does the right thing.

But how do you then combine that information with the use of shares
and/or quotas?

David
-----

> Thanks, Robbin
>
>
>>
>> Cheers,
>> David
>>
>>>
>>> Thanks for trying to fixing this!
>>>
>>> /Robbin
>>>
>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>> Please review these changes that improve on docker container
>>>> detection and the
>>>> automatic configuration of the number of active CPUs and total and
>>>> free memory
>>>> based on the containers resource limitation settings and metric data
>>>> files.
>>>>
>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ 
>>>> <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>
>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>
>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>
>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>> may not satisfy every users needs, I’ve added an additional flag to
>>>> allow the
>>>> number of CPUs to be overridden.  This flag is named
>>>> -XX:ActiveProcessorCount=xx.
>>>>
>>>>
>>>> Bob.
>>>>
>>>>
>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Robbin Ehn
Hi David, I think we are seen the issue from complete opposite. (this RFE could be pushed as a bug from my POV)

On 10/03/2017 10:42 AM, David Holmes wrote:

> On 3/10/2017 6:00 PM, Robbin Ehn wrote:
>> Hi David,
>>
>> On 10/03/2017 12:46 AM, David Holmes wrote:
>>> Hi Robbin,
>>>
>>> I have some views on this :)
>>>
>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>>> Hi Bob,
>>>>
>>>> As I said in your presentation for RT.
>>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values).
>>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on).
>>>>
>>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off?
>>>
>>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured -
>>> in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes
>>> clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only.
>>
>> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem.
>> Therefore the flag make no sense.
>
> No that is exactly why it is opt-in! Why should we have to waste startup time reading a bunch of cgroup values just to determine that cgroups are not actually being used!

If you have a cgroup enabled kernel they _are_ being used, no escaping that.
cgroup is not a simple yes and no so for which resources depend on how you configured your kernel.
To find out for what resource and what limits are set is we need to read them.

I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get values correct, so our heuristic works fine out-of-the-box. (and if you must, it opt-out)

Also I notice that we don't read the numa values so the phys mem method does a poor job. Correct would be check at least cgroup and numa bindings.
We also have this option UseCGroupMemoryLimitForHeap which should be removed.

>
>>>
>>>> - log target container would make little sense since almost all linuxes run with croups on.
>>>
>>> Again the capability is present but may not be enabled/configured.
>>
>> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem.
>>
>>>
>>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that
>>>>    If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's.
>>>
>>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and
>>> someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's
>>> better IMHO to read everything from the cgroups if configured to use cgroups.
>>
>> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count.
>>
>> Here is the bug:
>>
>> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc
>> [0.002s][debug][os] Initial active processor count set to 4
>> ^C
>> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
>> [0.003s][debug][os] Initial active processor count set to 32
>> ^C
>>
>> _cpu_count already does the right thing.
>
> But how do you then combine that information with the use of shares and/or quotas?

That I don't know, wild naive guess would be:
active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :)

I assume everything we need to know is in: https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt

Thanks, Robbin

>
> David
> -----
>
>> Thanks, Robbin
>>
>>
>>>
>>> Cheers,
>>> David
>>>
>>>>
>>>> Thanks for trying to fixing this!
>>>>
>>>> /Robbin
>>>>
>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>>> Please review these changes that improve on docker container detection and the
>>>>> automatic configuration of the number of active CPUs and total and free memory
>>>>> based on the containers resource limitation settings and metric data files.
>>>>>
>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>>
>>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>>
>>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>>
>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>>> may not satisfy every users needs, I’ve added an additional flag to allow the
>>>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>>>>
>>>>>
>>>>> Bob.
>>>>>
>>>>>
>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

David Holmes
Hi Robbin,

On 3/10/2017 8:45 PM, Robbin Ehn wrote:
> Hi David, I think we are seen the issue from complete opposite. (this
> RFE could be pushed as a bug from my POV)

Yes we see this completely opposite. I see this is a poorly integrated
add-on API that we have to try to account for instead of being able to
read an "always correct" value from a standard OS API. They at least got
the cpuset support correct by having sched_getaffinity correctly account
for it. Alas the rest is ad-hoc.

>
> On 10/03/2017 10:42 AM, David Holmes wrote:
>> On 3/10/2017 6:00 PM, Robbin Ehn wrote:
>>> Hi David,
>>>
>>> On 10/03/2017 12:46 AM, David Holmes wrote:
>>>> Hi Robbin,
>>>>
>>>> I have some views on this :)
>>>>
>>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>>>> Hi Bob,
>>>>>
>>>>> As I said in your presentation for RT.
>>>>> If kernel if configured with cgroup this should always be read
>>>>> (otherwise we get wrong values).
>>>>> E.g. fedora have had cgroups default on several years (I believe
>>>>> most distros have it on).
>>>>>
>>>>> - No option is needed at all: right now we have wrong values your
>>>>> fix will provide right ones, why would you ever what to turn that off?
>>>>
>>>> It's not that you would want to turn that off (necessarily) but just
>>>> because cgroups capability exists it doesn't mean they have actually
>>>> been enabled and configured - in which case reading all the cgroup
>>>> info is unnecessary startup overhead. So for now this is opt-in - as
>>>> was the experimental cgroup support we added. Once it becomes
>>>> clearer how this needs to be used we can adjust the defaults. For
>>>> now this is enabling technology only.
>>>
>>> If cgroup are mounted they are on and the only way to know the
>>> configuration (such as no limits) is to actual read the cgroup
>>> filesystem.
>>> Therefore the flag make no sense.
>>
>> No that is exactly why it is opt-in! Why should we have to waste
>> startup time reading a bunch of cgroup values just to determine that
>> cgroups are not actually being used!
>
> If you have a cgroup enabled kernel they _are_ being used, no escaping
> that.

A cgroup set to unlimited is not being used from a practical perspective.

> cgroup is not a simple yes and no so for which resources depend on how
> you configured your kernel.
> To find out for what resource and what limits are set is we need to read
> them.
>
> I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get
> values correct, so our heuristic works fine out-of-the-box. (and if you
> must, it opt-out)

I'd rather people say "Hey I'm using this add-on resource management API
so don't ask the OS but please query the add-on.". Yes that is a little
harsh but the lack of integration at the OS level is a huge impediment
in my opinion.

> Also I notice that we don't read the numa values so the phys mem method
> does a poor job. Correct would be check at least cgroup and numa bindings.

NUMA is another minefield.

> We also have this option UseCGroupMemoryLimitForHeap which should be
> removed.

Bob already addressed why he was not getting rid of that initially.

>>
>>>>
>>>>> - log target container would make little sense since almost all
>>>>> linuxes run with croups on.
>>>>
>>>> Again the capability is present but may not be enabled/configured.
>>>
>>> The capability is on if cgroup are mount and the only way to know the
>>> configuration is to read the cgroup filesystem.
>>>
>>>>
>>>>> - For cpuset, the processes affinity mask already reflect cgroup
>>>>> setting so you don't need to look into cgroup for that
>>>>>    If you do, you would miss any processes specific affinity mask.
>>>>> So _cpu_count() should already be returning the right number of CPU's.
>>>>
>>>> While the process affinity mask reflect cpusets (and we already use
>>>> it for that reason), it doesn't reflect shares and quotas. And if
>>>> shares/quotas are enforced and someone sets a custom affinity mask,
>>>> what is it all supposed to mean? That's one of the main reasons to
>>>> allow the number of cpu's to be hardwired via a flag. So it's better
>>>> IMHO to read everything from the cgroups if configured to use cgroups.
>>>
>>> I'm not taking about shares and quotes, they should be read of
>>> course, but cpuset should be checked such as in _cpu_count.
>>>
>>> Here is the bug:
>>>
>>> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp
>>> . ForEver | grep proc
>>> [0.002s][debug][os] Initial active processor count set to 4
>>> ^C
>>> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java
>>> -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
>>> [0.003s][debug][os] Initial active processor count set to 32
>>> ^C
>>>
>>> _cpu_count already does the right thing.
>>
>> But how do you then combine that information with the use of shares
>> and/or quotas?
>
> That I don't know, wild naive guess would be:
> active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :)

That would be one option but it may not be meaningful. That said I don't
think the use of quota or shares to define the number of available CPUs
makes sense anyway.

Personally I don't think mixing direct use of cpusets with cgroup
defined limits makes much sense.

> I assume everything we need to know is in:
> https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt

Nope that only addresses cpusets. The one part of this that at least
makes sense in isolation.

Cheers,
David

> Thanks, Robbin
>
>>
>> David
>> -----
>>
>>> Thanks, Robbin
>>>
>>>
>>>>
>>>> Cheers,
>>>> David
>>>>
>>>>>
>>>>> Thanks for trying to fixing this!
>>>>>
>>>>> /Robbin
>>>>>
>>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>>>> Please review these changes that improve on docker container
>>>>>> detection and the
>>>>>> automatic configuration of the number of active CPUs and total and
>>>>>> free memory
>>>>>> based on the containers resource limitation settings and metric
>>>>>> data files.
>>>>>>
>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ 
>>>>>> <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>>>
>>>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>>>
>>>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>>>
>>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and
>>>>>> shares
>>>>>> may not satisfy every users needs, I’ve added an additional flag
>>>>>> to allow the
>>>>>> number of CPUs to be overridden.  This flag is named
>>>>>> -XX:ActiveProcessorCount=xx.
>>>>>>
>>>>>>
>>>>>> Bob.
>>>>>>
>>>>>>
>>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Robbin Ehn
Hi,

I'll leave that discussion for a while, another thing is:

In os::Linux::available_memory(), OSContainer::memory_limit_in_bytes() the limit can be larger than actual ram.
So we also need to check sysinfo e.g. return MIN(avail_mem, si.freeram * si.mem_unit).

So I think the check against "if (XXX == 9223372036854771712)" is not needed at all for any of those methods.
Just return what cgroup says if that is larger then the actual value pick the lower one.

/Robbin

On 10/03/2017 01:00 PM, David Holmes wrote:

> Hi Robbin,
>
> On 3/10/2017 8:45 PM, Robbin Ehn wrote:
>> Hi David, I think we are seen the issue from complete opposite. (this RFE could be pushed as a bug from my POV)
>
> Yes we see this completely opposite. I see this is a poorly integrated add-on API that we have to try to account for instead of being able to read an "always correct" value
> from a standard OS API. They at least got the cpuset support correct by having sched_getaffinity correctly account for it. Alas the rest is ad-hoc.
>
>>
>> On 10/03/2017 10:42 AM, David Holmes wrote:
>>> On 3/10/2017 6:00 PM, Robbin Ehn wrote:
>>>> Hi David,
>>>>
>>>> On 10/03/2017 12:46 AM, David Holmes wrote:
>>>>> Hi Robbin,
>>>>>
>>>>> I have some views on this :)
>>>>>
>>>>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>>>>> Hi Bob,
>>>>>>
>>>>>> As I said in your presentation for RT.
>>>>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values).
>>>>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on).
>>>>>>
>>>>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off?
>>>>>
>>>>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured -
>>>>> in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes
>>>>> clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only.
>>>>
>>>> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem.
>>>> Therefore the flag make no sense.
>>>
>>> No that is exactly why it is opt-in! Why should we have to waste startup time reading a bunch of cgroup values just to determine that cgroups are not actually being used!
>>
>> If you have a cgroup enabled kernel they _are_ being used, no escaping that.
>
> A cgroup set to unlimited is not being used from a practical perspective.
>
>> cgroup is not a simple yes and no so for which resources depend on how you configured your kernel.
>> To find out for what resource and what limits are set is we need to read them.
>>
>> I rather waste startup time (0.103292989 vs 0.103577139 seconds) and get values correct, so our heuristic works fine out-of-the-box. (and if you must, it opt-out)
>
> I'd rather people say "Hey I'm using this add-on resource management API so don't ask the OS but please query the add-on.". Yes that is a little harsh but the lack of
> integration at the OS level is a huge impediment in my opinion.
>
>> Also I notice that we don't read the numa values so the phys mem method does a poor job. Correct would be check at least cgroup and numa bindings.
>
> NUMA is another minefield.
>
>> We also have this option UseCGroupMemoryLimitForHeap which should be removed.
>
> Bob already addressed why he was not getting rid of that initially.
>
>>>
>>>>>
>>>>>> - log target container would make little sense since almost all linuxes run with croups on.
>>>>>
>>>>> Again the capability is present but may not be enabled/configured.
>>>>
>>>> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem.
>>>>
>>>>>
>>>>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that
>>>>>>    If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's.
>>>>>
>>>>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and
>>>>> someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's
>>>>> better IMHO to read everything from the cgroups if configured to use cgroups.
>>>>
>>>> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count.
>>>>
>>>> Here is the bug:
>>>>
>>>> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc
>>>> [0.002s][debug][os] Initial active processor count set to 4
>>>> ^C
>>>> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
>>>> [0.003s][debug][os] Initial active processor count set to 32
>>>> ^C
>>>>
>>>> _cpu_count already does the right thing.
>>>
>>> But how do you then combine that information with the use of shares and/or quotas?
>>
>> That I don't know, wild naive guess would be:
>> active count ~ MIN(OSContainer::pd_active_processor_count(), cpuset); :)
>
> That would be one option but it may not be meaningful. That said I don't think the use of quota or shares to define the number of available CPUs makes sense anyway.
>
> Personally I don't think mixing direct use of cpusets with cgroup defined limits makes much sense.
>
>> I assume everything we need to know is in: https://www.kernel.org/doc/Documentation/cgroup-v1/cpusets.txt
>
> Nope that only addresses cpusets. The one part of this that at least makes sense in isolation.
>
> Cheers,
> David
>
>> Thanks, Robbin
>>
>>>
>>> David
>>> -----
>>>
>>>> Thanks, Robbin
>>>>
>>>>
>>>>>
>>>>> Cheers,
>>>>> David
>>>>>
>>>>>>
>>>>>> Thanks for trying to fixing this!
>>>>>>
>>>>>> /Robbin
>>>>>>
>>>>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>>>>> Please review these changes that improve on docker container detection and the
>>>>>>> automatic configuration of the number of active CPUs and total and free memory
>>>>>>> based on the containers resource limitation settings and metric data files.
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>>>>
>>>>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>>>>
>>>>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>>>>
>>>>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>>>>> may not satisfy every users needs, I’ve added an additional flag to allow the
>>>>>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>>>>>>
>>>>>>>
>>>>>>> Bob.
>>>>>>>
>>>>>>>
>>>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8146115 - Improve docker container detection and resource configuration usage

Bob Vandette
In reply to this post by Robbin Ehn
After talking to a number of folks and getting feedback, my current thinking is to enable the support by default.

I still want to include the flag for at least one Java release in the event that the new behavior causes some regression
in behavior.  I’m trying to make the detection robust so that it will fallback to the current behavior in the event
that cgroups is not configured as expected but I’d like to have a way of forcing the issue.  JDK 10 is not
supposed to be a long term support release which makes it a good target for this new behavior.

I agree with David that once we commit to cgroups, we should extract all VM configuration data from that
source.  There’s more information available for cpusets than just processor affinity that we might want to
consider when calculating the number of processors to assume for the VM.  There’s exclusivity and
effective cpu data available in addition to the cpuset string.

Bob.


> On Oct 3, 2017, at 4:00 AM, Robbin Ehn <[hidden email]> wrote:
>
> Hi David,
>
> On 10/03/2017 12:46 AM, David Holmes wrote:
>> Hi Robbin,
>> I have some views on this :)
>> On 3/10/2017 6:20 AM, Robbin Ehn wrote:
>>> Hi Bob,
>>>
>>> As I said in your presentation for RT.
>>> If kernel if configured with cgroup this should always be read (otherwise we get wrong values).
>>> E.g. fedora have had cgroups default on several years (I believe most distros have it on).
>>>
>>> - No option is needed at all: right now we have wrong values your fix will provide right ones, why would you ever what to turn that off?
>> It's not that you would want to turn that off (necessarily) but just because cgroups capability exists it doesn't mean they have actually been enabled and configured - in which case reading all the cgroup info is unnecessary startup overhead. So for now this is opt-in - as was the experimental cgroup support we added. Once it becomes clearer how this needs to be used we can adjust the defaults. For now this is enabling technology only.
>
> If cgroup are mounted they are on and the only way to know the configuration (such as no limits) is to actual read the cgroup filesystem.
> Therefore the flag make no sense.
>
>>> - log target container would make little sense since almost all linuxes run with croups on.
>> Again the capability is present but may not be enabled/configured.
>
> The capability is on if cgroup are mount and the only way to know the configuration is to read the cgroup filesystem.
>
>>> - For cpuset, the processes affinity mask already reflect cgroup setting so you don't need to look into cgroup for that
>>>    If you do, you would miss any processes specific affinity mask. So _cpu_count() should already be returning the right number of CPU's.
>> While the process affinity mask reflect cpusets (and we already use it for that reason), it doesn't reflect shares and quotas. And if shares/quotas are enforced and someone sets a custom affinity mask, what is it all supposed to mean? That's one of the main reasons to allow the number of cpu's to be hardwired via a flag. So it's better IMHO to read everything from the cgroups if configured to use cgroups.
>
> I'm not taking about shares and quotes, they should be read of course, but cpuset should be checked such as in _cpu_count.
>
> Here is the bug:
>
> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -Xlog:os=debug -cp . ForEver | grep proc
> [0.002s][debug][os] Initial active processor count set to 4
> ^C
> [rehn@rehn-ws dev]$ taskset --cpu-list 0-2,6 java -XX:+UseContainerSupport -Xlog:os=debug -cp . ForEver | grep proc
> [0.003s][debug][os] Initial active processor count set to 32
> ^C
>
> _cpu_count already does the right thing.
>
> Thanks, Robbin
>
>
>> Cheers,
>> David
>>>
>>> Thanks for trying to fixing this!
>>>
>>> /Robbin
>>>
>>> On 09/22/2017 04:27 PM, Bob Vandette wrote:
>>>> Please review these changes that improve on docker container detection and the
>>>> automatic configuration of the number of active CPUs and total and free memory
>>>> based on the containers resource limitation settings and metric data files.
>>>>
>>>> http://cr.openjdk.java.net/~bobv/8146115/webrev.00/ <http://cr.openjdk.java.net/~bobv/8146115/webrev.00/>
>>>>
>>>> These changes are enabled with -XX:+UseContainerSupport.
>>>>
>>>> You can enable logging for this support via -Xlog:os+container=trace.
>>>>
>>>> Since the dynamic selection of CPUs based on cpusets, quotas and shares
>>>> may not satisfy every users needs, I’ve added an additional flag to allow the
>>>> number of CPUs to be overridden.  This flag is named -XX:ActiveProcessorCount=xx.
>>>>
>>>>
>>>> Bob.
>>>>
>>>>
>>>>

123