RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Stefan Johansson
When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.

The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.

A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.

The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.

The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.

This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

-------------

Commit messages:
 - 8261401-check-effective
 - 8261401-self-review
 - 8261401-test-fixes
 - Only UseSHM if sanity check pass

Changes: https://git.openjdk.java.net/jdk/pull/2488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261401
  Stats: 92 lines in 4 files changed: 86 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Thomas Stuefe
On Tue, 9 Feb 2021 20:50:25 GMT, Stefan Johansson <[hidden email]> wrote:

> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

Hi Stefan,

Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not.

About your patch: The shmget test is good, I am not sure about the rest.

You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:

       EPERM  The SHM_HUGETLB flag was specified, but the caller was not
              privileged (did not have the CAP_IPC_LOCK capability).

So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().

About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.

I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.

Cheers, Thomas

src/hotspot/os/linux/os_linux.cpp line 3569:

> 3567: // The capability needed to lock memory CAP_IPC_LOCK
> 3568: #define CAP_IPC_LOCK      14
> 3569: #define CAP_IPC_LOCK_BIT  (1 << CAP_IPC_LOCK)

Can we not just include linux/capabilities.h ?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Stefan Johansson
On Wed, 10 Feb 2021 13:56:11 GMT, Thomas Stuefe <[hidden email]> wrote:

>> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>>
>> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>>
>> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>>
>> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>>
>> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>>
>> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
>
> src/hotspot/os/linux/os_linux.cpp line 3569:
>
>> 3567: // The capability needed to lock memory CAP_IPC_LOCK
>> 3568: #define CAP_IPC_LOCK      14
>> 3569: #define CAP_IPC_LOCK_BIT  (1 << CAP_IPC_LOCK)
>
> Can we not just include linux/capabilities.h ?

We can, I'll change it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages

Stefan Johansson
In reply to this post by Thomas Stuefe
On Wed, 10 Feb 2021 14:41:23 GMT, Thomas Stuefe <[hidden email]> wrote:

> Hi Stefan,
>
> Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not.
>

Nice to hear that you agree Thomas :)

> About your patch: The shmget test is good, I am not sure about the rest.
>
> You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:
>
> ```
>        EPERM  The SHM_HUGETLB flag was specified, but the caller was not
>               privileged (did not have the CAP_IPC_LOCK capability).
> ```
>
> So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().
>
This was my initial though as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK-limit defines. So the first `shmget()` will succeed as long as there is at least one large page available (and the MEMLOCK-limit is not smaller than the large page size).

I could of course design the other check to once again call `shmget()`, but with a size larger than the limit. If that check fails with `EPERM` we don't have the needed capability.

> About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.
>
In old kernels this was the case but according to `man setrlimit` this is no longer the case:
RLIMIT_MEMLOCK
  The maximum number of bytes of memory that may be locked into RAM ...
  In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be
  locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of
  memory that a privileged process may lock, and this limit instead governs the amount
  of memory that an unprivileged process may lock.

So if we have the capability the limit will never be a limiting factor.

> I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.
>

I agree that this warning is not covering all bases, but I though it was a good addition compared to just be silent  about it. But I like your idea about just doing the simplest sanity check here and then when a mapping goes bade try to figure out why it did.

In that case I would only include the first `shmget()` in the sanity check and file a separate issue for improving the warning.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2]

Stefan Johansson
In reply to this post by Stefan Johansson
> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:

  Thomas review
 
  Removed check for IPC_LOCK capability. If a large page mapping fails
  the errno is already present in the warning printed out. We could
  look at improving this to better explain when EPERM vs ENOMEM occurs.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2488/files
  - new: https://git.openjdk.java.net/jdk/pull/2488/files/376b2235..bf6cbce1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=00-01

  Stats: 38 lines in 1 file changed: 0 ins; 38 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2]

Thomas Stuefe
On Wed, 10 Feb 2021 18:28:51 GMT, Stefan Johansson <[hidden email]> wrote:

>> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>>
>> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>>
>> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>>
>> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>>
>> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>>
>> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Thomas review
>  
>   Removed check for IPC_LOCK capability. If a large page mapping fails
>   the errno is already present in the warning printed out. We could
>   look at improving this to better explain when EPERM vs ENOMEM occurs.

Looks almost good. Minor nit below.

src/hotspot/os/linux/os_linux.cpp line 3781:

> 3779:   }
> 3780:
> 3781:   log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system.");

IIUC here we end up if UseLargePages=true (by default or not) and we were unable to get any of our APIs to work? Should this be an unconditional printout? At least make it only unconditional if UseLargePages==true is not default? I know its only a theoretical problem now since default is false.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2]

Thomas Stuefe
On Wed, 10 Feb 2021 18:44:26 GMT, Thomas Stuefe <[hidden email]> wrote:

>> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Thomas review
>>  
>>   Removed check for IPC_LOCK capability. If a large page mapping fails
>>   the errno is already present in the warning printed out. We could
>>   look at improving this to better explain when EPERM vs ENOMEM occurs.
>
> Looks almost good. Minor nit below.

> > Hi Stefan,
> > Testing UseSHM makes definitely sense. Since SysV shm has more restrictions than the mmap API I am not sure that the former works where the latter does not.
>
> Nice to hear that you agree Thomas :)
>
> > About your patch: The shmget test is good, I am not sure about the rest.
> > You first attempt to `shmget(IPC_PRIVATE, page_size, SHM_HUGETLB)`. Then you separately scan for CAP_IPC_LOCK. But would shmget(SHM_HUGETLB) not fail with EPERM if CAP_IPC_LOCK is missing? man shmget says:
> > ```
> >        EPERM  The SHM_HUGETLB flag was specified, but the caller was not
> >               privileged (did not have the CAP_IPC_LOCK capability).
> > ```
> >
> >
> > So I think querying for EPERM after shmget should be sufficient? If CAP_IPC_LOCK was missing, the shmget should have failed and we should never get to the point of can_lock_memory().
>
> This was my initial though as well, but it turns out that the capability is only needed to "lock" more memory than the MEMLOCK-limit defines. So the first `shmget()` will succeed as long as there is at least one large page available (and the MEMLOCK-limit is not smaller than the large page size).
>
> I could of course design the other check to once again call `shmget()`, but with a size larger than the limit. If that check fails with `EPERM` we don't have the needed capability.

Okay, I see what you did now.

>
> > About the rlimit test: Assuming the man page is wrong and shmget succeeds even without the capability, you only print this limit if can_lock_memory() returned false. The way I understand this is that the limit can still be the limiting factor even with those capabilities in place.
>
> In old kernels this was the case but according to `man setrlimit` this is no longer the case:
>
> ```
> RLIMIT_MEMLOCK
>   The maximum number of bytes of memory that may be locked into RAM ...
>   In Linux kernels before 2.6.9, this limit controlled the amount of memory that could be
>   locked by a privileged process. Since Linux 2.6.9, no limits are placed on the amount of
>   memory that a privileged process may lock, and this limit instead governs the amount
>   of memory that an unprivileged process may lock.
> ```
>
> So if we have the capability the limit will never be a limiting factor.
>
> > I think it would make more sense to extend the error reporting if later real "shmget" calls fail. Note that later reserve calls can fail for a number of reasons (huge page pool exhausted, RLIMIT_MEMLOCK reached, SHMMAX or SHMALL reached, commit charge reached its limit...) which means that just reporting on RLIMIT_MEMLOCK would be arbitrary. Whether or not more intelligent error reporting makes sense depends also on whether we think we still need the SysV path at all. I personally doubt that this still makes sense.
>
> I agree that this warning is not covering all bases, but I though it was a good addition compared to just be silent about it. But I like your idea about just doing the simplest sanity check here and then when a mapping goes bade try to figure out why it did.
>
> In that case I would only include the first `shmget()` in the sanity check and file a separate issue for improving the warning.

This makes sense, especially since you could add that warning to both SysV and mmap paths since they share at least some of the restrictions. Like huge page pool size. It also would mean you don't pay for some of these tests unless needed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2]

Stefan Johansson
In reply to this post by Thomas Stuefe
On Wed, 10 Feb 2021 18:36:28 GMT, Thomas Stuefe <[hidden email]> wrote:

>> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Thomas review
>>  
>>   Removed check for IPC_LOCK capability. If a large page mapping fails
>>   the errno is already present in the warning printed out. We could
>>   look at improving this to better explain when EPERM vs ENOMEM occurs.
>
> src/hotspot/os/linux/os_linux.cpp line 3781:
>
>> 3779:   }
>> 3780:
>> 3781:   log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system.");
>
> IIUC here we end up if UseLargePages=true (by default or not) and we were unable to get any of our APIs to work? Should this be an unconditional printout? At least make it only unconditional if UseLargePages==true is not default? I know its only a theoretical problem now since default is false.

You are correct we only end up here if `UseLargePages` is true. Making it conditional like that would be consistent with the other warnings. So only warn if the user explicitly set `+UseLargePages`. So something like?
Suggestion:

  if (!FLAG_IS_DEFAULT(UseLargePages)) {
    log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system.");
  }

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v2]

Thomas Stuefe
On Wed, 10 Feb 2021 19:30:58 GMT, Stefan Johansson <[hidden email]> wrote:

>> src/hotspot/os/linux/os_linux.cpp line 3781:
>>
>>> 3779:   }
>>> 3780:
>>> 3781:   log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system.");
>>
>> IIUC here we end up if UseLargePages=true (by default or not) and we were unable to get any of our APIs to work? Should this be an unconditional printout? At least make it only unconditional if UseLargePages==true is not default? I know its only a theoretical problem now since default is false.
>
> You are correct we only end up here if `UseLargePages` is true. Making it conditional like that would be consistent with the other warnings. So only warn if the user explicitly set `+UseLargePages`. So something like?
> Suggestion:
>
>   if (!FLAG_IS_DEFAULT(UseLargePages)) {
>     log_warning(pagesize)("UseLargePages disabled, no large pages configured and available on the system.");
>   }

Seems reasonable and in line with what we do otherwise.

As a side note, I'm not a big fan of unconditional error printouts for non-fatal errors. Can mess with parsers (see eg JDK-8260902). The argument "we trace only if flag is explicitly set" is valid oc, but even so customers tend to bury flags in scripts and then forget them. But I know this is common procedure so this would be a separate discussion.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v3]

Stefan Johansson
In reply to this post by Stefan Johansson
> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:

  Only warn if UseLargePages was explicitly set.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2488/files
  - new: https://git.openjdk.java.net/jdk/pull/2488/files/bf6cbce1..6a2606be

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=01-02

  Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v3]

Thomas Stuefe
On Thu, 11 Feb 2021 07:48:53 GMT, Stefan Johansson <[hidden email]> wrote:

>> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>>
>> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>>
>> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>>
>> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>>
>> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>>
>> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Only warn if UseLargePages was explicitly set.

Looks good now. Thank you!

-------------

Marked as reviewed by stuefe (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v4]

Stefan Johansson
In reply to this post by Stefan Johansson
> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

Stefan Johansson has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - Merge branch 'master' into 8261401-shm-sanity
 - Only warn if UseLargePages was explicitly set.
 - Thomas review
   
   Removed check for IPC_LOCK capability. If a large page mapping fails
   the errno is already present in the warning printed out. We could
   look at improving this to better explain when EPERM vs ENOMEM occurs.
 - 8261401-check-effective
 - 8261401-self-review
 - 8261401-test-fixes
 - Only UseSHM if sanity check pass

-------------

Changes: https://git.openjdk.java.net/jdk/pull/2488/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=03
  Stats: 47 lines in 4 files changed: 44 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5]

Stefan Johansson
In reply to this post by Stefan Johansson
> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:

  Clean up test after merge

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2488/files
  - new: https://git.openjdk.java.net/jdk/pull/2488/files/681dc92b..a9d8c0f7

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=03-04

  Stats: 5 lines in 1 file changed: 0 ins; 5 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5]

Thomas Schatzl-4
On Mon, 15 Feb 2021 13:44:58 GMT, Stefan Johansson <[hidden email]> wrote:

>> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>>
>> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>>
>> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>>
>> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>>
>> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>>
>> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Clean up test after merge

Changes requested by tschatzl (Reviewer).

src/hotspot/os/linux/os_linux.cpp line 3574:

> 3572:     // 1. shmmax is too small for the request.
> 3573:     //    > check shmmax value: cat /proc/sys/kernel/shmmax
> 3574:     //    > increase shmmax value: echo "0xffffffff" > /proc/sys/kernel/shmmax

I'd avoid using a 32 bit constant on potentially 64 bit systems. According to that same man page, the limit on 64 bits is 127 TB which is more than the suggested 4 GB :)

The man page talks about `ULONG_MAX - 2^24` as default (for 64 bit systems?).

src/hotspot/os/linux/os_linux.cpp line 3570:

> 3568:   // Try to create a large shared memory segment.
> 3569:   int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W);
> 3570:   if (shmid == -1) {

I think the flags should contain the `SHM_HUGE_*` flags (considering large pages) similar to hugetlbfs for proper checking to avoid the failure like in [JDK-8261636](https://bugs.openjdk.java.net/browse/JDK-8261636) reported just recently for the same reason.

According to the [man pages](https://man7.org/linux/man-pages/man2/shmget.2.html)] such flags exist.

src/hotspot/os/linux/os_linux.cpp line 3578:

> 3576:     //    > check available large pages: cat /proc/meminfo
> 3577:     //    > increase amount of large pages:
> 3578:     //          echo new_value > /proc/sys/vm/nr_hugepages

This is just my opinion, but I would prefer to refer to some generic documentation about these options (and use the "new" locations in `/sys/kernel/mm/hugepages/hugepages-*/` instead of the "legacy" /proc/sys, e.g. `https://lwn.net/Articles/376606/` - yes this is not official documentation, but the best I could find on the spot)

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5]

Stefan Johansson
In reply to this post by Stefan Johansson
On Mon, 15 Feb 2021 13:44:58 GMT, Stefan Johansson <[hidden email]> wrote:

>> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>>
>> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>>
>> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>>
>> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>>
>> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>>
>> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Clean up test after merge

Thanks for taking a look Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5]

Stefan Johansson
In reply to this post by Thomas Schatzl-4
On Wed, 17 Feb 2021 10:14:18 GMT, Thomas Schatzl <[hidden email]> wrote:

>> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Clean up test after merge
>
> src/hotspot/os/linux/os_linux.cpp line 3570:
>
>> 3568:   // Try to create a large shared memory segment.
>> 3569:   int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W);
>> 3570:   if (shmid == -1) {
>
> I think the flags should contain the `SHM_HUGE_*` flags (considering large pages) similar to hugetlbfs for proper checking to avoid the failure like in [JDK-8261636](https://bugs.openjdk.java.net/browse/JDK-8261636) reported just recently for the same reason.
>
> According to the [man pages](https://man7.org/linux/man-pages/man2/shmget.2.html)] such flags exist.

I would agree if the later usage of `shmget()` would make use of those flags but it don't. We might want o add a CR for enabling use of other page sizes than the default large page size for SHM as well. But there is also discussions about trying to remove this SHM support.

> src/hotspot/os/linux/os_linux.cpp line 3574:
>
>> 3572:     // 1. shmmax is too small for the request.
>> 3573:     //    > check shmmax value: cat /proc/sys/kernel/shmmax
>> 3574:     //    > increase shmmax value: echo "0xffffffff" > /proc/sys/kernel/shmmax
>
> I'd avoid using a 32 bit constant on potentially 64 bit systems. According to that same man page, the limit on 64 bits is 127 TB which is more than the suggested 4 GB :)
>
> The man page talks about `ULONG_MAX - 2^24` as default (for 64 bit systems?).

The comment is a copy of the old one where we actually use `shmget()`, but I agree that it could use an update. Not even sure we have to suggest a specific value, something like:
Suggestion:

    //    > increase shmmax value: echo "new value" > /proc/sys/kernel/shmmax
Might be good enough.

I will also update the original comment.

> src/hotspot/os/linux/os_linux.cpp line 3578:
>
>> 3576:     //    > check available large pages: cat /proc/meminfo
>> 3577:     //    > increase amount of large pages:
>> 3578:     //          echo new_value > /proc/sys/vm/nr_hugepages
>
> This is just my opinion, but I would prefer to refer to some generic documentation about these options (and use the "new" locations in `/sys/kernel/mm/hugepages/hugepages-*/` instead of the "legacy" /proc/sys, e.g. `https://lwn.net/Articles/376606/` - yes this is not official documentation, but the best I could find on the spot)

I agree, the documentation I usually refer to is:
https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

Since our SHM-support currently only handles the default large page size I think we should suggest to use:
`sysctl -w vm.nr_hugepages=new_value`

Suggestion:

    //          sysctl -w vm.nr_hugepages=new_value
    //    > For more information regarding large pages please refer to:
    //      https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v6]

Stefan Johansson
In reply to this post by Stefan Johansson
> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>
> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>
> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>
> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>
> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>
> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.

Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:

  Thomas review - Updated comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2488/files
  - new: https://git.openjdk.java.net/jdk/pull/2488/files/a9d8c0f7..8354e74f

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2488&range=04-05

  Stats: 9 lines in 1 file changed: 4 ins; 0 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2488.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2488/head:pull/2488

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v6]

Thomas Schatzl-4
On Wed, 17 Feb 2021 11:06:59 GMT, Stefan Johansson <[hidden email]> wrote:

>> When large pages are enabled on Linux (using -XX:+UseLargePages), both UseHugeTLBFS and UseSHM can be used. We prefer to use HugeTLBFS and first do a sanity check to see if this kind of large pages are available and if so we disable UseSHM.
>>
>> The problematic part is when HugeTLBFS pages are not available, then we disable this flag and without doing any sanity check for UseSHM, we mark large pages as enabled using SHM. One big problem with this is that SHM also requires the same type of explicitly allocated huge pages as HugeTLBFS and also privileges to lock memory. So it is likely that in the case of not being able to use HugeTLBFS we probably can't use SHM either.
>>
>> A fix for this would be to do a similar sanity check as currently done for HugeTLBFS and if it fails disable UseLargePages since we will always fail such allocation attempts anyways.
>>
>> The proposed sanity check consist of two part, where the first is just trying create a shared memory segment using `shmget()` with SHM_HUGETLB to use large pages. If this fails there is no idea in trying to use SHM to get large pages.
>>
>> The second part checks if the process has privileges to lock memory or if there will be a limit for the SHM usage. I think this would be a nice addition since it will notify the user about the limit and explain why large page mappings fail. The implementation parses `/proc/self/status` to make sure the needed capability is available.
>>
>> This change needs two tests to be updated to handle that large pages not can be disabled even when run with +UseLargePages. One of these tests are also updated in [PR#2486](https://github.com/openjdk/jdk/pull/2486) and I plan to get that integrated before this one.
>
> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Thomas review - Updated comments

I can't approve the changes you made in the "Suggested changes" boxes, or even comment on them as "looks good" directly, but they look all good.

Marked as reviewed by tschatzl (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v5]

Thomas Schatzl-4
In reply to this post by Stefan Johansson
On Wed, 17 Feb 2021 10:32:09 GMT, Stefan Johansson <[hidden email]> wrote:

>> src/hotspot/os/linux/os_linux.cpp line 3570:
>>
>>> 3568:   // Try to create a large shared memory segment.
>>> 3569:   int shmid = shmget(IPC_PRIVATE, page_size, SHM_HUGETLB|IPC_CREAT|SHM_R|SHM_W);
>>> 3570:   if (shmid == -1) {
>>
>> I think the flags should contain the `SHM_HUGE_*` flags (considering large pages) similar to hugetlbfs for proper checking to avoid the failure like in [JDK-8261636](https://bugs.openjdk.java.net/browse/JDK-8261636) reported just recently for the same reason.
>>
>> According to the [man pages](https://man7.org/linux/man-pages/man2/shmget.2.html)] such flags exist.
>
> I would agree if the later usage of `shmget()` would make use of those flags but it don't. We might want o add a CR for enabling use of other page sizes than the default large page size for SHM as well. But there is also discussions about trying to remove this SHM support.

I am good with creating a CR for that. We can always close it later.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261401: Add sanity check for UseSHM large pages similar to the one used with hugetlb large pages [v3]

Stefan Johansson
In reply to this post by Thomas Stuefe
On Thu, 11 Feb 2021 08:38:24 GMT, Thomas Stuefe <[hidden email]> wrote:

>> Stefan Johansson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Only warn if UseLargePages was explicitly set.
>
> Looks good now. Thank you!

Thanks for the reviews @tstuefe and @tschatzl

FYI, I just filed three more enhancements:
[JDK-8261894: Remove support for UseSHM](https://bugs.openjdk.java.net/browse/JDK-8261894)
[JDK-8261896: Add support for multiple page sizes for UseSHM](https://bugs.openjdk.java.net/browse/JDK-8261896)
[JDK-8261899: Improve warning for UseSHM failures](https://bugs.openjdk.java.net/browse/JDK-8261899)

-------------

PR: https://git.openjdk.java.net/jdk/pull/2488
12