RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

classic Classic list List threaded Threaded
62 messages Options
1234
Reply | Threaded
Open this post in threaded view
|

RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

Marcus G K Williams
Use 2m pages for executable large
pages and in large page requests less
than 1g on linux.

- Add os::exec_large_page_size() that
returns 2m as size
- Add os::select_large_page_size() to return
correct large page size for size_t bytes
- Add 2m size to _page_sizes array
- Update reserve_memory_special methods
to set/use large_page_size based on exec
size
- Update large page not reserved warnings
to include large_page_size attempted
- Update TestLargePageUseForAuxMemory.java
to expect 2m large pages in some instances

Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Commit messages:
 - Default to 2M LargePages for code

Changes: https://git.openjdk.java.net/jdk/pull/1153/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8256155
  Stats: 99 lines in 3 files changed: 69 ins; 0 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1153.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

Stefan Johansson
On Wed, 11 Nov 2020 01:48:46 GMT, Marcus G K Williams <[hidden email]> wrote:

> Use 2m pages for executable large
> pages and in large page requests less
> than 1g on linux.
>
> - Add os::exec_large_page_size() that
> returns 2m as size
> - Add os::select_large_page_size() to return
> correct large page size for size_t bytes
> - Add 2m size to _page_sizes array
> - Update reserve_memory_special methods
> to set/use large_page_size based on exec
> size
> - Update large page not reserved warnings
> to include large_page_size attempted
> - Update TestLargePageUseForAuxMemory.java
> to expect 2m large pages in some instances
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

Hi and welcome :)

I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
* Why do we have a special case for `exec` when selecting a large page size?
* If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

Thomas Stuefe
In reply to this post by Marcus G K Williams
On Wed, 11 Nov 2020 01:48:46 GMT, Marcus G K Williams <[hidden email]> wrote:

> Use 2m pages for executable large
> pages and in large page requests less
> than 1g on linux.
>
> - Add os::exec_large_page_size() that
> returns 2m as size
> - Add os::select_large_page_size() to return
> correct large page size for size_t bytes
> - Add 2m size to _page_sizes array
> - Update reserve_memory_special methods
> to set/use large_page_size based on exec
> size
> - Update large page not reserved warnings
> to include large_page_size attempted
> - Update TestLargePageUseForAuxMemory.java
> to expect 2m large pages in some instances
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

Hi,

this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.

Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.

I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.

What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)

Why is this proposal hard coded to 2M pages?

What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".

One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?

What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?

The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.

For SHM, I think you need to make sure that alignment matches SHMLBA?

It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.

Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).

The linux-2m-page-specific code in the platform-generic G1 test seems wrong.

Cheers, Thomas

src/hotspot/os/linux/os_linux.cpp line 3970:

> 3968:                                             char* req_addr, bool exec) {
> 3969:   size_t large_page_size;
> 3970:   large_page_size = os::select_large_page_size(bytes, exec);

The "os" is shared and platform generic. Please don't add anything there unless you write (and test as much as possible) the different platforms. I do not see why this API should even be exported from this unit.

src/hotspot/os/linux/os_linux.cpp line 4002:

> 4000:     char msg[128];
> 4001:     jio_snprintf(msg, sizeof(msg), "Failed to reserve shared memory with large_page_size: " SIZE_FORMAT ".", large_page_size);
> 4002:     shm_warning_format_with_errno("%s", msg);

Why the double printf here? But you can just use Univeral Logging ` log_info(os)("..") `. See e.g. thread creation in this file for examples.

test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 80:

> 78:     }
> 79:
> 80:     static void testVM(String what, long heapsize, boolean cardsShouldUseLargePages, boolean bitmapShouldUseLargePages, boolean largePages2m) throws Exception {

Having this linux-specific stuff in a generic G1 test :(

test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 150:

> 148:         if (Platform.isLinux() && largePageSize != largePageExecSize) {
> 149:             try {
> 150:                 Scanner scan_hugepages = new Scanner(new File("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"));

2M hard coded.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

Marcus G K Williams
On Thu, 12 Nov 2020 15:36:13 GMT, Thomas Stuefe <[hidden email]> wrote:

>> Use 2m pages for executable large
>> pages and in large page requests less
>> than 1g on linux.
>>
>> - Add os::exec_large_page_size() that
>> returns 2m as size
>> - Add os::select_large_page_size() to return
>> correct large page size for size_t bytes
>> - Add 2m size to _page_sizes array
>> - Update reserve_memory_special methods
>> to set/use large_page_size based on exec
>> size
>> - Update large page not reserved warnings
>> to include large_page_size attempted
>> - Update TestLargePageUseForAuxMemory.java
>> to expect 2m large pages in some instances
>>
>> Signed-off-by: Marcus G K Williams <[hidden email]>
>
> Hi,
>
> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
>
> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>
> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
>
> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
>
> Why is this proposal hard coded to 2M pages?
>
> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>
> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
>
> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>
> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
>
> For SHM, I think you need to make sure that alignment matches SHMLBA?
>
> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>
> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>
> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
>
> Cheers, Thomas

Hi Thomas,

Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some.

**Responses below inline:**

> Hi,
>
> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
>
> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>

I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance).

To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.

> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
>
> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)

I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted.

>
> Why is this proposal hard coded to 2M pages?
>

To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G.

> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>

Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 
This is where 2m pages are added.

However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077
we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G

So in
https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229
we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested.

> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
>
> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>

My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back.

> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
>
> For SHM, I think you need to make sure that alignment matches SHMLBA?

Looking into this.

>
> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>

I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations.

> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>

Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those.

> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.

Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled?

>
> Cheers, Thomas

Thanks again for the review.

> src/hotspot/os/linux/os_linux.cpp line 3970:
>
>> 3968:                                             char* req_addr, bool exec) {
>> 3969:   size_t large_page_size;
>> 3970:   large_page_size = os::select_large_page_size(bytes, exec);
>
> The "os" is shared and platform generic. Please don't add anything there unless you write (and test as much as possible) the different platforms. I do not see why this API should even be exported from this unit.

Understood. Will move from src/hotspot/share/runtime/os.hpp to src/hotspot/os/linux/os_linux.hpp .

> src/hotspot/os/linux/os_linux.cpp line 4002:
>
>> 4000:     char msg[128];
>> 4001:     jio_snprintf(msg, sizeof(msg), "Failed to reserve shared memory with large_page_size: " SIZE_FORMAT ".", large_page_size);
>> 4002:     shm_warning_format_with_errno("%s", msg);
>
> Why the double printf here? But you can just use Univeral Logging ` log_info(os)("..") `. See e.g. thread creation in this file for examples.

Looking into this.

> test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 80:
>
>> 78:     }
>> 79:
>> 80:     static void testVM(String what, long heapsize, boolean cardsShouldUseLargePages, boolean bitmapShouldUseLargePages, boolean largePages2m) throws Exception {
>
> Having this linux-specific stuff in a generic G1 test :(

Understood. Not sure how others would handle this as assumptions would change if my code is merged and one gets 2M large pages on smaller mem reservations but only on linux. Any advice?

> test/hotspot/jtreg/gc/g1/TestLargePageUseForAuxMemory.java line 150:
>
>> 148:         if (Platform.isLinux() && largePageSize != largePageExecSize) {
>> 149:             try {
>> 150:                 Scanner scan_hugepages = new Scanner(new File("/sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages"));
>
> 2M hard coded.

Understood. Not sure how others would handle this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap

Marcus G K Williams
On Wed, 18 Nov 2020 19:13:02 GMT, Marcus G K Williams <[hidden email]> wrote:

>> Hi,
>>
>> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
>>
>> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>>
>> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
>>
>> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
>>
>> Why is this proposal hard coded to 2M pages?
>>
>> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>>
>> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
>>
>> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>>
>> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
>>
>> For SHM, I think you need to make sure that alignment matches SHMLBA?
>>
>> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>>
>> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>>
>> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
>>
>> Cheers, Thomas
>
> Hi Thomas,
>
> Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some.
>
> **Responses below inline:**
>
>> Hi,
>>
>> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
>>
>> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>>
>
> I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance).
>
> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>
>> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
>>
>> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
>
> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted.
>
>>
>> Why is this proposal hard coded to 2M pages?
>>
>
> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G.
>
>> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>>
>
> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 
> This is where 2m pages are added.
>
> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077
> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G
>
> So in
> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229
> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested.
>
>> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
>>
>> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>>
>
> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back.
>
>> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
>>
>> For SHM, I think you need to make sure that alignment matches SHMLBA?
>
> Looking into this.
>
>>
>> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>>
>
> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations.
>
>> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>>
>
> Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those.
>
>> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
>
> Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled?
>
>>
>> Cheers, Thomas
>
> Thanks again for the review.

Hi Stefan,

Thanks so much for your review.

> Hi and welcome :)
>
> I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
>
> * Why do we have a special case for `exec` when selecting a large page size?

To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.

Perhaps I should just select the page size <= bytes requested and remove 'exec' special case.

> * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.

os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Marcus G K Williams
In reply to this post by Marcus G K Williams
> Add 2M LargePages to _page_sizes
>
>     Use 2m pages for large page requests
>     less than 1g on linux when 1G are default
>     pages
>
>     - Add os::Linux::large_page_size_2m() that
>     returns 2m as size
>     - Add os::Linux::select_large_page_size() to return
>     correct large page size for size_t bytes
>     - Add 2m size to _page_sizes array
>     - Update reserve_memory_special methods
>     to set/use large_page_size based on bytes reserved
>     - Update large page not reserved warnings
>     to include large_page_size attempted
>     - Update TestLargePageUseForAuxMemory.java
>     to expect 2m large pages in some instances
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains one commit:

  Add 2M LargePages to _page_sizes
 
  Use 2m pages for large page requests
  less than 1g on linux when 1G are default
  pages
 
  - Add os::Linux::large_page_size_2m() that
  returns 2m as size
  - Add os::Linux::select_large_page_size() to return
  correct large page size for size_t bytes
  - Add 2m size to _page_sizes array
  - Update reserve_memory_special methods
  to set/use large_page_size based on bytes reserved
  - Update large page not reserved warnings
  to include large_page_size attempted
  - Update TestLargePageUseForAuxMemory.java
  to expect 2m large pages in some instances
 
  Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Changes: https://git.openjdk.java.net/jdk/pull/1153/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=01
  Stats: 105 lines in 3 files changed: 75 ins; 0 del; 30 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1153.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Marcus G K Williams
In reply to this post by Marcus G K Williams
On Wed, 18 Nov 2020 19:22:06 GMT, Marcus G K Williams <[hidden email]> wrote:

>> Hi Thomas,
>>
>> Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some.
>>
>> **Responses below inline:**
>>
>>> Hi,
>>>
>>> this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
>>>
>>> Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>>>
>>
>> I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance).
>>
>> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>>
>>> I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
>>>
>>> What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
>>
>> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted.
>>
>>>
>>> Why is this proposal hard coded to 2M pages?
>>>
>>
>> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. Also it was an implementation suggestion from some Oracle engineers on the topic, though my implementation of those suggestions could be suspect. Perhaps we should detect the smallest large page size in _page_sizes array that will fit the requested memory amount. However populating _page_sizes array is complicated by the fact that the current jdk code relies heavily on the default large page size, almost as if that is the only size that can be used. 2M large page sizes are the default default_large_page_size in Linux and should be available even if one configures the default_large_page_size to 1G.
>>
>>> What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>>>
>>
>> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790 
>> This is where 2m pages are added.
>>
>> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077
>> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G
>>
>> So in
>> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229
>> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested.
>>
>>> One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
>>>
>>> What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>>>
>>
>> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back.
>>
>>> The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
>>>
>>> For SHM, I think you need to make sure that alignment matches SHMLBA?
>>
>> Looking into this.
>>
>>>
>>> It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>>>
>>
>> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations.
>>
>>> Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>>>
>>
>> Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those.
>>
>>> The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
>>
>> Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled?
>>
>>>
>>> Cheers, Thomas
>>
>> Thanks again for the review.
>
> Hi Stefan,
>
> Thanks so much for your review.
>
>> Hi and welcome :)
>>
>> I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
>>
>> * Why do we have a special case for `exec` when selecting a large page size?
>
> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>
> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case.
>
>> * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.
>
> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved.

Pushed a new patch removing exec references and instead use large page size based on requested memory size bytes. Moved added definitions from os to os::Linux.

More work/research in progress.

----

Add 2M LargePages to _page_sizes

    Use 2m pages for large page requests
    less than 1g on linux when 1G are default
    pages

    - **Add os::Linux::large_page_size_2m() that
    returns 2m as size**
    - **Add os::Linux::select_large_page_size() to return
    correct large page size for size_t bytes**
    - Add 2m size to _page_sizes array
    - Update reserve_memory_special methods
    to set/use large_page_size based on bytes reserved
    - Update large page not reserved warnings
    to include large_page_size attempted
    - Update TestLargePageUseForAuxMemory.java
    to expect 2m large pages in some instances

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Stefan Johansson
In reply to this post by Marcus G K Williams
On Wed, 18 Nov 2020 19:22:06 GMT, Marcus G K Williams <[hidden email]> wrote:

> Hi Stefan,
>
> Thanks so much for your review.
>
> > Hi and welcome :)
> > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
> >
> > * Why do we have a special case for `exec` when selecting a large page size?
>
> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>
> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case.
>
Yes, I see no reason to keep that special case and we want to keep this code as general as possible. Looking at the code in `os::Linux::find_default_large_page_size()` it looks like S390 supports 1M large pages, so we cannot assume 2M. I suggest using a technique similar to the one used in `os::Linux::find_large_page_size` to find supported page sizes. If you scan `/sys/kernel/mm/hugepages` and populate  `_page_sizes` using the information found we know we only get supported sizes.

> > * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.
>
> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved.

You are correct that the default size might indeed be 1G, so using something like I suggest above to figure  out the available page sizes and then using an appropriate one given the size of the mapping sounds good.

Please also avoid force pushing changes to open PRs since it makes it harder to follow what changes between updates. It is fine for a PR to contain multiple commits and if you need to update with things from the main branch you should merge rather than rebase.

Cheers,
Stefan

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Thomas Stuefe
On Thu, 19 Nov 2020 08:19:59 GMT, Stefan Johansson <[hidden email]> wrote:

>> Hi Stefan,
>>
>> Thanks so much for your review.
>>
>>> Hi and welcome :)
>>>
>>> I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
>>>
>>> * Why do we have a special case for `exec` when selecting a large page size?
>>
>> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>>
>> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case.
>>
>>> * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.
>>
>> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved.
>
>> Hi Stefan,
>>
>> Thanks so much for your review.
>>
>> > Hi and welcome :)
>> > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
>> >
>> > * Why do we have a special case for `exec` when selecting a large page size?
>>
>> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>>
>> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case.
>>
> Yes, I see no reason to keep that special case and we want to keep this code as general as possible. Looking at the code in `os::Linux::find_default_large_page_size()` it looks like S390 supports 1M large pages, so we cannot assume 2M. I suggest using a technique similar to the one used in `os::Linux::find_large_page_size` to find supported page sizes. If you scan `/sys/kernel/mm/hugepages` and populate  `_page_sizes` using the information found we know we only get supported sizes.
>
>> > * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.
>>
>> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved.
>
> You are correct that the default size might indeed be 1G, so using something like I suggest above to figure  out the available page sizes and then using an appropriate one given the size of the mapping sounds good.
>
> Please also avoid force pushing changes to open PRs since it makes it harder to follow what changes between updates. It is fine for a PR to contain multiple commits and if you need to update with things from the main branch you should merge rather than rebase.
>
> Cheers,
> Stefan

Hi Markus,

thanks, and a belated welcome!

Some initial background:

We at SAP are maintainers for a number of ports, among others AIX and linux ppc/s390 as well as some propietary ones (e.g. HPUX or ia64). So I wear my platform glasses when looking at this code.

IMHO the virtual memory layer in hotspot - os::reserve_memory() and all its friends - could do with a revamp. At least a consistent API documentation :-/. Supposed to be an API-independent abstraction, its facade breaks in many places. See e.g. JDK-8255978, JDK-8253649 (all windows), AIX sysV shmem handling, @AntonKozlov's valiant attempt to add MAP_JIT code heap reservation on MacOS (https://github.com/openjdk/jdk/pull/294), or the relative difficulty with which support for JEP 316 (from Intel) had been added.

Hence my initial caution. Every new feature increases complexity for us maintainers. Especially if it continues the bad tradition of not documenting or commenting anything. Since I do not know whether Intel sticks around to maintain this contribution (bit of a mixed track record there, see e.g. JDK-8256181), we must plan on maintenance falling to us.

That said, now that I understand better what you want to do, your plan certainly makes sense and is useful.

One of the more pressing concerns I have is that the changes to reserve_memory() would somehow be observable from the outside and/or leak back into the os layer when calling os::commit_memory/uncommit_memory/release_memory. This is the case with @AntonKozlov's MAP_JIT change: it requires a matching commit call in os::commit_memory() to be made for executable memory allocated with os::reserve_memory(), and therefore exposed one weakness of the os::reserve_memory() API, that its very difficult to pass along meta information about memory mappings.

I think this is not the case here, but I'm not sure and we should be sure.

**More remarks inline.**

> Hi Thomas,
>
> Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some.
>
> **Responses below inline:**
>
> > Hi,
> > this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
> > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>
> I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance).

Please beef up the JBS issue a bit. If you do not have access to it, you can send the text to me I will update it. Or even easier, just update the PR description and we copy the text to the JBS.

JBS tickets are supposed to keep information about what we did and why for a long time. When formulating the text, just imagine the reader to be someone in the future with general knowledge in your field but without particular knowledge about this very case. I know this is a vague description though; for an example, see e.g. https://bugs.openjdk.java.net/browse/JDK-8255978.

>
> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.

Right, and as Stefan suggested, this should be kept more "fluid" and not be hard coded to 2M, nor to just one additional large page. Maybe the system has four page sizes (our propietary HPUX has that, not that it matters here).

>
> > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
> > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
>
> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted.

<snip>

>
> > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>
> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790
> This is where 2m pages are added.
>
> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077
> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G
>
> So in
> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229
> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested.


We need to decide on whether we want to do this for the code heap only or for every reservation done with reserve_memory_special (I really dislike that name btw). In your proposal you "piggyback" on the exec property as a standin for "code heap", which is not clean and also not necessarily true. So:

a) If we only want to do this for the code heap, we could think about creating an own API for allocating the code heap. E.g. os::reserve_code_space() and os::release_code_space(). This is one of the ideas @AntonKozlov came up with to circumvent the need for a fully fledged revamp of these APIs while still being able to move his PR forward.

b) If we want to do this for all callers of reserve_memory_special(), we should also remove any mention of "exec" and just implement that.

I currently favour (b) but would like to know opinions of others.

>
> > One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
> > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>
> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back.

Okay. We do not expect every contributor to have exotic test machines, but this means we will have to do that testing. We need to know to plan in these efforts.

>
> > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
> > For SHM, I think you need to make sure that alignment matches SHMLBA?
>
> Looking into this.
>
> > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>
> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations.
>
> > Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>
> Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those.

When I write API specs I basically mean "new code should comment better". That can be as simple as a one liner above your os::Linux::select_large_page_size() function.

About regression tests, we have a google-test suite (see test/hotspot/gtest) which would be the appropiate point to put in tests.

>
> > The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
>
> Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled?

I defer to the G1 folks for that.

>
> > Cheers, Thomas
>
> Thanks again for the review.

Sure. Thanks for the much more clear information.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Thomas Stuefe
On Thu, 19 Nov 2020 13:08:05 GMT, Thomas Stuefe <[hidden email]> wrote:

>>> Hi Stefan,
>>>
>>> Thanks so much for your review.
>>>
>>> > Hi and welcome :)
>>> > I haven't started reviewing the code in detail but a first quick glance raised a couple of questions/comments:
>>> >
>>> > * Why do we have a special case for `exec` when selecting a large page size?
>>>
>>> To my knowledge 2M is the smallest large pages size supported by Linux at the moment. Hardcoding 2M pages was an attempt to simplify the reservation of code memory using LargePages. In most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>>>
>>> Perhaps I should just select the page size <= bytes requested and remove 'exec' special case.
>>>
>> Yes, I see no reason to keep that special case and we want to keep this code as general as possible. Looking at the code in `os::Linux::find_default_large_page_size()` it looks like S390 supports 1M large pages, so we cannot assume 2M. I suggest using a technique similar to the one used in `os::Linux::find_large_page_size` to find supported page sizes. If you scan `/sys/kernel/mm/hugepages` and populate  `_page_sizes` using the information found we know we only get supported sizes.
>>
>>> > * If we need the special case, `exec_large_page_size()` should not be hard code to return 2m but rather use `os::default_large_page_size()`.
>>>
>>> os::default_large_page_size() will not necessarily be small enough for code memory reservations if the os::default_large_page_size() = 1G, in those cases we would get 4k on most linux x86_64 variants. My attempt is to ensure the smallest large_page_size availabe is used for code memory reservations. Perhaps my 2M hardcoding was a mistake and I should discover this size and select it based on the bytes being reserved.
>>
>> You are correct that the default size might indeed be 1G, so using something like I suggest above to figure  out the available page sizes and then using an appropriate one given the size of the mapping sounds good.
>>
>> Please also avoid force pushing changes to open PRs since it makes it harder to follow what changes between updates. It is fine for a PR to contain multiple commits and if you need to update with things from the main branch you should merge rather than rebase.
>>
>> Cheers,
>> Stefan
>
> Hi Markus,
>
> thanks, and a belated welcome!
>
> Some initial background:
>
> We at SAP are maintainers for a number of ports, among others AIX and linux ppc/s390 as well as some propietary ones (e.g. HPUX or ia64). So I wear my platform glasses when looking at this code.
>
> IMHO the virtual memory layer in hotspot - os::reserve_memory() and all its friends - could do with a revamp. At least a consistent API documentation :-/. Supposed to be an API-independent abstraction, its facade breaks in many places. See e.g. JDK-8255978, JDK-8253649 (all windows), AIX sysV shmem handling, @AntonKozlov's valiant attempt to add MAP_JIT code heap reservation on MacOS (https://github.com/openjdk/jdk/pull/294), or the relative difficulty with which support for JEP 316 (from Intel) had been added.
>
> Hence my initial caution. Every new feature increases complexity for us maintainers. Especially if it continues the bad tradition of not documenting or commenting anything. Since I do not know whether Intel sticks around to maintain this contribution (bit of a mixed track record there, see e.g. JDK-8256181), we must plan on maintenance falling to us.
>
> That said, now that I understand better what you want to do, your plan certainly makes sense and is useful.
>
> One of the more pressing concerns I have is that the changes to reserve_memory() would somehow be observable from the outside and/or leak back into the os layer when calling os::commit_memory/uncommit_memory/release_memory. This is the case with @AntonKozlov's MAP_JIT change: it requires a matching commit call in os::commit_memory() to be made for executable memory allocated with os::reserve_memory(), and therefore exposed one weakness of the os::reserve_memory() API, that its very difficult to pass along meta information about memory mappings.
>
> I think this is not the case here, but I'm not sure and we should be sure.
>
> **More remarks inline.**
>
>> Hi Thomas,
>>
>> Thanks so much for your review. Please bear with me as this if my first patch to JDK community. But I have pushed patches to other open source communities (OpenDaylight, ONAP, OpenStack) and worked as a committer in some.
>>
>> **Responses below inline:**
>>
>> > Hi,
>> > this seems like a improvement for a very specific scenario (2M instead of 1G pages on x86(?) Linux(?)). At the moment this feels more like an early prototype. The lack of comments/documentation is not helping.
>> > Both JBS and PR are a bit taciturn. It would help if you could elaborate a bit. E.g. is this just for Linux? for x86 only? since the ticket talks about 4K pages, which are not universal across all architectures.
>>
>> I appreciate the feedback. Perhaps the lack of detail in the pull request/JDK issue is a function of my zoomed focus on the specific purpose and lack of understanding about how much detail is normally included. The purpose of the patch/issue is to enable code hugepage memory reservations on Linux when the JDK is configured with 1G hugepages (LargePages in JDK parlance).
>
> Please beef up the JBS issue a bit. If you do not have access to it, you can send the text to me I will update it. Or even easier, just update the PR description and we copy the text to the JBS.
>
> JBS tickets are supposed to keep information about what we did and why for a long time. When formulating the text, just imagine the reader to be someone in the future with general knowledge in your field but without particular knowledge about this very case. I know this is a vague description though; for an example, see e.g. https://bugs.openjdk.java.net/browse/JDK-8255978.
>
>>
>> To my knowledge, in most cases currently code memory is reserved in default page size of the system when using 1G LargePages because it does not require 1G or larger reservations. In modern Linux variants default page size seems to be 4k on x86_64. In other architectures it could be up to 64k. The purpose of the patch is to enable the use of smaller LargePages for reservations less than 1G when LargePages are enabled and 1G is set as LargePageSizeInBytes, so as not to fall back to 4k-64k pages for these reservations.
>
> Right, and as Stefan suggested, this should be kept more "fluid" and not be hard coded to 2M, nor to just one additional large page. Maybe the system has four page sizes (our propietary HPUX has that, not that it matters here).
>
>>
>> > I can glean some of what you want to do from the patch itself, but the spec is vague so there is no way to verify if the patch matches the spec.
>> > What does page size have to do with exec permission? This should not be tied to exec. The whole patch should not contain the word "exec" :)
>>
>> I'd appreciate any advice on writing a less vague spec. I have used exec as a stand-in for code memory reservations in my descriptions, mostly due to the fact that a 'bool exec' is used in functions that reserve HugePages and this later is translated into 'PROT_EXEC' when mmap is called, "exec" is passed in but not used in SHM. These are the particular memory reservations we wanted the patch to affect when using 1G LargePages. However I will remove those references if unwarranted.
>
> <snip>
>
>>
>> > What memory regions are supposed to be affected by this? JBS ticket talks about "code, card table and other".
>>
>> Code is the target memory region. However there are some other instances where large_page reservation is happening due to the addition of 2M pages as an option. Some calls fail and error when adding 2M pages to _page_sizes array in the company of 1G pages. See line https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR3790
>> This is where 2m pages are added.
>>
>> However at https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4077
>> we get failures after the addition of 2M sizes to _page_sizes array, due to some smaller reservations that happen regardless of the LargePageSizeInBytes=1G
>>
>> So in
>> https://github.com/openjdk/jdk/pull/1153/files/daba99ac5f46dadb263caafa7ff87566d6d7dc58#diff-aeec57d804d56002f26a85359fc4ac8b48cfc249d57c656a30a63fc6bf3457adR4229
>> we make sure that we select the largest page size that works with the bytes requested for reservation. Perhaps we shouldn't have exec as a special case, as the large page returned will be the same based on the size requested.
>
>
> We need to decide on whether we want to do this for the code heap only or for every reservation done with reserve_memory_special (I really dislike that name btw). In your proposal you "piggyback" on the exec property as a standin for "code heap", which is not clean and also not necessarily true. So:
>
> a) If we only want to do this for the code heap, we could think about creating an own API for allocating the code heap. E.g. os::reserve_code_space() and os::release_code_space(). This is one of the ideas @AntonKozlov came up with to circumvent the need for a fully fledged revamp of these APIs while still being able to move his PR forward.
>
> b) If we want to do this for all callers of reserve_memory_special(), we should also remove any mention of "exec" and just implement that.
>
> I currently favour (b) but would like to know opinions of others.
>
>>
>> > One problem I see is that the notion of "we have a small standard page and a single large pinned page size" is - I believe - baked in into a few places. Are there any places where an implicit assumption of the page size or their "pinned-ness" could break things now (see also below remark about UseSHM)? For instance, are these pages pinned on all our platforms, and if no, could code be affected which commits/uncommits and assumes a certain page size?
>> > What tests have you run? On what platforms? Also platforms with different page sizes? How well did you test UseSHM?
>>
>> My architecture knowledge outside of x86_64 is limited. I've been looking into this and thinking about it and I will have some more comments in the next day or so. For UseSHM I ran some negative tests but I will do some more rigorous testing and report back.
>
> Okay. We do not expect every contributor to have exotic test machines, but this means we will have to do that testing. We need to know to plan in these efforts.
>
>>
>> > The latter is interesting because arguably there is the bigger behavioral change. TLBFS path was using a mixture of large and small pages anyway, so adding another page size into the mix is not a big stretch. But for SHM, things would change: where before reserve_memory_special would return NULL and we'd invoke fallback reservation, now we return a region consisting of 2M pinned pages.
>> > For SHM, I think you need to make sure that alignment matches SHMLBA?
>>
>> Looking into this.
>>
>> > It was not clear from the patch or the JBS item whether you propose to change the semantics of LargePageSizeInBytes. E.g. what happens if the value specified explicitely is smaller than your exec_page size? Your patch seems to give preference to exec_page_size. But this would be a behavioral change, and may need a CSR.
>>
>> I'm open to removing exec references and just enabling multiple page sizes which would allow for 2M pages to be used by code memory reservations.
>>
>> > Finally, comments would be nice. Clear API specs. Extending regression tests for reserve_memory_special would be good too, to test the new behavior (for gtests examples, see test/hotspot/gtest/runtime in the source folder).
>>
>> Thanks. I will push an updated patch w/ comments. Will attempt clear API spec and regression tests for reserve_memory_special but will need some guidance on those.
>
> When I write API specs I basically mean "new code should comment better". That can be as simple as a one liner above your os::Linux::select_large_page_size() function.
>
> About regression tests, we have a google-test suite (see test/hotspot/gtest) which would be the appropiate point to put in tests.
>
>>
>> > The linux-2m-page-specific code in the platform-generic G1 test seems wrong.
>>
>> Any advice here. My change specifically changes the behavior of the pages returned in the test for linux platforms but should not have effects on other platforms. I don't know how this would generally happen for JDK tests in this case. It seems to me that the JDK will act differently on different platforms. How is this normally handled?
>
> I defer to the G1 folks for that.
>
>>
>> > Cheers, Thomas
>>
>> Thanks again for the review.
>
> Sure. Thanks for the much more clear information.
>
> Cheers, Thomas

Hi Markus,

the more I think about this the more I think it your proposal makes sense.

In my opinion I would do it transparently for reserve_memory_special() (so, not tied to code heap).

Maybe one way to simplify this and move it forward would be to just do it for UseHugeTLBFS, and leave the UseSHM path unchanged. I consider this less risky since with UseHugeTLBFS we already reserve spaces with mixed page sizes and that seems to work - so here, callers already are wrong if they make any assumptions about the underlying page size. Note that UseHugeTLBFS is the default if +UseLargePages is set.

Just my 5 cent.

Cheers, Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Thomas Stuefe
On Wed, 25 Nov 2020 13:35:18 GMT, Stefan Johansson <[hidden email]> wrote:

>> Hi Markus,
>>
>> the more I think about this the more I think it your proposal makes sense.
>>
>> In my opinion I would do it transparently for reserve_memory_special() (so, not tied to code heap).
>>
>> Maybe one way to simplify this and move it forward would be to just do it for UseHugeTLBFS, and leave the UseSHM path unchanged. I consider this less risky since with UseHugeTLBFS we already reserve spaces with mixed page sizes and that seems to work - so here, callers already are wrong if they make any assumptions about the underlying page size. Note that UseHugeTLBFS is the default if +UseLargePages is set.
>>
>> Just my 5 cent.
>>
>> Cheers, Thomas
>
> I agree with what Thomas is saying. This should be a generic thing for reservations, as I've suggested before, choosing the largest page size given the size of the mapping. I would also be good with starting with the `UseHugeTLBFS` case.
>
> When it comes to testing, we should not hard code these kind of things in the test, but add WhiteBox functions that return the correct numbers given the platform and environment.
>
>         WhiteBox wb = WhiteBox.getWhiteBox();
>         smallPageSize = wb.getVMPageSize();
>         smallPageSize = wb.getVMPageSize();
>         largePageSize = wb.getVMLargePageSize();
>         largePageSize = wb.getVMLargePageSize();
>         largePageExecSize = 2097152;
> So instead of hard coding this, I guess the correct approach would be to return an array of available page sizes and verify that the correct one is used.

I honestly don't even know why we have UseSHM. Seems redundant, and since it uses SystemV shared memory which has a different semantics from mmap, it is subtly broken in a number of places (eg https://bugs.openjdk.java.net/browse/JDK-8257040 or https://bugs.openjdk.java.net/browse/JDK-8257041).

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v2]

Thomas Stuefe
On Wed, 25 Nov 2020 13:58:49 GMT, Thomas Stuefe <[hidden email]> wrote:

>> I agree with what Thomas is saying. This should be a generic thing for reservations, as I've suggested before, choosing the largest page size given the size of the mapping. I would also be good with starting with the `UseHugeTLBFS` case.
>>
>> When it comes to testing, we should not hard code these kind of things in the test, but add WhiteBox functions that return the correct numbers given the platform and environment.
>>
>>         WhiteBox wb = WhiteBox.getWhiteBox();
>>         smallPageSize = wb.getVMPageSize();
>>         smallPageSize = wb.getVMPageSize();
>>         largePageSize = wb.getVMLargePageSize();
>>         largePageSize = wb.getVMLargePageSize();
>>         largePageExecSize = 2097152;
>> So instead of hard coding this, I guess the correct approach would be to return an array of available page sizes and verify that the correct one is used.
>
> I honestly don't even know why we have UseSHM. Seems redundant, and since it uses SystemV shared memory which has a different semantics from mmap, it is subtly broken in a number of places (eg https://bugs.openjdk.java.net/browse/JDK-8257040 or https://bugs.openjdk.java.net/browse/JDK-8257041).

One thing I stumbled upon while looking at this code is why the CodeHeap always wants to have at least 8 pages covering its range:

  // If large page support is enabled, align code heaps according to large
  // page size to make sure that code cache is covered by large pages.
  const size_t alignment = MAX2(page_size(false, 8), (size_t) os::vm_allocation_granularity());

which means that for a wish pagesize of 1G, the code cache would have to cover at least 8G. I am not even sure this is possible, isn't it limited to 4G?

Anyway, they don't uncommit. And the comment in codecache.cpp indicates this is to be able to step-wise commit, but with huge pages the space is committed right from the start anyway. So I do not see what good these 8 pages do. If we allowed the CodeCache to use just one page, it could be 1G in size and use a single 1G page.

Note that there are similar min_page_size requests in GC, but I did not look closer into them.

Also, this does not take away the usefulness of this proposal.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v3]

Marcus G K Williams
In reply to this post by Marcus G K Williams
> Add 2M LargePages to _page_sizes
>
>     Use 2m pages for large page requests
>     less than 1g on linux when 1G are default
>     pages
>
>     - Add os::Linux::large_page_size_2m() that
>     returns 2m as size
>     - Add os::Linux::select_large_page_size() to return
>     correct large page size for size_t bytes
>     - Add 2m size to _page_sizes array
>     - Update reserve_memory_special methods
>     to set/use large_page_size based on bytes reserved
>     - Update large page not reserved warnings
>     to include large_page_size attempted
>     - Update TestLargePageUseForAuxMemory.java
>     to expect 2m large pages in some instances
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains four additional commits since the last revision:

 - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp
 - Add 2M LargePages to _page_sizes
   
   Use 2m pages for large page requests
   less than 1g on linux when 1G are default
   pages
   
   - Add os::Linux::large_page_size_2m() that
   returns 2m as size
   - Add os::Linux::select_large_page_size() to return
   correct large page size for size_t bytes
   - Add 2m size to _page_sizes array
   - Update reserve_memory_special methods
   to set/use large_page_size based on bytes reserved
   - Update large page not reserved warnings
   to include large_page_size attempted
   - Update TestLargePageUseForAuxMemory.java
   to expect 2m large pages in some instances
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - Merge remote-tracking branch 'upstream/master' into update_hlp
 - Add 2M LargePages to _page_sizes
   
   Use 2m pages for large page requests
   less than 1g on linux when 1G are default
   pages
   
   - Add os::Linux::large_page_size_2m() that
   returns 2m as size
   - Add os::Linux::select_large_page_size() to return
   correct large page size for size_t bytes
   - Add 2m size to _page_sizes array
   - Update reserve_memory_special methods
   to set/use large_page_size based on bytes reserved
   - Update large page not reserved warnings
   to include large_page_size attempted
   - Update TestLargePageUseForAuxMemory.java
   to expect 2m large pages in some instances
   
   Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1153/files
  - new: https://git.openjdk.java.net/jdk/pull/1153/files/7092bec8..57e54963

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=01-02

  Stats: 204961 lines in 1352 files changed: 133754 ins; 50869 del; 20338 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1153.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256155: 2M large pages for code when LargePageSizeInBytes is set to 1G for heap [v3]

Marcus G K Williams
In reply to this post by Thomas Stuefe
On Sun, 29 Nov 2020 08:17:09 GMT, Thomas Stuefe <[hidden email]> wrote:

>> I honestly don't even know why we have UseSHM. Seems redundant, and since it uses SystemV shared memory which has a different semantics from mmap, it is subtly broken in a number of places (eg https://bugs.openjdk.java.net/browse/JDK-8257040 or https://bugs.openjdk.java.net/browse/JDK-8257041).
>
> One thing I stumbled upon while looking at this code is why the CodeHeap always wants to have at least 8 pages covering its range:
>
>   // If large page support is enabled, align code heaps according to large
>   // page size to make sure that code cache is covered by large pages.
>   const size_t alignment = MAX2(page_size(false, 8), (size_t) os::vm_allocation_granularity());
>
> which means that for a wish pagesize of 1G, the code cache would have to cover at least 8G. I am not even sure this is possible, isn't it limited to 4G?
>
> Anyway, they don't uncommit. And the comment in codecache.cpp indicates this is to be able to step-wise commit, but with huge pages the space is committed right from the start anyway. So I do not see what good these 8 pages do. If we allowed the CodeCache to use just one page, it could be 1G in size and use a single 1G page.
>
> Note that there are similar min_page_size requests in GC, but I did not look closer into them.
>
> Also, this does not take away the usefulness of this proposal.

Working on addressing comments in the code on this PR. Should have a tested change pushed later today and replies to comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v3]

Marcus G K Williams
In reply to this post by Thomas Stuefe
On Sun, 29 Nov 2020 08:17:09 GMT, Thomas Stuefe <[hidden email]> wrote:

> One thing I stumbled upon while looking at this code is why the CodeHeap always wants to have at least 8 pages covering its range:
>
> ```
>   // If large page support is enabled, align code heaps according to large
>   // page size to make sure that code cache is covered by large pages.
>   const size_t alignment = MAX2(page_size(false, 8), (size_t) os::vm_allocation_granularity());
> ```
>
> which means that for a wish pagesize of 1G, the code cache would have to cover at least 8G. I am not even sure this is possible, isn't it limited to 4G?
>
> Anyway, they don't uncommit. And the comment in codecache.cpp indicates this is to be able to step-wise commit, but with huge pages the space is committed right from the start anyway. So I do not see what good these 8 pages do. If we allowed the CodeCache to use just one page, it could be 1G in size and use a single 1G page.
>
> Note that there are similar min_page_size requests in GC, but I did not look closer into them.
>
> Also, this does not take away the usefulness of this proposal.

Interesting. I'll look at min_page_size requests in GC and codecache in relation to large pages and see what kind of optimization can be done in another JDK-Issue/PR.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v4]

Marcus G K Williams
In reply to this post by Marcus G K Williams
> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using
> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k).
>
> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size().
>
> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved.

Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:

 - Adress Comments, Rework changes for PagesizeSet
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - JDK-8257588: Make os::_page_sizes a bitmask #1522
 - Merge branch 'master' into update_hlp
 - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp
 - Add 2M LargePages to _page_sizes
   
   Use 2m pages for large page requests
   less than 1g on linux when 1G are default
   pages
   
   - Add os::Linux::large_page_size_2m() that
   returns 2m as size
   - Add os::Linux::select_large_page_size() to return
   correct large page size for size_t bytes
   - Add 2m size to _page_sizes array
   - Update reserve_memory_special methods
   to set/use large_page_size based on bytes reserved
   - Update large page not reserved warnings
   to include large_page_size attempted
   - Update TestLargePageUseForAuxMemory.java
   to expect 2m large pages in some instances
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - Merge remote-tracking branch 'upstream/master' into update_hlp
 - Add 2M LargePages to _page_sizes
   
   Use 2m pages for large page requests
   less than 1g on linux when 1G are default
   pages
   
   - Add os::Linux::large_page_size_2m() that
   returns 2m as size
   - Add os::Linux::select_large_page_size() to return
   correct large page size for size_t bytes
   - Add 2m size to _page_sizes array
   - Update reserve_memory_special methods
   to set/use large_page_size based on bytes reserved
   - Update large page not reserved warnings
   to include large_page_size attempted
   - Update TestLargePageUseForAuxMemory.java
   to expect 2m large pages in some instances
   
   Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1153/files
  - new: https://git.openjdk.java.net/jdk/pull/1153/files/57e54963..0901e70e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=02-03

  Stats: 2740 lines in 107 files changed: 1814 ins; 632 del; 294 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1153.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v4]

Marcus G K Williams
In reply to this post by Marcus G K Williams
On Thu, 3 Dec 2020 23:22:17 GMT, Marcus G K Williams <[hidden email]> wrote:

>> One thing I stumbled upon while looking at this code is why the CodeHeap always wants to have at least 8 pages covering its range:
>>
>>   // If large page support is enabled, align code heaps according to large
>>   // page size to make sure that code cache is covered by large pages.
>>   const size_t alignment = MAX2(page_size(false, 8), (size_t) os::vm_allocation_granularity());
>>
>> which means that for a wish pagesize of 1G, the code cache would have to cover at least 8G. I am not even sure this is possible, isn't it limited to 4G?
>>
>> Anyway, they don't uncommit. And the comment in codecache.cpp indicates this is to be able to step-wise commit, but with huge pages the space is committed right from the start anyway. So I do not see what good these 8 pages do. If we allowed the CodeCache to use just one page, it could be 1G in size and use a single 1G page.
>>
>> Note that there are similar min_page_size requests in GC, but I did not look closer into them.
>>
>> Also, this does not take away the usefulness of this proposal.
>
>> One thing I stumbled upon while looking at this code is why the CodeHeap always wants to have at least 8 pages covering its range:
>>
>> ```
>>   // If large page support is enabled, align code heaps according to large
>>   // page size to make sure that code cache is covered by large pages.
>>   const size_t alignment = MAX2(page_size(false, 8), (size_t) os::vm_allocation_granularity());
>> ```
>>
>> which means that for a wish pagesize of 1G, the code cache would have to cover at least 8G. I am not even sure this is possible, isn't it limited to 4G?
>>
>> Anyway, they don't uncommit. And the comment in codecache.cpp indicates this is to be able to step-wise commit, but with huge pages the space is committed right from the start anyway. So I do not see what good these 8 pages do. If we allowed the CodeCache to use just one page, it could be 1G in size and use a single 1G page.
>>
>> Note that there are similar min_page_size requests in GC, but I did not look closer into them.
>>
>> Also, this does not take away the usefulness of this proposal.
>
> Interesting. I'll look at min_page_size requests in GC and codecache in relation to large pages and see what kind of optimization can be done in another JDK-Issue/PR.

Recent push is dependent on and includes #1522 - when it is updated, I will update here.

Removed 2M/exec specific code. Re-wrote to take advantage of #1522. Attempted to address other comments. Please let me know if I've missed something.

My apologies this took so long to update, we had a long holiday weekend.

Thanks again for all of the review @kstefanj and @tstuefe!

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v4]

Marcus G K Williams
In reply to this post by Marcus G K Williams
On Thu, 3 Dec 2020 23:48:15 GMT, Marcus G K Williams <[hidden email]> wrote:

>> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using
>> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k).
>>
>> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size().
>>
>> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved.
>
> Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains seven additional commits since the last revision:
>
>  - Adress Comments, Rework changes for PagesizeSet
>    
>    Signed-off-by: Marcus G K Williams <[hidden email]>
>  - JDK-8257588: Make os::_page_sizes a bitmask #1522
>  - Merge branch 'master' into update_hlp
>  - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp
>  - Add 2M LargePages to _page_sizes
>    
>    Use 2m pages for large page requests
>    less than 1g on linux when 1G are default
>    pages
>    
>    - Add os::Linux::large_page_size_2m() that
>    returns 2m as size
>    - Add os::Linux::select_large_page_size() to return
>    correct large page size for size_t bytes
>    - Add 2m size to _page_sizes array
>    - Update reserve_memory_special methods
>    to set/use large_page_size based on bytes reserved
>    - Update large page not reserved warnings
>    to include large_page_size attempted
>    - Update TestLargePageUseForAuxMemory.java
>    to expect 2m large pages in some instances
>    
>    Signed-off-by: Marcus G K Williams <[hidden email]>
>  - Merge remote-tracking branch 'upstream/master' into update_hlp
>  - Add 2M LargePages to _page_sizes
>    
>    Use 2m pages for large page requests
>    less than 1g on linux when 1G are default
>    pages
>    
>    - Add os::Linux::large_page_size_2m() that
>    returns 2m as size
>    - Add os::Linux::select_large_page_size() to return
>    correct large page size for size_t bytes
>    - Add 2m size to _page_sizes array
>    - Update reserve_memory_special methods
>    to set/use large_page_size based on bytes reserved
>    - Update large page not reserved warnings
>    to include large_page_size attempted
>    - Update TestLargePageUseForAuxMemory.java
>    to expect 2m large pages in some instances
>    
>    Signed-off-by: Marcus G K Williams <[hidden email]>

src/hotspot/os/linux/os_linux.cpp line 3900:

> 3898:     err);                                               \
> 3899:   } while (0)
> 3900:

Remove this remnant of UseSHM changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v5]

Thomas Stuefe
In reply to this post by Marcus G K Williams
On Fri, 4 Dec 2020 00:07:12 GMT, Marcus G K Williams <[hidden email]> wrote:

>> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using
>> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k).
>>
>> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size().
>>
>> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved.
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
>
>   Remove remnant UseSHM change
>  
>   Signed-off-by: Marcus G K Williams <[hidden email]>

Hi Marcus,

I generally like this patch. I will do a more thorough review later. But could this wait please until after JDK16 has been forked off? Since I would like this to spend some more times cooking on our more exotic Linuxes.

Cheers, Thomas

src/hotspot/os/linux/os_linux.cpp line 3743:

> 3741:       // The kernel is using kB, hotspot uses bytes
> 3742:       if (page_size * K > (size_t)Linux::page_size()) {
> 3743:         if (!os::page_sizes().is_set(page_size * K)) {

is_set is not needed, just call add

-------------

PR: https://git.openjdk.java.net/jdk/pull/1153
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8256155: os::Linux Populate all large_page_sizes, select smallest page size in reserve_memory_special_huge_tlbfs* [v6]

Marcus G K Williams
In reply to this post by Marcus G K Williams
> When using LargePageSizeInBytes=1G, os::Linux::reserve_memory_special_huge_tlbfs* cannot select large pages smaller than 1G. Code heap usually uses less than 1G, so currently the code precludes code heap from using
> Large pages in this circumstance and when os::Linux::reserve_memory_special_huge_tlbfs* is called page sizes fall back to Linux::page_size() (usually 4k).
>
> This change allows the above use case by populating all large_page_sizes present in /sys/kernel/mm/hugepages in _page_sizes upon calling os::Linux::setup_large_page_size().
>
> In os::Linux::reserve_memory_special_huge_tlbfs* we then select the largest large page size available in _page_sizes that is smaller than bytes being reserved.

Marcus G K Williams has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains 10 commits:

 - Thomas S. Feedback
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - Merge branch 'master' into update_hlp
 - Remove remnant UseSHM change
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - Adress Comments, Rework changes for PagesizeSet
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - JDK-8257588: Make os::_page_sizes a bitmask #1522
 - Merge branch 'master' into update_hlp
 - Merge branch 'update_hlp' of github.com:mgkwill/jdk into update_hlp
 - Add 2M LargePages to _page_sizes
   
   Use 2m pages for large page requests
   less than 1g on linux when 1G are default
   pages
   
   - Add os::Linux::large_page_size_2m() that
   returns 2m as size
   - Add os::Linux::select_large_page_size() to return
   correct large page size for size_t bytes
   - Add 2m size to _page_sizes array
   - Update reserve_memory_special methods
   to set/use large_page_size based on bytes reserved
   - Update large page not reserved warnings
   to include large_page_size attempted
   - Update TestLargePageUseForAuxMemory.java
   to expect 2m large pages in some instances
   
   Signed-off-by: Marcus G K Williams <[hidden email]>
 - Merge remote-tracking branch 'upstream/master' into update_hlp
 - Add 2M LargePages to _page_sizes
   
   Use 2m pages for large page requests
   less than 1g on linux when 1G are default
   pages
   
   - Add os::Linux::large_page_size_2m() that
   returns 2m as size
   - Add os::Linux::select_large_page_size() to return
   correct large page size for size_t bytes
   - Add 2m size to _page_sizes array
   - Update reserve_memory_special methods
   to set/use large_page_size based on bytes reserved
   - Update large page not reserved warnings
   to include large_page_size attempted
   - Update TestLargePageUseForAuxMemory.java
   to expect 2m large pages in some instances
   
   Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Changes: https://git.openjdk.java.net/jdk/pull/1153/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1153&range=05
  Stats: 71 lines in 4 files changed: 53 ins; 0 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1153.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1153/head:pull/1153

PR: https://git.openjdk.java.net/jdk/pull/1153
1234