RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale

Thomas Stuefe
The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.

Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.

Background:

The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.

This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.

That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.

But it messes up statistics a bit. Example:

jcmd xxx VM.metaspace

<snip>

Chunk freelists:
<snip>
        Both:

  4m: (none)
  2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
  1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
512k: (none)
256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
 64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
 32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
 16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
  8k: (none)
  4k: (none)
  2k: (none)
  1k: (none)
Total word size: 5,83 MB, committed: 0 bytes ( 0%)


In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.

The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:

        Both:

  4m: (none)
  2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
  1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
256k: (none)
128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
 64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
 32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
 16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
  8k: (none)
  4k: (none)
  2k: (none)
  1k: (none)
Total word size: 8,78 MB, committed: 96,00 KB ( 1%)

Thanks, Thomas

-------------

Commit messages:
 - Add gtest for Vsn::is_range_fully_committed
 - Initial

Changes: https://git.openjdk.java.net/jdk/pull/1291/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8255884
  Stats: 226 lines in 6 files changed: 199 ins; 14 del; 13 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1291/head:pull/1291

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v2]

Thomas Stuefe
> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge
 - Add gtest for Vsn::is_range_fully_committed
 - Initial

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1291/files
  - new: https://git.openjdk.java.net/jdk/pull/1291/files/f433fe70..037d3b2e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=00-01

  Stats: 83546 lines in 637 files changed: 74001 ins; 2961 del; 6584 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1291/head:pull/1291

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v2]

Coleen Phillimore-3
On Fri, 27 Nov 2020 06:35:16 GMT, Thomas Stuefe <[hidden email]> wrote:

>> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>>
>> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>>
>> Background:
>>
>> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>>
>> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>>
>> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>>
>> But it messes up statistics a bit. Example:
>>
>> jcmd xxx VM.metaspace
>>
>> <snip>
>>
>> Chunk freelists:
>> <snip>
>>         Both:
>>
>>   4m: (none)
>>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
>> 512k: (none)
>> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
>> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>>   8k: (none)
>>   4k: (none)
>>   2k: (none)
>>   1k: (none)
>> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>>
>>
>> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>>
>> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>>
>>         Both:
>>
>>   4m: (none)
>>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
>> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
>> 256k: (none)
>> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>>   8k: (none)
>>   4k: (none)
>>   2k: (none)
>>   1k: (none)
>> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Merge
>  - Add gtest for Vsn::is_range_fully_committed
>  - Initial

src/hotspot/share/memory/metaspace/metachunk.cpp line 185:

> 183:     for (Metachunk* c = prev_in_vs(); c != NULL && c->base() >= granule_start; c = c->prev_in_vs()) {
> 184:       assert(c->committed_words() == 0, "neighbor was already committed?");
> 185:       c->_committed_words = c->word_size();

Can the previous metachunk be partially contained in the commit granule?  ie c->base < granule_start but c->end >= granule start?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v2]

Thomas Stuefe
On Thu, 3 Dec 2020 00:02:50 GMT, Coleen Phillimore <[hidden email]> wrote:

>> Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - Merge
>>  - Add gtest for Vsn::is_range_fully_committed
>>  - Initial
>
> src/hotspot/share/memory/metaspace/metachunk.cpp line 185:
>
>> 183:     for (Metachunk* c = prev_in_vs(); c != NULL && c->base() >= granule_start; c = c->prev_in_vs()) {
>> 184:       assert(c->committed_words() == 0, "neighbor was already committed?");
>> 185:       c->_committed_words = c->word_size();
>
> Can the previous metachunk be partially contained in the commit granule?  ie c->base < granule_start but c->end >= granule start?

No. A chunk smaller than a granule is fully contained in one granule; a chunk larger than a granule contains only full granules.

Which is inbuilt since both chunks and granules are pow2 values, and start at addresses aligned to their size.

But just to make things very explicit I'll add an assert for this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v3]

Thomas Stuefe
In reply to this post by Thomas Stuefe
> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Add an assert to make chunk-to-granule-geometry explicit

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1291/files
  - new: https://git.openjdk.java.net/jdk/pull/1291/files/037d3b2e..205425e8

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=01-02

  Stats: 27 lines in 3 files changed: 24 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1291/head:pull/1291

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale

Thomas Stuefe
In reply to this post by Thomas Stuefe
On Wed, 18 Nov 2020 13:33:33 GMT, Thomas Stuefe <[hidden email]> wrote:

> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

Note: I decided to move this fix up to JDK17. The fix is fine and simple, but since Elastic Metaspace was itself a large patch - which is well tested now - I'd like to minimize any unnecessary changes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v4]

Thomas Stuefe
In reply to this post by Thomas Stuefe
> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - Fix death test
 - Merge
 - Add an assert to make chunk-to-granule-geometry explicit
 - Merge
 - Add gtest for Vsn::is_range_fully_committed
 - Initial

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1291/files
  - new: https://git.openjdk.java.net/jdk/pull/1291/files/205425e8..901d42ba

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=02-03

  Stats: 68824 lines in 1566 files changed: 50262 ins; 11937 del; 6625 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1291/head:pull/1291

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v5]

Thomas Stuefe
In reply to this post by Thomas Stuefe
> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains seven commits:

 - Merge
 - Fix death test
 - Merge
 - Add an assert to make chunk-to-granule-geometry explicit
 - Merge
 - Add gtest for Vsn::is_range_fully_committed
 - Initial

-------------

Changes: https://git.openjdk.java.net/jdk/pull/1291/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=04
  Stats: 251 lines in 6 files changed: 221 ins; 12 del; 18 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1291/head:pull/1291

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v6]

Thomas Stuefe
In reply to this post by Thomas Stuefe
> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  Fix builds

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/1291/files
  - new: https://git.openjdk.java.net/jdk/pull/1291/files/e6f5c8c4..9bacc700

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=1291&range=04-05

  Stats: 5 lines in 3 files changed: 0 ins; 0 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/1291.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/1291/head:pull/1291

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v6]

Coleen Phillimore-3
On Tue, 2 Feb 2021 06:51:55 GMT, Thomas Stuefe <[hidden email]> wrote:

>> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>>
>> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>>
>> Background:
>>
>> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>>
>> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>>
>> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>>
>> But it messes up statistics a bit. Example:
>>
>> jcmd xxx VM.metaspace
>>
>> <snip>
>>
>> Chunk freelists:
>> <snip>
>>         Both:
>>
>>   4m: (none)
>>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
>> 512k: (none)
>> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
>> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>>   8k: (none)
>>   4k: (none)
>>   2k: (none)
>>   1k: (none)
>> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>>
>>
>> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>>
>> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>>
>>         Both:
>>
>>   4m: (none)
>>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
>> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
>> 256k: (none)
>> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>>   8k: (none)
>>   4k: (none)
>>   2k: (none)
>>   1k: (none)
>> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
>   Fix builds

This looks good as much as I follow the details.  Sorry for the delay in reviewing it.

src/hotspot/share/memory/metaspace/metachunk.cpp line 194:

> 192:
> 193:   DEBUG_ONLY(verify();)
> 194:   DEBUG_ONLY(verify_neighborhood();)

Is this verification expensive wrt time in fastdebug?  Should it be SOMETIMES?

-------------

Marked as reviewed by coleenp (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v6]

Thomas Stuefe
On Sat, 13 Feb 2021 13:58:21 GMT, Coleen Phillimore <[hidden email]> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Fix builds
>
> src/hotspot/share/memory/metaspace/metachunk.cpp line 194:
>
>> 192:
>> 193:   DEBUG_ONLY(verify();)
>> 194:   DEBUG_ONLY(verify_neighborhood();)
>
> Is this verification expensive wrt time in fastdebug?  Should it be SOMETIMES?

Should not be terribly expensive, but I will move it into the condition above. Its only needed if we modified the neighbors. That would bring down invocation times considerably.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8255884: Metaspace: chunk local commit counter may be stale [v6]

Thomas Stuefe
In reply to this post by Coleen Phillimore-3
On Sat, 13 Feb 2021 13:58:55 GMT, Coleen Phillimore <[hidden email]> wrote:

> This looks good as much as I follow the details. Sorry for the delay in reviewing it.

Thanks Coleen.

The patch ran in our systems for three months now and seems stable.

OTOH, looking critically at this now after three months, I fear it adds too much complexity for a small increase in statistics precision. I'll spend some cycles to check if there is another, less complex approach. Therefore I decided to withdraw the PR for now.

I may separate the comment changes and some of the verifications and put them into a separate RFE.

Thank you!

..Thomas

-------------

PR: https://git.openjdk.java.net/jdk/pull/1291
Reply | Threaded
Open this post in threaded view
|

Withdrawn: JDK-8255884: Metaspace: chunk local commit counter may be stale

Thomas Stuefe
In reply to this post by Thomas Stuefe
On Wed, 18 Nov 2020 13:33:33 GMT, Thomas Stuefe <[hidden email]> wrote:

> The detailed metaspace report (`jcmd VM.metaspace`) shows the ChunkManager statistics. In these statistics, the commit counter for chunk levels < granule size may be wrong.
>
> Note that this is not a big deal. It does not affect the total commit counter, which is directly taken from the virtual memory layer, and which is always right. But it can be confusing for analysts.
>
> Background:
>
> The "truth" about the commit state of granules is kept inside VirtualSpaceNode in a bitmask ("commit mask"). Since this is a global resource and access to it is synchronized, each chunk keeps a commit watermark to know the size of the range which is guaranteed to be committed and which can be used without checking that bitmap. This watermark is checked when allocating from the chunk.
>
> This chunk-local watermark can get stale if the granule underneath the chunk gets committed without the chunk noticing. The only way this can happen is if the chunk is smaller than a granule and some in-granule neighbor was committed.
>
> That in itself works as designed and is totally benign. The chunk-local commit watermark will get silently corrected on the next allocation.
>
> But it messes up statistics a bit. Example:
>
> jcmd xxx VM.metaspace
>
> <snip>
>
> Chunk freelists:
> <snip>
>         Both:
>
>   4m: (none)
>   2m: 2, capacity=4,00 MB, committed=0 bytes ( 0%)
>   1m: 1, capacity=1,00 MB, committed=0 bytes ( 0%)
> 512k: (none)
> 256k: 2, capacity=512,00 KB, committed=0 bytes ( 0%)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 2, capacity=128,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=0 bytes ( 0%) <<<<< actually, these are committed
>  16k: 1, capacity=16,00 KB, committed=0 bytes ( 0%) <<<<< these too
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 5,83 MB, committed: 0 bytes ( 0%)
>
>
> In this example, the 32K and 16K chunks in the freelist had been committed as a side effect of comitting a neighboring chunk. Their watermarks are still at zero though and therefore they appear uncommitted here.
>
> The solution would be to - when a small chunk gets committed - adjust the watermarks of his in-granule neighbors. Then it looks like this:
>
>         Both:
>
>   4m: (none)
>   2m: 3, capacity=6,00 MB, committed=0 bytes ( 0%)
>   1m: 2, capacity=2,00 MB, committed=0 bytes ( 0%)
> 512k: 1, capacity=512,00 KB, committed=0 bytes ( 0%)
> 256k: (none)
> 128k: 1, capacity=128,00 KB, committed=0 bytes ( 0%)
>  64k: 1, capacity=64,00 KB, committed=0 bytes ( 0%)
>  32k: 2, capacity=64,00 KB, committed=64,00 KB (100%) <<<<< good
>  16k: 2, capacity=32,00 KB, committed=32,00 KB (100%) <<<<< good
>   8k: (none)
>   4k: (none)
>   2k: (none)
>   1k: (none)
> Total word size: 8,78 MB, committed: 96,00 KB ( 1%)
>
> Thanks, Thomas

This pull request has been closed without being integrated.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1291