RFR: 8258431: Provide a JFR event with live set size estimate

RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
The purpose of this change is to expose a 'cheap' estimate of the current live set size in the form of a periodically emitted JFR event. The meaning of 'current' depends on the particular GC implementation; in the worst case it is 'at the last full GC'.

## Introducing new JFR event

While there is already a 'GC Heap Summary' JFR event, it does not fit the requirements because it is closely tied to the GC cycle. For ZGC or Shenandoah, for example, a cycle may not happen for quite a long time, which increases the risk that a JFR recording contains no heap summary events at all.
Because of this I am proposing a new 'Heap Usage Summary' event that is emitted periodically, by default on each JFR chunk, and that contains information about the heap capacity and the used and live bytes. This information is available from all GC implementations and can be provided at literally any time.

## Implementation

The implementation differs from GC to GC because each GC algorithm/implementation tracks liveness in a slightly different way. The common part is a `size_t live() const` method added to the `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If the liveness value hasn't been calculated yet, the implementation defaults to returning the 'used' value.
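A minimal sketch of the shared pattern described above, not the code from the patch. `HeapSketch`, `record_live_after_gc` and the field names are hypothetical stand-ins for a `CollectedHeap` subclass, and `std::atomic` stands in for HotSpot's `Atomic` helpers.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a CollectedHeap subclass; not HotSpot code.
class HeapSketch {
 public:
  explicit HeapSketch(std::size_t used_bytes) : _used(used_bytes), _live(0) {}

  std::size_t used() const { return _used; }

  // Called by the collector once a GC cycle has finished.
  void record_live_after_gc(std::size_t live_bytes) {
    // In this simplified model the live data cannot exceed the used bytes.
    assert(live_bytes <= _used);
    _live.store(live_bytes, std::memory_order_relaxed);
  }

  // Cached estimate; falls back to used() until a cycle has recorded a value.
  std::size_t live() const {
    std::size_t cached = _live.load(std::memory_order_relaxed);
    return cached > 0 ? cached : used();
  }

 private:
  std::size_t _used;
  std::atomic<std::size_t> _live;
};
```

Treating zero as "not calculated yet" keeps the reader path to a single load, at the cost of not being able to report a genuinely empty live set.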

The implementations are based on my (rather shallow) knowledge of the inner workings of the respective GC engines and I am open to suggestions to make them better/correct.

### Epsilon GC

Trivial implementation - just return `used()` instead.

### Serial GC

Here we rely on two facts. First, the mark-copy phase is naturally compacting, so the number of bytes after the copy is 'live'. Second, the mark-sweep implementation keeps internal information about objects that are 'dead' but excluded from the compaction effort. We can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects).
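The arithmetic can be sketched as follows. This is an illustration only: `GenStats` and `dead_left_in_place_bytes` are hypothetical names, not types or fields from HotSpot.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical per-generation numbers sampled after a collection.
struct GenStats {
  std::size_t used_bytes;                // bytes occupied in the generation
  std::size_t dead_left_in_place_bytes;  // dead objects the compactor chose not to move
};

// Young gen: copying leaves only survivors behind, so used == live.
std::size_t young_live(const GenStats& young) {
  return young.used_bytes;
}

// Old gen: subtract the dead space that mark-sweep deliberately skipped.
std::size_t old_live(const GenStats& old_gen) {
  assert(old_gen.dead_left_in_place_bytes <= old_gen.used_bytes);
  return old_gen.used_bytes - old_gen.dead_left_in_place_bytes;
}

// Heap-wide estimate is the sum of both generations.
std::size_t heap_live(const GenStats& young, const GenStats& old_gen) {
  return young_live(young) + old_live(old_gen);
}
```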

### Parallel GC

For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK).

### G1 GC

In the `G1ConcurrentMark::remark()` method, the live set size is computed as the sum of `_live_words` from the associated `G1RegionMarkStats` objects. I am not 100% sure this approach covers all eventualities, and it would be great if someone skilled in the G1 implementation could chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application.

### Shenandoah

In Shenandoah, the regions keep the liveness info. However, the VM op used for iterating regions is a safepointing one, so it would be best to run it in an already safepointed context.
This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()`: at the end of the marking process the liveness info is summarized and stored in the volatile field `ShenandoahHeap::_live`, which is later read by the event-emitting code.

### ZGC

`ZStatHeap` already holds the liveness info, so this implementation just makes it accessible via the `ZCollectedHeap::live()` method.

-------------

Commit messages:
 - 8258431: Provide a JFR event with live set size estimate

Changes: https://git.openjdk.java.net/jdk/pull/2579/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2579&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8258431
  Stats: 177 lines in 33 files changed: 172 ins; 1 del; 4 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2579.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2579/head:pull/2579

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Aleksey Shipilev-5
On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik <[hidden email]> wrote:

> [...]

Interesting! Cursory review follows.

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578:

> 4576:
> 4577: void G1CollectedHeap::set_live(size_t bytes) {
> 4578:   Atomic::release_store(&_live_size, bytes);

I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure.
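The distinction can be sketched with `std::atomic` as a rough analogy. The mapping is an assumption made for illustration: a relaxed store roughly corresponds to `Atomic::store`, a release store to `Atomic::release_store`. The names below are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the cached live-size field; not HotSpot code.
std::atomic<std::size_t> live_size{0};

// Relaxed store: atomic (no torn writes) but imposes no ordering on other memory.
// Enough when readers only consume the number itself.
void set_live_relaxed(std::size_t bytes) {
  live_size.store(bytes, std::memory_order_relaxed);
}

// Release store: additionally publishes all earlier writes to a thread that
// acquire-loads the value. Only needed if readers depend on those earlier writes.
void set_live_release(std::size_t bytes) {
  live_size.store(bytes, std::memory_order_release);
}

// Relaxed load, matching the relaxed store above.
std::size_t get_live() {
  return live_size.load(std::memory_order_relaxed);
}
```

Since the event code only reads a single standalone number, nothing else has to become visible together with it, which is the case where the relaxed variant suffices.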

src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 100:

> 98:   HeapWord* mem_allocate_old_gen(size_t size);
> 99:
> 100:

Excess newline?

src/hotspot/share/gc/shared/collectedHeap.hpp line 217:

> 215:   virtual size_t capacity() const = 0;
> 216:   virtual size_t used() const = 0;
> 217:   // a best-effort estimate of the live set size

Suggestion:

// Returns the estimate of live set size. Because live set changes over time,
// this is a best-effort estimate by each of the implementations. These usually
// are most precise right after the GC cycle.

src/hotspot/share/gc/shared/genCollectedHeap.cpp line 1144:

> 1142:   _old_gen->prepare_for_compaction(&cp);
> 1143:   _young_gen->prepare_for_compaction(&cp);
> 1144:

Stray newline?

src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183:

> 181:     size_t live = _live_size;
> 182:     return live > 0 ? live : used();
> 183:   };

I think the implementation belongs to `genCollectedHeap.cpp`.

src/hotspot/share/gc/shared/generation.hpp line 140:

> 138:   virtual size_t used() const = 0;      // The number of used bytes in the gen.
> 139:   virtual size_t free() const = 0;      // The number of free bytes in the gen.
> 140:   virtual size_t live() const = 0;

Needs a comment to match the lines above? Say, `// The estimate of live bytes in the gen.`

src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 579:

> 577:     event.set_heapLive(heap->live());
> 578:     event.commit();
> 579:   }

At first sight, this belongs in `ShenandoahConcurrentMark::finish_mark()`. Placing the event here would fire it even when the concurrent GC is cancelled, which is not what you want.

src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 265:

> 263:   ShenandoahHeap* const heap = ShenandoahHeap::heap();
> 264:   heap->set_concurrent_mark_in_progress(false);
> 265:   heap->mark_finished();

Let's not rename this method. Introduce a new method, `ShenandoahHeap::update_live`, and call it every time after `mark_complete_marking_context()` is called.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 627:

> 625:
> 626: size_t ShenandoahHeap::live() const {
> 627:   size_t live = Atomic::load_acquire(&_live);

I understand you copy-pasted from the same file. We have removed `_acquire` with #2504. Do `Atomic::load` here.

src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 655:

> 653:
> 654: void ShenandoahHeap::set_live(size_t bytes) {
> 655:   Atomic::release_store_fence(&_live, bytes);

Same, do `Atomic::store` here.

src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 494:

> 492:   mark_complete_marking_context();
> 493:
> 494:   class ShenandoahCollectLiveSizeClosure : public ShenandoahHeapRegionClosure {

We don't usually use the in-method declarations like these, pull it out of the method.

src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 511:

> 509:
> 510:   ShenandoahCollectLiveSizeClosure cl;
> 511:   heap_region_iterate(&cl);

I think you want `parallel_heap_region_iterate` on this path, and do `Atomic::add(&_live, r->get_live_data_bytes())` in the closure. We shall see if it makes sense to make this fully concurrent...
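A sketch of the closure shape suggested here, using `std::atomic` instead of HotSpot's `Atomic::add`. `Region`, `CollectLiveSizeClosure` and `iterate_regions` are hypothetical stand-ins; the driver below walks the regions sequentially, whereas a parallel iteration would call the closure from several workers at once.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region record; stands in for a heap region with a live-data counter.
struct Region {
  std::size_t live_bytes;
};

// Closure that is safe to invoke concurrently: every region adds its live
// bytes into one shared counter with an atomic read-modify-write.
class CollectLiveSizeClosure {
 public:
  explicit CollectLiveSizeClosure(std::atomic<std::size_t>& total) : _total(total) {}

  void heap_region_do(const Region& r) {
    _total.fetch_add(r.live_bytes, std::memory_order_relaxed);
  }

 private:
  std::atomic<std::size_t>& _total;
};

// Sequential driver for illustration only; a parallel iterator would hand
// disjoint sets of regions to different workers sharing the same closure.
void iterate_regions(const std::vector<Region>& regions, CollectLiveSizeClosure& cl) {
  for (const Region& r : regions) {
    cl.heap_region_do(r);
  }
}
```

The shared counter has to be reset before each summation, otherwise successive cycles would accumulate into the same value.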

src/hotspot/share/gc/epsilon/epsilonHeap.hpp line 80:

> 78:   virtual size_t capacity()     const { return _virtual_space.committed_size(); }
> 79:   virtual size_t used()         const { return _space->used(); }
> 80:   virtual size_t live()         const { return used(); }

I'd prefer to call `_space->used()` directly here. Minor optimization, I know.

-------------

Changes requested by shade (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Aleksey Shipilev-5
On Thu, 18 Feb 2021 10:23:37 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> [...]
>
> src/hotspot/share/gc/shenandoah/shenandoahHeap.cpp line 627:
>
>> 625:
>> 626: size_t ShenandoahHeap::live() const {
>> 627:   size_t live = Atomic::load_acquire(&_live);
>
> I understand you copy-pasted from the same file. We have removed `_acquire` with #2504. Do `Atomic::load` here.

...which also means you want to merge from master to get recent changes?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Albert Mingkun Yang
In reply to this post by Jaroslav Bachorik-3
On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik <[hidden email]> wrote:

> [...]

Additionally, some test(s) for this new feature would be nice. Maybe you can add something to `HeapSummaryEventAllGcs`?

PS: I was looking into how to get periodic heap usage info just a few days ago, and settled for `MemProfiling` as a workaround. Thank you for the patch.

src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 649:

> 647: TRACE_REQUEST_FUNC(HeapUsageSummary) {
> 648:   EventHeapUsageSummary event;
> 649:   if (event.should_commit()) {

I believe the `should_commit` check is not needed; the period check is handled by the caller.

src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 79:

> 77:   size_t _young_live;
> 78:   size_t _eden_live;
> 79:   size_t _old_live;

It's only the sum that's ever exposed, right? I wonder if it makes sense to merge them into one var to only track the sum.

-------------

Changes requested by ayang (Author).

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Per Liden-2
In reply to this post by Jaroslav Bachorik-3
On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik <[hidden email]> wrote:

> [...]

src/hotspot/share/gc/z/zStat.hpp line 549:

> 547:   static size_t used_at_mark_start();
> 548:   static size_t used_at_relocate_end();
> 549:   static size_t live();

Please call this `live_at_mark_end()` to match the names of the neighboring functions.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Thomas Schatzl-4
In reply to this post by Jaroslav Bachorik-3
On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik <[hidden email]> wrote:

> [...]

The change also misses liveness update after G1 Full GC: it should at least reset the internal liveness counter to 0 so that `used()` is used.
I think there is the same issue for Parallel Full GC. Serial seems to be handled.

src/hotspot/share/gc/shared/collectedHeap.hpp line 217:

> 215:   virtual size_t capacity() const = 0;
> 216:   virtual size_t used() const = 0;
> 217:   // a best-effort estimate of the live set size

I would prefer @shipilev's comment. I would also suggest calling this method `live_estimate()` to set the expectations right.

src/hotspot/share/gc/g1/g1ConcurrentMark.cpp line 1114:

> 1112:
> 1113:   _g1h->set_live(live_size * HeapWordSize);
> 1114:

This code is located in the wrong place. It will return only the live words for the areas that have been marked, not eden or objects allocated in old gen after the marking started.

Furthermore, it iterates over all regions, which can be many compared to the actually active regions.

A better place is in `G1UpdateRemSetTrackingBeforeRebuild::do_heap_region()` after the last method call. At that point, `HeapRegion::live_bytes()` contains the per-region amount of live data for all regions.

`G1UpdateRemSetTrackingBeforeRebuild` is instantiated and then called by multiple threads. It's probably best that the `HeapClosure` sums up the live byte estimates locally, and that the caller `G1UpdateRemSetTrackingBeforeRebuildTask::work()` then sums up the per-thread results, as is done for `G1UpdateRemSetTrackingBeforeRebuildTask::_total_selected_for_rebuild`. The total is then set in the caller of the `G1UpdateRemSetTrackingBeforeRebuildTask`.
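The local-sum-then-combine structure described above might look roughly like this. It is a sketch under stated assumptions: `Region`, `LocalLiveClosure` and `LiveSumTask` are hypothetical names, `std::atomic` replaces HotSpot's `Atomic`, and the strided work distribution is invented for the example.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical region record with a per-region live-bytes value.
struct Region {
  std::size_t live_bytes;
};

// One instance per worker: accumulates privately, so no atomics are needed
// while regions are being visited.
class LocalLiveClosure {
 public:
  void do_heap_region(const Region& r) { _local_live += r.live_bytes; }
  std::size_t local_live() const { return _local_live; }

 private:
  std::size_t _local_live = 0;
};

// Task shared by all workers; each worker publishes its partial sum once.
class LiveSumTask {
 public:
  LiveSumTask(const std::vector<Region>& regions, unsigned num_workers)
      : _regions(regions), _num_workers(num_workers), _total_live(0) {
    assert(num_workers > 0);
  }

  // Intended to be called once per worker id, possibly from different threads.
  void work(unsigned worker_id) {
    LocalLiveClosure cl;
    for (std::size_t i = worker_id; i < _regions.size(); i += _num_workers) {
      cl.do_heap_region(_regions[i]);
    }
    // One atomic add per worker instead of one per region.
    _total_live.fetch_add(cl.local_live(), std::memory_order_relaxed);
  }

  std::size_t total_live() const {
    return _total_live.load(std::memory_order_relaxed);
  }

 private:
  const std::vector<Region>& _regions;
  unsigned _num_workers;
  std::atomic<std::size_t> _total_live;
};
```

The caller of the task would read `total_live()` after all workers have finished and store it as the heap's estimate.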

src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 1850:

> 1848: size_t G1CollectedHeap::live() const {
> 1849:   size_t size = Atomic::load(&_live_size);
> 1850:   return size > 0 ? size : used();

Note that `used()` is susceptible to fluttering due to memory ordering problems: since its result is composed of multiple reads, you can get readings from very different situations.
It is recommended to use `used_unlocked()` instead. It does not take allocation regions and archive regions into account, but at least it does not jump around when re-read in quick succession.

src/hotspot/share/gc/parallel/parallelScavengeHeap.inline.hpp line 49:

> 47:   _young_live = young_gen()->used_in_bytes();
> 48:   _eden_live = young_gen()->eden_space()->used_in_bytes();
> 49:   _old_live = old_gen()->used_in_bytes();

`_young_live` already seems to contain `_eden_live` looking at the implementation of `PSYoungGen::used_in_bytes()`:

I.e.

`size_t PSYoungGen::used_in_bytes() const {
  return eden_space()->used_in_bytes()
       + from_space()->used_in_bytes();      // to_space() is only used during scavenge
}
`

but maybe I'm wrong here.
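To illustrate the concern, here is a small sketch of what would happen if the three fields were simply added together. `YoungGenSketch` and both functions are hypothetical; only the shape of `used_in_bytes()` is taken from the quoted code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of the young generation.
struct YoungGenSketch {
  std::size_t eden_used;
  std::size_t from_used;

  // Mirrors the quoted PSYoungGen::used_in_bytes(): eden plus from-space.
  std::size_t used_in_bytes() const { return eden_used + from_used; }
};

// Adding eden on top of the young total counts eden twice.
std::size_t live_double_counted(const YoungGenSketch& young, std::size_t old_used) {
  return young.used_in_bytes() + young.eden_used + old_used;
}

// Young total plus old gen already covers every space once.
std::size_t live_estimate(const YoungGenSketch& young, std::size_t old_used) {
  return young.used_in_bytes() + old_used;
}
```

The difference between the two results is exactly the eden usage, which is why tracking a single sum avoids the problem.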

src/hotspot/share/gc/shared/genCollectedHeap.cpp line 683:

> 681:   }
> 682:   // update the live size after last GC
> 683:   _live_size = _young_gen->live() + _old_gen->live();

I would prefer if that code were placed into `gc_epilogue`.

src/hotspot/share/gc/shared/space.inline.hpp line 189:

> 187:         oop obj = oop(cur_obj);
> 188:         size_t obj_size = obj->size();
> 189:         live_offset += obj_size;

It seems more natural to me to put this counting into the `DeadSpacer` as this is what this change does. Also, the actual dead space "used" can be calculated from the difference between the `_allowed_deadspace_words` and the maximum (calculated in the constructor of `DeadSpacer`) afaict at the end of evacuation. So there is no need to incur per-object costs during evacuation at all.
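The budget arithmetic could be sketched like this. `DeadSpaceBudget` is a hypothetical, simplified model of the bookkeeping described above, not the actual `DeadSpacer` class; the field names and the `old_gen_live_bytes` helper are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Simplified model of a dead-space budget: a maximum fixed at construction
// and a remaining allowance that shrinks as dead ranges are left in place.
class DeadSpaceBudget {
 public:
  explicit DeadSpaceBudget(std::size_t max_words)
      : _max_words(max_words), _allowed_words(max_words) {}

  // Try to leave a dead range of `words` in place; fails once the budget is too small.
  bool insert_deadspace(std::size_t words) {
    if (words > _allowed_words) {
      return false;
    }
    _allowed_words -= words;
    return true;
  }

  // Dead words left in place = initial budget minus what remains of it.
  std::size_t dead_words_used() const { return _max_words - _allowed_words; }

 private:
  std::size_t _max_words;
  std::size_t _allowed_words;
};

// Live estimate for the space: used bytes minus the dead bytes left in place.
std::size_t old_gen_live_bytes(std::size_t used_bytes,
                               const DeadSpaceBudget& budget,
                               std::size_t bytes_per_word) {
  std::size_t dead_bytes = budget.dead_words_used() * bytes_per_word;
  assert(dead_bytes <= used_bytes);
  return used_bytes - dead_bytes;
}
```

With this shape the subtraction happens once at the end, so no per-object counter has to be updated while objects are being processed.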

-------------

Changes requested by tschatzl (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Thomas Schatzl-4
In reply to this post by Aleksey Shipilev-5
On Thu, 18 Feb 2021 10:15:37 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> [...]
>
> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578:
>
>> 4576:
>> 4577: void G1CollectedHeap::set_live(size_t bytes) {
>> 4578:   Atomic::release_store(&_live_size, bytes);
>
> I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure.

Not required.
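
For illustration only, a sketch in standard C++ (`std::atomic`) rather than HotSpot's `Atomic` API. `LiveSizeHolder` is a hypothetical class; the assumption is that the value is a standalone statistic and no other data is published through it, so a relaxed store and load are enough.

```cpp
#include <atomic>
#include <cstddef>

using std::size_t;

// Hypothetical holder for the cached live-size estimate; not the G1 code.
class LiveSizeHolder {
  std::atomic<size_t> _live_size{0};

public:
  // Relaxed store: readers only need some recent value, not ordering
  // with respect to other memory written by the GC thread.
  void set_live(size_t bytes) {
    _live_size.store(bytes, std::memory_order_relaxed);
  }

  size_t live() const {
    return _live_size.load(std::memory_order_relaxed);
  }
};
```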

> src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183:
>
>> 181:     size_t live = _live_size;
>> 182:     return live > 0 ? live : used();
>> 183:   };
>
> I think the implementation belongs to `genCollectedHeap.cpp`.

+1. Does not seem to be performance sensitive.
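
A minimal sketch of what that move could look like, with the declaration kept in the class and the body defined out of line. `SketchGenHeap` and its setters are hypothetical; only the fallback expression is taken from the quoted lines.

```cpp
#include <cstddef>

using std::size_t;

// Would live in the header: declarations only.
class SketchGenHeap {
  size_t _live_size = 0;  // cached estimate, 0 means "not computed yet"
  size_t _used = 0;

public:
  void set_used(size_t bytes) { _used = bytes; }
  void set_live(size_t bytes) { _live_size = bytes; }
  size_t used() const;
  size_t live() const;
};

// Would live in the .cpp file: the bodies.
size_t SketchGenHeap::used() const {
  return _used;
}

size_t SketchGenHeap::live() const {
  size_t live = _live_size;
  // Fall back to used() until a liveness value has been recorded.
  return live > 0 ? live : used();
}
```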

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Thomas Schatzl-4
In reply to this post by Albert Mingkun Yang
On Fri, 19 Feb 2021 08:22:56 GMT, Albert Mingkun Yang <[hidden email]> wrote:

> src/hotspot/share/gc/parallel/parallelScavengeHeap.hpp line 79:
>
>> 77:   size_t _young_live;
>> 78:   size_t _eden_live;
>> 79:   size_t _old_live;
>
> It's only the sum that's ever exposed, right? I wonder if it makes sense to merge them into one var to only track the sum.

I agree because they seem to be always read and written at the same time.
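
As a rough sketch of the merged form (hypothetical names, not the Parallel GC code), the per-space values are summed at update time and only the total is stored:

```cpp
#include <cstddef>
#include <initializer_list>

using std::size_t;

// Hypothetical tracker that keeps only the sum of the per-space live bytes.
class LiveSumSketch {
  size_t _live = 0;

public:
  // Called once after a GC cycle with the live bytes of each space.
  void update_live(std::initializer_list<size_t> per_space_bytes) {
    size_t sum = 0;
    for (size_t bytes : per_space_bytes) {
      sum += bytes;
    }
    _live = sum;
  }

  size_t live() const { return _live; }
};
```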

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Erik Gahlin-2
In reply to this post by Jaroslav Bachorik-3
On Mon, 15 Feb 2021 17:23:44 GMT, Jaroslav Bachorik <[hidden email]> wrote:

src/hotspot/share/jfr/metadata/metadata.xml line 205:

> 203:     <Field type="ulong" contentType="bytes" name="capacity" label="Heap Capacity" description="Maximum number of bytes to be allocated by objects in the heap" />
> 204:     <Field type="ulong" contentType="bytes" name="used" label="Heap Used" description="Bytes allocated by objects in the heap" />
> 205:     <Field type="ulong" contentType="bytes" name="live" label="Heap Live" description="Live bytes allocated by objects in the heap" />

I think it would be good to mention in the description that it is an estimate, i.e. "Estimate of live bytes ....".

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Thomas Schatzl-4
On Mon, 22 Feb 2021 16:50:48 GMT, Thomas Schatzl <[hidden email]> wrote:

>> src/hotspot/share/gc/g1/g1CollectedHeap.cpp line 4578:
>>
>>> 4576:
>>> 4577: void G1CollectedHeap::set_live(size_t bytes) {
>>> 4578:   Atomic::release_store(&_live_size, bytes);
>>
>> I don't think this requires `release_store`, regular `store` would be enough. G1 folks can say for sure.
>
> Not required.

👍

>> src/hotspot/share/gc/shared/genCollectedHeap.hpp line 183:
>>
>>> 181:     size_t live = _live_size;
>>> 182:     return live > 0 ? live : used();
>>> 183:   };
>>
>> I think the implementation belongs to `genCollectedHeap.cpp`.
>
> +1. Does not seem to be performance sensitive.

👍

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Aleksey Shipilev-5
On Thu, 18 Feb 2021 10:18:03 GMT, Aleksey Shipilev <[hidden email]> wrote:

> src/hotspot/share/gc/shared/genCollectedHeap.cpp line 1144:
>
>> 1142:   _old_gen->prepare_for_compaction(&cp);
>> 1143:   _young_gen->prepare_for_compaction(&cp);
>> 1144:
>
> Stray newline?

😊

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Aleksey Shipilev-5
On Thu, 18 Feb 2021 10:19:31 GMT, Aleksey Shipilev <[hidden email]> wrote:

> src/hotspot/share/gc/shared/generation.hpp line 140:
>
>> 138:   virtual size_t used() const = 0;      // The number of used bytes in the gen.
>> 139:   virtual size_t free() const = 0;      // The number of free bytes in the gen.
>> 140:   virtual size_t live() const = 0;
>
> Needs a comment to match the lines above? Say, `// The estimate of live bytes in the gen.`

👍

> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 579:
>
>> 577:     event.set_heapLive(heap->live());
>> 578:     event.commit();
>> 579:   }
>
> At first sight, this belongs in `ShenandoahConcurrentMark::finish_mark()`. Placing the event here would also fire it when the concurrent GC is cancelled, which is not what you want.

👍

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Aleksey Shipilev-5
In reply to this post by Thomas Schatzl-4
On Mon, 22 Feb 2021 17:20:49 GMT, Thomas Schatzl <[hidden email]> wrote:

> The change also misses liveness update after G1 Full GC: it should at least reset the internal liveness counter to 0 so that `used()` is used.
> I think there is the same issue for Parallel Full GC. Serial seems to be handled.

Another general comment about Shenandoah. It would seem easier to piggyback liveness summarization on the region iteration that the heuristics do at the end of mark anyway. See `ShenandoahHeuristics::choose_collection_set`. I can do that when you are done with your changes, or you can try it yourself.
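
A sketch of what piggybacking could mean here, assuming nothing about the real heuristics code: `RegionSketch`, `SelectionResult` and `choose_and_summarize` are hypothetical, and the garbage-threshold selection is a simplified placeholder. The point is only that the live total is accumulated in the same pass that already visits every region.

```cpp
#include <cstddef>
#include <vector>

using std::size_t;

// Hypothetical per-region data; not the Shenandoah region type.
struct RegionSketch {
  size_t live_bytes;
  size_t garbage_bytes;
};

struct SelectionResult {
  std::vector<size_t> candidates;  // indices of regions picked for collection
  size_t total_live_bytes = 0;     // liveness summary gathered on the side
};

// One pass over the regions: pick candidates and sum live bytes together.
inline SelectionResult choose_and_summarize(const std::vector<RegionSketch>& regions,
                                            size_t garbage_threshold) {
  SelectionResult result;
  for (size_t i = 0; i < regions.size(); i++) {
    result.total_live_bytes += regions[i].live_bytes;
    if (regions[i].garbage_bytes > garbage_threshold) {
      result.candidates.push_back(i);
    }
  }
  return result;
}
```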

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Aleksey Shipilev-5
On Thu, 18 Feb 2021 10:22:58 GMT, Aleksey Shipilev <[hidden email]> wrote:

> src/hotspot/share/gc/shenandoah/shenandoahConcurrentMark.cpp line 265:
>
>> 263:   ShenandoahHeap* const heap = ShenandoahHeap::heap();
>> 264:   heap->set_concurrent_mark_in_progress(false);
>> 265:   heap->mark_finished();
>
> Let's not rename this method. Introduce a new method, `ShenandoahHeap::update_live`, and call it every time after `mark_complete_marking_context()` is called.

👍

> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 494:
>
>> 492:   mark_complete_marking_context();
>> 493:
>> 494:   class ShenandoahCollectLiveSizeClosure : public ShenandoahHeapRegionClosure {
>
> We don't usually use the in-method declarations like these, pull it out of the method.

👍
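
A sketch of how the two suggestions above could fit together, under stated assumptions: all types here (`HeapRegionSketch`, `RegionClosureSketch`, `CollectLiveSizeClosureSketch`, `HeapSketch`) are hypothetical stand-ins for the Shenandoah ones. The closure is declared at namespace scope instead of inside the method, and a separate `update_live()` does the summing.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

using std::size_t;

// Hypothetical region and closure interface; not the Shenandoah types.
struct HeapRegionSketch {
  size_t live_bytes;
};

class RegionClosureSketch {
public:
  virtual ~RegionClosureSketch() = default;
  virtual void heap_region_do(const HeapRegionSketch& r) = 0;
};

// Declared outside of any method, as requested in the review.
class CollectLiveSizeClosureSketch : public RegionClosureSketch {
  size_t _live = 0;

public:
  void heap_region_do(const HeapRegionSketch& r) override { _live += r.live_bytes; }
  size_t live() const { return _live; }
};

class HeapSketch {
  std::vector<HeapRegionSketch> _regions;
  size_t _live = 0;

public:
  explicit HeapSketch(std::vector<HeapRegionSketch> regions)
    : _regions(std::move(regions)) {}

  void heap_region_iterate(RegionClosureSketch* cl) const {
    for (const HeapRegionSketch& r : _regions) {
      cl->heap_region_do(r);
    }
  }

  // Separate method, meant to be called after marking is known to be complete.
  void update_live() {
    CollectLiveSizeClosureSketch cl;
    heap_region_iterate(&cl);
    _live = cl.live();
  }

  size_t live() const { return _live; }
};
```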

> src/hotspot/share/gc/epsilon/epsilonHeap.hpp line 80:
>
>> 78:   virtual size_t capacity()     const { return _virtual_space.committed_size(); }
>> 79:   virtual size_t used()         const { return _space->used(); }
>> 80:   virtual size_t live()         const { return used(); }
>
> I'd prefer to call `_space->used()` directly here. Minor optimization, I know.

👍

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Per Liden-2
On Mon, 22 Feb 2021 08:44:25 GMT, Per Liden <[hidden email]> wrote:

> src/hotspot/share/gc/z/zStat.hpp line 549:
>
>> 547:   static size_t used_at_mark_start();
>> 548:   static size_t used_at_relocate_end();
>> 549:   static size_t live();
>
> Please call this `live_at_mark_end()` to match the names of the neighboring functions.

👍

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Albert Mingkun Yang
On Fri, 19 Feb 2021 08:21:36 GMT, Albert Mingkun Yang <[hidden email]> wrote:

> src/hotspot/share/jfr/periodic/jfrPeriodic.cpp line 649:
>
>> 647: TRACE_REQUEST_FUNC(HeapUsageSummary) {
>> 648:   EventHeapUsageSummary event;
>> 649:   if (event.should_commit()) {
>
> I believe the `should_commit` check is not needed; the period check is handled by the caller.

👍
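
For illustration, a periodic request function without the redundant check might look roughly like the sketch below. It is a standalone mock, not the JFR code: `SketchHeap`, `SketchHeapUsageEvent`, `emit_heap_usage_summary`, and the `set_heapUsed` setter are hypothetical names, and the real event class is generated by the JFR build.

```cpp
#include <cstddef>
#include <vector>

// Mock heap exposing the two values the event reports (hypothetical type).
struct SketchHeap {
  std::size_t used_bytes;
  std::size_t live_bytes;
  std::size_t used() const { return used_bytes; }
  std::size_t live() const { return live_bytes; }
};

// Mock event; committed events are appended to a caller-provided sink.
struct SketchHeapUsageEvent {
  std::size_t heap_used = 0;
  std::size_t heap_live = 0;
  std::vector<SketchHeapUsageEvent>* sink;

  explicit SketchHeapUsageEvent(std::vector<SketchHeapUsageEvent>* s) : sink(s) {}
  void set_heapUsed(std::size_t v) { heap_used = v; }
  void set_heapLive(std::size_t v) { heap_live = v; }
  void commit() { sink->push_back(*this); }
};

// Assumed to be invoked by the periodic framework only when the event is due,
// so the body fills in the fields and commits without a should_commit() check.
inline void emit_heap_usage_summary(const SketchHeap& heap,
                                    std::vector<SketchHeapUsageEvent>* sink) {
  SketchHeapUsageEvent event(sink);
  event.set_heapUsed(heap.used());
  event.set_heapLive(heap.live());
  event.commit();
}
```

Each call to `emit_heap_usage_summary` appends exactly one event to the sink, because the decision whether the event is due is left to the caller.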

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Thomas Schatzl-4
On Mon, 22 Feb 2021 17:12:43 GMT, Thomas Schatzl <[hidden email]> wrote:

> src/hotspot/share/gc/shared/genCollectedHeap.cpp line 683:
>
>> 681:   }
>> 682:   // update the live size after last GC
>> 683:   _live_size = _young_gen->live() + _old_gen->live();
>
> I would prefer if that code were placed into `gc_epilogue`.

👍
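
A minimal sketch of what moving the update into `gc_epilogue` could look like, assuming a generational heap where the old generation tracks the dead space it left uncompacted. The types `SketchYoungGen`, `SketchOldGen`, and `SketchGenHeap` and the `dead_bytes` field are hypothetical stand-ins, not the HotSpot classes.

```cpp
#include <cstddef>

// Hypothetical young generation: after a copying collection everything that
// remains in use survived the copy, so live() is simply the used bytes.
struct SketchYoungGen {
  std::size_t used_bytes = 0;
  std::size_t live() const { return used_bytes; }
};

// Hypothetical old generation: mark-sweep may leave some dead objects in
// place, so live() subtracts that dead space from the used bytes.
struct SketchOldGen {
  std::size_t used_bytes = 0;
  std::size_t dead_bytes = 0;  // dead space left uncompacted by the last full GC
  std::size_t live() const {
    return dead_bytes > used_bytes ? 0 : used_bytes - dead_bytes;
  }
};

class SketchGenHeap {
 public:
  SketchYoungGen young;
  SketchOldGen old;

  std::size_t used() const { return young.used_bytes + old.used_bytes; }

  // Cached estimate; falls back to used() until a GC has refreshed it.
  std::size_t live() const { return _live_valid ? _live_size : used(); }

  // Called once at the end of a collection in this sketch, which makes it
  // a single place to refresh the cached value.
  void gc_epilogue() {
    _live_size = young.live() + old.live();
    _live_valid = true;
  }

 private:
  std::size_t _live_size = 0;
  bool _live_valid = false;
};
```

Before the first collection `live()` reports the same value as `used()`; after `gc_epilogue()` it reports the young survivors plus the old-gen bytes that are not known to be dead.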

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate

Jaroslav Bachorik-3
In reply to this post by Jaroslav Bachorik-3
On Mon, 1 Mar 2021 14:03:37 GMT, Jaroslav Bachorik <[hidden email]> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahConcurrentGC.cpp line 579:
>>
>>> 577:     event.set_heapLive(heap->live());
>>> 578:     event.commit();
>>> 579:   }
>>
>> At first sight, this belongs in `ShenandoahConcurrentMark::finish_mark()`. Placing the event here would fire the event when concurrent GC is cancelled, which is not what you want.
>
> 👍

Actually, this shouldn't even be here. `EventGCHeapSummary` is emitted via `trace_heap*` calls which should already be hooked into Shenandoah. Let me remove this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate [v2]

Jaroslav Bachorik-3
In reply to this post by Jaroslav Bachorik-3
> The purpose of this change is to expose a 'cheap' estimate of the current live set size (the meaning of 'current' depends on the particular GC implementation, but in the worst case it is 'at last full GC') in the form of a periodically emitted JFR event.
>
> ## Introducing new JFR event
>
> While there is already a 'GC Heap Summary' JFR event, it does not fit the requirements because it is closely tied to the GC cycle, so e.g. for ZGC or Shenandoah it may not happen for quite a long time, increasing the risk that no heap summary events are present in the JFR recording at all.
> Because of this I am proposing to add a new 'Heap Usage Summary' event which will be emitted periodically, by default on each JFR chunk, and will contain information about the heap capacity and the used and live bytes. This information is available from all GC implementations and can be provided at literally any time.
>
> ## Implementation
>
> The implementation differs from GC to GC because each GC algorithm/implementation provides a slightly different way to track the liveness. The common part is `size_t live() const` method added to `CollectedHeap` superclass and the use of a cached 'liveness' value computed after the last GC cycle. If `liveness` hasn't been calculated yet the implementation will default to returning 'used' value.
>
> The implementations are based on my (rather shallow) knowledge of inner working of the respective GC engines and I am open to suggestions to make them better/correct.
>
> ### Epsilon GC
>
> Trivial implementation - just return `used()` instead.
>
> ### Serial GC
>
> Here we utilize the fact that the mark-copy phase is naturally compacting, so the number of bytes after the copy is 'live'. The mark-sweep implementation keeps internal info about objects that are 'dead' but excluded from the compaction effort, and we can use these numbers to derive the old-gen live set size (used bytes minus the cumulative size of the 'un-dead' objects).
>
> ### Parallel GC
>
> For Parallel GC the liveness is calculated as the sum of used bytes in all regions after the last GC cycle. This seems to be a safe bet because this collector is always compacting (AFAIK).
>
> ### G1 GC
>
> Using `G1ConcurrentMark::remark()` method the live set size is computed as a sum of `_live_words` from the associated `G1RegionMarkStats` objects. Here I am not 100% sure this approach covers all eventualities and it would be great to have someone skilled in G1 implementation to chime in so I can fix it. However, the numbers I am getting for G1 are comparable to other GCs for the same application.
>
> ### Shenandoah
>
> In Shenandoah, the regions are keeping the liveness info. However, the VM op that is used for iterating regions is a safe-pointing one so it would be great to run it in an already safe-pointed context.
> This leads to hooking into `ShenandoahConcurrentMark::finish_mark()` and `ShenandoahSTWMark::mark()` where at the end of the marking process the liveness info is summarized and set to `ShenandoahHeap::_live` volatile field - which is later read by the event emitting code.
>
> ### ZGC
>
> `ZStatHeap` is already holding the liveness info - so this implementation is just making it accessible via `ZCollectedHeap::live()` method.
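
The region-based approach described for Shenandoah above (summarize per-region liveness at the end of marking, publish it in one field, read it later from the event-emitting code) can be sketched roughly as follows. This is a standalone illustration under stated assumptions, not the Shenandoah code: `SketchRegion` and `SketchRegionHeap` are hypothetical, `std::atomic` is swapped in for the volatile field, and treating 0 as "not computed yet, report used instead" is an assumption taken from the review suggestion to reset the counter to 0.

```cpp
#include <atomic>
#include <cstddef>
#include <vector>

// Hypothetical region carrying the live bytes found by the last marking.
struct SketchRegion {
  std::size_t live_bytes;
};

class SketchRegionHeap {
 public:
  SketchRegionHeap(std::vector<SketchRegion> regions, std::size_t used)
      : _regions(regions), _used(used), _live(0) {}

  std::size_t used() const { return _used; }

  // Meant to run at the end of marking, from a single thread (for example
  // while already at a safepoint): sum region liveness and publish it.
  void summarize_liveness() {
    std::size_t total = 0;
    for (const SketchRegion& r : _regions) {
      total += r.live_bytes;
    }
    _live.store(total, std::memory_order_release);
  }

  // Safe to call from another thread at any time; a value of 0 is treated
  // as "no estimate yet" and falls back to used().
  std::size_t live() const {
    std::size_t v = _live.load(std::memory_order_acquire);
    return v == 0 ? _used : v;
  }

 private:
  std::vector<SketchRegion> _regions;
  std::size_t _used;
  std::atomic<std::size_t> _live;
};
```

Publishing a single precomputed value keeps the reader side cheap: the event-emitting code performs one atomic load and never iterates the regions itself.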

Jaroslav Bachorik has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains ten additional commits since the last revision:

 - Merge remote-tracking branch 'origin/master' into jb/live_set_1
 - Change dead space calculation
 - Common PR fixes
 - Minor G1 related PR fixes
 - Epsilon related PR fixes
 - Shenandoah related PR fixes
 - Rename ZStatHeap::live() to live_at_mark_end()
 - Update event definition and emission
 - 8258431: Provide a JFR event with live set size estimate

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2579/files
  - new: https://git.openjdk.java.net/jdk/pull/2579/files/ddc5b5c1..03a8617e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2579&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2579&range=00-01

  Stats: 45701 lines in 1355 files changed: 27365 ins; 10881 del; 7455 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2579.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2579/head:pull/2579

PR: https://git.openjdk.java.net/jdk/pull/2579

Re: RFR: 8258431: Provide a JFR event with live set size estimate [v2]

Jaroslav Bachorik-3
In reply to this post by Aleksey Shipilev-5
On Mon, 1 Mar 2021 14:17:17 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> The change also misses liveness update after G1 Full GC: it should at least reset the internal liveness counter to 0 so that `used()` is used.
>> I think there is the same issue for Parallel Full GC. Serial seems to be handled.
>
> Another general comment about Shenandoah. It would seem easier to piggyback liveness summarization on region iteration that heuristics does at the end of mark anyway. See `ShenandoahHeuristics::choose_collection_set`. I can do that when you are done with your changes, or try it yourself.

I have addressed comments with trivial fixes.
Will take a look at the remainder of more complex ones next.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2579