Re: RFR 8215523: jstat reports incorrect values for OU for CMS GC

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: RFR 8215523: jstat reports incorrect values for OU for CMS GC

serguei.spitsyn@oracle.com
Hi Poonam,

The fix looks Okay from the Serviceability point of view.

Thanks,
Serguei


On 6/21/19 13:30, Poonam Parhar wrote:
Updating the subject line with the updated bug synopsis.

And also, much thanks to Thomas Schatzl for his remarkable help in tweaking and refining this fix.

regards,
Poonam

On 6/21/19 12:21 PM, Poonam Parhar wrote:
Hello,

Please review the changes for the fix of:

Bug 8215523: jstat reports incorrect values for OU for CMS GC
Webrev: http://cr.openjdk.java.net/~poonam/8215523/webrev.00/

Problem:
With CMS, at certain points, 'OU' (Old usage) values reported in the jstat output increase first and then drop down. This is wrong because the 'old usage' drops down without any old generation collection.

Cause:
For the 'used' values of different memory spaces, jstat uses a periodic task to read these values and update the corresponding jstat counters.

For CMS generation, there is a sampler that periodically updates the 'used' counter:

 class GenerationUsedHelper : public PerfLongSampleHelper {
103 private:
104 Generation* _gen;
105
106 public:
107 GenerationUsedHelper(Generation* g) : _gen(g) { }
108
109 inline jlong take_sample() {
110 return _gen->used();
111 }
112 };

The above used() call in turn makes a call to CompactibleFreeListSpace::used().

376 size_t CompactibleFreeListSpace::used() const {
377 return capacity() - free();
378 }
379
380 size_t CompactibleFreeListSpace::free() const {
381 // "MT-safe, but not MT-precise"(TM), if you will: i.e.
382 // if you do this while the structures are in flux you
383 // may get an approximate answer only; for instance
384 // because there is concurrent allocation either
385 // directly by mutators or for promotion during a GC.
386 // It's "MT-safe", however, in the sense that you are guaranteed
387 // not to crash and burn, for instance, because of walking
388 // pointers that could disappear as you were walking them.
389 // The approximation is because the various components
390 // that are read below are not read atomically (and
391 // further the computation of totalSizeInIndexedFreeLists()
392 // is itself a non-atomic computation. The normal use of
393 // this is during a resize operation at the end of GC
394 // and at that time you are guaranteed to get the
395 // correct actual value. However, for instance, this is
396 // also read completely asynchronously by the "perf-sampler"
397 // that supports jvmstat, and you are apt to see the values
398 // flicker in such cases.
399 assert(_dictionary != NULL, "No _dictionary?");
400 return (_dictionary->total_chunk_size(DEBUG_ONLY(freelistLock())) +
401 totalSizeInIndexedFreeLists() +
402 _smallLinearAllocBlock._word_size) * HeapWordSize;
403 }

As stated above in the code comments, free() can return an imprecise approximate value when allocations are happening in the CompactibleFreeListSpace at the same time, e.g. during direct allocations or promotions.

Solution:
For CMS, maintain an internal variable for the space usage. This internal variable gets updated only at points where the CMS space usage computation is guaranteed to be accurate. jstat and other memory usage monitoring tools get access to this stable value of used space.

Testing: jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2,hs-tier3

Thanks,
Poonam




Reply | Threaded
Open this post in threaded view
|

Re: RFR 8215523: jstat reports incorrect values for OU for CMS GC

Poonam Parhar
Hello Severin,

Thanks for catching that. Yes, I meant jstat. I will correct that.

regards,
Poonam


On 6/28/19 9:32 AM, Severin Gehwolf wrote:

> Hi Poonam,
>
> One nit caught my eye:
>
> On Fri, 2019-06-21 at 13:30 -0700, Poonam Parhar wrote:
>> Please review the changes for the fix of:
>>> Bug 8215523: jstat reports incorrect values for OU for CMS GC
>>> Webrev: http://cr.openjdk.java.net/~poonam/8215523/webrev.00/
> http://cr.openjdk.java.net/%7Epoonam/8215523/webrev.00/src/hotspot/share/gc/cms/compactibleFreeListSpace.hpp.sdiff.html
>
> Line 419 the comment mentions "... required for jhat and other ...".
> jhat no longer exists. Do you mean jstat?
>
> Thanks,
> Severin
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR 8215523: jstat reports incorrect values for OU for CMS GC

Thomas Schatzl
In reply to this post by serguei.spitsyn@oracle.com
Hi,

On Fri, 2019-06-28 at 07:19 -0700, Poonam Parhar wrote:
> Hello,
>
> Thomas Schatzl pointed out that the recalculate_used_stable() call
> in collect_in_background() may not be necessary. Testing the changes
> after removing that call confirmed that. Here's the updated webrev:
> http://cr.openjdk.java.net/~poonam/8215523/webrev.01/

  looks good.

Thomas :)



Reply | Threaded
Open this post in threaded view
|

Re: RFR 8215523: jstat reports incorrect values for OU for CMS GC

serguei.spitsyn@oracle.com
In reply to this post by serguei.spitsyn@oracle.com
Hi Poonam,

The update still looks good to me (modulo renaming jhat => jstat).

Thanks,
Serguei


On 6/28/19 07:19, Poonam Parhar wrote:
Hello,

Thomas Schatzl pointed out that the recalculate_used_stable() call in collect_in_background() may not be necessary. Testing the changes after removing that call confirmed that. Here's the updated webrev:
http://cr.openjdk.java.net/~poonam/8215523/webrev.01/

Thanks,
Poonam

On 6/21/19 5:59 PM, [hidden email] wrote:
Hi Poonam,

The fix looks Okay from the Serviceability point of view.

Thanks,
Serguei


On 6/21/19 13:30, Poonam Parhar wrote:
Updating the subject line with the updated bug synopsis.

And also, much thanks to Thomas Schatzl for his remarkable help in tweaking and refining this fix.

regards,
Poonam

On 6/21/19 12:21 PM, Poonam Parhar wrote:
Hello,

Please review the changes for the fix of:

Bug 8215523: jstat reports incorrect values for OU for CMS GC
Webrev: http://cr.openjdk.java.net/~poonam/8215523/webrev.00/

Problem:
With CMS, at certain points, 'OU' (Old usage) values reported in the jstat output increase first and then drop down. This is wrong because the 'old usage' drops down without any old generation collection.

Cause:
For the 'used' values of different memory spaces, jstat uses a periodic task to read these values and update the corresponding jstat counters.

For CMS generation, there is a sampler that periodically updates the 'used' counter:

 class GenerationUsedHelper : public PerfLongSampleHelper {
103 private:
104 Generation* _gen;
105
106 public:
107 GenerationUsedHelper(Generation* g) : _gen(g) { }
108
109 inline jlong take_sample() {
110 return _gen->used();
111 }
112 };

The above used() call in turn makes a call to CompactibleFreeListSpace::used().

376 size_t CompactibleFreeListSpace::used() const {
377 return capacity() - free();
378 }
379
380 size_t CompactibleFreeListSpace::free() const {
381 // "MT-safe, but not MT-precise"(TM), if you will: i.e.
382 // if you do this while the structures are in flux you
383 // may get an approximate answer only; for instance
384 // because there is concurrent allocation either
385 // directly by mutators or for promotion during a GC.
386 // It's "MT-safe", however, in the sense that you are guaranteed
387 // not to crash and burn, for instance, because of walking
388 // pointers that could disappear as you were walking them.
389 // The approximation is because the various components
390 // that are read below are not read atomically (and
391 // further the computation of totalSizeInIndexedFreeLists()
392 // is itself a non-atomic computation. The normal use of
393 // this is during a resize operation at the end of GC
394 // and at that time you are guaranteed to get the
395 // correct actual value. However, for instance, this is
396 // also read completely asynchronously by the "perf-sampler"
397 // that supports jvmstat, and you are apt to see the values
398 // flicker in such cases.
399 assert(_dictionary != NULL, "No _dictionary?");
400 return (_dictionary->total_chunk_size(DEBUG_ONLY(freelistLock())) +
401 totalSizeInIndexedFreeLists() +
402 _smallLinearAllocBlock._word_size) * HeapWordSize;
403 }

As stated above in the code comments, free() can return an imprecise approximate value when allocations are happening in the CompactibleFreeListSpace at the same time, e.g. during direct allocations or promotions.

Solution:
For CMS, maintain an internal variable for the space usage. This internal variable gets updated only at points where the CMS space usage computation is guaranteed to be accurate. jstat and other memory usage monitoring tools get access to this stable value of used space.

Testing: jdk-tier1,jdk-tier2,jdk-tier3,hs-tier1,hs-tier2,hs-tier3

Thanks,
Poonam