RFR(s): 8185278: TestGreyReclaimedHumongousObjects.java fails guarantee(index != trim_index(_head_index + 1)) failed: should not go past head

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR(s): 8185278: TestGreyReclaimedHumongousObjects.java fails guarantee(index != trim_index(_head_index + 1)) failed: should not go past head

sangheon.kim@oracle.com
Hi all,

Could I have some reviews for fixing G1 MMU concurrency problem?

* Background
As this bug occurred with DetGC option enabled, it became a confidential
bug.
The short description is:
------------------------------------
gc/g1/TestGreyReclaimedHumongousObjects.java
     Test failed the following assert on Linux X64 64-bit Server VM:

     # Internal Error (g1MMUTracker.cpp:155), pid=9433, tid=9449
     # guarantee(index != trim_index(_head_index + 1)) failed: should
not go past head
------------------------------------

* Analysis
Considering G1MMUTrackerQueue::_head_index and _tail_index couldn't be
same with _no_entries==QueueLength, the data is corrupted.
G1MMUTrackerQueue::add_pause() is only called from VMThread but
G1MMUTrackerQueue::when_sec() which internally calls
remove_expired_entries() can be called from ConcurrentMarkThread
concurrently. And when_sec() is guarded by MMUTracker_lock while
add_pause() is not guarded.

* Proposal
Instead of adding MMUTracker_lock at add_pause(), it would be better to
use SuspendibleThreadSetJoiner as there are 2 additional benefits.
1. If there is running young gc but not yet updated its gc time for MMU,
its gc time will be reflected at this MMU calculation as STS will
suspend ConcurrentMarkThread.
2. ConcurrentMarkThread will not sleep if there is young gc which makes
MMU more accurate.

CR: https://bugs.openjdk.java.net/browse/JDK-8185278
Webrev: http://cr.openjdk.java.net/~sangheki/8185278/webrev.0/
Testing: JPRT

Thanks,
Sangheon

[Bonus]
** Core dump analysis
I guess it happened near index 35 as G1MMUTrackerQueue::_array[35] is
recorded for "Pause Cleanup".
In this case, there are 2 gcs which meet the condition of 'end time >
limit' at G1MMUTrackerQueue::when_internal(). i.e.
(gdb) p _array[35] <= _mark_cleanup_start_sec
$106 = {_start_time = 48.060288561, _end_time = 48.060476459}
(gdb) p _array[36]
$107 = {_start_time = 48.061733350000004, _end_time = 48.062563427000001}
(gdb) p _array[37] <= _head_index == _tail_index == 37
$108 = {_start_time = 48.062946299000004, _end_time = 48.063457630000002}
(gdb) p _array[38] <= The oldest gc record
$109 = {_start_time = 47.888210037, _end_time = 47.888929198}

I think _tail_index should be 38.
Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8185278: TestGreyReclaimedHumongousObjects.java fails guarantee(index != trim_index(_head_index + 1)) failed: should not go past head

Thomas Schatzl
Hi Sangheon,

On Thu, 2017-10-12 at 16:19 -0700, sangheon.kim wrote:
> Hi all,
>
> Could I have some reviews for fixing G1 MMU concurrency problem?
>

thanks :)

> * Background
>
[...]

> * Analysis
> Considering G1MMUTrackerQueue::_head_index and _tail_index couldn't
> be same with _no_entries==QueueLength, the data is corrupted.
> G1MMUTrackerQueue::add_pause() is only called from VMThread but 
> G1MMUTrackerQueue::when_sec() which internally calls 
> remove_expired_entries() can be called from ConcurrentMarkThread 
> concurrently. And when_sec() is guarded by MMUTracker_lock while 
> add_pause() is not guarded.
>
> * Proposal
> Instead of adding MMUTracker_lock at add_pause(), it would be better
> to use SuspendibleThreadSetJoiner as there are 2 additional benefits.
> 1. If there is running young gc but not yet updated its gc time for
> MMU, its gc time will be reflected at this MMU calculation as STS
> will suspend ConcurrentMarkThread.
> 2. ConcurrentMarkThread will not sleep if there is young gc which
> makes MMU more accurate.
>
> CR: https://bugs.openjdk.java.net/browse/JDK-8185278
> Webrev: http://cr.openjdk.java.net/~sangheki/8185278/webrev.0/
> Testing: JPRT
>

Some minor comments:

 - the comment at concurrentMarkThread.cpp:144 seems to be misplaced.
Imho it should be above the SuspendibleThreadSetJoiner call, not as
method comment.

 - in that comment: "If there is running young gc but not yet updated
MMU, its gc time will be reflected at this MMU calculation." -> "If
currently a gc is running, but it has not yet updated the MMU, we will
not forget to consider that pause in the MMU calculation"

 - in that comment, not sure what the third option refers to, maybe the
CPU usage of the ConcurrentMarkThread?

 - concurrentMarkThread.cpp:131: please move the method description (is
it?) to the method definition. Further I would prefer if it read
something like: "Delay marking to meet MMU" only which sums up what it
does. The other part of that comment about that we can schedule pauses
flexibly would probably better at the call site of the method but could
be skipped as well. This comment is a rather weak suggestion, feel free
to ignore it completely.

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8185278: TestGreyReclaimedHumongousObjects.java fails guarantee(index != trim_index(_head_index + 1)) failed: should not go past head

sangheon.kim@oracle.com
Hi Thomas,

Thanks for looking at this.

On 10/18/2017 03:41 AM, Thomas Schatzl wrote:

> Hi Sangheon,
>
> On Thu, 2017-10-12 at 16:19 -0700, sangheon.kim wrote:
>> Hi all,
>>
>> Could I have some reviews for fixing G1 MMU concurrency problem?
>>
> thanks :)
>
>> * Background
>>
> [...]
>> * Analysis
>> Considering G1MMUTrackerQueue::_head_index and _tail_index couldn't
>> be same with _no_entries==QueueLength, the data is corrupted.
>> G1MMUTrackerQueue::add_pause() is only called from VMThread but
>> G1MMUTrackerQueue::when_sec() which internally calls
>> remove_expired_entries() can be called from ConcurrentMarkThread
>> concurrently. And when_sec() is guarded by MMUTracker_lock while
>> add_pause() is not guarded.
>>
>> * Proposal
>> Instead of adding MMUTracker_lock at add_pause(), it would be better
>> to use SuspendibleThreadSetJoiner as there are 2 additional benefits.
>> 1. If there is running young gc but not yet updated its gc time for
>> MMU, its gc time will be reflected at this MMU calculation as STS
>> will suspend ConcurrentMarkThread.
>> 2. ConcurrentMarkThread will not sleep if there is young gc which
>> makes MMU more accurate.
>>
>> CR: https://bugs.openjdk.java.net/browse/JDK-8185278
>> Webrev: http://cr.openjdk.java.net/~sangheki/8185278/webrev.0/
>> Testing: JPRT
>>
> Some minor comments:
>
>   - the comment at concurrentMarkThread.cpp:144 seems to be misplaced.
> Imho it should be above the SuspendibleThreadSetJoiner call, not as
> method comment.
Done.

>
>   - in that comment: "If there is running young gc but not yet updated
> MMU, its gc time will be reflected at this MMU calculation." -> "If
> currently a gc is running, but it has not yet updated the MMU, we will
> not forget to consider that pause in the MMU calculation"
Done.

>
>   - in that comment, not sure what the third option refers to, maybe the
> CPU usage of the ConcurrentMarkThread?
Updated as it already made you confused.
What I wanted to explain was if a gc is running, CM thread will wait the
gc to be finished.
And then sleep for predicted amount of time. This will make MMU
calculation more accurate than before. i.e. previously CM thread slept
concurrently with VM thread which is not supposed to do so.

BTW, if the comment of 3 reasons is not helpful, I'm okay to completely
deleting it.

>
>   - concurrentMarkThread.cpp:131: please move the method description (is
> it?) to the method definition. Further I would prefer if it read
> something like: "Delay marking to meet MMU" only which sums up what it
> does. The other part of that comment about that we can schedule pauses
> flexibly would probably better at the call site of the method but could
> be skipped as well. This comment is a rather weak suggestion, feel free
> to ignore it completely.
As you may know, this is not a part of my change.
But I agree with you, so updated as you commented.

webrev:
http://cr.openjdk.java.net/~sangheki/8185278/webrev.1/
http://cr.openjdk.java.net/~sangheki/8185278/webrev.1_to_0/

Thanks,
Sangheon


>
> Thanks,
>    Thomas
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8185278: TestGreyReclaimedHumongousObjects.java fails guarantee(index != trim_index(_head_index + 1)) failed: should not go past head

Erik Helin-2
On 10/18/2017 11:05 PM, sangheon.kim wrot> webrev:
> http://cr.openjdk.java.net/~sangheki/8185278/webrev.1/
> http://cr.openjdk.java.net/~sangheki/8185278/webrev.1_to_0/

Thanks for taking care of this bug Sangheon! Looks good to me, Reviewed.

Thanks,
Erik

> Thanks,
> Sangheon
Reply | Threaded
Open this post in threaded view
|

Re: RFR(s): 8185278: TestGreyReclaimedHumongousObjects.java fails guarantee(index != trim_index(_head_index + 1)) failed: should not go past head

Thomas Schatzl
In reply to this post by sangheon.kim@oracle.com
Hi,

On Wed, 2017-10-18 at 14:05 -0700, sangheon.kim wrote:
> Hi Thomas,
>
>
[...]
> As you may know, this is not a part of my change.
> But I agree with you, so updated as you commented.
>
> webrev:
> http://cr.openjdk.java.net/~sangheki/8185278/webrev.1/
> http://cr.openjdk.java.net/~sangheki/8185278/webrev.1_to_0/
>
>

 ship it :) Sorry for the late reply.

Thomas