RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Stefan Johansson
Hi,

Please review the implementation of JEP-307:
https://bugs.openjdk.java.net/browse/JDK-8172890

Webrev:
http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/

Summary:
As communicated late last year [1], I've been working on parallelizing
the Full GC for G1. The implementation is now ready for review.

The approach I chose was to redo marking at the start of the Full GC and
not reuse the marking information from the concurrent mark cycle. The
main reason behind this is to maximize the chance of freeing up memory.
I reused the marking bitmap from the concurrent mark code though, so
instead of marking in the mark word a bitmap is used. The mark word is
still used for forwarding pointers, so marks will still have to be
preserved for some objects.

The algorithm is still a four phased mark-compact but each phase is
handled by parallel workers. Marking and reference processing is done in
phase 1. In phase 2 all worker threads work through the heap claiming
regions which they prepare for compaction. This is done by installing
forwarding pointers into the mark word of the live objects that will
move. The regions claimed by a worker in this phase will be the same
regions that the worker will compact in phase 4. This ensures that
objects are not overwritten before compacted.

In phase 3, all pointers to other objects are updated by looking at the
forwarding pointers. At this point all information needed to create new
remembered sets is available and this rebuilding has been added to phase
3. In the old version remembered set rebuilding was done separately
after the compaction, but this is more efficient.

As mentioned phase 4 is when the compaction is done. In this first
version, to avoid some complexity, there is no work stealing in this
phase. This will lead to some imbalance between the workers, but this
can be treated as a separate RFE in the future.

The part of this work that has generated the most questions during
internal discussions are the serial parts of phase 2 and 4. They are
executed if no regions are to be freed up by the parallel workers. It is
kind of a safety mechanism to avoid throwing a premature OOM. In the
case of no regions being freed by the parallel code path a single
threaded pass over the last region of each worker is done (at most
number-of-workers regions are handled) to further compact these regions
and hopefully free up some regions.

Testing:
* A lot of local sanity testing, both functional and performance.
* Passed tier 1-5 of internal testing on supported platforms.
* No regressions in performance testing.

Cheers,
Stefan

[1]
http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Stefan Johansson
Hi,

We're moving forward with the review internally and doing some
performance enhancements as well. Here are updated webrevs:
Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/

Note that the full webrev is based on the new consolidated repo, but the
incremental was generated with the old structure.

Highlight in this update:
* Cleaned out unused code in PreservedMarks.
* Fixed memory leak in GenericTaskQueueSet.
* HeapRegionClaimerBase has been removed and instead we now have two
functions to iterate through all heap regions.
* General cleanups and renames to ease understanding the code.
* G1 Hot Card Cache cleanup made parallel and moved into appropriate phase.
* Updated HeapRegion::apply_to_marked_objects to be a template function
to avoid virtual call.

Thanks Erik D and Thomas S for all comments so far.

Cheers,
Stefan


On 2017-09-04 17:36, Stefan Johansson wrote:

> Hi,
>
> Please review the implementation of JEP-307:
> https://bugs.openjdk.java.net/browse/JDK-8172890
>
> Webrev:
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>
> Summary:
> As communicated late last year [1], I've been working on parallelizing
> the Full GC for G1. The implementation is now ready for review.
>
> The approach I chose was to redo marking at the start of the Full GC
> and not reuse the marking information from the concurrent mark cycle.
> The main reason behind this is to maximize the chance of freeing up
> memory. I reused the marking bitmap from the concurrent mark code
> though, so instead of marking in the mark word a bitmap is used. The
> mark word is still used for forwarding pointers, so marks will still
> have to be preserved for some objects.
>
> The algorithm is still a four phased mark-compact but each phase is
> handled by parallel workers. Marking and reference processing is done
> in phase 1. In phase 2 all worker threads work through the heap
> claiming regions which they prepare for compaction. This is done by
> installing forwarding pointers into the mark word of the live objects
> that will move. The regions claimed by a worker in this phase will be
> the same regions that the worker will compact in phase 4. This ensures
> that objects are not overwritten before compacted.
>
> In phase 3, all pointers to other objects are updated by looking at
> the forwarding pointers. At this point all information needed to
> create new remembered sets is available and this rebuilding has been
> added to phase 3. In the old version remembered set rebuilding was
> done separately after the compaction, but this is more efficient.
>
> As mentioned phase 4 is when the compaction is done. In this first
> version, to avoid some complexity, there is no work stealing in this
> phase. This will lead to some imbalance between the workers, but this
> can be treated as a separate RFE in the future.
>
> The part of this work that has generated the most questions during
> internal discussions are the serial parts of phase 2 and 4. They are
> executed if no regions are to be freed up by the parallel workers. It
> is kind of a safety mechanism to avoid throwing a premature OOM. In
> the case of no regions being freed by the parallel code path a single
> threaded pass over the last region of each worker is done (at most
> number-of-workers regions are handled) to further compact these
> regions and hopefully free up some regions.
>
> Testing:
> * A lot of local sanity testing, both functional and performance.
> * Passed tier 1-5 of internal testing on supported platforms.
> * No regressions in performance testing.
>
> Cheers,
> Stefan
>
> [1]
> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Stefan Johansson
Hi again,

A lot more internal review has been done and here are the latest webrevs:
Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
Incremental (from previous mail):
http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/

Summary of changes:
* Updated calculations for heap sizing after gc.
* Removed G1FullGCWorkerData and moved data into G1FullCollector instead.
* Renamed G1MarkStack to G1FullGCMarker.
* Renamed G1CompactionPoint to G1FullGCCompactionPoint.
* Removed now unused RebuildRSOopClosure and par_write_ref from G1RemSet.
* Updated comments to be more informative.
* Better naming of functions and variables.
* Updated copyright for a lot of files.

Big thanks to Erik D, Thomas S and Sangheon K for working your way
through this big change.

Cheers,
Stefan

On 2017-09-19 17:32, Stefan Johansson wrote:

> Hi,
>
> We're moving forward with the review internally and doing some
> performance enhancements as well. Here are updated webrevs:
> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/
>
> Note that the full webrev is based on the new consolidated repo, but
> the incremental was generated with the old structure.
>
> Highlight in this update:
> * Cleaned out unused code in PreservedMarks.
> * Fixed memory leak in GenericTaskQueueSet.
> * HeapRegionClaimerBase has been removed and instead we now have two
> functions to iterate through all heap regions.
> * General cleanups and renames to ease understanding the code.
> * G1 Hot Card Cache cleanup made parallel and moved into appropriate
> phase.
> * Updated HeapRegion::apply_to_marked_objects to be a template
> function to avoid virtual call.
>
> Thanks Erik D and Thomas S for all comments so far.
>
> Cheers,
> Stefan
>
>
> On 2017-09-04 17:36, Stefan Johansson wrote:
>> Hi,
>>
>> Please review the implementation of JEP-307:
>> https://bugs.openjdk.java.net/browse/JDK-8172890
>>
>> Webrev:
>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>>
>> Summary:
>> As communicated late last year [1], I've been working on
>> parallelizing the Full GC for G1. The implementation is now ready for
>> review.
>>
>> The approach I chose was to redo marking at the start of the Full GC
>> and not reuse the marking information from the concurrent mark cycle.
>> The main reason behind this is to maximize the chance of freeing up
>> memory. I reused the marking bitmap from the concurrent mark code
>> though, so instead of marking in the mark word a bitmap is used. The
>> mark word is still used for forwarding pointers, so marks will still
>> have to be preserved for some objects.
>>
>> The algorithm is still a four phased mark-compact but each phase is
>> handled by parallel workers. Marking and reference processing is done
>> in phase 1. In phase 2 all worker threads work through the heap
>> claiming regions which they prepare for compaction. This is done by
>> installing forwarding pointers into the mark word of the live objects
>> that will move. The regions claimed by a worker in this phase will be
>> the same regions that the worker will compact in phase 4. This
>> ensures that objects are not overwritten before compacted.
>>
>> In phase 3, all pointers to other objects are updated by looking at
>> the forwarding pointers. At this point all information needed to
>> create new remembered sets is available and this rebuilding has been
>> added to phase 3. In the old version remembered set rebuilding was
>> done separately after the compaction, but this is more efficient.
>>
>> As mentioned phase 4 is when the compaction is done. In this first
>> version, to avoid some complexity, there is no work stealing in this
>> phase. This will lead to some imbalance between the workers, but this
>> can be treated as a separate RFE in the future.
>>
>> The part of this work that has generated the most questions during
>> internal discussions are the serial parts of phase 2 and 4. They are
>> executed if no regions are to be freed up by the parallel workers. It
>> is kind of a safety mechanism to avoid throwing a premature OOM. In
>> the case of no regions being freed by the parallel code path a single
>> threaded pass over the last region of each worker is done (at most
>> number-of-workers regions are handled) to further compact these
>> regions and hopefully free up some regions.
>>
>> Testing:
>> * A lot of local sanity testing, both functional and performance.
>> * Passed tier 1-5 of internal testing on supported platforms.
>> * No regressions in performance testing.
>>
>> Cheers,
>> Stefan
>>
>> [1]
>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Thomas Schatzl
Hi,

On Wed, 2017-10-18 at 11:45 +0200, Stefan Johansson wrote:

> Hi again,
>
> A lot more internal review has been done and here are the latest
> webrevs:
> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-
> 04/
> Incremental (from previous mail): 
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>
> Summary of changes:
> * Updated calculations for heap sizing after gc.
> * Removed G1FullGCWorkerData and moved data into G1FullCollector
> instead.
> * Renamed G1MarkStack to G1FullGCMarker.
> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
> * Removed now unused RebuildRSOopClosure and par_write_ref from
> G1RemSet.
> * Updated comments to be more informative.
> * Better naming of functions and variables.
> * Updated copyright for a lot of files.
>
> Big thanks to Erik D, Thomas S and Sangheon K for working your way 
> through this big change.

  ship it.

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Erik Helin-2
In reply to this post by Stefan Johansson
On 10/18/2017 11:45 AM, Stefan Johansson wrote:

> Hi again,
>
> A lot more internal review has been done and here are the latest webrevs:
> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
> Incremental (from previous mail):
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>
> Summary of changes:
> * Updated calculations for heap sizing after gc.
> * Removed G1FullGCWorkerData and moved data into G1FullCollector instead.
> * Renamed G1MarkStack to G1FullGCMarker.
> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
> * Removed now unused RebuildRSOopClosure and par_write_ref from G1RemSet.
> * Updated comments to be more informative.
> * Better naming of functions and variables.
> * Updated copyright for a lot of files.
>
> Big thanks to Erik D, Thomas S and Sangheon K for working your way
> through this big change.

Looks good, Reviewed.

Thanks,
Erik

> Cheers,
> Stefan
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

sangheon.kim@oracle.com
In reply to this post by Stefan Johansson
Looks good to me too.

Thanks,
Sangheon


On 10/18/2017 02:45 AM, Stefan Johansson wrote:

> Hi again,
>
> A lot more internal review has been done and here are the latest webrevs:
> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
> Incremental (from previous mail):
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>
> Summary of changes:
> * Updated calculations for heap sizing after gc.
> * Removed G1FullGCWorkerData and moved data into G1FullCollector instead.
> * Renamed G1MarkStack to G1FullGCMarker.
> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
> * Removed now unused RebuildRSOopClosure and par_write_ref from G1RemSet.
> * Updated comments to be more informative.
> * Better naming of functions and variables.
> * Updated copyright for a lot of files.
>
> Big thanks to Erik D, Thomas S and Sangheon K for working your way
> through this big change.
>
> Cheers,
> Stefan
>
> On 2017-09-19 17:32, Stefan Johansson wrote:
>> Hi,
>>
>> We're moving forward with the review internally and doing some
>> performance enhancements as well. Here are updated webrevs:
>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
>> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/
>>
>> Note that the full webrev is based on the new consolidated repo, but
>> the incremental was generated with the old structure.
>>
>> Highlight in this update:
>> * Cleaned out unused code in PreservedMarks.
>> * Fixed memory leak in GenericTaskQueueSet.
>> * HeapRegionClaimerBase has been removed and instead we now have two
>> functions to iterate through all heap regions.
>> * General cleanups and renames to ease understanding the code.
>> * G1 Hot Card Cache cleanup made parallel and moved into appropriate
>> phase.
>> * Updated HeapRegion::apply_to_marked_objects to be a template
>> function to avoid virtual call.
>>
>> Thanks Erik D and Thomas S for all comments so far.
>>
>> Cheers,
>> Stefan
>>
>>
>> On 2017-09-04 17:36, Stefan Johansson wrote:
>>> Hi,
>>>
>>> Please review the implementation of JEP-307:
>>> https://bugs.openjdk.java.net/browse/JDK-8172890
>>>
>>> Webrev:
>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>>>
>>> Summary:
>>> As communicated late last year [1], I've been working on
>>> parallelizing the Full GC for G1. The implementation is now ready
>>> for review.
>>>
>>> The approach I chose was to redo marking at the start of the Full GC
>>> and not reuse the marking information from the concurrent mark
>>> cycle. The main reason behind this is to maximize the chance of
>>> freeing up memory. I reused the marking bitmap from the concurrent
>>> mark code though, so instead of marking in the mark word a bitmap is
>>> used. The mark word is still used for forwarding pointers, so marks
>>> will still have to be preserved for some objects.
>>>
>>> The algorithm is still a four phased mark-compact but each phase is
>>> handled by parallel workers. Marking and reference processing is
>>> done in phase 1. In phase 2 all worker threads work through the heap
>>> claiming regions which they prepare for compaction. This is done by
>>> installing forwarding pointers into the mark word of the live
>>> objects that will move. The regions claimed by a worker in this
>>> phase will be the same regions that the worker will compact in phase
>>> 4. This ensures that objects are not overwritten before compacted.
>>>
>>> In phase 3, all pointers to other objects are updated by looking at
>>> the forwarding pointers. At this point all information needed to
>>> create new remembered sets is available and this rebuilding has been
>>> added to phase 3. In the old version remembered set rebuilding was
>>> done separately after the compaction, but this is more efficient.
>>>
>>> As mentioned phase 4 is when the compaction is done. In this first
>>> version, to avoid some complexity, there is no work stealing in this
>>> phase. This will lead to some imbalance between the workers, but
>>> this can be treated as a separate RFE in the future.
>>>
>>> The part of this work that has generated the most questions during
>>> internal discussions are the serial parts of phase 2 and 4. They are
>>> executed if no regions are to be freed up by the parallel workers.
>>> It is kind of a safety mechanism to avoid throwing a premature OOM.
>>> In the case of no regions being freed by the parallel code path a
>>> single threaded pass over the last region of each worker is done (at
>>> most number-of-workers regions are handled) to further compact these
>>> regions and hopefully free up some regions.
>>>
>>> Testing:
>>> * A lot of local sanity testing, both functional and performance.
>>> * Passed tier 1-5 of internal testing on supported platforms.
>>> * No regressions in performance testing.
>>>
>>> Cheers,
>>> Stefan
>>>
>>> [1]
>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Stefan Johansson
The JEP is now targeted and I will push it shortly.

There were some small conflicting changes made in HS and here are the
changes I've done to match the current state of the repository:
http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04-rebase-fixes/

The final change set will also include a small change to make reference
discovery more efficient:
http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04-discovery-fix/

I've discussed both these changes with Thomas, Erik and Sangheon over
chat and consider them reviewed as well.

Thanks for the reviews,
Stefan


On 2017-10-18 15:49, sangheon.kim wrote:

> Looks good to me too.
>
> Thanks,
> Sangheon
>
>
> On 10/18/2017 02:45 AM, Stefan Johansson wrote:
>> Hi again,
>>
>> A lot more internal review has been done and here are the latest
>> webrevs:
>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
>> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
>> Incremental (from previous mail):
>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>>
>> Summary of changes:
>> * Updated calculations for heap sizing after gc.
>> * Removed G1FullGCWorkerData and moved data into G1FullCollector
>> instead.
>> * Renamed G1MarkStack to G1FullGCMarker.
>> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
>> * Removed now unused RebuildRSOopClosure and par_write_ref from
>> G1RemSet.
>> * Updated comments to be more informative.
>> * Better naming of functions and variables.
>> * Updated copyright for a lot of files.
>>
>> Big thanks to Erik D, Thomas S and Sangheon K for working your way
>> through this big change.
>>
>> Cheers,
>> Stefan
>>
>> On 2017-09-19 17:32, Stefan Johansson wrote:
>>> Hi,
>>>
>>> We're moving forward with the review internally and doing some
>>> performance enhancements as well. Here are updated webrevs:
>>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
>>> Incremental:
>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/
>>>
>>> Note that the full webrev is based on the new consolidated repo, but
>>> the incremental was generated with the old structure.
>>>
>>> Highlight in this update:
>>> * Cleaned out unused code in PreservedMarks.
>>> * Fixed memory leak in GenericTaskQueueSet.
>>> * HeapRegionClaimerBase has been removed and instead we now have two
>>> functions to iterate through all heap regions.
>>> * General cleanups and renames to ease understanding the code.
>>> * G1 Hot Card Cache cleanup made parallel and moved into appropriate
>>> phase.
>>> * Updated HeapRegion::apply_to_marked_objects to be a template
>>> function to avoid virtual call.
>>>
>>> Thanks Erik D and Thomas S for all comments so far.
>>>
>>> Cheers,
>>> Stefan
>>>
>>>
>>> On 2017-09-04 17:36, Stefan Johansson wrote:
>>>> Hi,
>>>>
>>>> Please review the implementation of JEP-307:
>>>> https://bugs.openjdk.java.net/browse/JDK-8172890
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>>>>
>>>> Summary:
>>>> As communicated late last year [1], I've been working on
>>>> parallelizing the Full GC for G1. The implementation is now ready
>>>> for review.
>>>>
>>>> The approach I chose was to redo marking at the start of the Full
>>>> GC and not reuse the marking information from the concurrent mark
>>>> cycle. The main reason behind this is to maximize the chance of
>>>> freeing up memory. I reused the marking bitmap from the concurrent
>>>> mark code though, so instead of marking in the mark word a bitmap
>>>> is used. The mark word is still used for forwarding pointers, so
>>>> marks will still have to be preserved for some objects.
>>>>
>>>> The algorithm is still a four phased mark-compact but each phase is
>>>> handled by parallel workers. Marking and reference processing is
>>>> done in phase 1. In phase 2 all worker threads work through the
>>>> heap claiming regions which they prepare for compaction. This is
>>>> done by installing forwarding pointers into the mark word of the
>>>> live objects that will move. The regions claimed by a worker in
>>>> this phase will be the same regions that the worker will compact in
>>>> phase 4. This ensures that objects are not overwritten before
>>>> compacted.
>>>>
>>>> In phase 3, all pointers to other objects are updated by looking at
>>>> the forwarding pointers. At this point all information needed to
>>>> create new remembered sets is available and this rebuilding has
>>>> been added to phase 3. In the old version remembered set rebuilding
>>>> was done separately after the compaction, but this is more efficient.
>>>>
>>>> As mentioned phase 4 is when the compaction is done. In this first
>>>> version, to avoid some complexity, there is no work stealing in
>>>> this phase. This will lead to some imbalance between the workers,
>>>> but this can be treated as a separate RFE in the future.
>>>>
>>>> The part of this work that has generated the most questions during
>>>> internal discussions are the serial parts of phase 2 and 4. They
>>>> are executed if no regions are to be freed up by the parallel
>>>> workers. It is kind of a safety mechanism to avoid throwing a
>>>> premature OOM. In the case of no regions being freed by the
>>>> parallel code path a single threaded pass over the last region of
>>>> each worker is done (at most number-of-workers regions are handled)
>>>> to further compact these regions and hopefully free up some regions.
>>>>
>>>> Testing:
>>>> * A lot of local sanity testing, both functional and performance.
>>>> * Passed tier 1-5 of internal testing on supported platforms.
>>>> * No regressions in performance testing.
>>>>
>>>> Cheers,
>>>> Stefan
>>>>
>>>> [1]
>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

RE: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Doerr, Martin
Hi Stefan,

thanks for contributing this change. We basically like it. But is has the little drawback that it doesn't build with older GCC (e.g. 4.8).
We got the following error:
g1FullGCOopClosures.hpp:140: undefined reference to `void G1VerifyOopClosure::do_oop_nv<unsigned int>(unsigned int*)'

It can be fixed by moving the do_oop implementations into the cpp file (see diffs below).

I can open a bug for it and provide a webrev. Or would you prefer to put it into another follow-up change?

Best regards,
Martin


diff -r 7092940fbaff src/hotspot/share/gc/g1/g1FullGCOopClosures.cpp
--- a/src/hotspot/share/gc/g1/g1FullGCOopClosures.cpp   Wed Nov 15 08:25:28 2017 -0500
+++ b/src/hotspot/share/gc/g1/g1FullGCOopClosures.cpp   Wed Nov 15 15:50:07 2017 +0100
@@ -144,5 +144,8 @@
   }
 }

+void G1VerifyOopClosure::do_oop(oop* p)       { do_oop_nv(p); }
+void G1VerifyOopClosure::do_oop(narrowOop* p) { do_oop_nv(p); }
+
 // Generate G1 full GC specialized oop_oop_iterate functions.
 SPECIALIZED_OOP_OOP_ITERATE_CLOSURES_G1FULL(ALL_KLASS_OOP_OOP_ITERATE_DEFN)
diff -r 7092940fbaff src/hotspot/share/gc/g1/g1FullGCOopClosures.hpp
--- a/src/hotspot/share/gc/g1/g1FullGCOopClosures.hpp   Wed Nov 15 08:25:28 2017 -0500
+++ b/src/hotspot/share/gc/g1/g1FullGCOopClosures.hpp   Wed Nov 15 15:50:07 2017 +0100
@@ -136,8 +136,8 @@

   template <class T> void do_oop_nv(T* p);

-  void do_oop(oop* p)       { do_oop_nv(p); }
-  void do_oop(narrowOop* p) { do_oop_nv(p); }
+  void do_oop(oop* p)      ;// { do_oop_nv(p); }
+  void do_oop(narrowOop* p);// { do_oop_nv(p); }
 };

 class G1FollowStackClosure: public VoidClosure {



-----Original Message-----
From: hotspot-gc-dev [mailto:[hidden email]] On Behalf Of Stefan Johansson
Sent: Dienstag, 14. November 2017 09:47
To: sangheon.kim <[hidden email]>; [hidden email]
Subject: Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

The JEP is now targeted and I will push it shortly.

There were some small conflicting changes made in HS and here are the
changes I've done to match the current state of the repository:
http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04-rebase-fixes/

The final change set will also include a small change to make reference
discovery more efficient:
http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04-discovery-fix/

I've discussed both these changes with Thomas, Erik and Sangheon over
chat and consider them reviewed as well.

Thanks for the reviews,
Stefan


On 2017-10-18 15:49, sangheon.kim wrote:

> Looks good to me too.
>
> Thanks,
> Sangheon
>
>
> On 10/18/2017 02:45 AM, Stefan Johansson wrote:
>> Hi again,
>>
>> A lot more internal review has been done and here are the latest
>> webrevs:
>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
>> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
>> Incremental (from previous mail):
>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>>
>> Summary of changes:
>> * Updated calculations for heap sizing after gc.
>> * Removed G1FullGCWorkerData and moved data into G1FullCollector
>> instead.
>> * Renamed G1MarkStack to G1FullGCMarker.
>> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
>> * Removed now unused RebuildRSOopClosure and par_write_ref from
>> G1RemSet.
>> * Updated comments to be more informative.
>> * Better naming of functions and variables.
>> * Updated copyright for a lot of files.
>>
>> Big thanks to Erik D, Thomas S and Sangheon K for working your way
>> through this big change.
>>
>> Cheers,
>> Stefan
>>
>> On 2017-09-19 17:32, Stefan Johansson wrote:
>>> Hi,
>>>
>>> We're moving forward with the review internally and doing some
>>> performance enhancements as well. Here are updated webrevs:
>>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
>>> Incremental:
>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/
>>>
>>> Note that the full webrev is based on the new consolidated repo, but
>>> the incremental was generated with the old structure.
>>>
>>> Highlight in this update:
>>> * Cleaned out unused code in PreservedMarks.
>>> * Fixed memory leak in GenericTaskQueueSet.
>>> * HeapRegionClaimerBase has been removed and instead we now have two
>>> functions to iterate through all heap regions.
>>> * General cleanups and renames to ease understanding the code.
>>> * G1 Hot Card Cache cleanup made parallel and moved into appropriate
>>> phase.
>>> * Updated HeapRegion::apply_to_marked_objects to be a template
>>> function to avoid virtual call.
>>>
>>> Thanks Erik D and Thomas S for all comments so far.
>>>
>>> Cheers,
>>> Stefan
>>>
>>>
>>> On 2017-09-04 17:36, Stefan Johansson wrote:
>>>> Hi,
>>>>
>>>> Please review the implementation of JEP-307:
>>>> https://bugs.openjdk.java.net/browse/JDK-8172890
>>>>
>>>> Webrev:
>>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>>>>
>>>> Summary:
>>>> As communicated late last year [1], I've been working on
>>>> parallelizing the Full GC for G1. The implementation is now ready
>>>> for review.
>>>>
>>>> The approach I chose was to redo marking at the start of the Full
>>>> GC and not reuse the marking information from the concurrent mark
>>>> cycle. The main reason behind this is to maximize the chance of
>>>> freeing up memory. I reused the marking bitmap from the concurrent
>>>> mark code though, so instead of marking in the mark word a bitmap
>>>> is used. The mark word is still used for forwarding pointers, so
>>>> marks will still have to be preserved for some objects.
>>>>
>>>> The algorithm is still a four phased mark-compact but each phase is
>>>> handled by parallel workers. Marking and reference processing is
>>>> done in phase 1. In phase 2 all worker threads work through the
>>>> heap claiming regions which they prepare for compaction. This is
>>>> done by installing forwarding pointers into the mark word of the
>>>> live objects that will move. The regions claimed by a worker in
>>>> this phase will be the same regions that the worker will compact in
>>>> phase 4. This ensures that objects are not overwritten before
>>>> compacted.
>>>>
>>>> In phase 3, all pointers to other objects are updated by looking at
>>>> the forwarding pointers. At this point all information needed to
>>>> create new remembered sets is available and this rebuilding has
>>>> been added to phase 3. In the old version remembered set rebuilding
>>>> was done separately after the compaction, but this is more efficient.
>>>>
>>>> As mentioned phase 4 is when the compaction is done. In this first
>>>> version, to avoid some complexity, there is no work stealing in
>>>> this phase. This will lead to some imbalance between the workers,
>>>> but this can be treated as a separate RFE in the future.
>>>>
>>>> The part of this work that has generated the most questions during
>>>> internal discussions are the serial parts of phase 2 and 4. They
>>>> are executed if no regions are to be freed up by the parallel
>>>> workers. It is kind of a safety mechanism to avoid throwing a
>>>> premature OOM. In the case of no regions being freed by the
>>>> parallel code path a single threaded pass over the last region of
>>>> each worker is done (at most number-of-workers regions are handled)
>>>> to further compact these regions and hopefully free up some regions.
>>>>
>>>> Testing:
>>>> * A lot of local sanity testing, both functional and performance.
>>>> * Passed tier 1-5 of internal testing on supported platforms.
>>>> * No regressions in performance testing.
>>>>
>>>> Cheers,
>>>> Stefan
>>>>
>>>> [1]
>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1

Stefan Johansson
Hi Martin,

Sorry for breaking the build.

Please open a bug for this and provide a webrev for the fix, it is
probably easier since you can make sure the fix really works. I'm not
sure moving the do_oop implementation is what we should do, maybe
instead move the do_oop_nv() implementation into the inline.hpp-file and
make sure it is included where needed.

Thanks,
Stefan

On 2017-11-15 16:44, Doerr, Martin wrote:

> Hi Stefan,
>
> thanks for contributing this change. We basically like it. But is has the little drawback that it doesn't build with older GCC (e.g. 4.8).
> We got the following error:
> g1FullGCOopClosures.hpp:140: undefined reference to `void G1VerifyOopClosure::do_oop_nv<unsigned int>(unsigned int*)'
>
> It can be fixed by moving the do_oop implementations into the cpp file (see diffs below).
>
> I can open a bug for it and provide a webrev. Or would you prefer to put it into another follow-up change?
>
> Best regards,
> Martin
>
>
> diff -r 7092940fbaff src/hotspot/share/gc/g1/g1FullGCOopClosures.cpp
> --- a/src/hotspot/share/gc/g1/g1FullGCOopClosures.cpp   Wed Nov 15 08:25:28 2017 -0500
> +++ b/src/hotspot/share/gc/g1/g1FullGCOopClosures.cpp   Wed Nov 15 15:50:07 2017 +0100
> @@ -144,5 +144,8 @@
>     }
>   }
>
> +void G1VerifyOopClosure::do_oop(oop* p)       { do_oop_nv(p); }
> +void G1VerifyOopClosure::do_oop(narrowOop* p) { do_oop_nv(p); }
> +
>   // Generate G1 full GC specialized oop_oop_iterate functions.
>   SPECIALIZED_OOP_OOP_ITERATE_CLOSURES_G1FULL(ALL_KLASS_OOP_OOP_ITERATE_DEFN)
> diff -r 7092940fbaff src/hotspot/share/gc/g1/g1FullGCOopClosures.hpp
> --- a/src/hotspot/share/gc/g1/g1FullGCOopClosures.hpp   Wed Nov 15 08:25:28 2017 -0500
> +++ b/src/hotspot/share/gc/g1/g1FullGCOopClosures.hpp   Wed Nov 15 15:50:07 2017 +0100
> @@ -136,8 +136,8 @@
>
>     template <class T> void do_oop_nv(T* p);
>
> -  void do_oop(oop* p)       { do_oop_nv(p); }
> -  void do_oop(narrowOop* p) { do_oop_nv(p); }
> +  void do_oop(oop* p)      ;// { do_oop_nv(p); }
> +  void do_oop(narrowOop* p);// { do_oop_nv(p); }
>   };
>
>   class G1FollowStackClosure: public VoidClosure {
>
>
>
> -----Original Message-----
> From: hotspot-gc-dev [mailto:[hidden email]] On Behalf Of Stefan Johansson
> Sent: Dienstag, 14. November 2017 09:47
> To: sangheon.kim <[hidden email]>; [hidden email]
> Subject: Re: RFR: 8186571: Implementation: JEP 307: Parallel Full GC for G1
>
> The JEP is now targeted and I will push it shortly.
>
> There were some small conflicting changes made in HS and here are the
> changes I've done to match the current state of the repository:
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04-rebase-fixes/
>
> The final change set will also include a small change to make reference
> discovery more efficient:
> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04-discovery-fix/
>
> I've discussed both these changes with Thomas, Erik and Sangheon over
> chat and consider them reviewed as well.
>
> Thanks for the reviews,
> Stefan
>
>
> On 2017-10-18 15:49, sangheon.kim wrote:
>> Looks good to me too.
>>
>> Thanks,
>> Sangheon
>>
>>
>> On 10/18/2017 02:45 AM, Stefan Johansson wrote:
>>> Hi again,
>>>
>>> A lot more internal review has been done and here are the latest
>>> webrevs:
>>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.04/
>>> Incremental: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.03-04/
>>> Incremental (from previous mail):
>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01-04/
>>>
>>> Summary of changes:
>>> * Updated calculations for heap sizing after gc.
>>> * Removed G1FullGCWorkerData and moved data into G1FullCollector
>>> instead.
>>> * Renamed G1MarkStack to G1FullGCMarker.
>>> * Renamed G1CompactionPoint to G1FullGCCompactionPoint.
>>> * Removed now unused RebuildRSOopClosure and par_write_ref from
>>> G1RemSet.
>>> * Updated comments to be more informative.
>>> * Better naming of functions and variables.
>>> * Updated copyright for a lot of files.
>>>
>>> Big thanks to Erik D, Thomas S and Sangheon K for working your way
>>> through this big change.
>>>
>>> Cheers,
>>> Stefan
>>>
>>> On 2017-09-19 17:32, Stefan Johansson wrote:
>>>> Hi,
>>>>
>>>> We're moving forward with the review internally and doing some
>>>> performance enhancements as well. Here are updated webrevs:
>>>> Full: http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.01/
>>>> Incremental:
>>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00-01/
>>>>
>>>> Note that the full webrev is based on the new consolidated repo, but
>>>> the incremental was generated with the old structure.
>>>>
>>>> Highlight in this update:
>>>> * Cleaned out unused code in PreservedMarks.
>>>> * Fixed memory leak in GenericTaskQueueSet.
>>>> * HeapRegionClaimerBase has been removed and instead we now have two
>>>> functions to iterate through all heap regions.
>>>> * General cleanups and renames to ease understanding the code.
>>>> * G1 Hot Card Cache cleanup made parallel and moved into appropriate
>>>> phase.
>>>> * Updated HeapRegion::apply_to_marked_objects to be a template
>>>> function to avoid virtual call.
>>>>
>>>> Thanks Erik D and Thomas S for all comments so far.
>>>>
>>>> Cheers,
>>>> Stefan
>>>>
>>>>
>>>> On 2017-09-04 17:36, Stefan Johansson wrote:
>>>>> Hi,
>>>>>
>>>>> Please review the implementation of JEP-307:
>>>>> https://bugs.openjdk.java.net/browse/JDK-8172890
>>>>>
>>>>> Webrev:
>>>>> http://cr.openjdk.java.net/~sjohanss/8186571/hotspot.00/
>>>>>
>>>>> Summary:
>>>>> As communicated late last year [1], I've been working on
>>>>> parallelizing the Full GC for G1. The implementation is now ready
>>>>> for review.
>>>>>
>>>>> The approach I chose was to redo marking at the start of the Full
>>>>> GC and not reuse the marking information from the concurrent mark
>>>>> cycle. The main reason behind this is to maximize the chance of
>>>>> freeing up memory. I reused the marking bitmap from the concurrent
>>>>> mark code though, so instead of marking in the mark word a bitmap
>>>>> is used. The mark word is still used for forwarding pointers, so
>>>>> marks will still have to be preserved for some objects.
>>>>>
>>>>> The algorithm is still a four phased mark-compact but each phase is
>>>>> handled by parallel workers. Marking and reference processing is
>>>>> done in phase 1. In phase 2 all worker threads work through the
>>>>> heap claiming regions which they prepare for compaction. This is
>>>>> done by installing forwarding pointers into the mark word of the
>>>>> live objects that will move. The regions claimed by a worker in
>>>>> this phase will be the same regions that the worker will compact in
>>>>> phase 4. This ensures that objects are not overwritten before
>>>>> compacted.
>>>>>
>>>>> In phase 3, all pointers to other objects are updated by looking at
>>>>> the forwarding pointers. At this point all information needed to
>>>>> create new remembered sets is available and this rebuilding has
>>>>> been added to phase 3. In the old version remembered set rebuilding
>>>>> was done separately after the compaction, but this is more efficient.
>>>>>
>>>>> As mentioned phase 4 is when the compaction is done. In this first
>>>>> version, to avoid some complexity, there is no work stealing in
>>>>> this phase. This will lead to some imbalance between the workers,
>>>>> but this can be treated as a separate RFE in the future.
>>>>>
>>>>> The part of this work that has generated the most questions during
>>>>> internal discussions are the serial parts of phase 2 and 4. They
>>>>> are executed if no regions are to be freed up by the parallel
>>>>> workers. It is kind of a safety mechanism to avoid throwing a
>>>>> premature OOM. In the case of no regions being freed by the
>>>>> parallel code path a single threaded pass over the last region of
>>>>> each worker is done (at most number-of-workers regions are handled)
>>>>> to further compact these regions and hopefully free up some regions.
>>>>>
>>>>> Testing:
>>>>> * A lot of local sanity testing, both functional and performance.
>>>>> * Passed tier 1-5 of internal testing on supported platforms.
>>>>> * No regressions in performance testing.
>>>>>
>>>>> Cheers,
>>>>> Stefan
>>>>>
>>>>> [1]
>>>>> http://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2016-November/019216.html