RFR: 8194312: Support parallel and concurrent JNI global handle processing

classic Classic list List threaded Threaded
21 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
Please review this change to the implementation of JNI global and weak
global handles, providing infrastructure for parallel and concurrent
processing of such handles.

This change introduces OopStorage, a data structure for managing
off-heap references to objects allocated in the Java heap. JNI global
and weak global handles are reimplemented using OopStorage, rather
than using JNIHandleBlock. (JNI local handles continue to use
JNIHandleBlock; the lifetime management and concurrency issues are
very different for local vs global handles.)

This change does not change any GCs to use the new parallel and
concurrent capabilities, only laying the foundations for doing so.

Other uses of OopStorage are also being considered, in the context of
runtime latency improvements for ZGC and other collectors. The idea is
to change some (often weak) oop references in runtime data structures
to OopStorage-managed handles, simplifying the GC processing and
avoiding (potentially concurrent) GC walks of runtime data structures.

As a side effect, this enhancement fixes these bugs:
8174790: Race adding (weak) global JNI handles and determining type of handle
8176454: Performance: jweak references not suitable for robust cache architecture

Some things not addressed by this change include:

- 8194309: JNI handle allocation failure not reported correctly  
For now, the pre-existing vm_exit_out_of_memory behavior is retained.

- Updating JNIHandles to use the new Access protocol.

- Serviceability Agent updates for OopStorage and JNI usage.  Just
enough has been done to allow existing SA tests to "pass".  This will
be addressed as a followup.

CR:
https://bugs.openjdk.java.net/browse/JDK-8194312

Webrev:
http://cr.openjdk.java.net/~kbarrett/8194312/open.00/

Testing:
Mach5 hs-tier1 through hs-tier5, jdk-nightly

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

coleen.phillimore


http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.html

  463 template<typename F, typename BlockPtr> // [const] Block*
  464 inline bool OopStorage::Block::iterate_impl(F f, BlockPtr block) {

Not my typical comment, but this (above) is too terse.

  489 template<typename F, typename Storage> // Storage := [const] OopStorage
  490 inline bool OopStorage::iterate_impl(F f, Storage* storage) {

This too.  Can you add a comment before each of these that they can be
instantiated with a const OopStorage or non-const.  ie that the function
provides a const and non-const iteration over the block list depending
on how instantiated.  For those of us who find English easier than
template code.

  492   typedef typename Conditional<IsConst<Storage>::value, const Block*, Block*>::type BlockPtr;

add some short comment like:
+492: // If OopStorage is const then the Block pointers are also const.

http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.html

  109 #ifdef _WINDOWS
  110 #pragma warning(push)
  111 #pragma warning(disable: 4351)
  112 #endif // _WINDOWS
  113 OopStorage::Block::Block(const OopStorage* owner, void* memory) :
  114   // VS2013 warns (C4351) that elements of _data will be *correctly* default
  115   // initialized, unlike earlier versions that *incorrectly* did not do so.
  116   // Using push/disable/pop because suppress here doesn't work.

Can you move the comment to line 110 before the pragma?

  134 OopStorage::Block::~Block() {
  135 #ifdef ASSERT
  136   // Clear fields used by block_for_ptr and entry validation, which
  137   // might help catch bugs.
  138   _allocated_bitmask = 0;
  139   _owner = NULL;
  140 #endif // ASSERT
  141 }

Why not clear the fields anyway?  Seems like the expected behavior of
the destructor.

I feel like BasicParState should be in an oopStorage.inline.hpp file to
be included by the GCs that need it.   Runtime code only needs to
include oopStorage.hpp and doesn't need this code.

I don't have any other comments and looks like you've addressed most of
my preview comments that I remember.  I've looked at the runtime changes
for JNI and they look good.   I haven't looked at the
parallel/concurrent iterator code that GC uses, and I only skimmed the
test which is quite significant but good to have.

This change looks great.  Looking forward to parallel/concurrent JNI
handle garbage collection and weak pointers from the runtime.

Thanks,
Coleen

On 1/2/18 4:41 PM, Kim Barrett wrote:

> Please review this change to the implementation of JNI global and weak
> global handles, providing infrastructure for parallel and concurrent
> processing of such handles.
>
> This change introduces OopStorage, a data structure for managing
> off-heap references to objects allocated in the Java heap. JNI global
> and weak global handles are reimplemented using OopStorage, rather
> than using JNIHandleBlock. (JNI local handles continue to use
> JNIHandleBlock; the lifetime management and concurrency issues are
> very different for local vs global handles.)
>
> This change does not change any GCs to use the new parallel and
> concurrent capabilities, only laying the foundations for doing so.
>
> Other uses of OopStorage are also being considered, in the context of
> runtime latency improvements for ZGC and other collectors. The idea is
> to change some (often weak) oop references in runtime data structures
> to OopStorage-managed handles, simplifying the GC processing and
> avoiding (potentially concurrent) GC walks of runtime data structures.
>
> As a side effect, this enhancement fixes these bugs:
> 8174790: Race adding (weak) global JNI handles and determining type of handle
> 8176454: Performance: jweak references not suitable for robust cache architecture
>
> Some things not addressed by this change include:
>
> - 8194309: JNI handle allocation failure not reported correctly
> For now, the pre-existing vm_exit_out_of_memory behavior is retained.
>
> - Updating JNIHandles to use the new Access protocol.
>
> - Serviceability Agent updates for OopStorage and JNI usage.  Just
> enough has been done to allow existing SA tests to "pass".  This will
> be addressed as a followup.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8194312
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/
>
> Testing:
> Mach5 hs-tier1 through hs-tier5, jdk-nightly
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
In reply to this post by Kim Barrett
> On Jan 8, 2018, at 7:20 PM, [hidden email] wrote:
>
> Hi Kim,
>
> A couple of quick minor comments.

Thanks.

> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java.frames.html
>  241     } else if (isInStrongGlobalJNIHandles() ||
>  242                isInWeakGlobalJNIHandles()) {
>
>  243       tty.print("In ");
>
>  244       if (isInStrongGlobalJNIHandles()) {
>
>  245         tty.print("strong global");
>
>  246       } else if (isInWeakGlobalJNIHandles()) {
>
>  247         tty.print("weak global");
>  248       }
>
>  A suggestion to simplify it a little bit as follows:
>       } else if (isInStrongGlobalJNIHandles()) {
>
>         tty.print("In strong global");
>
>       } else if (isInWeakGlobalJNIHandles()) {
>
>         tty.print("In weak global");
>       }

Good idea.  Will do.

> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.html
>   61 void OopStorage::BlockList::push_back(const Block& block) {
>   62   const Block* old = _tail;
>   63   if (old == NULL) {
>   64     assert(_tail == NULL, "invariant");
>   65     _head = _tail = &block;
>   66   } else {
>   67     _get_entry(*old)._next = &block;
>   68     _get_entry(block)._prev = old;
>   69     _tail = &block;
>   70   }
>   71 }
>
>   It seems, the assert should check _head instead of _tail.

Oops!  Will do.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

David Holmes
In reply to this post by Kim Barrett
Hi Kim,

As per direct email I'm concerned about the impact the parallel
processing may have on the ability of jni checking to work correctly. At
the moment most (all?) checking is done IN_VM so that no safepoint can
occur and the data being examined is 'stable'. I think it safe to say
that jniCheck does not (and probably was not intended to) handle race
conditions with other JNI using threads (ie thread B deletes a handle
being checked in thread A). But the IN_VM ensures the check is not
racing with the GC or other relevant safepoint VM ops. With these
changes it is far from clear if such races have now been introduced and
that jniCheck will become unreliable for the limited checking it does ?

Thanks,
David




On 3/01/2018 7:41 AM, Kim Barrett wrote:

> Please review this change to the implementation of JNI global and weak
> global handles, providing infrastructure for parallel and concurrent
> processing of such handles.
>
> This change introduces OopStorage, a data structure for managing
> off-heap references to objects allocated in the Java heap. JNI global
> and weak global handles are reimplemented using OopStorage, rather
> than using JNIHandleBlock. (JNI local handles continue to use
> JNIHandleBlock; the lifetime management and concurrency issues are
> very different for local vs global handles.)
>
> This change does not change any GCs to use the new parallel and
> concurrent capabilities, only laying the foundations for doing so.
>
> Other uses of OopStorage are also being considered, in the context of
> runtime latency improvements for ZGC and other collectors. The idea is
> to change some (often weak) oop references in runtime data structures
> to OopStorage-managed handles, simplifying the GC processing and
> avoiding (potentially concurrent) GC walks of runtime data structures.
>
> As a side effect, this enhancement fixes these bugs:
> 8174790: Race adding (weak) global JNI handles and determining type of handle
> 8176454: Performance: jweak references not suitable for robust cache architecture
>
> Some things not addressed by this change include:
>
> - 8194309: JNI handle allocation failure not reported correctly
> For now, the pre-existing vm_exit_out_of_memory behavior is retained.
>
> - Updating JNIHandles to use the new Access protocol.
>
> - Serviceability Agent updates for OopStorage and JNI usage.  Just
> enough has been done to allow existing SA tests to "pass".  This will
> be addressed as a followup.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8194312
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/
>
> Testing:
> Mach5 hs-tier1 through hs-tier5, jdk-nightly
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
In reply to this post by coleen.phillimore
> On Jan 8, 2018, at 3:22 PM, [hidden email] wrote:
>
>
>
> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.html
>
> 463 template<typename F, typename BlockPtr> // [const] Block*
> 464 inline bool OopStorage::Block::iterate_impl(F f, BlockPtr block) {
>
> Not my typical comment, but this (above) is too terse.

Okay.

> 489 template<typename F, typename Storage> // Storage := [const] OopStorage
> 490 inline bool OopStorage::iterate_impl(F f, Storage* storage) {
>
> This too.  Can you add a comment before each of these that they can be instantiated with a const OopStorage or non-const.  ie that the function provides a const and non-const iteration over the block list depending on how instantiated.  For those of us who find English easier than template code.

Okay.

> 492   typedef typename Conditional<IsConst<Storage>::value, const Block*, Block*>::type BlockPtr;
>
> add some short comment like:
> +492: // If OopStorage is const then the Block pointers are also const.

Okay.

> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.html
>
> 109 #ifdef _WINDOWS
> 110 #pragma warning(push)
> 111 #pragma warning(disable: 4351)
> 112 #endif // _WINDOWS
> 113 OopStorage::Block::Block(const OopStorage* owner, void* memory) :
> 114   // VS2013 warns (C4351) that elements of _data will be *correctly* default
> 115   // initialized, unlike earlier versions that *incorrectly* did not do so.
> 116   // Using push/disable/pop because suppress here doesn't work.
>
> Can you move the comment to line 110 before the pragma?

Okay.

> 134 OopStorage::Block::~Block() {
> 135 #ifdef ASSERT
> 136   // Clear fields used by block_for_ptr and entry validation, which
> 137   // might help catch bugs.
> 138   _allocated_bitmask = 0;
> 139   _owner = NULL;
> 140 #endif // ASSERT
> 141 }
>
> Why not clear the fields anyway?  Seems like the expected behavior of the destructor.

Okay.  While doing that, it occurred to me these are dead stores,
and hence the compiler could optimize them away.  Since I'd like to
prevent that, I added some volatile casts.

And while I was at it, I added ~BlockEntry and ~BlockList with some
asserts.

> I feel like BasicParState should be in an oopStorage.inline.hpp file to be included by the GCs that need it.   Runtime code only needs to include oopStorage.hpp and doesn't need this code.

I forgot to deal with your pre-review suggestion of moving some stuff
to oopStorage.inline.hpp.  I'd like to defer making such a change
until this review is (mostly) done, rather than clutter up diffs.

I think all (or very nearly all) of the inline definitions in
oopStorage.hpp could be moved. I think they're all related to
iteration, and I think most clients (other than GC) won't be using
iteration at all.

I think having the parallel iteration support wrapped with
#ifdef INCLUDE_ALL_GCS (as is already the case) deals with the
conditional GC usage; I think that all except Serial will eventually
use it.

The parallel iteration support could be split out into another file,
so places that only use serial iteration (like jvmti) wouldn't include
it, but I think the number of such places doesn't warrant another
file.

>
> I don't have any other comments and looks like you've addressed most of my preview comments that I remember.  I've looked at the runtime changes for JNI and they look good.   I haven't looked at the parallel/concurrent iterator code that GC uses, and I only skimmed the test which is quite significant but good to have.
>
> This change looks great.  Looking forward to parallel/concurrent JNI handle garbage collection and weak pointers from the runtime.

Thanks.

New webrevs (for comments from Coleen and Serguei)
full: http://cr.openjdk.java.net/~kbarrett/8194312/open.01/
incr: http://cr.openjdk.java.net/~kbarrett/8194312/open.01.inc/

Oh, and I need to go through and update all the copyrights to 2018.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

serguei.spitsyn@oracle.com
In reply to this post by Kim Barrett

On 1/8/18 16:58, Kim Barrett wrote:

>> On Jan 8, 2018, at 7:20 PM, [hidden email] wrote:
>>
>> Hi Kim,
>>
>> A couple of quick minor comments.
> Thanks.
>
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/utilities/PointerLocation.java.frames.html
>>   241     } else if (isInStrongGlobalJNIHandles() ||
>>   242                isInWeakGlobalJNIHandles()) {
>>
>>   243       tty.print("In ");
>>
>>   244       if (isInStrongGlobalJNIHandles()) {
>>
>>   245         tty.print("strong global");
>>
>>   246       } else if (isInWeakGlobalJNIHandles()) {
>>
>>   247         tty.print("weak global");
>>   248       }
>>
>>   A suggestion to simplify it a little bit as follows:
>>        } else if (isInStrongGlobalJNIHandles()) {
>>
>>          tty.print("In strong global");
>>
>>        } else if (isInWeakGlobalJNIHandles()) {
>>
>>          tty.print("In weak global");
>>        }
> Good idea.  Will do.
>
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.html
>>    61 void OopStorage::BlockList::push_back(const Block& block) {
>>    62   const Block* old = _tail;
>>    63   if (old == NULL) {
>>    64     assert(_tail == NULL, "invariant");
>>    65     _head = _tail = &block;
>>    66   } else {
>>    67     _get_entry(*old)._next = &block;
>>    68     _get_entry(block)._prev = old;
>>    69     _tail = &block;
>>    70   }
>>    71 }
>>
>>    It seems, the assert should check _head instead of _tail.
> Oops!  Will do.

Thanks!

Also, this fragment looks strange:

  354   if (block->is_empty()) {
  355     log_debug(oopstorage, blocks)("%s: block not empty " PTR_FORMAT, name(), p2i(block));
  356     --_empty_block_count;
  357   }


Should the message tell "block is empty" instead of "block not empty"?

Thanks,
Serguei

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
> On Jan 8, 2018, at 10:57 PM, [hidden email] wrote:
> Also, this fragment looks strange:
>
> 354   if (block->is_empty()) {
> 355     log_debug(oopstorage, blocks)("%s: block not empty " PTR_FORMAT, name(), p2i(block));
> 356     --_empty_block_count;
> 357   }
>
>
> Should the message tell "block is empty" instead of "block not empty”?

This is logging the transition from empty to non-empty.

I could introduce a bool was_empty flag at line 354 to capture the is_empty state before the allocation,
and then log the transition and update the counter after the allocation, if was_empty is true.  The current
code order is just to remove the need for that extra boolean flag.


Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

serguei.spitsyn@oracle.com
On 1/8/18 20:10, Kim Barrett wrote:

>> On Jan 8, 2018, at 10:57 PM, [hidden email] wrote:
>> Also, this fragment looks strange:
>>
>> 354   if (block->is_empty()) {
>> 355     log_debug(oopstorage, blocks)("%s: block not empty " PTR_FORMAT, name(), p2i(block));
>> 356     --_empty_block_count;
>> 357   }
>>
>>
>> Should the message tell "block is empty" instead of "block not empty”?
> This is logging the transition from empty to non-empty.
>
> I could introduce a bool was_empty flag at line 354 to capture the is_empty state before the allocation,
> and then log the transition and update the counter after the allocation, if was_empty is true.  The current
> code order is just to remove the need for that extra boolean flag.

Got it, thanks!
Serguei

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

serguei.spitsyn@oracle.com
In reply to this post by Kim Barrett
On 1/8/18 19:44, Kim Barrett wrote:

>> On Jan 8, 2018, at 3:22 PM, [hidden email] wrote:
>>
>>
>>
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.html
>>
>> 463 template<typename F, typename BlockPtr> // [const] Block*
>> 464 inline bool OopStorage::Block::iterate_impl(F f, BlockPtr block) {
>>
>> Not my typical comment, but this (above) is too terse.
> Okay.
>
>> 489 template<typename F, typename Storage> // Storage := [const] OopStorage
>> 490 inline bool OopStorage::iterate_impl(F f, Storage* storage) {
>>
>> This too.  Can you add a comment before each of these that they can be instantiated with a const OopStorage or non-const.  ie that the function provides a const and non-const iteration over the block list depending on how instantiated.  For those of us who find English easier than template code.
> Okay.
>
>> 492   typedef typename Conditional<IsConst<Storage>::value, const Block*, Block*>::type BlockPtr;
>>
>> add some short comment like:
>> +492: // If OopStorage is const then the Block pointers are also const.
> Okay.
>
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.html
>>
>> 109 #ifdef _WINDOWS
>> 110 #pragma warning(push)
>> 111 #pragma warning(disable: 4351)
>> 112 #endif // _WINDOWS
>> 113 OopStorage::Block::Block(const OopStorage* owner, void* memory) :
>> 114   // VS2013 warns (C4351) that elements of _data will be *correctly* default
>> 115   // initialized, unlike earlier versions that *incorrectly* did not do so.
>> 116   // Using push/disable/pop because suppress here doesn't work.
>>
>> Can you move the comment to line 110 before the pragma?
> Okay.
>
>> 134 OopStorage::Block::~Block() {
>> 135 #ifdef ASSERT
>> 136   // Clear fields used by block_for_ptr and entry validation, which
>> 137   // might help catch bugs.
>> 138   _allocated_bitmask = 0;
>> 139   _owner = NULL;
>> 140 #endif // ASSERT
>> 141 }
>>
>> Why not clear the fields anyway?  Seems like the expected behavior of the destructor.
> Okay.  While doing that, it occurred to me these are dead stores,
> and hence the compiler could optimize them away.  Since I'd like to
> prevent that, I added some volatile casts.
>
> And while I was at it, I added ~BlockEntry and ~BlockList with some
> asserts.
>
>> I feel like BasicParState should be in an oopStorage.inline.hpp file to be included by the GCs that need it.   Runtime code only needs to include oopStorage.hpp and doesn't need this code.
> I forgot to deal with your pre-review suggestion of moving some stuff
> to oopStorage.inline.hpp.  I'd like to defer making such a change
> until this review is (mostly) done, rather than clutter up diffs.
>
> I think all (or very nearly all) of the inline definitions in
> oopStorage.hpp could be moved. I think they're all related to
> iteration, and I think most clients (other than GC) won't be using
> iteration at all.
>
> I think having the parallel iteration support wrapped with
> #ifdef INCLUDE_ALL_GCS (as is already the case) deals with the
> conditional GC usage; I think that all except Serial will eventually
> use it.
>
> The parallel iteration support could be split out into another file,
> so places that only use serial iteration (like jvmti) wouldn't include
> it, but I think the number of such places doesn't warrant another
> file.
>
>> I don't have any other comments and looks like you've addressed most of my preview comments that I remember.  I've looked at the runtime changes for JNI and they look good.   I haven't looked at the parallel/concurrent iterator code that GC uses, and I only skimmed the test which is quite significant but good to have.
>>
>> This change looks great.  Looking forward to parallel/concurrent JNI handle garbage collection and weak pointers from the runtime.
> Thanks.
>
> New webrevs (for comments from Coleen and Serguei)
> full: http://cr.openjdk.java.net/~kbarrett/8194312/open.01/
> incr: http://cr.openjdk.java.net/~kbarrett/8194312/open.01.inc/

The incremental update looks good to me.

> Oh, and I need to go through and update all the copyrights to 2018.

I was about to point to it. :)

Thanks,
Serguei

>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

David Holmes
In reply to this post by David Holmes
On 9/01/2018 1:18 PM, David Holmes wrote:

> Hi Kim,
>
> As per direct email I'm concerned about the impact the parallel
> processing may have on the ability of jni checking to work correctly. At
> the moment most (all?) checking is done IN_VM so that no safepoint can
> occur and the data being examined is 'stable'. I think it safe to say
> that jniCheck does not (and probably was not intended to) handle race
> conditions with other JNI using threads (ie thread B deletes a handle
> being checked in thread A). But the IN_VM ensures the check is not
> racing with the GC or other relevant safepoint VM ops. With these
> changes it is far from clear if such races have now been introduced and
> that jniCheck will become unreliable for the limited checking it does ?

To clarify the above. oopStorage is an enabling technology that provides
the support for potential parallel/concurrent JNI global handle
processing. My concern is with any GC that in the future takes advantage
of this and starts doing things outside of a safepoint, that is
currently done at a safepoint, and which may impact the assumptions
within the JNI checking code, within the IN_VM sections, that expects to
execute atomically with respect to such code.

Cheers,
David

> Thanks,
> David
>
>
>
>
> On 3/01/2018 7:41 AM, Kim Barrett wrote:
>> Please review this change to the implementation of JNI global and weak
>> global handles, providing infrastructure for parallel and concurrent
>> processing of such handles.
>>
>> This change introduces OopStorage, a data structure for managing
>> off-heap references to objects allocated in the Java heap. JNI global
>> and weak global handles are reimplemented using OopStorage, rather
>> than using JNIHandleBlock. (JNI local handles continue to use
>> JNIHandleBlock; the lifetime management and concurrency issues are
>> very different for local vs global handles.)
>>
>> This change does not change any GCs to use the new parallel and
>> concurrent capabilities, only laying the foundations for doing so.
>>
>> Other uses of OopStorage are also being considered, in the context of
>> runtime latency improvements for ZGC and other collectors. The idea is
>> to change some (often weak) oop references in runtime data structures
>> to OopStorage-managed handles, simplifying the GC processing and
>> avoiding (potentially concurrent) GC walks of runtime data structures.
>>
>> As a side effect, this enhancement fixes these bugs:
>> 8174790: Race adding (weak) global JNI handles and determining type of
>> handle
>> 8176454: Performance: jweak references not suitable for robust cache
>> architecture
>>
>> Some things not addressed by this change include:
>>
>> - 8194309: JNI handle allocation failure not reported correctly
>> For now, the pre-existing vm_exit_out_of_memory behavior is retained.
>>
>> - Updating JNIHandles to use the new Access protocol.
>>
>> - Serviceability Agent updates for OopStorage and JNI usage.  Just
>> enough has been done to allow existing SA tests to "pass".  This will
>> be addressed as a followup.
>>
>> CR:
>> https://bugs.openjdk.java.net/browse/JDK-8194312
>>
>> Webrev:
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/
>>
>> Testing:
>> Mach5 hs-tier1 through hs-tier5, jdk-nightly
>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Erik Österlund-2
In reply to this post by Kim Barrett
Hi Kim,

I have already seen earlier incarnations of this code so I only have a
few comments regarding things that were not in that incarnation.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/OopStorage.java:

The Java implementation of OopStorage for the SA agent seems a bit empty
and full of TODO comments to actually implement something. It seems like
quite a bit of the other changes to the SA assumes that this actually
does something. For example there are changes to use OopStorage oopsDo
instead of JNIHandleBlock oopsDo, which I presume means there has to be
some oopsDo implementation in place for OopStorage.
Is the intention to implement that later on in a separate RFE?

src/hotspot/share/gc/shared/oopStorage.hpp:

The OopStorage::oops_do functions accept some kind of OopClosure and
performs iteration with it. I'm curious though why it takes the specific
type of OopClosure as a template parameter, passes it around, and
eventually discards the information without making use of it? I was
thinking perhaps the intention was to use the closure type information
to devirtualize the do_oop() call, but that does not appear to be the
case. To do that you would have to in OopStorage::OopFn::operator()()
perform a qualified _cl->Closure::do_oop() call instead of _cl->do_oop()
(and possibly take some care to prevent abstract oop closure types like
OopClosure or ExtendedOopClosure from being devirtualized, which would
be awkward (doable with some EnableIf<!IsSame<Closure,
OopClosure>::value>::type overload, or something like that).
In other words, it seems like either this template parameter is
unnecessary and you could just replace uses of it with OopClosure, or it
should be made useful, e.g. by devirtualizing the calls or something.
Same goes for the IsAliveClosure parameter for weak_oops_do.

src/hotspot/share/gc/shared/oopStorage.cpp:

I recall in the pre-review we had some discussion about whether the
infrastructure for supporting early iteration termination was necessary
or not. You had me convinced that JNI would need it for something. Is
this still the case, or has that use case been revised? Reading through
the new JNI code, I could not find uses of
OopStorage::iterate_safepoint() from the JNI code, and hence no uses of
early termination. Having said that, it does look rather simple now with
only the sequential case supporting early termination.
However, if there is indeed no use any longer for OopStorage::iterate()
and hence the only publicly available tool for early iteration exit, I
wonder if perhaps it should be reconsidered if we want to remove that
infrastructure. Or will there be other use cases down the road?
Currently I find it at least a bit confusing that, e.g.
OopStorage::iterate_safepoint() expects a function of type F that
returns bool while OopStorage::BasicParState::iterate() expects a
function of  type F that returns void. For the
OopStorage::iterate_safepoint() case it is clearly described in the
documentation, in the OopStorage::BasicParState::iterate() case I did
not find it clear that this F is a different F to the safepoint one.
I guess I propose either removing early termination support alltogether
if nobody is using it anyway (and hence having the F function type
consistently return void), or documenting clearly that the sequential
and parallel iteration functions expect different function types - one
with bool return type and one with void return type, depending on
whether early termination is supported or not.

Apart from the above minor remarks, I am delighted to see these changes.
I am extra delighted to see deleted_handle() disappear. I am also
delighted to see the new simplified synchronization for parallel
iteration - that worked out nicely. Thank you for building a good oop
storage solution.

Thanks,
/Erik

On 2018-01-02 22:41, Kim Barrett wrote:

> Please review this change to the implementation of JNI global and weak
> global handles, providing infrastructure for parallel and concurrent
> processing of such handles.
>
> This change introduces OopStorage, a data structure for managing
> off-heap references to objects allocated in the Java heap. JNI global
> and weak global handles are reimplemented using OopStorage, rather
> than using JNIHandleBlock. (JNI local handles continue to use
> JNIHandleBlock; the lifetime management and concurrency issues are
> very different for local vs global handles.)
>
> This change does not change any GCs to use the new parallel and
> concurrent capabilities, only laying the foundations for doing so.
>
> Other uses of OopStorage are also being considered, in the context of
> runtime latency improvements for ZGC and other collectors. The idea is
> to change some (often weak) oop references in runtime data structures
> to OopStorage-managed handles, simplifying the GC processing and
> avoiding (potentially concurrent) GC walks of runtime data structures.
>
> As a side effect, this enhancement fixes these bugs:
> 8174790: Race adding (weak) global JNI handles and determining type of handle
> 8176454: Performance: jweak references not suitable for robust cache architecture
>
> Some things not addressed by this change include:
>
> - 8194309: JNI handle allocation failure not reported correctly
> For now, the pre-existing vm_exit_out_of_memory behavior is retained.
>
> - Updating JNIHandles to use the new Access protocol.
>
> - Serviceability Agent updates for OopStorage and JNI usage.  Just
> enough has been done to allow existing SA tests to "pass".  This will
> be addressed as a followup.
>
> CR:
> https://bugs.openjdk.java.net/browse/JDK-8194312
>
> Webrev:
> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/
>
> Testing:
> Mach5 hs-tier1 through hs-tier5, jdk-nightly
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

coleen.phillimore
In reply to this post by Kim Barrett


On 1/8/18 10:44 PM, Kim Barrett wrote:

>> On Jan 8, 2018, at 3:22 PM, [hidden email] wrote:
>>
>>
>>
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.hpp.html
>>
>> 463 template<typename F, typename BlockPtr> // [const] Block*
>> 464 inline bool OopStorage::Block::iterate_impl(F f, BlockPtr block) {
>>
>> Not my typical comment, but this (above) is too terse.
> Okay.
>
>> 489 template<typename F, typename Storage> // Storage := [const] OopStorage
>> 490 inline bool OopStorage::iterate_impl(F f, Storage* storage) {
>>
>> This too.  Can you add a comment before each of these that they can be instantiated with a const OopStorage or non-const.  ie that the function provides a const and non-const iteration over the block list depending on how instantiated.  For those of us who find English easier than template code.
> Okay.
>
>> 492   typedef typename Conditional<IsConst<Storage>::value, const Block*, Block*>::type BlockPtr;
>>
>> add some short comment like:
>> +492: // If OopStorage is const then the Block pointers are also const.
> Okay.
>
>> http://cr.openjdk.java.net/~kbarrett/8194312/open.00/src/hotspot/share/gc/shared/oopStorage.cpp.html
>>
>> 109 #ifdef _WINDOWS
>> 110 #pragma warning(push)
>> 111 #pragma warning(disable: 4351)
>> 112 #endif // _WINDOWS
>> 113 OopStorage::Block::Block(const OopStorage* owner, void* memory) :
>> 114   // VS2013 warns (C4351) that elements of _data will be *correctly* default
>> 115   // initialized, unlike earlier versions that *incorrectly* did not do so.
>> 116   // Using push/disable/pop because suppress here doesn't work.
>>
>> Can you move the comment to line 110 before the pragma?
> Okay.
>
>> 134 OopStorage::Block::~Block() {
>> 135 #ifdef ASSERT
>> 136   // Clear fields used by block_for_ptr and entry validation, which
>> 137   // might help catch bugs.
>> 138   _allocated_bitmask = 0;
>> 139   _owner = NULL;
>> 140 #endif // ASSERT
>> 141 }
>>
>> Why not clear the fields anyway?  Seems like the expected behavior of the destructor.
> Okay.  While doing that, it occurred to me these are dead stores,
> and hence the compiler could optimize them away.  Since I'd like to
> prevent that, I added some volatile casts.
>
> And while I was at it, I added ~BlockEntry and ~BlockList with some
> asserts.
>
>> I feel like BasicParState should be in an oopStorage.inline.hpp file to be included by the GCs that need it.   Runtime code only needs to include oopStorage.hpp and doesn't need this code.
> I forgot to deal with your pre-review suggestion of moving some stuff
> to oopStorage.inline.hpp.  I'd like to defer making such a change
> until this review is (mostly) done, rather than clutter up diffs.
>
> I think all (or very nearly all) of the inline definitions in
> oopStorage.hpp could be moved. I think they're all related to
> iteration, and I think most clients (other than GC) won't be using
> iteration at all.
>
> I think having the parallel iteration support wrapped with
> #ifdef INCLUDE_ALL_GCS (as is already the case) deals with the
> conditional GC usage; I think that all except Serial will eventually
> use it.
>
> The parallel iteration support could be split out into another file,
> so places that only use serial iteration (like jvmti) wouldn't include
> it, but I think the number of such places doesn't warrant another
> file.

I would like that it be split out into another file.  A lot of places in
the runtime will only create and add oops to OopStorage and shouldn't
care about iteration.  I'm fine with making this change as a further
improvement.

Thanks,
Coleen

>
>> I don't have any other comments and looks like you've addressed most of my preview comments that I remember.  I've looked at the runtime changes for JNI and they look good.   I haven't looked at the parallel/concurrent iterator code that GC uses, and I only skimmed the test which is quite significant but good to have.
>>
>> This change looks great.  Looking forward to parallel/concurrent JNI handle garbage collection and weak pointers from the runtime.
> Thanks.
>
> New webrevs (for comments from Coleen and Serguei)
> full: http://cr.openjdk.java.net/~kbarrett/8194312/open.01/
> incr: http://cr.openjdk.java.net/~kbarrett/8194312/open.01.inc/
>
> Oh, and I need to go through and update all the copyrights to 2018.
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
In reply to this post by Erik Österlund-2
> On Jan 10, 2018, at 8:50 AM, Erik Österlund <[hidden email]> wrote:
>
> Hi Kim,
>
> I have already seen earlier incarnations of this code so I only have a few comments regarding things that were not in that incarnation.

Thanks for looking at it yet again.

> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/OopStorage.java:
>
> The Java implementation of OopStorage for the SA agent seems a bit empty and full of TODO comments to actually implement something. It seems like quite a bit of the other changes to the SA assumes that this actually does something. For example there are changes to use OopStorage oopsDo instead of JNIHandleBlock oopsDo, which I presume means there has to be some oopsDo implementation in place for OopStorage.
> Is the intention to implement that later on in a separate RFE?

Quoting from the original RFR email:

  - Serviceability Agent updates for OopStorage and JNI usage.  Just
  enough has been done to allow existing SA tests to "pass".  This
  will be addressed as a followup.

I haven't filed that RFE yet, but will be doing so.

> src/hotspot/share/gc/shared/oopStorage.hpp:
>
> The OopStorage::oops_do functions accept some kind of OopClosure and performs iteration with it. I'm curious though why it takes the specific type of OopClosure as a template parameter, passes it around, and eventually discards the information without making use of it? I was thinking perhaps the intention was to use the closure type information to devirtualize the do_oop() call, but that does not appear to be the case. To do that you would have to in OopStorage::OopFn::operator()() perform a qualified _cl->Closure::do_oop() call instead of _cl->do_oop() (and possibly take some care to prevent abstract oop closure types like OopClosure or ExtendedOopClosure from being devirtualized, which would be awkward (doable with some EnableIf<!IsSame<Closure, OopClosure>::value>::type overload, or something like that).
> In other words, it seems like either this template parameter is unnecessary and you could just replace uses of it with OopClosure, or it should be made useful, e.g. by devirtualizing the calls or something. Same goes for the IsAliveClosure parameter for weak_oops_do.

Sorry, but I'm not understanding what problem is being suggested.
OopStorage doesn't depend on OopClosure as a type, at all, anywhere.
There are no occurrences of the type OopClosure anywhere in the
OopStorage implementation.  The types coming in to OopStorage
iteration are preserved and used, without any need for odd syntax.

With the type preserved like that, for common use-cases such as

  MyClosureType my_closure;
  storage->oops_do(&my_closure);

the compiler can know the exact type of my_closure at the call site
for do_oop(), in which case the compiler can devirtualize the call
(and I would expect any decent compiler to do so; I've seen gcc do it).

Maybe the issue is about JNIHandles::oops_do and friends, which
presently are ordinary functions with OopClosure* arguments and the
like?  Yes, the type information isn't flowing through there from the
caller to the OopStorage infrastructure.  That's because that was
going to introduce some additional fannout in this changeset.  I was
going to move the inline definitions to oopStorage.inline.hpp (per
Coleen's pre-review request), and all uses of iteration would then
need to be changed to include that, which would have required a new
jniHandles.inline.hpp, which would have required at least some
additional downstream include changes.  I'm planning to make those
changes as a followup.  Of course, I then forgot Coleen's pre-review
request.

> src/hotspot/share/gc/shared/oopStorage.cpp:
>
> I recall in the pre-review we had some discussion about whether the infrastructure for supporting early iteration termination was necessary or not. You had me convinced that JNI would need it for something. Is this still the case, or has that use case been revised? Reading through the new JNI code, I could not find uses of OopStorage::iterate_safepoint() from the JNI code, and hence no uses of early termination. Having said that, it does look rather simple now with only the sequential case supporting early termination.
> However, if there is indeed no use any longer for OopStorage::iterate() and hence the only publicly available tool for early iteration exit, I wonder if perhaps it should be reconsidered if we want to remove that infrastructure. Or will there be other use cases down the road? Currently I find it at least a bit confusing that, e.g. OopStorage::iterate_safepoint() expects a function of type F that returns bool while OopStorage::BasicParState::iterate() expects a function of  type F that returns void. For the OopStorage::iterate_safepoint() case it is clearly described in the documentation, in the OopStorage::BasicParState::iterate() case I did not find it clear that this F is a different F to the safepoint one.
> I guess I propose either removing early termination support alltogether if nobody is using it anyway (and hence having the F function type consistently return void), or documenting clearly that the sequential and parallel iteration functions expect different function types - one with bool return type and one with void return type, depending on whether early termination is supported or not.

The SimpleRootsClosure used by
VM_HeapWalkOperation::collect_simple_roots is a use-case for early
termination.

iteration_safepoint documents a requirement on the result of the
argument function (the result must be implicitly convertible to
bool), and the associated behavior.  ParState::iterate doesn't specify
any requirements on the result of the argument function, nor specify
any associated behavior, so has none.  I should make that explicit
though, e.g. document that any result returned by f is ignored.

Since oops_do and weak_oops_do are expected to be the usual iteration
entry points, and the early termination support doesn't affect them at
all, I would like to leave it in place, to support use-cases like that
one from JVMTI.  As we discussed during pre-review, ParState doesn't
support early termination, since ParState is for exclusive use by GC,
and we don't have or envision use-cases for early termination by GC.

> Apart from the above minor remarks, I am delighted to see these changes. I am extra delighted to see deleted_handle() disappear. I am also delighted to see the new simplified synchronization for parallel iteration - that worked out nicely. Thank you for building a good oop storage solution.

Thanks.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
In reply to this post by coleen.phillimore
> On Jan 10, 2018, at 8:56 AM, [hidden email] wrote:
>
>
>
> On 1/8/18 10:44 PM, Kim Barrett wrote:
>>> I feel like BasicParState should be in an oopStorage.inline.hpp file to be included by the GCs that need it.   Runtime code only needs to include oopStorage.hpp and doesn't need this code.
>> I forgot to deal with your pre-review suggestion of moving some stuff
>> to oopStorage.inline.hpp.  I'd like to defer making such a change
>> until this review is (mostly) done, rather than clutter up diffs.
>>
>> I think all (or very nearly all) of the inline definitions in
>> oopStorage.hpp could be moved. I think they're all related to
>> iteration, and I think most clients (other than GC) won't be using
>> iteration at all.
>>
>> I think having the parallel iteration support wrapped with
>> #ifdef INCLUDE_ALL_GCS (as is already the case) deals with the
>> conditional GC usage; I think that all except Serial will eventually
>> use it.
>>
>> The parallel iteration support could be split out into another file,
>> so places that only use serial iteration (like jvmti) wouldn't include
>> it, but I think the number of such places doesn't warrant another
>> file.
>
> I would like that it be split out into another file.  A lot of places in the runtime will only create and add oops to OopStorage and shouldn't care about iteration.  I'm fine with making this change as a further improvement.

I'm unclear on what you are asking for.  Do you want

A. Two files:

(1) oopStorage.hpp - contains just class OopStorage.

(2) oopStorage.inline.hpp - contains all the inline definitions
presently in oopStorage.hpp.

or

B. Three files:

(1) oopStorage.hpp - contains just class OopStorage.

(2) oopStorage.inline.hpp - contains all the inline definitions
presently in oopStorage.hpp, except for the ParState stuff.

(3) oopStorageParState.inline.hpp - contains the ParState stuff.

or

C. Two files:

(1) oopStorage.hpp - contains class OopStorage and all the inline
definitions presently in oopStorage.hpp, except for the ParState
stuff.

(2) oopStorage.inline.hpp - contains the ParState stuff.

I'm fine with option (A).  Option (B) is unusual, but plausible; among
other things, inclusion of the ParState header by non-GC code is
suspicious.  I'm not sure that's worth another file.  Option (C)
doesn't seem like what you are asking for, but included for
completeness.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

coleen.phillimore


On 1/10/18 4:10 PM, Kim Barrett wrote:

>> On Jan 10, 2018, at 8:56 AM, [hidden email] wrote:
>>
>>
>>
>> On 1/8/18 10:44 PM, Kim Barrett wrote:
>>>> I feel like BasicParState should be in an oopStorage.inline.hpp file to be included by the GCs that need it.   Runtime code only needs to include oopStorage.hpp and doesn't need this code.
>>> I forgot to deal with your pre-review suggestion of moving some stuff
>>> to oopStorage.inline.hpp.  I'd like to defer making such a change
>>> until this review is (mostly) done, rather than clutter up diffs.
>>>
>>> I think all (or very nearly all) of the inline definitions in
>>> oopStorage.hpp could be moved. I think they're all related to
>>> iteration, and I think most clients (other than GC) won't be using
>>> iteration at all.
>>>
>>> I think having the parallel iteration support wrapped with
>>> #ifdef INCLUDE_ALL_GCS (as is already the case) deals with the
>>> conditional GC usage; I think that all except Serial will eventually
>>> use it.
>>>
>>> The parallel iteration support could be split out into another file,
>>> so places that only use serial iteration (like jvmti) wouldn't include
>>> it, but I think the number of such places doesn't warrant another
>>> file.
>> I would like that it be split out into another file.  A lot of places in the runtime will only create and add oops to OopStorage and shouldn't care about iteration.  I'm fine with making this change as a further improvement.
> I'm unclear on what you are asking for.  Do you want
>
> A. Two files:
>
> (1) oopStorage.hpp - contains just class OopStorage.
>
> (2) oopStorage.inline.hpp - contains all the inline definitions
> presently in oopStorage.hpp.
>
> or
>
> B. Three files:
>
> (1) oopStorage.hpp - contains just class OopStorage.
>
> (2) oopStorage.inline.hpp - contains all the inline definitions
> presently in oopStorage.hpp, except for the ParState stuff.
>
> (3) oopStorageParState.inline.hpp - contains the ParState stuff.
>
> or
>
> C. Two files:
>
> (1) oopStorage.hpp - contains class OopStorage and all the inline
> definitions presently in oopStorage.hpp, except for the ParState
> stuff.
>
> (2) oopStorage.inline.hpp - contains the ParState stuff.
>
> I'm fine with option (A).  Option (B) is unusual, but plausible; among
> other things, inclusion of the ParState header by non-GC code is
> suspicious.  I'm not sure that's worth another file.  Option (C)
> doesn't seem like what you are asking for, but included for
> completeness.
>

I think B.  I'd have to see it.   Put this in the RFE.

Coleen

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Erik Österlund-2
In reply to this post by Kim Barrett
Hi Kim,

On 2018-01-10 21:57, Kim Barrett wrote:

>> On Jan 10, 2018, at 8:50 AM, Erik Österlund <[hidden email]> wrote:
>>
>> Hi Kim,
>>
>> I have already seen earlier incarnations of this code so I only have a few comments regarding things that were not in that incarnation.
> Thanks for looking at it yet again.
>
>> src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/gc/shared/OopStorage.java:
>>
>> The Java implementation of OopStorage for the SA agent seems a bit empty and full of TODO comments to actually implement something. It seems like quite a bit of the other changes to the SA assumes that this actually does something. For example there are changes to use OopStorage oopsDo instead of JNIHandleBlock oopsDo, which I presume means there has to be some oopsDo implementation in place for OopStorage.
>> Is the intention to implement that later on in a separate RFE?
> Quoting from the original RFR email:
>
>    - Serviceability Agent updates for OopStorage and JNI usage.  Just
>    enough has been done to allow existing SA tests to "pass".  This
>    will be addressed as a followup.
>
> I haven't filed that RFE yet, but will be doing so.

Okay, I see. Thank you for the explanation. I missed that.

>> src/hotspot/share/gc/shared/oopStorage.hpp:
>>
>> The OopStorage::oops_do functions accept some kind of OopClosure and performs iteration with it. I'm curious though why it takes the specific type of OopClosure as a template parameter, passes it around, and eventually discards the information without making use of it? I was thinking perhaps the intention was to use the closure type information to devirtualize the do_oop() call, but that does not appear to be the case. To do that you would have to in OopStorage::OopFn::operator()() perform a qualified _cl->Closure::do_oop() call instead of _cl->do_oop() (and possibly take some care to prevent abstract oop closure types like OopClosure or ExtendedOopClosure from being devirtualized, which would be awkward (doable with some EnableIf<!IsSame<Closure, OopClosure>::value>::type overload, or something like that).
>> In other words, it seems like either this template parameter is unnecessary and you could just replace uses of it with OopClosure, or it should be made useful, e.g. by devirtualizing the calls or something. Same goes for the IsAliveClosure parameter for weak_oops_do.
> Sorry, but I'm not understanding what problem is being suggested.
> OopStorage doesn't depend on OopClosure as a type, at all, anywhere.
> There are no occurrences of the type OopClosure anywhere in the
> OopStorage implementation.  The types coming in to OopStorage
> iteration are preserved and used, without any need for odd syntax.
>
> With the type preserved like that, for common use-cases such as
>
>    MyClosureType my_closure;
>    storage->oops_do(&my_closure);
>
> the compiler can know the exact type of my_closure at the call site
> for do_oop(), in which case the compiler can devirtualize the call
> (and I would expect any decent compiler to do so; I've seen gcc do it).
>
> Maybe the issue is about JNIHandles::oops_do and friends, which
> presently are ordinary functions with OopClosure* arguments and the
> like?  Yes, the type information isn't flowing through there from the
> caller to the OopStorage infrastructure.  That's because that was
> going to introduce some additional fannout in this changeset.  I was
> going to move the inline definitions to oopStorage.inline.hpp (per
> Coleen's pre-review request), and all uses of iteration would then
> need to be changed to include that, which would have required a new
> jniHandles.inline.hpp, which would have required at least some
> additional downstream include changes.  I'm planning to make those
> changes as a followup.  Of course, I then forgot Coleen's pre-review
> request.

Your example may very well be devirtualized. But your assumption that
passing around the type information is what allows that devirtualization
is incorrect. In fact it does not make it any easier at all.
If you have a class A and a class B derived from A (and no other classes
in your program), then seeing an arbitrary value of type B* at a call
site does not make it safe to devirtualize calls to such objects in the
general case. You need more than the type information here. There is no
way for the compiler to prove that there is not another class C deriving
from B in some other compilation unit that will be linked in at some
later stage (link-time or even run-time), and hence unless otherwise
proven through something other than the declared type, the B* might
(from the compiler point of view) hypothetically point at a C* instance.
What the compiler may do though which is probably what you have
observed, is to perform points-to analysis that proves that all possible
values of the object at the call site have the exact type of B, by
following the object as it gets passed around back to all possible
allocation sites from the call site, using data-flow analysis, proving
that all those allocation sites were of type B. And that points-to
analysis is not being helped in any way by passing around the exact type
in declarations. In fact, you could pass the type inaccurately as A* and
the compiler will still see that all possible values of the object at
the callsite are exactly B and devirtualize the call anyway. When
points-to analysis can prove the type of the object, the passed around
type information no longer matters. And when points-to analysis can not
prove the type of the object, the passed around type information still
does not matter.

Here is a small sample program you can compile to assembly to
demonstrate my point with clang/g++ -O3 -S main.cpp:

#include <stdio.h>

class A {
public:
   virtual void foo() {
     printf("A::foo");
   }
};

class B: public A {
public:
   virtual void foo() {
     printf("B::foo");
   }
};

extern B& get_b();

int main(int argc, char* argv[]) {
   B& b_ref = get_b();
   B b;
   A& a_ref = b;
   b.foo();        // generates non-virtual call; points-to analysis
(trivially) determines the derived type of all possible values of b is B
   a_ref.foo();    // generates non-virtual call; points-to analysis
determines the derived type of all possible values of a_ref is B
(despite declared type being A - it does not matter)
   b_ref.foo();    // generates virtual call; points-to analysis can not
determine the derived type of all possible values of b_ref is B. The
declared type B does not matter or help in any way, as another
compilation unit could have derived B. Only with link-time optimization
is it safe to remove the virtual call
   b_ref.B::foo(); // generates non-virtual call; qualifier forces B
interpretation and devirtualizes the call; no need to worry about having
no clue about the dynamic type

   return 0;
}

I disassembled with both clang and gcc to verify that my comments
reflect reality.

What I am trying to say is that the compiler will not be able to
devirtualize any more or less calls to the OopClosure by passing around
its exact type as a Closure type parameter, unless you utilize that type
information to devirtualize it with a qualified _cl->Closure::do_oop(),
which is the proper way to devirtualize calls (without relying on the
success of points-to analysis finding all allocation sites, which still
is invariant of the passed around Closure type information). Therefore,
it seems strange to me to pass around the Closure type as a template
parameter, but then never make any use of that type information.

So my suggestion remains the same: either make use of the type
information by explicitly devirtualizing the calls to do_oop, or remove
that template type and replace it with OopClosure instead, as that is
equally as accurate for virtual calls that are not explicitly
devirtualized, and makes it a bit easier to read.

>
>> src/hotspot/share/gc/shared/oopStorage.cpp:
>>
>> I recall in the pre-review we had some discussion about whether the infrastructure for supporting early iteration termination was necessary or not. You had me convinced that JNI would need it for something. Is this still the case, or has that use case been revised? Reading through the new JNI code, I could not find uses of OopStorage::iterate_safepoint() from the JNI code, and hence no uses of early termination. Having said that, it does look rather simple now with only the sequential case supporting early termination.
>> However, if there is indeed no use any longer for OopStorage::iterate() and hence the only publicly available tool for early iteration exit, I wonder if perhaps it should be reconsidered if we want to remove that infrastructure. Or will there be other use cases down the road? Currently I find it at least a bit confusing that, e.g. OopStorage::iterate_safepoint() expects a function of type F that returns bool while OopStorage::BasicParState::iterate() expects a function of  type F that returns void. For the OopStorage::iterate_safepoint() case it is clearly described in the documentation, in the OopStorage::BasicParState::iterate() case I did not find it clear that this F is a different F to the safepoint one.
>> I guess I propose either removing early termination support alltogether if nobody is using it anyway (and hence having the F function type consistently return void), or documenting clearly that the sequential and parallel iteration functions expect different function types - one with bool return type and one with void return type, depending on whether early termination is supported or not.
> The SimpleRootsClosure used by
> VM_HeapWalkOperation::collect_simple_roots is a use-case for early
> termination.

I see. Sounds like it is still worthwhile to keep early termination then.

> iteration_safepoint documents a requirement on the result of the
> argument function (the result must be implicitly convertible to
> bool), and the associated behavior.  ParState::iterate doesn't specify
> any requirements on the result of the argument function, nor specify
> any associated behavior, so has none.  I should make that explicit
> though, e.g. document that any result returned by f is ignored.

Yes, documenting that explicitly would be fantastic.

> Since oops_do and weak_oops_do are expected to be the usual iteration
> entry points, and the early termination support doesn't affect them at
> all, I would like to leave it in place, to support use-cases like that
> one from JVMTI.  As we discussed during pre-review, ParState doesn't
> support early termination, since ParState is for exclusive use by GC,
> and we don't have or envision use-cases for early termination by GC.

I am okay with leaving it in place.

Thanks,
/Erik

>> Apart from the above minor remarks, I am delighted to see these changes. I am extra delighted to see deleted_handle() disappear. I am also delighted to see the new simplified synchronization for parallel iteration - that worked out nicely. Thank you for building a good oop storage solution.
> Thanks.
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
> On Jan 10, 2018, at 7:23 PM, Erik Österlund <[hidden email]> wrote:
>
> Your example may very well be devirtualized. But your assumption that passing around the type information is what allows that devirtualization is incorrect. In fact it does not make it any easier at all.
> If you have a class A and a class B derived from A (and no other classes in your program), then seeing an arbitrary value of type B* at a call site does not make it safe to devirtualize calls to such objects in the general case. You need more than the type information here. There is no way for the compiler to prove that there is not another class C deriving from B in some other compilation unit that will be linked in at some later stage (link-time or even run-time), and hence unless otherwise proven through something other than the declared type, the B* might (from the compiler point of view) hypothetically point at a C* instance.
> What the compiler may do though which is probably what you have observed, is to perform points-to analysis that proves that all possible values of the object at the call site have the exact type of B, by following the object as it gets passed around back to all possible allocation sites from the call site, using data-flow analysis, proving that all those allocation sites were of type B. And that points-to analysis is not being helped in any way by passing around the exact type in declarations. In fact, you could pass the type inaccurately as A* and the compiler will still see that all possible values of the object at the callsite are exactly B and devirtualize the call anyway. When points-to analysis can prove the type of the object, the passed around type information no longer matters. And when points-to analysis can not prove the type of the object, the passed around type information still does not matter.
>
> Here is a small sample program you can compile to assembly to demonstrate my point with clang/g++ -O3 -S main.cpp:
>
> #include <stdio.h>
>
> class A {
> public:
>   virtual void foo() {
>     printf("A::foo");
>   }
> };
>
> class B: public A {
> public:
>   virtual void foo() {
>     printf("B::foo");
>   }
> };
>
> extern B& get_b();
>
> int main(int argc, char* argv[]) {
>   B& b_ref = get_b();
>   B b;
>   A& a_ref = b;
>   b.foo();        // generates non-virtual call; points-to analysis (trivially) determines the derived type of all possible values of b is B
>   a_ref.foo();    // generates non-virtual call; points-to analysis determines the derived type of all possible values of a_ref is B (despite declared type being A - it does not matter)
>   b_ref.foo();    // generates virtual call; points-to analysis can not determine the derived type of all possible values of b_ref is B. The declared type B does not matter or help in any way, as another compilation unit could have derived B. Only with link-time optimization is it safe to remove the virtual call
>   b_ref.B::foo(); // generates non-virtual call; qualifier forces B interpretation and devirtualizes the call; no need to worry about having no clue about the dynamic type
>
>   return 0;
> }
>
> I disassembled with both clang and gcc to verify that my comments reflect reality.
>
> What I am trying to say is that the compiler will not be able to devirtualize any more or less calls to the OopClosure by passing around its exact type as a Closure type parameter, unless you utilize that type information to devirtualize it with a qualified _cl->Closure::do_oop(), which is the proper way to devirtualize calls (without relying on the success of points-to analysis finding all allocation sites, which still is invariant of the passed around Closure type information). Therefore, it seems strange to me to pass around the Closure type as a template parameter, but then never make any use of that type information.
>
> So my suggestion remains the same: either make use of the type information by explicitly devirtualizing the calls to do_oop, or remove that template type and replace it with OopClosure instead, as that is equally as accurate for virtual calls that are not explicitly devirtualized, and makes it a bit easier to read.

Okay, I understand what you are talking about now.  And my earlier
response was poor in a number of ways.  I knew this question had come
up before (during pre-review), but I completely misremembered the
rationale and botched the response.

You are correct that points-to analysis deals with a lot of issues.
(Speculative devirtualization can help address some additional cases.
Various mechanisms for informing the compiler that a visible
definition is the final one and can't be further overridden can also
help; local classes, classes in anonymous namespaces, C++11 final
class and function annotations.)

It would be incorrect for OopStorage to directly call a specific
function via cl->ClosureType::do_oop, since ClosureType::do_oop might
not be the most specialized definition.  There's not a (good) way for
the OopStorage source code to know or check that. (Actually, C++11
might provide some tools that could help in some cases.)

One reason for iteration templates is to support const iteration.
There are many iterations of collections of oops that never modify the
contents of the collections, and really ought to be declared const.
(Hotspot code seems often quite bad about const-correctness.
OopStorage supports const iterations, though I'm not sure there's a
use-case for const weak_oops_do; certainly the two argument form that
nulls out dead entries can't be const.  So that isn't supported.)

OopClosure does not support const iteration, because it doesn't
support application to a const oop*.  By making iterate and oops_do be
function templates, we don't require the "closure" argument to be
derived from any particular base class, merely that it models the
appropriate concept.  So we can iterate using something which does
support application to a const oop*.

Another reason for iteration templates is to be forward-looking to
C++11, where it becomes easy to pass in an instance of a local type,
or a lambda, or a bind expression, all of which are ways to make
used-in-one-place function objects conveniently at the point of use.
This can often make the code much easier to understand than having to
go somewhere else to find out what do_oop for *this* closure does.

With a little work we could always use oops_do for iteration (and make
iterate/iterate_safepoint private).  There would be two versions of
oops_do, one using cl->do_oop(p), the other cl(p), selected via a
little metaprogramming (in C++11, I don't remember how hard it is in
C++03, or if it's even possible.)


Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
> On Jan 11, 2018, at 7:56 PM, Kim Barrett <[hidden email]> wrote:
> One reason for iteration templates is to support const iteration.
> […]
>
> Another reason for iteration templates is to be forward-looking to
> C++11, where it becomes easy to pass in an instance of a local type,
> or a lambda, or a bind expression, all of which are ways to make
> used-in-one-place function objects conveniently at the point of use.
> This can often make the code much easier to understand than having to
> go somewhere else to find out what do_oop for *this* closure does.
>
> With a little work we could always use oops_do for iteration (and make
> iterate/iterate_safepoint private).  There would be two versions of
> oops_do, one using cl->do_oop(p), the other cl(p), selected via a
> little metaprogramming (in C++11, I don't remember how hard it is in
> C++03, or if it's even possible.)

Hm, maybe this is easier than I was thinking it might be.  The difficulty
I was remembering seems to be a somewhat different case, and a simple
solution seems obvious now, except it’s kind of late, so maybe I’m tired
and missing something.

I’ll look at it tomorrow.

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Kim Barrett
> On Jan 12, 2018, at 3:53 AM, Kim Barrett <[hidden email]> wrote:
>
>> On Jan 11, 2018, at 7:56 PM, Kim Barrett <[hidden email]> wrote:
>> One reason for iteration templates is to support const iteration.
>> […]
>>
>> Another reason for iteration templates is to be forward-looking to
>> C++11, where it becomes easy to pass in an instance of a local type,
>> or a lambda, or a bind expression, all of which are ways to make
>> used-in-one-place function objects conveniently at the point of use.
>> This can often make the code much easier to understand than having to
>> go somewhere else to find out what do_oop for *this* closure does.
>>
>> With a little work we could always use oops_do for iteration (and make
>> iterate/iterate_safepoint private).  There would be two versions of
>> oops_do, one using cl->do_oop(p), the other cl(p), selected via a
>> little metaprogramming (in C++11, I don't remember how hard it is in
>> C++03, or if it's even possible.)
>
> Hm, maybe this is easier than I was thinking it might be.  The difficulty
> I was remembering seems to be a somewhat different case, and a simple
> solution seems obvious now, except it’s kind of late, so maybe I’m tired
> and missing something.
>
> I’ll look at it tomorrow.

Never mind, I’m not seeing a way to do that selection easily with C++03.
And the reasons for considering doing it don’t really exist until C++11.
So I’m wanting to leave it as is.


Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8194312: Support parallel and concurrent JNI global handle processing

Erik Österlund-2
Hi Kim,

On 15 Jan 2018, at 06:56, Kim Barrett <[hidden email]> wrote:

>> On Jan 12, 2018, at 3:53 AM, Kim Barrett <[hidden email]> wrote:
>>
>>> On Jan 11, 2018, at 7:56 PM, Kim Barrett <[hidden email]> wrote:
>>> One reason for iteration templates is to support const iteration.
>>> […]
>>>
>>> Another reason for iteration templates is to be forward-looking to
>>> C++11, where it becomes easy to pass in an instance of a local type,
>>> or a lambda, or a bind expression, all of which are ways to make
>>> used-in-one-place function objects conveniently at the point of use.
>>> This can often make the code much easier to understand than having to
>>> go somewhere else to find out what do_oop for *this* closure does.
>>>
>>> With a little work we could always use oops_do for iteration (and make
>>> iterate/iterate_safepoint private).  There would be two versions of
>>> oops_do, one using cl->do_oop(p), the other cl(p), selected via a
>>> little metaprogramming (in C++11, I don't remember how hard it is in
>>> C++03, or if it's even possible.)
>>
>> Hm, maybe this is easier than I was thinking it might be.  The difficulty
>> I was remembering seems to be a somewhat different case, and a simple
>> solution seems obvious now, except it’s kind of late, so maybe I’m tired
>> and missing something.
>>
>> I’ll look at it tomorrow.
>
> Never mind, I’m not seeing a way to do that selection easily with C++03.
> And the reasons for considering doing it don’t really exist until C++11.
> So I’m wanting to leave it as is.

Okay. Let’s revisit that in the future instead then. Ship it!

Thanks,
/Erik
12