RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

Coleen Phillimore-3
This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.

I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.

Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.

-------------

Commit messages:
 - 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

Changes: https://git.openjdk.java.net/jdk/pull/3362/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3362&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8259242
  Stats: 121 lines in 9 files changed: 61 ins; 34 del; 26 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3362.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3362/head:pull/3362

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

David Holmes-2
On Tue, 6 Apr 2021 18:54:09 GMT, Coleen Phillimore <[hidden email]> wrote:

> This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>
> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>
> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.

Hi Coleen,

Can you explain the synchronization protocol you are using here please - I don't understand how the handshake fits into it ??

Further comments below.

Thanks,
David

src/hotspot/share/classfile/dictionary.cpp line 422:

> 420: void Dictionary::clean_cached_protection_domains(GrowableArray<ProtectionDomainEntry*>* delete_list) {
> 421:   assert(Thread::current()->is_Java_thread(), "only called by JavaThread");
> 422:   assert_locked_or_safepoint(SystemDictionary_lock);

If the current thread must be a JavaThread then we cannot be at a safepoint and so the second assert can just check the lock is held.

src/hotspot/share/classfile/dictionary.hpp line 145:

> 143:
> 144:   ProtectionDomainEntry* pd_set() const            { return Atomic::load(&_pd_set); }
> 145:   void set_pd_set(ProtectionDomainEntry* entry)    { Atomic::store(&_pd_set, entry); }

These probably need to be load_acquire and release_store, so that a lock-free traversal can proceed correctly with a locked addition/removal. With suitable name changes for the methods.

src/hotspot/share/classfile/protectionDomainCache.cpp line 89:

> 87:   // If there are any deleted entries, Handshake-all then they'll be
> 88:   // safe to remove since traversing the pd_set list does not stop for
> 89:   // safepoints and only JavaThreads will read the pd_set.

I do not understand what you are trying to do here. This is basically a no-op handshake with every thread? Does all the threads remain in the handshake until the last one is processed or do they proceed through one at a time? The former seems better suited for a safepoint operation. The latter means this doesn't work as they may have moved on to a new removal.

src/hotspot/share/classfile/protectionDomainCache.hpp line 115:

> 113:
> 114:   ProtectionDomainEntry* next() { return Atomic::load(&_next); }
> 115:   void set_next(ProtectionDomainEntry* entry) { Atomic::store(&_next, entry); }

Again these may need to be load_acquire/release_store for correct lock-free traversal. With suitable name changes for the methods.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

Coleen Phillimore-3
On Wed, 7 Apr 2021 04:38:24 GMT, David Holmes <[hidden email]> wrote:

>> This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>
>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>
>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>
> src/hotspot/share/classfile/dictionary.cpp line 422:
>
>> 420: void Dictionary::clean_cached_protection_domains(GrowableArray<ProtectionDomainEntry*>* delete_list) {
>> 421:   assert(Thread::current()->is_Java_thread(), "only called by JavaThread");
>> 422:   assert_locked_or_safepoint(SystemDictionary_lock);
>
> If the current thread must be a JavaThread then we cannot be at a safepoint and so the second assert can just check the lock is held.

sure.

> src/hotspot/share/classfile/dictionary.hpp line 145:
>
>> 143:
>> 144:   ProtectionDomainEntry* pd_set() const            { return Atomic::load(&_pd_set); }
>> 145:   void set_pd_set(ProtectionDomainEntry* entry)    { Atomic::store(&_pd_set, entry); }
>
> These probably need to be load_acquire and release_store, so that a lock-free traversal can proceed correctly with a locked addition/removal. With suitable name changes for the methods.

I forgot that the method names have to match the implementation details. Thanks I didn't know whether they needed acquire etc.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

Coleen Phillimore-3
In reply to this post by David Holmes-2
On Wed, 7 Apr 2021 04:49:02 GMT, David Holmes <[hidden email]> wrote:

>> This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>
>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>
>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>
> src/hotspot/share/classfile/protectionDomainCache.cpp line 89:
>
>> 87:   // If there are any deleted entries, Handshake-all then they'll be
>> 88:   // safe to remove since traversing the pd_set list does not stop for
>> 89:   // safepoints and only JavaThreads will read the pd_set.
>
> I do not understand what you are trying to do here. This is basically a no-op handshake with every thread? Does all the threads remain in the handshake until the last one is processed or do they proceed through one at a time? The former seems better suited for a safepoint operation. The latter means this doesn't work as they may have moved on to a new removal.

Only the ServiceThread calls this code, so they won't be removed again.  This is modeled after the ObjectMonitor code that does the same thing.  All the threads have to hit a handshake at a safepoint polling place, which they cannot do while reading the list.  After all the threads have hit the handshake, no thread can see the old entries on the list.
I thought that's what I said in the comment.  What should I say to make this clearer?

> src/hotspot/share/classfile/protectionDomainCache.hpp line 115:
>
>> 113:
>> 114:   ProtectionDomainEntry* next() { return Atomic::load(&_next); }
>> 115:   void set_next(ProtectionDomainEntry* entry) { Atomic::store(&_next, entry); }
>
> Again these may need to be load_acquire/release_store for correct lock-free traversal. With suitable name changes for the methods.

Thanks, I'll add acquire/release and change the names to match the implementation details.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

Coleen Phillimore-3
In reply to this post by David Holmes-2
On Wed, 7 Apr 2021 04:55:35 GMT, David Holmes <[hidden email]> wrote:

>> This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>
>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>
>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>
> Hi Coleen,
>
> Can you explain the synchronization protocol you are using here please - I don't understand how the handshake fits into it ??
>
> Further comments below.
>
> Thanks,
> David

The lock-free algorithm is based on the one for object monitors, except that adding and deleting entries is done with the SystemDictionary_lock held, reading entries is done with a (now) Atomic load acquires etc with no safepoint polls, and deleting the entries is done by saving the entries off to the side, until a handshake is reached, meaning none of the readers are reading the old version of the list.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

Coleen Phillimore-3
In reply to this post by Coleen Phillimore-3
On Wed, 7 Apr 2021 11:55:56 GMT, Coleen Phillimore <[hidden email]> wrote:

>> src/hotspot/share/classfile/protectionDomainCache.hpp line 115:
>>
>>> 113:
>>> 114:   ProtectionDomainEntry* next() { return Atomic::load(&_next); }
>>> 115:   void set_next(ProtectionDomainEntry* entry) { Atomic::store(&_next, entry); }
>>
>> Again these may need to be load_acquire/release_store for correct lock-free traversal. With suitable name changes for the methods.
>
> Thanks, I'll add acquire/release and change the names to match the implementation details.

Did we collectively agree on the naming convention that set_next has to be release_set_next because the implementation uses release_store, or is this just your preference?  I find this noisy in the context where these functions are being called.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out [v2]

Coleen Phillimore-3
In reply to this post by Coleen Phillimore-3
> This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>
> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>
> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Add acquire/release and update names.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3362/files
  - new: https://git.openjdk.java.net/jdk/pull/3362/files/0695d6e9..40c359b1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3362&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3362&range=00-01

  Stats: 25 lines in 3 files changed: 0 ins; 1 del; 24 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3362.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3362/head:pull/3362

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

David Holmes
In reply to this post by Coleen Phillimore-3
On 7/04/2021 10:30 pm, Coleen Phillimore wrote:

> On Wed, 7 Apr 2021 11:55:56 GMT, Coleen Phillimore <[hidden email]> wrote:
>
>>> src/hotspot/share/classfile/protectionDomainCache.hpp line 115:
>>>
>>>> 113:
>>>> 114:   ProtectionDomainEntry* next() { return Atomic::load(&_next); }
>>>> 115:   void set_next(ProtectionDomainEntry* entry) { Atomic::store(&_next, entry); }
>>>
>>> Again these may need to be load_acquire/release_store for correct lock-free traversal. With suitable name changes for the methods.
>>
>> Thanks, I'll add acquire/release and change the names to match the implementation details.
>
> Did we collectively agree on the naming convention that set_next has to be release_set_next because the implementation uses release_store, or is this just your preference?  I find this noisy in the context where these functions are being called.

Yes that is the convention that was established and in place for a
number of fields.

Thanks,
David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3362
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out

David Holmes
In reply to this post by Coleen Phillimore-3
On 7/04/2021 10:05 pm, Coleen Phillimore wrote:

> On Wed, 7 Apr 2021 04:55:35 GMT, David Holmes <[hidden email]> wrote:
>
>>> This change makes the DictionaryEntry pd_set lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>>
>>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>>
>>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>>
>> Hi Coleen,
>>
>> Can you explain the synchronization protocol you are using here please - I don't understand how the handshake fits into it ??
>>
>> Further comments below.
>>
>> Thanks,
>> David
>
> The lock-free algorithm is based on the one for object monitors, except that adding and deleting entries is done with the SystemDictionary_lock held, reading entries is done with a (now) Atomic load acquires etc with no safepoint polls, and deleting the entries is done by saving the entries off to the side, until a handshake is reached, meaning none of the readers are reading the old version of the list.

I really don't see why we need to use handshakes here. The writers use a
lock, and use release_stores. The readers use load_acquire. Our normal
lock-free data structures don't need handshakes to be involved, so sorry
but I'm not getting it.

Thanks,
David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/3362
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out [v2]

Coleen Phillimore-3
In reply to this post by Coleen Phillimore-3
On Wed, 7 Apr 2021 12:02:44 GMT, Coleen Phillimore <[hidden email]> wrote:

>> Hi Coleen,
>>
>> Can you explain the synchronization protocol you are using here please - I don't understand how the handshake fits into it ??
>>
>> Further comments below.
>>
>> Thanks,
>> David
>
> The lock-free algorithm is based on the one for object monitors, except that adding and deleting entries is done with the SystemDictionary_lock held, reading entries is done with a (now) Atomic load acquires etc with no safepoint polls, and deleting the entries is done by saving the entries off to the side, until a handshake is reached, meaning none of the readers are reading the old version of the list.

The reader thread could be using the linked list and have just loaded -> next, while the deleting thread also has loaded -> next, deciding it's unused and calling delete on it.  The reader is then swapped back in the process and looks at the _pd_cache field which was deleted or the _next field which is will point to the delete pattern.
When we call 'delete' on a node in this list, we have to make sure that nothing is reading it.

The concurrent hashtable uses globalCounter for this situation.  Many of our lock free data structures delete entries in safepoints (like dictionaries).   This pd_set list does the same as the ObjectMonitor code.

The handshake is rare as I said in the comments.  The protection domain is generally associated with the klass in the initiating loader, which stays alive while this class is alive.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out [v2]

Daniel D.Daugherty
In reply to this post by Coleen Phillimore-3
On Wed, 7 Apr 2021 11:55:19 GMT, Coleen Phillimore <[hidden email]> wrote:

>> src/hotspot/share/classfile/protectionDomainCache.cpp line 89:
>>
>>> 87:   // If there are any deleted entries, Handshake-all then they'll be
>>> 88:   // safe to remove since traversing the pd_set list does not stop for
>>> 89:   // safepoints and only JavaThreads will read the pd_set.
>>
>> I do not understand what you are trying to do here. This is basically a no-op handshake with every thread? Does all the threads remain in the handshake until the last one is processed or do they proceed through one at a time? The former seems better suited for a safepoint operation. The latter means this doesn't work as they may have moved on to a new removal.
>
> Only the ServiceThread calls this code, so they won't be removed again.  This is modeled after the ObjectMonitor code that does the same thing.  All the threads have to hit a handshake at a safepoint polling place, which they cannot do while reading the list.  After all the threads have hit the handshake, no thread can see the old entries on the list.
> I thought that's what I said in the comment.  What should I say to make this clearer?

See src/hotspot/share/runtime/synchronizer.cpp:

size_t ObjectSynchronizer::deflate_idle_monitors() {
<snip>
    if (current->is_Java_thread()) {
<snip>
      // A JavaThread needs to handshake in order to safely free the
      // ObjectMonitors that were deflated in this cycle.
      HandshakeForDeflation hfd_hc;
      Handshake::execute(&hfd_hc);

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: vmTestbase/vm/mlvm/mixed/stress/java/findDeadlock/TestDescription.java timed out [v2]

Patricio Chilano Mateo
In reply to this post by Coleen Phillimore-3
On Wed, 7 Apr 2021 12:39:19 GMT, Coleen Phillimore <[hidden email]> wrote:

>> This change makes the DictionaryEntry pd_set reads lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>
>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>
>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add acquire/release and update names.

Hi Coleen,

Changes LGTM. Some comments below:

src/hotspot/share/classfile/dictionary.cpp line 92:

> 90:   // This doesn't require a lock because nothing is reading this
> 91:   // entry anymore.  The ClassLoader is dead.
> 92:   if (entry->pd_set_acquire() != NULL) {

I think this should still be a 'while' to remove the whole list.

src/hotspot/share/classfile/protectionDomainCache.cpp line 93:

> 91:   // with the caller class and class loader, which if still alive will keep this
> 92:   // protection domain entry alive.
> 93:   if (delete_list->length() != 0) {

If you want you could also use a threshold to decide when to issue the handshake, to avoid triggering it for very few entries (given removals seem to be rare already). See is_async_deflation_needed() for example.

src/hotspot/share/classfile/protectionDomainCache.cpp line 211:

> 209: }
> 210:
> 211:

extra line?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v3]

Coleen Phillimore-3
In reply to this post by Coleen Phillimore-3
> This change makes the DictionaryEntry pd_set reads lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>
> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>
> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.

Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:

  Make ProtectionDomainEntry delete list global and set a threshold for deletions before handshaking. Fix if to when.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3362/files
  - new: https://git.openjdk.java.net/jdk/pull/3362/files/40c359b1..06624ed1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3362&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3362&range=01-02

  Stats: 24 lines in 3 files changed: 11 ins; 1 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3362.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3362/head:pull/3362

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v2]

Coleen Phillimore-3
In reply to this post by Coleen Phillimore-3
On Wed, 7 Apr 2021 12:39:19 GMT, Coleen Phillimore <[hidden email]> wrote:

>> This change makes the DictionaryEntry pd_set reads lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>
>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>
>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Add acquire/release and update names.

Thank you for reviewing, Patricio.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v2]

Coleen Phillimore-3
On Thu, 8 Apr 2021 23:02:19 GMT, Coleen Phillimore <[hidden email]> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add acquire/release and update names.
>
> Thank you for reviewing, Patricio.

I renamed this bug and pull request.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v2]

Coleen Phillimore-3
In reply to this post by Patricio Chilano Mateo
On Thu, 8 Apr 2021 20:18:21 GMT, Patricio Chilano Mateo <[hidden email]> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add acquire/release and update names.
>
> src/hotspot/share/classfile/dictionary.cpp line 92:
>
>> 90:   // This doesn't require a lock because nothing is reading this
>> 91:   // entry anymore.  The ClassLoader is dead.
>> 92:   if (entry->pd_set_acquire() != NULL) {
>
> I think this should still be a 'while' to remove the whole list.

You're right, thank you for finding this.  I was going to rewrite this to call a function so I can access _next directly.

> src/hotspot/share/classfile/protectionDomainCache.cpp line 93:
>
>> 91:   // with the caller class and class loader, which if still alive will keep this
>> 92:   // protection domain entry alive.
>> 93:   if (delete_list->length() != 0) {
>
> If you want you could also use a threshold to decide when to issue the handshake, to avoid triggering it for very few entries (given removals seem to be rare already). See is_async_deflation_needed() for example.

I wanted to avoid making delete_list global and CHeap allocated.  I could do that and make the threshold 20.

> src/hotspot/share/classfile/protectionDomainCache.cpp line 211:
>
>> 209: }
>> 210:
>> 211:
>
> extra line?

removed.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v3]

David Holmes-2
In reply to this post by Coleen Phillimore-3
On Thu, 8 Apr 2021 23:09:49 GMT, Coleen Phillimore <[hidden email]> wrote:

>> This change makes the DictionaryEntry pd_set reads lock free, because when I set a low SafepointTimeout on the findDeadlock test, these threads were waiting on the ProtectionDomainSet_lock.  This lock is a singleton that's taken for every dictionary entry.
>>
>> I don't know if this change will make the test never timeout again, because removing this lock caused us to hit a low SafepointTimeout on another _no_safepoint_check lock that is not so easy to eradictate.
>>
>> Tested with tiers 1-3.  There's a test that exercises this code in runtime/Dictionary/ProtectionDomainCacheTest.java.
>
> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>
>   Make ProtectionDomainEntry delete list global and set a threshold for deletions before handshaking. Fix if to when.

Hi Coleen,

Thanks for the additional explanations about the synchronization protocol that is being used. The use of the global handshake with the OM deletion had not really registered with me as a general technique that we would employ for safe-memory-reclamation. (I'm not sure if I find it cool or scary! :) )

So to summarise the protocol:
- the pd_set can only be set and appended to when the lock is held
- traversal of a pd_set is lock-free
- removal from a pd_set is a two phase process:
-   1. Under the lock move the entry to the global delete list
-   2. When appropriate the serviceThread will process the delete list, by first handshaking all threads, and then iterating through and deleting the entries

We know an entry cannot be in-use when deleted because that implies a traversal is in progress, but if a traversal is in progress then the handshake cannot have happened.

Is that correct?

I would like to see this high-level description as a block comment somewhere for clarity, as the individual pieces of code seem spread around and it is not easy (for me) to see the connections when just reading the code.

Thanks,
David

src/hotspot/share/classfile/dictionary.cpp line 148:

> 146:          "can only be called by a JavaThread or at safepoint");
> 147:   // This cannot safepoint while reading the protection domain set.
> 148:   NoSafepointVerifier nsv;

Does this implicitly also mean NoHandshakeVerifier? Or do we need a seperate NHV?

src/hotspot/share/classfile/dictionary.cpp line 455:

> 453:             prev->release_set_next(current->next_acquire());
> 454:           }
> 455:           delete_list->push(current);

Suggested comment before this:
// Mark current for deletion, but in the meantime it can still be traversed

It is important that current is not unlinked so that in-progress traversals do not break. Though to be honest I'm unclear exactly what triggers the clearing of an entry this way.

src/hotspot/share/classfile/dictionary.cpp line 560:

> 558: void DictionaryEntry::print_count(outputStream *st) {
> 559:   int count = 0;
> 560:   for (ProtectionDomainEntry* current = pd_set_acquire();  // accessed inside SD lock

Can the end-of-line comment get promoted to a full line comment before the loop - or even an assert that the lock is held.

src/hotspot/share/classfile/protectionDomainCache.cpp line 116:

> 114:     _delete_list = new (ResourceObj::C_HEAP, mtClass)
> 115:                        GrowableArray<ProtectionDomainEntry*>(20, mtClass);
> 116:   }

What is protecting this code to ensure the allocation can only happen once?

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v3]

Coleen Phillimore-3
On Fri, 9 Apr 2021 01:44:41 GMT, David Holmes <[hidden email]> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make ProtectionDomainEntry delete list global and set a threshold for deletions before handshaking. Fix if to when.
>
> src/hotspot/share/classfile/dictionary.cpp line 148:
>
>> 146:          "can only be called by a JavaThread or at safepoint");
>> 147:   // This cannot safepoint while reading the protection domain set.
>> 148:   NoSafepointVerifier nsv;
>
> Does this implicitly also mean NoHandshakeVerifier? Or do we need a seperate NHV?

It does mean the same thing in that we check for handshakes where we'd check for safepoints, and that what NSV detects.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v3]

Coleen Phillimore-3
In reply to this post by David Holmes-2
On Fri, 9 Apr 2021 01:53:04 GMT, David Holmes <[hidden email]> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make ProtectionDomainEntry delete list global and set a threshold for deletions before handshaking. Fix if to when.
>
> src/hotspot/share/classfile/dictionary.cpp line 560:
>
>> 558: void DictionaryEntry::print_count(outputStream *st) {
>> 559:   int count = 0;
>> 560:   for (ProtectionDomainEntry* current = pd_set_acquire();  // accessed inside SD lock
>
> Can the end-of-line comment get promoted to a full line comment before the loop - or even an assert that the lock is held.

Ok, I added an assert instead.  This logging at pd_set creation time is sort of useless, but it's useful at PrintSystemDictionaryAtExit time.

> src/hotspot/share/classfile/protectionDomainCache.cpp line 116:
>
>> 114:     _delete_list = new (ResourceObj::C_HEAP, mtClass)
>> 115:                        GrowableArray<ProtectionDomainEntry*>(20, mtClass);
>> 116:   }
>
> What is protecting this code to ensure the allocation can only happen once?

Yes.  This functions is only called from the service thread.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8259242: Remove ProtectionDomainSet_lock [v3]

Coleen Phillimore-3
In reply to this post by David Holmes-2
On Fri, 9 Apr 2021 02:17:29 GMT, David Holmes <[hidden email]> wrote:

>> Coleen Phillimore has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make ProtectionDomainEntry delete list global and set a threshold for deletions before handshaking. Fix if to when.
>
> src/hotspot/share/classfile/dictionary.cpp line 455:
>
>> 453:             prev->release_set_next(current->next_acquire());
>> 454:           }
>> 455:           delete_list->push(current);
>
> Suggested comment before this:
> // Mark current for deletion, but in the meantime it can still be traversed
>
> It is important that current is not unlinked so that in-progress traversals do not break. Though to be honest I'm unclear exactly what triggers the clearing of an entry this way.

I added the comment.  The service thread could be trying to remove this entry and there's a small chance that some class loading will be looking up this entry with SystemDictionary::find_instance_klass.

    // First clean cached pd lists in loaded CLDs
    // It's unlikely, but some loaded classes in a dictionary might
    // point to a protection_domain that has been unloaded.
    // The dictionary pd_set points at entries in the ProtectionDomainCacheTable.

There's a test runtime/Dictionary/ProtectionDomainCache.java that does this. It creates a protection domain with a jar file loading with a URL class loader that's then removed.  The defining class loader of the class loaded has that protection domain in its pd_set.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3362
12