RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Robbin Ehn-2
Handshakes may contain oop Handles from the requesting thread.
A handshake may be executed by another thread than the requester.
We must make sure any such Handle is safe to use by other threads before executing the handshake.
This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).

Passes t1-5

-------------

Commit messages:
 - SWS start processing on handshaker

Changes: https://git.openjdk.java.net/jdk/pull/2571/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2571&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261391
  Stats: 27 lines in 1 file changed: 17 ins; 2 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2571.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2571/head:pull/2571

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Erik Österlund-3
On Mon, 15 Feb 2021 10:23:24 GMT, Robbin Ehn <[hidden email]> wrote:

> Handshakes may contain oop Handles from the requesting thread.
> A handshake may be executed by another thread than the requester.
> We must make sure any such Handle is safe to use by other threads before executing the handshake.
> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>
> Passes t1-5

Looks good. Thanks for fixing!

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Robbin Ehn-2
On Tue, 16 Feb 2021 10:52:05 GMT, Erik Österlund <[hidden email]> wrote:

> Looks good. Thanks for fixing!

Thank you for having a look.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Daniel D.Daugherty
In reply to this post by Robbin Ehn-2
On Mon, 15 Feb 2021 10:23:24 GMT, Robbin Ehn <[hidden email]> wrote:

> Handshakes may contain oop Handles from the requesting thread.
> A handshake may be executed by another thread than the requester.
> We must make sure any such Handle is safe to use by other threads before executing the handshake.
> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>
> Passes t1-5

No code/technical issues with this fix. Just possible naming changes.

src/hotspot/share/runtime/handshake.cpp line 289:

> 287:   StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
> 288:   if (_requester != NULL && _requester != executioner && _requester->is_Java_thread()) {
> 289:     // The handshake closure may contains oop Handles from the _requester.

nit typo: s/contains/contain/

src/hotspot/share/runtime/handshake.cpp line 282:

> 280: };
> 281:
> 282: void HandshakeOperation::prepare(JavaThread* current_target, Thread* executioner) {

I don't think `executioner` is good name here since it doesn't look
like the Thread passed via that parameter is killing anything.
Perhaps you mean `executer` - as in the Thread that is executing
some specific code/function.

Or maybe go with `executing_thread` to be very clear.

src/hotspot/share/runtime/handshake.cpp line 66:

> 64:     _requester(requester) {}
> 65:   virtual ~HandshakeOperation() {}
> 66:   void prepare(JavaThread* current_target, Thread* executioner);

I don't think `executioner` is good name here since it doesn't look
like the Thread passed via that parameter is killing anything.
Perhaps you mean `executer` - as in the Thread that is executing
some specific code/function.

Or maybe go with `executing_thread` to be very clear.

src/hotspot/share/runtime/handshake.cpp line 541:

> 539:       }
> 540:
> 541:       op->prepare(_handshakee, current_thread);

Okay, just I'm clear on this fix:
- You are replacing this single `StackWatermarkSet::start_processing` call on the handshakee with a call to the new prepare() function.
- The new prepare() function always does the `StackWatermarkSet::start_processing` call on the `current_target` which is the handshakee.
- The new prepare() function does a `StackWatermarkSet::start_processing` call on executioner/executer/executing_thread when that thread is different than the requester/handshaker thread.

So you have multiple names for the "target": `target`, `handshakee`,  and `current_target`. Is there a reason to create the new name `current_target` instead of using either `target` or `handshakee` that you've used before? Since `current_target` is new name in this fix, I would like to see that one renamed to either `target` or `handshakee` in this fix.

I think you also have multiple names for the "requester": `requester` and `handshaker`, but I'm less confused by that naming. In this fix, you're now adding a new name: `executioner` which could be called `executer` or `executing_thread`.

Perhaps you could clean up this naming confusion is follow-up RFE that _only_ fixes naming.

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Robbin Ehn-2
On Tue, 16 Feb 2021 17:55:44 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> Handshakes may contain oop Handles from the requesting thread.
>> A handshake may be executed by another thread than the requester.
>> We must make sure any such Handle is safe to use by other threads before executing the handshake.
>> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>>
>> Passes t1-5
>
> src/hotspot/share/runtime/handshake.cpp line 289:
>
>> 287:   StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
>> 288:   if (_requester != NULL && _requester != executioner && _requester->is_Java_thread()) {
>> 289:     // The handshake closure may contains oop Handles from the _requester.
>
> nit typo: s/contains/contain/

Fixed

> src/hotspot/share/runtime/handshake.cpp line 282:
>
>> 280: };
>> 281:
>> 282: void HandshakeOperation::prepare(JavaThread* current_target, Thread* executioner) {
>
> I don't think `executioner` is good name here since it doesn't look
> like the Thread passed via that parameter is killing anything.
> Perhaps you mean `executer` - as in the Thread that is executing
> some specific code/function.
>
> Or maybe go with `executing_thread` to be very clear.

Fixed

> src/hotspot/share/runtime/handshake.cpp line 66:
>
>> 64:     _requester(requester) {}
>> 65:   virtual ~HandshakeOperation() {}
>> 66:   void prepare(JavaThread* current_target, Thread* executioner);
>
> I don't think `executioner` is good name here since it doesn't look
> like the Thread passed via that parameter is killing anything.
> Perhaps you mean `executer` - as in the Thread that is executing
> some specific code/function.
>
> Or maybe go with `executing_thread` to be very clear.

Fixed

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Tue, 16 Feb 2021 18:15:29 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> Handshakes may contain oop Handles from the requesting thread.
>> A handshake may be executed by another thread than the requester.
>> We must make sure any such Handle is safe to use by other threads before executing the handshake.
>> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>>
>> Passes t1-5
>
> src/hotspot/share/runtime/handshake.cpp line 541:
>
>> 539:       }
>> 540:
>> 541:       op->prepare(_handshakee, current_thread);
>
> Okay, just I'm clear on this fix:
> - You are replacing this single `StackWatermarkSet::start_processing` call on the handshakee with a call to the new prepare() function.
> - The new prepare() function always does the `StackWatermarkSet::start_processing` call on the `current_target` which is the handshakee.
> - The new prepare() function does a `StackWatermarkSet::start_processing` call on executioner/executer/executing_thread when that thread is different than the requester/handshaker thread.
>
> So you have multiple names for the "target": `target`, `handshakee`,  and `current_target`. Is there a reason to create the new name `current_target` instead of using either `target` or `handshakee` that you've used before? Since `current_target` is new name in this fix, I would like to see that one renamed to either `target` or `handshakee` in this fix.
>
> I think you also have multiple names for the "requester": `requester` and `handshaker`, but I'm less confused by that naming. In this fix, you're now adding a new name: `executioner` which could be called `executer` or `executing_thread`.
>
> Perhaps you could clean up this naming confusion is follow-up RFE that _only_ fixes naming.

1: Request the handshake
  - The thread doing this is the owner of the handshake operation
  - Handles in the handshake operation belongs to this thread and we must do SWS::start_proc() on it

2: Target thread(s)
  - The thread(s) which the operation is performed on
  - We must do SWS::start_proc() on it

3: Executer of the handshake
  - The thread the executes the code that the handshake operation contains.

Handshake operation:
 _requester is the thread requested the operation
 _target is NULL if we target all threads, otherwise it is the target thread.

The prepare() method is part of the handshake operation and need some additional information.
It needs to know the thread executing the handshake and it needs the current target (since _target may be NULL).

The HandshakeState name for the thread owning the handshake state is handshakee, which is a private implementation detail.
When the handshake state is processing handshakes it calls prepare with _handshakee as current_target and current_thread as parameters to prepare().

target may be NULL
current target may not be NULL
handshakee refers to the handshake state owner (not the target of a handshake operation)

They are thus not interchangeable, but at the moment of execution they can all refer to the same thread.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Robbin Ehn-2
On Wed, 17 Feb 2021 10:21:06 GMT, Robbin Ehn <[hidden email]> wrote:

>> src/hotspot/share/runtime/handshake.cpp line 541:
>>
>>> 539:       }
>>> 540:
>>> 541:       op->prepare(_handshakee, current_thread);
>>
>> Okay, just I'm clear on this fix:
>> - You are replacing this single `StackWatermarkSet::start_processing` call on the handshakee with a call to the new prepare() function.
>> - The new prepare() function always does the `StackWatermarkSet::start_processing` call on the `current_target` which is the handshakee.
>> - The new prepare() function does a `StackWatermarkSet::start_processing` call on executioner/executer/executing_thread when that thread is different than the requester/handshaker thread.
>>
>> So you have multiple names for the "target": `target`, `handshakee`,  and `current_target`. Is there a reason to create the new name `current_target` instead of using either `target` or `handshakee` that you've used before? Since `current_target` is new name in this fix, I would like to see that one renamed to either `target` or `handshakee` in this fix.
>>
>> I think you also have multiple names for the "requester": `requester` and `handshaker`, but I'm less confused by that naming. In this fix, you're now adding a new name: `executioner` which could be called `executer` or `executing_thread`.
>>
>> Perhaps you could clean up this naming confusion is follow-up RFE that _only_ fixes naming.
>
> 1: Request the handshake
>   - The thread doing this is the owner of the handshake operation
>   - Handles in the handshake operation belongs to this thread and we must do SWS::start_proc() on it
>
> 2: Target thread(s)
>   - The thread(s) which the operation is performed on
>   - We must do SWS::start_proc() on it
>
> 3: Executer of the handshake
>   - The thread the executes the code that the handshake operation contains.
>
> Handshake operation:
>  _requester is the thread requested the operation
>  _target is NULL if we target all threads, otherwise it is the target thread.
>
> The prepare() method is part of the handshake operation and need some additional information.
> It needs to know the thread executing the handshake and it needs the current target (since _target may be NULL).
>
> The HandshakeState name for the thread owning the handshake state is handshakee, which is a private implementation detail.
> When the handshake state is processing handshakes it calls prepare with _handshakee as current_target and current_thread as parameters to prepare().
>
> target may be NULL
> current target may not be NULL
> handshakee refers to the handshake state owner (not the target of a handshake operation)
>
> They are thus not interchangeable, but at the moment of execution they can all refer to the same thread.

I now see that I missed call prepare when executing the handshake by handshakee, fixing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v2]

Robbin Ehn-2
In reply to this post by Robbin Ehn-2
> Handshakes may contain oop Handles from the requesting thread.
> A handshake may be executed by another thread than the requester.
> We must make sure any such Handle is safe to use by other threads before executing the handshake.
> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>
> Passes t1-5

Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - prepare() on self execution and a rename
 - Merge branch 'master' into 8261391-sws-handshaker
 - SWS start processing on handshaker

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2571/files
  - new: https://git.openjdk.java.net/jdk/pull/2571/files/83382d2d..7245e442

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2571&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2571&range=00-01

  Stats: 5728 lines in 102 files changed: 5090 ins; 407 del; 231 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2571.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2571/head:pull/2571

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Tue, 16 Feb 2021 18:23:46 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - prepare() on self execution and a rename
>>  - Merge branch 'master' into 8261391-sws-handshaker
>>  - SWS start processing on handshaker
>
> No code/technical issues with this fix. Just possible naming changes.

Update passed t1-3 and locally run the handshake stress tests with ZGC stress options (-XX:+UseZGC -XX:ZCollectionInterval=0.01 -XX:ZFragmentationLimit=0).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v2]

Daniel D.Daugherty
In reply to this post by Robbin Ehn-2
On Wed, 17 Feb 2021 13:34:16 GMT, Robbin Ehn <[hidden email]> wrote:

>> Handshakes may contain oop Handles from the requesting thread.
>> A handshake may be executed by another thread than the requester.
>> We must make sure any such Handle is safe to use by other threads before executing the handshake.
>> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>>
>> Passes t1-5
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - prepare() on self execution and a rename
>  - Merge branch 'master' into 8261391-sws-handshaker
>  - SWS start processing on handshaker

Looks good.

src/hotspot/share/runtime/handshake.cpp line 289:

> 287:   }
> 288:   if (current_target != executing_thread) {
> 289:     // Only when the target is not executing the handshake it self.

nit typo: s/it self/itself/

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v2]

Robbin Ehn-2
On Wed, 17 Feb 2021 17:17:50 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>>
>>  - prepare() on self execution and a rename
>>  - Merge branch 'master' into 8261391-sws-handshaker
>>  - SWS start processing on handshaker
>
> src/hotspot/share/runtime/handshake.cpp line 289:
>
>> 287:   }
>> 288:   if (current_target != executing_thread) {
>> 289:     // Only when the target is not executing the handshake it self.
>
> nit typo: s/it self/itself/

Thanks, fixing!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

Robbin Ehn-2
In reply to this post by Robbin Ehn-2
> Handshakes may contain oop Handles from the requesting thread.
> A handshake may be executed by another thread than the requester.
> We must make sure any such Handle is safe to use by other threads before executing the handshake.
> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>
> Passes t1-5

Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:

  Tiny spelling error

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2571/files
  - new: https://git.openjdk.java.net/jdk/pull/2571/files/7245e442..69330c1b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2571&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2571&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2571.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2571/head:pull/2571

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

Erik Österlund-3
On Wed, 17 Feb 2021 19:47:36 GMT, Robbin Ehn <[hidden email]> wrote:

>> Handshakes may contain oop Handles from the requesting thread.
>> A handshake may be executed by another thread than the requester.
>> We must make sure any such Handle is safe to use by other threads before executing the handshake.
>> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>>
>> Passes t1-5
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Tiny spelling error

Marked as reviewed by eosterlund (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes-2
In reply to this post by Robbin Ehn-2
On Wed, 17 Feb 2021 19:50:00 GMT, Robbin Ehn <[hidden email]> wrote:

>> Handshakes may contain oop Handles from the requesting thread.
>> A handshake may be executed by another thread than the requester.
>> We must make sure any such Handle is safe to use by other threads before executing the handshake.
>> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>>
>> Passes t1-5
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Tiny spelling error

Hi Robbin,

I think there is a bug in your fix wrt. async handshakes - see below.

More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.

Perhaps @fisk can comment on the general problem.

Thanks,
David

src/hotspot/share/runtime/handshake.cpp line 84:

> 82:  public:
> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
> 84:     : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}

Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?

src/hotspot/share/runtime/handshake.cpp line 328:

> 326:
> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());

It is unfortunate that we need to manifest the current thread here (and elsewhere).

-------------

Changes requested by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

Robbin Ehn-2
On Thu, 18 Feb 2021 02:10:56 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Tiny spelling error
>
> src/hotspot/share/runtime/handshake.cpp line 84:
>
>> 82:  public:
>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
>> 84:     : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
>
> Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?

The asynchronous handshake may never be executed, thus having an infinity life-time. (thread never leaves native/blocked)
Referring to a the requesting thread means we must keep it alive forever also.
If you refer to stack-based resources from a thread, means that thread should not touch it's stack before you are done.
If the requester can't use it stack until you are done, you essentially have a synchronous handshake.

The only way we have to keep the requesting thread alive is to put in a ThreadsList, which means we would keep all threads alive potentially forever (until end of VM-life). But still a Handle would be unsafe because it's stack-based.

Therefore it's hard-coded to NULL. (are we suppose to use nullptr now ?)

> src/hotspot/share/runtime/handshake.cpp line 328:
>
>> 326:
>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>
> It is unfortunate that we need to manifest the current thread here (and elsewhere).

Yes, I did that since I did not want to change the API.
I was thinking of:
Handshake::execute(HandshakeClosure* hs_cl, Thread* requester = NULL)
And then go over calls site and see if we have the current thread in a variable.
But I felt that would be better off in another change, if wanted.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes
On 18/02/2021 5:39 pm, Robbin Ehn wrote:

> On Thu, 18 Feb 2021 02:10:56 GMT, David Holmes <[hidden email]> wrote:
>
>>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>>    Tiny spelling error
>>
>> src/hotspot/share/runtime/handshake.cpp line 84:
>>
>>> 82:  public:
>>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
>>> 84:     : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
>>
>> Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?
>
> The asynchronous handshake may never be executed, thus having an infinity life-time. (thread never leaves native/blocked)
> Referring to a the requesting thread means we must keep it alive forever also.
> If you refer to stack-based resources from a thread, means that thread should not touch it's stack before you are done.
> If the requester can't use it stack until you are done, you essentially have a synchronous handshake.
> The only way we have to keep the requesting thread alive is to put in a ThreadsList, which means we would keep all threads alive potentially forever (until end of VM-life). But still a Handle would be unsafe because it's stack-based.

I've been confusing Handles with Resources - not realising a Handle is
stack-local and so shouldn't really be shared with another thread. So
are we solving the wrong problem here? Rather than making it "safe" for
one thread to access another's Handle, should we not be using a
different mechanism to pass the oop between threads?

> Therefore it's hard-coded to NULL. (are we suppose to use nullptr now ?)

Yes nullptr (when we remember :) )

Thanks,
David

>> src/hotspot/share/runtime/handshake.cpp line 328:
>>
>>> 326:
>>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>>
>> It is unfortunate that we need to manifest the current thread here (and elsewhere).
>
> Yes, I did that since I did not want to change the API.
> I was thinking of:
> Handshake::execute(HandshakeClosure* hs_cl, Thread* requester = NULL)
> And then go over calls site and see if we have the current thread in a variable.
> But I felt that would be better off in another change, if wanted.
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/2571
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes
In reply to this post by David Holmes-2
Correction ...

On 18/02/2021 12:22 pm, David Holmes wrote:
>
> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.

I was confusing Handles with Resources in terms of lifecycle - not
realising Handles are stack-local to a Thread and can only be shared
with another thread very, very carefully.

David

> Perhaps @fisk can comment on the general problem.
>
> Thanks,
> David
>
> src/hotspot/share/runtime/handshake.cpp line 84:
>
>> 82:  public:
>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
>> 84:     : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
>
> Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?
>
> src/hotspot/share/runtime/handshake.cpp line 328:
>
>> 326:
>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>
> It is unfortunate that we need to manifest the current thread here (and elsewhere).
>
> -------------
>
> Changes requested by dholmes (Reviewer).
>
> PR: https://git.openjdk.java.net/jdk/pull/2571
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

Robbin Ehn-2
In reply to this post by David Holmes-2
On Thu, 18 Feb 2021 02:19:58 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Tiny spelling error
>
> Hi Robbin,
>
> I think there is a bug in your fix wrt. async handshakes - see below.
>
> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.
>
> Perhaps @fisk can comment on the general problem.
>
> Thanks,
> David

> _Mailing list message from [David Holmes](mailto:[hidden email]) on [hotspot-runtime-dev](mailto:[hidden email]):_
>
> Correction ...
>
> On 18/02/2021 12:22 pm, David Holmes wrote:
>
> > More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.
>
> I was confusing Handles with Resources in terms of lifecycle - not
> realising Handles are stack-local to a Thread and can only be shared
> with another thread very, very carefully.

Yes, but I do not intended to address if we should disallow passing Handles or if Handles should handle this automatically.
IMHO the right thing would be a 'GlobalHandle' (with oop backing in oopStorage) which you can give owner ship of to another thread.

Are you ok with the change?

>
> David

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes
On 19/02/2021 5:41 pm, Robbin Ehn wrote:

> On Thu, 18 Feb 2021 02:19:58 GMT, David Holmes <[hidden email]> wrote:
>
>>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>>    Tiny spelling error
>>
>> Hi Robbin,
>>
>> I think there is a bug in your fix wrt. async handshakes - see below.
>>
>> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.
>>
>> Perhaps @fisk can comment on the general problem.
>>
>> Thanks,
>> David
>
>> _Mailing list message from [David Holmes](mailto:[hidden email]) on [hotspot-runtime-dev](mailto:[hidden email]):_
>>
>> Correction ...
>>
>> On 18/02/2021 12:22 pm, David Holmes wrote:
>>
>>> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.
>>
>> I was confusing Handles with Resources in terms of lifecycle - not
>> realising Handles are stack-local to a Thread and can only be shared
>> with another thread very, very carefully.
>
> Yes, but I do not intended to address if we should disallow passing Handles or if Handles should handle this automatically.
> IMHO the right thing would be a 'GlobalHandle' (with oop backing in oopStorage) which you can give owner ship of to another thread.
>
> Are you ok with the change?

Not really. I think it solves the wrong problem. But I won't block it.

I'll file a RFE to look at the broader problem.

Cheers,
David
-----


>>
>> David
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/2571
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

Erik Österlund-2
In reply to this post by David Holmes
Hi David,

As for the general problem of accessing a private handle from another thread, my thinking was that this could only possibly be safe if a thread ”grabs on to” a remote thread so it’s not concurrently running. Otherwise doing so would never be safe with or without the new concurrent root processing code.
So the reasoning was that if we make it safe for safepoint operations to access any thread’s private oops, and for handshakes to access the handshakee’s private oops, then that should cover that kind of interactions.

However, I did think about the possibility of the handshaker’s private oops being accessed by the handshakee, and thought like you that you just shouldn’t do that. You should use global handles for such shared interactions. But ever since then I have now and then thought ”what if I missed a handle”. And this bug proves that I did indeed miss a handle.

Therefore, while I agree that passing a private handle to be used by another thread is something we shouldn’t do, I would sleep better at night knowing that it is safe to do so anyway, should it happen, in case there is another strange interaction that I missed, or some new code comes in that thinks it’s safe. This is the more robust way of dealing with it.

Thanks,
/Erik

> On 18 Feb 2021, at 13:48, David Holmes <[hidden email]> wrote:
>
> Correction ...
>
>> On 18/02/2021 12:22 pm, David Holmes wrote:
>> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.
>
> I was confusing Handles with Resources in terms of lifecycle - not realising Handles are stack-local to a Thread and can only be shared with another thread very, very carefully.
>
> David
>
>> Perhaps @fisk can comment on the general problem.
>> Thanks,
>> David
>> src/hotspot/share/runtime/handshake.cpp line 84:
>>> 82:  public:
>>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
>>> 84:     : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
>> Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?
>> src/hotspot/share/runtime/handshake.cpp line 328:
>>> 326:
>>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>> It is unfortunate that we need to manifest the current thread here (and elsewhere).
>> -------------
>> Changes requested by dholmes (Reviewer).
>> PR: https://git.openjdk.java.net/jdk/pull/2571
12