RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

classic Classic list List threaded Threaded
26 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes
Hi Erik,

On 19/02/2021 6:16 pm, Erik Osterlund wrote:
> Hi David,
>
> As for the general problem of accessing a private handle from another thread, my thinking was that this could only possibly be safe if a thread ”grabs on to” a remote thread so it’s not concurrently running. Otherwise doing so would never be safe with or without the new concurrent root processing code.

It is safe whenever the two threads use a protocol that ensures safety.
So bundling a local handle into a synchronous VM-op or synchronous
handshake op is safe, because the originator won't move on until the op
has been completed. But any kind of async op would be dangerous - as
would trying to "return" a Handle via the op.

> So the reasoning was that if we make it safe for safepoint operations to access any thread’s private oops, and for handshakes to access the handshakee’s private oops, then that should cover that kind of interactions.
> However, I did think about the possibility of the handshaker’s private oops being accessed by the handshakee, and thought like you that you just shouldn’t do that. You should use global handles for such shared interactions. But ever since then I have now and then thought ”what if I missed a handle”. And this bug proves that I did indeed miss a handle.

The current case seems to me to be what you would normally encounter.
It is the handshaker that creates the handle and the op and then
potentially has the handshakee execute the op - hence the it has to be
safe for the handshakee to access the handshaker's handles. The reverse
is not expected because, AFAICS, the handshakee never creates the op so
it would only expose local handles if it "returned" them via the op -
and that is inherently unsafe!

>
> Therefore, while I agree that passing a private handle to be used by another thread is something we shouldn’t do, I would sleep better at night knowing that it is safe to do so anyway, should it happen, in case there is another strange interaction that I missed, or some new code comes in that thinks it’s safe. This is the more robust way of dealing with it.

On reflection I can see there are situations where we can construct a
safe exchange of data using handles, so yes this must work. But I don't
like the fact the code performing the exchange has to be aware that
there is some special call that needs to be done to in fact make it safe
though (beyond the protocol) - that is error prone. The Handle logic
should encapsulate this if possible. And ideally we would detect when
such a cross-thread access in in fact unsafe.

So notwithstanding I think there is more to be looked at here, the
current fix is a necessary and desirable one, and I withdraw my
(non-binding) objections.

Thanks,
David
-----

> Thanks,
> /Erik
>
>> On 18 Feb 2021, at 13:48, David Holmes <[hidden email]> wrote:
>>
>> Correction ...
>>
>>> On 18/02/2021 12:22 pm, David Holmes wrote:
>>> More generally I am concerned by the fact that we have a potential bug if a Handle created by one thread is accessed by another, without first calling start_processing() on the creator of the Handle. This would seem to be error prone to me unless it is actually hidden away inside the Handle implementation - though how to do that efficiently is unclear. I'm very concerned that it is too easy to write code (eg. handing something off to the  service thread or notification thread) that will be buggy.
>>
>> I was confusing Handles with Resources in terms of lifecycle - not realising Handles are stack-local to a Thread and can only be shared with another thread very, very carefully.
>>
>> David
>>
>>> Perhaps @fisk can comment on the general problem.
>>> Thanks,
>>> David
>>> src/hotspot/share/runtime/handshake.cpp line 84:
>>>> 82:  public:
>>>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl, JavaThread* target, jlong start_ns)
>>>> 84:     : HandshakeOperation(cl, target, NULL), _start_time_ns(start_ns) {}
>>> Not at all clear why requester is always NULL here. There is obviously a requester thread even in the async case. If that thread created a Handle in the op then don't we need to call start_processing on it?
>>> src/hotspot/share/runtime/handshake.cpp line 328:
>>>> 326:
>>>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>>>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>>> It is unfortunate that we need to manifest the current thread here (and elsewhere).
>>> -------------
>>> Changes requested by dholmes (Reviewer).
>>> PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes-2
In reply to this post by Robbin Ehn-2
On Wed, 17 Feb 2021 19:50:00 GMT, Robbin Ehn <[hidden email]> wrote:

>> Handshakes may contain oop Handles from the requesting thread.
>> A handshake may be executed by another thread than the requester.
>> We must make sure any such Handle is safe to use by other threads before executing the handshake.
>> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>>
>> Passes t1-5
>
> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>
>   Tiny spelling error

Marked as reviewed by dholmes (Reviewer).

src/hotspot/share/runtime/handshake.cpp line 291:

> 289:     // Only when the target is not executing the handshake itself.
> 290:     StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
> 291:   }

As per my comment to Erik, this situation seems like a programming error to me as there you should not be a way for the target to create a Handle that the executing thread then gets a hold off - data should only flow one way from the requesting thread to the executing thread.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

Robbin Ehn-2
On Mon, 22 Feb 2021 05:14:39 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Tiny spelling error
>
> src/hotspot/share/runtime/handshake.cpp line 291:
>
>> 289:     // Only when the target is not executing the handshake itself.
>> 290:     StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
>> 291:   }
>
> As per my comment to Erik, this situation seems like a programming error to me as there you should not be a way for the target to create a Handle that the executing thread then gets a hold off - data should only flow one way from the requesting thread to the executing thread.

StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc); is for thread local resources which may contain oops. A handshake may read from e.g. the target threads stack, so before the executing thread reads from the stack, we make sure it is safe.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes
On 22/02/2021 5:38 pm, Robbin Ehn wrote:

> On Mon, 22 Feb 2021 05:14:39 GMT, David Holmes <[hidden email]> wrote:
>
>>> Robbin Ehn has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>>    Tiny spelling error
>>
>> src/hotspot/share/runtime/handshake.cpp line 291:
>>
>>> 289:     // Only when the target is not executing the handshake itself.
>>> 290:     StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc);
>>> 291:   }
>>
>> As per my comment to Erik, this situation seems like a programming error to me as there you should not be a way for the target to create a Handle that the executing thread then gets a hold off - data should only flow one way from the requesting thread to the executing thread.
>
> StackWatermarkSet::start_processing(current_target, StackWatermarkKind::gc); is for thread local resources which may contain oops. A handshake may read from e.g. the target threads stack, so before the executing thread reads from the stack, we make sure it is safe.

Ah I see - not just to make Handles safe (though they are themselves on
the stack). Thanks for clarifying.

David

> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/2571
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread [v3]

David Holmes
In reply to this post by David Holmes
Correction ...

On 22/02/2021 3:11 pm, David Holmes wrote:

> Hi Erik,
>
> On 19/02/2021 6:16 pm, Erik Osterlund wrote:
>> Hi David,
>>
>> As for the general problem of accessing a private handle from another
>> thread, my thinking was that this could only possibly be safe if a
>> thread ”grabs on to” a remote thread so it’s not concurrently running.
>> Otherwise doing so would never be safe with or without the new
>> concurrent root processing code.
>
> It is safe whenever the two threads use a protocol that ensures safety.
> So bundling a local handle into a synchronous VM-op or synchronous
> handshake op is safe, because the originator won't move on until the op
> has been completed. But any kind of async op would be dangerous - as
> would trying to "return" a Handle via the op.
>
>> So the reasoning was that if we make it safe for safepoint operations
>> to access any thread’s private oops, and for handshakes to access the
>> handshakee’s private oops, then that should cover that kind of
>> interactions.
>> However, I did think about the possibility of the handshaker’s private
>> oops being accessed by the handshakee, and thought like you that you
>> just shouldn’t do that. You should use global handles for such shared
>> interactions. But ever since then I have now and then thought ”what if
>> I missed a handle”. And this bug proves that I did indeed miss a handle.
>
> The current case seems to me to be what you would normally encounter.
> It is the handshaker that creates the handle and the op and then
> potentially has the handshakee execute the op - hence the it has to be
> safe for the handshakee to access the handshaker's handles. The reverse
> is not expected because, AFAICS, the handshakee never creates the op so
> it would only expose local handles if it "returned" them via the op -
> and that is inherently unsafe!

Robbin pointed out that the more general issue of safety here involves
e.g walking the stack of the handshakee which may contain oops; it isn't
to per-se make Handles created by the handshakee appear safe.

David
-----

>>
>> Therefore, while I agree that passing a private handle to be used by
>> another thread is something we shouldn’t do, I would sleep better at
>> night knowing that it is safe to do so anyway, should it happen, in
>> case there is another strange interaction that I missed, or some new
>> code comes in that thinks it’s safe. This is the more robust way of
>> dealing with it.
>
> On reflection I can see there are situations where we can construct a
> safe exchange of data using handles, so yes this must work. But I don't
> like the fact the code performing the exchange has to be aware that
> there is some special call that needs to be done to in fact make it safe
> though (beyond the protocol) - that is error prone. The Handle logic
> should encapsulate this if possible. And ideally we would detect when
> such a cross-thread access in in fact unsafe.
>
> So notwithstanding I think there is more to be looked at here, the
> current fix is a necessary and desirable one, and I withdraw my
> (non-binding) objections.
>
> Thanks,
> David
> -----
>
>> Thanks,
>> /Erik
>>
>>> On 18 Feb 2021, at 13:48, David Holmes <[hidden email]> wrote:
>>>
>>> Correction ...
>>>
>>>> On 18/02/2021 12:22 pm, David Holmes wrote:
>>>> More generally I am concerned by the fact that we have a potential
>>>> bug if a Handle created by one thread is accessed by another,
>>>> without first calling start_processing() on the creator of the
>>>> Handle. This would seem to be error prone to me unless it is
>>>> actually hidden away inside the Handle implementation - though how
>>>> to do that efficiently is unclear. I'm very concerned that it is too
>>>> easy to write code (eg. handing something off to the  service thread
>>>> or notification thread) that will be buggy.
>>>
>>> I was confusing Handles with Resources in terms of lifecycle - not
>>> realising Handles are stack-local to a Thread and can only be shared
>>> with another thread very, very carefully.
>>>
>>> David
>>>
>>>> Perhaps @fisk can comment on the general problem.
>>>> Thanks,
>>>> David
>>>> src/hotspot/share/runtime/handshake.cpp line 84:
>>>>> 82:  public:
>>>>> 83:   AsyncHandshakeOperation(AsyncHandshakeClosure* cl,
>>>>> JavaThread* target, jlong start_ns)
>>>>> 84:     : HandshakeOperation(cl, target, NULL),
>>>>> _start_time_ns(start_ns) {}
>>>> Not at all clear why requester is always NULL here. There is
>>>> obviously a requester thread even in the async case. If that thread
>>>> created a Handle in the op then don't we need to call
>>>> start_processing on it?
>>>> src/hotspot/share/runtime/handshake.cpp line 328:
>>>>> 326:
>>>>> 327: void Handshake::execute(HandshakeClosure* hs_cl) {
>>>>> 328:   HandshakeOperation cto(hs_cl, NULL, Thread::current());
>>>> It is unfortunate that we need to manifest the current thread here
>>>> (and elsewhere).
>>>> -------------
>>>> Changes requested by dholmes (Reviewer).
>>>> PR: https://git.openjdk.java.net/jdk/pull/2571
Reply | Threaded
Open this post in threaded view
|

Integrated: 8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Robbin Ehn-2
In reply to this post by Robbin Ehn-2
On Mon, 15 Feb 2021 10:23:24 GMT, Robbin Ehn <[hidden email]> wrote:

> Handshakes may contain oop Handles from the requesting thread.
> A handshake may be executed by another thread than the requester.
> We must make sure any such Handle is safe to use by other threads before executing the handshake.
> This change-set adds SWS::start_process() for the requesting thread also (we already do this for the target JavaThread).
>
> Passes t1-5

This pull request has now been integrated.

Changeset: d7eebdac
Author:    Robbin Ehn <[hidden email]>
URL:       https://git.openjdk.java.net/jdk/commit/d7eebdac
Stats:     31 lines in 1 file changed: 21 ins; 2 del; 8 mod

8261391: ZGC crash - SEGV in RevokeOneBias::do_thread

Reviewed-by: eosterlund, dcubed, dholmes

-------------

PR: https://git.openjdk.java.net/jdk/pull/2571
12