Re: RFR: 8257831: Suspend with handshakes [v2]

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Daniel D.Daugherty
On Fri, 26 Mar 2021 13:21:43 GMT, Robbin Ehn <[hidden email]> wrote:

>> A suspend request is done by handshaking thread target thread(s). When executing the handshake operation we know the target mutator thread is in a dormant state (as in safepoint safe state). We have a guarantee that it will check it's poll before leaving the dormant state. To stop the thread from leaving the the dormant state we install a second asynchronous handshake to be executed by the targeted thread. The asynchronous handshake will wait on a monitor while the thread is suspended. The target thread cannot not leave the dormant state without a resume request.
>>
>> Per thread suspend requests are naturally serialized by the per thread HandshakeState lock (we can only execute one handshake at a time per thread).
>> Instead of having a separate lock we use this to our advantage and use HandshakeState lock for serializing access to the suspend flag and for wait/notify.
>>
>> Suspend:
>> Requesting thread -> synchronous handshake -> target thread
>> Inside synchronus handshake (HandshakeState lock is locked while
>> executing any handshake):
>> - Set suspended flag
>> - Install asynchronous handshake
>>
>> Target thread -> tries to leave dormant state -> Executes handshakes
>> Target only executes asynchronous handshake:
>> - While suspended
>> - Go to blocked
>> - Wait on HandshakeState lock
>>
>> Resume:
>> Resuming thread:
>> - Lock HandshakeState lock
>> - Clear suspended flag
>> - Notify HandshakeState lock
>> - Unlock HandshakeState lock
>>
>> The "suspend requested" flag is an optimization, without it a dormant thread could be suspended and resumed many times and each would add a new asynchronous handshake. Suspend requested flag means there is already an asynchronous suspend handshake in queue which can be re-used, only the suspend flag needs to be set.
>>
>> ----
>> Some code can be simplified or done in a smarter way but I refrained from doing such changes instead tried to keep existing code as is as far as possible. This concerns especially raw monitors.
>>
>> ----
>> Regarding the changed test, the documentation says:
>> "If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it."
>>
>> But the code:
>>   LOG("suspendTestedThreads: before JVMTI SuspendThreadList");
>>   err = jvmti->SuspendThreadList(threads_count, threads, results);
>>   ...
>>   // Allow the Main thread to inspect the result of tested threads suspension
>>   agent_unlock(jni);
>>  
>> The thread will never return from SuspendThreadList until resumed, so it cannot unlock with agent_unlock().
>> Thus main thread is stuck forever on:
>>   // Block until the suspender thread competes the tested threads suspension
>>   agent_lock(jni);
>>
>> And never checks and resumes the threads. So I removed that lock instead just sleep and check until all thread have the expected suspended state.
>>
>> ----
>>
>> This version already contains updates after pre-review comments from @dcubed-ojdk, @pchilano, @coleenp.
>> (Pre-review comments here:
>> https://github.com/openjdk/jdk/pull/2625)
>>
>> ----
>> Testing t1-t8, nsk_jdi/nsk_jvmti/jdk_jdi/tck, KS24, RunThese and
>> combinations like running with -XX:ZCollectionInterval=0.01 -
>> XX:ZFragmentationLimit=0.
>> Running above some of above concurrently (load ~240), slow debug,
>> etc...
>
> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>
>  - Merge branch 'master' into SuspendInHandshake
>  - 8257831: Suspend with handshake (review baseline)

This is an elegant evolution of the suspend/resume mechanism.

It is so nice to see all the suspend-equivalent stuff go away!

src/hotspot/os/posix/signals_posix.cpp line 1587:

> 1585:   // destructor has completed.
> 1586:
> 1587:   if (thread->is_Java_thread() && thread->as_Java_thread()->is_terminated()) {

We need @dholmes-ora to verify that this version of the code will
still solve the bug he was fixing when he added old L1610.

src/hotspot/share/runtime/handshake.hpp line 37:

> 35: class JavaThread;
> 36: class ThreadSuspensionHandshake;
> 37: class SuspendThreadHandshake;

Should these be in alpha sort order?

src/hotspot/share/runtime/handshake.hpp line 147:

> 145:   bool handshake_suspend();
> 146:   // Called from the async handshake (the trap)
> 147:   // to stop a thread from continuing executing when suspended.

nit typo: s/continuing executing/continuing execution/

src/hotspot/share/runtime/objectMonitor.cpp line 422:

> 420:         _recursions = 0;
> 421:         _succ = NULL;
> 422:         exit(false, current);

You'll have a conflict with @pchilano recent fix.

-------------

Marked as reviewed by dcubed (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Daniel D.Daugherty
On Wed, 31 Mar 2021 03:47:28 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>>  - Merge branch 'master' into SuspendInHandshake
>>  - 8257831: Suspend with handshake (review baseline)
>
> src/hotspot/share/prims/jvmtiEnv.cpp line 946:
>
>> 944: // java_thread - pre-checked
>> 945: jvmtiError
>> 946: JvmtiEnv::SuspendThread(JavaThread* java_thread) {
>
> The comment above this still states:
>
> // Threads_lock NOT held, java_thread not protected by lock
>
> but the java_thread is protected by a TLH so we should document that so we know it is always safe to refer to java_thread below.

These `Threads_lock NOT held ...` comments in JVM/TI are a left over
issue from the Thread-SMR project and I think I forgot to file a follow-up
bug to clean those up.

I recommend handling that in a general cleanup issue for all of these
comments in JVM/TI (and possibly elsewhere). Please let me know if I
should go ahead and file that bug.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Tue, 6 Apr 2021 11:49:42 GMT, Richard Reingruber <[hidden email]> wrote:

>>>
>>>
>>> We don't want to always to take a lock (handshake state lock), if the poll is disarmed there cannot be any suspend request.
>>> We only need to unlock the OM for suspends that happened before we grabbed the OM lock in EnterI().
>>>
>>> When we leave EnterI we are still blocked, this means a thread might be processing the suspend handshake, which was emitted before we did EnterI().
>>> To synchronize the suspend check with such a thread we need to grab the HandshakeState lock, e.g. we cannot racy check the suspended flag.
>>
>> I don't think that checking the suspended flag without holding the handshake
>> state lock would be too racy. Holding the OM is sufficient synchronization for
>> checking the suspended flag here.
>>
>> If the suspender thread S did the suspend request while holding the OM then the
>> current thread T will see that it was suspended when it has entered the OM.
>> If S did the suspend without holding OM, only then the check would be racy but that would be ok.
>>
>>> I choosed to check for an async suspension handshake that needs to be processed. (We can have such without being suspended). We could ignored that async handshake by just checking is_suspended, it would be processed in the else statement instead with "SafepointMechanism::process_if_requested(current);" instead.
>>>
>>> But we avoid releasing the OM lock in cases where we already have been resumed.
>>>
>>> So I can change to is_suspended inside the the method, but we still must do it under the HandshakeState lock.
>
>>
>>
>> To clarify, is_suspended() can only be checked when the JavaThread is _unsafe_, _after_ you have transitioned from the safe state.
>
> For a minute I misunderstood this and thought you meant this as a general rule
> for calling `is_suspended()` while there are examples where it is called without
> any synchronization (e.g. in `JvmtiEnv::GetThreadState()`) which can be ok.
>
>> In this case we are checking if we are suspended before we have completed the transition because we need to know if we should drop the OM lock.
>
> I think it is sufficient to hold the OM when calling `is_suspended` to decide if OM has to be dropped.

I do not follow, the OM is unrelated.
The suspender do not hold the OM.

What happens is:
- Thread A wait on OM X in blocked.
- Thread Z suspends Thread A, thread Z have no relation to OM X.
- Thread B unlocks OM X, thread B have no relation to the suspend request.
- Thread A locks OM X while blocked.
- Thread A was not allowed to lock OM X due to it is actually suspended.
- Thread A unlocks OM X and waits until resumed.

So while A and B fight over OM X, A is trying to figure out if Z suspended him while grabbing the lock.

The suspend flag and 'async handshake' flag are protected by the HandshakeState lock. (this is not lockless protocol)
The stores in SuspendThreadHandshake are not ordered, so they may happen in any order.
To read the stores there you must hold the HandshakeState lock.
Therefore if we want the exact answer to am A suspended we must read the flag under the lock.

Makes sense?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Wed, 31 Mar 2021 14:01:19 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> src/hotspot/share/prims/jvmtiEnv.cpp line 946:
>>
>>> 944: // java_thread - pre-checked
>>> 945: jvmtiError
>>> 946: JvmtiEnv::SuspendThread(JavaThread* java_thread) {
>>
>> The comment above this still states:
>>
>> // Threads_lock NOT held, java_thread not protected by lock
>>
>> but the java_thread is protected by a TLH so we should document that so we know it is always safe to refer to java_thread below.
>
> These `Threads_lock NOT held ...` comments in JVM/TI are a left over
> issue from the Thread-SMR project and I think I forgot to file a follow-up
> bug to clean those up.
>
> I recommend handling that in a general cleanup issue for all of these
> comments in JVM/TI (and possibly elsewhere). Please let me know if I
> should go ahead and file that bug.

Yes, please file that issue.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Wed, 31 Mar 2021 03:51:02 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>>  - Merge branch 'master' into SuspendInHandshake
>>  - 8257831: Suspend with handshake (review baseline)
>
> src/hotspot/share/prims/jvmtiImpl.cpp line 767:
>
>> 765: //
>> 766:
>> 767: bool JvmtiSuspendControl::suspend(JavaThread *java_thread) {
>
> Future RFE: JvmtiSuspendControl is no longer needed

Yes

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

David Holmes-2
In reply to this post by Daniel D.Daugherty
On Tue, 6 Apr 2021 18:47:58 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> I think we should add JVMTI_ERROR_THREAD_SUSPENDED as @reinrich says, it is possible for someone to sneak in a second suspend request before us.
>>
>> @dcubed-ojdk it seem like we could be posting JvmtiExport::post_monitor_contended_enter() from the ensure_join() which locks the threadObj.
>>
>> So it might be best to treat this the same way as the others?
>
> By "treat this the same way as the others", you mean check and return either
> JVMTI_ERROR_THREAD_NOT_ALIVE or JVMTI_ERROR_THREAD_SUSPENDED as
> appropriate when we get a false back from JvmtiSuspendControl::suspend(current)?
>
> I'm not sure what this question is about:
>
>> it seem like we could be posting JvmtiExport::post_monitor_contended_enter() from the ensure_join() which locks the threadObj.

I'm also unclear what Robbin is referring to. I go back to my original comment that surely JVMTI_ERROR_THREAD_NOT_ALIVE is impossible here?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Wed, 31 Mar 2021 05:38:14 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>>  - Merge branch 'master' into SuspendInHandshake
>>  - 8257831: Suspend with handshake (review baseline)
>
> src/hotspot/share/runtime/handshake.cpp line 633:
>
>> 631: bool HandshakeState::suspend_request_pending() {
>> 632:   assert(JavaThread::current()->thread_state() != _thread_blocked, "should not be in a blocked state");
>> 633:   assert(JavaThread::current()->thread_state() != _thread_in_native, "should not be in native");
>
> Not clear why we care about the state of the current thread here ??

I missed one assert, this must be asked on yourself.
If you ask while in blocked/native the answer may change at any moment, since processing of handshake is ok.
To get a stable result you must be in another state.

Note @pchilano think we should remove this method, so either I add assert and comment or remove it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Wed, 31 Mar 2021 05:57:30 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>>  - Merge branch 'master' into SuspendInHandshake
>>  - 8257831: Suspend with handshake (review baseline)
>
> src/hotspot/share/runtime/nonJavaThread.cpp line 326:
>
>> 324:
>> 325:   while (watcher_thread() != NULL) {
>> 326:     // This wait should make safepoint checks, wait without a timeout.
>
> Nit: s/checks, wait/checks and wait/

Fixed

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Wed, 31 Mar 2021 05:44:35 GMT, David Holmes <[hidden email]> wrote:

>> Robbin Ehn has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains two commits:
>>
>>  - Merge branch 'master' into SuspendInHandshake
>>  - 8257831: Suspend with handshake (review baseline)
>
> src/hotspot/share/runtime/handshake.cpp line 677:
>
>> 675:     } else {
>> 676:       // Target is going to wake up and leave suspension.
>> 677:       // Let's just stop the thread from doing that.
>
> IIUC this would be the case where the target was hit with a suspend request but has not yet processed the actual suspension handshake op, then a resume comes in so suspended is no longer true, and then another suspend request is made (this one) which simply turns the suspended flag back on - is that right?
> Overall I'm finding it very hard to see what the two suspend state flags actually signify. I'd like to see that written up somewhere.

Sure

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8257831: Suspend with handshakes [v2]

Patricio Chilano Mateo
In reply to this post by Daniel D.Daugherty
On Wed, 7 Apr 2021 14:33:00 GMT, Robbin Ehn <[hidden email]> wrote:

>> test/hotspot/jtreg/serviceability/jvmti/SuspendWithCurrentThread/SuspendWithCurrentThread.java line 132:
>>
>>> 130:             ThreadToSuspend.setAllThreadsReady();
>>> 131:
>>> 132:             while (!checkSuspendedStatus()) {
>>
>> The changes to this test are still quite unclear to me. Too hard to figure out what the test was originally trying to do!
>
> The test worked because we didn't check for suspend when leaving a safepoint request back to native.
> @pchilano have more info on why the test even works today.

Right, the reason the test was working so far but not anymore without this changes is because before we didn't check for suspension on ~ThreadBlockInVM() after leaving a safepoint safe state. The interaction is kind of subtle:
In the test the main thread creates 10 working threads and sets one of those to have the role of 'Suspender'. The 'Suspender' acquires agent_monitor(a jvmtirawmonitor), suspends all working threads including itself by calling jvmti->SuspendThreadList() and then it releases agent_monitor. In the mean time the main thread just blocks on agent_monitor until 'Suspender' releases it to check that everybody was indeed suspended. Now, at first glance this looks like(at least for me) the test should deadlock because 'Suspender' should never return from SuspendThreadList() to release agent_monitor since he was suspended too. The reason 'Suspender' returns from SuspendThreadList() is because he never actually checks for suspension after the safepoint operation (VM_ThreadsSuspendJVMTI) finished. 'Suspender' waits on the VMOperation_lock with as_suspend_equivalent=false (default) so it transitions back to _thread_in_vm without suspending. And then when going back to native in ~ThreadInVMfromNati
 ve() we also don't check for suspends (we now check it here too actually).
The JVMTI specs says this about SuspendThreadList:

"If the calling thread is specified in the request_list array, this function will not return until some other thread resumes it."

So this patch actually fixes that.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3191