RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
Hi all,

Starting the review of the code while JEP work is still not completed.

JEP: https://bugs.openjdk.java.net/browse/JDK-8185640

This JEP introduces a way to execute a callback on threads without performing a global VM safepoint. It makes it both possible and cheap to stop individual threads and not
just all threads or none.

Entire changeset:
http://cr.openjdk.java.net/~rehn/8185640/v0/flat/

Divided into three parts:
SafepointMechanism abstraction:
http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/
Consolidating polling page allocation:
http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/
Handshakes:
http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/

A handshake operation is a callback that is executed for each JavaThread while that thread is in a safepoint-safe state. The callback is executed either by the thread
itself or by the VM thread while keeping the thread in a blocked state. The big difference between safepointing and handshaking is that the per-thread operation will be
performed on all threads as soon as possible, and each thread will continue to execute as soon as its own operation is completed. If a JavaThread is known to be running, then a
handshake can be performed with that single JavaThread as well.
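
A minimal single-threaded sketch of the handshake idea described above. It is not code from the webrev: ToyThread, handshake_all and the "vm"/"self" log entries are hypothetical names invented for illustration. A blocked thread has the operation executed on its behalf, while a running thread picks the operation up at its next poll and then simply continues.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model only; none of these names come from the HotSpot patch.
enum class State { Running, Blocked };

struct ToyThread {
  std::string name;
  State state = State::Running;
  std::function<void(ToyThread&)> pending;  // operation waiting for this thread, if any
  std::vector<std::string> log;             // records who executed each operation

  // Called by the thread itself at a poll: runs a pending operation once,
  // then the thread carries on without waiting for any other thread.
  void poll() {
    if (pending) {
      auto op = std::move(pending);
      pending = nullptr;
      op(*this);
      log.push_back("self");
    }
  }
};

// Requester side: hand the same operation to every thread.
inline void handshake_all(std::vector<ToyThread>& threads,
                          const std::function<void(ToyThread&)>& op) {
  assert(op);  // an empty operation makes no sense here
  for (ToyThread& t : threads) {
    if (t.state == State::Blocked) {
      op(t);                  // executed on the thread's behalf while it stays blocked
      t.log.push_back("vm");
    } else {
      t.pending = op;         // the running thread executes it at its next poll
    }
  }
}
```

With one running and one blocked thread, handshake_all runs the callback immediately for the blocked thread; the running thread runs it on its next poll() call and a second poll() does nothing.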

The current safepointing scheme is modified to perform an indirection through a per-thread pointer, which allows a single thread's execution to be forced to trap on the
guard page. To force a thread to yield, the VM updates the per-thread pointer for the corresponding thread to point to the guarded page.
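
The indirection can be illustrated with a small model, assuming a "page" is just an object with a readable flag and a "trap" is a counter increment. ToyPage, ToyThread, arm and disarm are hypothetical names; the real mechanism relies on memory protection and a signal handler, which this sketch does not attempt to reproduce.

```cpp
#include <atomic>
#include <cassert>

// Toy stand-ins for the two polling pages (assumption: one readable, one guarded).
struct ToyPage { bool readable; };

static const ToyPage g_disarmed_page{true};   // ordinary readable page
static const ToyPage g_guard_page{false};     // stands in for the protected page

struct ToyThread {
  // Per-thread pointer that the poll goes through.
  std::atomic<const ToyPage*> poll_page{&g_disarmed_page};
  int traps = 0;

  // The poll: load the per-thread pointer, then load through it.
  // Returns true if the access would have trapped.
  bool poll() {
    const ToyPage* page = poll_page.load(std::memory_order_acquire);
    assert(page != nullptr);
    if (!page->readable) {
      ++traps;
      return true;
    }
    return false;
  }
};

// VM side: redirect one thread's pointer; other threads are unaffected.
inline void arm(ToyThread& t) {
  t.poll_page.store(&g_guard_page, std::memory_order_release);
}

inline void disarm(ToyThread& t) {
  t.poll_page.store(&g_disarmed_page, std::memory_order_release);
}
```

Arming thread a makes only a.poll() report a trap; thread b keeps polling the readable page and is never stopped.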

Examples of potential use cases:
-Biased lock revocation
-External requests for stack traces
-Deoptimization
-Async exception delivery
-External suspension
-Eliding memory barriers

All of these will help the VM become more low-latency friendly by reducing the number of global safepoints.
For platforms that do not yet implement the per-JavaThread poll, a fallback to a normal safepoint is in place. HandshakeOneThread will then be a normal safepoint. The supported
platforms are Linux x64 and Solaris SPARC.
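
The fallback can be pictured as follows. This is a sketch under stated assumptions, not the patch itself: ToyVM and its members are hypothetical, and "stopping" a thread is modelled as flipping a flag. The point is only that the same single-thread request stops one thread where the per-thread poll exists and every thread where it does not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy VM (hypothetical names): stopped[i] is true while thread i is held.
struct ToyVM {
  bool thread_local_poll_supported;
  std::vector<bool> stopped;
  int global_safepoints;

  ToyVM(bool supported, std::size_t thread_count)
      : thread_local_poll_supported(supported),
        stopped(thread_count, false),
        global_safepoints(0) {}

  // Runs op for `target` and returns how many threads had to be stopped.
  template <typename Op>
  std::size_t handshake_one_thread(std::size_t target, Op op) {
    assert(target < stopped.size());
    std::size_t stopped_count = 0;
    if (thread_local_poll_supported) {
      stopped[target] = true;                   // only the target is held
      stopped_count = 1;
      op(target);
      stopped[target] = false;
    } else {
      ++global_safepoints;                      // fallback: a normal safepoint
      stopped.assign(stopped.size(), true);     // every thread is held
      stopped_count = stopped.size();
      op(target);
      stopped.assign(stopped.size(), false);
    }
    return stopped_count;
  }
};
```

On a four-thread toy VM the supported path stops one thread and records no global safepoint, while the fallback path stops all four and records one.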

Tested heavily with various test suites; the change comes with a few new tests.

Performance testing using standardized benchmarks shows no significant changes; the latest numbers were -0.7% on Linux x64 and +1.5% on Solaris SPARC (not statistically
confirmed). A minor regression is expected on x64 from load vs. load-load, and a slight increase on SPARC due to the cost of ‘materializing’ the page vs. load-load.
The time to trigger a safepoint was measured on a large machine and found not to be an issue. The looping over threads and arming the polling page will benefit from the work on
JavaThread life-cycle (8167108 - SMR and JavaThread Lifecycle: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html) which puts all
JavaThreads in an array instead of a linked list.

Thanks, Robbin

Re: RFR(XL): 8185640: Thread-local handshakes

Nils Eliasson
Hi Robbin,

I have reviewed the compiler parts of the patch - c1, c2, jvmci and cpu*.

Looks great!

Regards,

Nils


On 2017-10-11 15:37, Robbin Ehn wrote:

> Hi all,
>
> Starting the review of the code while JEP work is still not completed.
>
> JEP: https://bugs.openjdk.java.net/browse/JDK-8185640
>
> This JEP introduces a way to execute a callback on threads without
> performing a global VM safepoint. It makes it both possible and cheap
> to stop individual threads and not just all threads or none.
>
> Entire changeset:
> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/
>
> Divided into three parts:
> SafepointMechanism abstraction:
> http://cr.openjdk.java.net/~rehn/8185640/v0/SafepointMechanism-0/
> Consolidating polling page allocation:
> http://cr.openjdk.java.net/~rehn/8185640/v0/PollingPage-1/
> Handshakes:
> http://cr.openjdk.java.net/~rehn/8185640/v0/Handshakes-2/
>
> A handshake operation is a callback that is executed for each
> JavaThread while that thread is in a safepoint-safe state. The
> callback is executed either by the thread itself or by the VM thread
> while keeping the thread in a blocked state. The big difference
> between safepointing and handshaking is that the per-thread operation
> will be performed on all threads as soon as possible, and each thread will
> continue to execute as soon as its own operation is completed. If a
> JavaThread is known to be running, then a handshake can be performed
> with that single JavaThread as well.
>
> The current safepointing scheme is modified to perform an indirection
> through a per-thread pointer, which allows a single thread's
> execution to be forced to trap on the guard page. To force a
> thread to yield, the VM updates the per-thread pointer for the
> corresponding thread to point to the guarded page.
>
> Examples of potential use cases:
> -Biased lock revocation
> -External requests for stack traces
> -Deoptimization
> -Async exception delivery
> -External suspension
> -Eliding memory barriers
>
> All of these will help the VM become more
> low-latency friendly by reducing the number of global safepoints.
> For platforms that do not yet implement the per-JavaThread poll, a
> fallback to a normal safepoint is in place. HandshakeOneThread will then
> be a normal safepoint. The supported platforms are Linux x64 and
> Solaris SPARC.
>
> Tested heavily with various test suites; the change comes with a few new tests.
>
> Performance testing using standardized benchmarks shows no significant
> changes; the latest numbers were -0.7% on Linux x64 and +1.5% on Solaris
> SPARC (not statistically confirmed). A minor regression is expected on
> x64 from load vs. load-load, and a slight increase on SPARC due to the
> cost of ‘materializing’ the page vs. load-load.
> The time to trigger a safepoint was measured on a large machine and found not to
> be an issue. The looping over threads and arming the polling page will
> benefit from the work on JavaThread life-cycle (8167108 - SMR and
> JavaThread Lifecycle:
> http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2017-October/024773.html)
> which puts all JavaThreads in an array instead of a linked list.
>
> Thanks, Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

Erik Osterlund
In reply to this post by Robbin Ehn
Hi Robbin,

Looks fantastic.

Thanks,
/Erik


RE: RFR(XL): 8185640: Thread-local handshakes

Doerr, Martin
In reply to this post by Robbin Ehn
Hi Robbin,

my first impression is very good. Thanks for providing the webrev.

The only thing I don't like is that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one mechanism or the other.
Would it be OK to move the decision about which to use into platform code?
(Some platforms could still use both if this is beneficial.)

E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. It would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion.
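
One way to picture this suggestion is a shared poll class that takes its armed and disarmed values from a platform type. This is only a sketch: PageAndBit, BitOnly, LocalPoll and the constant addresses are hypothetical, and the concrete encodings (including what a PPC64 trap pattern would look like) are assumptions made for illustration, not values from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical addresses standing in for the guarded and the readable polling page.
constexpr std::uintptr_t kBadPage  = 0x1000;
constexpr std::uintptr_t kGoodPage = 0x2000;
constexpr std::uintptr_t kPollBit  = 0x1;  // page addresses are aligned, so bit 0 is free

// Platform choice 1: combine both mechanisms, in the spirit of
// "poll_page_val | poll_bit()".
struct PageAndBit {
  static constexpr std::uintptr_t armed_value()    { return kBadPage | kPollBit; }
  static constexpr std::uintptr_t disarmed_value() { return kGoodPage; }
};

// Platform choice 2: only a bit pattern is tested (assumed encoding).
struct BitOnly {
  static constexpr std::uintptr_t armed_value()    { return kPollBit; }
  static constexpr std::uintptr_t disarmed_value() { return 0; }
};

// Shared code: it never looks at how the platform encodes the two values.
template <typename Platform>
class LocalPoll {
 public:
  void arm()    { word_ = Platform::armed_value(); }
  void disarm() { word_ = Platform::disarmed_value(); }
  bool armed() const { return word_ == Platform::armed_value(); }
  std::uintptr_t word() const { return word_; }

 private:
  std::uintptr_t word_ = Platform::disarmed_value();
};
```

The shared LocalPoll code is identical for both platform types; only the values written into the poll word differ.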

Best regards,
Martin



Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
In reply to this post by Erik Osterlund
Thanks Erik,

On 2017-10-17 17:30, Erik Österlund wrote:
> Hi Robbin,
>
> Looks fantastic.

We have to credit Mikael Gerdin for much of the work.
Since you have been involved also, I count you as one of the contributors, and
view your review as a bit biased but really appreciated of course :)

/Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
In reply to this post by Doerr, Martin
Thanks for looking at this.

On 2017-10-17 19:58, Doerr, Martin wrote:
> Hi Robbin,
>
> my first impression is very good. Thanks for providing the webrev.

Great!

>
> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism.
> Would it be ok to move the decision between what to use to platform code?
> (Some platforms could still use both if this is beneficial.)
>
> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion.

I see no issue with this.
Maybe SafepointMechanism::local_poll_armed should possibly be platform-specific.
Can we do this incrementally when adding the platform support for PPC64?

Thanks, Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
In reply to this post by Nils Eliasson
Thanks Nils for looking at that!

/Robbin

On 2017-10-17 16:37, Nils Eliasson wrote:

> Hi Robbin,
>
> I have reviewed the compiler parts of the patch - c1, c2, jvmci and cpu*.
>
> Looks great!
>
> Regards,
>
> Nils

Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
In reply to this post by Robbin Ehn
Hi all,

Update after re-base with new atomic implementation:
http://cr.openjdk.java.net/~rehn/8185640/v1/Atomic-Update-Rebase-3/
This goes on top of the Handshakes-2.

Let me know if you want some other kinds of webrevs.

I would like to point out that Mikael Gerdin and Erik Österlund are also
contributors to this changeset.

Thanks, Robbin


RE: RFR(XL): 8185640: Thread-local handshakes

Doerr, Martin
In reply to this post by Robbin Ehn
Hi Robbin,

so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again?
I'd be fine with that, too.

While thinking a little longer about the interpreter implementation, a new idea came to mind.
I think we could significantly reduce the impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could poll only in bytecodes which perform any kind of jump by implementing something like
if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll();
in TemplateInterpreterGenerator::generate_and_dispatch.
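
The effect of that condition can be sketched with a toy code generator. ToyTemplate and the string "instructions" are hypothetical, and this generate_and_dispatch is a stand-in written for illustration, not the HotSpot function of the same name. A poll is emitted only when the thread-local poll is in use and the template dispatches.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy bytecode template (hypothetical): `dispatches` marks bytecodes that jump.
struct ToyTemplate {
  std::string bytecode;
  bool dispatches;
  bool does_dispatch() const { return dispatches; }
};

// Emits the "code" for one template as a list of pseudo-instructions.
inline std::vector<std::string> generate_and_dispatch(const ToyTemplate& t,
                                                      bool uses_thread_local_poll) {
  assert(!t.bytecode.empty());
  std::vector<std::string> code;
  if (uses_thread_local_poll && t.does_dispatch()) {
    code.push_back("safepoint_poll");  // poll only in front of jumping bytecodes
  }
  code.push_back(t.bytecode);
  return code;
}
```

In this model a non-jumping template such as {"iadd", false} never gets a poll, while {"goto", true} gets one only when the thread-local poll is enabled, which is where the code-size saving would come from.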

Best regards,
Martin


-----Original Message-----
From: Robbin Ehn [mailto:[hidden email]]
Sent: Wednesday, 18 October 2017 11:07
To: Doerr, Martin <[hidden email]>; hotspot-dev developers <[hidden email]>
Subject: Re: RFR(XL): 8185640: Thread-local handshakes

Thanks for looking at this.

On 2017-10-17 19:58, Doerr, Martin wrote:
> Hi Robbin,
>
> my first impression is very good. Thanks for providing the webrev.

Great!

>
> I only don't like that "poll_page_val | poll_bit()" is used in shared code. I'd prefer to use either one or the other mechanism.
> Would it be ok to move the decision between what to use to platform code?
> (Some platforms could still use both if this is beneficial.)
>
> E.g. on PPC64, we'd like to use conditional trap instructions with special bit patterns if UseSIGTRAP is on. Would be excellent if we could implement set functions for _poll_armed_value and _poll_disarmed_value in platform code. poll_bit() also fits better into platform code in my opinion.

I see no issue with this.
Maybe SafepointMechanism::local_poll_armed should possibly be platform-specific.
Can we do this incrementally when adding the platform support for PPC64?

Thanks, Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
Hi Martin,

On 2017-10-18 12:11, Doerr, Martin wrote:
> Hi Robbin,
>
> so you would like to push your version first (as it does not break other platforms) and then help us to push non-Oracle platform implementations which change shared code again?
> I'd be fine with that, too.

Yes, great!

>
> While thinking a little longer about the interpreter implementation, a new idea came into my mind.
> I think we could significantly reduce impact on interpreter code size and performance by using safepoint polls only in a subset of bytecodes. E.g., we could use only bytecodes which perform any kind of jump by implementing something like
> if (SafepointMechanism::uses_thread_local_poll() && t->does_dispatch()) generate_safepoint_poll();
> in TemplateInterpreterGenerator::generate_and_dispatch.

We have not seen any performance regression in simple benchmarks with this.
I will do a better benchmark and compare what difference it makes.

Thanks, Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

coleen.phillimore


On 10/18/17 9:57 AM, Robbin Ehn wrote:

>>
>> While thinking a little longer about the interpreter implementation,
>> a new idea came into my mind.
>> I think we could significantly reduce impact on interpreter code size
>> and performance by using safepoint polls only in a subset of
>> bytecodes. E.g., we could use only bytecodes which perform any kind
>> of jump by implementing something like
>> if (SafepointMechanism::uses_thread_local_poll() &&
>> t->does_dispatch()) generate_safepoint_poll();
>> in TemplateInterpreterGenerator::generate_and_dispatch.
>
> We have not seen any performance regression in simple benchmark with
> this.
> I will do a better benchmark and compare what difference it makes.

I think this is a good suggestion for a further RFE.  At one point, I'd
only enabled safepoints for backward branches and returns in the
safepoint table, which had no effect on performance. Since this
generates code in dispatch_epilogue, though, it might help with code bloat.
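
As a rough illustration of the "backward branches and returns" policy, here is a sketch under stated assumptions and not interpreter code: Kind, Instr, needs_poll and count_poll_sites are hypothetical names, and the instruction model is far simpler than real bytecodes.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, much simplified classification of a bytecode.
enum class Kind { Other, Branch, Return };

struct Instr {
  Kind kind;
  int  branch_offset;  // relative target for Branch; ignored for other kinds
};

// Assumed policy: poll at returns and at branches that jump backwards,
// since those are the points a long-running loop must pass through.
inline bool needs_poll(const Instr& i) {
  if (i.kind == Kind::Return) return true;
  return i.kind == Kind::Branch && i.branch_offset < 0;
}

// Counts how many poll sites a method body would get under that policy.
inline std::size_t count_poll_sites(const Instr* code, std::size_t length) {
  assert(code != nullptr || length == 0);
  std::size_t n = 0;
  for (std::size_t k = 0; k < length; ++k) {
    if (needs_poll(code[k])) ++n;
  }
  return n;
}
```

For a five-instruction loop body with one forward branch, one backward branch and one return, this yields two poll sites instead of five.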

Thanks,
Coleen

RE: RFR(XL): 8185640: Thread-local handshakes

Doerr, Martin
In reply to this post by Robbin Ehn
Hi Robbin,

thanks for the quick reply and for doing additional benchmarks.
Please note that t->does_dispatch() was just a first idea, but doesn't really fit for the purpose because it's false for conditional branch bytecodes for example. I just didn't find an appropriate quick check in the existing code.
I guess you will notice a performance impact when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.)

Best regards,
Martin



Re: RFR(XL): 8185640: Thread-local handshakes

Claes Redestad
Hi!

On 2017-10-18 16:05, Doerr, Martin wrote:
>   [...] when benchmarking with -Xint. (I don't know if Oracle usually runs startup performance benchmarks.)

We do a lot of benchmarking to measure startup, warmup and footprint on a variety of applications, and have been improving tooling to flag even very small regressions (statistically significant results on <0.5M instruction increases).

-Xint is typically not explicitly used for any benchmarking other than as a diagnostic tool, and even if we did I'd imagine we'd not file bugs if they didn't also correlate with a regression in a mixed mode config.

/Claes

RE: RFR(XL): 8185640: Thread-local handshakes

Doerr, Martin
Hi Claes,

thanks for the explanation. We use -Xint benchmarking only when we make significant interpreter changes, as a quick regression check (not so relevant for real life, but it delivers stable and quick results).

Best regards,
Martin



Re: RFR(XL): 8185640: Thread-local handshakes

coleen.phillimore
In reply to this post by Robbin Ehn

This looks really nice.  A few minor comments.

http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.hpp.html

   51 // or the JavaThread it self.

typo, "itself"

Thank you for adding these comments.  I think they're just right in
length and detail in the header.

http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.cpp.html

The protocol in HandshakeState::process_self_inner and cancel_inner is:

     clear_handshake(thread);
     if (op != NULL) {
       op->do_handshake(thread);
     }

But in HandshakeState::process_by_vmthread(), the order is reversed.
Can you explain why in the comments?

     _operation->do_handshake(target);
     clear_handshake(target);

It looks like the thread can't continue while the handshake operation is
in progress, so does the order matter?

http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackNativeTest.java.html

This has the wrong @test name.  These could use an @comment line about
what you expect also.  I don't know what's "Native" about it though,
isn't it testing what happens when you use -XX:+ThreadLocalHandshakes?

http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackFallbackTest.java.html

For this one too, an @comment saying that it's testing the fallback VM
operation would be good.

I don't need to see another webrev for the comment changes.

Lastly, as I said before, I think putting the safepoint polls in the
interpreter at return and backward branches would be a good follow on
changeset.

Thanks,
Coleen




Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
Thanks for looking at this Coleen,

On 2017-10-18 22:44, [hidden email] wrote:
>
> This looks really nice.  A few minor comments.
>
> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.hpp.html
>
>    51 // or the JavaThread it self.
>
> typo, "itself"

Fixed

>
> Thank you for adding these comments.  I think they're just right in length and detail in the header.
>
> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/src/hotspot/share/runtime/handshake.cpp.html
>
> The protocol in HandshakeState::process_self_inner and cancel_inner is:
>
>      clear_handshake(thread);
>      if (op != NULL) {
>        op->do_handshake(thread);
>      }
>
> But in HandshakeState::process_by_vmthread(), the order is reversed. Can you explain why in the comments.
>
>      _operation->do_handshake(target);
>      clear_handshake(target);
>
> It looks like the thread can't continue while the handshake operation is in progress, so does the order matter?

The key part here is that the handshake must be cleared before signaling the semaphore.
The early clearing is there because, if the thread is doing its own operation, the VM thread can quickly skip this thread by checking whether it still has an operation.
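
To make the two orderings concrete, here is a single-slot sketch under stated assumptions, not the code in handshake.cpp: Op, HandshakeSlot and the completed counter (a stand-in for the semaphore) are hypothetical, and the claiming that keeps both sides from running the same operation concurrently is deliberately left out.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical operation; counts how often it ran.
struct Op {
  int runs = 0;
  void do_handshake() { ++runs; }
};

// Hypothetical per-thread handshake slot.
struct HandshakeSlot {
  std::atomic<Op*> operation{nullptr};
  std::atomic<int> completed{0};  // stand-in for signaling the semaphore

  void set_operation(Op* op) {
    assert(op != nullptr);
    operation.store(op);
  }

  // What the VM thread would check to skip a thread that is already
  // processing its own operation.
  bool has_operation() const { return operation.load() != nullptr; }

  // Target thread path: clear first, then run, then signal.
  void process_self() {
    Op* op = operation.exchange(nullptr);
    if (op != nullptr) {
      op->do_handshake();
      completed.fetch_add(1);
    }
  }

  // VM thread path (target is blocked): run, then clear, then signal.
  void process_by_vmthread() {
    Op* op = operation.load();
    if (op == nullptr) return;  // nothing pending, skip this thread
    op->do_handshake();
    operation.store(nullptr);
    completed.fetch_add(1);
  }
};
```

In both paths the slot is empty by the time completion is signaled; the only difference is whether the clear happens before or after the operation runs.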

>
> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackNativeTest.java.html
>
> This has the wrong @test name.  These could use an @comment line about what you expect also.  I don't know what's "Native" about it though, isn't it testing what happens when you use -XX:+ThreadLocalHandshakes?
>
> http://cr.openjdk.java.net/~rehn/8185640/v0/flat/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeWalkStackFallbackTest.java.html
>
> This one too an @comment that it's testing the fallback VM operation would be good.
>
> I don't need to see another webrev for the comment changes.

Here it is. There were inconsistencies in the tests; I think it is better now.

http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/

>
> Lastly, as I said before, I think putting the safepoint polls in the interpreter at return and backward branches would be a good follow on changeset.

I will let Claes R decide whether that is an acceptable approach.

Thanks, Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
In reply to this post by Robbin Ehn
Here is the third incremental change:
http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/
Goes on top of Atomic-Update-Rebase-3.

Let me know if anyone wants to see some other kind of webrev.

Thanks, Robbin

On 2017-10-18 11:15, Robbin Ehn wrote:

> Hi all,
>
> Update after re-base with new atomic implementation:
> http://cr.openjdk.java.net/~rehn/8185640/v1/Atomic-Update-Rebase-3/
> This goes on top of the Handshakes-2.
>
> Let me know if you want some other kinds of webrevs.
>
> I would like to point out that Mikael Gerdin and Erik Österlund also are contributors of this changeset.
>
> Thanks, Robbin

Re: RFR(XL): 8185640: Thread-local handshakes

coleen.phillimore
http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/test/hotspot/jtreg/runtime/handshake/HandshakeTransitionTest.java.udiff.html

Thank you, this is better.

In this test, what happens if it fails?

Everything looks better with this change.

Thanks,
Coleen


On 10/19/17 8:40 AM, Robbin Ehn wrote:

> Here is the third incremental change:
> http://cr.openjdk.java.net/~rehn/8185640/v2/Coleen-n-Test-Cleanup-4/webrev/ 
>
> Goes on top of Atomic-Update-Rebase-3.
>
> Let me know if anyone want to see some other kind of webrevs.
>
> Thanks, Robbin
>
> On 2017-10-18 11:15, Robbin Ehn wrote:
>> Hi all,
>>
>> Update after re-base with new atomic implementation:
>> http://cr.openjdk.java.net/~rehn/8185640/v1/Atomic-Update-Rebase-3/
>> This goes on top of the Handshakes-2.
>>
>> Let me know if you want some other kinds of webrevs.
>>
>> I would like to point out that Mikael Gerdin and Erik Österlund also
>> are contributors of this changeset.
>>
>> Thanks, Robbin


Re: RFR(XL): 8185640: Thread-local handshakes

Karen Kinnear
In reply to this post by Robbin Ehn
Robbin, Erik, Mikael -

Delighted to see this! Looks good. I don’t need to see any updates - these are minor comments.
Thank you for the performance testing.

Couple of questions/comments:
1. platform support
supports_thread_local_poll returns true for AMD64 or SPARC
Your comment said Linux x64 and SPARC only.
What about Mac and Windows?

2. safepointMechanism_inline.hpp - comment clarification
line 42 - “Mutexes can be taken but none JavaThread”.
Are you saying: “Non-JavaThreads do not support handshakes, but must stop for
safepoints.”
Not sure what the Mutex comment is about

3. globals.hpp
The way I understand this - the ThreadLocalHandshakes flag is not so much to enable
use of thread-local handshake operations, but to enable use of TLH for the global safepoint.
If that is true, could you possibly at least clarify this in the comment if there is not
a better name for the flag?

4. thank you for looking into startup performance and interpreter return/backward branch checks.

5. handshake.cpp
Could you possibly add a comment that thread_has_completed and/or poll_for_completed_thread
means that the thread has either done the operation or the operation has been cancelled?
I get that we are polling this to tell when it is safe to return to the synchronous requestor, not to
determine if the thread actually performed the operation. The comment would make that clearer.

thanks,
Karen



Re: RFR(XL): 8185640: Thread-local handshakes

Robbin Ehn
Hi,

On 2017-10-20 18:24, Karen Kinnear wrote:

> Robbin, Erik, Mikael -
>
> Delighted to see this! Looks good. I don’t need to see any updates - these are minor comments.
> Thank you for the performance testing.
>
> Couple of questions/comments:
> 1. platform support
> supports_thread_local_poll returns true for AMD64 or SPARC
> Your comment said Linux x64 and SPARC only.
> What about Mac and Windows?

Sorry, it should be x64 and SPARC; the OS is not important (so yes, Mac and Windows).

>
> 2. safepointMechanism_inline.hpp - comment clarification
> line 42 - “Mutexes can be taken but none JavaThread”.
> Are you saying: “Non-JavaThreads do not support handshakes, but must stop for
> safepoints.”
> Not sure what the Mutex comment is about

Fixed:
"// If the poll is on a non-java thread, we can only check the global state."

This is possible from e.g. Monitor::TrySpin.
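
To illustrate the fixed comment, here is a minimal sketch of that poll decision. It is not the HotSpot code: the types `Thread` and `JavaThread` below are simplified stand-ins, and the names `g_safepoint_pending`, `poll_armed` and `should_block` are made up for this example. The assumption is that a global safepoint arms every JavaThread's local poll, so a JavaThread only needs to look at its own poll, while any other thread has no per-thread poll and can only look at the global state.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical global safepoint state (stand-in, not the HotSpot variable).
std::atomic<bool> g_safepoint_pending{false};

// Simplified stand-in for a generic VM thread.
struct Thread {
  virtual ~Thread() = default;
  virtual bool is_java_thread() const { return false; }
};

// Simplified stand-in for a JavaThread carrying its own poll word.
struct JavaThread : Thread {
  std::atomic<bool> poll_armed{false};  // per-thread poll (assumed name)
  bool is_java_thread() const override { return true; }
};

// Returns true if the polling thread should stop.
bool should_block(const Thread* t) {
  if (t->is_java_thread()) {
    // A JavaThread checks only its own, individually armable poll.
    return static_cast<const JavaThread*>(t)
        ->poll_armed.load(std::memory_order_acquire);
  }
  // If the poll is on a non-java thread, we can only check the global state.
  return g_safepoint_pending.load(std::memory_order_acquire);
}
```

With this shape, arming `poll_armed` on one JavaThread stops only that thread, which is the property the handshake relies on.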

>
> 3. globals.hpp
> The way I understand this - the ThreadLocalHandshakes flag is not so much to enable
> use of thread-local handshake operations, but to enable use of TLH for the global safepoint.
> If that is true, could you possibly at least clarify this in the comment if there is not
> a better name for the flag?

Fixed
"Use thread-local polls instead of global poll for safepoints."

We could also give the option a better name, e.g. -XX:+(Use)ThreadLocalPoll?
Let me know.

>
> 4. thank you for looking into startup performance and interpreter return/backward branch checks.

We are committed to fixing this before 18.3!

>
> 5. handshake.cpp
> Could you possibly add a comment that thread_has_completed and/or poll_for_completed_thread
> means that the thread has either done the operation or the operation has been cancelled?
> I get that we are polling this to tell when it is safe to return to the synchronous requestor, not to
> determine if the thread actually performed the operation. The comment would make that clearer.

Fixed
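
A minimal sketch of the semantics Karen describes, under stated assumptions: `HandshakeRequest` and its members are hypothetical names for this example, not the types in handshake.cpp, and a plain atomic flag stands in for whatever signalling the real code uses. The point is only that "completed" is raised on both paths, so the requester learns when it is safe to return, not whether the operation actually ran.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical request object; not the HotSpot class.
struct HandshakeRequest {
  std::atomic<bool> done{false};  // set on execution AND on cancellation
  bool executed = false;          // true only if the callback really ran

  // Target thread (or the VM thread on its behalf) ran the operation.
  void execute() {
    executed = true;
    done.store(true, std::memory_order_release);
  }

  // Operation dropped, e.g. because the target thread is exiting.
  void cancel() {
    done.store(true, std::memory_order_release);
  }

  // Polled by the synchronous requester: "is it safe to return?"
  bool thread_has_completed() const {
    return done.load(std::memory_order_acquire);
  }
};
```

A requester that also needs to know whether the operation ran would have to look at separate state, such as `executed` in this sketch.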

Incremental:
http://cr.openjdk.java.net/~rehn/8185640/v3/Assorted-Karen-5/webrev/

Again, let me know if anyone needs another kind of webrev!

Thanks Karen!

/Robbin

>
> thanks,
> Karen