RFR: 8261495: Shenandoah: reconsider update references memory ordering

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8261495: Shenandoah: reconsider update references memory ordering

Aleksey Shipilev-5
Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.

This seems to be excessive for Shenandoah update references code, and "relaxed" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS).

Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:

# Baseline
[135.065s][info][gc,stats] Concurrent Update Refs         =   73.685 s (a =   295924 us) (n =   249)
    (lvls, us =      354,     3418,   349609,   564453,   715405)

# Patched
[127.649s][info][gc,stats] Concurrent Update Refs         =   54.389 s (a =   169437 us) (n =   321)
    (lvls, us =      324,     2188,   183594,   322266,   394495)

Average time goes down, the number of GC cycles go up, since the cycles are shorter.

Additional testing:
 - [x] Linux x86_64 hotspot_gc_shenandoah
 - [x] Linux AArch64 hotspot_gc_shenandoah
 - [x] Linux AArch64 tier1 with Shenandoah

-------------

Commit messages:
 - 8261495: Shenandoah: reconsider update references memory ordering

Changes: https://git.openjdk.java.net/jdk/pull/2498/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261495
  Stats: 15 lines in 5 files changed: 0 ins; 0 del; 15 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2498.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering

Zhengyu Gu-3
On Wed, 10 Feb 2021 09:52:11 GMT, Aleksey Shipilev <[hidden email]> wrote:

> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.
>
> This seems to be excessive for Shenandoah update references code, and "relaxed" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS).
>
> Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:
>
> # Baseline
> [135.065s][info][gc,stats] Concurrent Update Refs         =   73.685 s (a =   295924 us) (n =   249)
>     (lvls, us =      354,     3418,   349609,   564453,   715405)
>
> # Patched
> [127.649s][info][gc,stats] Concurrent Update Refs         =   54.389 s (a =   169437 us) (n =   321)
>     (lvls, us =      324,     2188,   183594,   322266,   394495)
>
> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>
> Additional testing:
>  - [x] Linux x86_64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 tier1 with Shenandoah

src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 149:

> 147:   assert(is_aligned(addr, sizeof(narrowOop)), "Address should be aligned: " PTR_FORMAT, p2i(addr));
> 148:   narrowOop val = CompressedOops::encode(n);
> 149:   return CompressedOops::decode(Atomic::cmpxchg(addr, c, val, memory_order_relaxed));

Are you sure it is sufficient? I would think it needs acq/rel pair, otherwise, read side can see incomplete oop ...

-------------

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering

Aleksey Shipilev-5
On Wed, 10 Feb 2021 13:37:59 GMT, Zhengyu Gu <[hidden email]> wrote:

>> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.
>>
>> This seems to be excessive for Shenandoah update references code, and "relaxed" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS).
>>
>> Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:
>>
>> # Baseline
>> [135.065s][info][gc,stats] Concurrent Update Refs         =   73.685 s (a =   295924 us) (n =   249)
>>     (lvls, us =      354,     3418,   349609,   564453,   715405)
>>
>> # Patched
>> [127.649s][info][gc,stats] Concurrent Update Refs         =   54.389 s (a =   169437 us) (n =   321)
>>     (lvls, us =      324,     2188,   183594,   322266,   394495)
>>
>> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>>
>> Additional testing:
>>  - [x] Linux x86_64 hotspot_gc_shenandoah
>>  - [x] Linux AArch64 hotspot_gc_shenandoah
>>  - [x] Linux AArch64 tier1 with Shenandoah
>
> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 149:
>
>> 147:   assert(is_aligned(addr, sizeof(narrowOop)), "Address should be aligned: " PTR_FORMAT, p2i(addr));
>> 148:   narrowOop val = CompressedOops::encode(n);
>> 149:   return CompressedOops::decode(Atomic::cmpxchg(addr, c, val, memory_order_relaxed));
>
> Are you sure it is sufficient? I would think it needs acq/rel pair, otherwise, read side can see incomplete oop ...

Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.
>
> This seems to be excessive for Shenandoah update references code, and "acq_rel" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing.
>
> Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:
>
> # Baseline
> [135.065s][info][gc,stats] Concurrent Update Refs         =   73.685 s (a =   295924 us) (n =   249)
>     (lvls, us =      354,     3418,   349609,   564453,   715405)
>
> # Patched
> [127.649s][info][gc,stats] Concurrent Update Refs         =   54.389 s (a =   169437 us) (n =   321)
>     (lvls, us =      324,     2188,   183594,   322266,   394495)
>
> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>
> Additional testing:
>  - [x] Linux x86_64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 tier1 with Shenandoah

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Do acq_rel instead

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2498/files
  - new: https://git.openjdk.java.net/jdk/pull/2498/files/d83b9af4..87a609f4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=00-01

  Stats: 41 lines in 1 file changed: 38 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2498.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
On Wed, 10 Feb 2021 15:25:33 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahHeap.inline.hpp line 149:
>>
>>> 147:   assert(is_aligned(addr, sizeof(narrowOop)), "Address should be aligned: " PTR_FORMAT, p2i(addr));
>>> 148:   narrowOop val = CompressedOops::encode(n);
>>> 149:   return CompressedOops::decode(Atomic::cmpxchg(addr, c, val, memory_order_relaxed));
>>
>> Are you sure it is sufficient? I would think it needs acq/rel pair, otherwise, read side can see incomplete oop ...
>
> Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it.

I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2]

Zhengyu Gu-3
On Wed, 10 Feb 2021 18:59:42 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it.
>
> I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about?

Actually, what I meant is that CAS here should use memory_order_release and load barrier needs memory_order_acquire.
I am not sure memory_order_acq_rel is sufficient, C++ states _memory_order_acq_rel_:

A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and **the modification is visible in other threads that acquire the same atomic variable.**

-------------

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v2]

Zhengyu Gu-3
In reply to this post by Aleksey Shipilev-5
On Wed, 10 Feb 2021 18:59:42 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Actually, I think you are right: we must ensure the cumulativity of the barriers. Let me think a bit more about it.
>
> I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about?

Actually, what I meant is that CAS here should use memory_order_release and load barrier needs memory_order_acquire.
I am not sure memory_order_acq_rel is sufficient, C++ states _memory_order_acq_rel_:

A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and **the modification is visible in other threads that acquire the same atomic variable.**

-------------

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v3]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.
>
> This seems to be excessive for Shenandoah update references code, and "acq_rel" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing.
>
> Sample run with aggressive (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:
>
> # Baseline
> [135.065s][info][gc,stats] Concurrent Update Refs         =   73.685 s (a =   295924 us) (n =   249)
>     (lvls, us =      354,     3418,   349609,   564453,   715405)
>
> # Patched
> [127.649s][info][gc,stats] Concurrent Update Refs         =   54.389 s (a =   169437 us) (n =   321)
>     (lvls, us =      324,     2188,   183594,   322266,   394495)
>
> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>
> Additional testing:
>  - [x] Linux x86_64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 tier1 with Shenandoah

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Use release only

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2498/files
  - new: https://git.openjdk.java.net/jdk/pull/2498/files/87a609f4..36bee3a9

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=01-02

  Stats: 3 lines in 1 file changed: 0 ins; 0 del; 3 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2498.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v3]

Zhengyu Gu-3
On Thu, 11 Feb 2021 06:37:56 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.
>>
>> This seems to be excessive for Shenandoah update references code, and "release" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing.
>>
>> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>>
>> Additional testing:
>>  - [x] Linux x86_64 hotspot_gc_shenandoah
>>  - [x] Linux AArch64 hotspot_gc_shenandoah
>>  - [x] Linux AArch64 tier1 with Shenandoah
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Use release only

Looks good to me.

-------------

Marked as reviewed by zgu (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v3]

Zhengyu Gu-3
In reply to this post by Zhengyu Gu-3
On Wed, 10 Feb 2021 20:44:34 GMT, Zhengyu Gu <[hidden email]> wrote:

>> I think I convinced myself there is a need for `memory_order_acq_rel`. I added a sketch of (counter-)example in code comments. See if that what you were concerned about?
>
> Actually, what I meant is that CAS here should use memory_order_release and load barrier needs memory_order_acquire.
> I am not sure memory_order_acq_rel is sufficient, C++ states _memory_order_acq_rel_:
>
> A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and **the modification is visible in other threads that acquire the same atomic variable.**
>
> For atomic_update_oop(), leading acq is prob. unnecessary, since the oop is either evacuated by current thread or there is a safepoint in between. My concern is missing read barrier on read side, e.g. an oop is evacuated and updated by a mutator, then the second Java thread follows fwdptr and loads the oop, without a read barrier on then load, it may see incompleted/newly evacuated oop.

Ah, I missed JDK-8261495, which is counterpart of this change.  I think memory_order_release is good.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2498
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261495: Shenandoah: reconsider update references memory ordering [v4]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
> Shenandoah update heap references code uses default Atomic::cmpxchg to avoid races with mutator updates. Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64.
>
> This seems to be excessive for Shenandoah update references code, and "release" is enough. We do not seem to piggyback on update-references memory effects anywhere (in fact, if not for mutator, we would not even need a CAS). But, there is an interplay with concurrent evacuation and updates from self-healing.
>
> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>
> Additional testing:
>  - [x] Linux x86_64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 hotspot_gc_shenandoah
>  - [x] Linux AArch64 tier1 with Shenandoah

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains 10 additional commits since the last revision:

 - Comment touchup
 - Specialize out witness-checking methods, drop acquire again
 - Even more explanation
 - Move the comment
 - Also handle clearing the oops
 - Minor touchups to the comment
 - Merge branch 'master' into JDK-8261495-shenandoah-updaterefs-memord
 - Use release only
 - Do acq_rel instead
 - 8261495: Shenandoah: reconsider update references memory ordering

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2498/files
  - new: https://git.openjdk.java.net/jdk/pull/2498/files/36bee3a9..0d299968

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2498&range=02-03

  Stats: 12253 lines in 405 files changed: 6246 ins; 3773 del; 2234 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2498.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2498/head:pull/2498

PR: https://git.openjdk.java.net/jdk/pull/2498