RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering

classic Classic list List threaded Threaded
22 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Aleksey Shipilev-5
Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.

For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.

For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.

Sample run with `aggressive` (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:

# Baseline
[135.357s][info][gc,stats] Concurrent Evacuation          =   17.459 s (a =    66132 us) (n =   264)
(lvls, us =      152,     2402,    74414,   123047,   142021)
[135.357s][info][gc,stats] Concurrent Update Refs         =   72.774 s (a =   276708 us) (n =   263) (
lvls, us =      354,     3281,   259766,   548828,   720417)

# Patched
[135.923s][info][gc,stats] Concurrent Evacuation          =   17.266 s (a =    61444 us) (n =   281)
(lvls, us =      137,     2754,    42773,   119141,   142979)
[135.923s][info][gc,stats] Concurrent Update Refs         =   74.234 s (a =   265121 us) (n =   280)
(lvls, us =      354,     3672,   132812,   558594,   748677)

Average time goes down, the number of GC cycles go up, since the cycles are shorter.

Additional testing:
 - [x] Linux x86_64 `hotspot_gc_shenandoah`
 - [x] Linux AArch64 `hotspot_gc_shenandoah`
 - [x] Linux AArch64 `tier1` with Shenandoah

-------------

Commit messages:
 - 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Changes: https://git.openjdk.java.net/jdk/pull/2496/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261492
  Stats: 17 lines in 3 files changed: 11 ins; 0 del; 6 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2496.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Roman Kennke
On Wed, 10 Feb 2021 08:55:39 GMT, Aleksey Shipilev <[hidden email]> wrote:

> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>
> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>
> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>
> Sample run with `aggressive` (back-to-back cycles) on SPECjvm2008:compiler.compiler on AArch64:
>
> # Baseline
> [135.357s][info][gc,stats] Concurrent Evacuation          =   17.459 s (a =    66132 us) (n =   264)
> (lvls, us =      152,     2402,    74414,   123047,   142021)
> [135.357s][info][gc,stats] Concurrent Update Refs         =   72.774 s (a =   276708 us) (n =   263) (
> lvls, us =      354,     3281,   259766,   548828,   720417)
>
> # Patched
> [135.923s][info][gc,stats] Concurrent Evacuation          =   17.266 s (a =    61444 us) (n =   281)
> (lvls, us =      137,     2754,    42773,   119141,   142979)
> [135.923s][info][gc,stats] Concurrent Update Refs         =   74.234 s (a =   265121 us) (n =   280)
> (lvls, us =      354,     3672,   132812,   558594,   748677)
>
> Average time goes down, the number of GC cycles go up, since the cycles are shorter.
>
> Additional testing:
>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `tier1` with Shenandoah

Nice! Looks good to me!

-------------

Marked as reviewed by rkennke (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Zhengyu Gu-3
In reply to this post by Aleksey Shipilev-5
On Wed, 10 Feb 2021 08:55:39 GMT, Aleksey Shipilev <[hidden email]> wrote:

> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>
> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>
> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>
> Additional testing:
>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `tier1` with Shenandoah

Changes requested by zgu (Reviewer).

src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89:

> 87:   // (would be paired with acquire in forwardee accessors). Acquire on failed update
> 88:   // would get the updated object after the forwardee load.
> 89:   markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel);

You have obj->mark_acquire() above, I don't think you need leading acquire here.

src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 43:

> 41:   // fwdptr. That object is still not forwarded, and we need to return
> 42:   // the object itself.
> 43:   markWord mark = obj->mark_acquire();

We also need the acquire barrier in fast path in generated code, right?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering

Aleksey Shipilev-5
On Thu, 11 Feb 2021 13:32:00 GMT, Zhengyu Gu <[hidden email]> wrote:

>> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>>
>> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>>
>> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>>
>> Additional testing:
>>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `tier1` with Shenandoah
>
> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 43:
>
>> 41:   // fwdptr. That object is still not forwarded, and we need to return
>> 42:   // the object itself.
>> 43:   markWord mark = obj->mark_acquire();
>
> We also need the acquire barrier in fast path in generated code, right?

Dang. I thought the beauty of self-fixing barriers is that we moved all fwdptr accesses to C++ (either GC or LRB), and all of them end up in this file. But there is `ShenandoahBarrierSetAssembler::cmpxchg_oop` that accesses the fwdptr directly. I shall see what can be done there.

> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89:
>
>> 87:   // (would be paired with acquire in forwardee accessors). Acquire on failed update
>> 88:   // would get the updated object after the forwardee load.
>> 89:   markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel);
>
> You have obj->mark_acquire() above, I don't think you need leading acquire here.

A bit different: we want to acquire `prev_mark` (failure witness) here, if we lose the update race. So the `old_mark` acquire does not really help us.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>
> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>
> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>
> Additional testing:
>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `tier1` with Shenandoah

Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:

  Make sure to access fwdptr with acquire semantics in assembler code

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2496/files
  - new: https://git.openjdk.java.net/jdk/pull/2496/files/b6ac8e4c..49626781

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=00-01

  Stats: 5 lines in 1 file changed: 4 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2496.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
On Thu, 11 Feb 2021 13:39:42 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 43:
>>
>>> 41:   // fwdptr. That object is still not forwarded, and we need to return
>>> 42:   // the object itself.
>>> 43:   markWord mark = obj->mark_acquire();
>>
>> We also need the acquire barrier in fast path in generated code, right?
>
> Dang. I thought the beauty of self-fixing barriers is that we moved all fwdptr accesses to C++ (either GC or LRB), and all of them end up in this file. But there is `ShenandoahBarrierSetAssembler::cmpxchg_oop` that accesses the fwdptr directly. I shall see what can be done there.

I believe only AArch64 needs a fix. x86_64 already has a strong semantics. See new commit.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Zhengyu Gu-3
In reply to this post by Aleksey Shipilev-5
On Thu, 11 Feb 2021 14:07:58 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>>
>> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>>
>> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>>
>> Additional testing:
>>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `tier1` with Shenandoah
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Make sure to access fwdptr with acquire semantics in assembler code

src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 58:

> 56:   assert(Thread::current()->is_Java_thread(), "Must be a mutator thread");
> 57:
> 58:   markWord mark = obj->mark_acquire();

Actually, I would argue that you don't need acquire here, since you don't touch anything other than mark word, so that there is no order that needs to be enforced.

src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89:

> 87:   // (would be paired with acquire in forwardee accessors). Acquire on failed update
> 88:   // would get the updated object after the forwardee load.
> 89:   markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel);

Same here, CAS guarantees you to see latest mark word. memory_order_release should be sufficient here, paired with memory_order_acquire in resolve_forward barrier to ensure safe publishing the new object.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Aleksey Shipilev-5
On Thu, 11 Feb 2021 14:25:21 GMT, Zhengyu Gu <[hidden email]> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make sure to access fwdptr with acquire semantics in assembler code
>
> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 58:
>
>> 56:   assert(Thread::current()->is_Java_thread(), "Must be a mutator thread");
>> 57:
>> 58:   markWord mark = obj->mark_acquire();
>
> Actually, I would argue that you don't need acquire here, since you don't touch anything other than mark word, so that there is no order that needs to be enforced.

It is not about the mark word itself, we access the forwardee afterwards. It is about transitive dependency: object copy stores -> forwardee installation (markword store, release) -> forwardee discovery (markword load, acquire) --> object copy access. Remember this: https://github.com/openjdk/jdk/pull/2498#discussion_r574498830

> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89:
>
>> 87:   // (would be paired with acquire in forwardee accessors). Acquire on failed update
>> 88:   // would get the updated object after the forwardee load.
>> 89:   markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel);
>
> Same here, CAS guarantees you to see latest mark word. memory_order_release should be sufficient here, paired with memory_order_acquire in resolve_forward barrier to ensure safe publishing the new object.

Again, not really about the mark word. The transitive load of the object from that fwdptr is what we are after.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Zhengyu Gu-3
On Thu, 11 Feb 2021 16:04:38 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 58:
>>
>>> 56:   assert(Thread::current()->is_Java_thread(), "Must be a mutator thread");
>>> 57:
>>> 58:   markWord mark = obj->mark_acquire();
>>
>> Actually, I would argue that you don't need acquire here, since you don't touch anything other than mark word, so that there is no order that needs to be enforced.
>
> It is not about the mark word itself, we access the forwardee afterwards. It is about transitive dependency: object copy stores -> forwardee installation (markword store, release) -> forwardee discovery (markword load, acquire) --> object copy access. Remember this: https://github.com/openjdk/jdk/pull/2498#discussion_r574498830

Sorry. I commented on wrong place. I actually meant line #78 in try_update_forwardee().

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Zhengyu Gu-3
In reply to this post by Aleksey Shipilev-5
On Thu, 11 Feb 2021 14:07:58 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>>
>> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>>
>> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>>
>> Additional testing:
>>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>>  - [x] Linux AArch64 `tier1` with Shenandoah
>
> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>
>   Make sure to access fwdptr with acquire semantics in assembler code

Good to me.

-------------

Marked as reviewed by zgu (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Zhengyu Gu-3
In reply to this post by Aleksey Shipilev-5
On Thu, 11 Feb 2021 16:06:37 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> src/hotspot/share/gc/shenandoah/shenandoahForwarding.inline.hpp line 89:
>>
>>> 87:   // (would be paired with acquire in forwardee accessors). Acquire on failed update
>>> 88:   // would get the updated object after the forwardee load.
>>> 89:   markWord prev_mark = obj->cas_set_mark(new_mark, old_mark, memory_order_acq_rel);
>>
>> Same here, CAS guarantees you to see latest mark word. memory_order_release should be sufficient here, paired with memory_order_acquire in resolve_forward barrier to ensure safe publishing the new object.
>
> Again, not really about the mark word. The transitive load of the object from that fwdptr is what we are after.

I kind of seeing what you meant, if there is a load after, we do need acquire. So good to me.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Aleksey Shipilev-5
In reply to this post by Zhengyu Gu-3
On Thu, 11 Feb 2021 16:14:00 GMT, Zhengyu Gu <[hidden email]> wrote:

>> It is not about the mark word itself, we access the forwardee afterwards. It is about transitive dependency: object copy stores -> forwardee installation (markword store, release) -> forwardee discovery (markword load, acquire) --> object copy access. Remember this: https://github.com/openjdk/jdk/pull/2498#discussion_r574498830
>
> Sorry. I commented on wrong place. I actually meant line #78 in try_update_forwardee().

You have to do acquire there as well, because you can exit from the next block, the one with `old_mark.is_marked()`, never getting further. Then the whole thing unfolds: for transitive visibility of the object contents, you have to load the mark word with acquire before accessing forwardee contents.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Zhengyu Gu-3
On Thu, 11 Feb 2021 16:17:18 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Sorry. I commented on wrong place. I actually meant line #78 in try_update_forwardee().
>
> You have to do acquire there as well, because you can exit from the next block, the one with `old_mark.is_marked()`, never getting further. Then the whole thing unfolds: for transitive visibility of the object contents, you have to load the mark word with acquire before accessing forwardee contents.

Yes. I wasn't thinking of the use of oop content after.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Martin Doerr
In reply to this post by Zhengyu Gu-3
On Thu, 11 Feb 2021 16:18:30 GMT, Zhengyu Gu <[hidden email]> wrote:

>> Aleksey Shipilev has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make sure to access fwdptr with acquire semantics in assembler code
>
> Good to me.

Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp?
Address dependency ensures the ordering. Or is there anything missing?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Aleksey Shipilev-5
On Thu, 11 Feb 2021 18:08:49 GMT, Martin Doerr <[hidden email]> wrote:

> Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp?
> Address dependency ensures the ordering. Or is there anything missing?

True, we could use address dependency for the ordering. This thing is more or less load-consume of the newly promoted object. But, I think reasoning with (stronger) acquires is more versatile for the code we do not control (C++ generally discourages using `memory_order_consume`; that is why our C++ code acquires mark word). While we control the assembler insns sequence, it seems good to match that. It does not seem worth over-relaxing the rare path: the CAS failure path under running GC.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Martin Doerr
On Fri, 12 Feb 2021 07:26:49 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp?
>> Address dependency ensures the ordering. Or is there anything missing?
>
>> Do we really need to change shenandoahBarrierSetAssembler_aarch64.cpp?
>> Address dependency ensures the ordering. Or is there anything missing?
>
> True, we could use address dependency for the ordering. This thing is more or less load-consume of the newly promoted object. But, I think reasoning with (stronger) acquires is more versatile for the code we do not control (C++ generally discourages using `memory_order_consume`; that is why our C++ code acquires mark word). While we control the assembler insns sequence, it seems good to match that. It does not seem worth over-relaxing the rare path: the CAS failure path under running GC.

Thanks for your reply. Yeah, I got your point regarding C++. But we use load-consume a lot in our self-made assembly code which should be ok.
I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very perfomance sensitive?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Aleksey Shipilev-5
On Fri, 12 Feb 2021 10:06:52 GMT, Martin Doerr <[hidden email]> wrote:

> I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very perfomance sensitive?

Yes, it is not supposed to be: CAS failure path when GC is relocating the objects.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Martin Doerr
On Fri, 12 Feb 2021 10:56:34 GMT, Aleksey Shipilev <[hidden email]> wrote:

>> Thanks for your reply. Yeah, I got your point regarding C++. But we use load-consume a lot in our self-made assembly code which should be ok.
>> I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very perfomance sensitive?
>
>> I guess the shenandoahBarrierSetAssembler_aarch64.cpp part you're changing is not very perfomance sensitive?
>
> Yes, it is not supposed to be: CAS failure path when GC is relocating the objects.

I'd prefer using load-consume with comment in assembly code and acquire in C++ code. That would be consistent with other code. But that's just my opinion. I'll leave the aarch64 maintainers free to decide.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2496
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v2]

Andrew Haley
On 15/02/2021 12:00, Martin Doerr wrote:
> I'd prefer using load-consume with comment in assembly code and acquire in C++ code. That would be consistent with other code. But that's just my opinion. I'll leave the aarch64 maintainers free to decide.

That sounds right to me too. One day we'll get memory_order_consume
for HotSpot C++ code, but until then acquire will have to do.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261492: Shenandoah: reconsider forwardee accesses memory ordering [v3]

Aleksey Shipilev-5
In reply to this post by Aleksey Shipilev-5
> Shenandoah carries forwardee information in object's mark word. Installing the new mark word is effectively "releasing" the object copy, and reading from the new mark word is "acquiring" that object copy.
>
> For the forwardee update side, Hotspot's default for atomic operations is memory_order_conservative, which emits two-way memory fences around the CASes at least on AArch64 and PPC64. This seems to be excessive for Shenandoah forwardee updates, and "release" is enough.
>
> For the forwardee load side, we need to guarantee "acquire". We do not do it now, reading the markword without memory semantics. It does not seem to pose a practical problem today, because GC does not access the object contents in the new copy, and mutators get this from the JRT-called stub that separates the fwdptr access and object contents access by a lot. It still should be cleaner to "acquire" the mark on load to avoid surprises.
>
> Additional testing:
>  - [x] Linux x86_64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `hotspot_gc_shenandoah`
>  - [x] Linux AArch64 `tier1` with Shenandoah

Aleksey Shipilev has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - A few minor touchups
 - Add a blurb to x86 code as well
 - Use implicit "consume" in AArch64, add more notes.
 - Merge branch 'master' into JDK-8261492-shenandoah-forwardee-memord
 - Make sure to access fwdptr with acquire semantics in assembler code
 - 8261492: Shenandoah: reconsider forwardee accesses memory ordering

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2496/files
  - new: https://git.openjdk.java.net/jdk/pull/2496/files/49626781..0c159cc3

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2496&range=01-02

  Stats: 11804 lines in 386 files changed: 5929 ins; 3700 del; 2175 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2496.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2496/head:pull/2496

PR: https://git.openjdk.java.net/jdk/pull/2496
12