8261649: AArch64: Optimize LSE atomics in C++ code
------------- Commit messages: - Restore LSE CAS generation. - Merge https://github.com/openjdk/jdk into JDK-8261650 - Trailing membar. - Flush everything correctly - Committed - Intermediate Changes: https://git.openjdk.java.net/jdk/pull/2611/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261649 Stats: 282 lines in 5 files changed: 166 ins; 51 del; 65 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 |
On Wed, 17 Feb 2021 17:14:46 GMT, Andrew Haley <[hidden email]> wrote:
> 8261649: AArch64: Optimize LSE atomics in C++ code src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5651: > 5649: __ mov(prev, compare_val); > 5650: __ lse_cas(prev, exchange_val, ptr, size, acquire, release, /*not_pair*/true); > 5651: if (acquire && release) { These two flags are only ever passed as true,true or false,false. Does any other combination make sense? If not then should you not be using a single flag? or at least asserting (pro tem) that they are both equal? src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S line 75: > 73: .align 5 > 74: aarch64_atomic_cmpxchg_1_default_impl: > 75: dmb ish Having argued above that this dmb is never needed why is it in this default impl? (also for size 4 and 8) ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
On Wed, 17 Feb 2021 17:48:06 GMT, Andrew Dinn <[hidden email]> wrote:
>> 8261649: AArch64: Optimize LSE atomics in C++ code > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5651: > >> 5649: __ mov(prev, compare_val); >> 5650: __ lse_cas(prev, exchange_val, ptr, size, acquire, release, /*not_pair*/true); >> 5651: if (acquire && release) { > > These two flags are only ever passed as true,true or false,false. Does any other combination make sense? If not then should you not be using a single flag? or at least asserting (pro tem) that they are both equal? Today HotSpot only really supports mo_conservative and mo_relaxed, but there are many where release on its own would make sense; I think Aleksey recently found some. Having said that, it would be clearer here to expose mo_conservative as well. I'll do so. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Dinn-2
On Wed, 17 Feb 2021 17:51:35 GMT, Andrew Dinn <[hidden email]> wrote:
>> 8261649: AArch64: Optimize LSE atomics in C++ code > > src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S line 75: > >> 73: .align 5 >> 74: aarch64_atomic_cmpxchg_1_default_impl: >> 75: dmb ish > > Having argued above that this dmb is never needed why is it in this default impl? (also for size 4 and 8) The default impl uses ARMv8 LDXR ... STXR but the DMB is not needed if and only if ARMv8.1 LSE instructions. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
On Thu, 18 Feb 2021 09:20:21 GMT, Andrew Haley <[hidden email]> wrote:
>> src/hotspot/os_cpu/linux_aarch64/atomic_linux_aarch64.S line 75: >> >>> 73: .align 5 >>> 74: aarch64_atomic_cmpxchg_1_default_impl: >>> 75: dmb ish >> >> Having argued above that this dmb is never needed why is it in this default impl? (also for size 4 and 8) > > The default impl uses ARMv8 LDXR ... STXR but the DMB is not needed if and only if ARMv8.1 LSE instructions. Doh! And it says that clearly in the comment. Ok. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
On Wed, 17 Feb 2021 17:14:46 GMT, Andrew Haley <[hidden email]> wrote:
> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Ok, this looks good enough to me. ------------- Marked as reviewed by adinn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal.
> > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Make things slightly less confusing ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2611/files - new: https://git.openjdk.java.net/jdk/pull/2611/files/23df4485..68ff88d9 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=00-01 Stats: 21 lines in 1 file changed: 8 ins; 0 del; 13 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
On Thu, 18 Feb 2021 09:15:12 GMT, Andrew Haley <[hidden email]> wrote:
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5651: >> >>> 5649: __ mov(prev, compare_val); >>> 5650: __ lse_cas(prev, exchange_val, ptr, size, acquire, release, /*not_pair*/true); >>> 5651: if (acquire && release) { >> >> These two flags are only ever passed as true,true or false,false. Does any other combination make sense? If not then should you not be using a single flag? or at least asserting (pro tem) that they are both equal? > > Today HotSpot only really supports mo_conservative and mo_relaxed, but there are many places in HotSpot where release on its own would make sense; I think Aleksey recently found some. Having said that, it would be clearer here to expose mo_conservative as well. I'll do so. Clearer now? ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal.
> > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Remove mistaken change. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2611/files - new: https://git.openjdk.java.net/jdk/pull/2611/files/68ff88d9..6b338901 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=01-02 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 |
On Thu, 18 Feb 2021 16:52:29 GMT, Andrew Haley <[hidden email]> wrote:
>> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. >> >> This patch: >> >> Moves memory barriers from the atomic_linux_aarch64 file into the stubs. >> Rewrites the LSE versions of the stubs to be more efficient. >> Fixes a race condition in stub generation. >> Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Remove mistaken change. Passer-by comments... src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 192: > 190: if (_cpu == CPU_ARM && (_model == 0xd07 || _model2 == 0xd07)) _features |= CPU_STXR_PREFETCH; > 191: > 192: _features |= CPU_STXR_PREFETCH; This looks weird. The line above it adds `CPU_STXR_PREFETCH` conditionally. Which one is correct? ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Dinn-2
On Thu, 18 Feb 2021 16:28:47 GMT, Andrew Dinn <[hidden email]> wrote:
> Ok, this looks good enough to me. I pulled the vm_version.cpp change, which I committed by mistake. I think it's the right thing to do, but it needs a separate discussion. OK? ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
On Thu, 18 Feb 2021 16:49:00 GMT, Andrew Haley <[hidden email]> wrote:
>> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. >> >> This patch: >> >> Moves memory barriers from the atomic_linux_aarch64 file into the stubs. >> Rewrites the LSE versions of the stubs to be more efficient. >> Fixes a race condition in stub generation. >> Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. > > Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: > > Make things slightly less confusing src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5655: > 5653: acquire = false, release = false; break; > 5654: default: > 5655: acquire = true, release = true; break; I think hotspot style is to indent the `case`-s. Also commas in `acquire` and `release` assignments look weird. Consider: Register prev = r3; Register ptr = c_rarg0; Register compare_val = c_rarg1; Register exchange_val = c_rarg2; bool acquire, release; switch (order) { case memory_order_relaxed: acquire = false; release = false; break; default: acquire = true; release = true; break; } ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
On Thu, 18 Feb 2021 16:49:26 GMT, Aleksey Shipilev <[hidden email]> wrote:
>> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Make things slightly less confusing > > src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5655: > >> 5653: acquire = false, release = false; break; >> 5654: default: >> 5655: acquire = true, release = true; break; > > I think hotspot style is to indent the `case`-s. Also commas in `acquire` and `release` assignments look weird. Consider: > > Register prev = r3; > Register ptr = c_rarg0; > Register compare_val = c_rarg1; > Register exchange_val = c_rarg2; > > bool acquire, release; > switch (order) { > case memory_order_relaxed: > acquire = false; > release = false; > break; > default: > acquire = true; > release = true; > break; > } Oh noes! I'll have to change my emacs indentation setup. I may be gone some time... ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal.
> > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: Switch statement layout. ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2611/files - new: https://git.openjdk.java.net/jdk/pull/2611/files/6b338901..83016c1d Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=03 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2611&range=02-03 Stats: 8 lines in 1 file changed: 4 ins; 0 del; 4 mod Patch: https://git.openjdk.java.net/jdk/pull/2611.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2611/head:pull/2611 PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
On Thu, 18 Feb 2021 16:58:39 GMT, Andrew Haley <[hidden email]> wrote:
>> src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp line 5655: >> >>> 5653: acquire = false, release = false; break; >>> 5654: default: >>> 5655: acquire = true, release = true; break; >> >> I think hotspot style is to indent the `case`-s. Also commas in `acquire` and `release` assignments look weird. Consider: >> >> Register prev = r3; >> Register ptr = c_rarg0; >> Register compare_val = c_rarg1; >> Register exchange_val = c_rarg2; >> >> bool acquire, release; >> switch (order) { >> case memory_order_relaxed: >> acquire = false; >> release = false; >> break; >> default: >> acquire = true; >> release = true; >> break; >> } > > Oh noes! I'll have to change my emacs indentation setup. I may be gone some time... Ha! Turns out it's an emacs FAQ: (c-set-offset 'case-label '+) ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Aleksey Shipilev-5
On Thu, 18 Feb 2021 16:43:29 GMT, Aleksey Shipilev <[hidden email]> wrote:
>> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch statement layout. > > src/hotspot/cpu/aarch64/vm_version_aarch64.cpp line 192: > >> 190: if (_cpu == CPU_ARM && (_model == 0xd07 || _model2 == 0xd07)) _features |= CPU_STXR_PREFETCH; >> 191: >> 192: _features |= CPU_STXR_PREFETCH; > > This looks weird. The line above it adds `CPU_STXR_PREFETCH` conditionally. Which one is correct? It was a mistake. Gone now. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Aleksey Shipilev-5
On Thu, 18 Feb 2021 16:52:29 GMT, Aleksey Shipilev <[hidden email]> wrote:
>> Andrew Haley has updated the pull request incrementally with one additional commit since the last revision: >> >> Switch statement layout. > > Passer-by comments... The memory_order_conservative/relaxed change is better. still good to go. ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
In reply to this post by Andrew Haley-2
On Wed, 17 Feb 2021 17:14:46 GMT, Andrew Haley <[hidden email]> wrote:
> Now that we have support for LSE atomics in C++ HotSpot source, we can generate much better code for them. In particular, the sequence we generate for CMPXCHG with a full two-way barrier using two DMBs is way suboptimal. > > This patch: > > Moves memory barriers from the atomic_linux_aarch64 file into the stubs. > Rewrites the LSE versions of the stubs to be more efficient. > Fixes a race condition in stub generation. > Mostly leaves the pre-LSE stubs alone, except that I added a PRFM which according to kernel engineers improves performance. This pull request has now been integrated. Changeset: 1b0c36b0 Author: Andrew Haley <[hidden email]> URL: https://git.openjdk.java.net/jdk/commit/1b0c36b0 Stats: 285 lines in 4 files changed: 170 ins; 45 del; 70 mod 8261649: AArch64: Optimize LSE atomics in C++ code Reviewed-by: adinn ------------- PR: https://git.openjdk.java.net/jdk/pull/2611 |
Free forum by Nabble | Edit this page |