Quantcast

RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
In C2 we use LDAR/STLR to handle volatile accesses, but in C1 and the
interpreter we use separate DMB instructions and relaxed loads. When
used together, these do not form a sequentially-consistent memory
ordering. For example, if stores use STLR and loads use LDR;DMB a
simple Dekker idiom will fail.

This is extremely hard to test because the loads and stores have to be
in separately-compiled methods, but it is incorrect, and likely to
fail in very weakly-ordered implementations.

Note: this is for JDK 9.

 http://cr.openjdk.java.net/~aph/8179954/

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Aleksey Shipilev-4
On 05/09/2017 07:18 PM, Andrew Haley wrote:

> In C2 we use LDAR/STLR to handle volatile accesses, but in C1 and the
> interpreter we use separate DMB instructions and relaxed loads. When
> used together, these do not form a sequentially-consistent memory
> ordering. For example, if stores use STLR and loads use LDR;DMB a
> simple Dekker idiom will fail.
>
> This is extremely hard to test because the loads and stores have to be
> in separately-compiled methods, but it is incorrect, and likely to
> fail in very weakly-ordered implementations.
>
> Note: this is for JDK 9.
>
>  http://cr.openjdk.java.net/~aph/8179954/

Makes sense to me.

-Aleksey



Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

White, Derek
Hi Andrew,

Good catch!

My only comment is in src/cpu/aarch64/vm/templateTable_aarch64.cpp, TemplateTable::getfield_or_static():

The comment in the trailing member around line 2542 says:
 - "It's really not worth bothering to check whether this field really is volatile in the slow case."

But getfield_or_static() is used once to "quicken" getfield byte codes, as well as used forevermore on all getstatic bytecodes (and some weird cases in class sharing?).

I can't claim that makes a definite performance difference (it's just the interpreter), but adding an additional unconditional membar might make it more likely to matter.

FYI, the ppc and arm64 ports do check if the field is volatile before executing the membar (s in the ppc case).

Thanks,

 - Derek

-----Original Message-----
From: hotspot-dev [mailto:[hidden email]] On Behalf Of Aleksey Shipilev
Sent: Tuesday, May 09, 2017 1:30 PM
To: Andrew Haley <[hidden email]>; hotspot-dev Source Developers <[hidden email]>
Subject: Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

On 05/09/2017 07:18 PM, Andrew Haley wrote:

> In C2 we use LDAR/STLR to handle volatile accesses, but in C1 and the
> interpreter we use separate DMB instructions and relaxed loads. When
> used together, these do not form a sequentially-consistent memory
> ordering. For example, if stores use STLR and loads use LDR;DMB a
> simple Dekker idiom will fail.
>
> This is extremely hard to test because the loads and stores have to be
> in separately-compiled methods, but it is incorrect, and likely to
> fail in very weakly-ordered implementations.
>
> Note: this is for JDK 9.
>
>  http://cr.openjdk.java.net/~aph/8179954/

Makes sense to me.

-Aleksey



Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

David Holmes
In reply to this post by Andrew Haley
Hi Andrew,

On 10/05/2017 3:18 AM, Andrew Haley wrote:
> In C2 we use LDAR/STLR to handle volatile accesses, but in C1 and the
> interpreter we use separate DMB instructions and relaxed loads. When
> used together, these do not form a sequentially-consistent memory
> ordering. For example, if stores use STLR and loads use LDR;DMB a
> simple Dekker idiom will fail.

I'm somewhat confused by this description. Outside of Aarch64 the
general approach, for C1 and Unsafe at least, is that a volatile-read is
a load-acquire() (or a fence-load-acquire if you want the IRIW support)
and a volatile write is a release-store-fence (or just release-store
with IRIW support). Does Aarch64 not follow this pattern?

I'm trying to see if the issue here is the original code generation or a
subtle incompatibility between the ld-acq/st-rel instructions and
explicit DMB.

Thanks,
David

> This is extremely hard to test because the loads and stores have to be
> in separately-compiled methods, but it is incorrect, and likely to
> fail in very weakly-ordered implementations.
>
> Note: this is for JDK 9.
>
>  http://cr.openjdk.java.net/~aph/8179954/
>
> Andrew.
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
In reply to this post by White, Derek
On 09/05/17 22:39, White, Derek wrote:

> My only comment is in src/cpu/aarch64/vm/templateTable_aarch64.cpp, TemplateTable::getfield_or_static():
>
> The comment in the trailing member around line 2542 says:
>  - "It's really not worth bothering to check whether this field
> really is volatile in the slow case."
>
> But getfield_or_static() is used once to "quicken" getfield byte
> codes, as well as used forevermore on all getstatic bytecodes (and
> some weird cases in class sharing?).
>
> I can't claim that makes a definite performance difference (it's
> just the interpreter), but adding an additional unconditional membar
> might make it more likely to matter.
>
> FYI, the ppc and arm64 ports do check if the field is volatile
> before executing the membar (s in the ppc case).

Sure.  There's always a trade-off between code complexity and absolute
speed, and we tend to concentrate our fire where it can help the most,
and that's probably not in the slow getfield part of the interpreter.

I know that mispredicted branches hurt, and that fence instructions
hurt; I don't know which hurts the most in general.  I'm not at all
sure that the conditional branches around the fences are right, and it
might work better if they all were removed.  The right way to fix this
case is to rewrite the interpreter to use stlr and ldar, but that's
too complex for JDK9.  It'd be nice to get it done for JDK10.

So yeah, you might have a point, but I really don't think it's worth
changing the patch for JDK9.

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
In reply to this post by David Holmes
On 10/05/17 05:04, David Holmes wrote:
> I'm somewhat confused by this description. Outside of Aarch64 the
> general approach, for C1 and Unsafe at least, is that a volatile-read is
> a load-acquire() (or a fence-load-acquire if you want the IRIW support)
> and a volatile write is a release-store-fence (or just release-store
> with IRIW support). Does Aarch64 not follow this pattern?

No.  AArch64 has its own sequentially-consistent load and store
instructions which are designed to provide just enough for volatiles
but no more.  These are preferable to using fences, but that's hard in
C1 because shared code inserts fences, regardless of the target
machine.  This is wrong, but it's legacy code.

> I'm trying to see if the issue here is the original code generation or a
> subtle incompatibility between the ld-acq/st-rel instructions and
> explicit DMB.

I wouldn't be surprised.  The problem is that the approach taken in
HotSpot is much too naive.  There is not an exact correspondence
between real processors' fence instructions and what we need for
Hotspot.  The best mappings are here:

https://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html

The ones we need for volatiles are the Seq Cst set.

As you can see, the choices of instruction sequences are different for
different processors.  C1 (and all the compilers) should delegate this
to the back ends, but instead they try map volatile accesses onto
acquire/release/fence.

PPC has special code which is #ifdef'd in the shared code in the
compilers, so I'm sure it gets this right.

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Dinn
In reply to this post by Andrew Haley
On 09/05/17 18:18, Andrew Haley wrote:
> In C2 we use LDAR/STLR to handle volatile accesses, but in C1 and the
> interpreter we use separate DMB instructions and relaxed loads. When
> used together, these do not form a sequentially-consistent memory
> ordering. For example, if stores use STLR and loads use LDR;DMB a
> simple Dekker idiom will fail.

Oh, well caught!

> This is extremely hard to test because the loads and stores have to be
> in separately-compiled methods, but it is incorrect, and likely to
> fail in very weakly-ordered implementations.

Not to mention hard to debug ;-)

> Note: this is for JDK 9.
>
>  http://cr.openjdk.java.net/~aph/8179954/

Yes, this patch looks good and really ought to go into jdk9.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
In reply to this post by Andrew Haley
So you were right.  Dammit.  :-)

This test:

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        static int nn = 99;
    }
    @Benchmark
    public int testMethod(BenchmarkState state) {
        return state.nn;
    }

Interpreter-only, before my patch:

Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt    5  92.938 ? 0.870  ns/op

After my patch, it's slower:

# Run complete. Total time: 00:00:07

Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt    5  94.518 ? 0.562  ns/op

But if I insert conditional branches around the fences as you suggest,
the result is much better:

Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt   25  83.825 ? 0.161  ns/op

New patch at http://cr.openjdk.java.net/~aph/8179954-2/

OK?

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

White, Derek
So good news - you made getstatic 5x faster. Bad news - only in the interpreter.

So this one goes out to all the static initializers!

Actual code review question:
In templateTable_aarch64.cpp:
  Line 2411: We tbz on flags.
  Line 2423: We extract into flags.
  Line 2547: We tbz on flags.
    - Is the volatile bit still in flags?

Thanks!

 - Derek

-----Original Message-----
From: Andrew Haley [mailto:[hidden email]]
Sent: Wednesday, May 10, 2017 10:19 AM
To: White, Derek <[hidden email]>; Aleksey Shipilev <[hidden email]>; hotspot-dev Source Developers <[hidden email]>
Subject: Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

So you were right.  Dammit.  :-)

This test:

    @State(Scope.Benchmark)
    public static class BenchmarkState {
        static int nn = 99;
    }
    @Benchmark
    public int testMethod(BenchmarkState state) {
        return state.nn;
    }

Interpreter-only, before my patch:

Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt    5  92.938 ? 0.870  ns/op

After my patch, it's slower:

# Run complete. Total time: 00:00:07

Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt    5  94.518 ? 0.562  ns/op

But if I insert conditional branches around the fences as you suggest, the result is much better:

Benchmark               Mode  Cnt   Score   Error  Units
MyBenchmark.testMethod  avgt   25  83.825 ? 0.161  ns/op

New patch at http://cr.openjdk.java.net/~aph/8179954-2/

OK?

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
On 10/05/17 17:42, White, Derek wrote:
> So this one goes out to all the static initializers!
>
> Actual code review question:
> In templateTable_aarch64.cpp:
>   Line 2411: We tbz on flags.
>   Line 2423: We extract into flags.
>   Line 2547: We tbz on flags.
>     - Is the volatile bit still in flags?

Ummm, probably not.  Yuck.  I'll think on.

You can see why I couldn't be bothered to handle the volatile bit.

Andrew.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
In reply to this post by White, Derek
Thanks for finding that bug.  It's a reminder that we need to
concentrate on doing the minimum at this stage to ensure correctness.
Our performance is good, and this change is not hugely profitable.  It's
not worth risking the whole ship for.

Nevertheless, I've made a new webrev, which does the right thing.  I
stepped through the code to make sure.  I've come this far, so I might
as well get it right.  (Yes, I'm aware that I just fell into the sunk
cost fallacy, but I want something to show for the work I've done.)

http://cr.openjdk.java.net/~aph/8179954-3/

In JDK 10 we should look at replacing all the explicit fences used for
volatiles with LDAR/STLR.

OK?  I'd like two reviewers for this one.

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Stuart Monteith
Hi,
   It looks fine as far as I can tell. Could the comment be explicit
about replacing the code sequence:

STLR
LDR
DMB

with:
  STLR
  DMB
  LDR

as initially, I was thinking about them being on different threads.
(although, looking at the thread, it was probably just me that thought
that).

BR,
   Stuart


On 11 May 2017 at 13:31, Andrew Haley <[hidden email]> wrote:

> Thanks for finding that bug.  It's a reminder that we need to
> concentrate on doing the minimum at this stage to ensure correctness.
> Our performance is good, and this change is not hugely profitable.  It's
> not worth risking the whole ship for.
>
> Nevertheless, I've made a new webrev, which does the right thing.  I
> stepped through the code to make sure.  I've come this far, so I might
> as well get it right.  (Yes, I'm aware that I just fell into the sunk
> cost fallacy, but I want something to show for the work I've done.)
>
> http://cr.openjdk.java.net/~aph/8179954-3/
>
> In JDK 10 we should look at replacing all the explicit fences used for
> volatiles with LDAR/STLR.
>
> OK?  I'd like two reviewers for this one.
>
> Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RE: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

White, Derek
In reply to this post by Andrew Haley
Hi Andrew,

I understand your point on correctness. But thanks for fighting through this one.

Code looks good.

 - Derek

-----Original Message-----
From: Andrew Haley [mailto:[hidden email]]
Sent: Thursday, May 11, 2017 8:31 AM
To: White, Derek <[hidden email]>; Aleksey Shipilev <[hidden email]>; hotspot-dev Source Developers <[hidden email]>
Subject: Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Thanks for finding that bug.  It's a reminder that we need to concentrate on doing the minimum at this stage to ensure correctness.
Our performance is good, and this change is not hugely profitable.  It's not worth risking the whole ship for.

Nevertheless, I've made a new webrev, which does the right thing.  I stepped through the code to make sure.  I've come this far, so I might as well get it right.  (Yes, I'm aware that I just fell into the sunk cost fallacy, but I want something to show for the work I've done.)

http://cr.openjdk.java.net/~aph/8179954-3/

In JDK 10 we should look at replacing all the explicit fences used for volatiles with LDAR/STLR.

OK?  I'd like two reviewers for this one.

Andrew.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Haley
In reply to this post by Stuart Monteith
On 11/05/17 16:34, Stuart Monteith wrote:

>    It looks fine as far as I can tell. Could the comment be explicit
> about replacing the code sequence:
>
> STLR
> LDR
> DMB
>
> with:
>   STLR
>   DMB
>   LDR

I've put it in the bug report.  I think that should do it.

Andrew.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Roland Westrelin-3
In reply to this post by Andrew Haley

> http://cr.openjdk.java.net/~aph/8179954-3/

That looks good to me.

Roland.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Dinn
In reply to this post by Andrew Haley
On 11/05/17 13:31, Andrew Haley wrote:

> Thanks for finding that bug.  It's a reminder that we need to
> concentrate on doing the minimum at this stage to ensure correctness.
> Our performance is good, and this change is not hugely profitable.  It's
> not worth risking the whole ship for.
>
> Nevertheless, I've made a new webrev, which does the right thing.  I
> stepped through the code to make sure.  I've come this far, so I might
> as well get it right.  (Yes, I'm aware that I just fell into the sunk
> cost fallacy, but I want something to show for the work I've done.)
>
> http://cr.openjdk.java.net/~aph/8179954-3/
>
> In JDK 10 we should look at replacing all the explicit fences used for
> volatiles with LDAR/STLR.
>
> OK?  I'd like two reviewers for this one.

The latest patch also looks good by eyeball.

I'm currently checking it builds and runs ok. Will gte back soon re that.

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Andrew Dinn
On 11/05/17 17:27, Andrew Dinn wrote:
> On 11/05/17 13:31, Andrew Haley wrote:
>> OK?  I'd like two reviewers for this one.
>
> The latest patch also looks good by eyeball.
>
> I'm currently checking it builds and runs ok. Will gte back soon re that.

Yes, it builds for me and my build runs java Hello, javac Hello.java and
netbeans ok.

I cannot guarantee that I am likely to have tested the actual fix since,
as Andrew points out, it needs to have code with different levels of
compilation happen to mix in the right way. But this at least suggests
that nothing got side-swiped along the way.

So, I say ship it :-)

regards,


Andrew Dinn
-----------
Senior Principal Software Engineer
Red Hat UK Ltd
Registered in England and Wales under Company Registration No. 03798903
Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR: AArch64: 8179954: AArch64: C1 and C2 volatile accesses are not sequentially consistent

Stuart Monteith
Looks good to me - a quick "sanity" quick on JCStress ran cleanly too.

On 11 May 2017 at 17:52, Andrew Dinn <[hidden email]> wrote:

> On 11/05/17 17:27, Andrew Dinn wrote:
>> On 11/05/17 13:31, Andrew Haley wrote:
>>> OK?  I'd like two reviewers for this one.
>>
>> The latest patch also looks good by eyeball.
>>
>> I'm currently checking it builds and runs ok. Will gte back soon re that.
>
> Yes, it builds for me and my build runs java Hello, javac Hello.java and
> netbeans ok.
>
> I cannot guarantee that I am likely to have tested the actual fix since,
> as Andrew points out, it needs to have code with different levels of
> compilation happen to mix in the right way. But this at least suggests
> that nothing got side-swiped along the way.
>
> So, I say ship it :-)
>
> regards,
>
>
> Andrew Dinn
> -----------
> Senior Principal Software Engineer
> Red Hat UK Ltd
> Registered in England and Wales under Company Registration No. 03798903
> Directors: Michael Cunningham, Michael ("Mike") O'Neill, Eric Shander
Loading...