Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

classic Classic list List threaded Threaded
15 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Eric Liu-2
On Mon, 4 Jan 2021 01:58:38 GMT, Eric Liu <[hidden email]> wrote:

>> This patch transforms '(x >>> rshift) + (x << lshift)' into
>> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents
>> are constants and their sum equals to the number of bits for the type
>> of shift base.
>>
>> This patch implements some new match rules for AArch64 instructions
>> which can take ROR as the optional shift. Such instructions are 'and',
>> 'or', 'eor', 'eon', 'bic' and 'orn'.
>>
>>   ror     w11, w2, #5
>>   eor     w0, w1, w11
>>
>> With this patch, above code could be optimized to below:
>>
>>   eor     w0, w1, w2, ror #5
>>
>> Finally, the patch refactors TestRotate.java[1][2].
>>
>> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps,
>> jdk::jdk_core, langtools::tier1.
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8252776
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html
>
> Could anyone help to take a look at this PR?

Ping

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Azeem Jiva-5
On Thu, 28 Jan 2021 01:47:32 GMT, Eric Liu <[hidden email]> wrote:

>> Could anyone help to take a look at this PR?
>
> Ping

What impact does this have on runtime?  There are also Tier1 failures in x86, which you'll need to investigate

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Eric Liu-2
On Thu, 28 Jan 2021 01:58:33 GMT, Azeem Jiva <[hidden email]> wrote:

>> Ping
>
> What impact does this have on runtime?  There are also Tier1 failures in x86, which you'll need to investigate

I suppose this failure was not caused by my patch, and it has been fixed by https://github.com/openjdk/jdk/commit/1594372c.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Andrew Haley
In reply to this post by Eric Liu-2
On 1/28/21 1:50 AM, Eric Liu wrote:
>
> Ping
>
> -------------
>
> PR: https://git.openjdk.java.net/jdk/pull/1858

I'm reviewing it now.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Andrew Haley
On 1/28/21 9:30 AM, Andrew Haley wrote:
> On 1/28/21 1:50 AM, Eric Liu wrote:
>>
>> Ping
>>
>> -------------
>>
>> PR: https://git.openjdk.java.net/jdk/pull/1858
>
> I'm reviewing it now.

Looks basically OK, but was tough to review because of the flags and
comments cleanup that made the diff hard to read.

An oddity. For rotations we have:

  match(Set dst (AndI src1 (XorI(RotateRight src2 src3) src4)));

but for all other shifts we have:

  match(Set dst (AndL src1 (XorL(RShiftL src2 src3) src4)));

That is to say, the other shifts have a type suffix. This doesn't look
right. Do you know what is going on?

Otherwise, this looks OK.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Eric Liu-2
In reply to this post by Eric Liu-2
On Thu, 28 Jan 2021 02:17:47 GMT, Eric Liu <[hidden email]> wrote:

>> What impact does this have on runtime?  There are also Tier1 failures in x86, which you'll need to investigate
>
> I suppose this failure was not caused by my patch, and it has been fixed by https://github.com/openjdk/jdk/commit/1594372c.

Thanks for your kindly review.

> An oddity. For rotations we have:
>
> match(Set dst (AndI src1 (XorI(RotateRight src2 src3) src4)));
>
> but for all other shifts we have:
>
> match(Set dst (AndL src1 (XorL(RShiftL src2 src3) src4)));
>
> That is to say, the other shifts have a type suffix. This doesn't look
> right. Do you know what is going on?

For rotation node, they have an extra field to identify the type, maybe this could save some code size...   And I didn't use 'predicate' to check that because middle-end could guarantee Xor's input type was correct.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Azeem Jiva-5
In reply to this post by Eric Liu-2
On Thu, 28 Jan 2021 02:17:47 GMT, Eric Liu <[hidden email]> wrote:

> I suppose this failure was not caused by my patch, and it has been fixed by https://github.com/openjdk/jdk/commit/1594372c.

Benchmark results?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Andrew Haley-2
In reply to this post by Eric Liu-2
On Mon, 21 Dec 2020 10:10:10 GMT, Eric Liu <[hidden email]> wrote:

> This patch transforms '(x >>> rshift) + (x << lshift)' into
> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents
> are constants and their sum equals to the number of bits for the type
> of shift base.
>
> This patch implements some new match rules for AArch64 instructions
> which can take ROR as the optional shift. Such instructions are 'and',
> 'or', 'eor', 'eon', 'bic' and 'orn'.
>
>   ror     w11, w2, #5
>   eor     w0, w1, w11
>
> With this patch, above code could be optimized to below:
>
>   eor     w0, w1, w2, ror #5
>
> Finally, the patch refactors TestRotate.java[1][2].
>
> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps,
> jdk::jdk_core, langtools::tier1.
>
> [1] https://bugs.openjdk.java.net/browse/JDK-8252776
> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html

All that remains to do is the benchmarks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Andrew Haley-2
In reply to this post by Eric Liu-2
On Thu, 28 Jan 2021 13:29:59 GMT, Eric Liu <[hidden email]> wrote:

> > An oddity. For rotations we have:
> > ``` match(Set dst (AndI src1 (XorI(RotateRight src2 src3) src4))); ```
> > but for all other shifts we have:
> > ``` match(Set dst (AndL src1 (XorL(RShiftL src2 src3) src4))); ```
> > That is to say, the other shifts have a type suffix. This doesn't look
> > right. Do you know what is going on?
>
> For rotation node, they have an extra field to identify the type, maybe this could save some code size... And I didn't use 'predicate' to check that because middle-end could guarantee Xor's input type was correct.

Ok, I see. That's very ugly, but it's not your fault.

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Eric Liu-2
In reply to this post by Andrew Haley-2
On Thu, 28 Jan 2021 14:26:32 GMT, Andrew Haley <[hidden email]> wrote:

>> This patch transforms '(x >>> rshift) + (x << lshift)' into
>> 'RotateRight(x, rshift)' during GVN phase when both the shift exponents
>> are constants and their sum equals to the number of bits for the type
>> of shift base.
>>
>> This patch implements some new match rules for AArch64 instructions
>> which can take ROR as the optional shift. Such instructions are 'and',
>> 'or', 'eor', 'eon', 'bic' and 'orn'.
>>
>>   ror     w11, w2, #5
>>   eor     w0, w1, w11
>>
>> With this patch, above code could be optimized to below:
>>
>>   eor     w0, w1, w2, ror #5
>>
>> Finally, the patch refactors TestRotate.java[1][2].
>>
>> Tested jtreg TestRotate.java, hotspot::hotspot_all_no_apps,
>> jdk::jdk_core, langtools::tier1.
>>
>> [1] https://bugs.openjdk.java.net/browse/JDK-8252776
>> [2] https://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2020-September/039911.html
>
> All that remains to do is the benchmarks.

I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Before:
Benchmark                Mode  Cnt    Score   Error  Units
Rotation.addRotateRight  avgt   15  270.080 ± 0.630  ns/op
Rotation.andRotateRight  avgt   15  269.000 ± 0.226  ns/op
Rotation.bicRotateRight  avgt   15  269.269 ± 0.769  ns/op
Rotation.eonRotateRight  avgt   15  269.456 ± 0.056  ns/op
Rotation.ornRotateRight  avgt   15  269.681 ± 1.028  ns/op
Rotation.xorRotateRight  avgt   15  270.953 ± 2.764  ns/op
After:
Benchmark                Mode  Cnt    Score    Error  Units
Rotation.addRotateRight  avgt   15  399.995 ± 10.702  ns/op
Rotation.andRotateRight  avgt   15  399.517 ±  7.933  ns/op
Rotation.bicRotateRight  avgt   15  396.602 ±  2.038  ns/op
Rotation.eonRotateRight  avgt   15  401.978 ± 17.371  ns/op
Rotation.ornRotateRight  avgt   15  431.052 ± 73.711  ns/op
Rotation.xorRotateRight  avgt   15  410.447 ± 42.929  ns/op
------------------
[1] http://cr.openjdk.java.net/~xgong/rfr/8256438/Rotation.java

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Andrew Haley
On 1/29/21 8:35 AM, Eric Liu wrote:
> I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Wow. I'll have to look at this.

In theory we prefer combined instructions because it should reduce code
size. However, we can't escape the possibility that on some implementations
combined instructions might cause pipeline stalls.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Andrew Haley
On 29/01/2021 10:11, Andrew Haley wrote:
> On 1/29/21 8:35 AM, Eric Liu wrote:
>> I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Your benchmark did not work for me. It did not generate the correct
instructions.

Please try with this or similar:

    @Benchmark
    public void xorRotateRight(MyState s, Blackhole blackhole) {
        int x = s.xi;
        int y = s.yi;
        for (int i = 0; i < COUNT; i++) {
            y = x ^ ((y >>> 5) | (y << -5));
            x = y ^ ((x >>> 5) | (x << -5));
        }
        blackhole.consume(x);
    }

I get:

Benchmark                             Mode  Cnt     Score    Error  Units
Rotation.xorRotateRight   (before)    avgt    3  6142.575 ± 15.940  ns/op
Rotation.xorRotateRight   (after)     avgt    3  4081.587 ± 33.904  ns/op

Please integrate the corrected benchmark into your patch.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

RE: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

Eric Liu
Hi Andrew,

Thanks for your feedback.

I refined those benchmarks and on two platforms could have a better performance.
One can get about 100% gain, and 30% for another one. Other platforms were hard to
see any performance changes.

Before:
  Benchmark                Mode  Cnt     Score   Error  Units
  Rotation.andRotateRight  avgt   15  3860.994 ± 3.409  ns/op
  Rotation.bicRotateRight  avgt   15  3861.247 ± 3.321  ns/op
  Rotation.eonRotateRight  avgt   15  3860.865 ± 3.003  ns/op
  Rotation.ornRotateRight  avgt   15  3860.884 ± 3.260  ns/op
  Rotation.xorRotateRight  avgt   15  3860.886 ± 2.728  ns/op

After:
  Benchmark                 Mode  Cnt     Score    Error  Units
  Rotation.andRotateRight   avgt   15  1933.495 ± 0.263  ns/op
  Rotation.bicRotateRight   avgt   15  1933.436 ± 0.244  ns/op
  Rotation.eonRotateRight   avgt   15  1933.459 ± 0.255  ns/op
  Rotation.ornRotateRight   avgt   15  1933.559 ± 0.316  ns/op
  Rotation.xorRotateRight   avgt   15  1933.467 ± 0.245  ns/op

I would update my patch with the benchmark tests after finishing the whole tests.

--Eric

-----Original Message-----
From: hotspot-compiler-dev <[hidden email]> On Behalf Of Andrew Haley
Sent: Saturday, January 30, 2021 12:30 AM
To: [hidden email]
Subject: Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value

On 29/01/2021 10:11, Andrew Haley wrote:
> On 1/29/21 8:35 AM, Eric Liu wrote:
>> I benchmarked on 7 platforms with jmh test[1], most of them were hard to see any performance changes. On one platform the performance become bad, and seems a little unstable.

Your benchmark did not work for me. It did not generate the correct instructions.

Please try with this or similar:

    @Benchmark
    public void xorRotateRight(MyState s, Blackhole blackhole) {
        int x = s.xi;
        int y = s.yi;
        for (int i = 0; i < COUNT; i++) {
            y = x ^ ((y >>> 5) | (y << -5));
            x = y ^ ((x >>> 5) | (x << -5));
        }
        blackhole.consume(x);
    }

I get:

Benchmark                             Mode  Cnt     Score    Error  Units
Rotation.xorRotateRight   (before)    avgt    3  6142.575 ± 15.940  ns/op
Rotation.xorRotateRight   (after)     avgt    3  4081.587 ± 33.904  ns/op

Please integrate the corrected benchmark into your patch.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com> https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3]

Eric Liu-2
In reply to this post by Andrew Haley-2
On Thu, 28 Jan 2021 14:26:32 GMT, Andrew Haley <[hidden email]> wrote:

>> Eric Liu has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Add benchmark test
>>  
>>   Change-Id: I63ca51d06070a07e5c20daf4b42d2c8d7237a1da
>
> All that remains to do is the benchmarks.

@theRealAph Could you help to take a look ?

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8256438: AArch64: Implement match rules with ROR shift register value [v3]

Eric Liu-2
On Tue, 9 Feb 2021 03:07:38 GMT, Eric Liu <[hidden email]> wrote:

>> All that remains to do is the benchmarks.
>
> @theRealAph Could you help to take a look ?

Ping

-------------

PR: https://git.openjdk.java.net/jdk/pull/1858