RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Xiaohong Gong
Since the vector bitwise `andNot` is implemented as `v1.and(v2.xor(-1))`, the code generated for SVE looks like:
  mov z16.b, #-1
  eor z17.d, z20.d, z16.d
  and z18.d, z18.d, z17.d
This could be improved with a single instruction:
  bic z16.d, z16.d, z18.d
Similarly, the following optimization for NEON is also needed:
  not v21.16b, v21.16b
  and v21.16b, v21.16b, v18.16b  ==>  bic v21.16b, v18.16b, v21.16b
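
As a rough scalar illustration of why the fusion is safe (a sketch only, not code from the patch; the class and method names are hypothetical), the composed form `a & (b ^ -1)` and the single-operation form `a & ~b` give the same result for every pair of byte values. The latter is the per-lane operation that one `bic` performs:

```java
public class AndNotIdentity {
    // Composed form, mirroring v1.and(v2.xor(-1)): XOR with all ones, then AND.
    static byte andNotComposed(byte a, byte b) {
        return (byte) (a & (b ^ -1));
    }

    // Fused form: AND with the bitwise complement of the second operand,
    // which is the per-lane result a single BIC computes.
    static byte andNotFused(byte a, byte b) {
        return (byte) (a & ~b);
    }

    // Exhaustively compares both forms over all 256 x 256 byte pairs.
    static boolean formsAgree() {
        for (int a = Byte.MIN_VALUE; a <= Byte.MAX_VALUE; a++) {
            for (int b = Byte.MIN_VALUE; b <= Byte.MAX_VALUE; b++) {
                if (andNotComposed((byte) a, (byte) b) != andNotFused((byte) a, (byte) b)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(formsAgree());
    }
}
```

In Vector API terms this corresponds to a lanewise `VectorOperators.AND_NOT`; the scalar loop is used here only so the sketch runs without the incubator module.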
This patch also adds the following optimization for vector `not` on SVE, which has already been added for NEON:
  mov z16.b, #-1
  eor z17.d, z20.d, z16.d     ==>   not z17.d, p7/m, z20.d
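
The same reasoning applies to `not`. The scalar sketch below (hypothetical names, not taken from the patch) shows that XOR with an all-ones constant and a direct bitwise complement produce identical lanes, so the constant materialization (`mov ... #-1`) is not needed:

```java
public class NotIdentity {
    // Two-step form: XOR each element with an all-ones constant (-1).
    static byte[] notViaXor(byte[] src) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) (src[i] ^ -1);
        }
        return dst;
    }

    // Direct form: bitwise complement of each element.
    static byte[] notDirect(byte[] src) {
        byte[] dst = new byte[src.length];
        for (int i = 0; i < src.length; i++) {
            dst[i] = (byte) ~src[i];
        }
        return dst;
    }

    public static void main(String[] args) {
        byte[] src = {0, 1, -1, 127, -128, 0x55};
        System.out.println(java.util.Arrays.equals(notViaXor(src), notDirect(src)));
    }
}
```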
With NEON, performance on the `AND_NOT` benchmark [1] improves by about 16% to 36%.

[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/ByteMaxVector.java#L343

Tested tier1 and jdk:tier3.

-------------

Commit messages:
 - 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Changes: https://git.openjdk.java.net/jdk/pull/3370/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3370&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264352
  Stats: 219 lines in 7 files changed: 185 ins; 0 del; 34 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3370.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3370/head:pull/3370

PR: https://git.openjdk.java.net/jdk/pull/3370

Re: RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Andrew Haley-2
On Wed, 7 Apr 2021 05:53:46 GMT, Xiaohong Gong <[hidden email]> wrote:

> Since the vector bitwise `andNot` is implemented as `v1.and(v2.xor(-1))`, the code generated for SVE looks like:
>   mov z16.b, #-1
>   eor z17.d, z20.d, z16.d
>   and z18.d, z18.d, z17.d
> This could be improved with a single instruction:
>   bic z16.d, z16.d, z18.d
> Similarly, the following optimization for NEON is also needed:
>   not v21.16b, v21.16b
>   and v21.16b, v21.16b, v18.16b  ==>  bic v21.16b, v18.16b, v21.16b
> This patch also adds the following optimization for vector `not` on SVE, which has already been added for NEON:
>   mov z16.b, #-1
>   eor z17.d, z20.d, z16.d     ==>   not z17.d, p7/m, z20.d
> With NEON, performance on the `AND_NOT` benchmark [1] improves by about 16% to 36%.
>
> [1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/jdk/jdk/incubator/vector/benchmark/src/main/java/benchmark/jdk/incubator/vector/ByteMaxVector.java#L343
>
> Tested tier1 and jdk:tier3.

Looks OK. Is there any test code for this in mainline?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3370

Re: RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Xiaohong Gong
On Wed, 7 Apr 2021 08:31:19 GMT, Andrew Haley <[hidden email]> wrote:

> Looks OK. Is there any test code for this in mainline?

Hi @theRealAph, thanks for looking at this PR. Yes, the Vector API jtreg tests already cover the `NOT` and `AND_NOT` opcodes.
Please see the byte vector tests: https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/ByteMaxVectorTests.java#L1708
and https://github.com/openjdk/jdk/blob/master/test/jdk/jdk/incubator/vector/ByteMaxVectorTests.java#L4602

-------------

PR: https://git.openjdk.java.net/jdk/pull/3370

Re: RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Andrew Haley-2
In reply to this post by Xiaohong Gong

Marked as reviewed by aph (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/3370

Re: RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Ningsheng Jian-2
In reply to this post by Xiaohong Gong

Looks good.

-------------

Marked as reviewed by njian (Committer).

PR: https://git.openjdk.java.net/jdk/pull/3370

Re: RFR: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Xiaohong Gong
In reply to this post by Andrew Haley-2
On Wed, 7 Apr 2021 09:03:55 GMT, Andrew Haley <[hidden email]> wrote:

> Marked as reviewed by aph (Reviewer).

Thanks for the review, @theRealAph and @nsjian!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3370

Integrated: 8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Xiaohong Gong
In reply to this post by Xiaohong Gong

This pull request has now been integrated.

Changeset: e89542fb
Author:    Xiaohong Gong <[hidden email]>
Committer: Ningsheng Jian <[hidden email]>
URL:       https://git.openjdk.java.net/jdk/commit/e89542fb
Stats:     219 lines in 7 files changed: 185 ins; 0 del; 34 mod

8264352: AArch64: Optimize vector "not/andNot" for NEON and SVE

Reviewed-by: aph, njian

-------------

PR: https://git.openjdk.java.net/jdk/pull/3370