RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width

classic Classic list List threaded Threaded
34 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width

Dong Bo
In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
    /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
    public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);

The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 8`, but `ushr dst.4H, src.4H, 16` instead.
According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);

The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en

This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

-------------

Commit messages:
 - fix trailing whitespaces
 - 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width

Changes: https://git.openjdk.java.net/jdk/pull/2472/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261142
  Stats: 771 lines in 3 files changed: 641 ins; 17 del; 113 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width

Ningsheng Jian-2
On Tue, 9 Feb 2021 06:55:50 GMT, Dong Bo <[hidden email]> wrote:

> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Thanks for the fix.

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285:

> 5283:   ins_encode %{
> 5284:     int sh = (int)$shift$$constant;
> 5285:     if (sh == 0) {

If src and dst are the same reg, no need to emit code. Or maybe c2 can even be improved to optimize this (sh=0 case) out?

src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271:

> 5269:     } else {
> 5270:       if (sh >= 8) sh = 7;
> 5271:       __ sshr(as_FloatRegister($dst$$reg), __ T8B,

I think we should add an assert to make sure 0 is not passed to the assembler.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width

Dong Bo
On Tue, 9 Feb 2021 07:47:57 GMT, Ningsheng Jian <[hidden email]> wrote:

> If src and dst are the same reg, no need to emit code.

If we want to do this enhancement, I think we need do it for left shifting and all SVE left/right shifting as well for completeness.

> Or maybe c2 can even be improved to optimize this (sh=0 case) out?

We can add code in `Ideal` to optimize it to ORR, but I'm not sure `orr` performs better than `shift` on other platforms.
Seems we have to created a generic new node to do `vector move` here.

> src/hotspot/cpu/aarch64/aarch64_neon.ad line 5271:
>
>> 5269:     } else {
>> 5270:       if (sh >= 8) sh = 7;
>> 5271:       __ sshr(as_FloatRegister($dst$$reg), __ T8B,
>
> I think we should add an assert to make sure 0 is not passed to the assembler.

Agree, I'll do this.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

  add assertion in the assembler

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/c44bebb0..8439f167

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=00-01

  Stats: 1 line in 1 file changed: 1 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2]

Andrew Haley-2
On Tue, 9 Feb 2021 09:13:47 GMT, Dong Bo <[hidden email]> wrote:

>> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
>> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>>
>> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
>> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
>> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
>> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
>> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>>
>> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
>> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>>
>> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
>> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.
>
> Dong Bo has updated the pull request incrementally with one additional commit since the last revision:
>
>   add assertion in the assembler

src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057:

> 2055:              as_FloatRegister($src$$reg), as_FloatRegister($src$$reg));
> 2056:     } else {ifelse($4, B,`
> 2057:       if (sh >= 8) sh = 7;

I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2]

Ningsheng Jian-2
In reply to this post by Dong Bo
On Tue, 9 Feb 2021 08:53:14 GMT, Dong Bo <[hidden email]> wrote:

>> src/hotspot/cpu/aarch64/aarch64_neon.ad line 5285:
>>
>>> 5283:   ins_encode %{
>>> 5284:     int sh = (int)$shift$$constant;
>>> 5285:     if (sh == 0) {
>>
>> If src and dst are the same reg, no need to emit code. Or maybe c2 can even be improved to optimize this (sh=0 case) out?
>
>> If src and dst are the same reg, no need to emit code.
>
> If we want to do this enhancement, I think we need do it for left shifting and all SVE left/right shifting as well for completeness.
>
>> Or maybe c2 can even be improved to optimize this (sh=0 case) out?
>
> We can add code in `Ideal` to optimize it to ORR, but I'm not sure `orr` performs better than `shift` on other platforms.
> Seems we have to created a generic new node to do `vector move` here.

I think with proper optimization, no move is required. But I agree it's beyond the scope of this patch. I will have a look.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v3]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  back out AD modifications and handle zero shift in assembler

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/8439f167..af3f2a15

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=01-02

  Stats: 284 lines in 3 files changed: 19 ins; 143 del; 122 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2]

Dong Bo
In reply to this post by Andrew Haley-2
On Tue, 9 Feb 2021 09:29:50 GMT, Andrew Haley <[hidden email]> wrote:

>> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
>
> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057:
>
>> 2055:              as_FloatRegister($src$$reg), as_FloatRegister($src$$reg));
>> 2056:     } else {ifelse($4, B,`
>> 2057:       if (sh >= 8) sh = 7;
>
> I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow.

I backed out the modifications of `aarch64_neon.ad` and `aarch64_neon_ad.m4`.
The `shift == 0` case is handled by the assembler now. Verified with the regression tests.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v4]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

  generate add if shift == 0 for accumulation and fix some test code

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/af3f2a15..a7b72b0a

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=02-03

  Stats: 127 lines in 2 files changed: 27 ins; 0 del; 100 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v5]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has updated the pull request incrementally with one additional commit since the last revision:

  fix windows build failure

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/a7b72b0a..d75ee99e

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=04
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=03-04

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:

 - add tests in aarch64-asmtest.py
 - fix windows operator precedence error and cleanup testcase
 - Merge branch 'master' into aarch64_vector_api_shift
 - fix windows build failure
 - generate add if shift == 0 for accumulation and fix some test code
 - back out AD modifications and handle zero shift in assembler

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/d75ee99e..d746f209

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=05
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=04-05

  Stats: 22519 lines in 709 files changed: 13758 ins; 4768 del; 3993 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v2]

Dong Bo
In reply to this post by Dong Bo
On Wed, 10 Feb 2021 02:59:24 GMT, Dong Bo <[hidden email]> wrote:

>> src/hotspot/cpu/aarch64/aarch64_neon_ad.m4 line 2057:
>>
>>> 2055:              as_FloatRegister($src$$reg), as_FloatRegister($src$$reg));
>>> 2056:     } else {ifelse($4, B,`
>>> 2057:       if (sh >= 8) sh = 7;
>>
>> I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow.
>
> I backed out the modifications of `aarch64_neon.ad` and `aarch64_neon_ad.m4`.
> The `shift == 0` case is handled by the assembler now. Verified with the regression tests.

> I think it would be possible to move some of this logic from the AD file into MacroAssembler, with macros to generate the appropriate instruction based on their arguments. This might be cleaner: the logic here is very hard to follow.

Hi, I moved the logic to the assembler.
The assembler will generate different instructions based on the value of `shift`.
If `shift == 0` and need not to accumulte, generated a `mov`.
If `shift == 0` and need to accumulte, generated an `add`.

Also added tests in `aarch64-asmtest.py` to verify the assembler modifications.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6]

Andrew Haley-2
In reply to this post by Dong Bo
On Thu, 18 Feb 2021 07:57:25 GMT, Dong Bo <[hidden email]> wrote:

>> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
>> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>>
>> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
>> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
>> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
>> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
>> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>>
>> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
>> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>>
>> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
>> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.
>
> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains six additional commits since the last revision:
>
>  - add tests in aarch64-asmtest.py
>  - fix windows operator precedence error and cleanup testcase
>  - Merge branch 'master' into aarch64_vector_api_shift
>  - fix windows build failure
>  - generate add if shift == 0 for accumulation and fix some test code
>  - back out AD modifications and handle zero shift in assembler

src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2709:

> 2707:       f(encodedShift, 22, 16); f(opc2, 15, 10), rf(Vn, 5), rf(Vd, 0);   \
> 2708:     }                                                                   \
> 2709:   }

Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  handle zero shift in macro assembler

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/d746f209..1aba5629

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=06
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=05-06

  Stats: 530 lines in 4 files changed: 27 ins; 174 del; 329 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v6]

Dong Bo
In reply to this post by Andrew Haley-2
On Thu, 18 Feb 2021 09:35:07 GMT, Andrew Haley <[hidden email]> wrote:

>> Dong Bo has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase.
>
> src/hotspot/cpu/aarch64/assembler_aarch64.hpp line 2694:
>
>> 2692:     assert((1 << ((T>>1)+3)) > shift, "Invalid Shift value");           \
>> 2693:     if (shift == 0) {                                                   \
>> 2694:       bool accumulate = ((opc2 & 0b100) != 0);                          \
>
> Is this correct, according to the definition in the Architecture Reference Manual? It doesn't look like it to me. Assembler methods should generate bit patterns exactly as defined in the Manual. This logic should be in a MacroAssembler method.

Hi, I moved the logic into MacroAssembler.
The assert is kept to make sure that we would never pass a zero right shift to assemlber.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7]

Andrew Haley-2
In reply to this post by Dong Bo
On Fri, 19 Feb 2021 03:13:12 GMT, Dong Bo <[hidden email]> wrote:

>> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
>> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>>
>> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
>> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
>> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
>> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
>> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>>
>> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
>> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>>
>> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
>> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.
>
> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   handle zero shift in macro assembler

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 554:

> 552:
> 553:   WRAP(usra) WRAP(ssra)
> 554: #undef WRAP

Are ssra and usra tested by anything? I don't seem them accessed in the test case.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 531:

> 529:
> 530:   // NEON shift instructions
> 531: #define WRAP(INSN)                                                                \

This comment should  be

  // AdvSIMD shift by immediate.    
  // These are "user friendly" variants which allow a shift count of 0.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v8]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has updated the pull request incrementally with two additional commits since the last revision:

 - fix trailing whitespace
 - split ssra/usra tests

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/1aba5629..ba8dc5ac

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=07
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=06-07

  Stats: 469 lines in 3 files changed: 352 ins; 112 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v7]

Dong Bo
In reply to this post by Andrew Haley-2
On Fri, 19 Feb 2021 14:42:15 GMT, Andrew Haley <[hidden email]> wrote:

>> Dong Bo has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR.
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 554:
>
>> 552:
>> 553:   WRAP(usra) WRAP(ssra)
>> 554: #undef WRAP
>
> Are ssra and usra tested by anything? I don't seem them accessed in the test case.

Updated. The `ssra/usra` are accessed by tests in `TestVectorShiftImmAndAccumulate.java`.
Manually injected error by changing `addv` to `subv` if shifting right and accumulating with 0, the tests failed as expected.

The `vba.add(vbb.lanewise(SHIFT, Imm))` pattern in `TestVectorShiftImmAndAccumulate.java` are actually the same with the original code in `TestVectorShiftImm.java`.
As of now, I have no idea why `ssra/usra` are not accessed by the previous test code.
The `vba.add(vbb.lanewise(SHIFT, Imm))` pattern should match `ssra/usra` anyway.
I think we need a separate investigation.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9]

Dong Bo
In reply to this post by Dong Bo
> In vectorAPI, when right-shifting a vector with a shift equals to the element width, the shift is transformed to zero,
> see `src/jdk.incubator.vector/share/classes/jdk/incubator/vector/VectorOperators.java`:
>     /** Produce {@code a>>>(n&(ESIZE*8-1))}. Integral only. */
>     public static final /*bitwise*/ Binary LSHR = binary("LSHR", ">>>", VectorSupport.VECTOR_OP_URSHIFT, VO_SHIFT);
>
> The aarch64 assembler generates wrong or illegal instructions in this case, e.g. for the JAVA code below on aarch64,
> assembler call `__ ushr(dst, __ T8B, src, 0)`, the instruction generated is not `ushr dst.8B, src.8B, 0`, but `ushr dst.4H, src.4H, 16` instead.
> According to local tests, JVM gives wrong results for byte/short and crashes with SIGILL for integer/long.
> ByteVector vba = ByteVector.fromArray(byte64SPECIES, bytesA, 8 * i);
> vbb.lanewise(VectorOperators.ASHR, 8).intoArray(arrBytes, 8 * i);
>
> The legal right shift amount should be in the range 1 to the element width in bits on aarch64:
> https://developer.arm.com/documentation/dui0801/f/A64-SIMD-Vector-Instructions/USHR--vector-?lang=en
>
> This fix handles zero shift separately. If the shift is zero, it generates `orr` for right shift, `addv` for right shift and accumulate.
> Verified with linux-aarch64-server-fastdebug, tier1. Also created a jtreg to reproduce the issue and for regression tests.

Dong Bo has updated the pull request incrementally with two additional commits since the last revision:

 - whitespace
 - Rebase tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg case, assembly print verified.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2472/files
  - new: https://git.openjdk.java.net/jdk/pull/2472/files/ba8dc5ac..e2dc7b83

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=08
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2472&range=07-08

  Stats: 903 lines in 2 files changed: 320 ins; 387 del; 196 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2472.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2472/head:pull/2472

PR: https://git.openjdk.java.net/jdk/pull/2472
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261142: AArch64: Incorrect instruction encoding when right-shifting vectors with shift amount equals to the element width [v9]

Dong Bo
In reply to this post by Ningsheng Jian-2
On Tue, 9 Feb 2021 07:50:45 GMT, Ningsheng Jian <[hidden email]> wrote:

>> Dong Bo has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - whitespace
>>  - Rebase tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg case, assembly print verified.
>
> Thanks for the fix.

Hi, @theRealAph

I've rebased the tests so that sshr/ushr/sshl/ssra/usra are accessed in one jtreg `test/hotspot/jtreg/compiler/vectorapi/TestVectorShiftImm.java`.
Local tests by manually injected error shows all instructions are covered by the jtreg case. Suggestions?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2472
12