RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values

classic Classic list List threaded Threaded
11 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values

Marcus G K Williams
8261671: X86 I2L conversion can be skipped for certain masked positive values

For the following expression:
 (long)(value & mask)
Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq

Before:
Benchmark Mode Cnt Score Error Units
SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op


After:
Benchmark Mode Cnt Score Error Units
SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op

Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Commit messages:
 - 8261671: Skip unnecessary Int2L conversions

Changes: https://git.openjdk.java.net/jdk/pull/2590/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261671
  Stats: 37 lines in 3 files changed: 37 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2590.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values

Florian Weimer-4
On Tue, 16 Feb 2021 16:13:48 GMT, Marcus G K Williams <[hidden email]> wrote:

> 8261671: X86 I2L conversion can be skipped for certain masked positive values
>
> For the following expression:
>  (long)(value & mask)
> Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq
>
> Before:
> Benchmark Mode Cnt Score Error Units
> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op
>
>
> After:
> Benchmark Mode Cnt Score Error Units
> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

src/hotspot/cpu/x86/x86_64.ad line 9172:

> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr)
> 9171: %{
> 9172:   predicate(VM_Version::supports_bmi2());

Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v2]

Marcus G K Williams
In reply to this post by Marcus G K Williams
> 8261671: X86 I2L conversion can be skipped for certain masked positive values
>
> For the following expression:
>  (long)(value & mask)
> Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq
>
> Before:
> Benchmark Mode Cnt Score Error Units
> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op
>
>
> After:
> Benchmark Mode Cnt Score Error Units
> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:

  Update convI2LAndI_reg_immIbitmask w/ is_Intel
 
  Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2590/files
  - new: https://git.openjdk.java.net/jdk/pull/2590/files/28cbd6cb..5dc976e2

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=00-01

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2590.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v2]

Marcus G K Williams
In reply to this post by Florian Weimer-4
On Tue, 16 Feb 2021 17:08:13 GMT, Florian Weimer <[hidden email]> wrote:

>> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Update convI2LAndI_reg_immIbitmask w/ is_Intel
>>  
>>   Signed-off-by: Marcus G K Williams <[hidden email]>
>
> src/hotspot/cpu/x86/x86_64.ad line 9172:
>
>> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr)
>> 9171: %{
>> 9172:   predicate(VM_Version::supports_bmi2());
>
> Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that?

Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());`

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v2]

Florian Weimer-4
On Tue, 16 Feb 2021 17:39:34 GMT, Marcus G K Williams <[hidden email]> wrote:

>> src/hotspot/cpu/x86/x86_64.ad line 9172:
>>
>>> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr)
>>> 9171: %{
>>> 9172:   predicate(VM_Version::supports_bmi2());
>>
>> Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that?
>
> Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());`

I'm so sorry, I confused this with `PDEP` and `PEXT`. `BZHI` is actually fine on Zen. So the original code is correct.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]

Marcus G K Williams
In reply to this post by Marcus G K Williams
> 8261671: X86 I2L conversion can be skipped for certain masked positive values
>
> For the following expression:
>  (long)(value & mask)
> Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq
>
> Before:
> Benchmark Mode Cnt Score Error Units
> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op
>
>
> After:
> Benchmark Mode Cnt Score Error Units
> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op
>
> Signed-off-by: Marcus G K Williams <[hidden email]>

Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:

  Revert to predicate supports_bmi2
 
  Signed-off-by: Marcus G K Williams <[hidden email]>

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2590/files
  - new: https://git.openjdk.java.net/jdk/pull/2590/files/5dc976e2..2717297b

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=01-02

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2590.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]

Marcus G K Williams
In reply to this post by Florian Weimer-4
On Wed, 17 Feb 2021 05:56:04 GMT, Florian Weimer <[hidden email]> wrote:

>> Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());`
>
> I'm so sorry, I confused this with `PDEP` and `PEXT`. `BZHI` is actually fine on Zen. So the original code is correct.

No worries @fweimer. I've reverted to the original. I looked at Agner's Optimization Guide but I couldn't find a place that said Zen 2 and earlier used microcode for BZHI. I guess we know why now 😃.

Happy to hear any other review comments.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]

Vladimir Kozlov-2
In reply to this post by Marcus G K Williams
On Wed, 17 Feb 2021 17:51:18 GMT, Marcus G K Williams <[hidden email]> wrote:

>> 8261671: X86 I2L conversion can be skipped for certain masked positive values
>>
>> For the following expression:
>>  (long)(value & mask)
>> Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq
>>
>> Before:
>> Benchmark Mode Cnt Score Error Units
>> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op
>>
>>
>> After:
>> Benchmark Mode Cnt Score Error Units
>> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op
>>
>> Signed-off-by: Marcus G K Williams <[hidden email]>
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert to predicate supports_bmi2
>  
>   Signed-off-by: Marcus G K Williams <[hidden email]>

Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java`
There are also example of verification of generated assembler in `compiler//intrinsics/bmi/`

-------------

Changes requested by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]

Marcus G K Williams
On Wed, 17 Feb 2021 18:33:16 GMT, Vladimir Kozlov <[hidden email]> wrote:

> Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java`
> There are also example of verification of generated assembler in `compiler//intrinsics/bmi/`

I'm taking a look at a test to verify generated code like the BMI1 test you suggest above.

There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java

Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]

Vladimir Kozlov-2
On Wed, 17 Feb 2021 18:54:27 GMT, Marcus G K Williams <[hidden email]> wrote:

> > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java`
> > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/`
>
> I'm taking a look at a test to verify generated code like the BMI1 test you suggest above.
>
> There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java

It does not cover your case which have `AND` operation.
But I am fine if you add your case to it (or to other test, like BMI1.java) instead of creating new test file.

>
> Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example.

Missing tests in previous changes does not mean we should never create one.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261671: X86 I2L conversion can be skipped for certain masked positive values [v3]

Marcus G K Williams
On Wed, 17 Feb 2021 20:38:45 GMT, Vladimir Kozlov <[hidden email]> wrote:

> > > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java`
> > > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/`
> >
> >
> > I'm taking a look at a test to verify generated code like the BMI1 test you suggest above.
> > There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java
>
> It does not cover your case which have `AND` operation.
> But I am fine if you add your case to it (or to other test, like BMI1.java) instead of creating new test file.
>
> > Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example.
>
> Missing tests in previous changes does not mean we should never create one.

Thanks @vnkozlov.

In the process of testing both a BMI2.java and compiler/intrinsics/bmi/verifycode/BzhiTest.java addition I have written to test this change.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2590