8261671: X86 I2L conversion can be skipped for certain masked positive values
For the following expression: (long)(value & mask) Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq Before: Benchmark Mode Cnt Score Error Units SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op After: Benchmark Mode Cnt Score Error Units SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op Signed-off-by: Marcus G K Williams <[hidden email]> ------------- Commit messages: - 8261671: Skip unnecessary Int2L conversions Changes: https://git.openjdk.java.net/jdk/pull/2590/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261671 Stats: 37 lines in 3 files changed: 37 ins; 0 del; 0 mod Patch: https://git.openjdk.java.net/jdk/pull/2590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590 PR: https://git.openjdk.java.net/jdk/pull/2590 |
On Tue, 16 Feb 2021 16:13:48 GMT, Marcus G K Williams <[hidden email]> wrote:
> 8261671: X86 I2L conversion can be skipped for certain masked positive values > > For the following expression: > (long)(value & mask) > Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq > > Before: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op > > > After: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op > > Signed-off-by: Marcus G K Williams <[hidden email]> src/hotspot/cpu/x86/x86_64.ad line 9172: > 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr) > 9171: %{ > 9172: predicate(VM_Version::supports_bmi2()); Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that? ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
In reply to this post by Marcus G K Williams
> 8261671: X86 I2L conversion can be skipped for certain masked positive values
> > For the following expression: > (long)(value & mask) > Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq > > Before: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op > > > After: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op > > Signed-off-by: Marcus G K Williams <[hidden email]> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: Update convI2LAndI_reg_immIbitmask w/ is_Intel Signed-off-by: Marcus G K Williams <[hidden email]> ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2590/files - new: https://git.openjdk.java.net/jdk/pull/2590/files/28cbd6cb..5dc976e2 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=00-01 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590 PR: https://git.openjdk.java.net/jdk/pull/2590 |
In reply to this post by Florian Weimer-4
On Tue, 16 Feb 2021 17:08:13 GMT, Florian Weimer <[hidden email]> wrote:
>> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: >> >> Update convI2LAndI_reg_immIbitmask w/ is_Intel >> >> Signed-off-by: Marcus G K Williams <[hidden email]> > > src/hotspot/cpu/x86/x86_64.ad line 9172: > >> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr) >> 9171: %{ >> 9172: predicate(VM_Version::supports_bmi2()); > > Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that? Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());` ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
On Tue, 16 Feb 2021 17:39:34 GMT, Marcus G K Williams <[hidden email]> wrote:
>> src/hotspot/cpu/x86/x86_64.ad line 9172: >> >>> 9170: instruct convI2LAndI_reg_immIbitmask(rRegL dst, rRegI src, immI_bitmask mask, rRegI tmp, rFlagsReg cr) >>> 9171: %{ >>> 9172: predicate(VM_Version::supports_bmi2()); >> >> Agner's optimization guide says that BZHI uses microcode on Zen 2 and earlier, so perhaps the predicate should reflect that? > > Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());` I'm so sorry, I confused this with `PDEP` and `PEXT`. `BZHI` is actually fine on Zen. So the original code is correct. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
In reply to this post by Marcus G K Williams
> 8261671: X86 I2L conversion can be skipped for certain masked positive values
> > For the following expression: > (long)(value & mask) > Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq > > Before: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op > > > After: > Benchmark Mode Cnt Score Error Units > SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op > > Signed-off-by: Marcus G K Williams <[hidden email]> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: Revert to predicate supports_bmi2 Signed-off-by: Marcus G K Williams <[hidden email]> ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2590/files - new: https://git.openjdk.java.net/jdk/pull/2590/files/5dc976e2..2717297b Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2590&range=01-02 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2590.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2590/head:pull/2590 PR: https://git.openjdk.java.net/jdk/pull/2590 |
In reply to this post by Florian Weimer-4
On Wed, 17 Feb 2021 05:56:04 GMT, Florian Weimer <[hidden email]> wrote:
>> Added `predicate(VM_Version::supports_bmi2() && VM_Version::is_intel());` > > I'm so sorry, I confused this with `PDEP` and `PEXT`. `BZHI` is actually fine on Zen. So the original code is correct. No worries @fweimer. I've reverted to the original. I looked at Agner's Optimization Guide but I couldn't find a place that said Zen 2 and earlier used microcode for BZHI. I guess we know why now 😃. Happy to hear any other review comments. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
In reply to this post by Marcus G K Williams
On Wed, 17 Feb 2021 17:51:18 GMT, Marcus G K Williams <[hidden email]> wrote:
>> 8261671: X86 I2L conversion can be skipped for certain masked positive values >> >> For the following expression: >> (long)(value & mask) >> Where value is of int type and mask is constant (power of two – 1), we can directly generate bzhi instruction to zero the upper bits instead of doing an andl, followed by movslq >> >> Before: >> Benchmark Mode Cnt Score Error Units >> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 10.679 ± 1.496 ns/op >> >> >> After: >> Benchmark Mode Cnt Score Error Units >> SkipIntToLongCast.skipMaskedSmallPositiveCast avgt 15 7.870 ± 0.067 ns/op >> >> Signed-off-by: Marcus G K Williams <[hidden email]> > > Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision: > > Revert to predicate supports_bmi2 > > Signed-off-by: Marcus G K Williams <[hidden email]> Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` ------------- Changes requested by kvn (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2590 |
On Wed, 17 Feb 2021 18:33:16 GMT, Vladimir Kozlov <[hidden email]> wrote:
> Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` I'm taking a look at a test to verify generated code like the BMI1 test you suggest above. There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
On Wed, 17 Feb 2021 18:54:27 GMT, Marcus G K Williams <[hidden email]> wrote:
> > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` > > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` > > I'm taking a look at a test to verify generated code like the BMI1 test you suggest above. > > There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java It does not cover your case which have `AND` operation. But I am fine if you add your case to it (or to other test, like BMI1.java) instead of creating new test file. > > Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example. Missing tests in previous changes does not mean we should never create one. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
On Wed, 17 Feb 2021 20:38:45 GMT, Vladimir Kozlov <[hidden email]> wrote:
> > > Please, add a test which verifies result of generated code for different ranges of mask (especially corner cases). See `compiler//codegen/BMI1.java` > > > There are also example of verification of generated assembler in `compiler//intrinsics/bmi/` > > > > > > I'm taking a look at a test to verify generated code like the BMI1 test you suggest above. > > There is a current test that verifies operation indirectly (both long to int and int to long): test/hotspot/jtreg/compiler/c2/TestSkipLongToIntCast.java > > It does not cover your case which have `AND` operation. > But I am fine if you add your case to it (or to other test, like BMI1.java) instead of creating new test file. > > > Do we need both? A number of similar patches have been merged without, https://hg.openjdk.java.net/jdk/jdk/rev/d0f55423e913 for example. > > Missing tests in previous changes does not mean we should never create one. Thanks @vnkozlov. In the process of testing both a BMI2.java and compiler/intrinsics/bmi/verifycode/BzhiTest.java addition I have written to test this change. ------------- PR: https://git.openjdk.java.net/jdk/pull/2590 |
Free forum by Nabble | Edit this page |