[10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1

[10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1

Dmitry Chuyko-2
Hello,

Please review an improvement to the CRC32C calculation on AArch64. The
implementation is based on JDK-8155162 [1] and the existing code for CRC32.

Intrinsics for array / byte buffer and direct byte buffer are now enabled
in C1 on AArch64: LIRGenerator::do_update_CRC32C calculates the parameters
and calls StubRoutines::updateBytesCRC32C().
The template interpreter now also generates
TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry, which
calculates the parameters and jumps to StubRoutines::updateBytesCRC32C().
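
For readers less familiar with the Java side, here is a minimal sketch of the
two public java.util.zip.CRC32C entry points that, as far as I understand,
end up in the array and direct byte buffer paths described above. The class
and method names (Crc32cPaths, ofArray, ofDirectBuffer) are made up for
illustration and are not part of the webrev. The input "123456789" is the
standard check string, whose CRC-32C value is 0xE3069283.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32C;

// Hypothetical illustration class; not taken from the webrev.
class Crc32cPaths {
    // Heap array path: CRC32C.update(byte[], int, int).
    static long ofArray(byte[] data, int off, int len) {
        CRC32C crc = new CRC32C();
        crc.update(data, off, len);
        return crc.getValue();
    }

    // Direct byte buffer path: CRC32C.update(ByteBuffer) on a direct buffer.
    static long ofDirectBuffer(byte[] data, int off, int len) {
        ByteBuffer buf = ByteBuffer.allocateDirect(len);
        buf.put(data, off, len);
        buf.flip(); // make the copied bytes readable from position 0
        CRC32C crc = new CRC32C();
        crc.update(buf);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("array:  %08x%n", ofArray(data, 0, data.length));
        System.out.printf("direct: %08x%n", ofDirectBuffer(data, 0, data.length));
    }
}
```

Both calls should give the same value for the same bytes, whichever
execution mode the JVM happens to be in.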

rfe: https://bugs.openjdk.java.net/browse/JDK-8189745
webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/
benchmark:
http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java

Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost
in interpreter.

For testing, I compared the CRC32C result sets produced in C1 and in the
interpreter for both array and direct byte buffer inputs, with zero and
non-zero offsets.
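
A check of this kind could look roughly like the sketch below. It is not the
actual test used for this change; the class name Crc32cCheck, the chosen
ranges and the bit-at-a-time reference implementation are assumptions made
for illustration. The reference uses the reflected Castagnoli polynomial
0x82F63B78 and is compared against java.util.zip.CRC32C for array and direct
byte buffer inputs at zero and non-zero offsets.

```java
import java.nio.ByteBuffer;
import java.util.Random;
import java.util.zip.CRC32C;

// Hypothetical consistency check; not the test used for the webrev.
class Crc32cCheck {
    // Bit-at-a-time CRC-32C, reflected polynomial 0x82F63B78.
    static long reference(byte[] data, int off, int len) {
        int crc = 0xFFFFFFFF;
        for (int i = off; i < off + len; i++) {
            crc ^= data[i] & 0xFF;
            for (int bit = 0; bit < 8; bit++) {
                // XOR in the polynomial only when the low bit is set.
                crc = (crc >>> 1) ^ (0x82F63B78 & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;
    }

    static long viaArray(byte[] data, int off, int len) {
        CRC32C crc = new CRC32C();
        crc.update(data, off, len);
        return crc.getValue();
    }

    static long viaDirect(byte[] data, int off, int len) {
        ByteBuffer buf = ByteBuffer.allocateDirect(data.length);
        buf.put(data);
        // Select [off, off + len) so the buffer is read at a non-zero position.
        buf.limit(off + len);
        buf.position(off);
        CRC32C crc = new CRC32C();
        crc.update(buf);
        return crc.getValue();
    }

    // Returns true if all paths agree with the reference; size must be >= 32.
    static boolean allMatch(int size, long seed) {
        byte[] data = new byte[size];
        new Random(seed).nextBytes(data);
        int[][] ranges = {{0, size}, {1, size - 1}, {7, size - 16}, {0, 0}};
        for (int[] r : ranges) {
            long expected = reference(data, r[0], r[1]);
            if (viaArray(data, r[0], r[1]) != expected) return false;
            if (viaDirect(data, r[0], r[1]) != expected) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Repeat so the update methods can become hot enough to be compiled.
        boolean ok = true;
        for (int i = 0; i < 20_000; i++) {
            ok &= allMatch(4096, i);
        }
        System.out.println(ok ? "all results match" : "MISMATCH");
    }
}
```

Running the same class once with -Xint and once with -XX:TieredStopAtLevel=1
should exercise the interpreter and C1 respectively. Whether the C1 intrinsic
is actually reached depends on the methods getting hot enough to be compiled.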

-Dmitry

[1] https://bugs.openjdk.java.net/browse/JDK-8155162
[2]
https://bugs.openjdk.java.net/browse/JDK-8189745?focusedCommentId=14127141&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14127141


Re: [10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1

Andrew Haley
Hi,

On 31/10/17 16:01, Dmitry Chuyko wrote:

> Please review an improvement of CRC32C calculation on AArch64. The
> implementation is based on JDK-8155162 [1] and the code for CRC32.
>
> Intrinsics for array / byte buffer and direct byte buffer are enabled in
> C1 on AArch64, LIRGenerator::do_update_CRC32C calculates parameters and
> calls StubRoutines::updateBytesCRC32C().
> Template interpreter now also generates
> TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry where it
> calculates parameters and jumps to StubRoutines::updateBytesCRC32C().
>
> rfe: https://bugs.openjdk.java.net/browse/JDK-8189745
> webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/
> benchmark:
> http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java
>
> Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost
> in interpreter.
>
> For testing I made comparison of CRC32C result sets in C1 and
> interpreter for both array and direct byte buffer with zero and non-zero
> offset.

That looks good to me, thanks.

--
Andrew Haley
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671
Re: [10] RFR: 8189745 - AARCH64: Use CRC32C intrinsic code in interpreter and C1

Dmitry Chuyko-2
Andrew, thanks for the instant review.

Dmitrij Pochepko has pushed the change.

-Dmitry


On 10/31/2017 08:25 PM, Andrew Haley wrote:

> Hi,
>
> On 31/10/17 16:01, Dmitry Chuyko wrote:
>
>> Please review an improvement of CRC32C calculation on AArch64. The
>> implementation is based on JDK-8155162 [1] and the code for CRC32.
>>
>> Intrinsics for array / byte buffer and direct byte buffer are enabled in
>> C1 on AArch64, LIRGenerator::do_update_CRC32C calculates parameters and
>> calls StubRoutines::updateBytesCRC32C().
>> Template interpreter now also generates
>> TemplateInterpreterGenerator::generate_CRC32C_updateBytes_entry where it
>> calculates parameters and jumps to StubRoutines::updateBytesCRC32C().
>>
>> rfe: https://bugs.openjdk.java.net/browse/JDK-8189745
>> webrev: http://cr.openjdk.java.net/~dchuyko/8189745/webrev.00/
>> benchmark:
>> http://cr.openjdk.java.net/~dchuyko/8189745/crc32c/CRC32CBench.java
>>
>> Performance results for T88 [2] show ~7x boost in C1 and ~30-50x boost
>> in interpreter.
>>
>> For testing I made comparison of CRC32C result sets in C1 and
>> interpreter for both array and direct byte buffer with zero and non-zero
>> offset.
> That looks good to me, thanks.
>