This is the second pre-review under the JEP “Improve performance of
String and Array operations on AArch64”, covering another improved
intrinsic, posted to get early feedback.
Please pre-review the patch for 8189101 - “AARCH64: string
compare intrinsic doesn't use prefetch”
This patch moves the code for long string processing into a stub and
reorganizes it: for large strings it adds large 64-byte unrolled
loops and prefetch. Webrev is available at
Surprisingly, it also helps a bit for small strings: the code for the
string comparison node is now shorter, so fewer icache lines
need to be populated to execute it.
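To illustrate the idea of a 64-byte unrolled comparison loop, here is a minimal Java sketch. It is not the stub code from the patch (which is AArch64 assembly); the class and method names are hypothetical, and prefetch cannot be expressed in plain Java, so only the unrolling is shown.

```java
import java.nio.ByteBuffer;

public class BlockCompare {
    // Bytes handled per main-loop iteration (eight 8-byte words).
    static final int BLOCK = 64;

    // Returns the index of the first differing byte within the common
    // length of a and b, or -1 if the common prefix is identical.
    static int firstMismatch(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        ByteBuffer x = ByteBuffer.wrap(a);
        ByteBuffer y = ByteBuffer.wrap(b);
        int i = 0;
        // Unrolled main loop: XOR eight word pairs and OR the results,
        // so there is a single branch per 64 bytes.
        for (; i + BLOCK <= n; i += BLOCK) {
            long diff = (x.getLong(i) ^ y.getLong(i))
                      | (x.getLong(i + 8) ^ y.getLong(i + 8))
                      | (x.getLong(i + 16) ^ y.getLong(i + 16))
                      | (x.getLong(i + 24) ^ y.getLong(i + 24))
                      | (x.getLong(i + 32) ^ y.getLong(i + 32))
                      | (x.getLong(i + 40) ^ y.getLong(i + 40))
                      | (x.getLong(i + 48) ^ y.getLong(i + 48))
                      | (x.getLong(i + 56) ^ y.getLong(i + 56));
            if (diff != 0) {
                break; // mismatch is somewhere in this block
            }
        }
        // Byte-wise scan: locates the mismatch inside the failing block,
        // or handles the tail shorter than one block.
        for (; i < n; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return -1;
    }
}
```

The point of the wide loop is to keep the per-byte branch and loop overhead low on long inputs, while short inputs fall straight through to the simple tail loop.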
A benchmark was developed to measure performance. It
contains 4 cases with various sizes: LL (Latin1 vs Latin1), LU
(Latin1 vs UTF-16), UL (UTF-16 vs Latin1) and UU (UTF-16 vs UTF-16). I see
up to a 5x speedup on systems without a h/w prefetcher (ThunderX)
and up to a 40% improvement on systems with a h/w prefetcher (Cortex
A53).
Raw performance numbers are at . Charts for performance numbers
above are: Cortex A53  and ThunderX .
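For anyone who wants to reproduce the four cases on other hardware, here is a minimal sketch of how such inputs could be built. It is not the benchmark used for the numbers above; the class and method names are hypothetical, and it assumes compact strings are enabled (the JDK 9+ default), so that a string is stored as Latin1 only when every char fits in one byte. The strings differ only in the last char, so compareTo has to walk the whole length.

```java
import java.util.Arrays;

public class CompareCases {
    // Builds a string of 'a's whose last char is `last`. A last char
    // above 0xFF forces UTF-16 storage for the whole string.
    static String make(int length, char last) {
        char[] chars = new char[length];
        Arrays.fill(chars, 'a');
        chars[length - 1] = last;
        return new String(chars);
    }

    // Runs one comparison of the given kind: LL, LU, UL or UU.
    static int compare(String kind, int length) {
        switch (kind) {
            case "LL": return make(length, 'b').compareTo(make(length, 'c'));
            case "LU": return make(length, 'b').compareTo(make(length, '\u0416'));
            case "UL": return make(length, '\u0416').compareTo(make(length, 'b'));
            case "UU": return make(length, '\u0416').compareTo(make(length, '\u0417'));
            default: throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }

    public static void main(String[] args) {
        for (int length : new int[] {8, 64, 4096}) {
            for (String kind : new String[] {"LL", "LU", "UL", "UU"}) {
                System.out.println(kind + " length " + length
                        + " -> " + compare(kind, length));
            }
        }
    }
}
```

This only checks that each path is exercised and gives the right sign; for actual timing, a harness such as JMH with warmup would be needed.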
Testing: I've run the java/lang/String jtreg tests (which include a
test for the String::compareTo method) in both -Xmixed and -Xcomp
modes and found no regressions.
Any additional numbers on other systems are welcome, as well as
early feedback on the code.