http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8190869 Math.pow() was completely rewritten on 9 and important special case (exponent == 2.0) was missed. It caused severe performance regression (up to 10x) on some numeric code. The fix is two-fold: * add special case in C2 to transform pow(x,2.0) into x*x; * for consistency of results between different execution modes (interpreter, C1, and C2), add special case into generated stub at MacroAssembler::fast_pow(). Some clarifications follow. Testing: * microbenchmark * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) * test case from 8189172 on consistency of results The fix doesn't cover all possible cases (e.g., when exponent is not a constant during parsing, but becomes such later). I experimented with more elaborate fix, but wasn't satisfied with the results and chose simpler route. I see 2 alternatives: (1) Introduce macro node PowD and either transform it into MulD if exponent == 2.0D or expand it into CallLeafNode. The downside is that PowD should be CallNode and replacing it with a MulD requires some IR tweaking around it. (2) Introduce a runtime check before calling the stub (prototype [1]). If exponent becomes 2.0 later, the check is folded and stub call is removed. The downside is that when exponent doesn't turn into 2.0, the check stays and duplicates the check in the stub. Unfortunately, there's no way to remove it from the stub, because interpreter & C1 relies on it. Otherwise, inconsistent computation results become possible (like JDK-8189172 [2]). So, I decided to go with a simple fix and cover most common case (explicit usage of 2.0 with Math.pow) for now. Best regards, Vladimir Ivanov [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check [2] https://bugs.openjdk.java.net/browse/JDK-8189172 |
I agree with suggested fix. What check we have before 9 and where was it?
Why do you need to move value to tmp1? + movdq(tmp1, xmm1); + cmp64(tmp1, ExternalAddress(DOUBLE2)); Thanks, Vladimir On 12/12/17 11:16 AM, Vladimir Ivanov wrote: > http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/ > https://bugs.openjdk.java.net/browse/JDK-8190869 > > Math.pow() was completely rewritten on 9 and important special case > (exponent == 2.0) was missed. It caused severe performance regression > (up to 10x) on some numeric code. > > The fix is two-fold: > * add special case in C2 to transform pow(x,2.0) into x*x; > * for consistency of results between different execution modes > (interpreter, C1, and C2), add special case into generated stub at > MacroAssembler::fast_pow(). > > Some clarifications follow. > > Testing: > * microbenchmark > * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) > * test case from 8189172 on consistency of results > > The fix doesn't cover all possible cases (e.g., when exponent is not a > constant during parsing, but becomes such later). I experimented with > more elaborate fix, but wasn't satisfied with the results and chose > simpler route. > > I see 2 alternatives: > > (1) Introduce macro node PowD and either transform it into MulD if > exponent == 2.0D or expand it into CallLeafNode. The downside is that > PowD should be CallNode and replacing it with a MulD requires some IR > tweaking around it. > > (2) Introduce a runtime check before calling the stub (prototype [1]). > If exponent becomes 2.0 later, the check is folded and stub call is > removed. The downside is that when exponent doesn't turn into 2.0, the > check stays and duplicates the check in the stub. Unfortunately, there's > no way to remove it from the stub, because interpreter & C1 relies on > it. Otherwise, inconsistent computation results become possible (like > JDK-8189172 [2]). > > So, I decided to go with a simple fix and cover most common case > (explicit usage of 2.0 with Math.pow) for now. > > Best regards, > Vladimir Ivanov > > [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check > [2] https://bugs.openjdk.java.net/browse/JDK-8189172 |
Thanks for review, Vladimir.
> I agree with suggested fix. What check we have before 9 and where was it? It was quite complicated. There were LibraryCallKit::inline_pow() [1] & LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0 exponent, multiple PowDs on fast paths, and a call into SharedRuntime::dpow on slow path. PowD had matcher rules [3] which produced x87-based implementation. > Why do you need to move value to tmp1? > > + movdq(tmp1, xmm1); > + cmp64(tmp1, ExternalAddress(DOUBLE2)); We want to compare 64-bit values and move the exponent from XMM to GP register. void Assembler::movdq(Register dst, XMMRegister src) { void MacroAssembler::cmp64(Register src1, AddressLiteral src2) { Best regards, Vladimir Ivanov [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72 [2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16 [3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7 > On 12/12/17 11:16 AM, Vladimir Ivanov wrote: >> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/ >> https://bugs.openjdk.java.net/browse/JDK-8190869 >> >> Math.pow() was completely rewritten on 9 and important special case >> (exponent == 2.0) was missed. It caused severe performance regression >> (up to 10x) on some numeric code. >> >> The fix is two-fold: >> * add special case in C2 to transform pow(x,2.0) into x*x; >> * for consistency of results between different execution modes >> (interpreter, C1, and C2), add special case into generated stub at >> MacroAssembler::fast_pow(). >> >> Some clarifications follow. >> >> Testing: >> * microbenchmark >> * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) >> * test case from 8189172 on consistency of results >> >> The fix doesn't cover all possible cases (e.g., when exponent is not a >> constant during parsing, but becomes such later). I experimented with >> more elaborate fix, but wasn't satisfied with the results and chose >> simpler route. >> >> I see 2 alternatives: >> >> (1) Introduce macro node PowD and either transform it into MulD if >> exponent == 2.0D or expand it into CallLeafNode. The downside is that >> PowD should be CallNode and replacing it with a MulD requires some IR >> tweaking around it. >> >> (2) Introduce a runtime check before calling the stub (prototype [1]). >> If exponent becomes 2.0 later, the check is folded and stub call is >> removed. The downside is that when exponent doesn't turn into 2.0, the >> check stays and duplicates the check in the stub. Unfortunately, there's >> no way to remove it from the stub, because interpreter & C1 relies on >> it. Otherwise, inconsistent computation results become possible (like >> JDK-8189172 [2]). >> >> So, I decided to go with a simple fix and cover most common case >> (explicit usage of 2.0 with Math.pow) for now. >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check >> [2] https://bugs.openjdk.java.net/browse/JDK-8189172 |
On 12/13/17 6:58 AM, Vladimir Ivanov wrote:
> Thanks for review, Vladimir. > >> I agree with suggested fix. What check we have before 9 and where was it? > > It was quite complicated. There were LibraryCallKit::inline_pow() [1] & > LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0 > exponent, multiple PowDs on fast paths, and a call into > SharedRuntime::dpow on slow path. PowD had matcher rules [3] which > produced x87-based implementation. Okay. > >> Why do you need to move value to tmp1? >> >> + movdq(tmp1, xmm1); >> + cmp64(tmp1, ExternalAddress(DOUBLE2)); > > We want to compare 64-bit values and move the exponent from XMM to GP > register. Of cause, you need to use GP register to compare. I thought tmp1 also xmm. Looks good. Thanks, Vladimir > > void Assembler::movdq(Register dst, XMMRegister src) { > void MacroAssembler::cmp64(Register src1, AddressLiteral src2) { > > Best regards, > Vladimir Ivanov > > [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72 > [2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16 > [3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7 > >> On 12/12/17 11:16 AM, Vladimir Ivanov wrote: >>> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/ >>> https://bugs.openjdk.java.net/browse/JDK-8190869 >>> >>> Math.pow() was completely rewritten on 9 and important special case >>> (exponent == 2.0) was missed. It caused severe performance regression >>> (up to 10x) on some numeric code. >>> >>> The fix is two-fold: >>> * add special case in C2 to transform pow(x,2.0) into x*x; >>> * for consistency of results between different execution modes >>> (interpreter, C1, and C2), add special case into generated stub at >>> MacroAssembler::fast_pow(). >>> >>> Some clarifications follow. >>> >>> Testing: >>> * microbenchmark >>> * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) >>> * test case from 8189172 on consistency of results >>> >>> The fix doesn't cover all possible cases (e.g., when exponent is not a >>> constant during parsing, but becomes such later). I experimented with >>> more elaborate fix, but wasn't satisfied with the results and chose >>> simpler route. >>> >>> I see 2 alternatives: >>> >>> (1) Introduce macro node PowD and either transform it into MulD if >>> exponent == 2.0D or expand it into CallLeafNode. The downside is that >>> PowD should be CallNode and replacing it with a MulD requires some IR >>> tweaking around it. >>> >>> (2) Introduce a runtime check before calling the stub (prototype [1]). >>> If exponent becomes 2.0 later, the check is folded and stub call is >>> removed. The downside is that when exponent doesn't turn into 2.0, the >>> check stays and duplicates the check in the stub. Unfortunately, there's >>> no way to remove it from the stub, because interpreter & C1 relies on >>> it. Otherwise, inconsistent computation results become possible (like >>> JDK-8189172 [2]). >>> >>> So, I decided to go with a simple fix and cover most common case >>> (explicit usage of 2.0 with Math.pow) for now. >>> >>> Best regards, >>> Vladimir Ivanov >>> >>> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check >>> [2] https://bugs.openjdk.java.net/browse/JDK-8189172 |
Thanks, Vladimir!
Best regards, Vladimir Ivanov On 12/13/17 7:25 PM, Vladimir Kozlov wrote: > On 12/13/17 6:58 AM, Vladimir Ivanov wrote: >> Thanks for review, Vladimir. >> >>> I agree with suggested fix. What check we have before 9 and where was >>> it? >> >> It was quite complicated. There were LibraryCallKit::inline_pow() [1] >> & LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0 >> exponent, multiple PowDs on fast paths, and a call into >> SharedRuntime::dpow on slow path. PowD had matcher rules [3] which >> produced x87-based implementation. > > Okay. > >> >>> Why do you need to move value to tmp1? >>> >>> + movdq(tmp1, xmm1); >>> + cmp64(tmp1, ExternalAddress(DOUBLE2)); >> >> We want to compare 64-bit values and move the exponent from XMM to GP >> register. > > Of cause, you need to use GP register to compare. I thought tmp1 also xmm. > > Looks good. > > Thanks, > Vladimir > >> >> void Assembler::movdq(Register dst, XMMRegister src) { >> void MacroAssembler::cmp64(Register src1, AddressLiteral src2) { >> >> Best regards, >> Vladimir Ivanov >> >> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72 >> [2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16 >> [3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7 >> >>> On 12/12/17 11:16 AM, Vladimir Ivanov wrote: >>>> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/ >>>> https://bugs.openjdk.java.net/browse/JDK-8190869 >>>> >>>> Math.pow() was completely rewritten on 9 and important special case >>>> (exponent == 2.0) was missed. It caused severe performance regression >>>> (up to 10x) on some numeric code. >>>> >>>> The fix is two-fold: >>>> * add special case in C2 to transform pow(x,2.0) into x*x; >>>> * for consistency of results between different execution modes >>>> (interpreter, C1, and C2), add special case into generated stub at >>>> MacroAssembler::fast_pow(). >>>> >>>> Some clarifications follow. >>>> >>>> Testing: >>>> * microbenchmark >>>> * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress) >>>> * test case from 8189172 on consistency of results >>>> >>>> The fix doesn't cover all possible cases (e.g., when exponent is not a >>>> constant during parsing, but becomes such later). I experimented with >>>> more elaborate fix, but wasn't satisfied with the results and chose >>>> simpler route. >>>> >>>> I see 2 alternatives: >>>> >>>> (1) Introduce macro node PowD and either transform it into MulD if >>>> exponent == 2.0D or expand it into CallLeafNode. The downside is that >>>> PowD should be CallNode and replacing it with a MulD requires some IR >>>> tweaking around it. >>>> >>>> (2) Introduce a runtime check before calling the stub (prototype [1]). >>>> If exponent becomes 2.0 later, the check is folded and stub call is >>>> removed. The downside is that when exponent doesn't turn into 2.0, the >>>> check stays and duplicates the check in the stub. Unfortunately, >>>> there's >>>> no way to remove it from the stub, because interpreter & C1 relies on >>>> it. Otherwise, inconsistent computation results become possible (like >>>> JDK-8189172 [2]). >>>> >>>> So, I decided to go with a simple fix and cover most common case >>>> (explicit usage of 2.0 with Math.pow) for now. >>>> >>>> Best regards, >>>> Vladimir Ivanov >>>> >>>> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check >>>> [2] https://bugs.openjdk.java.net/browse/JDK-8189172 |
Free forum by Nabble | Edit this page |