[10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

[10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

Vladimir Ivanov
http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
https://bugs.openjdk.java.net/browse/JDK-8190869

Math.pow() was completely rewritten on 9 and important special case
(exponent == 2.0) was missed. It caused severe performance regression
(up to 10x) on some numeric code.

The fix is two-fold:
   * add special case in C2 to transform pow(x,2.0) into x*x;
   * for consistency of results between different execution modes
(interpreter, C1, and C2), add special case into generated stub at
MacroAssembler::fast_pow().

Some clarifications follow.

Testing:
   * microbenchmark
   * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress)
   * test case from 8189172 on consistency of results

The fix doesn't cover all possible cases (e.g., when exponent is not a
constant during parsing, but becomes such later). I experimented with
more elaborate fix, but wasn't satisfied with the results and chose
simpler route.

I see 2 alternatives:

(1) Introduce macro node PowD and either transform it into MulD if
exponent == 2.0D or expand it into CallLeafNode. The downside is that
PowD should be CallNode and replacing it with a MulD requires some IR
tweaking around it.

(2) Introduce a runtime check before calling the stub (prototype [1]).
If exponent becomes 2.0 later, the check is folded and stub call is
removed. The downside is that when exponent doesn't turn into 2.0, the
check stays and duplicates the check in the stub. Unfortunately, there's
no way to remove it from the stub, because interpreter & C1 relies on
it. Otherwise, inconsistent computation results become possible (like
JDK-8189172 [2]).

So, I decided to go with a simple fix and cover most common case
(explicit usage of 2.0 with Math.pow) for now.

Best regards,
Vladimir Ivanov

[1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check
[2] https://bugs.openjdk.java.net/browse/JDK-8189172
Reply | Threaded
Open this post in threaded view
|

Re: [10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

Vladimir Kozlov
I agree with suggested fix. What check we have before 9 and where was it?

Why do you need to move value to tmp1?

+  movdq(tmp1, xmm1);
+  cmp64(tmp1, ExternalAddress(DOUBLE2));

Thanks,
Vladimir

On 12/12/17 11:16 AM, Vladimir Ivanov wrote:

> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
> https://bugs.openjdk.java.net/browse/JDK-8190869
>
> Math.pow() was completely rewritten on 9 and important special case
> (exponent == 2.0) was missed. It caused severe performance regression
> (up to 10x) on some numeric code.
>
> The fix is two-fold:
>    * add special case in C2 to transform pow(x,2.0) into x*x;
>    * for consistency of results between different execution modes
> (interpreter, C1, and C2), add special case into generated stub at
> MacroAssembler::fast_pow().
>
> Some clarifications follow.
>
> Testing:
>    * microbenchmark
>    * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress)
>    * test case from 8189172 on consistency of results
>
> The fix doesn't cover all possible cases (e.g., when exponent is not a
> constant during parsing, but becomes such later). I experimented with
> more elaborate fix, but wasn't satisfied with the results and chose
> simpler route.
>
> I see 2 alternatives:
>
> (1) Introduce macro node PowD and either transform it into MulD if
> exponent == 2.0D or expand it into CallLeafNode. The downside is that
> PowD should be CallNode and replacing it with a MulD requires some IR
> tweaking around it.
>
> (2) Introduce a runtime check before calling the stub (prototype [1]).
> If exponent becomes 2.0 later, the check is folded and stub call is
> removed. The downside is that when exponent doesn't turn into 2.0, the
> check stays and duplicates the check in the stub. Unfortunately, there's
> no way to remove it from the stub, because interpreter & C1 relies on
> it. Otherwise, inconsistent computation results become possible (like
> JDK-8189172 [2]).
>
> So, I decided to go with a simple fix and cover most common case
> (explicit usage of 2.0 with Math.pow) for now.
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check
> [2] https://bugs.openjdk.java.net/browse/JDK-8189172
Reply | Threaded
Open this post in threaded view
|

Re: [10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

Vladimir Ivanov
Thanks for review, Vladimir.

> I agree with suggested fix. What check we have before 9 and where was it?

It was quite complicated. There were LibraryCallKit::inline_pow() [1] &
LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0
exponent, multiple PowDs on fast paths, and a call into
SharedRuntime::dpow on slow path. PowD had matcher rules [3] which
produced x87-based implementation.

> Why do you need to move value to tmp1?
>
> +  movdq(tmp1, xmm1);
> +  cmp64(tmp1, ExternalAddress(DOUBLE2));

We want to compare 64-bit values and move the exponent from XMM to GP
register.

void Assembler::movdq(Register dst, XMMRegister src) {
void MacroAssembler::cmp64(Register src1, AddressLiteral src2) {

Best regards,
Vladimir Ivanov

[1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72
[2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16
[3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7

> On 12/12/17 11:16 AM, Vladimir Ivanov wrote:
>> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
>> https://bugs.openjdk.java.net/browse/JDK-8190869
>>
>> Math.pow() was completely rewritten on 9 and important special case
>> (exponent == 2.0) was missed. It caused severe performance regression
>> (up to 10x) on some numeric code.
>>
>> The fix is two-fold:
>>    * add special case in C2 to transform pow(x,2.0) into x*x;
>>    * for consistency of results between different execution modes
>> (interpreter, C1, and C2), add special case into generated stub at
>> MacroAssembler::fast_pow().
>>
>> Some clarifications follow.
>>
>> Testing:
>>    * microbenchmark
>>    * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress)
>>    * test case from 8189172 on consistency of results
>>
>> The fix doesn't cover all possible cases (e.g., when exponent is not a
>> constant during parsing, but becomes such later). I experimented with
>> more elaborate fix, but wasn't satisfied with the results and chose
>> simpler route.
>>
>> I see 2 alternatives:
>>
>> (1) Introduce macro node PowD and either transform it into MulD if
>> exponent == 2.0D or expand it into CallLeafNode. The downside is that
>> PowD should be CallNode and replacing it with a MulD requires some IR
>> tweaking around it.
>>
>> (2) Introduce a runtime check before calling the stub (prototype [1]).
>> If exponent becomes 2.0 later, the check is folded and stub call is
>> removed. The downside is that when exponent doesn't turn into 2.0, the
>> check stays and duplicates the check in the stub. Unfortunately, there's
>> no way to remove it from the stub, because interpreter & C1 relies on
>> it. Otherwise, inconsistent computation results become possible (like
>> JDK-8189172 [2]).
>>
>> So, I decided to go with a simple fix and cover most common case
>> (explicit usage of 2.0 with Math.pow) for now.
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check
>> [2] https://bugs.openjdk.java.net/browse/JDK-8189172
Reply | Threaded
Open this post in threaded view
|

Re: [10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

Vladimir Kozlov
On 12/13/17 6:58 AM, Vladimir Ivanov wrote:
> Thanks for review, Vladimir.
>
>> I agree with suggested fix. What check we have before 9 and where was it?
>
> It was quite complicated. There were LibraryCallKit::inline_pow() [1] &
> LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0
> exponent, multiple PowDs on fast paths, and a call into
> SharedRuntime::dpow on slow path. PowD had matcher rules [3] which
> produced x87-based implementation.

Okay.

>
>> Why do you need to move value to tmp1?
>>
>> +  movdq(tmp1, xmm1);
>> +  cmp64(tmp1, ExternalAddress(DOUBLE2));
>
> We want to compare 64-bit values and move the exponent from XMM to GP
> register.

Of cause, you need to use GP register to compare. I thought tmp1 also xmm.

Looks good.

Thanks,
Vladimir

>
> void Assembler::movdq(Register dst, XMMRegister src) {
> void MacroAssembler::cmp64(Register src1, AddressLiteral src2) {
>
> Best regards,
> Vladimir Ivanov
>
> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72
> [2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16
> [3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7
>
>> On 12/12/17 11:16 AM, Vladimir Ivanov wrote:
>>> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
>>> https://bugs.openjdk.java.net/browse/JDK-8190869
>>>
>>> Math.pow() was completely rewritten on 9 and important special case
>>> (exponent == 2.0) was missed. It caused severe performance regression
>>> (up to 10x) on some numeric code.
>>>
>>> The fix is two-fold:
>>>    * add special case in C2 to transform pow(x,2.0) into x*x;
>>>    * for consistency of results between different execution modes
>>> (interpreter, C1, and C2), add special case into generated stub at
>>> MacroAssembler::fast_pow().
>>>
>>> Some clarifications follow.
>>>
>>> Testing:
>>>    * microbenchmark
>>>    * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress)
>>>    * test case from 8189172 on consistency of results
>>>
>>> The fix doesn't cover all possible cases (e.g., when exponent is not a
>>> constant during parsing, but becomes such later). I experimented with
>>> more elaborate fix, but wasn't satisfied with the results and chose
>>> simpler route.
>>>
>>> I see 2 alternatives:
>>>
>>> (1) Introduce macro node PowD and either transform it into MulD if
>>> exponent == 2.0D or expand it into CallLeafNode. The downside is that
>>> PowD should be CallNode and replacing it with a MulD requires some IR
>>> tweaking around it.
>>>
>>> (2) Introduce a runtime check before calling the stub (prototype [1]).
>>> If exponent becomes 2.0 later, the check is folded and stub call is
>>> removed. The downside is that when exponent doesn't turn into 2.0, the
>>> check stays and duplicates the check in the stub. Unfortunately, there's
>>> no way to remove it from the stub, because interpreter & C1 relies on
>>> it. Otherwise, inconsistent computation results become possible (like
>>> JDK-8189172 [2]).
>>>
>>> So, I decided to go with a simple fix and cover most common case
>>> (explicit usage of 2.0 with Math.pow) for now.
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check
>>> [2] https://bugs.openjdk.java.net/browse/JDK-8189172
Reply | Threaded
Open this post in threaded view
|

Re: [10] RFR (S): C2: missing strength reduction of Math.pow(x, 2.0D) to x*x

Vladimir Ivanov
Thanks, Vladimir!

Best regards,
Vladimir Ivanov

On 12/13/17 7:25 PM, Vladimir Kozlov wrote:

> On 12/13/17 6:58 AM, Vladimir Ivanov wrote:
>> Thanks for review, Vladimir.
>>
>>> I agree with suggested fix. What check we have before 9 and where was
>>> it?
>>
>> It was quite complicated. There were LibraryCallKit::inline_pow() [1]
>> & LibraryCallKit::finish_pow_exp() [2] with a runtime check for 2.0
>> exponent, multiple PowDs on fast paths, and a call into
>> SharedRuntime::dpow on slow path. PowD had matcher rules [3] which
>> produced x87-based implementation.
>
> Okay.
>
>>
>>> Why do you need to move value to tmp1?
>>>
>>> +  movdq(tmp1, xmm1);
>>> +  cmp64(tmp1, ExternalAddress(DOUBLE2));
>>
>> We want to compare 64-bit values and move the exponent from XMM to GP
>> register.
>
> Of cause, you need to use GP register to compare. I thought tmp1 also xmm.
>
> Looks good.
>
> Thanks,
> Vladimir
>
>>
>> void Assembler::movdq(Register dst, XMMRegister src) {
>> void MacroAssembler::cmp64(Register src1, AddressLiteral src2) {
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> [1] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.72
>> [2] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l22.16
>> [3] http://hg.openjdk.java.net/jdk9/jdk9/hotspot/rev/94849fb8ce93#l13.7
>>
>>> On 12/12/17 11:16 AM, Vladimir Ivanov wrote:
>>>> http://cr.openjdk.java.net/~vlivanov/8190869/webrev.00/
>>>> https://bugs.openjdk.java.net/browse/JDK-8190869
>>>>
>>>> Math.pow() was completely rewritten on 9 and important special case
>>>> (exponent == 2.0) was missed. It caused severe performance regression
>>>> (up to 10x) on some numeric code.
>>>>
>>>> The fix is two-fold:
>>>>    * add special case in C2 to transform pow(x,2.0) into x*x;
>>>>    * for consistency of results between different execution modes
>>>> (interpreter, C1, and C2), add special case into generated stub at
>>>> MacroAssembler::fast_pow().
>>>>
>>>> Some clarifications follow.
>>>>
>>>> Testing:
>>>>    * microbenchmark
>>>>    * hs-precheckin-comp, hs-tier1, hs-tier2 (in progress)
>>>>    * test case from 8189172 on consistency of results
>>>>
>>>> The fix doesn't cover all possible cases (e.g., when exponent is not a
>>>> constant during parsing, but becomes such later). I experimented with
>>>> more elaborate fix, but wasn't satisfied with the results and chose
>>>> simpler route.
>>>>
>>>> I see 2 alternatives:
>>>>
>>>> (1) Introduce macro node PowD and either transform it into MulD if
>>>> exponent == 2.0D or expand it into CallLeafNode. The downside is that
>>>> PowD should be CallNode and replacing it with a MulD requires some IR
>>>> tweaking around it.
>>>>
>>>> (2) Introduce a runtime check before calling the stub (prototype [1]).
>>>> If exponent becomes 2.0 later, the check is folded and stub call is
>>>> removed. The downside is that when exponent doesn't turn into 2.0, the
>>>> check stays and duplicates the check in the stub. Unfortunately,
>>>> there's
>>>> no way to remove it from the stub, because interpreter & C1 relies on
>>>> it. Otherwise, inconsistent computation results become possible (like
>>>> JDK-8189172 [2]).
>>>>
>>>> So, I decided to go with a simple fix and cover most common case
>>>> (explicit usage of 2.0 with Math.pow) for now.
>>>>
>>>> Best regards,
>>>> Vladimir Ivanov
>>>>
>>>> [1] http://cr.openjdk.java.net/~vlivanov/8190869/webrev.runtime_check
>>>> [2] https://bugs.openjdk.java.net/browse/JDK-8189172