RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Roland Westrelin-3

http://cr.openjdk.java.net/~roland/8193130/webrev.00/

In the test case:

The loop in test1() (inlined from test1_helper()) runs for a small
number of iterations but the init value is not constant and profiling
tells the compiler it runs for a large number of iterations. First, loop
predication moves the range checks out of loop, then pre/main/post loops
are created and the main loop is unrolled 16 times. After unrolling the
init value of the main loop is known to be <= -11 and the limit is -10
(the main loop is never run). The type of the main loop iv is set to <=
-10. The main loop no longer has range checks but it still has the range
check CastII and the ConvI2L nodes that enforce that the iv should be >=
0. The CastII/ConvI2L nodes become top and some data paths in the main
loop body die. Because there's no test that compare the iv to 0 in the
loop body (they are predicates above the pre loop), no control
paths die. That leads to a broken graph.

The only way to fix this that I found, is to have predicates between the
main and pre loops that verify that the init value of the main loop
falls into the range allowed by range checks. This is tricky because
when predication runs there's no pre/main/post loops yet. The change I
propose adds skeleton predicates during loop predicates. They are a copy
of predicates on the init value of the loop but use a place holder for
the init value (Opaque1 node). They are built using an If->Opaque4->Bool
pattern that will make them go away after loop optimizations (the
Opaque4 node has a default value it takes after loop opts). When the
pre/main/post loops are created, the skeleton predicates are copied
above the main loop, the Opaque1 node is replaced by the actual init
value for the main loop, the uncommon traps of the predicate are
replaced by a Halt nodes. The copies also use Opaque4 nodes that will
cause the predicates of the main loop to be removed from the final
graph.

With this change, when the type of the init value of the main loop is
known to be -11, the main loop becomes unreachable and is optimized out.

An extra complication is that the same problem exists with iteration
splitting: after iteration splitting + unrolling the main loop could
become unreachable at runtime and CastII/ConvI2L nodes in the main loop
body die. This is fixed here by adding predicates between the main and
pre loop that enforce range check constraints similar to what is
decribed above for loop predication.

Roland.
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Tobias Hartmann-2
Hi Roland,

sorry for the delay. Very nice that you were able to find a regression test!

Here are some comments:

compile.cpp
- Why do you only remove the Opaque4Nodes after the macro expansion phase? Couldn't we avoid the additional
igvn.optimize() by merging it into line 2346?
- line 3363: Please add the comments from loopPredicate.cpp

loopPredicate.cpp
- line 780: Why is that check necessary? And wouldn't it make more sense to check this in the caller?

loopTransform.cpp
- looks reasonable but this is very hard to review :)

loopnode.hpp
- Please add a comment to CountedLoopNode::skip_predicates()
- Please use the place for "*" consistently (I prefer it at the type instead of the name). For example, in line 1069, 1076.

opaquenode.hpp
- Please also fix the whitespaces in the constructor argument list in lines 38, 45
- Maybe we should fix the comment for Opaque4Nodes because they are now also used in a different context

Please update the copyright dates to 2018. Did you run any performance testing?

I've started correctness testing.

Best regards,
Tobias

On 18.12.2017 15:40, Roland Westrelin wrote:

>
> http://cr.openjdk.java.net/~roland/8193130/webrev.00/
>
> In the test case:
>
> The loop in test1() (inlined from test1_helper()) runs for a small
> number of iterations but the init value is not constant and profiling
> tells the compiler it runs for a large number of iterations. First, loop
> predication moves the range checks out of loop, then pre/main/post loops
> are created and the main loop is unrolled 16 times. After unrolling the
> init value of the main loop is known to be <= -11 and the limit is -10
> (the main loop is never run). The type of the main loop iv is set to <=
> -10. The main loop no longer has range checks but it still has the range
> check CastII and the ConvI2L nodes that enforce that the iv should be >=
> 0. The CastII/ConvI2L nodes become top and some data paths in the main
> loop body die. Because there's no test that compare the iv to 0 in the
> loop body (they are predicates above the pre loop), no control
> paths die. That leads to a broken graph.
>
> The only way to fix this that I found, is to have predicates between the
> main and pre loops that verify that the init value of the main loop
> falls into the range allowed by range checks. This is tricky because
> when predication runs there's no pre/main/post loops yet. The change I
> propose adds skeleton predicates during loop predicates. They are a copy
> of predicates on the init value of the loop but use a place holder for
> the init value (Opaque1 node). They are built using an If->Opaque4->Bool
> pattern that will make them go away after loop optimizations (the
> Opaque4 node has a default value it takes after loop opts). When the
> pre/main/post loops are created, the skeleton predicates are copied
> above the main loop, the Opaque1 node is replaced by the actual init
> value for the main loop, the uncommon traps of the predicate are
> replaced by a Halt nodes. The copies also use Opaque4 nodes that will
> cause the predicates of the main loop to be removed from the final
> graph.
>
> With this change, when the type of the init value of the main loop is
> known to be -11, the main loop becomes unreachable and is optimized out.
>
> An extra complication is that the same problem exists with iteration
> splitting: after iteration splitting + unrolling the main loop could
> become unreachable at runtime and CastII/ConvI2L nodes in the main loop
> body die. This is fixed here by adding predicates between the main and
> pre loop that enforce range check constraints similar to what is
> decribed above for loop predication.
>
> Roland.
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Roland Westrelin-3

Hi Tobias,

Thanks for looking at this. Here is a new webrev:

http://cr.openjdk.java.net/~roland/8193130/webrev.01/

> compile.cpp
> - Why do you only remove the Opaque4Nodes after the macro expansion phase? Couldn't we avoid the additional
> igvn.optimize() by merging it into line 2346?

Opaque1 nodes are only guaranteed to have been eliminated once macro
expansion is over. So the igvn that follows could still cause some of
the predicates I added on the main loop to make the path to the main
loop go dead. That's why elimination of Opaque4 nodes is delayed that
late.

> loopPredicate.cpp
> - line 780: Why is that check necessary? And wouldn't it make more sense to check this in the caller?

is_range_check_if() returns true for non RangeCheck nodes so there needs
to be a check for RangeCheck. I moved it into the caller.

> Please update the copyright dates to 2018. Did you run any performance testing?

I did eventhough I'm perplexed that we need to do that by hands. I took
care of all other comments as well.

Roland.
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Tobias Hartmann-2
Hi Roland,

On 15.01.2018 10:18, Roland Westrelin wrote:
> http://cr.openjdk.java.net/~roland/8193130/webrev.01/

Looks good to me but I think a second review is required here because the changes are complex.

>> compile.cpp
>> - Why do you only remove the Opaque4Nodes after the macro expansion phase? Couldn't we avoid the additional
>> igvn.optimize() by merging it into line 2346?
>
> Opaque1 nodes are only guaranteed to have been eliminated once macro
> expansion is over. So the igvn that follows could still cause some of
> the predicates I added on the main loop to make the path to the main
> loop go dead. That's why elimination of Opaque4 nodes is delayed that
> late.

Okay, thanks for the explanation.

>> loopPredicate.cpp
>> - line 780: Why is that check necessary? And wouldn't it make more sense to check this in the caller?
>
> is_range_check_if() returns true for non RangeCheck nodes so there needs
> to be a check for RangeCheck. I moved it into the caller.

Right, looks good.

>> Please update the copyright dates to 2018. Did you run any performance testing?
>
> I did eventhough I'm perplexed that we need to do that by hands. I took
> care of all other comments as well.

Thanks Roland, I've basically meant the copyright date of the newly added test. But I guess you are right, copyright
dates should be updated automatically at some point.

Best regards
Tobias
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Vladimir Kozlov
First, I am suggesting to defer it to jdk 11 since it is present in jdk
9 (not new in jdk 10) and the fix is not simple. It is just 6 months delay.

I want to look on it. Will do it after offsite meeting.

Thanks,
Vladimir

On 1/15/18 2:33 AM, Tobias Hartmann wrote:

> Hi Roland,
>
> On 15.01.2018 10:18, Roland Westrelin wrote:
>> http://cr.openjdk.java.net/~roland/8193130/webrev.01/
>
> Looks good to me but I think a second review is required here because the changes are complex.
>
>>> compile.cpp
>>> - Why do you only remove the Opaque4Nodes after the macro expansion phase? Couldn't we avoid the additional
>>> igvn.optimize() by merging it into line 2346?
>>
>> Opaque1 nodes are only guaranteed to have been eliminated once macro
>> expansion is over. So the igvn that follows could still cause some of
>> the predicates I added on the main loop to make the path to the main
>> loop go dead. That's why elimination of Opaque4 nodes is delayed that
>> late.
>
> Okay, thanks for the explanation.
>
>>> loopPredicate.cpp
>>> - line 780: Why is that check necessary? And wouldn't it make more sense to check this in the caller?
>>
>> is_range_check_if() returns true for non RangeCheck nodes so there needs
>> to be a check for RangeCheck. I moved it into the caller.
>
> Right, looks good.
>
>>> Please update the copyright dates to 2018. Did you run any performance testing?
>>
>> I did eventhough I'm perplexed that we need to do that by hands. I took
>> care of all other comments as well.
>
> Thanks Roland, I've basically meant the copyright date of the newly added test. But I guess you are right, copyright
> dates should be updated automatically at some point.
>
> Best regards
> Tobias
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Roland Westrelin-3

> First, I am suggesting to defer it to jdk 11 since it is present in jdk
> 9 (not new in jdk 10) and the fix is not simple. It is just 6 months delay.

That's fine with me.

Roland.
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Vladimir Kozlov
http://cr.openjdk.java.net/~roland/8193130/webrev.01/

After going back and force and debugging code myself I came almost to the same ideas as in this fix.
I agree with it now. And sorry that it took long time.

I submitted new testing for latest jdk/hs. I had to remove all Copyright years updates from patch -
they already updated in sources.

I am not sure about backport into JDK 10 which will be short lived. Changes are too complex and I
think they should "baked" in current JDK first.

Thanks,
Vladimir

On 1/18/18 12:12 AM, Roland Westrelin wrote:
>
>> First, I am suggesting to defer it to jdk 11 since it is present in jdk
>> 9 (not new in jdk 10) and the fix is not simple. It is just 6 months delay.
>
> That's fine with me.
>
> Roland.
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(M): 8193130: Bad graph when unrolled loop bounds conflicts with range checks

Roland Westrelin-3

Hi Vladimir,

> http://cr.openjdk.java.net/~roland/8193130/webrev.01/
>
> After going back and force and debugging code myself I came almost to the same ideas as in this fix.
> I agree with it now. And sorry that it took long time.

Cool! Thanks for the review, testing and sponsoring.

> I am not sure about backport into JDK 10 which will be short lived. Changes are too complex and I
> think they should "baked" in current JDK first.

Not backporting is fine with me.

Roland.