Re: RFR(L): 8186027: C2: loop strip mining

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8186027: C2: loop strip mining

Nils Eliasson
Hi,


On 2017-11-23 15:18, Roland Westrelin wrote:

> Hi Vladimir,
>
>> I am running testing again. But if this will repeat and presence of this
>> Sparse.small regression suggesting to me that may be we should keep this
>> optimization off by default - keep UseCountedLoopSafepoints false.
>>
>> We may switch it on later with additional changes which address regressions.
>>
>> What do you think?
> If the inner loop runs for a small number of iterations and the compiler
> can't statically prove it, I don't see a way to remove the overhead of
> loop strip mining entirely. So I'm not optimistic the regression can be
> fixed.
Agreed. In other words: Loop strip mining adds a guarantee that
time-to-safepoint won't be too long, and that has a small cost

The current situation is that we have some extra performance with
UseCountedLoopSafepoints default off, but let some users have a bad
experience when they encounter long time-to-safepoint times or failures
(https://bugs.openjdk.java.net/browse/JDK-5014723). I rather turn the
table and have loop strip mining on, and let the power users experiment
with turning it off for any uncertain performance boost.

> If loop strip mining defaults to false, would there we be any regular
> testing on your side?
We would have to add some.
>
> It seems to me that it would make sense to enable loop strip mining
> depending on what GC is used: it makes little sense for parallel gc but
> we'll want it enabled for Shenandoah for instance. Where does G1 fit? I
> can't really say and I don't have a strong opinion. But as I understand,
> G1 was made default under the assumption that users would be ok trading
> throughput for better latency. Maybe, that same reasoning applies to
> loop strip mining?

Scimark.sparse.small show a regression, but having long
time-to-safepoint has a throughput cost in some settings like the
companion benchmark scimark.sparse.large. Numbers using G1:

-XX:-UseCountedLoopSafepoints (default) ~86 ops/m
-XX:+UseCountedLoopSafepoints ~106 ops/m
-XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m

I would prefer having it on by default, at least in G1. Let's ask the G1
GC-team on their opinion.

// Nils

>
> Roland.

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8186027: C2: loop strip mining

Thomas Schatzl
Hi Nils,

On Thu, 2017-11-23 at 22:59 +0100, Nils Eliasson wrote:

> Hi,
>
>
> On 2017-11-23 15:18, Roland Westrelin wrote:
> > Hi Vladimir,
> >
> > > I am running testing again. But if this will repeat and presence
> > > of this Sparse.small regression suggesting to me that may be we
> > > should keep this optimization off by default - keep
> > > UseCountedLoopSafepoints false.
> > >
> > > We may switch it on later with additional changes which address
> > > regressions.
> > >
> > > What do you think?
> >
> > If the inner loop runs for a small number of iterations and the
> > compiler can't statically prove it, I don't see a way to remove the
> > overhead of loop strip mining entirely. So I'm not optimistic the
> > regression can be fixed.
>
> Agreed. In other words: Loop strip mining adds a guarantee that
> time-to-safepoint won't be too long, and that has a small cost
>
> The current situation is that we have some extra performance with
> UseCountedLoopSafepoints default off, but let some users have a bad
> experience when they encounter long time-to-safepoint times or
> failures (https://bugs.openjdk.java.net/browse/JDK-5014723). I rather
> turn the table and have loop strip mining on, and let the power users
> experiment with turning it off for any uncertain performance boost.
>
> > If loop strip mining defaults to false, would there we be any
> > regular testing on your side?
>
> We would have to add some.
> >
> > It seems to me that it would make sense to enable loop strip mining
> > depending on what GC is used: it makes little sense for parallel gc
> > but we'll want it enabled for Shenandoah for instance. Where does
> > G1 fit? I can't really say and I don't have a strong opinion. But
> > as I understand, G1 was made default under the assumption that
> > users would be ok trading throughput for better latency. Maybe,
> > that same reasoning applies to loop strip mining?
>
> Scimark.sparse.small show a regression, but having long
> time-to-safepoint has a throughput cost in some settings like the
> companion benchmark scimark.sparse.large. Numbers using G1:
>
> -XX:-UseCountedLoopSafepoints (default) ~86 ops/m
> -XX:+UseCountedLoopSafepoints ~106 ops/m
> -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m
>
> I would prefer having it on by default, at least in G1. Let's ask the
> G1 GC-team on their opinion.

  our perf team uses -XX:+UseCountedLoopSafepoints for _all_ collectors
 for some time now. When asked, they replied that predictability of
results is very important for them too.

We also closed out some perf regressions (e.g. JDK-8177704) due to the
problems with -XX:-UseCountedLoopSafepoints after you posted these
results in agreement with the perf team.

So I am all for making it default for G1 (and I am sure others agree),
if not for all GCs. However I recommend having a separate CR for
changing the defaults. Makes it easier reverting it in case things go
wrong.

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8186027: C2: loop strip mining

Vladimir Kozlov
Second pre-integration re-run finshed and there were timeouts. Looks like it depend what hosts were used to run tests.

I agree to enable it by default on subset if GCs which are sensitive to pauses: G1 and Shenandoah.

Roland, please, prepare changeset.

I am not sure we need separate CR for setting defaults. It is simple to file bug and change only defaults if we need.

Thanks,
Vladimir

On 11/24/17 1:22 AM, Thomas Schatzl wrote:

> Hi Nils,
>
> On Thu, 2017-11-23 at 22:59 +0100, Nils Eliasson wrote:
>> Hi,
>>
>>
>> On 2017-11-23 15:18, Roland Westrelin wrote:
>>> Hi Vladimir,
>>>
>>>> I am running testing again. But if this will repeat and presence
>>>> of this Sparse.small regression suggesting to me that may be we
>>>> should keep this optimization off by default - keep
>>>> UseCountedLoopSafepoints false.
>>>>
>>>> We may switch it on later with additional changes which address
>>>> regressions.
>>>>
>>>> What do you think?
>>>
>>> If the inner loop runs for a small number of iterations and the
>>> compiler can't statically prove it, I don't see a way to remove the
>>> overhead of loop strip mining entirely. So I'm not optimistic the
>>> regression can be fixed.
>>
>> Agreed. In other words: Loop strip mining adds a guarantee that
>> time-to-safepoint won't be too long, and that has a small cost
>>
>> The current situation is that we have some extra performance with
>> UseCountedLoopSafepoints default off, but let some users have a bad
>> experience when they encounter long time-to-safepoint times or
>> failures (https://bugs.openjdk.java.net/browse/JDK-5014723). I rather
>> turn the table and have loop strip mining on, and let the power users
>> experiment with turning it off for any uncertain performance boost.
>>
>>> If loop strip mining defaults to false, would there we be any
>>> regular testing on your side?
>>
>> We would have to add some.
>>>
>>> It seems to me that it would make sense to enable loop strip mining
>>> depending on what GC is used: it makes little sense for parallel gc
>>> but we'll want it enabled for Shenandoah for instance. Where does
>>> G1 fit? I can't really say and I don't have a strong opinion. But
>>> as I understand, G1 was made default under the assumption that
>>> users would be ok trading throughput for better latency. Maybe,
>>> that same reasoning applies to loop strip mining?
>>
>> Scimark.sparse.small show a regression, but having long
>> time-to-safepoint has a throughput cost in some settings like the
>> companion benchmark scimark.sparse.large. Numbers using G1:
>>
>> -XX:-UseCountedLoopSafepoints (default) ~86 ops/m
>> -XX:+UseCountedLoopSafepoints ~106 ops/m
>> -XX:+UseCountedLoopSafepoints -XX:LoopStripMiningIter=1000 ~111 ops/m
>>
>> I would prefer having it on by default, at least in G1. Let's ask the
>> G1 GC-team on their opinion.
>
>    our perf team uses -XX:+UseCountedLoopSafepoints for _all_ collectors
>   for some time now. When asked, they replied that predictability of
> results is very important for them too.
>
> We also closed out some perf regressions (e.g. JDK-8177704) due to the
> problems with -XX:-UseCountedLoopSafepoints after you posted these
> results in agreement with the perf team.
>
> So I am all for making it default for G1 (and I am sure others agree),
> if not for all GCs. However I recommend having a separate CR for
> changing the defaults. Makes it easier reverting it in case things go
> wrong.
>
> Thanks,
>    Thomas
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(L): 8186027: C2: loop strip mining

Thomas Schatzl
Hi,

On Mon, 2017-11-27 at 10:19 -0800, Vladimir Kozlov wrote:

> Second pre-integration re-run finshed and there were timeouts. Looks
> like it depend what hosts were used to run tests.
>
> I agree to enable it by default on subset if GCs which are sensitive
> to pauses: G1 and Shenandoah.
>
> Roland, please, prepare changeset.
>
> I am not sure we need separate CR for setting defaults. It is simple
> to file bug and change only defaults if we need.

  the reason I suggested it is because this defaults change will then
be more prominent when browsing the changesets for explanations of
regressions.

Ie. I am very sure we in the GC team would do that change seperately,
but ymmv.

Thanks,
  Thomas