EpsilonGC and throughput.


EpsilonGC and throughput.

Sergey Kuksenko-3
Hi All,

Reading the discussions about Epsilon GC and performance, I'd like to warn you
not to mix up latency and throughput.
I agree that it makes sense to talk about latency, but, please, don't
expect that you will be able to achieve high throughput with Epsilon GC.
Having zero barriers is not enough for this.
Just a simple example, I randomly took 9 standard throughput measuring
benchmarks and compared Epsilon GC vs G1 and ParallelOld.

- EpsilonGC vs ParallelOld:
   -- only on 3 benchmarks overall throughput with Epsilon GC was higher
than ParallelOld and speedup was : 0.2%-0.6%
   -- on 6 benchmarks, ParallelOld (with barriers and pauses) was faster
(faster means throughput!), within 1%-10%.

- EpsilonGC vs G1
   -- EpsilonGC has shown higher throughput on 4 benchmarks, within 2%-3%
   -  G1 was faster on 5 benchmarks, within 2%-10%.

Compacting GCs have significant advantage over non-GC in terms of
throughput (e.g.
https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/)

--
Best regards,
Sergey Kuksenko




Re: EpsilonGC and throughput.

Aleksey Shipilev-4
On 12/18/2017 08:01 PM, Sergey Kuksenko wrote:
> I agree that it makes sense to talk about latency, but, please, don't expect that you will be able
> to achieve high throughput with Epsilon GC. Having zero barriers is not enough for this.
> Just a simple example, I randomly took 9 standard throughput measuring benchmarks and compared
> Epsilon GC vs G1 and ParallelOld.
I assume you ran SPECjvm2008.

Beware of what I call the Catch-22 of (GC) Performance Evaluation: "standard benchmarks" tend to be
developed/tuned with existing GCs in mind. For example, it would be hard to find the "standard
benchmark" that exhibits large LDS, or otherwise experiences large GC pauses, or experiences GC
problems in its steady state (ignoring transient hiccups in the warmups).


> - EpsilonGC vs ParallelOld:
>   -- only on 3 benchmarks overall throughput with Epsilon GC was higher than ParallelOld and speedup
> was : 0.2%-0.6%
>   -- on 6 benchmarks, ParallelOld (with barriers and pauses) was faster (faster means throughput!),
> within 1%-10%.
>
> - EpsilonGC vs G1
>   -- EpsilonGC has shown higher throughput on 4 benchmarks, within 2%-3%
>   -  G1 was faster on 5 benchmarks, within 2%-10%.

Oh! The throughput figures are actually pretty good for a non-compacting collector, and the performance
improvements are in line with what is called out in the JEP as "Last-drop performance improvements" on
special workloads.

As noted above, it makes little sense to run Epsilon for throughput on "standard benchmarks" that do
not suffer from GC issues. It is instructive, however, to run workloads that *do* suffer from them.
For example, try this for a quick turn-around CLI workload that is supposed to do one thing very
quickly:

import java.util.ArrayList;
import java.util.List;

public class AL {
    static List<Object> l;

    public static void main(String... args) throws Throwable {
        l = new ArrayList<>();
        // Allocate 100M objects and keep them all reachable, so there is no garbage to reclaim.
        for (int c = 0; c < 100_000_000; c++) {
            l.add(new Object());
        }
        System.out.println(l.hashCode());
    }
}


$ time java -XX:+UseParallelGC AL
-1907572722

real 0m25.063s
user 1m5.700s
sys 0m1.084s

$ time java -XX:+UseG1GC AL
-1907572722

real 0m14.908s
user 0m33.264s
sys 0m0.788s

$ time java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC AL
-1907572722

real 0m8.995s
user 0m8.784s
sys 0m0.260s

In workloads like these, having GC pauses does impact application throughput. When out-of-the-box GC
performance is concerned, the difference is not even in single-digit percents. Of course, you can
configure GC to avoid pauses in the timespan that is critical for you (e.g. setting -Xms8g -Xmx8g
-Xmn7g for the workload above), and hope you got it right, but one of the points for Epsilon is not
to guess about this, but actually have the guarantee GC never happens.
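
For illustration, the pause-avoiding configuration mentioned above would be invoked roughly like this (a sketch only; shown with ParallelGC, and the exact generation sizing depends on the live set and allocation rate of the workload):

$ time java -Xms8g -Xmx8g -Xmn7g -XX:+UseParallelGC AL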


> Compacting GCs have significant advantage over non-GC in terms of throughput (e.g.
> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/)
True, and it is called out in JEP:

"Locality considerations. Non-compacting GC implicitly means it maintains the object graph in its
allocation order. This has impact on spatial locality, and regular applications may experience the
throughput hit if allocations are random or generate lots of sparse garbage. While this may entail
some throughput overhead, this is outside of GC control, and would affect most non-moving GCs.
Locality-aware application coding would be required to mitigate this drawback, if locality proves to
be a problem."

Locality is something that users can control, especially when small contained applications are
concerned, and/or (hopefully) Valhalla and other language features that help to flatten the memory.

Thanks,
-Aleksey



Re: EpsilonGC and throughput.

Sergey Kuksenko-3


On 12/19/2017 12:14 AM, Aleksey Shipilev wrote:
> I assume you ran SPECjvm2008.
Bingo. All of those that were able to run with Epsilon GC for at least 30
seconds.
>
> Beware of what I call the Catch-22 of (GC) Performance Evaluation: "standard benchmarks" tend to be
> developed/tuned with existing GCs in mind.
You are partially right. Looking into some of the sources, I would conclude that
they were written with general Java style in mind, not tuned to
particular GCs.

> For example, it would be hard to find the "standard
> benchmark" that exhibits large LDS, or otherwise experiences large GC pauses, or experiences GC
> problems in its steady state (ignoring transient hiccups in the warmups).
>
>
>> - EpsilonGC vs ParallelOld:
>>    -- only on 3 benchmarks overall throughput with Epsilon GC was higher than ParallelOld and speedup
>> was : 0.2%-0.6%
>>    -- on 6 benchmarks, ParallelOld (with barriers and pauses) was faster (faster means throughput!),
>> within 1%-10%.
>>
>> - EpsilonGC vs G1
>>    -- EpsilonGC has shown higher throughput on 4 benchmarks, within 2%-3%
>>    -  G1 was faster on 5 benchmarks, within 2%-10%.
> Oh! The throughput figures are actually pretty good for a non-compacting collector, and the performance
> improvements are in line with what is called out in the JEP as "Last-drop performance improvements" on
> special workloads.
For special cases, yes. I wrote about typical cases, and my message
was: don't expect that Epsilon GC will show you "ideal throughput"
without GC overheads; sometimes GC overhead is important for higher
performance.
>
> As noted above, it makes little sense to run Epsilon for throughput on "standard benchmarks" that do
> not suffer from GC issues. It is instructive, however, to run workloads that *do* suffer from them.
I have concerns here. I am afraid that if an application *does* suffer from
GC issues, it will just continue suffering from Epsilon GC issues (OutOfMemoryError).

> For example, try this for a quick turn-around CLI workload that is supposed to do one thing very
> quickly:
>
> public class AL {
>      static List<Object> l;
>      public static void main(String... args) throws Throwable {
>          l = new ArrayList<>();
>          for (int c = 0; c < 100_000_000; c++) {
>              l.add(new Object());
>          }
>          System.out.println(l.hashCode());
>      }
> }
>
>
> $ time java -XX:+UseParallelGC AL
> -1907572722
>
> real 0m25.063s
> user 1m5.700s
> sys 0m1.084s
>
> $ time java -XX:+UseG1GC AL
> -1907572722
>
> real 0m14.908s
> user 0m33.264s
> sys 0m0.788s
>
> $ time java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC AL
> -1907572722
>
> real 0m8.995s
> user 0m8.784s
> sys 0m0.260s
It doesn't look like a throughput benchmark, it's a startup benchmark. I am sorry, I
should have been more clear in my previous email: I was writing about steady
state throughput.
Converting this into a throughput benchmark, I've got:
G1: 12 seconds
ParallelOld: 24 seconds
EpsilonGC: 9.5 seconds
Not such a huge difference, and Epsilon GC can't do more than a couple of
iterations.
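
For reference, a rough sketch of such a steady-state conversion (an assumption about the setup, not the harness actually used): repeat the allocation loop, time each iteration separately, and ignore the warmup. With Epsilon GC nothing is ever reclaimed, so only a couple of iterations fit into the heap before OutOfMemoryError.

import java.util.ArrayList;
import java.util.List;

public class ALSteadyState {
    static List<Object> l;

    public static void main(String... args) {
        for (int iter = 0; iter < 10; iter++) {
            long start = System.nanoTime();
            l = new ArrayList<>();   // drop the previous list, start a fresh iteration
            for (int c = 0; c < 100_000_000; c++) {
                l.add(new Object());
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println("iteration " + iter + ": " + elapsedMs + " ms (" + l.hashCode() + ")");
        }
    }
}
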
> In workloads like these, having GC pauses does impact application throughput.
Nobody argued with this. I just showed examples where GC
pauses (with compaction) sometimes provide better overall throughput.

> When out-of-the-box GC
> performance is concerned, the difference is not even in single-digit percents. Of course, you can
> configure GC to avoid pauses in the timespan that is critical for you (e.g. setting -Xms8g -Xmx8g
> -Xmn7g for the workload above), and hope you got it right, but one of the points for Epsilon is not
> to guess about this, but actually have the guarantee GC never happens.
>
>
>> Compacting GCs have significant advantage over non-GC in terms of throughput (e.g.
>> https://shipilev.net/jvm-anatomy-park/11-moving-gc-locality/)
> True, and it is called out in JEP:
>
> "Locality considerations. Non-compacting GC implicitly means it maintains the object graph in its
> allocation order. This has impact on spatial locality, and regular applications may experience the
> throughput hit if allocations are random or generate lots of sparse garbage. While this may entail
> some throughput overhead, this is outside of GC control, and would affect most non-moving GCs.
> Locality-aware application coding would be required to mitigate this drawback, if locality proves to
> be a problem."
>
> Locality is something that users can control, especially when small contained applications are
> concerned, and/or (hopefully) Valhalla and other language features that help to flatten the memory.
Sure. I just have to note that such a specially tuned, locality-aware
application could barely use the standard Java API, because that is out of
user control.
Epsilon GC is not a silver bullet, and for *practical* usage it will
require more effort than existing GCs to achieve benefits. I don't deny
that such benefits exist.
>
> Thanks,
> -Aleksey
>

--
Best regards,
Sergey Kuksenko




Re: EpsilonGC and throughput.

Aleksey Shipilev-4
On 12/19/2017 07:52 PM, Sergey Kuksenko wrote:
>> Locality is something that users can control, especially when small contained applications are
>> concerned, and/or (hopefully) Valhalla and other language features that help to flatten the memory.
> Sure. Just have to note that such special tuned locality-aware application barely could use
> standard Java API, because of it is out of user control.

You would probably be okay with small inefficiencies within the class library, if you can control
the bulk of your own data either by relying on particular classlib implementation, or winding up
your own.

> Epsilon GC is not a silver bullet,and for *practical* usage it will require more efforts than
> existing GCs to achieve benefits. I don't mind that such benefits are exist.
Well, nobody claimed Epsilon is a silver bullet. Before you can reap any of its benefits, you have
to get the footprint under control [*]. After that, you can start exploring exotic memory management
techniques, and no-op GC is one of many tools in the toolbelt there. What makes Epsilon different
from other tools is that it requires VM-side implementation -- and this is why it should be included
into JVM.

Thanks,
-Aleksey

[*] In fact, it is also called out in the JEP, the other way around: fail predictably when a lot is
allocated. Over the last few months, I have had the pleasant experience of asserting allocation pressure
invariants by just running with Epsilon with a given heap and checking if it fails. When it does, I
have the full heap-dump view of the garbage produced. This turns out to be much more convenient than
I previously anticipated.
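
For illustration, the kind of invocation described above would look roughly like this (a sketch; MyApp and the 1g heap size are placeholders, and -XX:+HeapDumpOnOutOfMemoryError is the usual flag to get the dump on allocation failure):

$ java -XX:+UnlockExperimentalVMOptions -XX:+UseEpsilonGC -Xmx1g \
       -XX:+HeapDumpOnOutOfMemoryError MyApp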



Re: EpsilonGC and throughput.

Kirk Pepperdine

> On Dec 19, 2017, at 8:14 PM, Aleksey Shipilev <[hidden email]> wrote:
>
> On 12/19/2017 07:52 PM, Sergey Kuksenko wrote:
>>> Locality is something that users can control, especially when small contained applications are
>>> concerned, and/or (hopefully) Valhalla and other language features that help to flatten the memory.
>> Sure. Just have to note that such special tuned locality-aware application barely could use
>> standard Java API, because of it is out of user control.
>
> You would probably be okay with small inefficiencies within the class library, if you can control
> the bulk of your own data either by relying on particular classlib implementation, or winding up
> your own.
>
>> Epsilon GC is not a silver bullet,and for *practical* usage it will require more efforts than
>> existing GCs to achieve benefits. I don't mind that such benefits are exist.
> Well, nobody claimed Epsilon is a silver bullet. Before you can reap any of its benefits, you have
> to get the footprint under control [*]. After that, you can start exploring exotic memory management
> techniques, and no-op GC is one of many tools in the toolbelt there. What makes Epsilon different
> from other tools is that it requires VM-side implementation -- and this is why it should be included
> into JVM.

That a bench with Epsilon will not complete is, to me, completely unsurprising. It is my opinion that there are (niche) classes of applications that are not well served by the current set of benchmarks used to gauge JVM performance, and I don’t see how one can gauge Epsilon performance based on these benchmarks. In fact, IME, most of the benchmarks don’t push the JVM in ways that are close to what I consistently see in production environments. That, and these environments are changing while the benchmarks have remained relatively static. The talk about cache locality, while interesting, I’m not sure is relevant or a good predictor of performance, given that current design practice discourages good cache line densities and changes to the JVM that have been proposed to improve developers’ ability to pack memory more densely have been rejected.

The way to scale is to do the same (or more) with less. Epsilon does less and encourages application developers to do less for the same workloads. Whenever this happens I’ve rarely seen a performance regression; instead I’ve seen performance improve, in a couple of cases by a few orders of magnitude.

While Aleksey initially proposed Epsilon for experimental test purposes, I’d already considered a number of cases where live applications would benefit from being able to use (no-)GC. So, I agree with Aleksey: this option should be included, or at least be made easily pluggable into OpenJDK.

Kind regards,
Kirk


Re: EpsilonGC and throughput.

Thomas Schatzl
Hi,

On Wed, 2017-12-20 at 12:19 +0100, Kirk Pepperdine wrote:

> > On Dec 19, 2017, at 8:14 PM, Aleksey Shipilev <[hidden email]>
> > wrote:
> >
> > On 12/19/2017 07:52 PM, Sergey Kuksenko wrote:
> > > > Locality is something that users can control, especially when
> > > > small contained applications are
> > > > concerned, and/or (hopefully) Valhalla and other language
> > > > features that help to flatten the memory.
> > >
> > > Sure. Just have to note that such special tuned locality-aware
> > > application barely could use
> > > standard Java API, because of it is out of user control.
> >
> > You would probably be okay with small inefficiencies within the
> > class library, if you can control
> > the bulk of your own data either by relying on particular classlib
> > implementation, or winding up
> > your own.
> >
> > > Epsilon GC is not a silver bullet,and for *practical* usage it
> > > will require more efforts than
> > > existing GCs to achieve benefits. I don't mind that such benefits
> > > are exist.
> >
> > Well, nobody claimed Epsilon is a silver bullet. Before you can
> > reap any of its benefits, you have
> > to get the footprint under control [*]. After that, you can start
> > exploring exotic memory management
> > techniques, and no-op GC is one of many tools in the toolbelt
> > there. What makes Epsilon different
> > from other tools is that it requires VM-side implementation -- and
> > this is why it should be included
> > into JVM.
>
> That a bench with Epsilon will not complete to me is completely
> unsurprising. It is my opinion that there are (niche) classes of
> applications that are not well served by the current set of
> benchmarks used to gauge JVM performance and I don’t see how one can
> gauge Epsilon performance based on these benchmarks. In fact, IME,
> most of the benchmarks don’t push the JVM in ways that are close to
> what I consistently see in production environments. That and these

Then please help by providing ones which we VM devs can use to improve the
VM. :)

> environments are changing while the benchmarks have remained
> relatively static. The talk about cache locality while interesting
> I’m not sure it is relevant or a good predictor of performance given
> that current design practice discourage good cache line densities and

It is all about caches nowadays :) E.g. in our tests with ZGC we are
seeing surprisingly good throughput numbers due to caching effects.
I.e. ones that seem to contradict what I remember of literature.

I guess Aleksey is all over putting ZGC through their benchmarks right
now, maybe he can tell us more.

SPECjvm2008 is maybe also a very conservative indicator for caching
impact due to compaction: its live data set for many of its tests is
really small, and the actively accessed amount of memory is probably even
smaller. Some of them may even fit completely into today's L3 caches
of current (high-end) machines....

I mean, if G1 with its notoriously huge barrier (which can be improved
btw, if you want to work on this just ask) and rather bad (really
bloated, i.e. in some cases extremely cache unfriendly) remembered set
can already "win" some of these benchmarks... (again, just ask).

> changes the JVM that have been proposed at improving developers
> ability to pack memory more densely have been rejected.

Not sure about this. E.g. value types I think are actively developed?
Please elaborate. Maybe you are just talking to the wrong people?

> The way to scale is to do the same (or more) with less. Epsilon does
> less and encouraged application developers to do less for the same
> workloads. Whenever this happens I’ve rarely seen a performance
> regression but instead seen a couple performance improve by a few
> orders of magnitude.

You mean improve caching/memory usage? I agree :)

> While Aleksey initially proposed Epsilon for experimental test
> purposes I’d already considered a number of cases where live
> applications would benefit from being able use (no-)GC. So, I agree
> with Aleksey, this option should be included or at least be made to
> be easily plug-able into OpenJDK.
>

You mentioned this twice already without going into detail. Could you
please elaborate about this, particularly compared to a for such cases
properly configured Serial/Parallel GC?
(Note that Epsilon GC *also* needs at least some heap size
configuration; so unless adding -Xmn/-Xms is too much work compared to
significant changes to your application to fit this paradigm, please
tell)

Thanks,
  Thomas


Re: EpsilonGC and throughput.

Kirk Pepperdine

>>
>> That a bench with Epsilon will not complete to me is completely
>> unsurprising. It is my opinion that there are (niche) classes of
>> applications that are not well served by the current set of
>> benchmarks used to gauge JVM performance and I don’t see how one can
>> gauge Epsilon performance based on these benchmarks. In fact, IME,
>> most of the benchmarks don’t push the JVM in ways that are close to
>> what I consistently see in production environments. That and these
>
> Then please help providing ones which we VM devs can use to improve the
> VM. :)

Thank you for the platitude.
>
>> environments are changing while the benchmarks have remained
>> relatively static. The talk about cache locality while interesting
>> I’m not sure it is relevant or a good predictor of performance given
>> that current design practice discourage good cache line densities and
>
> It is all about caches nowadays :) E.g. in our tests with ZGC we are
> seeing surprisingly good throughput numbers due to caching effects.
> I.e. ones that seem to contradict what I remember of literature.

Just be certain that this isn’t an artifact of your benchmark.

>
>
> SPECjvm2008 is maybe also a very conservative indicator for caching
> impact due to compaction: it's live data set for many of its tests is
> really small, and the actively accessed amount of memory probably even
> smaller. Some of them even maybe fit completely into today's L3 caches
> of current (high-end) machines….

And that is part of my point. Many of them are simply too small relative to the size of applications today.


 
>
>> changes the JVM that have been proposed at improving developers
>> ability to pack memory more densely have been rejected.
>
> Not sure about this. E.g. value types I think are actively developed?
> Please elaborate. Maybe you are just talking to the wrong people?

Gil Tene’s proposal on object layouts.

>>
>
> You mentioned this twice already without going into detail. Could you
> please elaborate about this, particularly compared to a for such cases
> properly configured Serial/Parallel GC?
> (Note that Epsilon GC *also* needs at least some heap size
> configuration; so unless adding -Xmn/-Xms is too much work compared
> significant changes to your application to fit this paradigm, please
> tell)

There are applications that have very well known memory needs and in those cases it is possible to know how big a heap needs to be in order to complete a business cycle (be it 4 hours, 8 hours, 24 hours…) without experiencing a collection cycle. Serverless is another (albeit insane) use case where epsilon looks attractive.

Kind regards,
Kirk


Re: EpsilonGC and throughput.

Thomas Schatzl
Hi,

On Wed, 2017-12-20 at 13:38 +0100, Kirk Pepperdine wrote:

> >
> >
> > SPECjvm2008 is maybe also a very conservative indicator for caching
> > impact due to compaction: it's live data set for many of its tests
> > is really small, and the actively accessed amount of memory
> > probably even smaller. Some of them even maybe fit completely into
> > today's L3 caches of current (high-end) machines….
>
> And that is part of my point. Many of them are simply too small
> relative to the size of applications today.

Assuming that with these

> > > changes the JVM that have been proposed at improving developers
> > > ability to pack memory more densely have been rejected.
> >
> > Not sure about this. E.g. value types I think are actively
> > developed?
> > Please elaborate. Maybe you are just talking to the wrong people?
>
> Gil Tene’s proposal on object layouts.

I am not seeing any work on that, not even a JEP. So maybe Gil and you
are not pushing it enough. Otoh it may just take some time, as value
types did.

> > You mentioned this twice already without going into detail. Could
> > you please elaborate about this, particularly compared to a for
> > such cases properly configured Serial/Parallel GC?
> > (Note that Epsilon GC *also* needs at least some heap size
> > configuration; so unless adding -Xmn/-Xms is too much work compared
> > to significant changes to your application to fit this paradigm,
> > please tell)
>
> There are applications that have very well known memory needs and in
> those cases it is possible to know how big a heap needs to be in
> order to complete a business cycle (be it 4 hours, 8 hours, 24
> hours…) without experiencing a collection cycle. Serverless is
> another (albeit insane) use case where epsilon looks attractive.

Unfortunately that answer again does not answer my question - the
particular part about "... particularly compared to a for such cases
properly configured Serial/Parallel GC" is important here.

I mean, we want to add 1k LOC that to a large degree I think copies
Serial GC (allocation) code, and want to argue that this outweighs
using a few command line switches (see the other email).

Thanks,
  Thomas


Re: EpsilonGC and throughput.

Kirk Pepperdine

>
>>>> changes the JVM that have been proposed at improving developers
>>>> ability to pack memory more densely have been rejected.
>>>
>>> Not sure about this. E.g. value types I think are actively
>>> developed?
>>> Please elaborate. Maybe you are just talking to the wrong people?
>>
>> Gil Tene’s proposal on object layouts.
>
> I am not seeing any work on that, not even a JEP. So maybe Gil and you
> are not pushing it enough. Otoh it may just take some time, as value
> types did.

You’re right, there is no JEP because it was made clear from discussions with the relevant parties that value types were considered to be more important at the time, so the JEP process wasn’t pursued. There is an implementation in Gil’s GitHub account. Any implementation will require JVM support. However, Gil contended that the Java code should run reasonably on JVMs that don’t have the intrinsic support... so you can try out the libraries. There is an internal implementation in Azul’s JVM, so... it’s proven to work and have benefits.

>
>>> You mentioned this twice already without going into detail. Could
>>> you please elaborate about this, particularly compared to a for
>>> such cases properly configured Serial/Parallel GC?
>>> (Note that Epsilon GC *also* needs at least some heap size
>>> configuration; so unless adding -Xmn/-Xms is too much work compared
>>> to significant changes to your application to fit this paradigm,
>>> please tell)
>>
>> There are applications that have very well known memory needs and in
>> those cases it is possible to know how big a heap needs to be in
>> order to complete a business cycle (be it 4 hours, 8 hours, 24
>> hours…) without experiencing a collection cycle. Serverless is
>> another (albeit insane) use case where epsilon looks attractive.
>
> Unfortunately that answer again does not answer my question - the
> particular part about "... particularly compared to a for such cases
> properly configured Serial/Parallel GC" is important here.

According to Aleksey's benchmarks the answer appears to be yes, there is an advantage to Epsilon GC.

>
> I mean, we want to add 1k LOC that to a large degree I think copies
> Serial GC (allocation) code, and want to argue that this outweighs
> using a few command line switches (see the other email).

I get that one doesn’t want to add every imaginable feature to the JVM, so it’s a good question. To be honest, I don’t know if 1k LOC is better than command line switches. I do know there isn’t a “one size fits all” option in this space today. There are always trade-offs. It depends on how much we trust Aleksey’s benchmarks.

— Kirk


Re: EpsilonGC and throughput.

Thomas Schatzl
In reply to this post by Aleksey Shipilev-4
Hi,

On Tue, 2017-12-19 at 20:14 +0100, Aleksey Shipilev wrote:

> On 12/19/2017 07:52 PM, Sergey Kuksenko wrote:
> > > Locality is something that users can control, especially when
> > > small contained applications are
> > > concerned, and/or (hopefully) Valhalla and other language
> > > features that help to flatten the memory.
> >
> > Sure. Just have to note that such special tuned locality-aware
> > application barely could use
> > standard Java API, because of it is out of user control.
>
> You would probably be okay with small inefficiencies within the class
> library, if you can control the bulk of your own data either by
> relying on particular classlib implementation, or winding up
> your own.

And e.g. Serial GC *by itself* has what particular dependency on
something in the OpenJDK classlib that makes that impossible? (Maybe
the java.lang.ref.Reference stuff?)

> > Epsilon GC is not a silver bullet,and for *practical* usage it will
> > require more efforts than existing GCs to achieve benefits. I don't
> > mind that such benefits are exist.
>
> Well, nobody claimed Epsilon is a silver bullet. Before you can reap
> any of its benefits, you have to get the footprint under control [*].
> After that, you can start exploring exotic memory management
> techniques,

Can you explain to me how you can't do that with e.g. Serial GC? Is the
allocation code in Serial that much different? Actually I think it
should be almost the same. If it is not, it may be useful to clean
serial gc allocation code up instead of adding new stuff that does
exactly the same (Hint: FastTLABRefill related code will go away in
11).

It won't be called "EpsilonGC" though, and won't have an extra switch,
but benefit openjdk probably even more.

The lukewarm reception from me is mostly because I am judging on the
merits of what's in the JEP, not some future magic fairy dust that
helps every collector anyway in the future. Can you at least give some
ideas where you want to go with this, where Serial GC or any other
existing GC will prevent progress? Base "another gc" (the "exotic
memory management techniques") on it? That seems to contradict the
purpose of Epsilon GC.

To me personally the best argument that is given in the JEP seems to be
that it helps validating the GC interface - but all other GCs
implementing it also do that already to some degree (serial, parallel,
not parallelold, cms, g1, probably Shenandoah, and Z).

That just does not seem impressive.

> and no-op GC is one of many tools in the toolbelt there. What makes
> Epsilon different from other tools is that it requires VM-side
> implementation -- and this is why it should be included into JVM.

The question is: do we need a new tool that only reinvents the old ones
with minimal (I would dare to say non-real world) advantages.

> Thanks,
> -Aleksey
>
> [*] In fact, it is also called out in JEP, the other way around: fail
> predictably when a lot is allocated. Over a few last months, I had a
> pleasant experience asserting allocation pressure
> invariants with just running with Epsilon with given heap and
> checking if it fails. When it does, I have the full heap-dump view of
> the garbage produced. This turns out to be much more convenient than
> I previously anticipated.

java "-XX:+UseSerialGC -Xmn<something> -Xms<something> -Xmx<something>
-XX:SurvivorRatio=<something> -XX:+DumpHeapAtOome" (or something like
this) myapplication

seems to give exactly the same information.

(untested; that will waste 3*4kb of memory; I can almost guarantee you
that OpenJDK static data is >8kb, so any GC will fail here. Maybe there
is even already an undocumented DieOnFirstGC development switch...)

Yes, that's a bit longer command line than "-Xmx<something>
-XX:+UseEpsilonGC -XX:+DumpHeapAtOome", but it adds exactly *zero*
maintenance overhead to the code base.

Thanks,
  Thomas



Re: EpsilonGC and throughput.

Aleksey Shipilev-4
On 12/20/2017 03:46 PM, Thomas Schatzl wrote:
>> You would probably be okay with small inefficiencies within the class
>> library, if you can control the bulk of your own data either by
>> relying on particular classlib implementation, or winding up
>> your own.
>
> And e.g. Serial GC *by itself* has what particular dependency on
> something in the OpenJDK classlib that makes that impossible? (Maybe
> the java.lang.ref.reference stuff?)

This is not about Serial GC. Sergey's argument was that classlib allocations are outside of users'
control, and thus locality there is out of users' control too. My counter-point is that some
locality waste might be acceptable, as long as the bulk of the work is done by user
locality-conscious code anyway.


>> Well, nobody claimed Epsilon is a silver bullet. Before you can reap any of its benefits, you
>> have to get the footprint under control [*]. After that, you can start exploring exotic memory
>> management techniques,
> Can you explain to me how you can't do that with e.g. Serial GC? Is the allocation code in Serial
> that much different? Actually I think it should be almost the same.
Concentrating on allocation path misses the point.

The crucial point is that Epsilon *guarantees* the absence of GC, rather than relying on obscure
tuning of current GCs. In return, it trivially avoids setting up anything that other GCs would have
to set up for a potential GC cycle, on the off chance that the configuration is wrong and a GC cycle
does happen in some corner case. Examples: GC threads, task queues, card tables
and other remembered sets, barriers, special handling of Reference.get, finalizers, etc.


> If it is not, it may be useful to clean serial gc allocation code up instead of adding new stuff
> that does exactly the same (Hint: FastTLABRefill related code will go away in 11).

Epsilon already does share lots of code with gc/shared. For example, the allocation code calls into
VirtualSpace::expand_by and ContiguousSpace::par_allocate to do the work on the allocation path, and the
rest is handled by the shared TLAB machinery. We can consider making a coarser-grained API for allocations
like this, and that would save e.g. 20 lines of code in the allocation path. But I really think that
would be an over-zealous application of the DRY principle, and would go against the "prefer duplication
over the wrong abstraction" guideline.


> It won't be called "EpsilonGC" though, and won't have an extra switch,
> but benefit openjdk probably even more.

See, this is the guarantee thing again. Having the extra configuration to mimic what Epsilon does in
existing GC might be a way out, until you silently regress it via the interaction with some other GC
option, some other bugfix, or some other performance improvement, or because GC developers in their
wisdom changed the behavior ever so slightly. Having the GC that does not collect _by design_ makes
it hard to compromise this property.

Suppose you find the configuration that prevents GC in existing Serial code. Asserting the needed
behavior in current GC would mean developing white- or black-box style tests that assert the
configuration setting works as expected, and that also has to be revisited every time some
potentially-interacting GC feature / option is added. That is again, because Serial *might* collect,
and you just *hope* you got the config right so that GC does (not) happen when you do (not) need it.
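
As a rough sketch of what such a test might look like (illustrative only, not an existing test; a real one would be a jtreg test with the GC options pinned on the command line): allocate a bunch, then assert via the GC MXBeans that no collection cycle was recorded.

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class AssertNoGC {
    public static void main(String... args) {
        // Allocate enough to trigger a collection if the configuration is wrong.
        List<Object> sink = new ArrayList<>();
        for (int c = 0; c < 1_000_000; c++) {
            sink.add(new Object());
        }
        long collections = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            collections += Math.max(0, gc.getCollectionCount());
        }
        if (collections != 0) {
            throw new AssertionError("Expected no GC cycles, saw " + collections);
        }
        System.out.println("No GC cycles observed; " + sink.size() + " objects retained");
    }
}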

This is about having the guarantees by design, instead of being hopeful about the configuration.
Epsilon makes an allocation failure the hard error, no excuses, no misconfiguration opportunities.


> The lukewarm reception from me is mostly because I am judging on the
> merits of what's in the JEP, not some future magic fairy dust that
> helps every collector anyway in the future. Can you at least give some
> ideas where you want to go with this, where Serial GC or any other
> existing GC will prevent progress? Base "another gc" (the "exotic
> memory management techniques") on it? That seems to contradict the
> purpose of Epsilon GC.

"Exotic memory management techniques" in my example is basically managing the persistent working
set, and disabling GC completely. Epsilon is not supposed to be extended with any new GC code --
like we saw in other thread, even simplistic Full GC is out of the question -- Epsilon does not do
any memory reclamation, period.


> To me personally the best argument that is given in the JEP seems to be
> that it helps validating the GC interface - but all other GCs
> implementing it also do that already to some degree (serial, parallel,
> not parallelold, cms, g1, probably Shenandoah, and Z).

The key thing is "personally to you" -- and that is fine. It does not mean other uses are wrong,
because you don't need them, or the expert can configure other GCs to do (barely) the similar (but
not exactly the same) thing.


> That just does not seem impressive.

Epsilon is not supposed to be impressive. Most of the useful tools are straight-forward and boring.
It would indeed be odd to gauge the tools by their impressiveness.


>> and no-op GC is one of many tools in the toolbelt there. What makes
>> Epsilon different from other tools is that it requires VM-side
>> implementation -- and this is why it should be included into JVM.
>
> The question is: do we need a new tool that only reinvents the old ones
> with minimal (I would dare to say non-real world) advantages.

Yes, we do. A year ago, I thought this was a thought (pun intended) experiment, and I would
probably have had the same position -- just use the myriad of GC options to configure the existing GC.
But since then I had interesting talks with people who have use cases for the simple/trivial/dumb
no-op GC: most of these things are captured in JEP. Java ecosystem is vast, and even 0.1% of use
cases add up to substantial absolute number of use cases. In the interesting twist of fate, we are
even considering backporting Epsilon to JDK 8, because this is where the most current Java ecosystem
is -- and having separate implementation does give nice isolation guarantees for backports.

Coming at it from a personal perspective, Epsilon is like peat whiskey for me: the first taste feels very
wrong and you question the sanity of those enjoying it, and then, as you become familiar with it,
you realize it is just something else, in its essence, and you begin to see the appeal. It is not an
everyday drink, for sure.


>> [*] In fact, it is also called out in JEP, the other way around: fail
>> predictably when a lot is allocated. Over a few last months, I had a
>> pleasant experience asserting allocation pressure
>> invariants with just running with Epsilon with given heap and
>> checking if it fails. When it does, I have the full heap-dump view of
>> the garbage produced. This turns out to be much more convenient than
>> I previously anticipated.
>
> java "-XX:+UseSerialGC -Xmn<something> -Xms<something> -Xmx<something>
> -XX:SurvivorRatio=<something> -XX:+DumpHeapAtOome" (or something like
> this) myapplication
>
> seems to give exactly the same information.
Nope, it does not. Because Serial would still attempt at least one GC when faced with potential
OOME, and that will prune out the floating garbage -- and I am interested in *all* allocations. GC
guys might argue that allocations are cheap, and that GC cycles pruning dead objects are also cheap,
but the industrial reality is that people still hunt down and eliminate garbage allocations with
non-ignorable performance improvements. The ability to heap dump with no object left behind is
surprisingly useful. Again, some things are trivial in some GC designs. It is trivial to guarantee
all allocated objects end up in heap dump with the no-op GC.


Thanks,
-Aleksey



Re: EpsilonGC and throughput.

Thomas Schatzl
Hi,

  (please read my answers in full before answering :))

On Wed, 2017-12-20 at 20:05 +0100, Aleksey Shipilev wrote:

> On 12/20/2017 03:46 PM, Thomas Schatzl wrote:
> > > You would probably be okay with small inefficiencies within the
> > > class library, if you can control the bulk of your own data
> > > either by relying on particular classlib implementation, or
> > > winding up your own.
> >
> > And e.g. Serial GC *by itself* has what particular dependency on
> > something in the OpenJDK classlib that makes that impossible?
> > (Maybe the java.lang.ref.reference stuff?)
>
> This is not about Serial GC. Sergey's argument was that classlib
> allocations are outside of users' control, and thus locality there is
> out of users' control either. My counter-point is that some
> locality waste might be acceptable, as long as the bulk of the work
> is done by user locality-conscious code anyway.

Yes, but in this case there is no difference between using Epsilon and
any other GC. All of them benefit. Now Epsilon might benefit more than
others.

I do not know how much. Probably it won't make a lot of difference,
because the default evacuation order tends to improve locality (and for
collectors like Z/Shenandoah, with move on read access, I think it does
even more); so there is little to gain unless the user also manually lays
out memory in the heap according to access order - and then these persons
are really, really desperate.

It might just be better, and not much more inconvenient, for them to just not use
Java in the first place, absent other real good reasons (imho).

> > > Well, nobody claimed Epsilon is a silver bullet. Before you can
> > > reap any of its benefits, you
> > > have to get the footprint under control [*]. After that, you can
> > > start exploring exotic memory
> > > management techniques,
> >
> > Can you explain to me how you can't do that with e.g. Serial GC? Is
> > the allocation code in Serial
> > that much different? Actually I think it should be almost the same.
>
> Concentrating on allocation path misses the point.
>
> The crucial point is that Epsilon *guarantees* the absence of GC,
> rather than relying on obscure tuning of current GCs.

Please don't exaggerate here: none of the switches I showed in my
example are obscure or hard to understand. And they pretty much disable
all heuristics in mentioned collectors.

And it is very unlikely anybody is going to touch Serial in the future,
because like Epsilon it serves a very particular niche rather well. And
you know that Serial does not do *anything* outside of pauses by
design.

[...]

>
> > It won't be called "EpsilonGC" though, and won't have an extra
> > switch, but benefit openjdk probably even more.
>
> See, this is the guarantee thing again. Having the extra
> configuration to mimic what Epsilon does in
> existing GC might be a way out, until you silently regress it via the
> interaction with some other GC option, some other bugfix, or some
> other performance improvement, or because GC developers in their
> wisdom changed the behavior ever so slightly. Having the GC that does
> not collect _by design_ makes it hard to compromise this property.
>
> Suppose you find the configuration that prevents GC in existing
> Serial code. Asserting the needed behavior in current GC would mean
> developing white- or black-box style tests that assert the
> configuration setting works as expected, and that also has to be
> revisited every time some potentially-interacting GC feature / option
> is added. That is again, because Serial *might* collect,
> and you just *hope* you got the config right so that GC does (not)
> happen when you do (not) need it.
>
> This is about having the guarantees by design, instead of being
> hopeful about the configuration.
> Epsilon makes an allocation failure the hard error, no excuses, no
> misconfiguration opportunities.

This guarantee is nice to have (and is trivially checked by a test to
avoid accidentally removing this behavior), but the JEP needs to spell
out why this guarantee is important in particular. I.e. what makes it
interesting, what can be achieved *beyond* what can already be done
in (not too inconvenient) other ways by particularly exploiting the
guarantee that makes it so unique and useful.

Exaggerating: Establishing a random guarantee does not help a lot.

> > To me personally the best argument that is given in the JEP seems
> > to be that it helps validating the GC interface - but all other GCs
> > implementing it also do that already to some degree (serial,
> > parallel, not parallelold, cms, g1, probably Shenandoah, and Z).
>
> The key thing is "personally to you" -- and that is fine. It does not
> mean other uses are wrong, because you don't need them, or the expert
> can configure other GCs to do (barely) the similar (but not exactly
> the same) thing.

I am fine to agree to disagree :) But in this case I kept talking to
you because I wanted to understand the reasons for this change and why
it's so useful in production because I knew there were some somewhere -
just not in the JEP. That's why we are annoying you with "tell me what
is the use of that change and what makes it so special" all the time.

Sorry, the JEP just does not answer these questions for me at the
moment. And apparently not for other people you did not talk to in
person.

> > > and no-op GC is one of many tools in the toolbelt there. What
> > > makes  Epsilon different from other tools is that it requires VM-
> > > side implementation -- and this is why it should be included into
> > > JVM.
> >
> > The question is: do we need a new tool that only reinvents the old
> > ones with minimal (I would dare to say non-real world) advantages.
>
> Yes, we do. An year ago, I thought this was a thought (pun intended)
> experiment, and I would probably have the same position -- just use
> the myriad of GC options to configure the existing GC.
> But since then I had interesting talks with people who have use cases
> for the simple/trivial/dumb no-op GC: most of these things are
> captured in JEP.

I just want to point out that people can't judge the change by what you
talked about with other people. I can only read the JEP, and criticize
accordingly. Not everyone has your/Kirk's/whoever's knowledge.

And the JEP, and I re-read the Motivation of the JEP multiple times
just now, does not spell out what is so unique and desirable (I do not
count random guarantees) about this change, or can't be done (almost)
the same with other GCs. What makes it so useful for production use, as
you were rambling on. I can only see a few improvements for devs, and
there are a few theoretically quantifiable claims made, that were not
quantified (not even attempted to). That's why we have been discussing
(apparently in circles) all the time, particularly about the latter.

Also, parts of the reasoning in the motivation, particularly the one
about performance, seem wrong or at least not clear (see the subject of
this email thread) or contrived. There are a few unmentioned, to me
right now, better alternatives than Epsilon (not necessarily
implemented in the VM).

> Java ecosystem is vast, and even 0.1% of use cases add up to
> substantial absolute number of use cases. In the interesting twist  
> of fate, we are even considering backporting Epsilon to JDK 8,
> because this is where the most current Java ecosystem
> is -- and having separate implementation does give nice isolation
> guarantees for backports.

I do not understand the last sentence, sorry. And that 0.1% of use
cases is a number you just invented. I think a few months ago I
actually tried to quantify this number of users with you, with no good
answer. 0.1% seems way too much because otherwise people would be
complaining *much* more loudly.

0.1% would be 1 in 1000. I again dare to say, that not 1 in 1000 VM
users care about "last drop performance"/"performance, functional,
interface testing"/"that odd guarantee" (partially because it's
negligible) instantly, at least not directly.

> Coming to from a personal perspective, Epsilon is like peat whiskey
> for me: first taste feels very wrong and you question the sanity of
> those enjoying it, and then, as you become familiar with it,
> you realize it is just something else, in its essence, and you begin
> to see the appeal. It is not an everyday drink, for sure.

Thanks for the explanation - I do not drink alcohol at all, but your
explanation helped (well, if only to support the following statement):
the JEP should contain exactly such explanations...

> > > [*] In fact, it is also called out in JEP, the other way around:
> > > fail predictably when a lot is allocated. Over a few last months,
> > > I had a pleasant experience asserting allocation pressure
> > > invariants with just running with Epsilon with given heap and
> > > checking if it fails. When it does, I have the full heap-dump
> > > view of the garbage produced. This turns out to be much more
> > > convenient than I previously anticipated.
> >
> > java "-XX:+UseSerialGC -Xmn<something> -Xms<something>
> > -Xmx<something>
> > -XX:SurvivorRatio=<something> -XX:+DumpHeapAtOome" (or something
> > like
> > this) myapplication
> >
> > seems to give exactly the same information.
>
> Nope, it does not. Because Serial would still attempt at least one GC
> when faced with potential OOME, and that will prune out the floating
> garbage -- and I am interested in *all* allocations. GC guys might
> argue that allocations are cheap, and that GC cycles pruning dead
> objects are also cheap, but the industrial reality is that people
> still hunt down and eliminate garbage allocations with non-ignorable
> performance improvements. The ability to heap dump with no object
> left behind is surprisingly useful. Again, some things are trivial in
> some GC designs. It is trivial to guarantee all allocated objects end
> up in heap dump with the no-op GC.

I did not think of that - and it would be really nice if paragraphs
like the one above were in the JEP, as motivation and corresponding
explanation in the "Alternatives" section. You know, spelled out in
detail for people. Answering the question "why would I want this (in
production)".

This use case/explanation makes a way better case for the change,
particularly the "can't do that with other collectors" parts, actually
explains why you would want this (particularly) in production, than all
other reasons in the JEP combined.

For me that makes the point of the change much more clear. Thanks.

Please fix the JEP though. It's imho terrible and missing the point you
want to make (at least to persons you did not talk to; maybe you missed
even more).

Thanks,
  Thomas


Re: EpsilonGC and throughput.

Aleksey Shipilev-4
On 12/20/2017 10:25 PM, Thomas Schatzl wrote:
>> This is not about Serial GC. Sergey's argument was that classlib
>> allocations are outside of users' control, and thus locality there is
>> out of users' control either. My counter-point is that some
>> locality waste might be acceptable, as long as the bulk of the work
>> is done by user locality-conscious code anyway.
>
> Yes, but in this case there is no difference between using Epsilon and
> any other GC. All of them benefit. Now Epsilon might benefit more than
> others.

...

> I do not know how much. Probably it won't make a lot of difference,
> because the default evacuation order tends to improve locality (and for
> collectors like Z/Shenandoah I think with move on read access it does
> even more); so unless the user also manually lays out memory in heap
> according to access, then these persons are really really desperate.

See, assuming the GC lays out the objects in the order most beneficial to the application is also kinda
wishful thinking: we don't even know whether it should be depth-first, or breadth-first, or
topological, or read-traversal, or something else. All of them seem right for particular classes of
applications. In fact, you can have GCs messing up our nice and tidy object layout, see e.g.
https://bugs.openjdk.java.net/browse/JDK-8024394

In such cases the user saying "don't you touch anything, I'll do it myself" might be a better option.
And users do implement similar approaches today with flyweight objects encoded over byte[] arrays,
argh! You've got to see it to believe it.


> It might just be better and not much more inconvenience to just not use
> Java in the first place for them without other real good reasons (Imho)

This is an old argument, but people do use Java in mechanical-sympathetic cases where investing in
massaging the code to work right in the JVM is a better tradeoff than rewriting it out of the JVM completely.
What is amusing is that those uses are very high-profile and are substantial contributors of "Java
is vibrant and everywhere" world view. We have to appreciate that, even though sometimes we
deliberately make things harder for them (see e.g: Unsafe compartmentalization).


>> The crucial point is that Epsilon *guarantees* the absence of GC,
>> rather than relying on obscure tuning of current GCs.
>
> Please don't exaggerate here: none of the switches I showed in my
> example are obscure or hard to understand. And they pretty much disable
> all heuristics in mentioned collectors.

Please be aware of the expertise trap: those options are obvious to you, knowing how OpenJDK collectors
work. It is odd to expect the same kind of expertise even from power users, who cannot really tell
if those particular Serial GC options give them what they want, whether that is a strong property by
design or a collateral implementation property, etc. That can probably get better by
capturing the intent with documentation, new options, etc, but then we run into... (next paragraph)


> And it is very unlikely anybody is going to touch Serial in the future,
> because like Epsilon it serves a very particular niche rather well. And
> you know that Serial does not do *anything* outside of pauses by
> design.

Aha! So, if we need something changed in Serial to implement an Epsilon-like feature, that would run
into more resistance than having a separate implementation, right? At least I would be much more
wary, because Serial is quite extensively used. Small code duplication seems much less of a
concern than a regression in a widely used GC, at least from my vantage point.


>> This is about having the guarantees by design, instead of being
>> hopeful about the configuration.
>> Epsilon makes an allocation failure the hard error, no excuses, no
>> misconfiguration opportunities.
>
> This guarantee is nice to have (and is trivially checked by a test to
> avoid accidentally removing this behavior), but the JEP needs to spell
> out why this guarantee is important in particular. I.e. what makes it
> interesting, what can a be achieved *beyond* what can already be done
> in (not too inconvenient) other ways by particularly exploiting the
> guarantee that makes it so unique and useful.
>
> Exaggerating: Establishing a random guarantee does not help a lot.
Yeah, JEP mentions "low-overhead" as the replacement for "no GC", which is confusing. The goal was
to avoid any memory reclamation work, as clearly stated in JEP goals. Guaranteeing no GC is
therefore pretty much the project goal -- not as random as the exaggeration seems to paint it -- and
all the uses naturally evolve from that.


> Sorry, the JEP just does not answer these questions for me at the
> moment. And apparently not for other people you did not talk to in
> person.

Excellent, noted! Let me come up with better JEP text.


> And the JEP, and I re-read the Motivation of the JEP multiple times
> just now, does not spell out what is so unique and desireable (I do not
> count random guarantees) about this change, or can't be done (almost)
> the same with other GCs.

So this is where it seems to go off the rail: we need to emphasize "no GC" is the actual guarantee,
so that readers could not disregard that goal as "random".


> Also, parts of the reasoning in the motivation, particularly the one
> about performance, seem wrong or at least not clear (see the subject of
> this email thread) or contrived. There are a few unmentioned, to me
> right now, better alternatives than Epsilon (not necessarily
> implemented in the VM).

What are they? Are we missing some points from "Alternatives" here?


>> Java ecosystem is vast, and even 0.1% of use cases add up to
>> substantial absolute number of use cases. In the interesting twist  
>> of fate, we are even considering backporting Epsilon to JDK 8,
>> because this is where the most current Java ecosystem
>> is -- and having separate implementation does give nice isolation
>> guarantees for backports.
>
> I do not understand the last sentence, sorry.

You said yourself: "It is very unlikely anybody is going to touch Serial in the future", and I
agree. Touching Serial GC, especially in backports, for implementing Epsilon-like functionality is a
greater risk than having a completely separate no-op GC implementation. I did a trial backport of
Epsilon to 8u, and it fits without scary changes to the rest of HotSpot.


> I do not understand the last sentence, sorry. And that 0.1% of use
> cases is a number you just invented. I think a few months ago I
> actually tried to quantify these number of users with you, with no good
> answer.

Well, yeah, 0.1% is invented for the sake of example about the Java ecosystem. My actual go-to pun
is: "With the extraordinary size of Java ecosystem, epsilon-neighborhood of zero applicability
contains non-zero users".


> This use case/explanation makes a way better case for the change,
> particularly the "can't do that with other collectors" parts, actually
> explains why you would want this (particularly) in production, than all
> other reasons in the JEP combined.

Yeah, I did not realize this was the strong suit at the time the JEP was drafted. As I said before,
you sometimes find that some facets are actually more useful than others.

------

Process comments below:

> [JEP] is imho terrible and missing the point you want to make (at least to persons you did not
> talk to; maybe you missed even more).
"Terrible"?

> What makes it so useful for production use, as you were rambling on.

"rambling": adj.
 1. (of writing or speech) lengthy and confused or inconsequential.

> I can only see a few improvements for devs, and there are a few theoretically quantifiable claims
> made, that were not quantified (not even attempted to).
Hm. Is there a hard requirement to quantify everything stated in a JEP? Because I did quantify
locality, barrier costs, and startup improvements after the JEP was submitted. Again, you might just
kindly ask to link that data into the JEP, instead of assuming the submitter is lazy. (Which is also
weird, because Sergey links *my* post about *Epsilon* at the beginning of this thread.)


> I am fine to agree to disagree :) But in this case I kept talking to you because I wanted to
> understand the reasons for this change and why it's so useful in production because I knew there
> were some somewhere - just not in the JEP. That's why we are annoying you with "tell me what is
> the use of that change and what makes it so special" all the time.
I understand. I do note, however, that in the spirit of ongoing collaboration, saying "I see
disadvantages A, B, C, and advantages G, H, I, on top of what is written in JEP, and the goal, if I
understand the intent right, should be N, not M?" is quite different from asking "tell me what makes
it so special".


> I just want to point out that people can't judge the change by what you talked about with other
> people. I can only read the JEP, and criticize accordingly. Not everyone has
> your/Kirk's/whoever's knowledge.
Totally. And I have to point out that a JEP is neither a code review, nor an architectural review,
nor a paper to review, nor something cast in stone. As I understand it, it is supposed to capture
the key points of the idea, and get collaboratively refined to contrast the salient points and
disadvantages. It is expected that experts may have more ideas about refinements *above* what the
original submitter meant to write, have suggestions for refining some points, etc. Instead, what we
get is "you defend"-style "collaboration" -- which, if it continues, will attract far fewer
contributors than OpenJDK really needs.

So, with all above, please excuse me if I get all defensive.

I would really appreciate if the discussions around JEPs were not in the spirit of "prove to us why
we have to consider your terrible ramblings", but rather "let us refine the JEP to clearly highlight
the benefits and disadvantages for OpenJDK". Which seems to finally happen for this JEP, and I am
happy about that.

-Aleksey


Reply | Threaded
Open this post in threaded view
|

Re: EpsilonGC and throughput.

Kirk Pepperdine
Hi,

>
>
> In such cases user saying "don't you touch anything, I'll do it myself" might be a better option.
> And users implement similar approaches today with flyweight objects encoding over byte[] arrays,
> argh! You've got to see it to believe it.

In fact it’s a recommended coding practice if you need to ensure heftier cache line densities. They typically run at an anemic 8-22%, and encoding into arrays with flyweights can pump that up to approach 100%. Let’s not discuss the prefetching improvements over the typical pointer chasing that happens as the collector scrambles your heap on each copy phase.
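
For readers who haven't seen it, here is a minimal sketch of that flyweight-over-byte[] idea (class name, record layout and accessors are made up purely for illustration): fixed-width records packed into a single array, read through one reusable cursor object instead of one heap object per record.

import java.nio.ByteBuffer;

// One flyweight instance is reused for every record, so the hot data stays
// densely packed in the byte[] and the collector never scans or moves
// per-record objects.
public class OrderFlyweight {
    // illustrative fixed-width layout: long id (8) + int qty (4) + int price (4)
    static final int RECORD_SIZE = 16;

    private ByteBuffer buf;
    private int offset;

    public OrderFlyweight wrap(byte[] records, int index) {
        this.buf = ByteBuffer.wrap(records);
        this.offset = index * RECORD_SIZE;
        return this;
    }

    public long id()    { return buf.getLong(offset); }
    public int  qty()   { return buf.getInt(offset + 8); }
    public int  price() { return buf.getInt(offset + 12); }
}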

>
>
>> It might just be better and not much more inconvenience to just not use
>> Java in the first place for them without other real good reasons (Imho)
>
> This is an old argument, but people do use Java in mechanical-sympathetic cases where investing in
> massaging the code to work right in the JVM is a better tradeoff than rewriting it out of the JVM
> completely. What is amusing is that those uses are very high-profile and are substantial
> contributors to the "Java is vibrant and everywhere" world view. We have to appreciate that, even
> though sometimes we deliberately make things harder for them (see e.g. Unsafe compartmentalization).

+1… there are many, many reasons to use Java aside from the technical aspects of that decision, and we shouldn’t be discouraging that by forcibly cutting off paths to getting useful work done. Unsafe is a classic case of a garbage-can class of parts that serves a definite need, but there is some mindset that actively works against that need, and thus we don’t get proper support for these much-needed features.

I think that one of the assumptions here is that there is an even and perfect distribution of knowledge of garbage collection. There is another assumption that, even if there were such a perfect and even distribution of GC knowledge, selecting the correct collector and tuning it would be easy. I can say from experience that neither of these assumptions is even close to being true. Consider the first white paper on how to tune CMS that was published by Sun. It is a paper that you cannot find on the web anywhere anymore because it was completely wrong. And it was written by one of the best GC experts on the planet. And that is not the only example I can cite. How do you expect mere mortal developers to cope when the experts consistently get it wrong?

>
>
>> I do not understand the last sentence, sorry. And that 0.1% of use
>> cases is a number you just invented. I think a few months ago I
>> actually tried to quantify these number of users with you, with no good
>> answer.
>
> Well, yeah, 0.1% is invented for the sake of example about the Java ecosystem. My actual go-to pun
> is: "With the extraordinary size of Java ecosystem, epsilon-neighborhood of zero applicability
> contains non-zero users”.

I can help here by confusing this with more estimates. IME, those in the low latency space preferred to use iCMS. Now that iCMS is gone, they’ve switched to using some very unconventional configurations with CMS. The vast majority of those who attend my workshop and don’t have low latency requirements have *never* heard of iCMS. Additionally (again from my observation), Oracle typically doesn’t reach the vast majority of those in the low latency space, thus the results of the informal survey of iCMS usage almost completely missed an entire group of applications that were very much tied to using iCMS. If I scan the GC logs that I’ve collected over the years, I can safely say that about 1-2% of them are iCMS logs, of which the majority came from the low latency space. I will claim that those who may have the most interest in this collector come from that space. The advantage of this collector over Serial or Parallel is that there will be *no* GC pauses…. guaranteed. That space does not like stalls of random duration happening at random times. Oh, and I forgot about other downstream effects, such as writing perf data to disk inflating the pause times and/or GC triggering page recycling stalls at the OS level. No, GC is not completely responsible, but it adds to the pressures which result in these phenomena being triggered more frequently.

All I can add is that life in the real world rarely resembles life in a benchmark.

Kind regards,
Kirk

Reply | Threaded
Open this post in threaded view
|

Re: EpsilonGC and throughput.

Thomas Schatzl
In reply to this post by Aleksey Shipilev-4
Hi Aleksey,

  I apologize for my somewhat inappropriate words, which were due to
some frustration, and also for the long delay, which was due to the
winter holidays.

Let's try to start all over with this... I will try to be constructive
this time. Feel free to remind me if needed.

One purpose of a JEP is to share a problem and propose an idea (often
already accompanied by a solution) to solve it. This problem and the
idea are then discussed by the community, which eventually refines them
along the way.

The community then evaluates that idea based on its contents, of course
starting with the people trying to determine whether there is a
problem, what the problem is, and whether the proposed idea will fix
the problem.

For this evaluation to happen, the JEP needs to clearly state the
problem, its seriousness, and the proposed idea.

It also helps if the JEP is written in a way that makes it interesting
for the community to read and respond to. The less thinking a reader
has to do to figure out whether he is impacted or not, and whether and
by how much it would simplify his life or that of Java users in
general, the more people will feel urged to get this in (or at least
not be deterred).

Finally, I assume you do understand that, although there is always a
certain level of duplication in the VM, if a change only solves
problems that existing code already solves, or solves problems almost
nobody has, or does not give enough benefit (also depending on the
complexity of the change), it makes it a hard(er) sell?

So the JEP template (http://openjdk.java.net/jeps/2) provides some
questions on how to structure this idea proposal and what to put into
the various sections.

In general this is to help you provide the relevant information to
the community. While this might be onerous for a writer at first
glance, it saves everyone else lots of time trying to find out what and
how you want to solve something.


I am going over the Motivation section in detail in the remainder of
this email, with some comments at the end about the Alternatives
section, which seems to be the most important one here.

The JEP template states under the Motivation section:

"Motivation
----------

// Why should this work be done?  What are its benefits?  Who's asking
// for it?  How does it compare to the competition, if any?"


Now let me try to associate these questions to the relevant parts of
the existing JEP 318 (http://openjdk.java.net/jeps/318) text.

And please, before reading below: I really do not want to shoot down
the proposal. If you see a question mark, it indicates just that there
is a question to which I honestly do not know the answer, but which I
hope you do. Similarly, if I raise some concerns about some statements,
I expect you to notice that there may be something missing here,
nothing else -- i.e. not necessarily that I am "right" about
something. You said you already talked about it many times with other
people in the field, thought it over for a long time, so hopefully
these questions can be answered quickly, and in the future the JEP also
contains this information for other people too.

Some may not need an answer as they only try to make you think about
the seriousness of a stated problem.

JEP text: "Java implementations are well known for a broad choice of
highly configurable GC implementations."

Potential answer to "Why should this work be done?". Or does the
sentence indicate we need another GC because we already have so many,
and another does not hurt? I am asking this in full seriousness, I
really do not know. Or is this only an introductory sentence without
meaning?

JEP text: "There are four use cases where a trivial no-op GC proves
useful."

This seems to be a transition sentence, but is fine to make it flow
better.

Reading this, and given that only a list of benefits follows, I assume
that these two sentences were supposed to answer the "Why should this
work be done? Who's asking for it?" questions from the JEP.

In the earlier email you mentioned these power users that want full
control. Mention them here. Define them. Also mention other user groups
that might be interested. Particularly groups the benefits list could
refer to.

Let's go into these benefits in more detail:

JEP text: "Performance testing. Having a GC that does almost nothing is
a useful tool to do differential performance analysis for other, real
GCs. Having a no-op GC can help to filter out GC-induced performance
artifacts."

Benefit. Maybe it would be useful to list a few of these performance
artifacts here ("... , e.g. barrier code, concurrent threads").

Who are the beneficiaries of this? Not sure about these "power users"
(see M. Berger's response in this exact thread). Probably developers of
new GC algorithms?

An alternative could be a developer just nop'ing out the relevant GC
interface section. That is somewhat cumbersome, but for how many users
is this a problem? Spell that out in the appropriate Alternatives
section.

Also mention that using Epsilon GC for barrier testing may not be an ideal
tool, because all other existing collectors are generational (but in
the future it might apply to Shenandoah unless it goes generational
too, idk), and testing generational barriers on a non-generational heap
may not give a complete picture of barrier overhead.

JEP text: "Functional testing. For Java code testing, a way to
establish a threshold for allocated memory is useful to assert memory
pressure invariants. Today, we have to pick up the allocation data from
MXBeans, or even resort to parsing GC logs. Having a GC that accepts
only the bounded number of allocations, and fails on heap exhaustion,
simplifies testing."

Benefit. For regression testing, in how many cases (or in what
circumstances) do you think a plain fail/no-fail answer is sufficient?
This seems to pass work on a failure to the developer, who then needs
to write another test that prints and monitors the memory usage
increase over time anyway.
Given that you already need to monitor memory usage, how much extra
work is it to make the test fail when heap usage goes above a
threshold?
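
For reference, a minimal sketch of the MXBean route that the JEP text
mentions as today's alternative could look like the following (the
budget, the workload, and the class name are made up; it relies on
HotSpot's com.sun.management.ThreadMXBean extension):

import java.lang.management.ManagementFactory;
import com.sun.management.ThreadMXBean;

public class AllocationBudgetTest {
    static final long BUDGET_BYTES = 1_000_000; // illustrative threshold

    public static void main(String[] args) {
        // HotSpot-specific: the platform ThreadMXBean can be cast to the
        // com.sun.management variant to read per-thread allocation counters
        ThreadMXBean bean = (ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();

        long before = bean.getThreadAllocatedBytes(tid);
        runWorkload();
        long allocated = bean.getThreadAllocatedBytes(tid) - before;

        if (allocated > BUDGET_BYTES) {
            throw new AssertionError("Allocated " + allocated +
                    " bytes, budget is " + BUDGET_BYTES);
        }
    }

    static void runWorkload() {
        // the code under test goes here
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1_000; i++) {
            sb.append(i);
        }
    }
}

With Epsilon, the claimed simplification is that the budget becomes
-Xmx and heap exhaustion itself fails the run.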

"VM interface testing. For VM development purposes, having a simple GC
helps to understand the absolute minimum required from the VM-GC
interface to have a functional allocator. This serves as proof that the
VM-GC interface is sane, which is important in lieu of JEP 304
("Garbage Collector Interface")."

Benefit. Who are the (main) beneficiaries of that - probably developers?
For a developer, how much is that benefit if there are already 5 or 6
implementations of that interface?

"Last-drop performance improvements. For ultra-latency-sensitive
applications, where developers are conscious about memory allocations
and know the application memory footprint exactly, or even have
(almost) completely garbage-free applications. In those applications,
GC cycles may be considered an implementation bug that wastes CPU
cycles for no good reason."

This is the only benefit in this list that actually mentions its target
group. I assume it is those power users (not necessarily developers
only?) who are ultra-latency aware. This paragraph further
characterizes them as also being throughput conscious.
The earlier discussion also characterized them as being very conscious
about memory layout etc.; they do not want object reordering because it
is inconsistent between GCs (which is a different issue, and I do not
want to discuss it here).

From what I gathered so far, they want absolute control over memory
management - but the real question is whether this is their real or
only problem with the Java VM to achieve consistent VM behavior.
There are certainly more components in the VM that introduce
potentially more significant jitter (now assuming that that power user
can set heap sizes accordingly to use e.g. Serial GC).

This execution consistency is maybe another goal that is even more
important than last-drop performance.

It may be useful to investigate the problem of these power users in
more detail, and see if we could provide a (more?) complete solution
for them.


"Extremely short lived jobs are one example of this."

I do not understand the use of Epsilon in such a use case. The
alternative I can see would be to restart the VM after every
short-lived job (something for the Alternatives section). That seems
strange to me: depending on the definition of a "short-lived job",
particularly if nothing survives after the job's execution, a GC will
be extremely fast.

Further, I assume this example is about FaaS (Function-as-a-Service)
and their users, and while there may be an overlap with those "power
users", I would expect the "regular Java users" to be a far larger
group than the power users. Even where there is an overlap, power
users probably would not want to incur the associated loss of control.

"There are also cases when restarting the JVM -- letting load balancers
figure out failover -- is sometimes a better recovery strategy than
accepting a GC cycle."

I really can't find a good example, particularly in the situation that
has been described so far and for these short-lived jobs, where a GC
(on an almost empty heap) is not at least as fast as a restart.

It would make for a very good paragraph explaining this use case in the
alternatives section.

Another problem with these two sentences to me is (and I am by no means
a "FaaS power user") that I believe that waiting for the VM to
crash/shut down to steer the load balancers is not a good strategy.
Maybe you can give some more information about this use case?

"Even for non-allocating workloads, the choice of GC means choosing the
set of GC barriers that the workload has to use, even if no GC cycle
actually happens. Most JDK GCs are generational, and they emit at least
one reference write barrier. Avoiding this barrier brings the last bit
of performance improvement."

(_All_ JDK GCs are currently generational)

Now, as mentioned earlier in the thread, when talking about performance
improvements, it would be nice to mention the potential gains that can
be made (or elsewhere, like in the alternatives section). There is
already an implementation, and so you can measure this too.
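
As one possible way to measure it (purely a sketch; the class, fields
and sizes are invented), a reference-write-heavy JMH benchmark like the
one below, run once with Epsilon and once with the collector you want
to compare against, would isolate the write barrier cost on a workload
that never triggers a collection:

import org.openjdk.jmh.annotations.*;

@State(Scope.Thread)
public class RefWriteCost {
    static class Node { Node next; }

    Node[] nodes;
    Node target;

    @Setup
    public void setup() {
        nodes = new Node[1024];
        for (int i = 0; i < nodes.length; i++) {
            nodes[i] = new Node();
        }
        target = new Node();
    }

    @Benchmark
    public void refStores() {
        // every store to a reference field goes through the collector's
        // write barrier (card mark, SATB pre-barrier, ...); with Epsilon
        // there is no barrier at all
        Node t = target;
        for (Node n : nodes) {
            n.next = t;
        }
    }
}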

Please make your comparison in context: since this whole paragraph is
about last-drop performance improvements for power users, a balanced
comparison would probably only be a comparison that such a power user
would do - i.e. not running the VM with randomly selected default
options that arbitrarily penalize your competition.

In the earlier email I only asked directly for performance numbers in
order to streamline this discussion, and given that you are a
well-known performance and benchmark guru (afaik you were a "R"eviewer
long before I joined) it seemed a logical request. If you can't find
numbers, there is also the reference I got ("Barriers, Friendlier
Still" or so, from Blackburn et al, I think), which is iirc also
mentioned in the very good Jones GC book.
"Real" newbies I would just ask to perform this test.


In our discussion we found at least one more, actually unique benefit
(the one about getting correct heap dumps on failure).


Of course there is a limit on the length of that section and others
(i.e. considering the attention span of your readership), but all
questions asked by the JEP template should be answered in the
corresponding section. There is some intentional overlap in the JEP,
particularly in the first three sections, similar to a scientific paper
so that different groups of readers need only read the sections they
are interested in to see whether this change actually affects them
(and is interesting to follow).

It shouldn't be as long as a scientific paper though, so if you think a
section is too long, drop the less impactful benefits, and other parts
of the JEP will automatically follow.

Again, given your experience with the VM I assume you know the
alternatives as well as or even better than I do, and can make a
balanced assessment here.

Otherwise, keep them and please raise specific questions.

As for the Alternatives section, it is the same procedure, start with
answering the questions raised in the template:

"Alternatives
------------

// Did you consider any alternative approaches or technologies?  If so
// then please describe them here and explain why they were not
// chosen."

I would assume that for all of these benefits we can easily come up
with alternative ways of doing the same or a similar thing (I already
stated a few alternatives that I think are very valid in this or
previous emails; some valid ones are already in the JEP), and can then
explain why we would particularly want to do it this way given the
context of that benefit (e.g. the user group). If there is no
alternative, add a sentence that says so in that section.

Again, try to make this review of alternatives balanced, and keep it
in the context of the users the benefit is for.

This section should imho also include a discussion of "mostly complete
alternatives", as suggested in this email thread already, e.g. adding a
-XX:+DieOnFirstGC switch, and reasons for and against it.

Please understand that the JEP will be the reference to talk about, not
some email or private offline discussions. Keeping that in mind I think
discussions will go much smoother.

I hope I made clear now why I, unfortunately not in a very friendly way
(apologies again), suggested that the current JEP text lacks the
required answers to the questions stated in the JEP template to (re-
)start a hopefully more focused discussion.

Thanks,
  Thomas

Reply | Threaded
Open this post in threaded view
|

Re: EpsilonGC and throughput.

Mikael Vidstedt-3

Thomas,

Thank you for bringing up these questions and comments. While I think it would be great to get some additional data and use case information for this feature added to the JEP, the isolated nature of the feature along with the fact that it is experimental means that the impact of making it is relatively small. With that in mind, I suggest that we move forward with this JEP/feature, and that more information can be added if/when it’s available. In line with that I will be endorsing the JEP shortly.

Cheers,
Mikael
