JIT stops compiling after a while (java 8u45)

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

JIT stops compiling after a while (java 8u45)

nileema
We are seeing an issue with the CodeCache becoming full which causes the compiler to be disabled in jdk-8u45 to jdk-8u72.

We had seen a similar issue in Java7 (old issue: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html). This issue went away with earlier versions of Java 8.

We used the test http://github.com/martint/jittest to compare the behavior of jdk-8u25 and jdk-8u45. For this test, we did not see any CodeCache full messages with jdk-8u25  but did see them with 8u45+ (8u60  and 8u74)

Test results comparing 8u25, 8u45 and 8u74: https://gist.github.com/nileema/6fb667a215e95919242f

In the results you can see that 8u25 starts collecting the code cache much sooner than 8u45. 8u45 very quickly hits the limit for code cache. If we force a full gc when it is about to hit the code cache limit, we see the code cache size go down.

Is this a known issue?

Thanks!

Nileema
Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Tobias Hartmann-2
Hi Nileema,

thanks for reporting this issue!

CC'ing the GC team because this seems to be a GC issue (see evaluation below).

On 29.02.2016 23:59, nileema wrote:
> We are seeing an issue with the CodeCache becoming full which causes the
> compiler to be disabled in jdk-8u45 to jdk-8u72.
>
> We had seen a similar issue in Java7 (old issue:
> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html).
> This issue went away with earlier versions of Java 8.

Reading the old conversation, I'm wondering if this could again be a problem with OSR nmethods that are not flushed? The bug (JDK-8023191) is still open - now assigned to me.

Doing a quick experiment, it looks like we mostly compile OSR methods:
  22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)
  22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)
  22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)
  22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)
 
Currently, OSR nmethods are not flushed just because the code cache is full but only if the nmethod becomes invalid (class loading/unloading, uncommon trap, ..)

With your test, class unloading should happen and therefore the OSR nmethods *should* be flushed.

> We used the test http://github.com/martint/jittest to compare the behavior
> of jdk-8u25 and jdk-8u45. For this test, we did not see any CodeCache full
> messages with jdk-8u25  but did see them with 8u45+ (8u60  and 8u74)
> Test results comparing 8u25, 8u45 and 8u74:
> https://gist.github.com/nileema/6fb667a215e95919242f
>
> In the results you can see that 8u25 starts collecting the code cache much
> sooner than 8u45. 8u45 very quickly hits the limit for code cache. If we
> force a full gc when it is about to hit the code cache limit, we see the
> code cache size go down.

You can use the following flags to get additional information:
-XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintMethodFlushing -XX:+TraceClassUnloading

I did some more experiments with 8u45:

java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -XX:+UseG1GC -jar jittest-1.0-SNAPSHOT-standalone.jar | grep "Unloading"
-> We do *not* unload any classes. The code cache fills up with OSR nmethods that are not flushed.

Removing the -XX:+UseG1GC flag solves the issue:

java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -jar jittest-1.0-SNAPSHOT-standalone.jar | grep Unloading
-> Prints plenty of [Unloading class Runnable_40 0x00000007c0086028] messages and the code cache does not fill up.
-> OSR nmethods are flushed because the classes are unloaded:
   21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie

The log files look good:

1456825330672   112939  10950016        10195496        111.28
1456825331675   118563  11432256        10467176        112.41
1456825332678   125935  11972928        10778432        115.72
[Unloading class Runnable_2498 0x00000007c0566028]
...
[Unloading class Runnable_34 0x00000007c0082028]
1456825333684   131493  10220608        5382976         117.46
1456825334688   137408  10359296        5636120         116.81
1456825335692   143593  7635136         5914624         114.21

After the code cache fills up, we unload classes and therefore flush methods and start over again.

I checked for several releases if classes are unloaded:
- 8u27: success
- 8u33: success
- 8u40: fail
- 8u45: fail
- 8u76: fail

The regression was introduced in 8u40.

I also tried with the latest JDK 9 build and it fails as well (had to change the bean name from "Code Cache" to "CodeCache" and run with -XX:-SegmentedCodeCache). Again, -XX:-UseG1GC -XX:+UseParallelGC solves the problem.

Can someone from the GC team have a look?

> Is this a known issue?

I'm not aware of any related issue.

Best regards,
Tobias

> Thanks!
>
> Nileema
>
>
>
> --
> View this message in context: http://openjdk.5641.n7.nabble.com/JIT-stops-compiling-after-a-while-java-8u45-tp259603.html
> Sent from the OpenJDK Hotspot Compiler Development List mailing list archive at Nabble.com.
>
Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Tobias Hartmann-2
Hi,

is just had a another look and it turned out that even with 8u40+ class unloading is triggered. I missed that because it happens *much* later (compared to 8u33) when the code cache already filled up and compilation is disabled. At this point we don't recover because new classes are loaded and new OSR nmethods are compiled rapidly.

Summary:
The code cache fills up due to OSR nmethods that are not being flushed. With 8u33 and earlier, G1 did more aggressive class unloading (probably due to more allocations or different heuristics) and this allowed the sweeper to flush enough OSR nmethods to continue compilation. With 8u40 and later, class unloading happens long after the code cache is full.

I think we should fix this by flushing "cold" OSR nmethods as well (JDK-8023191). Thomas Schatzl mentioned that we could also trigger a concurrent mark if the code cache is full and hope that some classes are unloaded but I'm afraid this is too invasive (and does not help much in the general case).

Opinions?

Best regards,
Tobias

On 01.03.2016 11:27, Tobias Hartmann wrote:

> Hi Nileema,
>
> thanks for reporting this issue!
>
> CC'ing the GC team because this seems to be a GC issue (see evaluation below).
>
> On 29.02.2016 23:59, nileema wrote:
>> We are seeing an issue with the CodeCache becoming full which causes the
>> compiler to be disabled in jdk-8u45 to jdk-8u72.
>>
>> We had seen a similar issue in Java7 (old issue:
>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html).
>> This issue went away with earlier versions of Java 8.
>
> Reading the old conversation, I'm wondering if this could again be a problem with OSR nmethods that are not flushed? The bug (JDK-8023191) is still open - now assigned to me.
>
> Doing a quick experiment, it looks like we mostly compile OSR methods:
>   22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)
>   22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)
>   22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)
>   22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)
>  
> Currently, OSR nmethods are not flushed just because the code cache is full but only if the nmethod becomes invalid (class loading/unloading, uncommon trap, ..)
>
> With your test, class unloading should happen and therefore the OSR nmethods *should* be flushed.
>
>> We used the test http://github.com/martint/jittest to compare the behavior
>> of jdk-8u25 and jdk-8u45. For this test, we did not see any CodeCache full
>> messages with jdk-8u25  but did see them with 8u45+ (8u60  and 8u74)
>> Test results comparing 8u25, 8u45 and 8u74:
>> https://gist.github.com/nileema/6fb667a215e95919242f
>>
>> In the results you can see that 8u25 starts collecting the code cache much
>> sooner than 8u45. 8u45 very quickly hits the limit for code cache. If we
>> force a full gc when it is about to hit the code cache limit, we see the
>> code cache size go down.
>
> You can use the following flags to get additional information:
> -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintMethodFlushing -XX:+TraceClassUnloading
>
> I did some more experiments with 8u45:
>
> java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -XX:+UseG1GC -jar jittest-1.0-SNAPSHOT-standalone.jar | grep "Unloading"
> -> We do *not* unload any classes. The code cache fills up with OSR nmethods that are not flushed.
>
> Removing the -XX:+UseG1GC flag solves the issue:
>
> java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -jar jittest-1.0-SNAPSHOT-standalone.jar | grep Unloading
> -> Prints plenty of [Unloading class Runnable_40 0x00000007c0086028] messages and the code cache does not fill up.
> -> OSR nmethods are flushed because the classes are unloaded:
>    21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie
>
> The log files look good:
>
> 1456825330672   112939  10950016        10195496        111.28
> 1456825331675   118563  11432256        10467176        112.41
> 1456825332678   125935  11972928        10778432        115.72
> [Unloading class Runnable_2498 0x00000007c0566028]
> ...
> [Unloading class Runnable_34 0x00000007c0082028]
> 1456825333684   131493  10220608        5382976         117.46
> 1456825334688   137408  10359296        5636120         116.81
> 1456825335692   143593  7635136         5914624         114.21
>
> After the code cache fills up, we unload classes and therefore flush methods and start over again.
>
> I checked for several releases if classes are unloaded:
> - 8u27: success
> - 8u33: success
> - 8u40: fail
> - 8u45: fail
> - 8u76: fail
>
> The regression was introduced in 8u40.
>
> I also tried with the latest JDK 9 build and it fails as well (had to change the bean name from "Code Cache" to "CodeCache" and run with -XX:-SegmentedCodeCache). Again, -XX:-UseG1GC -XX:+UseParallelGC solves the problem.
>
> Can someone from the GC team have a look?
>
>> Is this a known issue?
>
> I'm not aware of any related issue.
>
> Best regards,
> Tobias
>
>> Thanks!
>>
>> Nileema
>>
>>
>>
>> --
>> View this message in context: http://openjdk.5641.n7.nabble.com/JIT-stops-compiling-after-a-while-java-8u45-tp259603.html
>> Sent from the OpenJDK Hotspot Compiler Development List mailing list archive at Nabble.com.
>>
Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Mikael Gerdin
Hi,

On 2016-03-01 13:35, Tobias Hartmann wrote:
> Hi,
>
> is just had a another look and it turned out that even with 8u40+ class unloading is triggered. I missed that because it happens *much* later (compared to 8u33) when the code cache already filled up and compilation is disabled. At this point we don't recover because new classes are loaded and new OSR nmethods are compiled rapidly.
>
> Summary:
> The code cache fills up due to OSR nmethods that are not being flushed. With 8u33 and earlier, G1 did more aggressive class unloading (probably due to more allocations or different heuristics) and this allowed the sweeper to flush enough OSR nmethods to continue compilation. With 8u40 and later, class unloading happens long after the code cache is full.

Before 8u40 G1 could only unload classes at Full GCs.
After 8u40 G1 can unload classes at the end of a concurrent GC cycle,
avoiding Full GC.

If you run the test with CMS with +CMSClassUnloadingEnabled you will
probably see similar problematic results since the class unloading in G1
is very similar to the one in CMS.
I haven't investigated in depth why the classes do not get unloaded in
the G1 and CMS cases but there are several known quirks with how
concurrent class unloading behaves which causes them to unload classes
later than the serial Full GC.

Running G1 with -XX:-ClassUnloadingWithConcurrentMark
or CMS with -XX:-CMSClassUnloadingEnabled
disables concurrent class unloading completely and works around the
issue you are seeing.

For real world applications I hope that this is a much smaller issue but
if you must load and execute loads and loads of short lived classes then
it might be reasonable to disable concurrent class unloading (at the
cost of getting serial Full gcs instead).


>
> I think we should fix this by flushing "cold" OSR nmethods as well (JDK-8023191). Thomas Schatzl mentioned that we could also trigger a concurrent mark if the code cache is full and hope that some classes are unloaded but I'm afraid this is too invasive (and does not help much in the general case).

If it is possible to flush OSR nmethods without doing a full class
unloading cycle then I think that path is prefereable.

/Mikael

>
> Opinions?
>
> Best regards,
> Tobias
>
> On 01.03.2016 11:27, Tobias Hartmann wrote:
>> Hi Nileema,
>>
>> thanks for reporting this issue!
>>
>> CC'ing the GC team because this seems to be a GC issue (see evaluation below).
>>
>> On 29.02.2016 23:59, nileema wrote:
>>> We are seeing an issue with the CodeCache becoming full which causes the
>>> compiler to be disabled in jdk-8u45 to jdk-8u72.
>>>
>>> We had seen a similar issue in Java7 (old issue:
>>> http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html).
>>> This issue went away with earlier versions of Java 8.
>>
>> Reading the old conversation, I'm wondering if this could again be a problem with OSR nmethods that are not flushed? The bug (JDK-8023191) is still open - now assigned to me.
>>
>> Doing a quick experiment, it looks like we mostly compile OSR methods:
>>    22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)
>>    22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)
>>    22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)
>>    22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)
>>
>> Currently, OSR nmethods are not flushed just because the code cache is full but only if the nmethod becomes invalid (class loading/unloading, uncommon trap, ..)
>>
>> With your test, class unloading should happen and therefore the OSR nmethods *should* be flushed.
>>
>>> We used the test http://github.com/martint/jittest to compare the behavior
>>> of jdk-8u25 and jdk-8u45. For this test, we did not see any CodeCache full
>>> messages with jdk-8u25  but did see them with 8u45+ (8u60  and 8u74)
>>> Test results comparing 8u25, 8u45 and 8u74:
>>> https://gist.github.com/nileema/6fb667a215e95919242f
>>>
>>> In the results you can see that 8u25 starts collecting the code cache much
>>> sooner than 8u45. 8u45 very quickly hits the limit for code cache. If we
>>> force a full gc when it is about to hit the code cache limit, we see the
>>> code cache size go down.
>>
>> You can use the following flags to get additional information:
>> -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintMethodFlushing -XX:+TraceClassUnloading
>>
>> I did some more experiments with 8u45:
>>
>> java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -XX:+UseG1GC -jar jittest-1.0-SNAPSHOT-standalone.jar | grep "Unloading"
>> -> We do *not* unload any classes. The code cache fills up with OSR nmethods that are not flushed.
>>
>> Removing the -XX:+UseG1GC flag solves the issue:
>>
>> java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -jar jittest-1.0-SNAPSHOT-standalone.jar | grep Unloading
>> -> Prints plenty of [Unloading class Runnable_40 0x00000007c0086028] messages and the code cache does not fill up.
>> -> OSR nmethods are flushed because the classes are unloaded:
>>     21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie
>>
>> The log files look good:
>>
>> 1456825330672   112939  10950016        10195496        111.28
>> 1456825331675   118563  11432256        10467176        112.41
>> 1456825332678   125935  11972928        10778432        115.72
>> [Unloading class Runnable_2498 0x00000007c0566028]
>> ...
>> [Unloading class Runnable_34 0x00000007c0082028]
>> 1456825333684   131493  10220608        5382976         117.46
>> 1456825334688   137408  10359296        5636120         116.81
>> 1456825335692   143593  7635136         5914624         114.21
>>
>> After the code cache fills up, we unload classes and therefore flush methods and start over again.
>>
>> I checked for several releases if classes are unloaded:
>> - 8u27: success
>> - 8u33: success
>> - 8u40: fail
>> - 8u45: fail
>> - 8u76: fail
>>
>> The regression was introduced in 8u40.
>>
>> I also tried with the latest JDK 9 build and it fails as well (had to change the bean name from "Code Cache" to "CodeCache" and run with -XX:-SegmentedCodeCache). Again, -XX:-UseG1GC -XX:+UseParallelGC solves the problem.
>>
>> Can someone from the GC team have a look?
>>
>>> Is this a known issue?
>>
>> I'm not aware of any related issue.
>>
>> Best regards,
>> Tobias
>>
>>> Thanks!
>>>
>>> Nileema
>>>
>>>
>>>
>>> --
>>> View this message in context: http://openjdk.5641.n7.nabble.com/JIT-stops-compiling-after-a-while-java-8u45-tp259603.html
>>> Sent from the OpenJDK Hotspot Compiler Development List mailing list archive at Nabble.com.
>>>
Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Martin Traverso
For real world applications I hope that this is a much smaller issue but if you must load and execute loads and loads of short lived classes then it might be reasonable to disable concurrent class unloading (at the cost of getting serial Full gcs instead).

Unfortunately, this is not a theoretical issue for us. We see this problem running Presto (http://prestodb.io), which generates bytecode for every query it processes. For now, we're working around it with a background thread that watches the size of the code cache and calls System.gc() when it gets close to the max (https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24231cd388992d7c).

Martin



On Tue, Mar 1, 2016 at 5:17 AM, Mikael Gerdin <[hidden email]> wrote:
Hi,

On 2016-03-01 13:35, Tobias Hartmann wrote:
Hi,

is just had a another look and it turned out that even with 8u40+ class unloading is triggered. I missed that because it happens *much* later (compared to 8u33) when the code cache already filled up and compilation is disabled. At this point we don't recover because new classes are loaded and new OSR nmethods are compiled rapidly.

Summary:
The code cache fills up due to OSR nmethods that are not being flushed. With 8u33 and earlier, G1 did more aggressive class unloading (probably due to more allocations or different heuristics) and this allowed the sweeper to flush enough OSR nmethods to continue compilation. With 8u40 and later, class unloading happens long after the code cache is full.

Before 8u40 G1 could only unload classes at Full GCs.
After 8u40 G1 can unload classes at the end of a concurrent GC cycle, avoiding Full GC.

If you run the test with CMS with +CMSClassUnloadingEnabled you will probably see similar problematic results since the class unloading in G1 is very similar to the one in CMS.
I haven't investigated in depth why the classes do not get unloaded in the G1 and CMS cases but there are several known quirks with how concurrent class unloading behaves which causes them to unload classes later than the serial Full GC.

Running G1 with -XX:-ClassUnloadingWithConcurrentMark
or CMS with -XX:-CMSClassUnloadingEnabled
disables concurrent class unloading completely and works around the issue you are seeing.

For real world applications I hope that this is a much smaller issue but if you must load and execute loads and loads of short lived classes then it might be reasonable to disable concurrent class unloading (at the cost of getting serial Full gcs instead).



I think we should fix this by flushing "cold" OSR nmethods as well (JDK-8023191). Thomas Schatzl mentioned that we could also trigger a concurrent mark if the code cache is full and hope that some classes are unloaded but I'm afraid this is too invasive (and does not help much in the general case).

If it is possible to flush OSR nmethods without doing a full class unloading cycle then I think that path is prefereable.

/Mikael



Opinions?

Best regards,
Tobias

On 01.03.2016 11:27, Tobias Hartmann wrote:
Hi Nileema,

thanks for reporting this issue!

CC'ing the GC team because this seems to be a GC issue (see evaluation below).

On 29.02.2016 23:59, nileema wrote:
We are seeing an issue with the CodeCache becoming full which causes the
compiler to be disabled in jdk-8u45 to jdk-8u72.

We had seen a similar issue in Java7 (old issue:
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html).
This issue went away with earlier versions of Java 8.

Reading the old conversation, I'm wondering if this could again be a problem with OSR nmethods that are not flushed? The bug (JDK-8023191) is still open - now assigned to me.

Doing a quick experiment, it looks like we mostly compile OSR methods:
   22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)
   22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)
   22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)
   22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)

Currently, OSR nmethods are not flushed just because the code cache is full but only if the nmethod becomes invalid (class loading/unloading, uncommon trap, ..)

With your test, class unloading should happen and therefore the OSR nmethods *should* be flushed.

We used the test http://github.com/martint/jittest to compare the behavior
of jdk-8u25 and jdk-8u45. For this test, we did not see any CodeCache full
messages with jdk-8u25  but did see them with 8u45+ (8u60  and 8u74)
Test results comparing 8u25, 8u45 and 8u74:
https://gist.github.com/nileema/6fb667a215e95919242f

In the results you can see that 8u25 starts collecting the code cache much
sooner than 8u45. 8u45 very quickly hits the limit for code cache. If we
force a full gc when it is about to hit the code cache limit, we see the
code cache size go down.

You can use the following flags to get additional information:
-XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintMethodFlushing -XX:+TraceClassUnloading

I did some more experiments with 8u45:

java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -XX:+UseG1GC -jar jittest-1.0-SNAPSHOT-standalone.jar | grep "Unloading"
-> We do *not* unload any classes. The code cache fills up with OSR nmethods that are not flushed.

Removing the -XX:+UseG1GC flag solves the issue:

java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -jar jittest-1.0-SNAPSHOT-standalone.jar | grep Unloading
-> Prints plenty of [Unloading class Runnable_40 0x00000007c0086028] messages and the code cache does not fill up.
-> OSR nmethods are flushed because the classes are unloaded:
    21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie

The log files look good:

1456825330672   112939  10950016        10195496        111.28
1456825331675   118563  11432256        10467176        112.41
1456825332678   125935  11972928        10778432        115.72
[Unloading class Runnable_2498 0x00000007c0566028]
...
[Unloading class Runnable_34 0x00000007c0082028]
1456825333684   131493  10220608        5382976         117.46
1456825334688   137408  10359296        5636120         116.81
1456825335692   143593  7635136         5914624         114.21

After the code cache fills up, we unload classes and therefore flush methods and start over again.

I checked for several releases if classes are unloaded:
- 8u27: success
- 8u33: success
- 8u40: fail
- 8u45: fail
- 8u76: fail

The regression was introduced in 8u40.

I also tried with the latest JDK 9 build and it fails as well (had to change the bean name from "Code Cache" to "CodeCache" and run with -XX:-SegmentedCodeCache). Again, -XX:-UseG1GC -XX:+UseParallelGC solves the problem.

Can someone from the GC team have a look?

Is this a known issue?

I'm not aware of any related issue.

Best regards,
Tobias

Thanks!

Nileema



--
View this message in context: http://openjdk.5641.n7.nabble.com/JIT-stops-compiling-after-a-while-java-8u45-tp259603.html
Sent from the OpenJDK Hotspot Compiler Development List mailing list archive at Nabble.com.


Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Tobias Hartmann-2
Hi Martin,

On 01.03.2016 20:18, Martin Traverso wrote:
>> For real world applications I hope that this is a much smaller issue but if you must load and execute loads and loads of short lived classes then it might be reasonable to disable concurrent class unloading (at the cost of getting serial Full gcs instead).
>
> Unfortunately, this is not a theoretical issue for us. We see this problem running Presto (http://prestodb.io), which generates bytecode for every query it processes. For now, we're working around it with a background thread that watches the size of the code cache and calls System.gc() when it gets close to the max (https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24231cd388992d7c).

Okay, I changed JDK-8023191 from enhancement to bug and set fix version to 9. We can then backport this to 8u.

Best regards,
Tobias

>
> Martin
>
>
>
> On Tue, Mar 1, 2016 at 5:17 AM, Mikael Gerdin <[hidden email] <mailto:[hidden email]>> wrote:
>
>     Hi,
>
>     On 2016-03-01 13:35, Tobias Hartmann wrote:
>
>         Hi,
>
>         is just had a another look and it turned out that even with 8u40+ class unloading is triggered. I missed that because it happens *much* later (compared to 8u33) when the code cache already filled up and compilation is disabled. At this point we don't recover because new classes are loaded and new OSR nmethods are compiled rapidly.
>
>         Summary:
>         The code cache fills up due to OSR nmethods that are not being flushed. With 8u33 and earlier, G1 did more aggressive class unloading (probably due to more allocations or different heuristics) and this allowed the sweeper to flush enough OSR nmethods to continue compilation. With 8u40 and later, class unloading happens long after the code cache is full.
>
>
>     Before 8u40 G1 could only unload classes at Full GCs.
>     After 8u40 G1 can unload classes at the end of a concurrent GC cycle, avoiding Full GC.
>
>     If you run the test with CMS with +CMSClassUnloadingEnabled you will probably see similar problematic results since the class unloading in G1 is very similar to the one in CMS.
>     I haven't investigated in depth why the classes do not get unloaded in the G1 and CMS cases but there are several known quirks with how concurrent class unloading behaves which causes them to unload classes later than the serial Full GC.
>
>     Running G1 with -XX:-ClassUnloadingWithConcurrentMark
>     or CMS with -XX:-CMSClassUnloadingEnabled
>     disables concurrent class unloading completely and works around the issue you are seeing.
>
>     For real world applications I hope that this is a much smaller issue but if you must load and execute loads and loads of short lived classes then it might be reasonable to disable concurrent class unloading (at the cost of getting serial Full gcs instead).
>
>
>
>         I think we should fix this by flushing "cold" OSR nmethods as well (JDK-8023191). Thomas Schatzl mentioned that we could also trigger a concurrent mark if the code cache is full and hope that some classes are unloaded but I'm afraid this is too invasive (and does not help much in the general case).
>
>
>     If it is possible to flush OSR nmethods without doing a full class unloading cycle then I think that path is prefereable.
>
>     /Mikael
>
>
>
>         Opinions?
>
>         Best regards,
>         Tobias
>
>         On 01.03.2016 11:27, Tobias Hartmann wrote:
>
>             Hi Nileema,
>
>             thanks for reporting this issue!
>
>             CC'ing the GC team because this seems to be a GC issue (see evaluation below).
>
>             On 29.02.2016 23:59, nileema wrote:
>
>                 We are seeing an issue with the CodeCache becoming full which causes the
>                 compiler to be disabled in jdk-8u45 to jdk-8u72.
>
>                 We had seen a similar issue in Java7 (old issue:
>                 http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2013-August/011333.html).
>                 This issue went away with earlier versions of Java 8.
>
>
>             Reading the old conversation, I'm wondering if this could again be a problem with OSR nmethods that are not flushed? The bug (JDK-8023191) is still open - now assigned to me.
>
>             Doing a quick experiment, it looks like we mostly compile OSR methods:
>                22129 2137 %     3       Runnable_412::run @ 4 (31 bytes)
>                22130 2189 %     4       Runnable_371::run @ 4 (31 bytes)
>                22134 2129 %     3       Runnable_376::run @ 4 (31 bytes)
>                22136 2109 %     3       Runnable_410::run @ 4 (31 bytes)
>
>             Currently, OSR nmethods are not flushed just because the code cache is full but only if the nmethod becomes invalid (class loading/unloading, uncommon trap, ..)
>
>             With your test, class unloading should happen and therefore the OSR nmethods *should* be flushed.
>
>                 We used the test http://github.com/martint/jittest to compare the behavior
>                 of jdk-8u25 and jdk-8u45. For this test, we did not see any CodeCache full
>                 messages with jdk-8u25  but did see them with 8u45+ (8u60  and 8u74)
>                 Test results comparing 8u25, 8u45 and 8u74:
>                 https://gist.github.com/nileema/6fb667a215e95919242f
>
>                 In the results you can see that 8u25 starts collecting the code cache much
>                 sooner than 8u45. 8u45 very quickly hits the limit for code cache. If we
>                 force a full gc when it is about to hit the code cache limit, we see the
>                 code cache size go down.
>
>
>             You can use the following flags to get additional information:
>             -XX:CICompilerCount=1 -XX:+PrintCompilation -XX:+PrintMethodFlushing -XX:+TraceClassUnloading
>
>             I did some more experiments with 8u45:
>
>             java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -XX:+UseG1GC -jar jittest-1.0-SNAPSHOT-standalone.jar | grep "Unloading"
>             -> We do *not* unload any classes. The code cache fills up with OSR nmethods that are not flushed.
>
>             Removing the -XX:+UseG1GC flag solves the issue:
>
>             java -mx20g -ms20g -XX:ReservedCodeCacheSize=20m -XX:+TraceClassUnloading -jar jittest-1.0-SNAPSHOT-standalone.jar | grep Unloading
>             -> Prints plenty of [Unloading class Runnable_40 0x00000007c0086028] messages and the code cache does not fill up.
>             -> OSR nmethods are flushed because the classes are unloaded:
>                 21670  970 %     4       Runnable_87::run @ -2 (31 bytes)   made zombie
>
>             The log files look good:
>
>             1456825330672   112939  10950016        10195496        111.28
>             1456825331675   118563  11432256        10467176        112.41
>             1456825332678   125935  11972928        10778432        115.72
>             [Unloading class Runnable_2498 0x00000007c0566028]
>             ...
>             [Unloading class Runnable_34 0x00000007c0082028]
>             1456825333684   131493  10220608        5382976         117.46
>             1456825334688   137408  10359296        5636120         116.81
>             1456825335692   143593  7635136         5914624         114.21
>
>             After the code cache fills up, we unload classes and therefore flush methods and start over again.
>
>             I checked for several releases if classes are unloaded:
>             - 8u27: success
>             - 8u33: success
>             - 8u40: fail
>             - 8u45: fail
>             - 8u76: fail
>
>             The regression was introduced in 8u40.
>
>             I also tried with the latest JDK 9 build and it fails as well (had to change the bean name from "Code Cache" to "CodeCache" and run with -XX:-SegmentedCodeCache). Again, -XX:-UseG1GC -XX:+UseParallelGC solves the problem.
>
>             Can someone from the GC team have a look?
>
>                 Is this a known issue?
>
>
>             I'm not aware of any related issue.
>
>             Best regards,
>             Tobias
>
>                 Thanks!
>
>                 Nileema
>
>
>
>                 --
>                 View this message in context: http://openjdk.5641.n7.nabble.com/JIT-stops-compiling-after-a-while-java-8u45-tp259603.html
>                 Sent from the OpenJDK Hotspot Compiler Development List mailing list archive at Nabble.com.
>
>
Reply | Threaded
Open this post in threaded view
|

RE: JIT stops compiling after a while (java 8u45)

Uwe Schindler-4
Hi,

> >> For real world applications I hope that this is a much smaller issue but if
> you must load and execute loads and loads of short lived classes then it might
> be reasonable to disable concurrent class unloading (at the cost of getting
> serial Full gcs instead).
> >
> > Unfortunately, this is not a theoretical issue for us. We see this problem
> running Presto (http://prestodb.io), which generates bytecode for every
> query it processes. For now, we're working around it with a background
> thread that watches the size of the code cache and calls System.gc() when it
> gets close to the max
> (https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24
> 231cd388992d7c).
>
> Okay, I changed JDK-8023191 from enhancement to bug and set fix version to
> 9. We can then backport this to 8u.

Hi many thanks for taking care!

Apache Lucene's Expressions module (heavily used by Elasticsearch) also compiles small Java classes to execute custom document scoring operations that the user can supply, similar to Presto's SQL, as a Javascript-like formula that can work on arbitrary static double-parameter/double returning functions. These classes are loaded in a separate ClassLoader used solely for a single class and thrown away afterwards. In older Java versions GC was perfectly throwing away those classes...

Compiler & Classloader:
https://github.com/apache/lucene-solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompiler.java

See tests like:
https://github.com/apache/lucene-solr/tree/master/lucene/expressions/src/test/org/apache/lucene/expressions/js

Uwe

Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Tobias Hartmann-2
Hi Uwe,

On 02.03.2016 10:08, Uwe Schindler wrote:

> Hi,
>
>>>> For real world applications I hope that this is a much smaller issue but if
>> you must load and execute loads and loads of short lived classes then it might
>> be reasonable to disable concurrent class unloading (at the cost of getting
>> serial Full gcs instead).
>>>
>>> Unfortunately, this is not a theoretical issue for us. We see this problem
>> running Presto (http://prestodb.io), which generates bytecode for every
>> query it processes. For now, we're working around it with a background
>> thread that watches the size of the code cache and calls System.gc() when it
>> gets close to the max
>> (https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24
>> 231cd388992d7c).
>>
>> Okay, I changed JDK-8023191 from enhancement to bug and set fix version to
>> 9. We can then backport this to 8u.
>
> Hi many thanks for taking care!
>
> Apache Lucene's Expressions module (heavily used by Elasticsearch) also compiles small Java classes to execute custom document scoring operations that the user can supply, similar to Presto's SQL, as a Javascript-like formula that can work on arbitrary static double-parameter/double returning functions. These classes are loaded in a separate ClassLoader used solely for a single class and thrown away afterwards. In older Java versions GC was perfectly throwing away those classes...
>
> Compiler & Classloader:
> https://github.com/apache/lucene-solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressions/js/JavascriptCompiler.java
>
> See tests like:
> https://github.com/apache/lucene-solr/tree/master/lucene/expressions/src/test/org/apache/lucene/expressions/js

Please note that JDK-8023191 only takes care of flushing of unused OSR nmethods. This should help if the code cache fills up due to OSR compilations but it's not about unloading of classes. Class unloading just helps because it forces flushing right away.

As Mikael wrote, unloading happens later with concurrent class unloading. This shouldn't be a problem for your application as long as the code cache does not fill up, right?

Best regards,
Tobias
Reply | Threaded
Open this post in threaded view
|

RE: JIT stops compiling after a while (java 8u45)

Uwe Schindler-4
Hi Tobias,

> >>>> For real world applications I hope that this is a much smaller issue but if
> >> you must load and execute loads and loads of short lived classes then it
> might
> >> be reasonable to disable concurrent class unloading (at the cost of getting
> >> serial Full gcs instead).
> >>>
> >>> Unfortunately, this is not a theoretical issue for us. We see this problem
> >> running Presto (http://prestodb.io), which generates bytecode for every
> >> query it processes. For now, we're working around it with a background
> >> thread that watches the size of the code cache and calls System.gc() when
> it
> >> gets close to the max
> >>
> (https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24
> >> 231cd388992d7c).
> >>
> >> Okay, I changed JDK-8023191 from enhancement to bug and set fix
> version to
> >> 9. We can then backport this to 8u.
> >
> > Hi many thanks for taking care!
> >
> > Apache Lucene's Expressions module (heavily used by Elasticsearch) also
> compiles small Java classes to execute custom document scoring operations
> that the user can supply, similar to Presto's SQL, as a Javascript-like formula
> that can work on arbitrary static double-parameter/double returning
> functions. These classes are loaded in a separate ClassLoader used solely for
> a single class and thrown away afterwards. In older Java versions GC was
> perfectly throwing away those classes...
> >
> > Compiler & Classloader:
> > https://github.com/apache/lucene-
> solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressi
> ons/js/JavascriptCompiler.java
> >
> > See tests like:
> > https://github.com/apache/lucene-
> solr/tree/master/lucene/expressions/src/test/org/apache/lucene/expressio
> ns/js
>
> Please note that JDK-8023191 only takes care of flushing of unused OSR
> nmethods. This should help if the code cache fills up due to OSR compilations
> but it's not about unloading of classes. Class unloading just helps because it
> forces flushing right away.

Thanks! This is how it was designed: The classloader only lives for very short time and the compiled expression gets thrown away after fulltext query execution. This works quite good. I just have seen the issue on this mailing list and I wanted to start some test with recent Java 8 VMs if everything is still alright.

Martin Traverso's post alarmed me, because he said that they generate bytecode for every query (which is similar in our case). The main difference is: Our compiled methods are just plain simple mathematical formulas which are represented to the caller by a functional interface. We have no loops in the generated bytecode. The Lucene scoring algorithm just calls the compiled expression bytecode while processing results to calculate score (and depending on the number of search engine results collected, this can be several million times per query execution).

> As Mikael wrote, unloading happens later with concurrent class unloading.
> This shouldn't be a problem for your application as long as the code cache
> does not fill up, right?

As said before, it should not fill up - as we have no loops in those expressions. I was just alarmed and wanted to start testing. We checked the whole stuff with Java 6 and Java 7 (when it was written), but have not done extensive testing of garbage collection with Java 8. But there were also no bug reports...

> Best regards,
> Tobias

Uwe

Reply | Threaded
Open this post in threaded view
|

Re: JIT stops compiling after a while (java 8u45)

Vitaly Davidovich


On Wednesday, March 2, 2016, Uwe Schindler <[hidden email]> wrote:
Hi Tobias,

> >>>> For real world applications I hope that this is a much smaller issue but if
> >> you must load and execute loads and loads of short lived classes then it
> might
> >> be reasonable to disable concurrent class unloading (at the cost of getting
> >> serial Full gcs instead).
> >>>
> >>> Unfortunately, this is not a theoretical issue for us. We see this problem
> >> running Presto (http://prestodb.io), which generates bytecode for every
> >> query it processes. For now, we're working around it with a background
> >> thread that watches the size of the code cache and calls System.gc() when
> it
> >> gets close to the max
> >>
> (https://github.com/facebook/presto/commit/91e1b3bb6bbfffc62401025a24
> >> 231cd388992d7c).
> >>
> >> Okay, I changed JDK-8023191 from enhancement to bug and set fix
> version to
> >> 9. We can then backport this to 8u.
> >
> > Hi many thanks for taking care!
> >
> > Apache Lucene's Expressions module (heavily used by Elasticsearch) also
> compiles small Java classes to execute custom document scoring operations
> that the user can supply, similar to Presto's SQL, as a Javascript-like formula
> that can work on arbitrary static double-parameter/double returning
> functions. These classes are loaded in a separate ClassLoader used solely for
> a single class and thrown away afterwards. In older Java versions GC was
> perfectly throwing away those classes...
> >
> > Compiler & Classloader:
> > https://github.com/apache/lucene-
> solr/blob/master/lucene/expressions/src/java/org/apache/lucene/expressi
> ons/js/JavascriptCompiler.java
> >
> > See tests like:
> > https://github.com/apache/lucene-
> solr/tree/master/lucene/expressions/src/test/org/apache/lucene/expressio
> ns/js
>
> Please note that JDK-8023191 only takes care of flushing of unused OSR
> nmethods. This should help if the code cache fills up due to OSR compilations
> but it's not about unloading of classes. Class unloading just helps because it
> forces flushing right away.

Thanks! This is how it was designed: The classloader only lives for very short time and the compiled expression gets thrown away after fulltext query execution. This works quite good. I just have seen the issue on this mailing list and I wanted to start some test with recent Java 8 VMs if everything is still alright.

Martin Traverso's post alarmed me, because he said that they generate bytecode for every query (which is similar in our case). The main difference is: Our compiled methods are just plain simple mathematical formulas which are represented to the caller by a functional interface. We have no loops in the generated bytecode. The Lucene scoring algorithm just calls the compiled expression bytecode while processing results to calculate score (and depending on the number of search engine results collected, this can be several million times per query execution).

> As Mikael wrote, unloading happens later with concurrent class unloading.
> This shouldn't be a problem for your application as long as the code cache
> does not fill up, right?

As said before, it should not fill up - as we have no loops in those expressions. 
No loops means no OSR compilation of that bytecode but if it's called millions of times (based on what you said above) then it'll get compiled.  It may be that your bytecode generates really small nmethods (you mentioned it's simple algebraic expressions) and their generation naturally doesn't outpace code cache cleaning.

I was just alarmed and wanted to start testing. We checked the whole stuff with Java 6 and Java 7 (when it was written), but have not done extensive testing of garbage collection with Java 8. But there were also no bug reports...

> Best regards,
> Tobias

Uwe



--
Sent from my phone