linux os processor optimizations for OpenJDK GC performance enhancement


linux os processor optimizations for OpenJDK GC performance enhancement

Ram Krishnan
We are working on Linux OS-level, processor-specific optimizations, centered on cache partitioning and better core/thread affinity, to improve OpenJDK GC performance. The expected benefits are:
- throughput increase
- minimizing tail latency for latency-sensitive applications
- superior isolation in a multi-tenant environment

We have been looking at the G1GC code and documentation and have performed some initial experiments. Based on this, our understanding is that STW events pause all application threads, regardless of which regions are being garbage collected. For example, if application thread "x" uses regions 1, 4, and 5, and garbage collection is working on regions 2 and 11, application thread "x" is still paused during the STW event.

Your expert opinion on this topic is much appreciated.

--
Thanks in advance,
Ramki


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Kim Barrett
> On Apr 12, 2017, at 9:27 PM, Ram Krishnan <[hidden email]> wrote:
>
> We are working on Linux OS-level, processor-specific optimizations, centered on cache partitioning and better core/thread affinity, to improve OpenJDK GC performance. The expected benefits are:
> - throughput increase
> - minimizing tail latency for latency-sensitive applications
> - superior isolation in a multi-tenant environment
>
> We have been looking at the G1GC code and documentation and have performed some initial experiments. Based on this, our understanding is that STW events pause all application threads, regardless of which regions are being garbage collected. For example, if application thread "x" uses regions 1, 4, and 5, and garbage collection is working on regions 2 and 11, application thread "x" is still paused during the STW event.
>
> Your expert opinion on this topic is much appreciated.
>
> --
> Thanks in advance,
> Ramki

An application thread may touch memory in any region; there is no
notion of a thread being "scoped" to a specific set of regions. While
it might happen that a thread would only touch regions not being
worked on by the collector, there is no a priori way to know that.

If there were some way to allow collector access to a region while
blocking non-collector access, then it might be possible to defer
stopping application threads until they do a blocked access. But it's
not obvious such a scheme would actually be useful, as application
threads are likely to access young generation regions a lot, and those
regions are usually of interest during collector pauses.
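
One rough way to observe the behavior discussed above from the outside is a probe thread that never allocates and only reads the clock, while a second thread churns garbage. This is only a sketch and not something taken from the thread: the class name PauseProbe and the method maxObservedGapNanos are made up for illustration, and a large gap can also come from OS scheduling rather than a GC pause, so treat the number as a hint and compare it against the GC log.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class PauseProbe {
    // Hypothetical helper: runs a non-allocating timer loop on the calling
    // thread while another thread allocates, and returns the largest gap
    // (in nanoseconds) between two consecutive clock reads.
    static long maxObservedGapNanos(long durationMillis) {
        AtomicBoolean stop = new AtomicBoolean(false);

        // Allocating thread: churns short-lived buffers to provoke collections.
        Thread allocator = new Thread(() -> {
            List<byte[]> keep = new ArrayList<>();
            while (!stop.get()) {
                keep.add(new byte[64 * 1024]);
                if (keep.size() > 256) {
                    keep.clear();
                }
            }
        });
        allocator.setDaemon(true);
        allocator.start();

        // Probe loop: only reads the clock and never allocates, so it has
        // no obvious tie to the regions the allocator is filling.
        long deadline = System.nanoTime() + durationMillis * 1_000_000L;
        long last = System.nanoTime();
        long maxGap = 0;
        while (last < deadline) {
            long now = System.nanoTime();
            maxGap = Math.max(maxGap, now - last);
            last = now;
        }

        stop.set(true);
        try {
            allocator.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return maxGap;
    }

    public static void main(String[] args) {
        System.out.println("max gap (ns): " + maxObservedGapNanos(2000));
    }
}
```

Running it with GC logging enabled makes it possible to line up the largest gaps with reported pauses.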


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Ram Krishnan
Thanks Kim for the detailed explanation.

Another possible approach to such a scenario would be to use multiple JVMs: for example, one JVM could host latency-sensitive applications and another could host normal applications. Cache partitioning can then be applied at the JVM level. This way, GC activity (especially data copies) in the normal JVM will have minimal impact on the latency-sensitive JVM. My plan is to focus our initial contribution on this scenario.
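
As a minimal sketch of what per-JVM placement could look like, the snippet below only builds the strings involved and does not launch anything. It assumes two mechanisms that are not named in this thread: taskset(1) for CPU pinning and the Linux resctrl filesystem (typically mounted at /sys/fs/resctrl) for L3 cache partitioning. The class name, method names, jar name, CPU list, and bitmasks are all made up for illustration; valid masks depend on the processor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class JvmPlacement {
    // Builds a command line that starts a JVM pinned to a CPU list via
    // taskset(1). Assumption: taskset and java are on the PATH.
    static List<String> pinnedJvmCommand(String cpuList, List<String> jvmArgs) {
        List<String> cmd = new ArrayList<>(List.of("taskset", "-c", cpuList, "java"));
        cmd.addAll(jvmArgs);
        return cmd;
    }

    // Formats an L3 line for a resctrl "schemata" file, e.g. "L3:0=f0;1=f".
    // Keys are cache ids, values are capacity bitmasks (illustrative only).
    static String l3Schemata(Map<Integer, Long> maskByCacheId) {
        return "L3:" + new TreeMap<>(maskByCacheId).entrySet().stream()
                .map(e -> e.getKey() + "=" + Long.toHexString(e.getValue()))
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        // Hypothetical latency-sensitive JVM on CPUs 0-3 with its own cache slice.
        System.out.println(pinnedJvmCommand("0-3",
                List.of("-XX:+UseG1GC", "-jar", "latency-app.jar")));
        System.out.println(l3Schemata(Map.of(0, 0xf0L)));
    }
}
```

The schemata line would be written into a resctrl group directory, and the JVM's process id added to that group's tasks file, so that the whole JVM, including its GC threads, is confined to the chosen cache slice.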

Thanks,
Ramki


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Andrew Haley
In reply to this post by Kim Barrett
On 13/04/17 16:33, Kim Barrett wrote:
> An application thread may touch memory in any region; there is no
> notion of a thread being "scoped" to a specific set of regions. While
> it might happen that a thread would only touch regions not being
> worked on by the collector, there is no a priori way to know that.

Surely there is: a thread could have its TLAB allocated from a region
local to that socket (or core), and the GC thread for that region
could run on the same socket.  It only works for young gen, but that's
a lot of the problem.
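
To make the idea concrete, here is a toy model only; it is not HotSpot code and the names NodeLocalRegions, addFreeRegion and takeRegionFor are made up. It keeps free region ids grouped by the socket (NUMA node) that backs them and hands a thread a region from its own node first, falling back to any other node when the local pool is empty.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

public class NodeLocalRegions {
    // Free region ids grouped by the NUMA node (socket) backing them.
    private final Map<Integer, Deque<Integer>> freeByNode = new HashMap<>();

    void addFreeRegion(int node, int regionId) {
        freeByNode.computeIfAbsent(node, n -> new ArrayDeque<>()).add(regionId);
    }

    // Returns a region for a thread running on `node`: a local one if
    // possible, otherwise any remaining region, or -1 when nothing is free.
    int takeRegionFor(int node) {
        Deque<Integer> local = freeByNode.get(node);
        if (local != null && !local.isEmpty()) {
            return local.poll();
        }
        for (Deque<Integer> other : freeByNode.values()) {
            if (!other.isEmpty()) {
                return other.poll();
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        NodeLocalRegions pool = new NodeLocalRegions();
        pool.addFreeRegion(0, 1);
        pool.addFreeRegion(1, 2);
        System.out.println(pool.takeRegionFor(1)); // local region for node 1
        System.out.println(pool.takeRegionFor(1)); // falls back to node 0
    }
}
```

The fallback path is where locality is lost: once a thread's TLAB comes from a remote node, the region-to-socket pairing no longer holds for that thread.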

Andrew.


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Ram Krishnan
Thanks Andrew.

> Surely there is: a thread could have its TLAB allocated from a region
> local to that socket (or core), and the GC thread for that region
> could run on the same socket.  It only works for young gen, but that's
> a lot of the problem.

A clarification: does TLAB allocation apply to the tenured space as well? If not, the above would work only for young gen cases where there is no promotion to tenured, right?

Thanks,
Ramki


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Bernd Eckenfels
Maybe it would be better to concentrate the processor optimizations on accessors and barriers without introducing a completely new GC architecture. I can imagine that especially in the area of NUMA, TLAB, huge pages, cache consistency and possibly MMX extensions there is some potential.

Abandoning the global STW, while it seems like a pretty powerful change, is I guess not a good starter exercise, especially since it is not only a question of mutator threads.
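
Before experimenting in those areas it helps to know which of the related HotSpot switches are active in the running JVM. The sketch below reads a few of them through HotSpotDiagnosticMXBean. The class name GcTuningFlags and the helper vmOption are made up for illustration, the choice of flags is mine rather than something from this thread, and a flag that does not exist on a given platform or JVM build is reported as "unavailable".

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class GcTuningFlags {
    // Returns the current value of a HotSpot VM option, or "unavailable"
    // if this JVM does not expose the diagnostic bean or know the option.
    static String vmOption(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unavailable";
            }
            return bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException e) {
            // Thrown when the option name is unknown to this JVM.
            return "unavailable";
        }
    }

    public static void main(String[] args) {
        String[] flags = {"UseNUMA", "UseTLAB", "UseLargePages", "UseTransparentHugePages"};
        for (String flag : flags) {
            System.out.println(flag + " = " + vmOption(flag));
        }
    }
}
```

Comparing this output across runs started with and without options such as -XX:+UseNUMA is a cheap sanity check that a measurement really exercised the configuration it was meant to.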

