Re: linux os processor optimizations for OpenJDK GC performance enhancement


Ram Krishnan
Please find the detailed proposal below; I look forward to your comments.

"Minimize application tail latency using cache-partitioning-aware G1GC" --
https://docs.google.com/document/d/1rPMG4XUiE7cUEOogW1z5tBbBZTclOWyg0arhuycXN94/edit

Thanks,
Ramki

On Thu, Apr 13, 2017 at 11:04 PM, Bernd Eckenfels <[hidden email]>
wrote:

> Maybe it would be better to concentrate the processor optimizations on
> accessors and barriers without introducing a completely new GC
> architecture. I can imagine that especially in the area of NUMA, TLAB, huge
> pages, cache consistency and possibly MMX extensions there is some
> potential.
>
> Abandoning the global STW, while it seems like a pretty powerful change,
> is, I guess, not a good starter exercise, especially since it is not only a
> question of mutator threads.
>
> Regards
> Bernd
> --
> http://bernd.eckenfels.net
> ------------------------------
> *From:* hotspot-gc-dev <[hidden email]> on
> behalf of Ram Krishnan <[hidden email]>
> *Sent:* Friday, April 14, 2017 6:36:27 AM
> *To:* Asif Qamar; Andrew Haley; [hidden email]
> *Subject:* Re: linux os processor optimizations for OpenJDK GC
> performance enhancement
>
> Thanks Andrew.
>
> >> Surely there is: a thread could have its TLAB allocated from a region
> >> local to that socket (or core), and the GC thread for that region
> >> could run on the same socket.  It only works for young gen, but that's
> >> a lot of the problem.
>
> A clarification -- does TLAB allocation also apply to tenured space?
> If not, the above would work only for young gen cases where there is no
> promotion to tenured, right?
>
> Thanks,
> Ramki
>
> On Thu, Apr 13, 2017 at 12:55 PM, Ram Krishnan <[hidden email]>
> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Andrew Haley <[hidden email]>
>> Date: Thu, Apr 13, 2017 at 9:52 AM
>> Subject: Re: linux os processor optimizations for OpenJDK GC performance
>> enhancement
>> To: [hidden email]
>>
>>
>> On 13/04/17 16:33, Kim Barrett wrote:
>> > An application thread may touch memory in any region; there is no
>> > notion of a thread being "scoped" to a specific set of regions. While
>> > it might happen that a thread would only touch regions not being
>> > worked on by the collector, there is no a priori way to know that.
>>
>> Surely there is: a thread could have its TLAB allocated from a region
>> local to that socket (or core), and the GC thread for that region
>> could run on the same socket.  It only works for young gen, but that's
>> a lot of the problem.
>>
>> Andrew.

--
Thanks,
Ramki
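
Editorial sketch of the idea quoted above from Andrew Haley: a thread takes its TLAB from a region local to its socket, and the GC thread for that region runs on the same socket. This is a toy model only, not HotSpot code. The class SocketLocalTlabSketch, the Region type, the round-robin placement of regions on sockets, and both method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model only; every name here is hypothetical and none of it is HotSpot code. */
final class SocketLocalTlabSketch {

    /** A heap region tagged with the socket whose memory is assumed to back it. */
    static final class Region {
        final int id;
        final int socket;

        Region(int id, int socket) {
            this.id = id;
            this.socket = socket;
        }
    }

    private final List<Region> regions = new ArrayList<>();

    /** Assumption: regions are spread round-robin, so region i sits on socket i % socketCount. */
    SocketLocalTlabSketch(int regionCount, int socketCount) {
        for (int i = 0; i < regionCount; i++) {
            regions.add(new Region(i, i % socketCount));
        }
    }

    /** Pick the region for a mutator thread's TLAB: the first region on the thread's socket. */
    Region tlabRegionFor(int threadSocket) {
        for (Region r : regions) {
            if (r.socket == threadSocket) {
                return r;
            }
        }
        throw new IllegalStateException("no region on socket " + threadSocket);
    }

    /** The GC worker for a region is placed on the socket that holds the region. */
    int gcWorkerSocketFor(Region r) {
        return r.socket;
    }

    public static void main(String[] args) {
        SocketLocalTlabSketch heap = new SocketLocalTlabSketch(8, 2);
        Region r = heap.tlabRegionFor(1);
        System.out.println("thread on socket 1 allocates in region " + r.id
                + "; GC worker runs on socket " + heap.gcWorkerSocketFor(r));
    }
}
```

The sketch covers allocation only. It does not address Kim Barrett's point that an application thread may later touch memory in any region, and, as Andrew notes, the scheme applies only to young gen.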

Re: linux os processor optimizations for OpenJDK GC performance enhancement

David Holmes
Hi Ramki,

On 19/04/2017 12:34 AM, Ram Krishnan wrote:
> Please find detailed proposal below, looking forward to your comments.
>
> "Minimize application tail latency using cache-partitioning-aware G1GC" --
> https://docs.google.com/document/d/1rPMG4XUiE7cUEOogW1z5tBbBZTclOWyg0arhuycXN94/edit

All contributions to OpenJDK need to be hosted on OpenJDK infrastructure,
not on external systems like the above.

Also, I cannot see you listed as an OCA signatory. Are you an OpenJDK
contributor?

Thanks,
David
-----


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Ram Krishnan
Hi David,

Thanks for the clarification.

I have signed the OCA and mailed it to oracle-ca_us(at)oracle.com. Any help
to expedite processing would be much appreciated.

We are seeing promising POC results (details in the google doc) for this
proposal -- would really appreciate your help in moving this forward.

Thanks,
Ramki

On Tue, Apr 18, 2017 at 1:55 PM, David Holmes <[hidden email]>
wrote:

> Hi Ramki,
>
> On 19/04/2017 12:34 AM, Ram Krishnan wrote:
>
>> Please find detailed proposal below, looking forward to your comments.
>>
>> "Minimize application tail latency using cache-partitioning-aware G1GC" --
>> https://docs.google.com/document/d/1rPMG4XUiE7cUEOogW1z5tBbBZTclOWyg0arhuycXN94/edit
>>
>
> All contributions to OpenJDK need to be hosted on OpenJDK infrastructure
> not on external systems like the above.
>
> Also I can not see you listed as an OCA signatory. Are you an OpenJDK
> contributor?
>
> Thanks,
> David
> -----
>

Re: linux os processor optimizations for OpenJDK GC performance enhancement

David Holmes
Hi Ramki,

On 19/04/2017 8:27 AM, Ram Krishnan wrote:
> Hi David,
>
> Thanks for the clarification.
>
> I have signed the OCA and mailed it to oracle-ca_us(at)oracle.com. Any help
> to expedite processing would be much appreciated.

Can't help with that I'm afraid. :)

> We are seeing promising POC results (details in the google doc) for this
> proposal -- would really appreciate your help in moving this forward.

If you email me a text/html version of the document I can host it on
cr.openjdk.java.net temporarily. For this to become a JEP you will need
a sponsor with the necessary OpenJDK credentials.

http://cr.openjdk.java.net/~mr/jep/jep-2.0-02.html

Cheers,
David



Re: linux os processor optimizations for OpenJDK GC performance enhancement

Ram Krishnan
Hi David,

Many thanks. Please find attached a text version of the document for temporary
hosting.

Thanks,
Ramki

On Tue, Apr 18, 2017 at 5:42 PM, David Holmes <[hidden email]>
wrote:

> Hi Ramki,
>
> On 19/04/2017 8:27 AM, Ram Krishnan wrote:
>
>> Hi David,
>>
>> Thanks for the clarification.
>>
>> I have signed the OCA and mailed it to oracle-ca_us(at)oracle.com. Any help
>> to expedite processing would be much appreciated.
>
> Can't help with that I'm afraid. :)
>
>> We are seeing promising POC results (details in the google doc) for this
>> proposal -- would really appreciate your help in moving this forward.
>
> If you email me a text/html version of the document I can host it on
> cr.openjdk.java.net temporarily. For this to become a JEP you will need a
> sponsor with the necessary OpenJDK credentials.
>
> http://cr.openjdk.java.net/~mr/jep/jep-2.0-02.html
>
> Cheers,
> David
>
>

Attachment: JEP-cache-partitioning-v1.txt (15K)

Re: linux os processor optimizations for OpenJDK GC performance enhancement

David Holmes
On 19/04/2017 11:38 AM, Ram Krishnan wrote:
> Hi David,
>
> Many thanks, please find attached text version of document for temporary
> hosting.

Hosted at: http://cr.openjdk.java.net/~dholmes/JEP-cache-partitioning-v1.txt

David


Re: linux os processor optimizations for OpenJDK GC performance enhancement

Ram Krishnan
Many thanks David.

Thanks,
Ramki

On Tue, Apr 18, 2017 at 11:08 PM, David Holmes <[hidden email]>
wrote:

> On 19/04/2017 11:38 AM, Ram Krishnan wrote:
>
>> Hi David,
>>
>> Many thanks, please find attached text version of document for temporary
>> hosting.
>>
>
> Hosted at: http://cr.openjdk.java.net/~dholmes/JEP-cache-partitioning-v1.txt
>
> David
>
>
>> Thanks,
>> Ramki


--
Thanks,
Ramki


Volker Simonis
Hi Ram,

while this sounds interesting, I wonder how this plays together with
NUMA and large page support. I understand that these are different
concepts, but in the end it all boils down to the fact that memory
access is not uniform and we have different "kinds" of memory. It
seems to me that this fact is currently not handled very well in
HotSpot and needs some general redesign. There are, for example, two
JEPs [1,2] about improving the NUMA support in general and in G1. One
of the problems is that NUMA support doesn't play well together with
large/huge page support.

I think your proposal must be evaluated in the broader context of
enhancing the VM and GC for non-uniform memory architectures.
Otherwise it would be yet another point fix which doesn't play well
together with other features like NUMA and large pages.

Thanks,
Volker

[1] https://bugs.openjdk.java.net/browse/JDK-8046153 (JEP 163: Enable
NUMA Mode by Default When Appropriate)
[2] https://bugs.openjdk.java.net/browse/JDK-8046147 (JEP 157: G1 GC:
NUMA-Aware Allocation)



Ram Krishnan
Hi Volker,

This proposal is complementary to large page and NUMA support. Below I
outline the typical processor memory access hierarchy, the role each
feature plays in it, and some other discussion points.

*Typical Processor Memory Access Hierarchy*
*Translation Lookaside Buffer (TLB)*
-- The role of the TLB is to translate virtual memory addresses to physical
memory addresses in hardware.
-- Large page support makes sure that TLB entries are not exhausted, and
thus leads to better performance.
-- Note: there are two TLB accesses when the JVM is running on a
hardware-virtualized platform such as KVM, VMware, etc.
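
To make the large page point concrete, here is a minimal sketch of the
"TLB reach" arithmetic, i.e. how much memory the TLB can map before entries
run out. The entry count (1536) and the page sizes (4 KiB and 2 MiB) are
illustrative assumptions, not measurements of any particular processor.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries: int, page_size_bytes: int) -> int:
    """Memory that can be mapped at once: one page per TLB entry."""
    return entries * page_size_bytes


# Hypothetical TLB with 1536 entries (assumption for illustration only).
ENTRIES = 1536
small_reach = tlb_reach_bytes(ENTRIES, 4 * KIB)
large_reach = tlb_reach_bytes(ENTRIES, 2 * MIB)

print("4 KiB pages:", small_reach // MIB, "MiB reach")
print("2 MiB pages:", large_reach // MIB, "MiB reach")
print("ratio:", large_reach // small_reach)
```

With the same number of entries, the reach scales linearly with the page
size, which is why a large heap touches far fewer distinct TLB entries when
backed by large pages.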


*Cache Hierarchy -- Scope of this proposal*

*System Memory*
-- Modern multi-socket machines typically have non-uniform memory access
(NUMA), with not all memory equidistant from each socket.
-- NUMA support makes sure that memory local to a socket is used to the
extent possible, and thus leads to better performance.

*Your concern with NUMA <-> large page interaction*
I can see your concern, given the following remark in JEP [2] about NUMA
<-> large page interaction:
> When using large pages, where multiple regions map to the same physical
> page, things get a bit complicated. For now, we will finesse this by
> disabling NUMA optimizations as soon as the page size exceeds some small
> multiple of region size (say 4), and deal with the more general case in a
> separate later phase.
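
The rule in that remark can be sketched as a simple size check. This is only
an illustration of the quoted sentence; the function name and the example
sizes are my own assumptions and are not taken from HotSpot.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def numa_optimizations_enabled(page_size: int, region_size: int,
                               max_multiple: int = 4) -> bool:
    """Hypothetical check: keep NUMA optimizations only while the page size
    does not exceed max_multiple times the region size."""
    return page_size <= max_multiple * region_size


# Example with an assumed 1 MiB region size.
print(numa_optimizations_enabled(2 * MIB, 1 * MIB))  # 2 MiB pages: True
print(numa_optimizations_enabled(1 * GIB, 1 * MIB))  # 1 GiB pages: False
```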

*Other ways to handle NUMA*
Use Linux numactl -- https://linux.die.net/man/8/numactl -- "numactl runs
processes with a specific NUMA scheduling or memory placement policy. The
policy is set for command and inherited by all of its children. In addition
it can set persistent policy for shared memory segments or files."
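
As a rough sketch of what this looks like for a JVM, the helper below builds
a numactl command line that binds both CPU scheduling and memory allocation
to one NUMA node. The helper name, the node number, the heap size and
"app.jar" are placeholders I made up for illustration; --cpunodebind and
--membind are the numactl options being used.

```python
from typing import List


def numactl_java_command(node: int, java_args: List[str]) -> List[str]:
    """Build an argv that runs java with CPUs and memory bound to one node."""
    return [
        "numactl",
        f"--cpunodebind={node}",  # run only on the CPUs of this node
        f"--membind={node}",      # allocate memory only from this node
        "java",
        *java_args,
    ]


# Placeholder JVM arguments; nothing is executed here, the command is
# only printed.
cmd = numactl_java_command(0, ["-XX:+UseG1GC", "-Xmx4g", "-jar", "app.jar"])
print(" ".join(cmd))
```

The resulting list could be handed to subprocess.run on a machine where
numactl is installed; the policy is then inherited by all JVM threads.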

NUMA topology awareness (leveraging Linux numactl) is supported by
orchestration systems such as OpenStack, Kubernetes, etc.
(http://redhatstackblog.redhat.com/2015/05/05/cpu-pinning-and-numa-topology-awareness-in-openstack-compute/)

The caveats are: 1) the JVM should not require more resources than a single
socket can provide in terms of CPU, memory, and PCIe I/O; 2) there may be
resource fragmentation, depending on the JVM resource request pattern. These
are typically not a problem on modern server-class CPUs.
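
To illustrate the second caveat, here is a toy first-fit placement of JVM
core requests onto sockets. The socket sizes and request sizes are invented
numbers, and only CPU cores are modelled; a real scheduler would also track
memory and I/O.

```python
from typing import List, Optional, Tuple


def place(requests: List[int],
          free_cores: List[int]) -> Tuple[List[Optional[int]], List[int]]:
    """First-fit: put each request on the first socket with enough free
    cores. A request that fits on no single socket is recorded as None."""
    free = list(free_cores)
    placement: List[Optional[int]] = []
    for need in requests:
        for socket, available in enumerate(free):
            if need <= available:
                free[socket] -= need
                placement.append(socket)
                break
        else:
            placement.append(None)
    return placement, free


# Two hypothetical 16-core sockets; three JVMs asking for 10, 10 and 12 cores.
placement, free = place([10, 10, 12], [16, 16])
print(placement)  # socket chosen per request, None if it did not fit
print(free)       # free cores left on each socket
```

In this example 12 cores remain free in total, but they are split 6 + 6
across the two sockets, so the 12-core request cannot be placed on any
single socket. That is the fragmentation case.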

Hope this clarifies.

Thanks,
Ramki



