how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

8 messages
how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Andy Nuss
I am writing a custom servlet replacement for memcached, suited to my own needs.  When the servlet boots, I create a huge array, about 1% of total memory, in a static variable of my Cache class.  This is for quickselect median sorting when memory is about 75% full by my calculations, so that I can throw away half of my cached entries in a background thread.
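The quickselect idea mentioned here could be sketched roughly as below (a hedged illustration; the class and method names are mine, not from the actual code): find the median last-access timestamp without fully sorting, so entries older than the median can be evicted.

```java
import java.util.Random;

// Illustrative quickselect over last-access timestamps: O(n) expected time,
// reorders the array in place instead of fully sorting it.
public final class QuickSelect {
    private static final Random RND = new Random();

    /** Returns the k-th smallest element (0-based) of a, reordering a in place. */
    public static long select(long[] a, int k) {
        int lo = 0, hi = a.length - 1;
        while (lo < hi) {
            int p = partition(a, lo, hi, lo + RND.nextInt(hi - lo + 1));
            if (p == k) return a[p];
            if (p < k) lo = p + 1; else hi = p - 1;
        }
        return a[lo];
    }

    /** Median of the array; entries with older timestamps would be evicted. */
    public static long median(long[] a) {
        return select(a, a.length / 2);
    }

    // Lomuto partition around the value at pivotIdx; returns its final index.
    private static int partition(long[] a, int lo, int hi, int pivotIdx) {
        long pivot = a[pivotIdx];
        swap(a, pivotIdx, hi);
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] < pivot) swap(a, i, store++);
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(long[] a, int i, int j) {
        long t = a[i]; a[i] = a[j]; a[j] = t;
    }
}
```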

My cache consists of tiered ConcurrentHashMaps, whose keys are base64 strings of about 30 chars that are completely random.  The mapped value is always about 100 chars.  But the first char of the key takes you to the second tier of ConcurrentHashMaps, and so on, at a shallow depth, until you get to the ConcurrentHashMap that maps the key to the string.  The cache has get, put, and delete methods for these String keys and values, and again, though the strings are roughly the same length, they are not exactly the same length.  And because the tree of maps is statically held, all the strings are in the old generation heap.
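A minimal sketch of the tiered-map idea described here, assuming a single second tier keyed on the first character of the key (all names are illustrative, not from the actual code):

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: the first character of the key selects a second-tier
// ConcurrentHashMap, spreading contention across independent maps.
public class TieredCache {
    // top tier: one sub-map per possible leading base64 character
    private final ConcurrentHashMap<Character, ConcurrentHashMap<String, String>> tiers =
            new ConcurrentHashMap<>();

    private ConcurrentHashMap<String, String> tierFor(String key) {
        return tiers.computeIfAbsent(key.charAt(0), c -> new ConcurrentHashMap<>());
    }

    public void put(String key, String value) { tierFor(key).put(key, value); }

    public String get(String key) { return tierFor(key).get(key); }

    public void delete(String key) { tierFor(key).remove(key); }
}
```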

The machine has 8 gig or 16 gig or more of memory, with 2 or 4 or more CPUs.  The map fills and fills over time on the tomcat instance with many millions of mappings which, being held via static references, live in the old generation.  So the tomcat machine is mostly old generation heap usage.  The question is what happens around the time the quickselect prunes away half of the LRU entries by deleting them from the map tree.  It is removing the key and value strings, which are smallish and scattered all over the heap, thus potentially reclaiming half of memory when done, for future additions to the map once the garbage has been cleared.  What GC settings should I use in the tomcat program, and which GC (JDK 8)?  And the most important question is this: how do I ensure my tomcat threads (using nio) do not hang for long periods of time when the GC is sweeping the old generation?

E.g. a servlet get method would do puts or gets to the map to fulfill the servlet request, and I want to ensure that it always completes in microseconds and does not hang, even when the GC is doing extensive reclamation.


Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Kirk Pepperdine
Hi Andy,

I wouldn’t do anything special. The array is effectively a cache and in G1 that would be a humongous allocation (in most configurations). After that, it’s business as usual.

Kind regards,
Kirk Pepperdine

On Dec 13, 2017, at 3:07 PM, Andy Nuss <[hidden email]> wrote:

> […]



Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Andy Nuss
Thanks Kirk,

The array is just a temporary buffer, held onto, whose entries are cleared to null after my LRU sweep.  The references that are freed to GC are in the ConcurrentHashMaps, and are all roughly 30-char and 100-char strings (keys/values, though not precisely those lengths), so I assume that when I do my LRU sweep when needed, it's freeing a ton of small strings, which G1 has to reallocate into bigger chunks, and mark freed, and so on, so that I can in the future add new such strings to the LRU cache.  The concern was whether this sweep of old gen strings scattered all over the huge heap would cause tomcat nio-based threads to "hang" and not respond quickly, or whether G1 would do things less pre-emptively.  Are you basically saying, "no, tomcat servlet response time won't be significantly affected by the G1 sweep"?

Also, I was wondering whether anyone knows how memcached works, and why it is used in preference to a custom design such as mine, which seems a lot simpler.  That is, it seems that with memcached you have to worry about "slabs" and memcached's own heap management, and waste a lot of memory.

Andy


On Wednesday, December 13, 2017, 7:54:36 AM PST, Kirk Pepperdine <[hidden email]> wrote:


> […]



Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Kirk Pepperdine
Hi Andy,

On Dec 13, 2017, at 8:34 PM, Andy Nuss <[hidden email]> wrote:

> Thanks Kirk,
>
> The array is just a temporary buffer held onto that has its entries cleared to null after my LRU sweep.  The references that are freed to GC are in the ConcurrentHashMaps, and are all 30 char and 100 char strings, key/vals, but not precisely, so I assume that when I do my LRU sweep when needed, its freeing a ton of small strings,
>
> which G1 has to reallocate into bigger chunks, and mark freed, and so,

Not sure I understand this bit. Can you explain what you mean by this?

> so that I can in the future add new such strings to the LRU cache.  The concern was whether this sweep of old gen strings scattered all over the huge heap would cause tomcat nio-based threads to "hang", not respond quickly, or would G1 do things less pre-emptively.  Are you basically saying that, "no tomcat servlet response time won't be significantly affected by G1 sweep"?

I'm not sure what your goal is here. I would say, design as needed and let the collector do its thing. That said, temporary humongous allocations are not well managed by G1. Better to create up front and cache it for future downstream use.

As for a sweep… what I think you're asking about is object copy costs. These costs should, and typically do, dominate pause time. Object copy cost is proportional to the number of live objects in the collection set (CSet). Strings are dedup'ed after age 5, so with most heap configurations duplicate Strings will be dedup'ed before they hit tenured.

> Also, I was wondering does anyone know how memcached works, and why it is used in preference to a custom design such as mine which seems a lot simpler.  I.e. it seems that with "memcached", you have to worry about "slabs" and memcached's own heap management, and waste a lot of memory.

I'm the wrong person to defend the use of memcached. It certainly does serve a purpose… that said, using it to offload temp objects means you end up creating your own garbage collector, and as you can see by the effort GC engineers put into each implementation, it's a non-trivial undertaking.

Kind regards,
Kirk
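A note on the deduplication point above: on JDK 8 (8u20 and later), G1 string deduplication is opt-in rather than automatic, and it only deduplicates the backing char arrays of equal Strings. A hedged sketch of the relevant flags (the jar name is a placeholder; since the cached keys here are random, dedup may not actually help this workload):

```shell
# String deduplication requires G1 and JDK 8u20+; off by default.
java -XX:+UseG1GC \
     -XX:+UseStringDeduplication \
     -XX:+PrintStringDeduplicationStatistics \
     -jar app.jar
```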


Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Andy Nuss
Let me try to explain.  On a 16 gig heap, I anticipate that almost 97% of the heap in use at any given moment is ~30 and ~100 char strings.  The rest is small pointer objects in the ConcurrentHashMaps, also long-lived, and tomcat's nio stuff.  So at any moment in time, most of the in-use heap (and I will keep about 20% unused to aid GC) is a huge number of long-lived strings.

Over time, as the single servlet receives requests to cache newly accessed key/val pairs, the number of strings grows to the maximum I allow.  At that point, a background thread sweeps away half of the LRU key/value pairs (30- and 100-char strings).  Now they are unreferenced and sweepable.  That's all I do.  Then the servlet keeps receiving requests to put more key/val pairs, as well as handle get requests.

At the point in time where I clear all the LRU pairs, which might take minutes to iterate, G1 can start doing its thing, not that it will know to do so immediately.  I'm worried that whenever G1 does its thing, because the sweepable stuff is 100% small old-gen objects, servlet threads will time out on the client side.  Not that this happens several times a day, but if G1 does take a long time to sweep a massive heap full of small old-gen objects, the *only* concern is that servlet requests will time out during this period.

Realize I know nothing about GC, except that periodically eclipse hangs due to GC and then crashes on me, i.e. after 4 hours of editing.  And all the blogs I found talked about newgen and TLAB and other things, assuming the typical ephemeral usage that is not at all the case on this particular machine instance.  Again: all long-lived small strings, growing steadily over time, until suddenly half are freed, reference-wise, by me.

If there are no GC settings that make that sweepable stuff get collected on a non-blocking thread, and tomcat's servlets could all hang once every other day for many, many seconds on this 16 gig machine (the so-called long GC pause that people blog about), that might motivate me to abandon this and use the memcached product.


On Wednesday, December 13, 2017, 12:15:38 PM PST, Kirk Pepperdine <[hidden email]> wrote:


> […]


Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Bernd Eckenfels

Don't worry about the sweep; G1, like CMS, is mostly concurrent.

I would suggest testing it with the GC log enabled, and then you can worry. Most likely you want to allow it to kick off GC later, so you can save some concurrent CPU. You also need to fear the full GC when your regions become too fragmented (this hopefully does not happen if the LRU frees lots of objects allocated at the same time in the same region, but you never know). You might unfortunately need to keep 30% or more of the heap unused to defend against that.

There is, BTW, a mailing list for GC usage as opposed to development.

Regards,
Bernd

--
http://bernd.eckenfels.net

From: [hidden email]
Sent: Wednesday, December 13, 2017 22:56
To: [hidden email]; [hidden email]
Subject: Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

> […]
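Bernd's suggestion to test with the GC log enabled maps, on JDK 8, to flags roughly like the following (paths and jar name are illustrative; for Tomcat these would typically go in CATALINA_OPTS in setenv.sh; JDK 9+ replaced these with unified -Xlog:gc*):

```shell
# JDK 8 GC logging flags; analyze the resulting log before tuning anything
java -XX:+UseG1GC \
     -Xloggc:/var/log/tomcat/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -jar app.jar
```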


Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Kirk Pepperdine
In reply to this post by Andy Nuss
Hi Andy,

What you are describing is fairly routine caching behavior, with a small twist in that the objects being held are quite regular in size. Again, I wouldn't design with the collector in mind, whereas I certainly do design with memory efficiency as a reasonable goal.

As for GC, in the JVM there are two basic strategies, which I tend to label evacuating and in-place. G1 is completely evacuating, and consequently the cost (aka pause duration) is in most cases a function of the number of live objects. The trigger for a young generational collection is when you have consumed all of the Eden regions; thus the frequency is the size of Eden divided by your allocation rate. The trigger for a concurrent mark of tenured is when it consumes 45% of available heap; thus your concurrent mark frequency is (45% of heap size) / promotion rate. Additionally, G1 keeps some memory in reserve to avoid painting the collector into a full-GC corner.

Issues specific to caching are: very large live sets that result in inflated copy costs as data flows from Eden through survivor and finally into tenured space. In these cases I've found it's better to slow down the frequency of collections, as this will result in you experiencing the same pause time but less frequently. Another tactic that I've found helpful on occasion is to lower the Initiating Heap Occupancy Percent (aka IHOP) from its default value of 45% to a value that sits consistently within the live set, meaning you'll run back-to-back concurrent cycles. And I've got a bag of other tactics that I've used with varying degrees of success. Which one would be for you? I've no idea. Tuning a collector isn't something you can do after reading a few tips from StackOverflow. GC behavior is an emergent reaction to the workload that you place on it, meaning the only way to really understand how it's all going to work is to run production-like experiments (or better yet, run in production) and look at a GC log. (Shameless plug: Censum, my GC log visualization tooling, helps.)
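The IHOP tactic described here would be expressed on JDK 8 G1 roughly as below. The specific values are illustrative only and would need to be validated against a GC log for this workload:

```shell
# Start concurrent marking earlier than the default 45% occupancy,
# trading more concurrent CPU for a lower risk of a full GC.
java -XX:+UseG1GC \
     -XX:InitiatingHeapOccupancyPercent=30 \
     -XX:MaxGCPauseMillis=200 \
     -Xms16g -Xmx16g \
     -jar app.jar
```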

I understand your concern in wanting to avoid the dreaded GC pause, but I'd also look at your efforts in two ways. First, it's an opportunity to get a better understanding of GC; second, recognize that this feels like a premature optimization, as you're trying to solve a problem that you (well, none of us, to be fair and honest) fully understand and may not actually have. Let me recommend some names that have written about how G1 works: Charlie Hunt in his performance tuning book, Poonam Parhar in her blog entries, Monica Beckwith in a number of different places, Simone Bordet in a number of places. I should add that [hidden email] is a more appropriate list for these types of questions. We also have a number of GC related discussions on our mailing list, [hidden email]. I've also recorded a session with Dr. Heinz Kabutz on his https://javaspecialists.teachable.com/ site; I'll get an exact link if you email me offline.

Kind regards,
Kirk Pepperdine
 
On Dec 13, 2017, at 9:55 PM, Andy Nuss <[hidden email]> wrote:

> […]



Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Michal Frajt

Hi Andy,

How many ConcurrentHashMap instances do you actually have in your 16 gig heap? I'm not sure I understand your map structure correctly ("But the first char of the key takes you to the second tier of ConcurrentHashMaps and so"). Could you provide a histogram of your application when running full (before you start LRU sweeping)? Do you need the ConcurrentHashMaps if you have several tiers which already act as concurrent segments? Did you consider open-addressing maps (Trove, Koloboke), eliminating the need for the map nodes (there would be some trade-off when removing)? Did you consider storing a char or even byte array instead of the String instance? Do you remove a ConcurrentHashMap tier when it gets completely empty after the LRU sweep? All this might significantly reduce the heap requirement, shortening the GC time.

Regards, 
Michal 
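Michal's byte-array suggestion could look roughly like this sketch (the class name is hypothetical): since the keys are base64/ASCII, storing them as byte[] halves the character storage compared to a JDK 8 String's char[]. A byte[] cannot be used directly as a map key because arrays use identity hashCode/equals, so a thin wrapper is needed:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative wrapper giving a byte[] value-based equality and hashing,
// so it can serve as a ConcurrentHashMap key, e.g.
// ConcurrentHashMap<ByteKey, byte[]>.
public final class ByteKey {
    private final byte[] bytes;

    public ByteKey(String s) {
        // base64 keys are pure ASCII, so one byte per character suffices
        this.bytes = s.getBytes(StandardCharsets.US_ASCII);
    }

    @Override public int hashCode() {
        return Arrays.hashCode(bytes);
    }

    @Override public boolean equals(Object o) {
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }

    @Override public String toString() {
        return new String(bytes, StandardCharsets.US_ASCII);
    }
}
```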
 


Od: "hotspot-gc-dev" [hidden email]
Komu: "Andy Nuss" [hidden email]
Kopie: "[hidden email] openjdk.java.net" [hidden email]
Datum: Thu, 14 Dec 2017 08:19:21 +0100
Předmet: Re: how to tune gc for tomcat server on large machine that uses almost all old generation smallish objects

Hi Andy,

What you are describing is fairly routine caching behavior with a small twist in that the objects being held in this case are quite regular in size. Again, I wouldn’t design with the collector in mind where as I certainly design with memory efficiency as a reasonable goal.

As for GC, in the JVM there are two basic strategies which I then to label evacuating and in-place. G1 is completely evacuating and consequently the cost (aka pause duration) is (in most cases) a function of the number of live objects. The trigger for a young generational collection is when you have consumed all of the Eden regions. Thus the frequency is the size of Eden divided by your allocation rate. The trigger for a Concurrent Mark of tenured is when it consumes 45% of available heap. Thus your Concurrent Mark frequency is 45% to the size of heap / promotion rate. Additionally G1 keeps some memory on reserve to avoid painting the collector into a Full GC corner.

Issues specific to caching are; very large live sets that result in inflated copy costs as data flows from Eden through survivor and finally into tenured space. In these case I’ve found that it’s better slow down the frequency of collections  as this will result in you experiencing the same pause time but less frequently. There is also another tactic that I’ve found to be helpful on occasion is to lower the Initiating Heap Occupancy Percent (aka IHOP) from it’s default value of 45% into a value that sees is consistantly in the live set. Meaning, you’ll run back to back concurrent cycles. And I’ve got a bag of other tactics that I’ve used with varying degrees of success. Which one would be for you? I’ve no idea. Tuning a collector isn’t something you can do after reading a few tips from StackOverflow. GC behavior is an emergent reaction to the workload that you place on it meaning the only way to really understand how it’s all going to work is to run production like experiments (or better yet, run in production) and look at a GC log. (Shameless plug.. Censum, my GC log visualization tooling helps).

I understand your concerns in wanting to avoid the dreaded GC pause but I’d also look at your efforts in two ways. First, it’s an opportunity to get a better understanding of GC and secondly, recognize that this feels like a premature optimization as you’re trying to solve a problem that you, well none of us to be fair and honest, fully understand and may not actually have. Let me recommend some names that have written about how G1 works. Charlie Hunt in his performance tuning book, Poonan Parhhar in her blog entries, Monica Beckwith in a number of different places, Simone Bordet in a number of places. I should add that [hidden email] is a more appropriate list for these types of questions. We also have a number of GC related discussions on our mailing list, [hidden email]. I’ve also recorded a session with Dr. Heinz Kabutz on his https://javaspecialists.teachable.com/ site. I’ll get an exact link if you email me offline.

Kind regards,
Kirk Pepperdine
 
On Dec 13, 2017, at 9:55 PM, Andy Nuss <[hidden email]> wrote:

Let me try to explain.  On a 16 gig heap, I anticipate almost 97% of the heap in use at any given moment is ~30 and ~100 char strings.  The rest is small pointer objects in the ConcurrentHashMaps, also long-lived, and tomcat's nio stuff.  So at any moment in time, most of the in-use heap (and I will keep about 20% unused to aid gc) is a huge number of long-lived strings.  Over time, as the single servlet receives requests to cache newly accessed key/val pairs, the number of strings grows to the maximum I allow.  At that point, a background thread sweeps away half of the LRU key/value pairs (30 and 100 char strings).  Now they are unreferenced and sweepable.  That's all I do.  Then the servlet keeps receiving requests to put more key/val pairs, as well as handle get requests.  At the point in time where I clear all the LRU pairs, which might take minutes to iterate, G1 can start doing its thing, not that it will know to do so immediately.  I'm worried that whenever G1 does its thing, because the sweepable stuff is 100% small oldgen objects, servlet threads will time out on the client side.  Not that this happens several times a day, but if G1 does take a long time to sweep a massive heap of small oldgen objects, the *only* concern is that servlet requests will time out during this period.
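The background sweep described above can be sketched as follows. This is a hypothetical illustration, not the poster's actual code: `CachePruneSketch` and the stamp-as-value layout are invented names, the median threshold is assumed to come from the quickselect step, and the real cache would hold the ~100-char payload alongside the stamp.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: sweep away the older half of a cache by removing
// every entry whose last-access stamp falls below a median threshold
// computed elsewhere (e.g. by quickselect over the stamps).
public class CachePruneSketch {
    // In the real cache the value would also carry the ~100-char payload;
    // here it is just a last-access stamp.
    static final ConcurrentHashMap<String, Long> cache = new ConcurrentHashMap<>();

    static void prune(long medianStamp) {
        // ConcurrentHashMap iterators are weakly consistent, so this sweep
        // can run in a background thread while gets/puts continue.
        cache.entrySet().removeIf(e -> e.getValue() < medianStamp);
    }

    public static void main(String[] args) {
        for (long i = 0; i < 10; i++) {
            cache.put("key" + i, i);
        }
        prune(5L);                        // drop entries with stamp < 5
        System.out.println(cache.size()); // prints 5
    }
}
```

The removal itself is cheap and non-blocking for readers; the GC concern in the thread is only about what happens when G1 later reclaims the millions of now-unreferenced strings.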

Realize I know nothing about GC, except that periodically, eclipse hangs due to gc and then crashes on me, i.e. after 4 hours of editing.  And all the blogs I found talked about newgen and TLAB and other things, assuming typical ephemeral usage, which is not at all the case on this particular machine instance.  Again, all long-lived small strings, growing and growing over time steadily, until suddenly half are freed, reference-wise, by me.

If there are no GC settings that make that sweepable stuff happen in a non-blocking thread, and tomcat's servlets could all hang once every other day for many many seconds on this 16 gig machine (the so-called long gc-pause that people blog about), that might motivate me to abandon this and use the memcached product.


On Wednesday, December 13, 2017, 12:15:38 PM PST, Kirk Pepperdine <[hidden email]> wrote:


Hi Andy,

On Dec 13, 2017, at 8:34 PM, Andy Nuss <[hidden email]> wrote:

Thanks Kirk,

The array is just a temporary buffer held onto that has its entries cleared to null after my LRU sweep.  The references that are freed to GC are in the ConcurrentHashMaps, and are all 30 char and 100 char strings, key/vals, but not precisely, so I assume that when I do my LRU sweep when needed, it's freeing a ton of small strings,


which G1 has to reallocate into bigger chunks, and mark freed, and so,

Not sure I understand this bit. Can you explain what you mean by this?

so that I can in the future add new such strings to the LRU cache.  The concern was whether this sweep of old gen strings scattered all over the huge heap would cause tomcat nio-based threads to "hang", not respond quickly, or would G1 do things less pre-emptively.  Are you basically saying that, "no tomcat servlet response time won't be significantly affected by G1 sweep”?

I’m not sure what your goal is here. I would say, design as needed and let the collector do its thing. That said, temporary humongous allocations are not well managed by G1. Better to create the buffer up front and cache it for future downstream use.

As for a sweep… what I think you’re asking about is object copy costs. These costs should, and typically do, dominate pause time. Object copy cost is proportional to the number of live objects in the collection set (CSet). Strings are dedup’ed once they survive past the deduplication age threshold (and only if string deduplication is enabled), so with most heap configurations, duplicate Strings will be dedup’ed before they hit tenured.
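For reference, on JDK 8 (8u20+) G1 string deduplication is an opt-in feature. A hedged sketch of the relevant flags; these are real HotSpot options, but whether they help depends entirely on how much duplication the heap actually contains:

```shell
# G1 string deduplication on JDK 8; off by default and requires G1.
JAVA_OPTS="-XX:+UseG1GC \
  -XX:+UseStringDeduplication \
  -XX:StringDeduplicationAgeThreshold=3 \
  -XX:+PrintStringDeduplicationStatistics"
```

Note that deduplication only shares the backing char arrays of *equal* strings; a cache of fully random keys, as described earlier in the thread, would see little or no benefit from it.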

Also, I was wondering does anyone know how memcached works, and why it is used in preference to a custom design such as mine which seems a lot simpler.  I.e. it seems that with "memcached", you have to worry about "slabs" and memcached's own heap management, and waste a lot of memory.

I’m the wrong person to defend the use of memcached. It certainly does serve a purpose. That said, using it to offload temporary objects means you end up creating your own garbage collector… and as you can see by the effort GC engineers put into each implementation, it’s a non-trivial undertaking.

Kind regards,
Kirk