[10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck


[10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Ivan Gerasimov
Hello everybody!

The class sun.security.ssl.SSLSessionContextImpl maintains caches for
session reuse.
Access to the caches is serialized across threads.
It was reported that under heavy load the time spent waiting to enter
the synchronized methods outweighs the time of creating a new session.

It is proposed to introduce a flag that allows the cache to be bypassed
altogether.
Would you please help review the proposed fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/
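
A minimal sketch of the idea (illustration only; the property name
javax.net.ssl.needCacheSessions comes from later in this thread, while the
cache types below are simplified stand-ins, not the sun.security.util
classes):

    // Sketch: choose between a real session cache and a no-op cache based on
    // a system property, so the default behavior (caching enabled) is unchanged.
    import java.security.AccessController;
    import java.security.PrivilegedAction;
    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;

    interface SessionCache<K, V> {
        V get(K key);
        void put(K key, V value);
    }

    final class NullSessionCache<K, V> implements SessionCache<K, V> {
        public V get(K key) { return null; }     // every lookup is a miss
        public void put(K key, V value) { }      // nothing is ever stored
    }

    final class MemorySessionCache<K, V> implements SessionCache<K, V> {
        private final Map<K, V> map =
                Collections.synchronizedMap(new LinkedHashMap<>());
        public V get(K key) { return map.get(key); }
        public void put(K key, V value) { map.put(key, value); }
    }

    final class SessionCaches {
        static <K, V> SessionCache<K, V> newSessionCache() {
            String prop = AccessController.doPrivileged((PrivilegedAction<String>) () ->
                    System.getProperty("javax.net.ssl.needCacheSessions", "true"));
            // Default "true": behavior unchanged; "false" bypasses the cache.
            return Boolean.parseBoolean(prop)
                    ? new MemorySessionCache<>()
                    : new NullSessionCache<>();
        }
    }

With -Djavax.net.ssl.needCacheSessions=false every lookup misses, so sessions
are never resumed, but threads no longer contend on a shared cache lock.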

--
With kind regards,
Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Bernd Eckenfels-4
Hello,

There is already a property to set the cache size; would it be enough to re-purpose a cache size of 0 to turn it off?

Are there numbers to show when this is actually a problem? Is this only for 100% cache misses? Maybe the cache itself needs some optimizations? (It is hard to imagine that a saved handshake does not compensate for hundreds of gets - especially since the current version would still generate a cache key)

Gruss
Bernd


From: security-dev <[hidden email]> on behalf of Ivan Gerasimov <[hidden email]>
Sent: Wednesday, November 8, 2017 3:24:54 AM
To: [hidden email]
Subject: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck
 
Hello everybody!

The class sun.security.ssl.SSLSessionContextImpl maintains caches for
the sessions reuse.
Access to the cache from threads is serialized.
It was reported that under heavy load the time of waiting for the turn
to access the synchronized methods outweighs the time of creating a new
session.

It is proposed to introduce a flag that will allow to avoid using the
cache altogether.
Would you please help review the proposed fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/

--
With kind regards,
Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Ivan Gerasimov

Thank you Bernd for looking into this!


On 11/7/17 11:42 PM, Bernd Eckenfels wrote:
Hello,

There is already a property to set the cache size, would it be enough to re-purpose a cache size of 0 to turn it off?

Currently, setting the cache size to 0 means that it is unbounded, so entries are removed from the cache only when they expire.

Are there numbers to show when this is actually a problem? Is this only for 100% Cache misses?
We've seen dumps with lots of threads blocked waiting on Cache.get()/put()/remove().
This is primarily due to the time spent in the cache cleaning routines (see sun.security.util.MemoryCache.emptyQueue()/expungeExpiredEntries()), which are executed inside the synchronized block.
That time is linear in the size of the cache, but limiting the cache size doesn't always help either, as the amount of cleanup work also increases with a bounded cache.
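
To illustrate the effect being described, here is a toy model (not the actual
sun.security.util.MemoryCache): every access expunges expired entries while
holding the lock, so the time spent under the lock grows with the cache size.

    import java.util.Iterator;
    import java.util.LinkedHashMap;
    import java.util.Map;

    final class ToyMemoryCache<K, V> {
        private static final class Entry<V> {
            final V value;
            final long expirationTime;   // absolute time in milliseconds
            Entry(V value, long expirationTime) {
                this.value = value;
                this.expirationTime = expirationTime;
            }
        }

        private final Map<K, Entry<V>> map = new LinkedHashMap<>(); // FIFO order
        private final long lifetimeMillis;

        ToyMemoryCache(long lifetimeMillis) { this.lifetimeMillis = lifetimeMillis; }

        public synchronized V get(K key) {
            expungeExpiredEntries();             // O(size), under the lock
            Entry<V> e = map.get(key);
            return e == null ? null : e.value;
        }

        public synchronized void put(K key, V value) {
            expungeExpiredEntries();             // O(size), under the lock
            map.put(key, new Entry<>(value, System.currentTimeMillis() + lifetimeMillis));
        }

        // Walks the whole map; with many threads calling get()/put(), they all
        // queue up behind this scan.
        private void expungeExpiredEntries() {
            long now = System.currentTimeMillis();
            for (Iterator<Entry<V>> it = map.values().iterator(); it.hasNext(); ) {
                if (it.next().expirationTime <= now) {
                    it.remove();
                }
            }
        }
    }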

Allowing the cache to be bypassed removed this bottleneck, and under certain conditions the throughput increased from 35 to 120 sessions per second.

Please note that the proposed option javax.net.ssl.needCacheSessions will be true by default, so the default behavior will not change.
Only in specific situations, where it has been shown that turning off the cache improves performance, will setting this option to false be recommended.

Maybe the cache itself needs some optimizations?
Certainly, it would be very good to optimize the cache implementation!
I've made a few attempts, but failed to achieve a significant improvement across different scenarios.
The complication is due to two requirements: maintaining a fixed cache capacity and maintaining FIFO order when removing entries.  This makes it hard to use the standard concurrent data structures as is.

Still, I'm all for optimizing the cache in JDK 10, if it is possible.
However, if that is done, it would probably not be backported to earlier releases.

I'm also going to propose backporting this fix with the option to turn off the cache, as it will be useful for some currently running applications.

With kind regards,
Ivan

(It is hard to imagine that a saved handshake does not compensate for hundreds of gets - especially if the current version still would generate a cache key)

Gruss
Bernd


From: security-dev [hidden email] on behalf of Ivan Gerasimov [hidden email]
Sent: Wednesday, November 8, 2017 3:24:54 AM
To: [hidden email]
Subject: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck
 
Hello everybody!

The class sun.security.ssl.SSLSessionContextImpl maintains caches for
the sessions reuse.
Access to the cache from threads is serialized.
It was reported that under heavy load the time of waiting for the turn
to access the synchronized methods outweighs the time of creating a new
session.

It is proposed to introduce a flag that will allow to avoid using the
cache altogether.
Would you please help review the proposed fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/

--
With kind regards,
Ivan Gerasimov


-- 
With kind regards,
Ivan Gerasimov

Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

roger riggs
Hi Ivan,

One idea to consider is an indirection that spreads the work over
multiple Cache implementations,
similar to what ConcurrentHashMap does: an early fan-out to
multiple Caches based on the key.
If it was keyed on the same key as the cache, it would be able to take
advantage of re-using the contexts.
Though I'm not sure how to size or re-size the index based on load or ...
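
A minimal sketch of this fan-out, as I read it (illustrative only; the Cache
interface below is a stand-in, not sun.security.util.Cache):

    // Key-based fan-out over several independent cache segments. Each segment
    // has its own lock, so expunging one segment does not block threads that
    // hash to the other segments.
    import java.util.Objects;
    import java.util.function.IntFunction;

    interface Cache<K, V> {                 // stand-in for the cache abstraction
        V get(K key);
        void put(K key, V value);
        void remove(K key);
    }

    final class StripedCache<K, V> implements Cache<K, V> {
        private final Cache<K, V>[] segments;

        @SuppressWarnings("unchecked")
        StripedCache(int nSegments, IntFunction<Cache<K, V>> segmentFactory) {
            segments = new Cache[nSegments];
            for (int i = 0; i < nSegments; i++) {
                segments[i] = segmentFactory.apply(i);
            }
        }

        // Spread keys across segments, similar to how ConcurrentHashMap
        // spreads hash bits across bins.
        private Cache<K, V> segmentFor(K key) {
            int h = Objects.hashCode(key);
            h ^= (h >>> 16);
            return segments[Math.floorMod(h, segments.length)];
        }

        public V get(K key)             { return segmentFor(key).get(key); }
        public void put(K key, V value) { segmentFor(key).put(key, value); }
        public void remove(K key)       { segmentFor(key).remove(key); }
    }

One open question, as noted above, is how to keep a single overall size bound
once the capacity is split per segment.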

I would think that using a 'small' cache size would bound the expunge
time and still allow some re-use.

$.02, Roger


On 11/8/2017 2:09 PM, Ivan Gerasimov wrote:

>
> Thank you Bernd for looking into this!
>
>
> On 11/7/17 11:42 PM, Bernd Eckenfels wrote:
>> Hello,
>>
>> There is already a property to set the cache size, would it be enough
>> to re-purpose a cache size of 0 to turn it off?
>>
> Currently, setting the cache size to 0 means that it is unbounded, so
> that the entries are removed from the cache only when they get expired.
>
>> Are there numbers to show when this is actually a problem? Is this
>> only for 100% Cache misses?
> We've seen dumps with lots of threads blocked waiting on the
> Cache.get()/put()/remove().
> This is primarily due to the time spent in the cache cleaning routines
> (see
> sun.security.util.MemoryCache.emptyQueue()/expungeExpiredEntries()),
> which are executed inside the synchronization block.
> This time is linear on the size of cache, but limiting the cache size
> doesn't always help either, as the amount of cleanup work also
> increases with a bounded cache.
>
> Allowing to avoid to use the cache removed this bottleneck and under
> certain conditions the throughput increased from 35 to 120 sessions
> per second.
>
> Please note that the proposed option javax.net.ssl.needCacheSessions
> will be true by default, so the default behavior will not change.
> Only in specific situation, if it is proved that turning off the cache
> will improve performance, this option will be recommended to be set to
> false.
>
>> Maybe the cache itself needs some optimizations?
> Certainly, it would be very good to optimize the cache implementation!
> I've made a few attempts, but failed to achieve a significant
> improvement in different scenarios.
> The complication is due to the two requirements: maintaining fixed
> cache capacity and maintaining FIFO order when removing the entries. 
> This makes it hard to use the concurrent data structures as is.
>
> Still, I'm totally for the cache optimization in JDK 10, if it is
> possible.
> However, if it is done, it would not be probably backported to the
> earlier releases.
>
> And I'm going to propose to backport the proposed fix with the option
> to turn off the cache, as it will be useful for some currently running
> applications.
>
> With kind regards,
> Ivan
>
>> (It is hard to imagine that a saved handshake does not compensate for
>> hundreds of gets - especially if the current version still would
>> generate a cache key)
>>
>> Gruss
>> Bernd
>>
>> Gruss
>> Bernd
>> --
>> http://bernd.eckenfels.net
>> ------------------------------------------------------------------------
>> *From:* security-dev <[hidden email]> on
>> behalf of Ivan Gerasimov <[hidden email]>
>> *Sent:* Wednesday, November 8, 2017 3:24:54 AM
>> *To:* [hidden email]
>> *Subject:* [10] RFR : 8186628 : SSL session cache can cause a
>> scalability bottleneck
>> Hello everybody!
>>
>> The class sun.security.ssl.SSLSessionContextImpl maintains caches for
>> the sessions reuse.
>> Access to the cache from threads is serialized.
>> It was reported that under heavy load the time of waiting for the turn
>> to access the synchronized methods outweighs the time of creating a new
>> session.
>>
>> It is proposed to introduce a flag that will allow to avoid using the
>> cache altogether.
>> Would you please help review the proposed fix?
>>
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/ 
>> <http://cr.openjdk.java.net/%7Eigerasim/8186628/00/webrev/>
>>
>> --
>> With kind regards,
>> Ivan Gerasimov
>>
>
> --
> With kind regards,
> Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Ivan Gerasimov
Thank you Roger for the suggestion!

Indeed an interesting idea to consider for optimization!  I guess an
inventory of how Cache is used needs to be made to see if the maximum
size requirement may be weakened.

At this point, however, I'd like to have a simple solution with a choice
between MemoryCache and NullCache, which will allow better performance
in certain situations.  Benchmarking with a small cache size and many
threads accessing the cache showed that the cache was still a hot spot
and threads were spending a lot of time blocked.

With kind regards,
Ivan


On 11/8/17 12:18 PM, Roger Riggs wrote:

> Hi Ivan,
>
> One idea to consider is an indirection that spreads the work over
> multiple Cache implementations.
> Similar to what ConcurrentHashMap does, doing an early fan out to
> multiple Caches based on the key.
> If it was keyed to the same key as the cache, it would be able to take
> advantage of re-using the contexts.
> Though I'm not sure how to size or re-size the index based on load or ...
>
> I would think that using a 'small' cache size would bound the expunge
> time and still allow some-reuse.
>
> $.02, Roger
>
>
> On 11/8/2017 2:09 PM, Ivan Gerasimov wrote:
>>
>> Thank you Bernd for looking into this!
>>
>>
>> On 11/7/17 11:42 PM, Bernd Eckenfels wrote:
>>> Hello,
>>>
>>> There is already a property to set the cache size, would it be
>>> enough to re-purpose a cache size of 0 to turn it off?
>>>
>> Currently, setting the cache size to 0 means that it is unbounded, so
>> that the entries are removed from the cache only when they get expired.
>>
>>> Are there numbers to show when this is actually a problem? Is this
>>> only for 100% Cache misses?
>> We've seen dumps with lots of threads blocked waiting on the
>> Cache.get()/put()/remove().
>> This is primarily due to the time spent in the cache cleaning
>> routines (see
>> sun.security.util.MemoryCache.emptyQueue()/expungeExpiredEntries()),
>> which are executed inside the synchronization block.
>> This time is linear on the size of cache, but limiting the cache size
>> doesn't always help either, as the amount of cleanup work also
>> increases with a bounded cache.
>>
>> Allowing to avoid to use the cache removed this bottleneck and under
>> certain conditions the throughput increased from 35 to 120 sessions
>> per second.
>>
>> Please note that the proposed option javax.net.ssl.needCacheSessions
>> will be true by default, so the default behavior will not change.
>> Only in specific situation, if it is proved that turning off the
>> cache will improve performance, this option will be recommended to be
>> set to false.
>>
>>> Maybe the cache itself needs some optimizations?
>> Certainly, it would be very good to optimize the cache implementation!
>> I've made a few attempts, but failed to achieve a significant
>> improvement in different scenarios.
>> The complication is due to the two requirements: maintaining fixed
>> cache capacity and maintaining FIFO order when removing the entries.  
>> This makes it hard to use the concurrent data structures as is.
>>
>> Still, I'm totally for the cache optimization in JDK 10, if it is
>> possible.
>> However, if it is done, it would not be probably backported to the
>> earlier releases.
>>
>> And I'm going to propose to backport the proposed fix with the option
>> to turn off the cache, as it will be useful for some currently
>> running applications.
>>
>> With kind regards,
>> Ivan
>>
>>> (It is hard to imagine that a saved handshake does not compensate
>>> for hundreds of gets - especially if the current version still would
>>> generate a cache key)
>>>
>>> Gruss
>>> Bernd
>>>
>>> Gruss
>>> Bernd
>>> --
>>> http://bernd.eckenfels.net
>>> ------------------------------------------------------------------------
>>>
>>> *From:* security-dev <[hidden email]> on
>>> behalf of Ivan Gerasimov <[hidden email]>
>>> *Sent:* Wednesday, November 8, 2017 3:24:54 AM
>>> *To:* [hidden email]
>>> *Subject:* [10] RFR : 8186628 : SSL session cache can cause a
>>> scalability bottleneck
>>> Hello everybody!
>>>
>>> The class sun.security.ssl.SSLSessionContextImpl maintains caches for
>>> the sessions reuse.
>>> Access to the cache from threads is serialized.
>>> It was reported that under heavy load the time of waiting for the turn
>>> to access the synchronized methods outweighs the time of creating a new
>>> session.
>>>
>>> It is proposed to introduce a flag that will allow to avoid using the
>>> cache altogether.
>>> Would you please help review the proposed fix?
>>>
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/ 
>>> <http://cr.openjdk.java.net/%7Eigerasim/8186628/00/webrev/>
>>>
>>> --
>>> With kind regards,
>>> Ivan Gerasimov
>>>
>>
>> --
>> With kind regards,
>> Ivan Gerasimov
>
>

--
With kind regards,
Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Ivan Gerasimov
In reply to this post by Ivan Gerasimov
Gentle ping.

If people agree on the approach, I'll go ahead and file a CCC request
for the new recognized system property.

With kind regards,

Ivan


On 11/7/17 6:24 PM, Ivan Gerasimov wrote:

> Hello everybody!
>
> The class sun.security.ssl.SSLSessionContextImpl maintains caches for
> the sessions reuse.
> Access to the cache from threads is serialized.
> It was reported that under heavy load the time of waiting for the turn
> to access the synchronized methods outweighs the time of creating a
> new session.
>
> It is proposed to introduce a flag that will allow to avoid using the
> cache altogether.
> Would you please help review the proposed fix?
>
> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
> WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/
>

--
With kind regards,
Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Xuelei Fan-2
Hi Ivan,

I understand the desire for performance improvement.  But I don't think
avoiding the use of the cache is a price we want to pay.  Besides,
avoiding the session cache is not something that improves performance
in general; it typically hurts performance, sometimes a lot.

Xuelei

On 11/20/2017 5:36 PM, Ivan Gerasimov wrote:

> Gentle ping.
>
> If people agree on the approach, I'll go ahead and file a CCC request
> for the new recognized system property.
>
> With kind regards,
>
> Ivan
>
>
> On 11/7/17 6:24 PM, Ivan Gerasimov wrote:
>> Hello everybody!
>>
>> The class sun.security.ssl.SSLSessionContextImpl maintains caches for
>> the sessions reuse.
>> Access to the cache from threads is serialized.
>> It was reported that under heavy load the time of waiting for the turn
>> to access the synchronized methods outweighs the time of creating a
>> new session.
>>
>> It is proposed to introduce a flag that will allow to avoid using the
>> cache altogether.
>> Would you please help review the proposed fix?
>>
>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
>> WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/
>>
>

Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Ivan Gerasimov
Thanks Xuelei for the comment!


On 11/20/17 8:50 PM, Xuelei Fan wrote:
> Hi Ivan,
>
> I understand the desire of performance improvement.  But I don't think
> avoiding the use of cache is the price we want to pay for. Besides,
> avoiding using of session cache is not something improving the
> performance in general, it is typically something impacting the
> performance,  a lot sometimes.
>
The proposal is not meant to be a general solution.
Clearly, turning the session cache off will increase the average time of
session creation.
However, if the cache becomes a bottleneck due to high contention, then
turning it off helps by shortening the waiting time for each thread.

The option is set to true by default, i.e. the cache is used.
Only if the application is used in such a way that high contention for
the cache is exposed may setting the option to false help improve
performance.

With kind regards,
Ivan

> Xuelei
>
> On 11/20/2017 5:36 PM, Ivan Gerasimov wrote:
>> Gentle ping.
>>
>> If people agree on the approach, I'll go ahead and file a CCC request
>> for the new recognized system property.
>>
>> With kind regards,
>>
>> Ivan
>>
>>
>> On 11/7/17 6:24 PM, Ivan Gerasimov wrote:
>>> Hello everybody!
>>>
>>> The class sun.security.ssl.SSLSessionContextImpl maintains caches
>>> for the sessions reuse.
>>> Access to the cache from threads is serialized.
>>> It was reported that under heavy load the time of waiting for the
>>> turn to access the synchronized methods outweighs the time of
>>> creating a new session.
>>>
>>> It is proposed to introduce a flag that will allow to avoid using
>>> the cache altogether.
>>> Would you please help review the proposed fix?
>>>
>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/
>>>
>>
>

--
With kind regards,
Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Peter Levart
Hi Ivan,

Here's my attempt to increase multithreaded scalability of Cache:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.01/

Haven't tested this yet, but I thought that since you already have relevant performance tests, you might want to try this, so I decided to release it as is.

Rough sketch of changes:
- replaced LinkedHashMap with ConcurrentHashMap
- CacheEntry(s) are additionally linked into a double-linked list. DL-list management is the only synchronization point (similar to the Cleaner API) and the rule is: 1st a new entry is linked into the DL-list, then it is put into the map - published. The same with removing: 1st an entry is unlinked from the DL-list, then, if successful, removed from the map and invalidated.
- changed the way expiry is handled so that the whole list never needs to be scanned.

The code speaks for itself.
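
For readers who don't open the webrev, a condensed sketch of that ordering rule
(simplified stand-in types; the actual patch differs in detail):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    final class SketchCache<K, V> {

        final class Entry {
            final K key;
            final V value;
            Entry prev, next;          // guarded by 'lock'
            boolean linked;            // guarded by 'lock'
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        private final ConcurrentMap<K, Entry> map = new ConcurrentHashMap<>();
        private final Object lock = new Object();
        private Entry head, tail;      // FIFO list, oldest at head; guarded by 'lock'

        void put(K key, V value) {
            Entry e = new Entry(key, value);
            link(e);                            // 1st: link into the DL-list
            Entry old = map.put(key, e);        // 2nd: publish in the map
            if (old != null) {
                unlink(old);                    // a displaced entry leaves the list too
            }
        }

        V get(K key) {
            Entry e = map.get(key);
            return e == null ? null : e.value;
        }

        void remove(K key) {
            Entry e = map.get(key);
            if (e != null && unlink(e)) {       // 1st: unlink; only the winner...
                map.remove(key, e);             // 2nd: ...removes it from the map
            }
        }

        private void link(Entry e) {
            synchronized (lock) {
                e.prev = tail;
                if (tail == null) head = e; else tail.next = e;
                tail = e;
                e.linked = true;
            }
        }

        // Returns true only for the single caller that actually unlinks the entry.
        private boolean unlink(Entry e) {
            synchronized (lock) {
                if (!e.linked) return false;
                if (e.prev == null) head = e.next; else e.prev.next = e.next;
                if (e.next == null) tail = e.prev; else e.next.prev = e.prev;
                e.prev = e.next = null;
                e.linked = false;
                return true;
            }
        }
    }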

Let me know if you find it useful and/or it solves the scalability bottleneck.

Regards, Peter

On 11/21/17 14:16, Ivan Gerasimov wrote:
Thanks Xuelei for the comment!


On 11/20/17 8:50 PM, Xuelei Fan wrote:
Hi Ivan,

I understand the desire of performance improvement.  But I don't think avoiding the use of cache is the price we want to pay for. Besides, avoiding using of session cache is not something improving the performance in general, it is typically something impacting the performance,  a lot sometimes.

The proposal is not meant to be a general solution.
Clearly, turning the session cache off will increase the average time of session creation.
However, if the cache becomes the reason of a bottleneck due to high contention, then turning it off helps by just shortening the waiting time for each thread.

The option is set to true by default, i.e. the cache is used.
Only if the application is used in such a way that high contention for the cache is exposed, then the setting the option to false may help to improve performance.

With kind regards,
Ivan

Xuelei

On 11/20/2017 5:36 PM, Ivan Gerasimov wrote:
Gentle ping.

If people agree on the approach, I'll go ahead and file a CCC request for the new recognized system property.

With kind regards,

Ivan


On 11/7/17 6:24 PM, Ivan Gerasimov wrote:
Hello everybody!

The class sun.security.ssl.SSLSessionContextImpl maintains caches for the sessions reuse.
Access to the cache from threads is serialized.
It was reported that under heavy load the time of waiting for the turn to access the synchronized methods outweighs the time of creating a new session.

It is proposed to introduce a flag that will allow to avoid using the cache altogether.
Would you please help review the proposed fix?

BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/






Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Ivan Gerasimov
Hi Peter!

Thank you very much for looking into this!


On 11/22/17 1:45 AM, Peter Levart wrote:
> Hi Ivan,
>
> Here's my attempt to increase multithreaded scalability of Cache:
>
> http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.01/
>
> Haven't tested this yet, but I thought that since you already have
> relevant performance tests, you might want to try this, so I decided
> to release it as is.
I plugged your implementation into the benchmark I had [*] and got these
numbers for the throughput:

Original MemoryCache: 4,116,537 puts/second
Your variant:  2,942,561 puts/second

So, in the tested scenario performance degraded by 28%.

Still, I think it's important to try to improve performance of the
MemoryCache and, if possible, remove the points of contention.

>
> Rough sketch of changes:
> - replaced LinkedHashMap with ConcurrentHashMap
> - CacheEntry(s) are additionaly linked into a double-linked list.
> DL-list management is the only synchronization point (similar to
> Cleaner API) and the rule is: 1st new entry is linked into DL-list,
> then it is put into map - published. The same with removing: 1st an
> entry is unlinked from DL-list, then if successfull, removed from map
> and invalidated.
> - changed the way expiry is handled so that the whole list is never
> needed to be scanned.
>
> The code speaks for itself.
>
Your implementation is somewhat similar to what I had tried before
coming up with the proposal of the option to turn the cache off.
I used ConcurrentHashMap + ConcurrentLinkedQueue to maintain the FIFO
order; still, the performance was a few percent lower than that of the
original implementation.

That's really why I decided to split the issue:
- first, provide an option to turn the cache off (that should be easily
backported and can provide immediate relief to the customers that
experience the scalability bottleneck);
- second, continue work on the cache implementation improvements.

> Let me know if you find it usefull and/or it solves the scalability
> bottleneck.
>
Yes, I think it's very useful!

However, as I wrote above, I think that the issue needs to be split into
two parts: an option to turn the caching off (which can be easily
backported) and improving the cache implementation (which may even relax
requirements such as the FIFO order or an absolutely hard upper bound on
the cache size).

With kind regards,
Ivan

> Regards, Peter
>
> On 11/21/17 14:16, Ivan Gerasimov wrote:
>> Thanks Xuelei for the comment!
>>
>>
>> On 11/20/17 8:50 PM, Xuelei Fan wrote:
>>> Hi Ivan,
>>>
>>> I understand the desire of performance improvement.  But I don't
>>> think avoiding the use of cache is the price we want to pay for.
>>> Besides, avoiding using of session cache is not something improving
>>> the performance in general, it is typically something impacting the
>>> performance,  a lot sometimes.
>>>
>> The proposal is not meant to be a general solution.
>> Clearly, turning the session cache off will increase the average time
>> of session creation.
>> However, if the cache becomes the reason of a bottleneck due to high
>> contention, then turning it off helps by just shortening the waiting
>> time for each thread.
>>
>> The option is set to true by default, i.e. the cache is used.
>> Only if the application is used in such a way that high contention
>> for the cache is exposed, then the setting the option to false may
>> help to improve performance.
>>
>> With kind regards,
>> Ivan
>>
>>> Xuelei
>>>
>>> On 11/20/2017 5:36 PM, Ivan Gerasimov wrote:
>>>> Gentle ping.
>>>>
>>>> If people agree on the approach, I'll go ahead and file a CCC
>>>> request for the new recognized system property.
>>>>
>>>> With kind regards,
>>>>
>>>> Ivan
>>>>
>>>>
>>>> On 11/7/17 6:24 PM, Ivan Gerasimov wrote:
>>>>> Hello everybody!
>>>>>
>>>>> The class sun.security.ssl.SSLSessionContextImpl maintains caches
>>>>> for the sessions reuse.
>>>>> Access to the cache from threads is serialized.
>>>>> It was reported that under heavy load the time of waiting for the
>>>>> turn to access the synchronized methods outweighs the time of
>>>>> creating a new session.
>>>>>
>>>>> It is proposed to introduce a flag that will allow to avoid using
>>>>> the cache altogether.
>>>>> Would you please help review the proposed fix?
>>>>>
>>>>> BUGURL: https://bugs.openjdk.java.net/browse/JDK-8186628
>>>>> WEBREV: http://cr.openjdk.java.net/~igerasim/8186628/00/webrev/
>>>>>
>>>>
>>>
>>
>

--
With kind regards,
Ivan Gerasimov


Re: [10] RFR : 8186628 : SSL session cache can cause a scalability bottleneck

Peter Levart
Hi Ivan,

On 11/23/2017 01:16 AM, Ivan Gerasimov wrote:

> Hi Peter!
>
> Thank you very much for looking into this!
>
>
> On 11/22/17 1:45 AM, Peter Levart wrote:
>> Hi Ivan,
>>
>> Here's my attempt to increase multithreaded scalability of Cache:
>>
>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.01/ 
>>
>>
>> Haven't tested this yet, but I thought that since you already have
>> relevant performance tests, you might want to try this, so I decided
>> to release it as is.
> I plugged your implementation into the benchmark I had [*] and got
> these numbers for the throughput:
>
> Original MemoryCache: 4,116,537 puts/second
> Your variant:  2,942,561 puts/second
>
> So, in the tested scenario performance downgraded by 28%.

Hm. What kind of benchmark did you have? Can you share the source? Was
this multi-threaded concurrent puts or single-threaded? I'm surprised
about the results since I did some benchmarking myself and got entirely
different results. Here's what I did:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/bench/

You can download the whole directory and just set up the JMH project over
it. The main class is CacheBench. It uses a typical caching scenario:

     @Threads(4)
     @Benchmark
     public Object original() {
         // Key range vs. cache size controls the hit ratio.
         Integer key = ThreadLocalRandom.current().nextInt(keyspan);
         Object val = origCache.get(key);
         if (val == null) {
             // Cache miss: "compute" a value and populate the cache.
             origCache.put(key, val = Boolean.TRUE);
         }
         return val;
     }

The cache hit-ratio is varied by adjusting the cache size and/or the range
of distinct key values produced by the RNG. With this setup I get the
following results:

Benchmark            (hitratio)  (size)  (timeout)  Mode  Cnt      Score      Error  Units
CacheBench.original           0    1000      86400  avgt   10  18023,156 ±  434,710  ns/op
CacheBench.original           5    1000      86400  avgt   10  17645,688 ±  767,696  ns/op
CacheBench.original          50    1000      86400  avgt   10  10412,352 ±  216,208  ns/op
CacheBench.original          95    1000      86400  avgt   10   2275,257 ±  449,302  ns/op
CacheBench.original         100    1000      86400  avgt   10    542,321 ±   15,133  ns/op

CacheBench.patched1           0    1000      86400  avgt   10   1685,499 ±   24,257  ns/op
CacheBench.patched1           5    1000      86400  avgt   10   1633,203 ±   15,045  ns/op
CacheBench.patched1          50    1000      86400  avgt   10   1061,596 ±   12,312  ns/op
CacheBench.patched1          95    1000      86400  avgt   10    125,456 ±    3,426  ns/op
CacheBench.patched1         100    1000      86400  avgt   10     47,166 ±    3,140  ns/op

patched1 is mostly what I had in my proposed code, but with a fix in the
put() method, which could install an entry into the cacheMap after some
other thread had already removed it from the DL-list. The race is
prevented by first installing a unique reservation value into the
cacheMap, then adding the entry to the DL-list, and afterwards trying to
replace the reservation value with the entry, cleaning up if not
successful.
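
A schematic of that reservation dance (illustration only; cacheMap and the
DL-list helpers below are placeholders, not the exact webrev code):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    final class ReservationPutSketch<K, V> {
        private final ConcurrentMap<K, Object> cacheMap = new ConcurrentHashMap<>();

        void put(K key, V value) {
            Object reservation = new Object();            // unique placeholder
            cacheMap.put(key, reservation);               // 1) reserve the slot
            Object entry = linkIntoDlList(key, value);    // 2) link into the DL-list
            // 3) publish: succeeds only if our reservation is still in place
            if (!cacheMap.replace(key, reservation, entry)) {
                unlinkFromDlList(entry);                  // lost the race: clean up
            }
        }

        // Placeholders for the DL-list operations described earlier in the thread.
        private Object linkIntoDlList(K key, V value) { return new Object[] { key, value }; }
        private void unlinkFromDlList(Object entry) { /* constant-time unlink */ }
    }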

If this is not enough, I also tried an approach with a lock-free DL-list:
basically a modified ConcurrentLinkedDeque where intermediary nodes
may be removed in constant time (important for fast removal of elements
whose soft references have been cleared). This variant is even more
scalable than the lock-based DL-list:

CacheBench.patched2           0    1000      86400  avgt   10    774,356 ±    4,113  ns/op
CacheBench.patched2           5    1000      86400  avgt   10    754,222 ±    2,956  ns/op
CacheBench.patched2          50    1000      86400  avgt   10    434,969 ±    2,359  ns/op
CacheBench.patched2          95    1000      86400  avgt   10     98,352 ±    0,750  ns/op
CacheBench.patched2         100    1000      86400  avgt   10     47,692 ±    0,995  ns/op

Here's what such MemoryCache looks like in the JDK:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8186628_ssl_session_cache_scalability/webrev.02/

I took the ConcurrentLinkedDeque by Doug Lea and Martin Buchholz and
removed all public methods, keeping just the delicate internal mechanics.
The Node is an interface and can be implemented by different classes.
BasicNode and SoftReferenceNode are provided (and are subclassed in
MemoryCache). Public methods are tailored to the Cache use case and deal
with Node<E>(s) rather than E(lement)s.


>
> Still, I think it's important to try to improve performance of the
> MemoryCache and, if possible, remove the points of contention.

So what do you think of the benchmark results?

>
>>
>> Rough sketch of changes:
>> - replaced LinkedHashMap with ConcurrentHashMap
>> - CacheEntry(s) are additionaly linked into a double-linked list.
>> DL-list management is the only synchronization point (similar to
>> Cleaner API) and the rule is: 1st new entry is linked into DL-list,
>> then it is put into map - published. The same with removing: 1st an
>> entry is unlinked from DL-list, then if successfull, removed from map
>> and invalidated.
>> - changed the way expiry is handled so that the whole list is never
>> needed to be scanned.
>>
>> The code speaks for itself.
>>
> Your implementation is somewhat similar to what I had tried before
> coming up with proposal of the option to turn the cache off.
> I used ConcurrentHashMap + ConcurrentLinkedQueue to maintain the FIFO
> order, still the performance was a few percent less then the one of
> the original implementation.

Removing an intermediary element from ConcurrentLinkedQueue takes O(n)
time. This is needed in emptyQueue() for all softly-reachable-and-cleared
entries and in remove() when the application removes an entry explicitly
from the cache (for example on SSL session termination). We need a
double-linked list (like ConcurrentLinkedDeque) with constant-time
removal of intermediary node(s).
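
To illustrate the difference in removal cost (a simplified, single-lock sketch;
the actual patch does this lock-free, and head/tail bookkeeping is omitted):

    import java.util.concurrent.ConcurrentLinkedQueue;

    final class RemovalCostSketch {

        // With a queue, removing an arbitrary element means scanning for it: O(n).
        static <E> boolean removeFromQueue(ConcurrentLinkedQueue<E> queue, E element) {
            return queue.remove(element);    // linear search through the queue
        }

        // With a doubly-linked node, an entry that keeps a reference to its own
        // node can be unlinked in O(1), with no scan.
        static final class Node<E> {
            E item;
            Node<E> prev, next;
            Node(E item) { this.item = item; }
        }

        static synchronized <E> void unlink(Node<E> node) {
            if (node.prev != null) node.prev.next = node.next;
            if (node.next != null) node.next.prev = node.prev;
            node.prev = node.next = null;
            node.item = null;                // drop the payload, mirrors "invalidate"
        }
    }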

>
> That's really why I decided to split the issue:
> - first, provide an option to turn the cache off (that should be
> easily backported and can provide immediate relief to the customers
> that experience the scalability bottleneck;
> - second, continue work on the cache implementation improvements.

Maybe with a scalable Cache such a switch will not be needed. I can
imagine that such a switch provides relief for the kinds of loads that
may happen in very special circumstances (like DoS attacks, for example,
that don't re-use SSL sessions), but in normal operation such a switch
would degrade performance, wouldn't it?

The problem I see is that if you provide such a switch now, users may set
it, and later when they upgrade to a newer JDK with a scalable cache, not
realize that they don't need it any more and consequently keep paying for
it unnecessarily in the future...

Regards, Peter

>
>> Let me know if you find it usefull and/or it solves the scalability
>> bottleneck.
>>
> Yes, I think it's very useful!
>
> However, as I wrote above, I think that the issue needs be split into
> two parts: an option to turn the caching off (which can be easily
> backported) and improving the cache implementation (which can even
> relax the requirements, as the FIFO order or absolutely hard upper
> bound of the cache size).
>
> With kind regards,
> Ivan