Re: <i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

classic Classic list List threaded Threaded
5 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: <i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Peter Levart
Hi,

On 12/06/2017 02:30 PM, Alan Bateman wrote:
> I think this class is normally maintained on i18n-dev but I think
> introducing the Cache object looks good and making this much easier to
> understand.
>
> -Alan

Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that
part of JDK have any comments...

This is an official request for reviewing a patch for fixing a data race
between cloning a SimpleTimeZone object and lazily initializing its 3
cache fields which may produce a clone with inconsistent cache state.
Here's Jira issue with details:

https://bugs.openjdk.java.net/browse/JDK-8191216

Venkat has proposed to simply call invalidateCache() on the clone before
returning it from SimpleTimeZone.clone() method:

     public Object clone()
     {
         SimpleTimeZone clone = (SimpleTimeZone) super.clone();
         clone.invalidateCache();
         return clone;
     }

This fixes the issue and for the case of TimeZone.getDefault() which is
called from ZoneId.systemDefault() even elides synchronization in
clone.invalidateCache() and allocation of the clone, so JITed
ZoneId.systemDefault() is unaffected by the patch. Initially I was
satisfied with the fix, but then I tested something. Suppose someone
sets default zone to SimpleTimeZone:

         TimeZone.setDefault(
             new SimpleTimeZone(3600000,
                                "Europe/Paris",
                                Calendar.MARCH, -1, Calendar.SUNDAY,
                                3600000, SimpleTimeZone.UTC_TIME,
                                Calendar.OCTOBER, -1, Calendar.SUNDAY,
                                3600000, SimpleTimeZone.UTC_TIME,
                                3600000)
         );

And then calls some methods that initialize the cache of the internal
shared TimeZone.defaultTimeZone instance:

     new Date().toString();

The code which after that tries to obtain default time zone and
calculate the offset from UTC at some current point in time, for example:

     TimeZone.getDefault().getOffset(now)

can't use the cached state because it has been invalidated in the
returned clone. Such code has to re-compute the offset every time it
gets new clone. I measured this with a JMH benchmark and got the
following result:

Default:

Benchmark                                  Mode  Cnt Score   Error  Units
ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op

Venkat's patch - invalidateCache():

Benchmark                                  Mode  Cnt Score   Error  Units
ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942  ns/op
ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073  ns/op

We see that ZoneId.systemDefault() is unaffected, but
TimeZone.getDefault().getOffset(now) becomes 3x slower.

This is not good, so I tried an alternative fix for the issue - simply
making the SimpleTimeZone.clone() synchronized. Access to cache fields
is already synchronized, so this should fix the race. Here's the JMH result:

Default:

Benchmark                                  Mode  Cnt Score   Error  Units
ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op

Synchronized clone():

Benchmark                                  Mode  Cnt Score   Error  Units
ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936  ns/op
ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034  ns/op

We see that caching works again, but synchronization has some overhead
which is not big for single-threaded access, but might get bigger when
multiple threads try to get default zone concurrently.

So I created a 3rd variant of the fix which I'm presenting here and
requesting a review for:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/

The JMH benchmark shows this:

Default:

Benchmark                                  Mode  Cnt Score   Error  Units
ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op

Cache object:

Benchmark                                  Mode  Cnt Score   Error  Units
ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274  ns/op
ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057  ns/op

Not only does the fix not affect ZoneId.systemDefault() which is not
surprising, but it also speeds-up cache lookup in single-threaded
benchmark and certainly eliminates possible contention among threads
looking up the shared instance.

I have run the test/jdk/java/util/TimeZone and
test/jdk/java/util/Calendar jtreg tests and there were 7 failures caused
by compilation errors (package sun.util.locale.provider is not visible),
while 59 other tests pass.

So, what do you think?

Regards, Peter


Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:

> Thanks Peter for sponsoring this patch. Is there anything else that
> needs to be done from my end for this patch to be integrated? Please
> let me know.
>
> -Venkat
>
>
> On 14/11/17 8:46 PM, Peter Levart wrote:
>> Hi Venkat,
>>
>> I created the following issue:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>
>> I can also sponsor this patch and push it for JDK 10.
>>
>> The patch that you are proposing looks good to me. Does anybody have
>> anything else to say?
>>
>> Regards, Peter
>>
>>
>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>> Thanks David, Peter for your review and comments. As I am new to the
>>> community, can you please help me to open a bug and integrate the
>>> changes into code base?
>>>
>>> -Venkat
>>>
>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>> Hi David, Venkat,
>>>>
>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>> For example, take the following method:
>>>>>
>>>>> String defaultTZID() {
>>>>>     return TimeZone.getDefault().getID();
>>>>> }
>>>>>
>>>>> When JIT compiles it and inlines invocations to other methods
>>>>> within it, it can prove that cloned TimeZone instance never
>>>>> escapes the call to defaultTZID() and can therefore skip
>>>>> allocating the instance on heap.
>>>>>
>>>>> But this is fragile. If code in invoked methods changes, they may
>>>>> not get inlined or EA may not be able to prove that the cloned
>>>>> instance can't escape and allocation may be introduced.
>>>>> ZoneId.systemDefault() is a hot method and it would be nice if we
>>>>> manage to keep it allocation free.
>>>>
>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>
>>>>     public Object clone()
>>>>     {
>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>         // like tz.invalidateCache() but without holding a lock on
>>>> clone
>>>>         tz.cacheYear = tz.startYear - 1;
>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>         return tz;
>>>>     }
>>>>
>>>>
>>>> ...and the JMH benchmark with gc profiling shows that
>>>> ZoneId.systemDefault() still manages to get JIT-compiled without
>>>> introducing allocation.
>>>>
>>>> Even the following (original Venkat's) patch:
>>>>
>>>>     public Object clone()
>>>>     {
>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>         tz.invalidateCache();
>>>>         return tz;
>>>>     }
>>>>
>>>> ...does the same and the locking in invalidateCache() is elided
>>>> too. Allocation and lock-elision go hand-in-hand. When object
>>>> doesn't escape, allocation on heap may be eliminated and locks on
>>>> that instance elided.
>>>>
>>>> So this is better than synchronizing on the original instance
>>>> during .clone() execution as it has potential to avoid locking
>>>> overhead.
>>>>
>>>> So Venkat, go ahead. My fear was unjustified.
>>>>
>>>> Regards, Peter
>>>>
>>>
>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: <i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Naoto Sato-2
Hi Peter, Venkat,

Thank you for the fix. It looks good to me. Improved performance is a
nice bonus! Would you be able to provide with a regression test?

Naoto

On 12/6/17 6:10 AM, Peter Levart wrote:

> Hi,
>
> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>> I think this class is normally maintained on i18n-dev but I think
>> introducing the Cache object looks good and making this much easier to
>> understand.
>>
>> -Alan
>
> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that
> part of JDK have any comments...
>
> This is an official request for reviewing a patch for fixing a data race
> between cloning a SimpleTimeZone object and lazily initializing its 3
> cache fields which may produce a clone with inconsistent cache state.
> Here's Jira issue with details:
>
> https://bugs.openjdk.java.net/browse/JDK-8191216
>
> Venkat has proposed to simply call invalidateCache() on the clone before
> returning it from SimpleTimeZone.clone() method:
>
>      public Object clone()
>      {
>          SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>          clone.invalidateCache();
>          return clone;
>      }
>
> This fixes the issue and for the case of TimeZone.getDefault() which is
> called from ZoneId.systemDefault() even elides synchronization in
> clone.invalidateCache() and allocation of the clone, so JITed
> ZoneId.systemDefault() is unaffected by the patch. Initially I was
> satisfied with the fix, but then I tested something. Suppose someone
> sets default zone to SimpleTimeZone:
>
>          TimeZone.setDefault(
>              new SimpleTimeZone(3600000,
>                                 "Europe/Paris",
>                                 Calendar.MARCH, -1, Calendar.SUNDAY,
>                                 3600000, SimpleTimeZone.UTC_TIME,
>                                 Calendar.OCTOBER, -1, Calendar.SUNDAY,
>                                 3600000, SimpleTimeZone.UTC_TIME,
>                                 3600000)
>          );
>
> And then calls some methods that initialize the cache of the internal
> shared TimeZone.defaultTimeZone instance:
>
>      new Date().toString();
>
> The code which after that tries to obtain default time zone and
> calculate the offset from UTC at some current point in time, for example:
>
>      TimeZone.getDefault().getOffset(now)
>
> can't use the cached state because it has been invalidated in the
> returned clone. Such code has to re-compute the offset every time it
> gets new clone. I measured this with a JMH benchmark and got the
> following result:
>
> Default:
>
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op
>
> Venkat's patch - invalidateCache():
>
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073  ns/op
>
> We see that ZoneId.systemDefault() is unaffected, but
> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>
> This is not good, so I tried an alternative fix for the issue - simply
> making the SimpleTimeZone.clone() synchronized. Access to cache fields
> is already synchronized, so this should fix the race. Here's the JMH
> result:
>
> Default:
>
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op
>
> Synchronized clone():
>
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034  ns/op
>
> We see that caching works again, but synchronization has some overhead
> which is not big for single-threaded access, but might get bigger when
> multiple threads try to get default zone concurrently.
>
> So I created a 3rd variant of the fix which I'm presenting here and
> requesting a review for:
>
> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/ 
>
>
> The JMH benchmark shows this:
>
> Default:
>
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  ns/op
>
> Cache object:
>
> Benchmark                                  Mode  Cnt Score   Error  Units
> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274  ns/op
> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057  ns/op
>
> Not only does the fix not affect ZoneId.systemDefault() which is not
> surprising, but it also speeds-up cache lookup in single-threaded
> benchmark and certainly eliminates possible contention among threads
> looking up the shared instance.
>
> I have run the test/jdk/java/util/TimeZone and
> test/jdk/java/util/Calendar jtreg tests and there were 7 failures caused
> by compilation errors (package sun.util.locale.provider is not visible),
> while 59 other tests pass.
>
> So, what do you think?
>
> Regards, Peter
>
>
> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>> Thanks Peter for sponsoring this patch. Is there anything else that
>> needs to be done from my end for this patch to be integrated? Please
>> let me know.
>>
>> -Venkat
>>
>>
>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>> Hi Venkat,
>>>
>>> I created the following issue:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>
>>> I can also sponsor this patch and push it for JDK 10.
>>>
>>> The patch that you are proposing looks good to me. Does anybody have
>>> anything else to say?
>>>
>>> Regards, Peter
>>>
>>>
>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>> Thanks David, Peter for your review and comments. As I am new to the
>>>> community, can you please help me to open a bug and integrate the
>>>> changes into code base?
>>>>
>>>> -Venkat
>>>>
>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>> Hi David, Venkat,
>>>>>
>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>> For example, take the following method:
>>>>>>
>>>>>> String defaultTZID() {
>>>>>>     return TimeZone.getDefault().getID();
>>>>>> }
>>>>>>
>>>>>> When JIT compiles it and inlines invocations to other methods
>>>>>> within it, it can prove that cloned TimeZone instance never
>>>>>> escapes the call to defaultTZID() and can therefore skip
>>>>>> allocating the instance on heap.
>>>>>>
>>>>>> But this is fragile. If code in invoked methods changes, they may
>>>>>> not get inlined or EA may not be able to prove that the cloned
>>>>>> instance can't escape and allocation may be introduced.
>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if we
>>>>>> manage to keep it allocation free.
>>>>>
>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>
>>>>>     public Object clone()
>>>>>     {
>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>         // like tz.invalidateCache() but without holding a lock on
>>>>> clone
>>>>>         tz.cacheYear = tz.startYear - 1;
>>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>>         return tz;
>>>>>     }
>>>>>
>>>>>
>>>>> ...and the JMH benchmark with gc profiling shows that
>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without
>>>>> introducing allocation.
>>>>>
>>>>> Even the following (original Venkat's) patch:
>>>>>
>>>>>     public Object clone()
>>>>>     {
>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>         tz.invalidateCache();
>>>>>         return tz;
>>>>>     }
>>>>>
>>>>> ...does the same and the locking in invalidateCache() is elided
>>>>> too. Allocation and lock-elision go hand-in-hand. When object
>>>>> doesn't escape, allocation on heap may be eliminated and locks on
>>>>> that instance elided.
>>>>>
>>>>> So this is better than synchronizing on the original instance
>>>>> during .clone() execution as it has potential to avoid locking
>>>>> overhead.
>>>>>
>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>
>>>
>>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: <i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Peter Levart
Hi Naoto,

Thank you for reviewing.

Naoto Sato je 06. 12. 2017 ob 20:41 napisal:
> Hi Peter, Venkat,
>
> Thank you for the fix. It looks good to me. Improved performance is a
> nice bonus! Would you be able to provide with a regression test?

Sure, here it is:

http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.02/


The test reliably detects several races in 2 seconds of runtime which
cause incorrect offset calculations in JDK 9:

shared TZ: 7363986 correct, 0 incorrect calculations
cloned TZ: 3960264 correct, 1180 incorrect calculations
Exception in thread "main" java.lang.AssertionError: 1180 fatal data
races detected
     at
SimpleTimeZoneCloneRaceTest.main(SimpleTimeZoneCloneRaceTest.java:86)

With patch applied, there are no failures of course.

Can this go into JDK 10 ?

Regards, Peter

>
> Naoto
>
> On 12/6/17 6:10 AM, Peter Levart wrote:
>> Hi,
>>
>> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>>> I think this class is normally maintained on i18n-dev but I think
>>> introducing the Cache object looks good and making this much easier
>>> to understand.
>>>
>>> -Alan
>>
>> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that
>> part of JDK have any comments...
>>
>> This is an official request for reviewing a patch for fixing a data
>> race between cloning a SimpleTimeZone object and lazily initializing
>> its 3 cache fields which may produce a clone with inconsistent cache
>> state. Here's Jira issue with details:
>>
>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>
>> Venkat has proposed to simply call invalidateCache() on the clone
>> before returning it from SimpleTimeZone.clone() method:
>>
>>      public Object clone()
>>      {
>>          SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>>          clone.invalidateCache();
>>          return clone;
>>      }
>>
>> This fixes the issue and for the case of TimeZone.getDefault() which
>> is called from ZoneId.systemDefault() even elides synchronization in
>> clone.invalidateCache() and allocation of the clone, so JITed
>> ZoneId.systemDefault() is unaffected by the patch. Initially I was
>> satisfied with the fix, but then I tested something. Suppose someone
>> sets default zone to SimpleTimeZone:
>>
>>          TimeZone.setDefault(
>>              new SimpleTimeZone(3600000,
>>                                 "Europe/Paris",
>>                                 Calendar.MARCH, -1, Calendar.SUNDAY,
>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>                                 Calendar.OCTOBER, -1, Calendar.SUNDAY,
>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>                                 3600000)
>>          );
>>
>> And then calls some methods that initialize the cache of the internal
>> shared TimeZone.defaultTimeZone instance:
>>
>>      new Date().toString();
>>
>> The code which after that tries to obtain default time zone and
>> calculate the offset from UTC at some current point in time, for
>> example:
>>
>>      TimeZone.getDefault().getOffset(now)
>>
>> can't use the cached state because it has been invalidated in the
>> returned clone. Such code has to re-compute the offset every time it
>> gets new clone. I measured this with a JMH benchmark and got the
>> following result:
>>
>> Default:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501 
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040 
>> ns/op
>>
>> Venkat's patch - invalidateCache():
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942 
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073 
>> ns/op
>>
>> We see that ZoneId.systemDefault() is unaffected, but
>> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>>
>> This is not good, so I tried an alternative fix for the issue -
>> simply making the SimpleTimeZone.clone() synchronized. Access to
>> cache fields is already synchronized, so this should fix the race.
>> Here's the JMH result:
>>
>> Default:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501 
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040 
>> ns/op
>>
>> Synchronized clone():
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936 
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034 
>> ns/op
>>
>> We see that caching works again, but synchronization has some
>> overhead which is not big for single-threaded access, but might get
>> bigger when multiple threads try to get default zone concurrently.
>>
>> So I created a 3rd variant of the fix which I'm presenting here and
>> requesting a review for:
>>
>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/ 
>>
>>
>> The JMH benchmark shows this:
>>
>> Default:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501 
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040 
>> ns/op
>>
>> Cache object:
>>
>> Benchmark                                  Mode  Cnt Score Error  Units
>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274 
>> ns/op
>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057 
>> ns/op
>>
>> Not only does the fix not affect ZoneId.systemDefault() which is not
>> surprising, but it also speeds-up cache lookup in single-threaded
>> benchmark and certainly eliminates possible contention among threads
>> looking up the shared instance.
>>
>> I have run the test/jdk/java/util/TimeZone and
>> test/jdk/java/util/Calendar jtreg tests and there were 7 failures
>> caused by compilation errors (package sun.util.locale.provider is not
>> visible), while 59 other tests pass.
>>
>> So, what do you think?
>>
>> Regards, Peter
>>
>>
>> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>>> Thanks Peter for sponsoring this patch. Is there anything else that
>>> needs to be done from my end for this patch to be integrated? Please
>>> let me know.
>>>
>>> -Venkat
>>>
>>>
>>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>>> Hi Venkat,
>>>>
>>>> I created the following issue:
>>>>
>>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>>
>>>> I can also sponsor this patch and push it for JDK 10.
>>>>
>>>> The patch that you are proposing looks good to me. Does anybody
>>>> have anything else to say?
>>>>
>>>> Regards, Peter
>>>>
>>>>
>>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>>> Thanks David, Peter for your review and comments. As I am new to
>>>>> the community, can you please help me to open a bug and integrate
>>>>> the changes into code base?
>>>>>
>>>>> -Venkat
>>>>>
>>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>>> Hi David, Venkat,
>>>>>>
>>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>>> For example, take the following method:
>>>>>>>
>>>>>>> String defaultTZID() {
>>>>>>>     return TimeZone.getDefault().getID();
>>>>>>> }
>>>>>>>
>>>>>>> When JIT compiles it and inlines invocations to other methods
>>>>>>> within it, it can prove that cloned TimeZone instance never
>>>>>>> escapes the call to defaultTZID() and can therefore skip
>>>>>>> allocating the instance on heap.
>>>>>>>
>>>>>>> But this is fragile. If code in invoked methods changes, they
>>>>>>> may not get inlined or EA may not be able to prove that the
>>>>>>> cloned instance can't escape and allocation may be introduced.
>>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if
>>>>>>> we manage to keep it allocation free.
>>>>>>
>>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>>
>>>>>>     public Object clone()
>>>>>>     {
>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>         // like tz.invalidateCache() but without holding a lock
>>>>>> on clone
>>>>>>         tz.cacheYear = tz.startYear - 1;
>>>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>>>         return tz;
>>>>>>     }
>>>>>>
>>>>>>
>>>>>> ...and the JMH benchmark with gc profiling shows that
>>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without
>>>>>> introducing allocation.
>>>>>>
>>>>>> Even the following (original Venkat's) patch:
>>>>>>
>>>>>>     public Object clone()
>>>>>>     {
>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>         tz.invalidateCache();
>>>>>>         return tz;
>>>>>>     }
>>>>>>
>>>>>> ...does the same and the locking in invalidateCache() is elided
>>>>>> too. Allocation and lock-elision go hand-in-hand. When object
>>>>>> doesn't escape, allocation on heap may be eliminated and locks on
>>>>>> that instance elided.
>>>>>>
>>>>>> So this is better than synchronizing on the original instance
>>>>>> during .clone() execution as it has potential to avoid locking
>>>>>> overhead.
>>>>>>
>>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>>
>>>>>> Regards, Peter
>>>>>>
>>>>>
>>>>
>>>>
>>>

Reply | Threaded
Open this post in threaded view
|

Re: <i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Naoto Sato-2
Hi Peter,

Thanks for the tests. Looks good to me.

One nit: it should throw an Exception instead of AssertionError when the
test fails. No further review is needed.

 > Can this go into JDK 10 ?

You can push it before the JDK 10 fork.

Naoto

On 12/9/17 2:33 PM, Peter Levart wrote:

> Hi Naoto,
>
> Thank you for reviewing.
>
> Naoto Sato je 06. 12. 2017 ob 20:41 napisal:
>> Hi Peter, Venkat,
>>
>> Thank you for the fix. It looks good to me. Improved performance is a
>> nice bonus! Would you be able to provide with a regression test?
>
> Sure, here it is:
>
> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.02/
>
>
> The test reliably detects several races in 2 seconds of runtime which
> cause incorrect offset calculations in JDK 9:
>
> shared TZ: 7363986 correct, 0 incorrect calculations
> cloned TZ: 3960264 correct, 1180 incorrect calculations
> Exception in thread "main" java.lang.AssertionError: 1180 fatal data
> races detected
>      at
> SimpleTimeZoneCloneRaceTest.main(SimpleTimeZoneCloneRaceTest.java:86)
>
> With patch applied, there are no failures of course.
>
> Can this go into JDK 10 ?
>
> Regards, Peter
>
>>
>> Naoto
>>
>> On 12/6/17 6:10 AM, Peter Levart wrote:
>>> Hi,
>>>
>>> On 12/06/2017 02:30 PM, Alan Bateman wrote:
>>>> I think this class is normally maintained on i18n-dev but I think
>>>> introducing the Cache object looks good and making this much easier
>>>> to understand.
>>>>
>>>> -Alan
>>>
>>> Thanks Alan, I'm forwarding to i18n-dev to see if maintainers of that
>>> part of JDK have any comments...
>>>
>>> This is an official request for reviewing a patch for fixing a data
>>> race between cloning a SimpleTimeZone object and lazily initializing
>>> its 3 cache fields which may produce a clone with inconsistent cache
>>> state. Here's Jira issue with details:
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>
>>> Venkat has proposed to simply call invalidateCache() on the clone
>>> before returning it from SimpleTimeZone.clone() method:
>>>
>>>      public Object clone()
>>>      {
>>>          SimpleTimeZone clone = (SimpleTimeZone) super.clone();
>>>          clone.invalidateCache();
>>>          return clone;
>>>      }
>>>
>>> This fixes the issue and for the case of TimeZone.getDefault() which
>>> is called from ZoneId.systemDefault() even elides synchronization in
>>> clone.invalidateCache() and allocation of the clone, so JITed
>>> ZoneId.systemDefault() is unaffected by the patch. Initially I was
>>> satisfied with the fix, but then I tested something. Suppose someone
>>> sets default zone to SimpleTimeZone:
>>>
>>>          TimeZone.setDefault(
>>>              new SimpleTimeZone(3600000,
>>>                                 "Europe/Paris",
>>>                                 Calendar.MARCH, -1, Calendar.SUNDAY,
>>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>>                                 Calendar.OCTOBER, -1, Calendar.SUNDAY,
>>>                                 3600000, SimpleTimeZone.UTC_TIME,
>>>                                 3600000)
>>>          );
>>>
>>> And then calls some methods that initialize the cache of the internal
>>> shared TimeZone.defaultTimeZone instance:
>>>
>>>      new Date().toString();
>>>
>>> The code which after that tries to obtain default time zone and
>>> calculate the offset from UTC at some current point in time, for
>>> example:
>>>
>>>      TimeZone.getDefault().getOffset(now)
>>>
>>> can't use the cached state because it has been invalidated in the
>>> returned clone. Such code has to re-compute the offset every time it
>>> gets new clone. I measured this with a JMH benchmark and got the
>>> following result:
>>>
>>> Default:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  
>>> ns/op
>>>
>>> Venkat's patch - invalidateCache():
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 179,476 ± 1,942  
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,538 ± 0,073  
>>> ns/op
>>>
>>> We see that ZoneId.systemDefault() is unaffected, but
>>> TimeZone.getDefault().getOffset(now) becomes 3x slower.
>>>
>>> This is not good, so I tried an alternative fix for the issue -
>>> simply making the SimpleTimeZone.clone() synchronized. Access to
>>> cache fields is already synchronized, so this should fix the race.
>>> Here's the JMH result:
>>>
>>> Default:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  
>>> ns/op
>>>
>>> Synchronized clone():
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 58,103 ± 0,936  
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 4,154 ± 0,034  
>>> ns/op
>>>
>>> We see that caching works again, but synchronization has some
>>> overhead which is not big for single-threaded access, but might get
>>> bigger when multiple threads try to get default zone concurrently.
>>>
>>> So I created a 3rd variant of the fix which I'm presenting here and
>>> requesting a review for:
>>>
>>> http://cr.openjdk.java.net/~plevart/jdk10-dev/8191216_SimpleTimeZone_clone_race/webrev.01/ 
>>>
>>>
>>> The JMH benchmark shows this:
>>>
>>> Default:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 57,168 ± 0,501  
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,558 ± 0,040  
>>> ns/op
>>>
>>> Cache object:
>>>
>>> Benchmark                                  Mode  Cnt Score Error  Units
>>> ZoneIdBench.TimeZone_getDefault_getOffset  avgt   10 42,860 ± 0,274  
>>> ns/op
>>> ZoneIdBench.ZoneId_systemDefault           avgt   10 3,545 ± 0,057  
>>> ns/op
>>>
>>> Not only does the fix not affect ZoneId.systemDefault() which is not
>>> surprising, but it also speeds-up cache lookup in single-threaded
>>> benchmark and certainly eliminates possible contention among threads
>>> looking up the shared instance.
>>>
>>> I have run the test/jdk/java/util/TimeZone and
>>> test/jdk/java/util/Calendar jtreg tests and there were 7 failures
>>> caused by compilation errors (package sun.util.locale.provider is not
>>> visible), while 59 other tests pass.
>>>
>>> So, what do you think?
>>>
>>> Regards, Peter
>>>
>>>
>>> Venkateswara R Chintala je 21. 11. 2017 ob 10:14 napisal:
>>>> Thanks Peter for sponsoring this patch. Is there anything else that
>>>> needs to be done from my end for this patch to be integrated? Please
>>>> let me know.
>>>>
>>>> -Venkat
>>>>
>>>>
>>>> On 14/11/17 8:46 PM, Peter Levart wrote:
>>>>> Hi Venkat,
>>>>>
>>>>> I created the following issue:
>>>>>
>>>>> https://bugs.openjdk.java.net/browse/JDK-8191216
>>>>>
>>>>> I can also sponsor this patch and push it for JDK 10.
>>>>>
>>>>> The patch that you are proposing looks good to me. Does anybody
>>>>> have anything else to say?
>>>>>
>>>>> Regards, Peter
>>>>>
>>>>>
>>>>> On 11/13/2017 11:28 AM, Venkateswara R Chintala wrote:
>>>>>> Thanks David, Peter for your review and comments. As I am new to
>>>>>> the community, can you please help me to open a bug and integrate
>>>>>> the changes into code base?
>>>>>>
>>>>>> -Venkat
>>>>>>
>>>>>> On 12/11/17 2:19 AM, Peter Levart wrote:
>>>>>>> Hi David, Venkat,
>>>>>>>
>>>>>>> On 11/11/17 21:15, Peter Levart wrote:
>>>>>>>> For example, take the following method:
>>>>>>>>
>>>>>>>> String defaultTZID() {
>>>>>>>>     return TimeZone.getDefault().getID();
>>>>>>>> }
>>>>>>>>
>>>>>>>> When JIT compiles it and inlines invocations to other methods
>>>>>>>> within it, it can prove that cloned TimeZone instance never
>>>>>>>> escapes the call to defaultTZID() and can therefore skip
>>>>>>>> allocating the instance on heap.
>>>>>>>>
>>>>>>>> But this is fragile. If code in invoked methods changes, they
>>>>>>>> may not get inlined or EA may not be able to prove that the
>>>>>>>> cloned instance can't escape and allocation may be introduced.
>>>>>>>> ZoneId.systemDefault() is a hot method and it would be nice if
>>>>>>>> we manage to keep it allocation free.
>>>>>>>
>>>>>>> Well, I tried the following variant of SimpleTimeZone.clone() patch:
>>>>>>>
>>>>>>>     public Object clone()
>>>>>>>     {
>>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>>         // like tz.invalidateCache() but without holding a lock
>>>>>>> on clone
>>>>>>>         tz.cacheYear = tz.startYear - 1;
>>>>>>>         tz.cacheStart = tz.cacheEnd = 0;
>>>>>>>         return tz;
>>>>>>>     }
>>>>>>>
>>>>>>>
>>>>>>> ...and the JMH benchmark with gc profiling shows that
>>>>>>> ZoneId.systemDefault() still manages to get JIT-compiled without
>>>>>>> introducing allocation.
>>>>>>>
>>>>>>> Even the following (original Venkat's) patch:
>>>>>>>
>>>>>>>     public Object clone()
>>>>>>>     {
>>>>>>>         SimpleTimeZone tz = (SimpleTimeZone) super.clone();
>>>>>>>         tz.invalidateCache();
>>>>>>>         return tz;
>>>>>>>     }
>>>>>>>
>>>>>>> ...does the same and the locking in invalidateCache() is elided
>>>>>>> too. Allocation and lock-elision go hand-in-hand. When object
>>>>>>> doesn't escape, allocation on heap may be eliminated and locks on
>>>>>>> that instance elided.
>>>>>>>
>>>>>>> So this is better than synchronizing on the original instance
>>>>>>> during .clone() execution as it has potential to avoid locking
>>>>>>> overhead.
>>>>>>>
>>>>>>> So Venkat, go ahead. My fear was unjustified.
>>>>>>>
>>>>>>> Regards, Peter
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: <i18n dev> RFR 8191216: SimpleTimeZone.clone() has a data race on cache fields

Peter Levart
Thanks Naoto, Alan, David and Venkat. The change is in.

Regards, Peter

Naoto Sato je 11. 12. 2017 ob 19:41 napisal:

> Hi Peter,
>
> Thanks for the tests. Looks good to me.
>
> One nit: it should throw an Exception instead of AssertionError when
> the test fails. No further review is needed.
>
> > Can this go into JDK 10 ?
>
> You can push it before the JDK 10 fork.
>
> Naoto
>