RFR: 8263482: Make access to the ICC color profiles data multithread-friendly

Sergey Bylokhov-2
FYI: it is probably better/simpler to review this via webrev.

After the migration from kcms to lcms, the performance of some operations regressed. One possible workaround is to split the operation across multiple threads, but unfortunately there are too many bottlenecks that prevent effective multithreading. This is a request to remove or minimize such bottlenecks (at least some of them). It does not target single-threaded performance; that should be handled separately.

The main code pattern optimized here is this:
    activate();
    byte[] theHeader = getData(cmmProfile, icSigHead);
    ---->  CMSManager.getModule().getTagData(p, tagSignature);
Notes about the code above:

1. Before the change, the "activate()" method checked that the "cmmProfile" field was not null, and after that we usually passed "cmmProfile" as a parameter to some other method. This involved two volatile reads, and also required checking whether "activate()" had to be called before each use of the "cmmProfile" field.
Solution: "activate()" is renamed to "cmmProfile()" and becomes an accessor for the field, so we get one volatile read and can easily monitor usage of the field itself (it is used directly only in this method).

2. The synchronized static method "CMSManager.getModule()" is reimplemented to have only one volatile read.

3. The locking in "getTagData()" is changed. Instead of synchronized instance methods, we now use a mix of "ConcurrentHashMap" and "StampedLock".
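
A rough sketch of the single-volatile-read accessor described in notes 1 and 2. This is only an illustration, not the JDK code: the class name "LazyProfileHolder", the "Object" payload, the "load()" helper and the "loads" counter are hypothetical, and the use of double-checked locking for the slow path is an assumption.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical holder; stands in for a class with a lazily activated volatile field.
final class LazyProfileHolder {
    private volatile Object cmmProfile;               // accessed directly only in cmmProfile()
    final AtomicInteger loads = new AtomicInteger();  // counts activations, for illustration only

    // Accessor: on the fast path the volatile field is read exactly once,
    // and callers use the returned value instead of re-reading the field.
    Object cmmProfile() {
        Object p = cmmProfile;                        // single volatile read
        if (p == null) {
            synchronized (this) {
                p = cmmProfile;                       // re-check under the lock
                if (p == null) {
                    cmmProfile = p = load();
                }
            }
        }
        return p;
    }

    // Hypothetical expensive initialization (assumption, not the real activation code).
    private Object load() {
        loads.incrementAndGet();
        return new Object();
    }
}
```

Callers then write something like "use(holder.cmmProfile())" rather than calling an activation method and reading the field separately.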

See some comments inline.

Some numbers (smaller is better):

1. Performance of ((ICC_ProfileRGB) ICC_Profile.getInstance(ColorSpace.CS_sRGB)).getMatrix();

jdk 15.0.2
Benchmark                              Mode  Cnt    Score      Error  Units
CMMPerf.ThreadsMAX.testGetMatrix       avgt    5   19,624 ±    0,059  us/op
CMMPerf.testGetMatrix                  avgt    5    0,154 ±    0,001  us/op

jdk - before the fix
Benchmark                              Mode  Cnt    Score      Error  Units
CMMPerf.ThreadsMAX.testGetMatrix       avgt    5   12,935 ±    0,042  us/op
CMMPerf.testGetMatrix                  avgt    5    0,127 ±    0,007  us/op

jdk - after the fix
Benchmark                              Mode  Cnt    Score      Error  Units
CMMPerf.ThreadsMAX.testGetMatrix       avgt    5    0,561 ±    0,005  us/op
CMMPerf.testGetMatrix                  avgt    5    0,092 ±    0,001  us/op

2. Part of the performance gain in jdk17 comes from other fixes, for example:
    Performance of ICC_Profile.getInstance(ColorSpace.CS_sRGB); and ColorSpace.getInstance(ColorSpace.CS_sRGB);

jdk 15.0.2
Benchmark                              Mode  Cnt    Score      Error  Units
CMMPerf.ThreadsMAX.testGetSRGBProfile  avgt    5    2,299 ±    0,032  us/op
CMMPerf.ThreadsMAX.testGetSRGBSpace    avgt    5    2,210 ±    0,051  us/op
CMMPerf.testGetSRGBProfile             avgt    5    0,019 ±    0,001  us/op
CMMPerf.testGetSRGBSpace               avgt    5    0,018 ±    0,001  us/op

jdk - same before/after the fix
Benchmark                              Mode  Cnt    Score      Error  Units
CMMPerf.ThreadsMAX.testGetSRGBProfile  avgt    5    0,005 ±    0,001  us/op
CMMPerf.ThreadsMAX.testGetSRGBSpace    avgt    5    0,005 ±    0,001  us/op
CMMPerf.testGetSRGBProfile             avgt    5    0,005 ±    0,001  us/op
CMMPerf.testGetSRGBSpace               avgt    5    0,005 ±    0,001  us/op

Note: "ThreadsMAX" is 32 threads.

-------------

Commit messages:
 - Update ICC_Profile.java
 - Merge branch 'master' into threads
 - Update LCMS.c
 - Initial fix

Changes: https://git.openjdk.java.net/jdk/pull/2957/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2957&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8263482
  Stats: 212 lines in 5 files changed: 52 ins; 83 del; 77 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2957.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2957/head:pull/2957

PR: https://git.openjdk.java.net/jdk/pull/2957

Re: RFR: 8263482: Make access to the ICC color profiles data multithread-friendly

Sergey Bylokhov-2
On Fri, 12 Mar 2021 05:10:25 GMT, Sergey Bylokhov <[hidden email]> wrote:

> FYI: probably is better/simpler to review it via webrev.
> [...]

src/java.desktop/share/classes/sun/java2d/cmm/lcms/LCMSProfile.java line 70:

> 68:         } finally {
> 69:             lock.unlockRead(stamp);
> 70:         }

Comments about the changes in this class and in the "getTag()" method:
1. I have removed the TagData and TagCache classes and moved the tag cache directly into the profile class.
2. I have moved the synchronization logic from LCMS.java to this class, so now we can use more granular synchronization.

The logic behind this change:
1. The StampedLock guards access to the native lcms code, so while we change the tags via "LCMS.setTagDataNative" nobody can call "LCMS.getTagNative" or "LCMS.getProfileDataNative". I tried "ReentrantReadWriteLock", but it is a little bit slower.
2. The cache itself is maintained by the ConcurrentHashMap, and calling "byte[] t = tags.get(sig);" without synchronization is the key to the performance.
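
The two points above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the code from the patch: "TagCacheSketch" is a hypothetical class, the "nativeStore" map stands in for the native lcms calls, and clearing the whole cache on a write is just the simplest invalidation I could pick.

```java
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.StampedLock;

// Hypothetical stand-in for a profile that caches tag data read from native code.
final class TagCacheSketch {
    private final ConcurrentHashMap<Integer, byte[]> tags = new ConcurrentHashMap<>();
    private final StampedLock lock = new StampedLock();
    // Plain map guarded by the lock; plays the role of the native profile (assumption).
    private final HashMap<Integer, byte[]> nativeStore = new HashMap<>();

    byte[] getTag(int sig) {
        byte[] t = tags.get(sig);                 // lock-free fast path for cached tags
        if (t != null) {
            return t;
        }
        long stamp = lock.readLock();             // readers may run concurrently
        try {
            byte[] data = nativeStore.get(sig);   // stands in for the native read
            if (data == null) {
                return null;
            }
            byte[] prev = tags.putIfAbsent(sig, data);
            return prev != null ? prev : data;
        } finally {
            lock.unlockRead(stamp);
        }
    }

    void setTag(int sig, byte[] data) {
        long stamp = lock.writeLock();            // excludes every native reader
        try {
            nativeStore.put(sig, data.clone());   // stands in for the native write
            tags.clear();                         // drop cached tags after a change
        } finally {
            lock.unlockWrite(stamp);
        }
    }
}
```

The cache is filled while the read lock is held, so a writer that clears the cache under the write lock cannot be followed by a reader inserting data it fetched before the write.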

Note that it is possible to use "tags.computeIfAbsent" instead of "tags.get(sig)" and take care of the "native access" synchronization in "LCMS.getTagNative", but unfortunately for some workloads "get()" is 10 times faster than "computeIfAbsent()" if the key already exists in the map (the common situation for a cache).
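
For comparison, the two lookup styles mentioned above look roughly like this. The class "LookupVariants" and the "loader" parameter are hypothetical names used only for illustration; the sketch shows that both variants return the same cached array and makes no performance claim of its own, since the relative cost depends on the JDK version and the workload.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical helper contrasting the two ways to consult the cache.
final class LookupVariants {

    // Variant A: always goes through computeIfAbsent, even on a cache hit.
    static byte[] viaComputeIfAbsent(ConcurrentHashMap<Integer, byte[]> tags,
                                     int sig, IntFunction<byte[]> loader) {
        return tags.computeIfAbsent(sig, k -> loader.apply(k));
    }

    // Variant B: plain get() first; the loader runs only on a miss.
    static byte[] viaGetFirst(ConcurrentHashMap<Integer, byte[]> tags,
                              int sig, IntFunction<byte[]> loader) {
        byte[] t = tags.get(sig);
        if (t == null) {
            byte[] loaded = loader.apply(sig);
            if (loaded == null) {
                return null;                      // nothing to cache
            }
            byte[] prev = tags.putIfAbsent(sig, loaded);
            t = (prev != null) ? prev : loaded;   // keep whichever value won the race
        }
        return t;
    }
}
```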

src/java.desktop/share/classes/java/awt/color/ICC_Profile.java line 1103:

> 1101:     public byte[] getData(int tagSignature) {
> 1102:         byte[] t = getData(cmmProfile(), tagSignature);
> 1103:         return t != null ? t.clone() : null;

I have moved the clone operation to the public "ICC_Profile.getData(int tagSignature)" method, so we do not clone the data again and again when we use it internally.
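
The idea of cloning only at the public boundary can be illustrated like this. "TagHolder", "dataInternal()" and "length()" are hypothetical names for the sketch and are not part of ICC_Profile.

```java
// Hypothetical class: internal code shares the array, public callers get a copy.
final class TagHolder {
    private final byte[] data;   // may be null when the tag is absent

    TagHolder(byte[] data) {
        this.data = (data != null) ? data.clone() : null;
    }

    // Internal accessor: returns the shared array; callers must not modify it.
    byte[] dataInternal() {
        return data;
    }

    // Public accessor: the only place where the defensive copy is made.
    public byte[] getData() {
        byte[] t = dataInternal();
        return t != null ? t.clone() : null;
    }

    // Example of internal use that needs no copy at all.
    int length() {
        byte[] t = dataInternal();
        return t != null ? t.length : 0;
    }
}
```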

src/java.desktop/share/classes/sun/java2d/cmm/lcms/LCMS.java line 84:

> 82:     @Override
> 83:     public synchronized void setTagData(Profile p, int tagSignature, byte[] data) {
> 84:         getLcmsProfile(p).setTag(tagSignature, data);

Note that this method is synchronized. I think it does not need to be: a long time ago (before the LCMSProfile class was implemented) all methods in this class were synchronized to guard information about tags. Later we changed all such synchronized methods to per-profile locks, and only this method remains synchronized.

But I am leaving it as is for now, since it might affect transforms, which I am not touching in this fix. Will remove that later.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2957

Re: RFR: 8263482: Make access to the ICC color profiles data multithread-friendly

Alexander Zvegintsev-2
In reply to this post by Sergey Bylokhov-2
On Fri, 12 Mar 2021 05:10:25 GMT, Sergey Bylokhov <[hidden email]> wrote:

> FYI: probably is better/simpler to review it via webrev.
> [...]

src/java.desktop/share/native/liblcms/LCMS.c line 644:

> 642:         return cmmProfile;
> 643:     }
> 644:     return NULL;

Why do we need to do this from native code? (Other than easing access to a private method of a class in another package.)
Would it give a noticeable performance boost if we implemented it on the Java side?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2957

Re: RFR: 8263482: Make access to the ICC color profiles data multithread-friendly

Sergey Bylokhov-2
On Wed, 17 Mar 2021 20:41:47 GMT, Alexander Zvegintsev <[hidden email]> wrote:

>> FYI: probably is better/simpler to review it via webrev.
>> [...]
>
> src/java.desktop/share/native/liblcms/LCMS.c line 644:
>
>> 642:         return cmmProfile;
>> 643:     }
>> 644:     return NULL;
>
> Why do we need to do this from native code? (except easing of access to a private method of a class in another package.)
> Will it give some noticeable performance boost if we implement it on java side?

Yes, this is the only reason.
I have a to-do to check which kind of access is better: AWTAccessor/MethodHandle/reflection vs. JNI.
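
For reference, two of the Java-side options look roughly like this on a toy class. Everything here is hypothetical ("Target", its private "cmmProfile()" method, and "PrivateAccessSketch"); it only shows the shape of the reflection and MethodHandle calls within one module, and says nothing about which is faster or how it would work across JDK modules.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;

// Hypothetical class with a private accessor we want to call from outside.
final class Target {
    private Object cmmProfile() {
        return "profile";
    }
}

final class PrivateAccessSketch {

    // Option 1: core reflection with setAccessible(true).
    static Object viaReflection(Target t) {
        try {
            Method m = Target.class.getDeclaredMethod("cmmProfile");
            m.setAccessible(true);
            return m.invoke(t);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Option 2: a MethodHandle obtained through a private lookup (Java 9+).
    static Object viaMethodHandle(Target t) {
        try {
            MethodHandles.Lookup lookup =
                    MethodHandles.privateLookupIn(Target.class, MethodHandles.lookup());
            MethodHandle mh = lookup.findVirtual(
                    Target.class, "cmmProfile", MethodType.methodType(Object.class));
            return mh.invoke(t);
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In real code the "Method" or "MethodHandle" would be looked up once and kept in a static final field, since the lookup itself is the expensive part.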

-------------

PR: https://git.openjdk.java.net/jdk/pull/2957

Re: RFR: 8263482: Make access to the ICC color profiles data multithread-friendly

Alexander Zvegintsev-2
In reply to this post by Sergey Bylokhov-2
On Fri, 12 Mar 2021 05:10:25 GMT, Sergey Bylokhov <[hidden email]> wrote:

> FYI: probably is better/simpler to review it via webrev.
> [...]

Marked as reviewed by azvegint (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2957

Integrated: 8263482: Make access to the ICC color profiles data multithread-friendly

Sergey Bylokhov-2
In reply to this post by Sergey Bylokhov-2
On Fri, 12 Mar 2021 05:10:25 GMT, Sergey Bylokhov <[hidden email]> wrote:

> FYI: probably is better/simpler to review it via webrev.
> [...]

This pull request has now been integrated.

Changeset: 1a21f779
Author:    Sergey Bylokhov <[hidden email]>
URL:       https://git.openjdk.java.net/jdk/commit/1a21f779
Stats:     212 lines in 5 files changed: 52 ins; 83 del; 77 mod

8263482: Make access to the ICC color profiles data multithread-friendly

Reviewed-by: azvegint

-------------

PR: https://git.openjdk.java.net/jdk/pull/2957