RFR: Newer AMD 17h (EPYC) Processor family defaults

RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
Hello All,

I am Rohit Arul Raj, working at AMD India Pvt Ltd.

This is my first contribution to OpenJDK, so please guide me in case I
have overlooked any process guidelines.
I would like a volunteer to review this patch (against OpenJDK 9), which sets
flag/ISA defaults for the newer AMD 17h (EPYC) processors, and to help us with
the commit process.

Webrev: https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
I have also attached the patch (hg diff -g) for reference.

Note:
1) I applied for the Oracle Contributor Agreement on 30 Aug 2017.
2) I have applied the changes on top of openJDK9/hotspot:
    changeset:   12823:b756e7a2ec33
    tag:         tip
    user:        prr
    date:        Thu Aug 03 18:56:57 2017 +0000
    summary:     Added tag jdk-9+181 for changeset 4a443796f6f5

3) I have done regression testing using jtreg ($make default) and
did not find any regressions.
     There was 1 additional failure in the base run
(java/util/ServiceLoader/ModulesTest.java), but I was not able to
reproduce it when I ran the test individually.

      Base Run:
      ==========
      Test results: passed: 4,749; failed: 5; error: 2
      Report written to /home/rohit/project/submit/9dev-base/jdk/testoutput/jdk_core/JTreport/html/report.html
      Results written to /home/rohit/project/submit/9dev-base/jdk/testoutput/jdk_core/JTwork
      Error: Some tests failed or other problems occurred.
      Summary: jdk_core
      FAILED: java/io/BufferedInputStream/CloseStream.java
      FAILED: java/io/Serializable/unresolvableObjectStreamClass/UnresolvableObjectStreamClass.java
      FAILED: java/nio/channels/AsyncCloseAndInterrupt.java
      FAILED: java/util/ServiceLoader/ModulesTest.java
      FAILED: jdk/internal/reflect/constantPool/ConstantPoolTest.java
      FAILED: sun/security/pkcs11/Secmod/AddTrustedCert.java
      FAILED: sun/security/pkcs11/tls/TestKeyMaterial.java
      TEST STATS: name=jdk_core  run=4756  pass=4749  fail=7
      EXIT CODE: 3
      EXIT CODE: 3
      ../../test/TestCommon.gmk:398: recipe for target 'jtreg_tests' failed
      make[2]: *** [jtreg_tests] Error 3
      Makefile:43: recipe for target 'jdk_core' failed
      make[1]: *** [jdk_core] Error 2
      Makefile:77: recipe for target 'jdk_core' failed
      make: *** [jdk_core] Error 2

      Patch Run:
      ==========
      Test results: passed: 4,750; failed: 4; error: 2
      Report written to /home/rohit/project/submit/9dev/jdk/testoutput/jdk_core/JTreport/html/report.html
      Results written to /home/rohit/project/submit/9dev/jdk/testoutput/jdk_core/JTwork
      Error: Some tests failed or other problems occurred.
      Summary: jdk_core
      FAILED: java/io/BufferedInputStream/CloseStream.java
      FAILED: java/io/Serializable/unresolvableObjectStreamClass/UnresolvableObjectStreamClass.java
      FAILED: java/nio/channels/AsyncCloseAndInterrupt.java
      FAILED: jdk/internal/reflect/constantPool/ConstantPoolTest.java
      FAILED: sun/security/pkcs11/Secmod/AddTrustedCert.java
      FAILED: sun/security/pkcs11/tls/TestKeyMaterial.java
      TEST STATS: name=jdk_core  run=4756  pass=4750  fail=6
      EXIT CODE: 3
      EXIT CODE: 3
      ../../test/TestCommon.gmk:398: recipe for target 'jtreg_tests' failed
      make[2]: *** [jtreg_tests] Error 3
      Makefile:43: recipe for target 'jdk_core' failed
      make[1]: *** [jdk_core] Error 2
      Makefile:77: recipe for target 'jdk_core' failed
      make: *** [jdk_core] Error 2

  Is there any further testing required?

  Please let me know your comments.

Thanks,
Rohit
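
For anyone who wants to repeat the base-versus-patch comparison mechanically, here is a minimal sketch of the check. The helper names (new_failures, base_failures, patched_failures) are made up for illustration and are not part of jtreg or the OpenJDK build; the test names are copied from the two FAILED lists in the run summaries above.

```cpp
#include <algorithm>
#include <iterator>
#include <set>
#include <string>
#include <vector>

// Hypothetical helper: tests that fail in `patched` but not in `base`,
// i.e. the candidate regressions introduced by the patch.
std::vector<std::string> new_failures(const std::set<std::string>& base,
                                      const std::set<std::string>& patched) {
  std::vector<std::string> out;
  // std::set iterates in sorted order, which std::set_difference requires.
  std::set_difference(patched.begin(), patched.end(),
                      base.begin(), base.end(),
                      std::back_inserter(out));
  return out;
}

// FAILED entries from the base run summary.
std::set<std::string> base_failures() {
  return {
    "java/io/BufferedInputStream/CloseStream.java",
    "java/io/Serializable/unresolvableObjectStreamClass/UnresolvableObjectStreamClass.java",
    "java/nio/channels/AsyncCloseAndInterrupt.java",
    "java/util/ServiceLoader/ModulesTest.java",
    "jdk/internal/reflect/constantPool/ConstantPoolTest.java",
    "sun/security/pkcs11/Secmod/AddTrustedCert.java",
    "sun/security/pkcs11/tls/TestKeyMaterial.java",
  };
}

// FAILED entries from the patch run summary.
std::set<std::string> patched_failures() {
  return {
    "java/io/BufferedInputStream/CloseStream.java",
    "java/io/Serializable/unresolvableObjectStreamClass/UnresolvableObjectStreamClass.java",
    "java/nio/channels/AsyncCloseAndInterrupt.java",
    "jdk/internal/reflect/constantPool/ConstantPoolTest.java",
    "sun/security/pkcs11/Secmod/AddTrustedCert.java",
    "sun/security/pkcs11/tls/TestKeyMaterial.java",
  };
}
```

Calling new_failures(base_failures(), patched_failures()) gives an empty list (no test fails only with the patch), while swapping the arguments gives the single entry java/util/ServiceLoader/ModulesTest.java, the failure seen only in the base run.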

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

David Holmes
Hi Rohit,

On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:

> Hello All,
>
> I am Rohit Arul Raj, working at AMD India Pvt Ltd.
>
> This is my first contribution to OpenJDK, so please guide me in case I
> have overlooked any process guidelines.
> I would like a volunteer to review this patch (against OpenJDK 9), which sets
> flag/ISA defaults for the newer AMD 17h (EPYC) processors, and to help us with
> the commit process.
>
> Webrev: https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0

Unfortunately patches can not be accepted from systems outside the
OpenJDK infrastructure and ...

> I have also attached the patch (hg diff -g) for reference.

... unfortunately patches tend to get stripped by the mail servers. If
the patch is small please include it inline. Otherwise you will need to
find an OpenJDK Author who can host it for you on cr.openjdk.java.net.

> Note:
> 1) I have applied for Oracle Contributor Agreement on 30Aug2017.

Great! Welcome!

> 2) I have applied the changes on top of openJDK9/hotspot:

You will need to rebase to jdk10/hs/hotspot. JDK 9 is in the final
stages of release.

>      changeset:   12823:b756e7a2ec33
>      tag:         tip
>      user:        prr
>      date:        Thu Aug 03 18:56:57 2017 +0000
>      summary:     Added tag jdk-9+181 for changeset 4a443796f6f5
>
> 3) I have done regression testing using jtreg ($make default) and
> did not find any regressions.

Sounds good, but until I see the patch it is hard to comment on testing
requirements.

Thanks,
David

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
On Thu, Aug 31, 2017 at 5:59 PM, David Holmes wrote:

> Hi Rohit,
>
> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>
>> I would like a volunteer to review this patch (against OpenJDK 9), which sets
>> flag/ISA defaults for the newer AMD 17h (EPYC) processors, and to help us with
>> the commit process.
>>
>> Webrev:
>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>
>
> Unfortunately patches can not be accepted from systems outside the OpenJDK
> infrastructure and ...
>
>> I have also attached the patch (hg diff -g) for reference.
>
>
> ... unfortunately patches tend to get stripped by the mail servers. If the
> patch is small please include it inline. Otherwise you will need to find an
> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>

>> 3) I have done regression testing using jtreg ($make default) and
>> did not find any regressions.
>
>
> Sounds good, but until I see the patch it is hard to comment on testing
> requirements.
>
> Thanks,
> David

Thanks David,
Yes, it's a small patch.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -1051,6 +1051,22 @@
       }
       FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
     }
+    if (supports_sha()) {
+      if (FLAG_IS_DEFAULT(UseSHA)) {
+        FLAG_SET_DEFAULT(UseSHA, true);
+      }
+    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
+      if (!FLAG_IS_DEFAULT(UseSHA) ||
+          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+        warning("SHA instructions are not available on this CPU");
+      }
+      FLAG_SET_DEFAULT(UseSHA, false);
+      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+    }

     // some defaults for AMD family 15h
     if ( cpu_family() == 0x15 ) {
@@ -1072,11 +1088,43 @@
     }

 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        UseXMMForArrayCopy = true;
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        UseUnalignedLoadStores = true;
+      }
+      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
+        UseBMI2Instructions = true;
+      }
+      if (MaxVectorSize > 32) {
+        FLAG_SET_DEFAULT(MaxVectorSize, 32);
+      }
+      if (UseSHA) {
+        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        } else if (UseSHA512Intrinsics) {
+          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        }
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2()) {
+        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+        }
+      }
+#endif
+    }
   }

   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -513,6 +513,16 @@
         result |= CPU_LZCNT;
       if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
         result |= CPU_SSE4A;
+      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+        result |= CPU_BMI2;
+      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
+        result |= CPU_HT;
+      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+        result |= CPU_ADX;
+      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+        result |= CPU_SHA;
+      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+        result |= CPU_FMA;
     }
     // Intel features.
     if(is_intel()) {

Regards,
Rohit
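
As a reading aid for the vm_version_x86.hpp hunk, the sketch below shows roughly what the five new feature checks amount to at the CPUID register level. It is a standalone illustration, not HotSpot code: the CpuidRegs struct, the Feature enumerators and decode_features are hypothetical names, and the caller is assumed to supply the raw register values. Only the bit positions are taken from the x86 CPUID definition (leaf 1 ECX bit 12 for FMA, leaf 1 EDX bit 28 for HTT, leaf 7 sub-leaf 0 EBX bits 8, 19 and 29 for BMI2, ADX and SHA).

```cpp
#include <cstdint>

// Illustrative feature mask; the values are NOT HotSpot's CPU_* constants.
enum Feature : std::uint32_t {
  kBmi2 = 1u << 0,
  kHt   = 1u << 1,
  kAdx  = 1u << 2,
  kSha  = 1u << 3,
  kFma  = 1u << 4,
};

// Raw register values as returned by the CPUID instruction (assumed to be
// captured elsewhere; this sketch does not execute CPUID itself).
struct CpuidRegs {
  std::uint32_t leaf1_ecx;  // CPUID.1:ECX
  std::uint32_t leaf1_edx;  // CPUID.1:EDX
  std::uint32_t leaf7_ebx;  // CPUID.(EAX=7,ECX=0):EBX
};

// True if bit `n` of `reg` is set.
constexpr bool bit(std::uint32_t reg, unsigned n) {
  return ((reg >> n) & 1u) != 0;
}

// Mirrors the order of the checks added in the patch.
std::uint32_t decode_features(const CpuidRegs& r) {
  std::uint32_t result = 0;
  if (bit(r.leaf7_ebx, 8))  result |= kBmi2;  // BMI2
  if (bit(r.leaf1_edx, 28)) result |= kHt;    // HTT
  if (bit(r.leaf7_ebx, 19)) result |= kAdx;   // ADX
  if (bit(r.leaf7_ebx, 29)) result |= kSha;   // SHA extensions
  if (bit(r.leaf1_ecx, 12)) result |= kFma;   // FMA3
  return result;
}
```

The patch itself reads these bits through the existing _cpuid_info bit-field unions rather than by shifting, so the sketch only shows which hardware bits end up driving the new CPU_* flags.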

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

David Holmes
Hi Rohit,

I think the patch needs updating for jdk10 as I already see a lot of
logic around UseSHA in vm_version_x86.cpp.

Thanks,
David
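
For readers following the UseSHA discussion, the toy model below sketches the "override only when the user has not chosen" pattern that the SHA hunk of the patch relies on. It is a simplified stand-in and not HotSpot's flag machinery: Flag, Origin, ShaFlags and apply_sha_defaults are hypothetical names, and the model assumes that setting a default changes only the value and leaves the record of where the flag came from untouched.

```cpp
// Where a flag's current setting came from (toy model).
enum class Origin { Default, CommandLine };

template <typename T>
struct Flag {
  T value;
  Origin origin;
  explicit Flag(T v, Origin o = Origin::Default) : value(v), origin(o) {}
  // Stand-in for FLAG_IS_DEFAULT: true if the user never set the flag.
  bool is_default() const { return origin == Origin::Default; }
  // Stand-in for FLAG_SET_DEFAULT: assumed here to change only the value.
  void set_default(T v) { value = v; }
};

struct ShaFlags {
  Flag<bool> use_sha{false};
  Flag<bool> sha1{false};
  Flag<bool> sha256{false};
  Flag<bool> sha512{false};
};

// Follows the structure of the first hunk of the patch. Returns true when
// a "SHA instructions are not available" warning would be printed.
bool apply_sha_defaults(bool cpu_has_sha, ShaFlags& f) {
  bool warned = false;
  if (cpu_has_sha) {
    // Turn SHA on only if the user expressed no preference.
    if (f.use_sha.is_default()) {
      f.use_sha.set_default(true);
    }
  } else if (f.use_sha.value || f.sha1.value ||
             f.sha256.value || f.sha512.value) {
    // Warn only if one of the flags was explicitly requested.
    if (!f.use_sha.is_default() || !f.sha1.is_default() ||
        !f.sha256.is_default() || !f.sha512.is_default()) {
      warned = true;
    }
    f.use_sha.set_default(false);
    f.sha1.set_default(false);
    f.sha256.set_default(false);
    f.sha512.set_default(false);
  }
  return warned;
}
```

In this model an explicitly disabled UseSHA stays disabled on a CPU that supports SHA, and an explicitly enabled UseSHA on a CPU without SHA is switched off with a warning. Those are the two cases the FLAG_IS_DEFAULT checks in the patch separate from the silent default path.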


Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Vladimir Kozlov
Hi Rohit,

I am glad to see that AMD continues to work on improving its CPUs.

But it also means that AMD will have to do Java testing for this new platform and be responsible for it.
In the future we may forward CPU-related problems of this kind to you to analyze and fix.

Regards,
Vladimir


Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
In reply to this post by David Holmes
On Fri, Sep 1, 2017 at 3:01 AM, David Holmes wrote:
> Hi Rohit,
>
> I think the patch needs updating for jdk10 as I already see a lot of logic
> around UseSHA in vm_version_x86.cpp.
>
> Thanks,
> David
>

Thanks David, I will update the patch against the JDK 10 source base, test it,
and resubmit it for review.

Regards,
Rohit


Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
In reply to this post by Vladimir Kozlov
Hello Vladimir,

> But it also means that AMD will have to do Java testing for this new platform
> and be responsible for it.

Can you please elaborate on this a little more?
Which Java test suites would you like us to run on our end?

> In the future we may forward CPU-related problems of this kind to you to
> analyze and fix.

Sure, looking forward to it.

Regards,
Rohit

> Regards,
> Vladimir
>
>
> On 8/31/17 2:31 PM, David Holmes wrote:
>>
>> Hi Rohit,
>>
>> I think the patch needs updating for jdk10 as I already see a lot of logic
>> around UseSHA in vm_version_x86.cpp.
>>
>> Thanks,
>> David
>>
>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>
>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>> the commit process.
>>>>>
>>>>> Webrev:
>>>>>
>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>
>>>>
>>>>
>>>> Unfortunately patches can not be accepted from systems outside the
>>>> OpenJDK
>>>> infrastructure and ...
>>>>
>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>
>>>>
>>>>
>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>> the
>>>> patch is small please include it inline. Otherwise you will need to find
>>>> an
>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>
>>>
>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>> didnt find any regressions.
>>>>
>>>>
>>>>
>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>> requirements.
>>>>
>>>> Thanks,
>>>> David
>>>
>>>
>>> Thanks David,
>>> Yes, it's a small patch.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1051,6 +1051,22 @@
>>>         }
>>>         FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>       }
>>> +    if (supports_sha()) {
>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>> +      }
>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>> UseSHA512Intrinsics) {
>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +        warning("SHA instructions are not available on this CPU");
>>> +      }
>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +    }
>>>
>>>       // some defaults for AMD family 15h
>>>       if ( cpu_family() == 0x15 ) {
>>> @@ -1072,11 +1088,43 @@
>>>       }
>>>
>>>   #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>       }
>>>   #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>> Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        UseXMMForArrayCopy = true;
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        UseUnalignedLoadStores = true;
>>> +      }
>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>> +        UseBMI2Instructions = true;
>>> +      }
>>> +      if (MaxVectorSize > 32) {
>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>> +      }
>>> +      if (UseSHA) {
>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        } else if (UseSHA512Intrinsics) {
>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>> functions not available on this CPU.");
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        }
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2()) {
>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +        }
>>> +      }
>>> +#endif
>>> +    }
>>>     }
>>>
>>>     if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -513,6 +513,16 @@
>>>           result |= CPU_LZCNT;
>>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>           result |= CPU_SSE4A;
>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +        result |= CPU_BMI2;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +        result |= CPU_ADX;
>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +        result |= CPU_SHA;
>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +        result |= CPU_FMA;
>>>       }
>>>       // Intel features.
>>>       if(is_intel()) {
>>>
>>> Regards,
>>> Rohit
>>>
>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Vladimir Kozlov
On 8/31/17 10:14 PM, Rohit Arul Raj wrote:
> Hello Vladimir,
>
>> But it also mean that AMD will have to do Java testing for this new platform
>> and be responsible for it.
>
> Can you please elaborate on this a little more?
> What all Java test suites would you like us to test from our end?

First, I am talking only about testing on your platform. In this case it is AMD 17h.

You need to build and use a fastdebug JVM for testing: configure --with-debug-level=fastdebug

You need to make sure to run the hotspot and jdk jtreg tests. At least the following set of tests:

make test JOBS=1 TEST_JOBS=1 TEST="hotspot_compiler hotspot_gc hotspot_runtime hotspot_serviceability hotspot_misc jdk_util jdk_lang"

It will take time. You can try increasing the JOBS and TEST_JOBS values to run tests in parallel, but depending on
memory and swap sizes it may not work.

In addition, it would be nice if you tracked performance changes with SPECjvm2008 and SPECjbb2015 on your CPU to avoid
regressions when you apply new changes or pull changes from OpenJDK.

If you have questions, please ask.

>
>> In a future we may forward this CPU related problems to you to analyze and
>> fix.
>
> Sure, looking forward to it.

Best regards,
Vladimir

>
> Regards,
> Rohit
>
>> Regards,
>> Vladimir
>>
>>
>> On 8/31/17 2:31 PM, David Holmes wrote:
>>>
>>> Hi Rohit,
>>>
>>> I think the patch needs updating for jdk10 as I already see a lot of logic
>>> around UseSHA in vm_version_x86.cpp.
>>>
>>> Thanks,
>>> David
>>>
>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>
>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>>> wrote:
>>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>> the commit process.
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>
>>>>>
>>>>>
>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>> OpenJDK
>>>>> infrastructure and ...
>>>>>
>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>
>>>>>
>>>>>
>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>> the
>>>>> patch is small please include it inline. Otherwise you will need to find
>>>>> an
>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>
>>>>
>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>> didnt find any regressions.
>>>>>
>>>>>
>>>>>
>>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>>> requirements.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>
>>>>
>>>> Thanks David,
>>>> Yes, it's a small patch.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1051,6 +1051,22 @@
>>>>          }
>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>        }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>> UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>        // some defaults for AMD family 15h
>>>>        if ( cpu_family() == 0x15 ) {
>>>> @@ -1072,11 +1088,43 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>> Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        UseXMMForArrayCopy = true;
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        UseUnalignedLoadStores = true;
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        UseBMI2Instructions = true;
>>>> +      }
>>>> +      if (MaxVectorSize > 32) {
>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>> functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -513,6 +513,16 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +        result |= CPU_BMI2;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +        result |= CPU_ADX;
>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +        result |= CPU_SHA;
>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +        result |= CPU_FMA;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
On Fri, Sep 1, 2017 at 12:49 PM, Vladimir Kozlov
<[hidden email]> wrote:

> On 8/31/17 10:14 PM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>>> But it also mean that AMD will have to do Java testing for this new
>>> platform
>>> and be responsible for it.
>>
>>
>> Can you please elaborate on this a little more?
>> What all Java test suites would you like us to test from our end?
>
>
> First, I am talking only about testing on your platform. In this case it is
> AMD 17h.
>
> You need to build and use fastdebug JVM for testing: configure
> --with-debug-level=fastdebug
>
> You need to make sure to run hotspot and jdk jtreg tests. At least next set
> of tests:
>
> make test JOBS=1 TEST_JOBS=1 TEST="hotspot_compiler hotspot_gc
> hotspot_runtime hotspot_serviceability hotspot_misc jdk_util jdk_lang"
>
> It will take time. You can try to increase JOBS=1 and TEST_JOBS=1 numbers to
> run tests in parallel but depending on memory and swap sizes it may not
> work.

Yes, we will do that.

> In addition to that would be nice if you track performance changes with
> specjvm2008 and specjbb2015 on your cpu to avoid regression when you apply
> new changes or pull changes from OpenJDK.

We do run SPECjbb2015.
Regarding SPECjvm2008, we tried the base run, but the results are
pretty inconsistent: the base throughput varies by about 30% from
run to run.
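
One way to put a number on this kind of run-to-run variation is the relative spread of the scores, (max - min) / mean. The sketch below is only an illustration: the thread does not say how the ~30% figure was computed, and the function name `relative_spread` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Relative spread of a set of benchmark scores: (max - min) / mean.
// Assumption: this is just one plausible definition of "varies by N%";
// the thread does not state which measure was actually used.
inline double relative_spread(const std::vector<double>& scores) {
  assert(!scores.empty());
  const auto extremes = std::minmax_element(scores.begin(), scores.end());
  const double mean =
      std::accumulate(scores.begin(), scores.end(), 0.0) / scores.size();
  assert(mean > 0.0);  // throughput scores are expected to be positive
  return (*extremes.second - *extremes.first) / mean;
}
```

Feeding it the composite scores of several identical base runs gives a single figure that can be compared before and after a tuning change; a spread near 0.3 would correspond to the variation described here.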

This is the command we use to generate the numbers
(startup.compiler.sunflow & compiler.sunflow have been disabled). Is
there any benchmark option we may be missing?

java -jar SPECjvm2008.jar \
  startup.helloworld startup.compiler.compiler startup.compress \
  startup.crypto.aes startup.crypto.rsa startup.crypto.signverify \
  startup.mpegaudio startup.scimark.fft startup.scimark.lu \
  startup.scimark.monte_carlo startup.scimark.sor startup.scimark.sparse \
  startup.serial startup.sunflow startup.xml.transform startup.xml.validation \
  compiler.compiler compress crypto.aes crypto.rsa crypto.signverify \
  derby mpegaudio scimark.fft.large scimark.lu.large scimark.sor.large \
  scimark.sparse.large scimark.fft.small scimark.lu.small \
  scimark.sor.small scimark.sparse.small scimark.monte_carlo serial \
  sunflow xml.transform xml.validation

Regards,
Rohit

> If you have questions, please ask.
>
>>
>>> In a future we may forward this CPU related problems to you to analyze
>>> and
>>> fix.
>>
>>
>> Sure, looking forward to it.
>
>
> Best regards,
> Vladimir
>
>
>>
>> Regards,
>> Rohit
>>
>>> Regards,
>>> Vladimir
>>>
>>>
>>> On 8/31/17 2:31 PM, David Holmes wrote:
>>>>
>>>>
>>>> Hi Rohit,
>>>>
>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>> logic
>>>> around UseSHA in vm_version_x86.cpp.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>> the commit process.
>>>>>>>
>>>>>>> Webrev:
>>>>>>>
>>>>>>>
>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>> OpenJDK
>>>>>> infrastructure and ...
>>>>>>
>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>>> the
>>>>>> patch is small please include it inline. Otherwise you will need to
>>>>>> find
>>>>>> an
>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>
>>>>>
>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>> didnt find any regressions.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>> testing
>>>>>> requirements.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>
>>>>>
>>>>>
>>>>> Thanks David,
>>>>> Yes, it's a small patch.
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1051,6 +1051,22 @@
>>>>>          }
>>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>        }
>>>>> +    if (supports_sha()) {
>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>> +      }
>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>> UseSHA512Intrinsics) {
>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>> +      }
>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +    }
>>>>>
>>>>>        // some defaults for AMD family 15h
>>>>>        if ( cpu_family() == 0x15 ) {
>>>>> @@ -1072,11 +1088,43 @@
>>>>>        }
>>>>>
>>>>>    #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>        }
>>>>>    #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>> Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        UseXMMForArrayCopy = true;
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>> {
>>>>> +        UseUnalignedLoadStores = true;
>>>>> +      }
>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> +        UseBMI2Instructions = true;
>>>>> +      }
>>>>> +      if (MaxVectorSize > 32) {
>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> +      }
>>>>> +      if (UseSHA) {
>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>> functions not available on this CPU.");
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        }
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2()) {
>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +        }
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>      }
>>>>>
>>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -513,6 +513,16 @@
>>>>>            result |= CPU_LZCNT;
>>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>            result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +        result |= CPU_BMI2;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +        result |= CPU_ADX;
>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +        result |= CPU_SHA;
>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +        result |= CPU_FMA;
>>>>>        }
>>>>>        // Intel features.
>>>>>        if(is_intel()) {
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>
>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]> wrote:

> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]> wrote:
>> Hi Rohit,
>>
>> I think the patch needs updating for jdk10 as I already see a lot of logic
>> around UseSHA in vm_version_x86.cpp.
>>
>> Thanks,
>> David
>>
>
> Thanks David, I will update the patch wrt JDK10 source base, test and
> resubmit for review.
>
> Regards,
> Rohit
>

Hi All,

I have updated the patch against openjdk10/hotspot (parent:
13519:71337910df60), ran regression tests using jtreg ($make
default), and didn't find any regressions.

Can anyone please volunteer to review this patch, which sets flag/ISA
defaults for the newer AMD 17h (EPYC) processors?

************************* Patch ****************************

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -1088,6 +1088,22 @@
       }
       FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
     }
+    if (supports_sha()) {
+      if (FLAG_IS_DEFAULT(UseSHA)) {
+        FLAG_SET_DEFAULT(UseSHA, true);
+      }
+    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
+      if (!FLAG_IS_DEFAULT(UseSHA) ||
+          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+        warning("SHA instructions are not available on this CPU");
+      }
+      FLAG_SET_DEFAULT(UseSHA, false);
+      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+    }

     // some defaults for AMD family 15h
     if ( cpu_family() == 0x15 ) {
@@ -1109,11 +1125,43 @@
     }

 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        UseXMMForArrayCopy = true;
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        UseUnalignedLoadStores = true;
+      }
+      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
+        UseBMI2Instructions = true;
+      }
+      if (MaxVectorSize > 32) {
+        FLAG_SET_DEFAULT(MaxVectorSize, 32);
+      }
+      if (UseSHA) {
+        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        } else if (UseSHA512Intrinsics) {
+          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        }
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2()) {
+        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+        }
+      }
+#endif
+    }
   }

   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -505,6 +505,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+       result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;

     // AMD features.
     if (is_amd()) {
@@ -515,19 +523,13 @@
         result |= CPU_LZCNT;
       if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
         result |= CPU_SSE4A;
+      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
+        result |= CPU_HT;
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-         result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;

**************************************************************
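
The family 17h block in the patch only changes a flag when the CPU supports the feature and the user has not set the flag explicitly. The following is a minimal standalone sketch of that pattern, not HotSpot code: `BoolFlag`, `CpuInfo`, `enable_if_default` and `apply_amd_17h_defaults` are hypothetical names that stand in for the `FLAG_IS_DEFAULT` check and the `supports_*()` queries.

```cpp
#include <cassert>

// Simplified stand-in for a VM flag: its current value plus whether the
// user set it explicitly on the command line. Hypothetical type for this
// sketch; HotSpot's real flag machinery is more involved.
struct BoolFlag {
  bool value;
  bool set_by_user;
};

// Hypothetical CPU description, reduced to what the sketch needs.
struct CpuInfo {
  int  family;
  bool has_sse2;
  bool has_bmi2;
};

// Enable a flag only if the CPU supports the feature and the user has not
// chosen a value, mirroring `supports_x() && FLAG_IS_DEFAULT(Flag)`.
inline void enable_if_default(BoolFlag& flag, bool cpu_supports) {
  if (cpu_supports && !flag.set_by_user) {
    flag.value = true;
  }
}

// Same shape as the "Some defaults for AMD family 17h" block in the patch.
inline void apply_amd_17h_defaults(const CpuInfo& cpu,
                                   BoolFlag& use_xmm_for_array_copy,
                                   BoolFlag& use_unaligned_load_stores,
                                   BoolFlag& use_bmi2_instructions) {
  if (cpu.family != 0x17) {
    return;  // other families are left untouched
  }
  enable_if_default(use_xmm_for_array_copy, cpu.has_sse2);
  enable_if_default(use_unaligned_load_stores, cpu.has_sse2);
  enable_if_default(use_bmi2_instructions, cpu.has_bmi2);
}

// Usage: an explicit user choice survives, untouched defaults are enabled.
inline void demo_17h_defaults() {
  CpuInfo zen = {0x17, true, true};
  BoolFlag xmm = {false, false};
  BoolFlag unaligned = {false, false};
  BoolFlag bmi2 = {false, true};  // models a user-supplied -XX:-UseBMI2Instructions
  apply_amd_17h_defaults(zen, xmm, unaligned, bmi2);
  assert(xmm.value && unaligned.value);
  assert(!bmi2.value);
}
```

Keeping the "was it set by the user" bit next to the value is what lets CPU-specific tuning stay out of the way of an explicit command-line choice.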

Thanks,
Rohit

>>
>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>
>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>> the commit process.
>>>>>
>>>>> Webrev:
>>>>>
>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>
>>>>
>>>>
>>>> Unfortunately patches can not be accepted from systems outside the
>>>> OpenJDK
>>>> infrastructure and ...
>>>>
>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>
>>>>
>>>>
>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>> the
>>>> patch is small please include it inline. Otherwise you will need to find
>>>> an
>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>
>>>
>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>> didnt find any regressions.
>>>>
>>>>
>>>>
>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>> requirements.
>>>>
>>>> Thanks,
>>>> David
>>>
>>>
>>> Thanks David,
>>> Yes, it's a small patch.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1051,6 +1051,22 @@
>>>         }
>>>         FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>       }
>>> +    if (supports_sha()) {
>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>> +      }
>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>> UseSHA512Intrinsics) {
>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +        warning("SHA instructions are not available on this CPU");
>>> +      }
>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +    }
>>>
>>>       // some defaults for AMD family 15h
>>>       if ( cpu_family() == 0x15 ) {
>>> @@ -1072,11 +1088,43 @@
>>>       }
>>>
>>>   #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>       }
>>>   #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>> Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        UseXMMForArrayCopy = true;
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        UseUnalignedLoadStores = true;
>>> +      }
>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>> +        UseBMI2Instructions = true;
>>> +      }
>>> +      if (MaxVectorSize > 32) {
>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>> +      }
>>> +      if (UseSHA) {
>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        } else if (UseSHA512Intrinsics) {
>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>> functions not available on this CPU.");
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        }
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2()) {
>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +        }
>>> +      }
>>> +#endif
>>> +    }
>>>     }
>>>
>>>     if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -513,6 +513,16 @@
>>>           result |= CPU_LZCNT;
>>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>           result |= CPU_SSE4A;
>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +        result |= CPU_BMI2;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +        result |= CPU_ADX;
>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +        result |= CPU_SHA;
>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +        result |= CPU_FMA;
>>>       }
>>>       // Intel features.
>>>       if(is_intel()) {
>>>
>>> Regards,
>>> Rohit
>>>
>>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Vladimir Kozlov
Hi Rohit,

The changes look good. The only question I have is about MaxVectorSize. It is set > 16 only in the presence of AVX:

http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945

Does that code work for AMD 17h too?

Thanks,
Vladimir
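
To make the question concrete, here is a rough standalone model of how the vector-size limits could compose. It assumes the generic, AVX-dependent limit described above runs first and the AMD limits from the patch run afterwards. The function `resolve_max_vector_size` and its parameters are hypothetical; this is not the actual logic in vm_version_x86.cpp.

```cpp
#include <cassert>

// Rough model of the MaxVectorSize limits discussed in this thread.
// Assumptions: sizes are in bytes, `avx_enabled` summarises the generic
// AVX check, and `amd_family` is the AMD CPU family number.
inline int resolve_max_vector_size(int requested, bool avx_enabled, int amd_family) {
  assert(requested >= 0);
  int size = requested;
  // Generic step: vectors wider than 16 bytes are only kept with AVX.
  if (!avx_enabled && size > 16) {
    size = 16;
  }
  // Patch: AMD families before 17h keep the 16-byte limit.
  if (amd_family < 0x17 && size > 16) {
    size = 16;
  }
  // Patch: family 17h is capped at 32 bytes.
  if (amd_family == 0x17 && size > 32) {
    size = 32;
  }
  return size;
}
```

Under these assumptions a family 17h part with AVX ends up with at most 32-byte vectors, while the same part without AVX stays at 16 bytes because the generic limit has already been applied.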

On 9/1/17 8:04 AM, Rohit Arul Raj wrote:

> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]> wrote:
>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]> wrote:
>>> Hi Rohit,
>>>
>>> I think the patch needs updating for jdk10 as I already see a lot of logic
>>> around UseSHA in vm_version_x86.cpp.
>>>
>>> Thanks,
>>> David
>>>
>>
>> Thanks David, I will update the patch wrt JDK10 source base, test and
>> resubmit for review.
>>
>> Regards,
>> Rohit
>>
>
> Hi All,
>
> I have updated the patch wrt openjdk10/hotspot (parent:
> 13519:71337910df60), did regression testing using jtreg ($make
> default) and didnt find any regressions.
>
> Can anyone please volunteer to review this patch  which sets flag/ISA
> defaults for newer AMD 17h (EPYC) processor?
>
> ************************* Patch ****************************
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
> b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -1088,6 +1088,22 @@
>         }
>         FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>       }
> +    if (supports_sha()) {
> +      if (FLAG_IS_DEFAULT(UseSHA)) {
> +        FLAG_SET_DEFAULT(UseSHA, true);
> +      }
> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
> UseSHA512Intrinsics) {
> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
> +        warning("SHA instructions are not available on this CPU");
> +      }
> +      FLAG_SET_DEFAULT(UseSHA, false);
> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
> +    }
>
>       // some defaults for AMD family 15h
>       if ( cpu_family() == 0x15 ) {
> @@ -1109,11 +1125,43 @@
>       }
>
>   #ifdef COMPILER2
> -    if (MaxVectorSize > 16) {
> -      // Limit vectors size to 16 bytes on current AMD cpus.
> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>       }
>   #endif // COMPILER2
> +
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> +      // On family 17h processors use XMM and UnalignedLoadStores for
> Array Copy
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> +        UseXMMForArrayCopy = true;
> +      }
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +        UseUnalignedLoadStores = true;
> +      }
> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
> +        UseBMI2Instructions = true;
> +      }
> +      if (MaxVectorSize > 32) {
> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
> +      }
> +      if (UseSHA) {
> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
> +        } else if (UseSHA512Intrinsics) {
> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
> functions not available on this CPU.");
> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
> +        }
> +      }
> +#ifdef COMPILER2
> +      if (supports_sse4_2()) {
> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> +        }
> +      }
> +#endif
> +    }
>     }
>
>     if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
> b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -505,6 +505,14 @@
>         result |= CPU_CLMUL;
>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>         result |= CPU_RTM;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> +       result |= CPU_ADX;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> +      result |= CPU_BMI2;
> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> +      result |= CPU_SHA;
> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> +      result |= CPU_FMA;
>
>       // AMD features.
>       if (is_amd()) {
> @@ -515,19 +523,13 @@
>           result |= CPU_LZCNT;
>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>           result |= CPU_SSE4A;
> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
> +        result |= CPU_HT;
>       }
>       // Intel features.
>       if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -         result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>           result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
> support for prefetchw
>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>           result |= CPU_3DNOW_PREFETCH;
>
> **************************************************************
>
> Thanks,
> Rohit
>
>>>
>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>
>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>>> wrote:
>>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>> the commit process.
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>
>>>>>
>>>>>
>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>> OpenJDK
>>>>> infrastructure and ...
>>>>>
>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>
>>>>>
>>>>>
>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>> the
>>>>> patch is small please include it inline. Otherwise you will need to find
>>>>> an
>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>
>>>>
>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>> didnt find any regressions.
>>>>>
>>>>>
>>>>>
>>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>>> requirements.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>
>>>>
>>>> Thanks David,
>>>> Yes, it's a small patch.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1051,6 +1051,22 @@
>>>>          }
>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>        }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>> UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>        // some defaults for AMD family 15h
>>>>        if ( cpu_family() == 0x15 ) {
>>>> @@ -1072,11 +1088,43 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>> Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        UseXMMForArrayCopy = true;
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        UseUnalignedLoadStores = true;
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        UseBMI2Instructions = true;
>>>> +      }
>>>> +      if (MaxVectorSize > 32) {
>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>> functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -513,6 +513,16 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +        result |= CPU_BMI2;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +        result |= CPU_ADX;
>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +        result |= CPU_SHA;
>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +        result |= CPU_FMA;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
Hello Vladimir,

> Changes look good. Only question I have is about MaxVectorSize. It is set
> above 16 only in presence of AVX:
>
> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>
> Does that code work for AMD 17h too?

Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
I have removed the surplus check for MaxVectorSize from my patch. I
have updated, re-tested and attached the patch.
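
As a rough illustration of why the extra clamp is redundant, here is a
minimal stand-alone sketch. It is not HotSpot code: the function name and
the hw_max_bytes parameter are hypothetical, and it assumes the generic
x86 code has already capped MaxVectorSize at the widest vector the CPU
supports (32 bytes with AVX/AVX2, 16 bytes without).

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical model of the MaxVectorSize decision discussed above.
// cpu_family   : AMD family number, e.g. 0x15 or 0x17
// hw_max_bytes : widest vector the CPU supports (assumed 16 or 32 here)
// requested    : the value MaxVectorSize holds before any clamping
int effective_max_vector_size(int cpu_family, int hw_max_bytes, int requested) {
  assert(hw_max_bytes >= 16);
  // Assumption: the generic x86 code never leaves the flag above what
  // the hardware supports.
  int size = std::min(requested, hw_max_bytes);
  // From the patch: AMD families older than 17h stay limited to 16 bytes.
  if (cpu_family < 0x17 && size > 16) {
    size = 16;
  }
  // A separate "size > 32" check for family 17h could never fire here,
  // because hw_max_bytes is at most 32 under the assumption above.
  return size;
}
```

Under these assumptions a family 17h part with AVX2 ends up at 32 bytes
and a family 15h part at 16 bytes, without any 17h-specific clamp.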

I have one question about how the UseSHA flag is set:
http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821

AMD 17h supports SHA.
AMD 15h does not support SHA, yet the "UseSHA" flag still gets enabled
for it based on the availability of BMI2 and AVX2. Is there an
underlying reason for this? I have handled this in the patch but just
wanted to confirm.
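
For reference, the feature bits involved all live in EBX of CPUID leaf 7
(sub-leaf 0), which is why the .hpp part of the patch can read them
outside the is_intel() block. The sketch below decodes such an EBX value.
The struct and function names are hypothetical and not part of HotSpot;
the bit positions are the architecturally documented ones.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical holder for the leaf-7 features mentioned in this thread.
struct Leaf7Features {
  bool avx2;
  bool bmi2;
  bool adx;
  bool sha;
};

// Decode EBX as returned by CPUID with EAX=7, ECX=0.
Leaf7Features decode_leaf7_ebx(std::uint32_t ebx) {
  Leaf7Features f;
  f.avx2 = (ebx & (1u << 5)) != 0;   // bit 5:  AVX2
  f.bmi2 = (ebx & (1u << 8)) != 0;   // bit 8:  BMI2
  f.adx  = (ebx & (1u << 19)) != 0;  // bit 19: ADX
  f.sha  = (ebx & (1u << 29)) != 0;  // bit 29: SHA extensions
  return f;
}
```

A CPU without the SHA extensions reports bit 29 as zero regardless of
vendor, so the check itself does not need to be vendor-specific.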

Thanks for taking the time to review the code.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -1088,6 +1088,22 @@
       }
       FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
     }
+    if (supports_sha()) {
+      if (FLAG_IS_DEFAULT(UseSHA)) {
+        FLAG_SET_DEFAULT(UseSHA, true);
+      }
+    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
+      if (!FLAG_IS_DEFAULT(UseSHA) ||
+          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
+          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+        warning("SHA instructions are not available on this CPU");
+      }
+      FLAG_SET_DEFAULT(UseSHA, false);
+      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
+      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+    }

     // some defaults for AMD family 15h
     if ( cpu_family() == 0x15 ) {
@@ -1109,11 +1125,40 @@
     }

 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
+      }
+      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
+        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
+      }
+      if (UseSHA) {
+        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        } else if (UseSHA512Intrinsics) {
+          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
+          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
+        }
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2()) {
+        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+        }
+      }
+#endif
+    }
   }

   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -505,6 +505,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+       result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;

     // AMD features.
     if (is_amd()) {
@@ -515,19 +523,13 @@
         result |= CPU_LZCNT;
       if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
         result |= CPU_SSE4A;
+      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
+        result |= CPU_HT;
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-         result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;


Regards,
Rohit



> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>
>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]>
>> wrote:
>>>
>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]>
>>> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>> logic
>>>> around UseSHA in vm_version_x86.cpp.
>>>>
>>>> Thanks,
>>>> David
>>>>
>>>
>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>> resubmit for review.
>>>
>>> Regards,
>>> Rohit
>>>
>>
>> Hi All,
>>
>> I have updated the patch wrt openjdk10/hotspot (parent:
>> 13519:71337910df60), did regression testing using jtreg ($make
>> default) and didnt find any regressions.
>>
>> Can anyone please volunteer to review this patch  which sets flag/ISA
>> defaults for newer AMD 17h (EPYC) processor?
>>
>> ************************* Patch ****************************
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>> b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1088,6 +1088,22 @@
>>         }
>>         FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>       }
>> +    if (supports_sha()) {
>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>> +        FLAG_SET_DEFAULT(UseSHA, true);
>> +      }
>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>> UseSHA512Intrinsics) {
>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +        warning("SHA instructions are not available on this CPU");
>> +      }
>> +      FLAG_SET_DEFAULT(UseSHA, false);
>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +    }
>>
>>       // some defaults for AMD family 15h
>>       if ( cpu_family() == 0x15 ) {
>> @@ -1109,11 +1125,43 @@
>>       }
>>
>>   #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>       }
>>   #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>> Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        UseXMMForArrayCopy = true;
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        UseUnalignedLoadStores = true;
>> +      }
>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>> +        UseBMI2Instructions = true;
>> +      }
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> +      if (UseSHA) {
>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        } else if (UseSHA512Intrinsics) {
>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>> functions not available on this CPU.");
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        }
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2()) {
>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +        }
>> +      }
>> +#endif
>> +    }
>>     }
>>
>>     if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>> b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -505,6 +505,14 @@
>>         result |= CPU_CLMUL;
>>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>         result |= CPU_RTM;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +       result |= CPU_ADX;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +      result |= CPU_BMI2;
>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +      result |= CPU_SHA;
>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +      result |= CPU_FMA;
>>
>>       // AMD features.
>>       if (is_amd()) {
>> @@ -515,19 +523,13 @@
>>           result |= CPU_LZCNT;
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>           result |= CPU_SSE4A;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>>       }
>>       // Intel features.
>>       if(is_intel()) {
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> -         result |= CPU_ADX;
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> -        result |= CPU_BMI2;
>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> -        result |= CPU_SHA;
>>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>           result |= CPU_LZCNT;
>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> -        result |= CPU_FMA;
>>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>> support for prefetchw
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>           result |= CPU_3DNOW_PREFETCH;
>>
>> **************************************************************
>>
>> Thanks,
>> Rohit
>>
>>>>
>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> I would like an volunteer to review this patch (openJDK9) which sets
>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>> the commit process.
>>>>>>>
>>>>>>> Webrev:
>>>>>>>
>>>>>>>
>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>> OpenJDK
>>>>>> infrastructure and ...
>>>>>>
>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>>> the
>>>>>> patch is small please include it inline. Otherwise you will need to
>>>>>> find
>>>>>> an
>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>
>>>>>
>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>> didnt find any regressions.
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>> testing
>>>>>> requirements.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>
>>>>>
>>>>>
>>>>> Thanks David,
>>>>> Yes, it's a small patch.
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1051,6 +1051,22 @@
>>>>>          }
>>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>        }
>>>>> +    if (supports_sha()) {
>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>> +      }
>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>> UseSHA512Intrinsics) {
>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>> +      }
>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +    }
>>>>>
>>>>>        // some defaults for AMD family 15h
>>>>>        if ( cpu_family() == 0x15 ) {
>>>>> @@ -1072,11 +1088,43 @@
>>>>>        }
>>>>>
>>>>>    #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>        }
>>>>>    #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>> Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        UseXMMForArrayCopy = true;
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>> {
>>>>> +        UseUnalignedLoadStores = true;
>>>>> +      }
>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> +        UseBMI2Instructions = true;
>>>>> +      }
>>>>> +      if (MaxVectorSize > 32) {
>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> +      }
>>>>> +      if (UseSHA) {
>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>> functions not available on this CPU.");
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        }
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2()) {
>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +        }
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>      }
>>>>>
>>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -513,6 +513,16 @@
>>>>>            result |= CPU_LZCNT;
>>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>            result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +        result |= CPU_BMI2;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +        result |= CPU_ADX;
>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +        result |= CPU_SHA;
>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +        result |= CPU_FMA;
>>>>>        }
>>>>>        // Intel features.
>>>>>        if(is_intel()) {
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>>
>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Vladimir Kozlov
Hi Rohit,

On 9/2/17 1:16 AM, Rohit Arul Raj wrote:

> Hello Vladimir,
>
>> Changes look good. Only question I have is about MaxVectorSize. It is set
>> above 16 only in presence of AVX:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>
>> Does that code work for AMD 17h too?
>
> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
> I have removed the surplus check for MaxVectorSize from my patch. I
> have updated, re-tested and attached the patch.

Which check did you remove?

>
> I have one query regarding the setting of UseSHA flag:
> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>
> AMD 17h has support for SHA.
> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
> enabled for it based on the availability of BMI2 and AVX2. Is there an
> underlying reason for this? I have handled this in the patch but just
> wanted to confirm.

It was done with the following change, which uses only AVX2 and BMI2
instructions to calculate SHA-256:

http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974

I don't know if AMD 15h supports these instructions and can execute that
code. You need to test it.
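
The condition described here can be sketched as below. This is a
simplified, hypothetical model and not the HotSpot source: the enum and
function names are made up, and the order of preference (SHA extensions
first, then the AVX2+BMI2 code) is an assumption.

```cpp
#include <cassert>

// Hypothetical labels for how a SHA-256 intrinsic could be implemented.
enum class Sha256Path { ShaExtensions, Avx2Bmi2, None };

// Pick an implementation from the CPU features available.
Sha256Path pick_sha256_path(bool has_sha, bool has_avx2, bool has_bmi2) {
  if (has_sha) {
    return Sha256Path::ShaExtensions;  // dedicated SHA instructions
  }
  if (has_avx2 && has_bmi2) {
    return Sha256Path::Avx2Bmi2;       // vectorized path, no SHA instructions needed
  }
  return Sha256Path::None;             // no intrinsic; fall back to the Java code
}
```

In this model UseSHA can legitimately end up enabled on a CPU that lacks
the SHA extensions, provided it reports both AVX2 and BMI2. Whether a
given family 15h part does so is exactly what needs testing.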

Maybe you should move your new UseSHA-related code to line 821 to
set UseSHA for AMD. Then you don't need to overwrite the UseSHA*Intrinsics
flags, which are set after that line.
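
To make the flag handling concrete, here is a small stand-alone model of
the pattern. BoolFlag, set_default, set_cmdline and resolve_use_sha are
hypothetical names; they only imitate FLAG_IS_DEFAULT and FLAG_SET_DEFAULT
and are not the VM macros themselves.

```cpp
#include <cassert>

// Hypothetical model of a VM boolean flag and where its value came from.
struct BoolFlag {
  bool value;
  bool is_default;  // true until the user sets the flag on the command line
};

// Imitates FLAG_SET_DEFAULT: changes the value, leaves the origin untouched.
void set_default(BoolFlag& f, bool v) { f.value = v; }

// Imitates a command-line setting such as -XX:+UseSHA.
void set_cmdline(BoolFlag& f, bool v) { f.value = v; f.is_default = false; }

// Decide UseSHA once, early. Returns true if a warning would be printed.
bool resolve_use_sha(BoolFlag& use_sha, bool cpu_can_run_sha) {
  if (cpu_can_run_sha) {
    if (use_sha.is_default) {
      set_default(use_sha, true);   // enable by default, respect user choice
    }
    return false;
  }
  // Warn only when the user explicitly asked for something unavailable.
  bool warn = use_sha.value && !use_sha.is_default;
  set_default(use_sha, false);
  return warn;
}
```

If UseSHA is settled in one place like this, code that runs afterwards
can derive the individual UseSHA*Intrinsics flags from it, so a later
vendor-specific block has nothing left to overwrite.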

Regards,
Vladimir

>
> Thanks for taking time to review the code.
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
> b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -1088,6 +1088,22 @@
>         }
>         FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>       }
> +    if (supports_sha()) {
> +      if (FLAG_IS_DEFAULT(UseSHA)) {
> +        FLAG_SET_DEFAULT(UseSHA, true);
> +      }
> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
> UseSHA512Intrinsics) {
> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
> +        warning("SHA instructions are not available on this CPU");
> +      }
> +      FLAG_SET_DEFAULT(UseSHA, false);
> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
> +    }
>
>       // some defaults for AMD family 15h
>       if ( cpu_family() == 0x15 ) {
> @@ -1109,11 +1125,40 @@
>       }
>
>   #ifdef COMPILER2
> -    if (MaxVectorSize > 16) {
> -      // Limit vectors size to 16 bytes on current AMD cpus.
> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>       }
>   #endif // COMPILER2
> +
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> +      // On family 17h processors use XMM and UnalignedLoadStores for
> Array Copy
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
> +      }
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
> +      }
> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
> +      }
> +      if (UseSHA) {
> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
> +        } else if (UseSHA512Intrinsics) {
> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
> functions not available on this CPU.");
> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
> +        }
> +      }
> +#ifdef COMPILER2
> +      if (supports_sse4_2()) {
> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> +        }
> +      }
> +#endif
> +    }
>     }
>
>     if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
> b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -505,6 +505,14 @@
>         result |= CPU_CLMUL;
>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>         result |= CPU_RTM;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> +       result |= CPU_ADX;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> +      result |= CPU_BMI2;
> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> +      result |= CPU_SHA;
> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> +      result |= CPU_FMA;
>
>       // AMD features.
>       if (is_amd()) {
> @@ -515,19 +523,13 @@
>           result |= CPU_LZCNT;
>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>           result |= CPU_SSE4A;
> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
> +        result |= CPU_HT;
>       }
>       // Intel features.
>       if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -         result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>           result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
> support for prefetchw
>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>           result |= CPU_3DNOW_PREFETCH;
>
>
> Regards,
> Rohit
>
>
>
>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>
>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]>
>>> wrote:
>>>>
>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]>
>>>> wrote:
>>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>> logic
>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>
>>>>> Thanks,
>>>>> David
>>>>>
>>>>
>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>> resubmit for review.
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>
>>> Hi All,
>>>
>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>> 13519:71337910df60), did regression testing using jtreg ($make
>>> default) and didnt find any regressions.
>>>
>>> Can anyone please volunteer to review this patch  which sets flag/ISA
>>> defaults for newer AMD 17h (EPYC) processor?
>>>
>>> ************************* Patch ****************************
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1088,6 +1088,22 @@
>>>          }
>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>        }
>>> +    if (supports_sha()) {
>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>> +      }
>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>> UseSHA512Intrinsics) {
>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +        warning("SHA instructions are not available on this CPU");
>>> +      }
>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +    }
>>>
>>>        // some defaults for AMD family 15h
>>>        if ( cpu_family() == 0x15 ) {
>>> @@ -1109,11 +1125,43 @@
>>>        }
>>>
>>>    #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>        }
>>>    #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>> Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        UseXMMForArrayCopy = true;
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        UseUnalignedLoadStores = true;
>>> +      }
>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>> +        UseBMI2Instructions = true;
>>> +      }
>>> +      if (MaxVectorSize > 32) {
>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>> +      }
>>> +      if (UseSHA) {
>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        } else if (UseSHA512Intrinsics) {
>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>> functions not available on this CPU.");
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        }
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2()) {
>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +        }
>>> +      }
>>> +#endif
>>> +    }
>>>      }
>>>
>>>      if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -505,6 +505,14 @@
>>>          result |= CPU_CLMUL;
>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>          result |= CPU_RTM;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +       result |= CPU_ADX;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +      result |= CPU_BMI2;
>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +      result |= CPU_SHA;
>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +      result |= CPU_FMA;
>>>
>>>        // AMD features.
>>>        if (is_amd()) {
>>> @@ -515,19 +523,13 @@
>>>            result |= CPU_LZCNT;
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>            result |= CPU_SSE4A;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>>        }
>>>        // Intel features.
>>>        if(is_intel()) {
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> -         result |= CPU_ADX;
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> -        result |= CPU_BMI2;
>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> -        result |= CPU_SHA;
>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>            result |= CPU_LZCNT;
>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> -        result |= CPU_FMA;
>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>            result |= CPU_3DNOW_PREFETCH;
>>>
>>> **************************************************************
>>>
>>> Thanks,
>>> Rohit
>>>
>>>>>
>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Rohit,
>>>>>>>
>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>>> the commit process.
>>>>>>>>
>>>>>>>> Webrev:
>>>>>>>>
>>>>>>>>
>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Unfortunately patches cannot be accepted from systems outside the
>>>>>>> OpenJDK infrastructure and ...
>>>>>>>
>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If
>>>>>>> the patch is small please include it inline. Otherwise you will need to
>>>>>>> find an OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>>
>>>>>>
>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>> didn't find any regressions.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>> testing requirements.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks David,
>>>>>> Yes, it's a small patch.
>>>>>>
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>           }
>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>         }
>>>>>> +    if (supports_sha()) {
>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>> +      }
>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>> +      }
>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +    }
>>>>>>
>>>>>>         // some defaults for AMD family 15h
>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>         }
>>>>>>
>>>>>>     #ifdef COMPILER2
>>>>>> -    if (MaxVectorSize > 16) {
>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>         }
>>>>>>     #endif // COMPILER2
>>>>>> +
>>>>>> +    // Some defaults for AMD family 17h
>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>> +        UseXMMForArrayCopy = true;
>>>>>> +      }
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>> +        UseUnalignedLoadStores = true;
>>>>>> +      }
>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>> +        UseBMI2Instructions = true;
>>>>>> +      }
>>>>>> +      if (MaxVectorSize > 32) {
>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>> +      }
>>>>>> +      if (UseSHA) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#ifdef COMPILER2
>>>>>> +      if (supports_sse4_2()) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#endif
>>>>>> +    }
>>>>>>       }
>>>>>>
>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> @@ -513,6 +513,16 @@
>>>>>>             result |= CPU_LZCNT;
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>             result |= CPU_SSE4A;
>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> +        result |= CPU_BMI2;
>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>> +        result |= CPU_HT;
>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> +        result |= CPU_ADX;
>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> +        result |= CPU_SHA;
>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> +        result |= CPU_FMA;
>>>>>>         }
>>>>>>         // Intel features.
>>>>>>         if(is_intel()) {
>>>>>>
>>>>>> Regards,
>>>>>> Rohit
>>>>>>
>>>>>
>>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
Hello Vladimir,

On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
<[hidden email]> wrote:

> Hi Rohit,
>
> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>> set > 16 only in presence of AVX:
>>>
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>
>>> Does that code work for AMD 17h too?
>>
>>
>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>> I have removed the surplus check for MaxVectorSize from my patch. I
>> have updated, re-tested and attached the patch.
>
>
> Which check did you remove?
>

My older patch had the check shown below, which was required on
JDK9, where the default MaxVectorSize was 64. OpenJDK10 handles this
better, so the check is no longer required.

+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
...
...
+      if (MaxVectorSize > 32) {
+        FLAG_SET_DEFAULT(MaxVectorSize, 32);
+      }
..
..
+      }
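
To make the MaxVectorSize interaction concrete, here is a minimal standalone sketch. It is a simplified model, not HotSpot code: the function names are hypothetical, and the mapping from AVX level to vector width (16 bytes without AVX, 32 with AVX/AVX2, 64 with AVX-512) is an assumption about the generic x86 code linked above.

```cpp
// Assumed generic sizing: the widest vector (in bytes) allowed for a given
// SSE/AVX level. This mapping is an assumption made for illustration.
int generic_max_vector_size(int use_sse, int use_avx) {
  if (use_sse < 2) return 0;    // no vector support without SSE2
  if (use_avx == 0) return 16;  // SSE only: 128-bit vectors
  if (use_avx <= 2) return 32;  // AVX/AVX2: 256-bit vectors
  return 64;                    // AVX-512: 512-bit vectors
}

// AMD-specific adjustment as in the patch: families older than 17h are
// limited to 16 bytes, while 17h keeps whatever the generic code chose.
int amd_max_vector_size(int cpu_family, int use_sse, int use_avx) {
  int size = generic_max_vector_size(use_sse, use_avx);
  if (cpu_family < 0x17 && size > 16) {
    size = 16;
  }
  return size;
}
```

Under these assumptions, a family 17h part with AVX2 ends up at 32 bytes without a family-specific clamp, which is why the extra check is redundant on JDK10.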

>>
>> I have one query regarding the setting of UseSHA flag:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>
>> AMD 17h has support for SHA.
>> AMD 15h doesn't have support for SHA. Still, the "UseSHA" flag gets
>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>> underlying reason for this? I have handled this in the patch but just
>> wanted to confirm.
>
>
> It was done with the following change, which uses only AVX2 and BMI2
> instructions to calculate SHA-256:
>
> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>
> I don't know if AMD 15h supports these instructions and can execute that
> code. You need to test it.
>

OK, got it. Since AMD 15h supports AVX2 and BMI2 instructions,
it should work.
I confirmed this by running the following sanity tests:
./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java

So I have removed those SHA checks from my patch too.
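
The reasoning above can be sketched as follows. This is only an illustration of the decision described in this exchange (SHA enabled either through dedicated SHA instructions or through the AVX2 + BMI2 SHA-256 code), not the actual HotSpot logic; the struct, the function names and the returned labels are all hypothetical.

```cpp
#include <string>

// Hypothetical feature summary for one CPU.
struct CpuFeatures {
  bool sha;   // dedicated SHA instructions
  bool avx2;
  bool bmi2;
};

// UseSHA can default to true when either implementation is usable.
bool use_sha_by_default(const CpuFeatures& cpu) {
  return cpu.sha || (cpu.avx2 && cpu.bmi2);
}

// Which SHA-256 code path would be usable under this model.
std::string sha256_path(const CpuFeatures& cpu) {
  if (cpu.sha) return "sha-instructions";
  if (cpu.avx2 && cpu.bmi2) return "avx2+bmi2";
  return "none";
}
```

In this model a CPU without SHA instructions but with AVX2 and BMI2 still gets UseSHA enabled, which matches the behaviour I asked about.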

Please find the updated, re-tested patch attached.

diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
--- a/src/cpu/x86/vm/vm_version_x86.cpp
+++ b/src/cpu/x86/vm/vm_version_x86.cpp
@@ -1109,11 +1109,27 @@
     }

 #ifdef COMPILER2
-    if (MaxVectorSize > 16) {
-      // Limit vectors size to 16 bytes on current AMD cpus.
+    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
+      // Limit vectors size to 16 bytes on AMD cpus < 17h.
       FLAG_SET_DEFAULT(MaxVectorSize, 16);
     }
 #endif // COMPILER2
+
+    // Some defaults for AMD family 17h
+    if ( cpu_family() == 0x17 ) {
+      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
+        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
+      }
+      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
+        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
+      }
+#ifdef COMPILER2
+      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
+        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
+      }
+#endif
+    }
   }

   if( is_intel() ) { // Intel cpus specific settings
diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
--- a/src/cpu/x86/vm/vm_version_x86.hpp
+++ b/src/cpu/x86/vm/vm_version_x86.hpp
@@ -505,6 +505,14 @@
       result |= CPU_CLMUL;
     if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
       result |= CPU_RTM;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
+       result |= CPU_ADX;
+    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
+      result |= CPU_BMI2;
+    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
+      result |= CPU_SHA;
+    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
+      result |= CPU_FMA;

     // AMD features.
     if (is_amd()) {
@@ -515,19 +523,13 @@
         result |= CPU_LZCNT;
       if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
         result |= CPU_SSE4A;
+      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
+        result |= CPU_HT;
     }
     // Intel features.
     if(is_intel()) {
-      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
-         result |= CPU_ADX;
-      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
-        result |= CPU_BMI2;
-      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
-        result |= CPU_SHA;
       if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
         result |= CPU_LZCNT;
-      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
-        result |= CPU_FMA;
       // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
       if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
         result |= CPU_3DNOW_PREFETCH;
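
For reference, the feature-bit part of the change can be illustrated with a small standalone sketch. It takes raw CPUID register values as input rather than executing CPUID, the FEAT_* constants and the struct are hypothetical (HotSpot's CPU_* flags have their own values), and the bit positions are the architectural ones: leaf 7 EBX bit 8 (BMI2), bit 19 (ADX), bit 29 (SHA); leaf 1 ECX bit 12 (FMA); leaf 1 EDX bit 28 (HTT).

```cpp
#include <cstdint>

// Hypothetical feature flags used only by this sketch.
enum Feature : std::uint32_t {
  FEAT_ADX  = 1u << 0,
  FEAT_BMI2 = 1u << 1,
  FEAT_SHA  = 1u << 2,
  FEAT_FMA  = 1u << 3,
  FEAT_HT   = 1u << 4,
};

// Raw register values as returned by CPUID leaf 1 and leaf 7 (subleaf 0).
struct CpuidRegs {
  std::uint32_t leaf1_ecx;
  std::uint32_t leaf1_edx;
  std::uint32_t leaf7_ebx;
};

std::uint32_t decode_features(const CpuidRegs& r, bool is_amd) {
  std::uint32_t result = 0;
  // Vendor-neutral checks, mirroring the lines moved out of the Intel block.
  if (r.leaf7_ebx & (1u << 19)) result |= FEAT_ADX;
  if (r.leaf7_ebx & (1u << 8))  result |= FEAT_BMI2;
  if (r.leaf7_ebx & (1u << 29)) result |= FEAT_SHA;
  if (r.leaf1_ecx & (1u << 12)) result |= FEAT_FMA;
  // The HT check is added inside the AMD block in the patch, so this
  // sketch only reports it for AMD.
  if (is_amd && (r.leaf1_edx & (1u << 28))) result |= FEAT_HT;
  return result;
}
```

Since these bits are reported the same way by both vendors, checking them once outside the vendor-specific blocks avoids duplicating the tests for AMD and Intel.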

Please let me know your comments.

Thanks for your time.
Rohit

>>
>> Thanks for taking time to review the code.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1088,6 +1088,22 @@
>>         }
>>         FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>       }
>> +    if (supports_sha()) {
>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>> +        FLAG_SET_DEFAULT(UseSHA, true);
>> +      }
>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +        warning("SHA instructions are not available on this CPU");
>> +      }
>> +      FLAG_SET_DEFAULT(UseSHA, false);
>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +    }
>>
>>       // some defaults for AMD family 15h
>>       if ( cpu_family() == 0x15 ) {
>> @@ -1109,11 +1125,40 @@
>>       }
>>
>>   #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>       }
>>   #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>> +      }
>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>> +      }
>> +      if (UseSHA) {
>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        } else if (UseSHA512Intrinsics) {
>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>> +        }
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2()) {
>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +        }
>> +      }
>> +#endif
>> +    }
>>     }
>>
>>     if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -505,6 +505,14 @@
>>         result |= CPU_CLMUL;
>>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>         result |= CPU_RTM;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +       result |= CPU_ADX;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +      result |= CPU_BMI2;
>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +      result |= CPU_SHA;
>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +      result |= CPU_FMA;
>>
>>       // AMD features.
>>       if (is_amd()) {
>> @@ -515,19 +523,13 @@
>>           result |= CPU_LZCNT;
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>           result |= CPU_SSE4A;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>>       }
>>       // Intel features.
>>       if(is_intel()) {
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> -         result |= CPU_ADX;
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> -        result |= CPU_BMI2;
>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> -        result |= CPU_SHA;
>>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>           result |= CPU_LZCNT;
>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> -        result |= CPU_FMA;
>>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>           result |= CPU_3DNOW_PREFETCH;
>>
>>
>> Regards,
>> Rohit
>>
>>
>>
>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>
>>>>
>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]>
>>>> wrote:
>>>>>
>>>>>
>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> Hi Rohit,
>>>>>>
>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>> logic around UseSHA in vm_version_x86.cpp.
>>>>>>
>>>>>> Thanks,
>>>>>> David
>>>>>>
>>>>>
>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>> resubmit for review.
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>>
>>>> Hi All,
>>>>
>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>> default) and didn't find any regressions.
>>>>
>>>> Can anyone please volunteer to review this patch, which sets flag/ISA
>>>> defaults for the newer AMD 17h (EPYC) processor?
>>>>
>>>> ************************* Patch ****************************
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1088,6 +1088,22 @@
>>>>          }
>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>        }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>        // some defaults for AMD family 15h
>>>>        if ( cpu_family() == 0x15 ) {
>>>> @@ -1109,11 +1125,43 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        UseXMMForArrayCopy = true;
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        UseUnalignedLoadStores = true;
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        UseBMI2Instructions = true;
>>>> +      }
>>>> +      if (MaxVectorSize > 32) {
>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -505,6 +505,14 @@
>>>>          result |= CPU_CLMUL;
>>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>          result |= CPU_RTM;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +       result |= CPU_ADX;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +      result |= CPU_BMI2;
>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +      result |= CPU_SHA;
>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +      result |= CPU_FMA;
>>>>
>>>>        // AMD features.
>>>>        if (is_amd()) {
>>>> @@ -515,19 +523,13 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> -         result |= CPU_ADX;
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> -        result |= CPU_BMI2;
>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> -        result |= CPU_SHA;
>>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>            result |= CPU_LZCNT;
>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> -        result |= CPU_FMA;
>>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>            result |= CPU_3DNOW_PREFETCH;
>>>>
>>>> **************************************************************
>>>>
>>>> Thanks,
>>>> Rohit
>>>>
>>>>>>
>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Rohit,
>>>>>>>>
>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>>>> the commit process.
>>>>>>>>>
>>>>>>>>> Webrev:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Unfortunately patches cannot be accepted from systems outside the
>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>
>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>> cr.openjdk.java.net.
>>>>>>>>
>>>>>>>
>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>> didn't find any regressions.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>> testing requirements.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks David,
>>>>>>> Yes, it's a small patch.
>>>>>>>
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>           }
>>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>         }
>>>>>>> +    if (supports_sha()) {
>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>> +      }
>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>> +      }
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +    }
>>>>>>>
>>>>>>>         // some defaults for AMD family 15h
>>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>         }
>>>>>>>
>>>>>>>     #ifdef COMPILER2
>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>         }
>>>>>>>     #endif // COMPILER2
>>>>>>> +
>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>> +      }
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>> +      }
>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>> +        UseBMI2Instructions = true;
>>>>>>> +      }
>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>> +      }
>>>>>>> +      if (UseSHA) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#ifdef COMPILER2
>>>>>>> +      if (supports_sse4_2()) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#endif
>>>>>>> +    }
>>>>>>>       }
>>>>>>>
>>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>             result |= CPU_LZCNT;
>>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>             result |= CPU_SSE4A;
>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> +        result |= CPU_BMI2;
>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>> +        result |= CPU_HT;
>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> +        result |= CPU_ADX;
>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> +        result |= CPU_SHA;
>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> +        result |= CPU_FMA;
>>>>>>>         }
>>>>>>>         // Intel features.
>>>>>>>         if(is_intel()) {
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>
>>>
>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Vladimir Kozlov
Looks good.

The jdk10 repository is currently undergoing a "consolidation" update, which may
take 2 weeks. You will need to wait until we can push your changes.

Regards,
Vladimir

On 9/3/17 9:42 AM, Rohit Arul Raj wrote:

> Hello Vladimir,
>
> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
> <[hidden email]> wrote:
>> Hi Rohit,
>>
>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>> set > 16 only in presence of AVX:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>
>>>> Does that code work for AMD 17h too?
>>>
>>>
>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>> have updated, re-tested and attached the patch.
>>
>>
>> Which check did you remove?
>>
>
> My older patch had the check shown below, which was required on
> JDK9, where the default MaxVectorSize was 64. OpenJDK10 handles this
> better, so the check is no longer required.
>
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> ...
> ...
> +      if (MaxVectorSize > 32) {
> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
> +      }
> ..
> ..
> +      }
>
>>>
>>> I have one query regarding the setting of UseSHA flag:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>
>>> AMD 17h has support for SHA.
>>> AMD 15h doesn't have support for SHA. Still, the "UseSHA" flag gets
>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>> underlying reason for this? I have handled this in the patch but just
>>> wanted to confirm.
>>
>>
>> It was done with the following change, which uses only AVX2 and BMI2
>> instructions to calculate SHA-256:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>
>> I don't know if AMD 15h supports these instructions and can execute that
>> code. You need to test it.
>>
>
> OK, got it. Since AMD 15h supports AVX2 and BMI2 instructions,
> it should work.
> I confirmed this by running the following sanity tests:
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>
> So I have removed those SHA checks from my patch too.
>
> Please find the updated, re-tested patch attached.
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -1109,11 +1109,27 @@
>       }
>
>   #ifdef COMPILER2
> -    if (MaxVectorSize > 16) {
> -      // Limit vectors size to 16 bytes on current AMD cpus.
> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>       }
>   #endif // COMPILER2
> +
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
> +      }
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
> +      }
> +#ifdef COMPILER2
> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> +      }
> +#endif
> +    }
>     }
>
>     if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -505,6 +505,14 @@
>         result |= CPU_CLMUL;
>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>         result |= CPU_RTM;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> +       result |= CPU_ADX;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> +      result |= CPU_BMI2;
> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> +      result |= CPU_SHA;
> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> +      result |= CPU_FMA;
>
>       // AMD features.
>       if (is_amd()) {
> @@ -515,19 +523,13 @@
>           result |= CPU_LZCNT;
>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>           result |= CPU_SSE4A;
> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
> +        result |= CPU_HT;
>       }
>       // Intel features.
>       if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -         result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>           result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>           result |= CPU_3DNOW_PREFETCH;
>
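The header change in the patch above moves the ADX, BMI2, SHA and FMA checks out of the Intel-only block so that they apply to every vendor, and adds an HT check to the AMD block. A minimal sketch of that structure follows. The `FEATURE_*` values, `CpuidRegs` and `feature_flags` are hypothetical and are not HotSpot's definitions; the bit positions are the standard CPUID ones (leaf 7 EBX: BMI2 = 8, ADX = 19, SHA = 29; leaf 1 ECX: FMA = 12; leaf 1 EDX: HTT = 28).

```cpp
#include <cassert>
#include <cstdint>

// Illustrative feature flags; the values are arbitrary, not HotSpot's CPU_* constants.
enum Feature : std::uint32_t {
  FEATURE_ADX  = 1u << 0,
  FEATURE_BMI2 = 1u << 1,
  FEATURE_SHA  = 1u << 2,
  FEATURE_FMA  = 1u << 3,
  FEATURE_HT   = 1u << 4,
};

// Raw register values as they would be returned by the CPUID instruction.
struct CpuidRegs {
  std::uint32_t leaf1_ecx;
  std::uint32_t leaf1_edx;
  std::uint32_t leaf7_ebx;
};

inline bool bit_set(std::uint32_t reg, unsigned n) {
  return ((reg >> n) & 1u) != 0;
}

inline std::uint32_t feature_flags(const CpuidRegs& r, bool is_amd) {
  std::uint32_t result = 0;
  // Vendor-neutral checks: these sat inside the Intel-only block before the patch.
  if (bit_set(r.leaf7_ebx, 19)) result |= FEATURE_ADX;
  if (bit_set(r.leaf7_ebx, 8))  result |= FEATURE_BMI2;
  if (bit_set(r.leaf7_ebx, 29)) result |= FEATURE_SHA;
  if (bit_set(r.leaf1_ecx, 12)) result |= FEATURE_FMA;
  // AMD-specific block: the patch adds the HT check here.
  if (is_amd && bit_set(r.leaf1_edx, 28)) result |= FEATURE_HT;
  return result;
}

// Usage example: an AMD CPU reporting BMI2, SHA and HTT but not ADX.
inline void example_feature_flags() {
  CpuidRegs regs{0u, 1u << 28, (1u << 8) | (1u << 29)};
  std::uint32_t flags = feature_flags(regs, true);
  assert((flags & FEATURE_BMI2) != 0);
  assert((flags & FEATURE_SHA) != 0);
  assert((flags & FEATURE_ADX) == 0);
  assert((flags & FEATURE_HT) != 0);
}
```

Keeping the shared checks outside any vendor test means a new vendor or family picks them up automatically, while genuinely vendor-specific bits stay in their own block.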
> Please let me know your comments.
>
> Thanks for your time.
> Rohit
>
>>>
>>> Thanks for taking time to review the code.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1088,6 +1088,22 @@
>>>          }
>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>        }
>>> +    if (supports_sha()) {
>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>> +      }
>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +        warning("SHA instructions are not available on this CPU");
>>> +      }
>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +    }
>>>
>>>        // some defaults for AMD family 15h
>>>        if ( cpu_family() == 0x15 ) {
>>> @@ -1109,11 +1125,40 @@
>>>        }
>>>
>>>    #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>        }
>>>    #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>> +      }
>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>> +      }
>>> +      if (UseSHA) {
>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        } else if (UseSHA512Intrinsics) {
>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        }
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2()) {
>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +        }
>>> +      }
>>> +#endif
>>> +    }
>>>      }
>>>
>>>      if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -505,6 +505,14 @@
>>>          result |= CPU_CLMUL;
>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>          result |= CPU_RTM;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +       result |= CPU_ADX;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +      result |= CPU_BMI2;
>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +      result |= CPU_SHA;
>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +      result |= CPU_FMA;
>>>
>>>        // AMD features.
>>>        if (is_amd()) {
>>> @@ -515,19 +523,13 @@
>>>            result |= CPU_LZCNT;
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>            result |= CPU_SSE4A;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>>        }
>>>        // Intel features.
>>>        if(is_intel()) {
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> -         result |= CPU_ADX;
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> -        result |= CPU_BMI2;
>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> -        result |= CPU_SHA;
>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>            result |= CPU_LZCNT;
>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> -        result |= CPU_FMA;
>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>            result |= CPU_3DNOW_PREFETCH;
>>>
>>>
>>> Regards,
>>> Rohit
>>>
>>>
>>>
>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Rohit,
>>>>>>>
>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>> logic around UseSHA in vm_version_x86.cpp.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>> resubmit for review.
>>>>>>
>>>>>> Regards,
>>>>>> Rohit
>>>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>> default) and didn't find any regressions.
>>>>>
>>>>> Can anyone please volunteer to review this patch, which sets flag/ISA
>>>>> defaults for the newer AMD 17h (EPYC) processors?
>>>>>
>>>>> ************************* Patch ****************************
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1088,6 +1088,22 @@
>>>>>           }
>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>         }
>>>>> +    if (supports_sha()) {
>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>> +      }
>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>> +      }
>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +    }
>>>>>
>>>>>         // some defaults for AMD family 15h
>>>>>         if ( cpu_family() == 0x15 ) {
>>>>> @@ -1109,11 +1125,43 @@
>>>>>         }
>>>>>
>>>>>     #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>         }
>>>>>     #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        UseXMMForArrayCopy = true;
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> +        UseUnalignedLoadStores = true;
>>>>> +      }
>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> +        UseBMI2Instructions = true;
>>>>> +      }
>>>>> +      if (MaxVectorSize > 32) {
>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> +      }
>>>>> +      if (UseSHA) {
>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        }
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2()) {
>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +        }
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>       }
>>>>>
>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>>           result |= CPU_CLMUL;
>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>           result |= CPU_RTM;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +       result |= CPU_ADX;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +      result |= CPU_BMI2;
>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +      result |= CPU_SHA;
>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +      result |= CPU_FMA;
>>>>>
>>>>>         // AMD features.
>>>>>         if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>>             result |= CPU_LZCNT;
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>             result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>>         }
>>>>>         // Intel features.
>>>>>         if(is_intel()) {
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> -         result |= CPU_ADX;
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> -        result |= CPU_BMI2;
>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> -        result |= CPU_SHA;
>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>             result |= CPU_LZCNT;
>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> -        result |= CPU_FMA;
>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>> **************************************************************
>>>>>
>>>>> Thanks,
>>>>> Rohit
>>>>>
>>>>>>>
>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Rohit,
>>>>>>>>>
>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I would like a volunteer to review this patch (openJDK9) which
>>>>>>>>>> sets flag/ISA defaults for newer AMD 17h (EPYC) processor and help
>>>>>>>>>> us with the commit process.
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>
>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>>> cr.openjdk.java.net.
>>>>>>>>>
>>>>>>>>
>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>> didn't find any regressions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>> testing requirements.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks David,
>>>>>>>> Yes, it's a small patch.
>>>>>>>>
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>            }
>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>          }
>>>>>>>> +    if (supports_sha()) {
>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>> +      }
>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>> +      }
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +    }
>>>>>>>>
>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>          }
>>>>>>>>
>>>>>>>>      #ifdef COMPILER2
>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>          }
>>>>>>>>      #endif // COMPILER2
>>>>>>>> +
>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>> +      }
>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>> +      }
>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>> +      }
>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>> +      }
>>>>>>>> +      if (UseSHA) {
>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +        }
>>>>>>>> +      }
>>>>>>>> +#ifdef COMPILER2
>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>> +        }
>>>>>>>> +      }
>>>>>>>> +#endif
>>>>>>>> +    }
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>> +        result |= CPU_HT;
>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>> +        result |= CPU_ADX;
>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>> +        result |= CPU_SHA;
>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>          }
>>>>>>>>          // Intel features.
>>>>>>>>          if(is_intel()) {
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>
>>>>
>>
>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
On Mon, Sep 4, 2017 at 8:09 AM, Vladimir Kozlov
<[hidden email]> wrote:
> Looks good.
>
> Currently the jdk10 repository is undergoing a "consolidation" update. It may
> take 2 weeks. You will need to wait until we can push your changes.
>

Sure Vladimir, thanks for the support.

Regards,
Rohit


>
> On 9/3/17 9:42 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>> <[hidden email]> wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>
>>>>
>>>> Hello Vladimir,
>>>>
>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>> set > 16 only in presence of AVX:
>>>>>
>>>>>
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>
>>>>> Does that code work for AMD 17h too?
>>>>
>>>>
>>>>
>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>> have updated, re-tested and attached the patch.
>>>
>>>
>>>
>>> Which check did you remove?
>>>
>>
>> My older patch had the check shown below, which was required on
>> JDK9, where the default MaxVectorSize was 64. OpenJDK10 handles this
>> better, so the check is not required anymore.
>>
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> ...
>> ...
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> ..
>> ..
>> +      }
>>
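The two MaxVectorSize limits discussed above can be compared in a small sketch: the cap kept in the final patch (AMD families older than 17h stay at 16 bytes) and the extra 32-byte cap for family 17h that the JDK9 version carried and that was dropped for JDK10. The function names are hypothetical, and the sketch assumes it is only ever called for AMD CPUs in a COMPILER2 build.

```cpp
#include <cassert>

// Cap kept in the final patch: AMD families older than 17h are limited to
// 16-byte vectors; family 17h and later keep whatever was chosen earlier.
inline int cap_vector_size(int cpu_family, int max_vector_size) {
  if (cpu_family < 0x17 && max_vector_size > 16) {
    return 16;
  }
  return max_vector_size;
}

// Behaviour of the earlier JDK9 patch: in addition to the cap above,
// family 17h was limited to 32 bytes because the default there was 64.
inline int cap_vector_size_jdk9_patch(int cpu_family, int max_vector_size) {
  int size = cap_vector_size(cpu_family, max_vector_size);
  if (cpu_family == 0x17 && size > 32) {
    return 32;
  }
  return size;
}

// Usage example: the two versions differ only when a 17h CPU starts above 32.
inline void example_vector_size_caps() {
  assert(cap_vector_size(0x15, 32) == 16);
  assert(cap_vector_size(0x17, 32) == 32);
  assert(cap_vector_size(0x17, 64) == 64);
  assert(cap_vector_size_jdk9_patch(0x17, 64) == 32);
}
```

If the value reaching this code is already at most 32 on family 17h, as the discussion says is the case in JDK10, the second function never changes the result, which is why the extra check became redundant.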
>>>>
>>>> I have one query regarding the setting of UseSHA flag:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>
>>>> AMD 17h has support for SHA.
>>>> AMD 15h doesn't have support for SHA. Still, the "UseSHA" flag gets
>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>> underlying reason for this? I have handled this in the patch but just
>>>> wanted to confirm.
>>>
>>>
>>>
>>> It was done with the following change, which uses only AVX2 and BMI2
>>> instructions to calculate SHA-256:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>
>>> I don't know if AMD 15h supports these instructions and can execute that
>>> code. You need to test it.
>>>
>>
>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>> it should work.
>> Confirmed by running the following sanity tests:
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>
>> So I have removed those SHA checks from my patch too.
>>
>> Please find attached the updated, re-tested patch.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1109,11 +1109,27 @@
>>       }
>>
>>   #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>       }
>>   #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +      }
>> +#endif
>> +    }
>>     }
>>
>>     if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -505,6 +505,14 @@
>>         result |= CPU_CLMUL;
>>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>         result |= CPU_RTM;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +       result |= CPU_ADX;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +      result |= CPU_BMI2;
>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +      result |= CPU_SHA;
>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +      result |= CPU_FMA;
>>
>>       // AMD features.
>>       if (is_amd()) {
>> @@ -515,19 +523,13 @@
>>           result |= CPU_LZCNT;
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>           result |= CPU_SSE4A;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>>       }
>>       // Intel features.
>>       if(is_intel()) {
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> -         result |= CPU_ADX;
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> -        result |= CPU_BMI2;
>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> -        result |= CPU_SHA;
>>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>           result |= CPU_LZCNT;
>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> -        result |= CPU_FMA;
>>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>           result |= CPU_3DNOW_PREFETCH;
>>
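The family 17h block in the patch above follows one pattern throughout: a CPU-specific default is applied only when the user has not set the flag explicitly. A minimal model of that pattern follows. `BoolFlag`, `Flags` and `apply_amd_17h_defaults` are hypothetical stand-ins, not HotSpot's flag machinery; the `set_by_user` field plays the role that FLAG_IS_DEFAULT plays in the real code.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM flag: its value plus whether the user set it.
struct BoolFlag {
  bool value = false;
  bool set_by_user = false;
};

// Hypothetical flag set; the names match two of the flags touched by the patch.
struct Flags {
  BoolFlag UseXMMForArrayCopy;
  BoolFlag UseUnalignedLoadStores;
};

// Apply the family 17h defaults, leaving user-specified flags untouched.
inline void apply_amd_17h_defaults(int cpu_family, bool has_sse2, Flags& flags) {
  if (cpu_family != 0x17) {
    return;
  }
  if (has_sse2 && !flags.UseXMMForArrayCopy.set_by_user) {
    flags.UseXMMForArrayCopy.value = true;
  }
  if (has_sse2 && !flags.UseUnalignedLoadStores.set_by_user) {
    flags.UseUnalignedLoadStores.value = true;
  }
}

// Usage example: an explicit user setting wins over the CPU-family default.
inline void example_flag_defaults() {
  Flags flags;
  flags.UseXMMForArrayCopy.set_by_user = true;  // user explicitly left it off
  apply_amd_17h_defaults(0x17, true, flags);
  assert(!flags.UseXMMForArrayCopy.value);      // user choice preserved
  assert(flags.UseUnalignedLoadStores.value);   // family default applied
}
```

Checking the flag's origin before writing means a command-line override is respected while untouched flags pick up the tuned default.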
>> Please let me know your comments.
>>
>> Thanks for your time.
>> Rohit
>>
>>>>
>>>> Thanks for taking time to review the code.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1088,6 +1088,22 @@
>>>>          }
>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>        }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>        // some defaults for AMD family 15h
>>>>        if ( cpu_family() == 0x15 ) {
>>>> @@ -1109,11 +1125,40 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -505,6 +505,14 @@
>>>>          result |= CPU_CLMUL;
>>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>          result |= CPU_RTM;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +       result |= CPU_ADX;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +      result |= CPU_BMI2;
>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +      result |= CPU_SHA;
>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +      result |= CPU_FMA;
>>>>
>>>>        // AMD features.
>>>>        if (is_amd()) {
>>>> @@ -515,19 +523,13 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> -         result |= CPU_ADX;
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> -        result |= CPU_BMI2;
>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> -        result |= CPU_SHA;
>>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>            result |= CPU_LZCNT;
>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> -        result |= CPU_FMA;
>>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>            result |= CPU_3DNOW_PREFETCH;
>>>>
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>>
>>>>
>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>> <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Rohit,
>>>>>>>>
>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>>> logic around UseSHA in vm_version_x86.cpp.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>>> resubmit for review.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>> default) and didn't find any regressions.
>>>>>>
>>>>>> Can anyone please volunteer to review this patch, which sets flag/ISA
>>>>>> defaults for the newer AMD 17h (EPYC) processors?
>>>>>>
>>>>>> ************************* Patch ****************************
>>>>>>
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>           }
>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>         }
>>>>>> +    if (supports_sha()) {
>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>> +      }
>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>> +      }
>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +    }
>>>>>>
>>>>>>         // some defaults for AMD family 15h
>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>         }
>>>>>>
>>>>>>     #ifdef COMPILER2
>>>>>> -    if (MaxVectorSize > 16) {
>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>         }
>>>>>>     #endif // COMPILER2
>>>>>> +
>>>>>> +    // Some defaults for AMD family 17h
>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>>> Array Copy
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>> +        UseXMMForArrayCopy = true;
>>>>>> +      }
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>> {
>>>>>> +        UseUnalignedLoadStores = true;
>>>>>> +      }
>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>> +        UseBMI2Instructions = true;
>>>>>> +      }
>>>>>> +      if (MaxVectorSize > 32) {
>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>> +      }
>>>>>> +      if (UseSHA) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>> functions not available on this CPU.");
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#ifdef COMPILER2
>>>>>> +      if (supports_sse4_2()) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#endif
>>>>>> +    }
>>>>>>       }
>>>>>>
>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> @@ -505,6 +505,14 @@
>>>>>>           result |= CPU_CLMUL;
>>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>           result |= CPU_RTM;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> +       result |= CPU_ADX;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> +      result |= CPU_BMI2;
>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> +      result |= CPU_SHA;
>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> +      result |= CPU_FMA;
>>>>>>
>>>>>>         // AMD features.
>>>>>>         if (is_amd()) {
>>>>>> @@ -515,19 +523,13 @@
>>>>>>             result |= CPU_LZCNT;
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>             result |= CPU_SSE4A;
>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>> +        result |= CPU_HT;
>>>>>>         }
>>>>>>         // Intel features.
>>>>>>         if(is_intel()) {
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> -         result |= CPU_ADX;
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> -        result |= CPU_BMI2;
>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> -        result |= CPU_SHA;
>>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>             result |= CPU_LZCNT;
>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> -        result |= CPU_FMA;
>>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>>> support for prefetchw
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>>
>>>>>> **************************************************************
>>>>>>
>>>>>> Thanks,
>>>>>> Rohit
>>>>>>
>>>>>>>>
>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>>> <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Rohit,
>>>>>>>>>>
>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>>>>>> the commit process.
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>>
>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>>>> cr.openjdk.java.net.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>> didn't find any regressions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>> testing requirements.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks David,
>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>            }
>>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>          }
>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>> +      }
>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics
>>>>>>>>> ||
>>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>>> +      }
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>>      #ifdef COMPILER2
>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>          }
>>>>>>>>>      #endif // COMPILER2
>>>>>>>>> +
>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores
>>>>>>>>> for
>>>>>>>>> Array Copy
>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy))
>>>>>>>>> {
>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_sse2() &&
>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>>> {
>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions))
>>>>>>>>> {
>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>> +      }
>>>>>>>>> +      if (UseSHA) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>>>> functions not available on this CPU.");
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#endif
>>>>>>>>> +    }
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>          }
>>>>>>>>>          // Intel features.
>>>>>>>>>          if(is_intel()) {
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>>
>
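
The guarded-default pattern in the patch quoted above (change a flag only
when the user has not set it on the command line) can be sketched in
standalone C++. This is a minimal sketch, not HotSpot code: Flag,
flag_is_default, flag_set_default, flag_set_cmdline, Flags and
apply_family_17h_defaults are hypothetical stand-ins, and it assumes that
FLAG_SET_DEFAULT changes the value while leaving the flag marked as
"default".

```cpp
// Minimal sketch of the "only override untouched flags" pattern.
// Every name below is a hypothetical stand-in, not a HotSpot API.

template <typename T>
struct Flag {
  T value;
  bool is_default;  // true until the user sets the flag explicitly
  explicit Flag(T v) : value(v), is_default(true) {}
};

// Plays the role of FLAG_IS_DEFAULT: has the user left the flag alone?
template <typename T>
bool flag_is_default(const Flag<T>& f) { return f.is_default; }

// Plays the role of FLAG_SET_DEFAULT (assumed behaviour): change the value
// but keep the flag marked as coming from the defaults.
template <typename T>
void flag_set_default(Flag<T>& f, T v) { f.value = v; }

// Simulates a value given on the command line.
template <typename T>
void flag_set_cmdline(Flag<T>& f, T v) { f.value = v; f.is_default = false; }

struct Flags {
  Flag<bool> UseXMMForArrayCopy{false};
  Flag<bool> UseUnalignedLoadStores{false};
};

// Applies family 17h defaults without overriding explicit user choices.
void apply_family_17h_defaults(int cpu_family, bool has_sse2, Flags& flags) {
  if (cpu_family != 0x17) return;
  if (has_sse2 && flag_is_default(flags.UseXMMForArrayCopy)) {
    flag_set_default(flags.UseXMMForArrayCopy, true);
  }
  if (has_sse2 && flag_is_default(flags.UseUnalignedLoadStores)) {
    flag_set_default(flags.UseUnalignedLoadStores, true);
  }
}
```

With this shape, a user who explicitly turns UseXMMForArrayCopy off keeps
that setting on a family 17h machine, while the untouched flag still picks
up the new default.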

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

David Holmes
In reply to this post by Rohit Arul Raj
Hi Rohit,

I was unable to apply your patch to the latest jdk10/hs/hotspot repo.

Vladimir: are you able to host a webrev for this change please?

Thanks,
David
----

On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:

> Hello Vladimir,
>
> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
> <[hidden email]> wrote:
>> Hi Rohit,
>>
>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>>> Changes look good. Only question I have is about MaxVectorSize. It is set > 16
>>>> only in presence of AVX:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>
>>>> Does that code work for AMD 17h too?
>>>
>>>
>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>> have updated, re-tested and attached the patch.
>>
>>
>> Which check did you remove?
>>
>
> My older patch had the check shown below, which was required on JDK9,
> where the default MaxVectorSize was 64. OpenJDK10 handles this better,
> so the check is no longer required.
>
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> ...
> ...
> +      if (MaxVectorSize > 32) {
> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
> +      }
> ..
> ..
> +      }
>
>>>
>>> I have one query regarding the setting of UseSHA flag:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>
>>> AMD 17h has support for SHA.
>>> AMD 15h doesn't have support for SHA. Still "UseSHA" flag gets
>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>> underlying reason for this? I have handled this in the patch but just
>>> wanted to confirm.
>>
>>
>> It was done with the following change, which uses only AVX2 and BMI2
>> instructions to calculate SHA-256:
>>
>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>
>> I don't know if AMD 15h supports these instructions and can execute that
>> code. You need to test it.
>>
>
> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
> it should work.
> Confirmed by running following sanity tests:
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>
> So I have removed those SHA checks from my patch too.
>
> Please find attached updated, re-tested patch.
>
> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
> --- a/src/cpu/x86/vm/vm_version_x86.cpp
> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
> @@ -1109,11 +1109,27 @@
>       }
>
>   #ifdef COMPILER2
> -    if (MaxVectorSize > 16) {
> -      // Limit vectors size to 16 bytes on current AMD cpus.
> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>       }
>   #endif // COMPILER2
> +
> +    // Some defaults for AMD family 17h
> +    if ( cpu_family() == 0x17 ) {
> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
> +      }
> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
> +      }
> +#ifdef COMPILER2
> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
> +      }
> +#endif
> +    }
>     }
>
>     if( is_intel() ) { // Intel cpus specific settings
> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
> --- a/src/cpu/x86/vm/vm_version_x86.hpp
> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
> @@ -505,6 +505,14 @@
>         result |= CPU_CLMUL;
>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>         result |= CPU_RTM;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> +       result |= CPU_ADX;
> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> +      result |= CPU_BMI2;
> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> +      result |= CPU_SHA;
> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> +      result |= CPU_FMA;
>
>       // AMD features.
>       if (is_amd()) {
> @@ -515,19 +523,13 @@
>           result |= CPU_LZCNT;
>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>           result |= CPU_SSE4A;
> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
> +        result |= CPU_HT;
>       }
>       // Intel features.
>       if(is_intel()) {
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
> -         result |= CPU_ADX;
> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
> -        result |= CPU_BMI2;
> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
> -        result |= CPU_SHA;
>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>           result |= CPU_LZCNT;
> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
> -        result |= CPU_FMA;
>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>           result |= CPU_3DNOW_PREFETCH;
>
> Please let me know your comments.
>
> Thanks for your time.
> Rohit
>
>>>
>>> Thanks for taking time to review the code.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1088,6 +1088,22 @@
>>>          }
>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>        }
>>> +    if (supports_sha()) {
>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>> +      }
>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>> UseSHA512Intrinsics) {
>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +        warning("SHA instructions are not available on this CPU");
>>> +      }
>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +    }
>>>
>>>        // some defaults for AMD family 15h
>>>        if ( cpu_family() == 0x15 ) {
>>> @@ -1109,11 +1125,40 @@
>>>        }
>>>
>>>    #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>        }
>>>    #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>> Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>> +      }
>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>> +      }
>>> +      if (UseSHA) {
>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        } else if (UseSHA512Intrinsics) {
>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>> functions not available on this CPU.");
>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>> +        }
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2()) {
>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +        }
>>> +      }
>>> +#endif
>>> +    }
>>>      }
>>>
>>>      if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -505,6 +505,14 @@
>>>          result |= CPU_CLMUL;
>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>          result |= CPU_RTM;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +       result |= CPU_ADX;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +      result |= CPU_BMI2;
>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +      result |= CPU_SHA;
>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +      result |= CPU_FMA;
>>>
>>>        // AMD features.
>>>        if (is_amd()) {
>>> @@ -515,19 +523,13 @@
>>>            result |= CPU_LZCNT;
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>            result |= CPU_SSE4A;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>>        }
>>>        // Intel features.
>>>        if(is_intel()) {
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> -         result |= CPU_ADX;
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> -        result |= CPU_BMI2;
>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> -        result |= CPU_SHA;
>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>            result |= CPU_LZCNT;
>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> -        result |= CPU_FMA;
>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>> support for prefetchw
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>            result |= CPU_3DNOW_PREFETCH;
>>>
>>>
>>> Regards,
>>> Rohit
>>>
>>>
>>>
>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>> Hi Rohit,
>>>>>>>
>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>> logic
>>>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> David
>>>>>>>
>>>>>>
>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>> resubmit for review.
>>>>>>
>>>>>> Regards,
>>>>>> Rohit
>>>>>>
>>>>>
>>>>> Hi All,
>>>>>
>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>> default) and didn't find any regressions.
>>>>>
>>>>> Can anyone please volunteer to review this patch which sets flag/ISA
>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>
>>>>> ************************* Patch ****************************
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1088,6 +1088,22 @@
>>>>>           }
>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>         }
>>>>> +    if (supports_sha()) {
>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>> +      }
>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>> UseSHA512Intrinsics) {
>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>> +      }
>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +    }
>>>>>
>>>>>         // some defaults for AMD family 15h
>>>>>         if ( cpu_family() == 0x15 ) {
>>>>> @@ -1109,11 +1125,43 @@
>>>>>         }
>>>>>
>>>>>     #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>         }
>>>>>     #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>> Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        UseXMMForArrayCopy = true;
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> +        UseUnalignedLoadStores = true;
>>>>> +      }
>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> +        UseBMI2Instructions = true;
>>>>> +      }
>>>>> +      if (MaxVectorSize > 32) {
>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>> +      }
>>>>> +      if (UseSHA) {
>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>> functions not available on this CPU.");
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        }
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2()) {
>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +        }
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>       }
>>>>>
>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>>           result |= CPU_CLMUL;
>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>           result |= CPU_RTM;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +       result |= CPU_ADX;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +      result |= CPU_BMI2;
>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +      result |= CPU_SHA;
>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +      result |= CPU_FMA;
>>>>>
>>>>>         // AMD features.
>>>>>         if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>>             result |= CPU_LZCNT;
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>             result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>>         }
>>>>>         // Intel features.
>>>>>         if(is_intel()) {
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> -         result |= CPU_ADX;
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> -        result |= CPU_BMI2;
>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> -        result |= CPU_SHA;
>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>             result |= CPU_LZCNT;
>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> -        result |= CPU_FMA;
>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>> support for prefetchw
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>> **************************************************************
>>>>>
>>>>> Thanks,
>>>>> Rohit
>>>>>
>>>>>>>
>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Rohit,
>>>>>>>>>
>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>>>>> the commit process.
>>>>>>>>>>
>>>>>>>>>> Webrev:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>
>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>>> cr.openjdk.java.net.
>>>>>>>>>
>>>>>>>>
>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>> didn't find any regressions.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>> testing requirements.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks David,
>>>>>>>> Yes, it's a small patch.
>>>>>>>>
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>            }
>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>          }
>>>>>>>> +    if (supports_sha()) {
>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>> +      }
>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>> +      }
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +    }
>>>>>>>>
>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>          }
>>>>>>>>
>>>>>>>>      #ifdef COMPILER2
>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>          }
>>>>>>>>      #endif // COMPILER2
>>>>>>>> +
>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores
>>>>>>>> for
>>>>>>>> Array Copy
>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>> +      }
>>>>>>>> +      if (supports_sse2() &&
>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>> {
>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>> +      }
>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>> +      }
>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>> +      }
>>>>>>>> +      if (UseSHA) {
>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>>> functions not available on this CPU.");
>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +        }
>>>>>>>> +      }
>>>>>>>> +#ifdef COMPILER2
>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>> +        }
>>>>>>>> +      }
>>>>>>>> +#endif
>>>>>>>> +    }
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>> +        result |= CPU_HT;
>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>> +        result |= CPU_ADX;
>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>> +        result |= CPU_SHA;
>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>          }
>>>>>>>>          // Intel features.
>>>>>>>>          if(is_intel()) {
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>
>>>>
>>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

Rohit Arul Raj
Hello David,

On Tue, Sep 5, 2017 at 10:31 AM, David Holmes <[hidden email]> wrote:
> Hi Rohit,
>
> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>

I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
without any issues.
Can you share the error message that you are getting?

Regards,
Rohit
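
For readers following the quoted patch below, here is a minimal sketch of the pattern it relies on: a CPU-specific default is applied only when the user has not set the flag on the command line. The `Flag`, `Cpu` and `Flags` types and the `flag_is_default`/`flag_set_default` helpers are hypothetical stand-ins for HotSpot's globals and its FLAG_IS_DEFAULT/FLAG_SET_DEFAULT macros, not the real implementation. Only the flags touched by the final patch are modelled, and the starting MaxVectorSize of 32 is an assumption.

```cpp
#include <cassert>

// Hypothetical stand-in for a VM flag: a value plus a record of whether
// the user left it at its default (i.e. did not set it explicitly).
template <typename T>
struct Flag {
  T value;
  bool is_default;
  explicit Flag(T v) : value(v), is_default(true) {}
};

// Mimic the intent of FLAG_IS_DEFAULT / FLAG_SET_DEFAULT: the setter
// changes the value without marking the flag as user-specified.
template <typename T>
bool flag_is_default(const Flag<T>& f) { return f.is_default; }

template <typename T>
void flag_set_default(Flag<T>& f, T v) { f.value = v; }

// Simplified CPU description (assumption: only what the sketch needs).
struct Cpu {
  int  family;
  bool sse2;
  bool sse4_2;
};

struct Flags {
  Flag<bool> UseXMMForArrayCopy{false};
  Flag<bool> UseUnalignedLoadStores{false};
  Flag<bool> UseFPUForSpilling{false};
  Flag<int>  MaxVectorSize{32};  // assumed starting value for the sketch
};

// Mirrors the shape of the AMD block in the final patch.
void apply_amd_defaults(const Cpu& cpu, Flags& f) {
  // Families before 17h keep the 16-byte vector limit.
  if (cpu.family < 0x17 && f.MaxVectorSize.value > 16) {
    flag_set_default(f.MaxVectorSize, 16);
  }
  // Family 17h: turn on the array-copy and spilling defaults, but only
  // for flags the user has not touched.
  if (cpu.family == 0x17) {
    if (cpu.sse2 && flag_is_default(f.UseXMMForArrayCopy)) {
      flag_set_default(f.UseXMMForArrayCopy, true);
    }
    if (cpu.sse2 && flag_is_default(f.UseUnalignedLoadStores)) {
      flag_set_default(f.UseUnalignedLoadStores, true);
    }
    if (cpu.sse4_2 && flag_is_default(f.UseFPUForSpilling)) {
      flag_set_default(f.UseFPUForSpilling, true);
    }
  }
}

// Usage: a user override (as with -XX:-UseFPUForSpilling) is respected.
void example_usage() {
  Cpu zen = {0x17, true, true};
  Flags f;
  f.UseFPUForSpilling.is_default = false;  // pretend the user set it
  apply_amd_defaults(zen, f);
  assert(f.UseXMMForArrayCopy.value);
  assert(!f.UseFPUForSpilling.value);
  assert(f.MaxVectorSize.value == 32);
}
```

The point of the guard is that an explicit command-line choice always wins over a per-family default, which is why the later revisions of the patch replaced the direct assignments with FLAG_SET_DEFAULT.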


>
>
> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>
>> Hello Vladimir,
>>
>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>> <[hidden email]> wrote:
>>>
>>> Hi Rohit,
>>>
>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>
>>>>
>>>> Hello Vladimir,
>>>>
>>>>> Changes look good. Only question I have is about MaxVectorSize.
>>>>> It is set > 16 only in presence of AVX:
>>>>>
>>>>>
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>
>>>>> Does that code work for AMD 17h too?
>>>>
>>>>
>>>>
>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>> have updated, re-tested and attached the patch.
>>>
>>>
>>>
>>> Which check did you remove?
>>>
>>
>> My older patch had the below mentioned check which was required on
>> JDK9 where the default MaxVectorSize was 64. It has been handled
>> better in openJDK10. So this check is not required anymore.
>>
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> ...
>> ...
>> +      if (MaxVectorSize > 32) {
>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>> +      }
>> ..
>> ..
>> +      }
>>
>>>>
>>>> I have one query regarding the setting of UseSHA flag:
>>>>
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>
>>>> AMD 17h has support for SHA.
>>>> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>> underlying reason for this? I have handled this in the patch but just
>>>> wanted to confirm.
>>>
>>>
>>>
>>> It was done with next changes which use only AVX2 and BMI2 instructions to
>>> calculate SHA-256:
>>>
>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>
>>> I don't know if AMD 15h supports these instructions and can execute that
>>> code. You need to test it.
>>>
>>
>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>> it should work.
>> Confirmed by running following sanity tests:
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>
>> So I have removed those SHA checks from my patch too.
>>
>> Please find attached updated, re-tested patch.
>>
>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>> @@ -1109,11 +1109,27 @@
>>       }
>>
>>   #ifdef COMPILER2
>> -    if (MaxVectorSize > 16) {
>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>         FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>       }
>>   #endif // COMPILER2
>> +
>> +    // Some defaults for AMD family 17h
>> +    if ( cpu_family() == 0x17 ) {
>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>> +      }
>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>> +      }
>> +#ifdef COMPILER2
>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>> +      }
>> +#endif
>> +    }
>>     }
>>
>>     if( is_intel() ) { // Intel cpus specific settings
>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>> @@ -505,6 +505,14 @@
>>         result |= CPU_CLMUL;
>>       if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>         result |= CPU_RTM;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> +       result |= CPU_ADX;
>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> +      result |= CPU_BMI2;
>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> +      result |= CPU_SHA;
>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> +      result |= CPU_FMA;
>>
>>       // AMD features.
>>       if (is_amd()) {
>> @@ -515,19 +523,13 @@
>>           result |= CPU_LZCNT;
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>           result |= CPU_SSE4A;
>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>> +        result |= CPU_HT;
>>       }
>>       // Intel features.
>>       if(is_intel()) {
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>> -         result |= CPU_ADX;
>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>> -        result |= CPU_BMI2;
>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>> -        result |= CPU_SHA;
>>         if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>           result |= CPU_LZCNT;
>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>> -        result |= CPU_FMA;
>>         // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>         if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>           result |= CPU_3DNOW_PREFETCH;
>>
>> Please let me know your comments.
>>
>> Thanks for your time.
>> Rohit
>>
>>>>
>>>> Thanks for taking time to review the code.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1088,6 +1088,22 @@
>>>>          }
>>>>          FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>        }
>>>> +    if (supports_sha()) {
>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>> +      }
>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +        warning("SHA instructions are not available on this CPU");
>>>> +      }
>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +    }
>>>>
>>>>        // some defaults for AMD family 15h
>>>>        if ( cpu_family() == 0x15 ) {
>>>> @@ -1109,11 +1125,40 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>> +      }
>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>> +      }
>>>> +      if (UseSHA) {
>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        } else if (UseSHA512Intrinsics) {
>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>> +        }
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2()) {
>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +        }
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -505,6 +505,14 @@
>>>>          result |= CPU_CLMUL;
>>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>          result |= CPU_RTM;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +       result |= CPU_ADX;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +      result |= CPU_BMI2;
>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +      result |= CPU_SHA;
>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +      result |= CPU_FMA;
>>>>
>>>>        // AMD features.
>>>>        if (is_amd()) {
>>>> @@ -515,19 +523,13 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> -         result |= CPU_ADX;
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> -        result |= CPU_BMI2;
>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> -        result |= CPU_SHA;
>>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>            result |= CPU_LZCNT;
>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> -        result |= CPU_FMA;
>>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>            result |= CPU_3DNOW_PREFETCH;
>>>>
>>>>
>>>> Regards,
>>>> Rohit
>>>>
>>>>
>>>>
>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>> <[hidden email]>
>>>>>> wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Hi Rohit,
>>>>>>>>
>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of logic
>>>>>>>> around UseSHA in vm_version_x86.cpp.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> David
>>>>>>>>
>>>>>>>
>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>>> resubmit for review.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Rohit
>>>>>>>
>>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>> default) and didn't find any regressions.
>>>>>>
>>>>>> Can anyone please volunteer to review this patch  which sets flag/ISA
>>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>>
>>>>>> ************************* Patch ****************************
>>>>>>
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>           }
>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>         }
>>>>>> +    if (supports_sha()) {
>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>> +      }
>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>> +      }
>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +    }
>>>>>>
>>>>>>         // some defaults for AMD family 15h
>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>         }
>>>>>>
>>>>>>     #ifdef COMPILER2
>>>>>> -    if (MaxVectorSize > 16) {
>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>         }
>>>>>>     #endif // COMPILER2
>>>>>> +
>>>>>> +    // Some defaults for AMD family 17h
>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>> +        UseXMMForArrayCopy = true;
>>>>>> +      }
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>> +        UseUnalignedLoadStores = true;
>>>>>> +      }
>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>> +        UseBMI2Instructions = true;
>>>>>> +      }
>>>>>> +      if (MaxVectorSize > 32) {
>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>> +      }
>>>>>> +      if (UseSHA) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#ifdef COMPILER2
>>>>>> +      if (supports_sse4_2()) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#endif
>>>>>> +    }
>>>>>>       }
>>>>>>
>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> @@ -505,6 +505,14 @@
>>>>>>           result |= CPU_CLMUL;
>>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>           result |= CPU_RTM;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> +       result |= CPU_ADX;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> +      result |= CPU_BMI2;
>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> +      result |= CPU_SHA;
>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> +      result |= CPU_FMA;
>>>>>>
>>>>>>         // AMD features.
>>>>>>         if (is_amd()) {
>>>>>> @@ -515,19 +523,13 @@
>>>>>>             result |= CPU_LZCNT;
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>             result |= CPU_SSE4A;
>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>> +        result |= CPU_HT;
>>>>>>         }
>>>>>>         // Intel features.
>>>>>>         if(is_intel()) {
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> -         result |= CPU_ADX;
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> -        result |= CPU_BMI2;
>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> -        result |= CPU_SHA;
>>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>             result |= CPU_LZCNT;
>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> -        result |= CPU_FMA;
>>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>>
>>>>>> **************************************************************
>>>>>>
>>>>>> Thanks,
>>>>>> Rohit
>>>>>>
>>>>>>>>
>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>>> <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Rohit,
>>>>>>>>>>
>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>>>>>> the commit process.
>>>>>>>>>>>
>>>>>>>>>>> Webrev:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Unfortunately patches can not be accepted from systems outside the OpenJDK
>>>>>>>>>> infrastructure and ...
>>>>>>>>>>
>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers. If the
>>>>>>>>>> patch is small please include it inline. Otherwise you will need to find an
>>>>>>>>>> OpenJDK Author who can host it for you on cr.openjdk.java.net.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>> didn't find any regressions.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on testing
>>>>>>>>>> requirements.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks David,
>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>            }
>>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>          }
>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>> +      }
>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>>> +      }
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +    }
>>>>>>>>>
>>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>          }
>>>>>>>>>
>>>>>>>>>      #ifdef COMPILER2
>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>          }
>>>>>>>>>      #endif // COMPILER2
>>>>>>>>> +
>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>> +      }
>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>> +      }
>>>>>>>>> +      if (UseSHA) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>> +        }
>>>>>>>>> +      }
>>>>>>>>> +#endif
>>>>>>>>> +    }
>>>>>>>>>        }
>>>>>>>>>
>>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>          }
>>>>>>>>>          // Intel features.
>>>>>>>>>          if(is_intel()) {
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>
>>>>>
>>>
>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

David Holmes
On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:

> Hello David,
>
> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes <[hidden email]> wrote:
>> Hi Rohit,
>>
>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>
>
> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
> without any issues.
> Can you share the error message that you are getting?

I was getting this:

applying hotspot.patch
patching file src/cpu/x86/vm/vm_version_x86.cpp
Hunk #1 FAILED at 1108
1 out of 1 hunks FAILED -- saving rejects to file src/cpu/x86/vm/vm_version_x86.cpp.rej
patching file src/cpu/x86/vm/vm_version_x86.hpp
Hunk #2 FAILED at 522
1 out of 2 hunks FAILED -- saving rejects to file src/cpu/x86/vm/vm_version_x86.hpp.rej
abort: patch failed to apply

but I started again and this time it applied fine, so not sure what was
going on there.

Cheers,
David
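
For context, the vm_version_x86.hpp hunk reported above is the one in the quoted patch below that moves the ADX, BMI2, SHA and FMA checks out of the Intel-only block so they also apply to AMD, and adds the HT check to the AMD block. A rough sketch of the resulting shape follows. The `Vendor` enum, the `CpuidBits` struct, the bit values and the function signature are hypothetical simplifications of `_cpuid_info` and the real feature-flag code, and everything the patch does not touch is omitted.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits: names follow the CPU_* constants in the
// patch, the numeric values are arbitrary for this sketch.
enum : std::uint32_t {
  CPU_ADX  = 1u << 0,
  CPU_BMI2 = 1u << 1,
  CPU_SHA  = 1u << 2,
  CPU_FMA  = 1u << 3,
  CPU_HT   = 1u << 4
};

enum class Vendor { AMD, Intel };

// Simplified stand-in for the decoded CPUID registers.
struct CpuidBits {
  bool adx;
  bool bmi2;
  bool sha;
  bool fma;
  bool ht;
};

std::uint32_t feature_flags(Vendor vendor, const CpuidBits& b) {
  std::uint32_t result = 0;

  // After the patch these checks are vendor-neutral; before it they sat
  // inside the Intel-only block, so AMD never reported them.
  if (b.adx)  result |= CPU_ADX;
  if (b.bmi2) result |= CPU_BMI2;
  if (b.sha)  result |= CPU_SHA;
  if (b.fma)  result |= CPU_FMA;

  // AMD-specific block: the patch adds the HT bit check here.
  if (vendor == Vendor::AMD) {
    if (b.ht) result |= CPU_HT;
  }
  return result;
}

// Usage: with every CPUID bit set, an AMD part now reports all five
// features modelled here.
void example_feature_usage() {
  CpuidBits all = {true, true, true, true, true};
  std::uint32_t amd = feature_flags(Vendor::AMD, all);
  assert((amd & CPU_SHA) != 0);
  assert((amd & CPU_HT) != 0);
}
```

Keeping the shared checks outside both vendor blocks means any x86 CPU that advertises the bit gets the feature, rather than each vendor block needing its own copy.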

> Regards,
> Rohit
>
>
>>
>>
>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>
>>> Hello Vladimir,
>>>
>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>> <[hidden email]> wrote:
>>>>
>>>> Hi Rohit,
>>>>
>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>
>>>>>
>>>>> Hello Vladimir,
>>>>>
>>>>>> Changes look good. Only question I have is about MaxVectorSize.
>>>>>> It is set > 16 only in presence of AVX:
>>>>>>
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945
>>>>>>
>>>>>> Does that code work for AMD 17h too?
>>>>>
>>>>>
>>>>>
>>>>> Thanks for pointing that out. Yes, the code works fine for AMD 17h. So
>>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>>> have updated, re-tested and attached the patch.
>>>>
>>>>
>>>>
>>>> Which check did you remove?
>>>>
>>>
>>> My older patch had the below mentioned check which was required on
>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>> better in openJDK10. So this check is not required anymore.
>>>
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> ...
>>> ...
>>> +      if (MaxVectorSize > 32) {
>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>> +      }
>>> ..
>>> ..
>>> +      }
>>>
>>>>>
>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821
>>>>>
>>>>> AMD 17h has support for SHA.
>>>>> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
>>>>> enabled for it based on the availability of BMI2 and AVX2. Is there an
>>>>> underlying reason for this? I have handled this in the patch but just
>>>>> wanted to confirm.
>>>>
>>>>
>>>>
>>>> It was done with next changes which use only AVX2 and BMI2 instructions to
>>>> calculate SHA-256:
>>>>
>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>
>>>> I don't know if AMD 15h supports these instructions and can execute that
>>>> code. You need to test it.
>>>>
>>>
>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>> it should work.
>>> Confirmed by running following sanity tests:
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>>
>>> So I have removed those SHA checks from my patch too.
>>>
>>> Please find attached updated, re-tested patch.
>>>
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>> @@ -1109,11 +1109,27 @@
>>>        }
>>>
>>>    #ifdef COMPILER2
>>> -    if (MaxVectorSize > 16) {
>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>        }
>>>    #endif // COMPILER2
>>> +
>>> +    // Some defaults for AMD family 17h
>>> +    if ( cpu_family() == 0x17 ) {
>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>> +      }
>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>> +      }
>>> +#ifdef COMPILER2
>>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>> +      }
>>> +#endif
>>> +    }
>>>      }
>>>
>>>      if( is_intel() ) { // Intel cpus specific settings
>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>> @@ -505,6 +505,14 @@
>>>          result |= CPU_CLMUL;
>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>          result |= CPU_RTM;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> +       result |= CPU_ADX;
>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> +      result |= CPU_BMI2;
>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> +      result |= CPU_SHA;
>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> +      result |= CPU_FMA;
>>>
>>>        // AMD features.
>>>        if (is_amd()) {
>>> @@ -515,19 +523,13 @@
>>>            result |= CPU_LZCNT;
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>            result |= CPU_SSE4A;
>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>> +        result |= CPU_HT;
>>>        }
>>>        // Intel features.
>>>        if(is_intel()) {
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>> -         result |= CPU_ADX;
>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>> -        result |= CPU_BMI2;
>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>> -        result |= CPU_SHA;
>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>            result |= CPU_LZCNT;
>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>> -        result |= CPU_FMA;
>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>            result |= CPU_3DNOW_PREFETCH;
>>>
>>> Please let me know your comments.
>>>
>>> Thanks for your time.
>>> Rohit
>>>
>>>>>
>>>>> Thanks for taking time to review the code.
>>>>>
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>> @@ -1088,6 +1088,22 @@
>>>>>           }
>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>         }
>>>>> +    if (supports_sha()) {
>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>> +      }
>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>> +      }
>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +    }
>>>>>
>>>>>         // some defaults for AMD family 15h
>>>>>         if ( cpu_family() == 0x15 ) {
>>>>> @@ -1109,11 +1125,40 @@
>>>>>         }
>>>>>
>>>>>     #ifdef COMPILER2
>>>>> -    if (MaxVectorSize > 16) {
>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>         }
>>>>>     #endif // COMPILER2
>>>>> +
>>>>> +    // Some defaults for AMD family 17h
>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>> Array Copy
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>> +      }
>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>> +      }
>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>>> +      }
>>>>> +      if (UseSHA) {
>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>> functions not available on this CPU.");
>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>> +        }
>>>>> +      }
>>>>> +#ifdef COMPILER2
>>>>> +      if (supports_sse4_2()) {
>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>> +        }
>>>>> +      }
>>>>> +#endif
>>>>> +    }
>>>>>       }
>>>>>
>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>> @@ -505,6 +505,14 @@
>>>>>           result |= CPU_CLMUL;
>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>           result |= CPU_RTM;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> +       result |= CPU_ADX;
>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> +      result |= CPU_BMI2;
>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> +      result |= CPU_SHA;
>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> +      result |= CPU_FMA;
>>>>>
>>>>>         // AMD features.
>>>>>         if (is_amd()) {
>>>>> @@ -515,19 +523,13 @@
>>>>>             result |= CPU_LZCNT;
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>             result |= CPU_SSE4A;
>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>> +        result |= CPU_HT;
>>>>>         }
>>>>>         // Intel features.
>>>>>         if(is_intel()) {
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>> -         result |= CPU_ADX;
>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>> -        result |= CPU_BMI2;
>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>> -        result |= CPU_SHA;
>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>             result |= CPU_LZCNT;
>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>> -        result |= CPU_FMA;
>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>> support for prefetchw
>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>
>>>>>
>>>>> Regards,
>>>>> Rohit
>>>>>
>>>>>
>>>>>
>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>>> <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Hi Rohit,
>>>>>>>>>
>>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>>>> logic around UseSHA in vm_version_x86.cpp.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> David
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base, test and
>>>>>>>> resubmit for review.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>>> default) and didn't find any regressions.
>>>>>>>
>>>>>>> Can anyone please volunteer to review this patch which sets flag/ISA
>>>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>>>
>>>>>>> ************************* Patch ****************************
>>>>>>>
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>>            }
>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>          }
>>>>>>> +    if (supports_sha()) {
>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>> +      }
>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>>>> UseSHA512Intrinsics) {
>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>> +      }
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +    }
>>>>>>>
>>>>>>>          // some defaults for AMD family 15h
>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>>          }
>>>>>>>
>>>>>>>      #ifdef COMPILER2
>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>          }
>>>>>>>      #endif // COMPILER2
>>>>>>> +
>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for
>>>>>>> Array Copy
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>> +      }
>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>> {
>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>> +      }
>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>> +        UseBMI2Instructions = true;
>>>>>>> +      }
>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>> +      }
>>>>>>> +      if (UseSHA) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>> functions not available on this CPU.");
>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#ifdef COMPILER2
>>>>>>> +      if (supports_sse4_2()) {
>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>> +        }
>>>>>>> +      }
>>>>>>> +#endif
>>>>>>> +    }
>>>>>>>        }
>>>>>>>
>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>> @@ -505,6 +505,14 @@
>>>>>>>            result |= CPU_CLMUL;
>>>>>>>          if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>>            result |= CPU_RTM;
>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> +       result |= CPU_ADX;
>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> +      result |= CPU_BMI2;
>>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> +      result |= CPU_SHA;
>>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> +      result |= CPU_FMA;
>>>>>>>
>>>>>>>          // AMD features.
>>>>>>>          if (is_amd()) {
>>>>>>> @@ -515,19 +523,13 @@
>>>>>>>              result |= CPU_LZCNT;
>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>              result |= CPU_SSE4A;
>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>> +        result |= CPU_HT;
>>>>>>>          }
>>>>>>>          // Intel features.
>>>>>>>          if(is_intel()) {
>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>> -         result |= CPU_ADX;
>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>> -        result |= CPU_BMI2;
>>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>> -        result |= CPU_SHA;
>>>>>>>            if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>>              result |= CPU_LZCNT;
>>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>> -        result |= CPU_FMA;
>>>>>>>            // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>>>> support for prefetchw
>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>>              result |= CPU_3DNOW_PREFETCH;
>>>>>>>
>>>>>>> **************************************************************
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Rohit
>>>>>>>
>>>>>>>>>
>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes
>>>>>>>>>> <[hidden email]>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Hi Rohit,
>>>>>>>>>>>
>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I would like a volunteer to review this patch (openJDK9) which sets
>>>>>>>>>>>> flag/ISA defaults for newer AMD 17h (EPYC) processor and help us with
>>>>>>>>>>>> the commit process.
>>>>>>>>>>>>
>>>>>>>>>>>> Webrev:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Unfortunately patches can not be accepted from systems outside the
>>>>>>>>>>> OpenJDK infrastructure and ...
>>>>>>>>>>>
>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail servers.
>>>>>>>>>>> If the patch is small please include it inline. Otherwise you will
>>>>>>>>>>> need to find an OpenJDK Author who can host it for you on
>>>>>>>>>>> cr.openjdk.java.net.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make default) and
>>>>>>>>>>>> didn't find any regressions.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>>> testing requirements.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks David,
>>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>>
>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>>             }
>>>>>>>>>>             FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>>           }
>>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>>> +      }
>>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics
>>>>>>>>>> ||
>>>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>>>> +      }
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>> +    }
>>>>>>>>>>
>>>>>>>>>>           // some defaults for AMD family 15h
>>>>>>>>>>           if ( cpu_family() == 0x15 ) {
>>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>>           }
>>>>>>>>>>
>>>>>>>>>>       #ifdef COMPILER2
>>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>>             FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>>           }
>>>>>>>>>>       #endif // COMPILER2
>>>>>>>>>> +
>>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores
>>>>>>>>>> for
>>>>>>>>>> Array Copy
>>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy))
>>>>>>>>>> {
>>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>>> +      }
>>>>>>>>>> +      if (supports_sse2() &&
>>>>>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores))
>>>>>>>>>> {
>>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>>> +      }
>>>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions))
>>>>>>>>>> {
>>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>>> +      }
>>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>>> +      }
>>>>>>>>>> +      if (UseSHA) {
>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>>>>>> functions not available on this CPU.");
>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>> +        }
>>>>>>>>>> +      }
>>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>>> +        }
>>>>>>>>>> +      }
>>>>>>>>>> +#endif
>>>>>>>>>> +    }
>>>>>>>>>>         }
>>>>>>>>>>
>>>>>>>>>>         if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>>               result |= CPU_LZCNT;
>>>>>>>>>>             if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>               result |= CPU_SSE4A;
>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>           }
>>>>>>>>>>           // Intel features.
>>>>>>>>>>           if(is_intel()) {
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Rohit
>>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>
>>

Re: RFR: Newer AMD 17h (EPYC) Processor family defaults

David Holmes
Hi Rohit,

I couldn't see a bug filed for this so I did it:

https://bugs.openjdk.java.net/browse/JDK-8187219

I also hosted the webrev as I wanted to see the change in context:

http://cr.openjdk.java.net/~dholmes/8187219/webrev/

I have a couple of comments/queries:

src/cpu/x86/vm/vm_version_x86.hpp

So this moved the adx/bmi2/sha/fma settings out from being Intel
specific to applying to AMD as well - ok. Have these features always
been available in AMD chips? Just wondering if they might not be valid
for some older processors.
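
To make the question concrete, here is a minimal sketch of what the hoisted
checks amount to. The struct, the function and the SKETCH_* constants are
hypothetical stand-ins, not HotSpot's actual _cpuid_info layout or CPU_*
values. Each flag is still gated on its own CPUID bit, so a processor that
does not report a bit would not get the corresponding flag, whatever the
vendor. Whether older AMD parts report these bits reliably is the open
question and is not answered by the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the CPUID bits read in vm_version_x86.hpp.
struct CpuidBits {
  bool adx;
  bool bmi2;
  bool sha;
  bool fma;
};

// Illustrative flag values only; the real CPU_* constants live in HotSpot.
constexpr uint64_t SKETCH_CPU_ADX  = 1u << 0;
constexpr uint64_t SKETCH_CPU_BMI2 = 1u << 1;
constexpr uint64_t SKETCH_CPU_SHA  = 1u << 2;
constexpr uint64_t SKETCH_CPU_FMA  = 1u << 3;

// Vendor-neutral detection: no is_intel()/is_amd() test, only the CPUID bits.
inline uint64_t vendor_neutral_features(const CpuidBits& bits) {
  uint64_t result = 0;
  if (bits.adx)  result |= SKETCH_CPU_ADX;
  if (bits.bmi2) result |= SKETCH_CPU_BMI2;
  if (bits.sha)  result |= SKETCH_CPU_SHA;
  if (bits.fma)  result |= SKETCH_CPU_FMA;
  return result;
}
```

With this shape, a chip that reports only bmi2 and fma ends up with just
those two flags, and a chip that reports none of the bits gets an empty mask.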

You added:

  526       if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
  527         result |= CPU_HT;

and I'm wondering if there would be any case where this would not be
covered by the earlier:

  448     if (threads_per_core() > 1)
  449       result |= CPU_HT;

?
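
One way to frame the overlap, as a sketch with hypothetical names (HtInputs,
ht_feature and SKETCH_CPU_HT are not HotSpot identifiers): both checks OR the
same bit into the result, so the added check is redundant whenever the
existing one has already fired. In this model it can only change the outcome
when the CPUID ht bit is set while the computed threads-per-core value is 1.
Whether that combination occurs on real AMD hardware is not established here.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t SKETCH_CPU_HT = 1u << 0;  // illustrative value only

// Hypothetical inputs mirroring the two conditions quoted above.
struct HtInputs {
  unsigned threads_per_core;  // stand-in for threads_per_core()
  bool cpuid_ht_bit;          // stand-in for std_cpuid1_edx.bits.ht
};

// Computes the HT flag with or without the newly added AMD-side check.
inline uint64_t ht_feature(const HtInputs& in, bool with_added_check) {
  uint64_t result = 0;
  if (in.threads_per_core > 1) {
    result |= SKETCH_CPU_HT;  // existing, vendor-neutral check
  }
  if (with_added_check && in.cpuid_ht_bit) {
    result |= SKETCH_CPU_HT;  // added check; OR-ing the same bit is idempotent
  }
  return result;
}

// True only when the added check changes the resulting feature mask.
inline bool added_check_matters(const HtInputs& in) {
  return ht_feature(in, true) != ht_feature(in, false);
}
```

In this model added_check_matters() is true for exactly one input shape:
threads_per_core == 1 together with cpuid_ht_bit == true.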
---

src/cpu/x86/vm/vm_version_x86.cpp

No comments on AMD specific changes.

Thanks,
David
-----

On 5/09/2017 3:43 PM, David Holmes wrote:

> On 5/09/2017 3:29 PM, Rohit Arul Raj wrote:
>> Hello David,
>>
>> On Tue, Sep 5, 2017 at 10:31 AM, David Holmes
>> <[hidden email]> wrote:
>>> Hi Rohit,
>>>
>>> I was unable to apply your patch to latest jdk10/hs/hotspot repo.
>>>
>>
>> I checked out the latest jdk10/hs/hotspot [parent: 13548:1a9c2e07a826]
>> and was able to apply the patch [epyc-amd17h-defaults-3Sept.patch]
>> without any issues.
>> Can you share the error message that you are getting?
>
> I was getting this:
>
> applying hotspot.patch
> patching file src/cpu/x86/vm/vm_version_x86.cpp
> Hunk #1 FAILED at 1108
> 1 out of 1 hunks FAILED -- saving rejects to file
> src/cpu/x86/vm/vm_version_x86.cpp.rej
> patching file src/cpu/x86/vm/vm_version_x86.hpp
> Hunk #2 FAILED at 522
> 1 out of 2 hunks FAILED -- saving rejects to file
> src/cpu/x86/vm/vm_version_x86.hpp.rej
> abort: patch failed to apply
>
> but I started again and this time it applied fine, so not sure what was
> going on there.
>
> Cheers,
> David
>
>> Regards,
>> Rohit
>>
>>
>>>
>>>
>>> On 4/09/2017 2:42 AM, Rohit Arul Raj wrote:
>>>>
>>>> Hello Vladimir,
>>>>
>>>> On Sat, Sep 2, 2017 at 11:25 PM, Vladimir Kozlov
>>>> <[hidden email]> wrote:
>>>>>
>>>>> Hi Rohit,
>>>>>
>>>>> On 9/2/17 1:16 AM, Rohit Arul Raj wrote:
>>>>>>
>>>>>>
>>>>>> Hello Vladimir,
>>>>>>
>>>>>>> Changes look good. Only question I have is about MaxVectorSize. It is
>>>>>>> set > 16 only in presence of AVX:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l945 
>>>>>>>
>>>>>>>
>>>>>>> Does that code work for AMD 17h too?
>>>>>>
>>>>>>
>>>>>>
>>>>>> Thanks for pointing that out. Yes, the code works fine for AMD
>>>>>> 17h. So
>>>>>> I have removed the surplus check for MaxVectorSize from my patch. I
>>>>>> have updated, re-tested and attached the patch.
>>>>>
>>>>>
>>>>>
>>>>> Which check did you remove?
>>>>>
>>>>
>>>> My older patch had the below mentioned check which was required on
>>>> JDK9 where the default MaxVectorSize was 64. It has been handled
>>>> better in openJDK10. So this check is not required anymore.
>>>>
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> ...
>>>> ...
>>>> +      if (MaxVectorSize > 32) {
>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>> +      }
>>>> ..
>>>> ..
>>>> +      }
>>>>
>>>>>>
>>>>>> I have one query regarding the setting of UseSHA flag:
>>>>>>
>>>>>>
>>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/file/046eab27258f/src/cpu/x86/vm/vm_version_x86.cpp#l821 
>>>>>>
>>>>>>
>>>>>> AMD 17h has support for SHA.
>>>>>> AMD 15h doesn't have  support for SHA. Still "UseSHA" flag gets
>>>>>> enabled for it based on the availability of BMI2 and AVX2. Is
>>>>>> there an
>>>>>> underlying reason for this? I have handled this in the patch but just
>>>>>> wanted to confirm.
>>>>>
>>>>>
>>>>>
>>>>> It was done with next changes which use only AVX2 and BMI2 instructions
>>>>> to calculate SHA-256:
>>>>>
>>>>> http://hg.openjdk.java.net/jdk10/hs/hotspot/rev/6a17c49de974
>>>>>
>>>>> I don't know if AMD 15h supports these instructions and can execute
>>>>> that code. You need to test it.
>>>>>
>>>>
>>>> Ok, got it. Since AMD15h has support for AVX2 and BMI2 instructions,
>>>> it should work.
>>>> Confirmed by running following sanity tests:
>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA1Intrinsics.java
>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA512Intrinsics.java
>>>> ./hotspot/test/compiler/intrinsics/sha/sanity/TestSHA256Intrinsics.java
>>>>
>>>> So I have removed those SHA checks from my patch too.
>>>>
>>>> Please find attached updated, re-tested patch.
>>>>
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>> @@ -1109,11 +1109,27 @@
>>>>        }
>>>>
>>>>    #ifdef COMPILER2
>>>> -    if (MaxVectorSize > 16) {
>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>          FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>        }
>>>>    #endif // COMPILER2
>>>> +
>>>> +    // Some defaults for AMD family 17h
>>>> +    if ( cpu_family() == 0x17 ) {
>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>> +      }
>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>> +      }
>>>> +#ifdef COMPILER2
>>>> +      if (supports_sse4_2() && FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>> +        FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>> +      }
>>>> +#endif
>>>> +    }
>>>>      }
>>>>
>>>>      if( is_intel() ) { // Intel cpus specific settings
>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>> @@ -505,6 +505,14 @@
>>>>          result |= CPU_CLMUL;
>>>>        if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>          result |= CPU_RTM;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> +       result |= CPU_ADX;
>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> +      result |= CPU_BMI2;
>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> +      result |= CPU_SHA;
>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> +      result |= CPU_FMA;
>>>>
>>>>        // AMD features.
>>>>        if (is_amd()) {
>>>> @@ -515,19 +523,13 @@
>>>>            result |= CPU_LZCNT;
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>            result |= CPU_SSE4A;
>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>> +        result |= CPU_HT;
>>>>        }
>>>>        // Intel features.
>>>>        if(is_intel()) {
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>> -         result |= CPU_ADX;
>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>> -        result |= CPU_BMI2;
>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>> -        result |= CPU_SHA;
>>>>          if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>            result |= CPU_LZCNT;
>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>> -        result |= CPU_FMA;
>>>>          // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>          if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>            result |= CPU_3DNOW_PREFETCH;
>>>>
>>>> Please let me know your comments.
>>>>
>>>> Thanks for your time.
>>>> Rohit
>>>>
>>>>>>
>>>>>> Thanks for taking time to review the code.
>>>>>>
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>           }
>>>>>>           FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>         }
>>>>>> +    if (supports_sha()) {
>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>> +      }
>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics ||
>>>>>> UseSHA512Intrinsics) {
>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>> +      }
>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +    }
>>>>>>
>>>>>>         // some defaults for AMD family 15h
>>>>>>         if ( cpu_family() == 0x15 ) {
>>>>>> @@ -1109,11 +1125,40 @@
>>>>>>         }
>>>>>>
>>>>>>     #ifdef COMPILER2
>>>>>> -    if (MaxVectorSize > 16) {
>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>           FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>         }
>>>>>>     #endif // COMPILER2
>>>>>> +
>>>>>> +    // Some defaults for AMD family 17h
>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores
>>>>>> for
>>>>>> Array Copy
>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>> +        FLAG_SET_DEFAULT(UseXMMForArrayCopy, true);
>>>>>> +      }
>>>>>> +      if (supports_sse2() &&
>>>>>> FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>> +        FLAG_SET_DEFAULT(UseUnalignedLoadStores, true);
>>>>>> +      }
>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>> +        FLAG_SET_DEFAULT(UseBMI2Instructions, true);
>>>>>> +      }
>>>>>> +      if (UseSHA) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash
>>>>>> functions not available on this CPU.");
>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#ifdef COMPILER2
>>>>>> +      if (supports_sse4_2()) {
>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>> +        }
>>>>>> +      }
>>>>>> +#endif
>>>>>> +    }
>>>>>>       }
>>>>>>
>>>>>>       if( is_intel() ) { // Intel cpus specific settings
>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>> @@ -505,6 +505,14 @@
>>>>>>           result |= CPU_CLMUL;
>>>>>>         if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>           result |= CPU_RTM;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> +       result |= CPU_ADX;
>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> +      result |= CPU_BMI2;
>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> +      result |= CPU_SHA;
>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> +      result |= CPU_FMA;
>>>>>>
>>>>>>         // AMD features.
>>>>>>         if (is_amd()) {
>>>>>> @@ -515,19 +523,13 @@
>>>>>>             result |= CPU_LZCNT;
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>             result |= CPU_SSE4A;
>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>> +        result |= CPU_HT;
>>>>>>         }
>>>>>>         // Intel features.
>>>>>>         if(is_intel()) {
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>> -         result |= CPU_ADX;
>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>> -        result |= CPU_BMI2;
>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>> -        result |= CPU_SHA;
>>>>>>           if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>             result |= CPU_LZCNT;
>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>> -        result |= CPU_FMA;
>>>>>>           // for Intel, ecx.bits.misalignsse bit (bit 8) indicates
>>>>>> support for prefetchw
>>>>>>           if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>             result |= CPU_3DNOW_PREFETCH;
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Rohit
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 9/1/17 8:04 AM, Rohit Arul Raj wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Sep 1, 2017 at 10:27 AM, Rohit Arul Raj
>>>>>>>> <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Fri, Sep 1, 2017 at 3:01 AM, David Holmes
>>>>>>>>> <[hidden email]>
>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi Rohit,
>>>>>>>>>>
>>>>>>>>>> I think the patch needs updating for jdk10 as I already see a lot of
>>>>>>>>>> logic around UseSHA in vm_version_x86.cpp.
>>>>>>>>>>
>>>>>>>>>> Thanks,
>>>>>>>>>> David
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks David, I will update the patch wrt JDK10 source base,
>>>>>>>>> test and
>>>>>>>>> resubmit for review.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Rohit
>>>>>>>>>
>>>>>>>>
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I have updated the patch wrt openjdk10/hotspot (parent:
>>>>>>>> 13519:71337910df60), did regression testing using jtreg ($make
>>>>>>>> default) and didn't find any regressions.
>>>>>>>>
>>>>>>>> Can anyone please volunteer to review this patch which sets flag/ISA
>>>>>>>> defaults for newer AMD 17h (EPYC) processor?
>>>>>>>>
>>>>>>>> ************************* Patch ****************************
>>>>>>>>
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>> @@ -1088,6 +1088,22 @@
>>>>>>>>            }
>>>>>>>>            FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>          }
>>>>>>>> +    if (supports_sha()) {
>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>> +      }
>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics ||
>>>>>>>> UseSHA256Intrinsics ||
>>>>>>>> UseSHA512Intrinsics) {
>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>> +      }
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +    }
>>>>>>>>
>>>>>>>>          // some defaults for AMD family 15h
>>>>>>>>          if ( cpu_family() == 0x15 ) {
>>>>>>>> @@ -1109,11 +1125,43 @@
>>>>>>>>          }
>>>>>>>>
>>>>>>>>      #ifdef COMPILER2
>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>            FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>          }
>>>>>>>>      #endif // COMPILER2
>>>>>>>> +
>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>> +      }
>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>> +      }
>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>> +      }
>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>> +      }
>>>>>>>> +      if (UseSHA) {
>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>> +        }
>>>>>>>> +      }
>>>>>>>> +#ifdef COMPILER2
>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>> +        }
>>>>>>>> +      }
>>>>>>>> +#endif
>>>>>>>> +    }
>>>>>>>>        }
>>>>>>>>
>>>>>>>>        if( is_intel() ) { // Intel cpus specific settings
>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>> @@ -505,6 +505,14 @@
>>>>>>>>            result |= CPU_CLMUL;
>>>>>>>>          if (_cpuid_info.sef_cpuid7_ebx.bits.rtm != 0)
>>>>>>>>            result |= CPU_RTM;
>>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>> +       result |= CPU_ADX;
>>>>>>>> +    if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>> +      result |= CPU_BMI2;
>>>>>>>> +    if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>> +      result |= CPU_SHA;
>>>>>>>> +    if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>> +      result |= CPU_FMA;
>>>>>>>>
>>>>>>>>          // AMD features.
>>>>>>>>          if (is_amd()) {
>>>>>>>> @@ -515,19 +523,13 @@
>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>              result |= CPU_SSE4A;
>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>> +        result |= CPU_HT;
>>>>>>>>          }
>>>>>>>>          // Intel features.
>>>>>>>>          if(is_intel()) {
>>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>> -         result |= CPU_ADX;
>>>>>>>> -      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>> -        result |= CPU_BMI2;
>>>>>>>> -      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>> -        result |= CPU_SHA;
>>>>>>>>            if(_cpuid_info.ext_cpuid1_ecx.bits.lzcnt_intel != 0)
>>>>>>>>              result |= CPU_LZCNT;
>>>>>>>> -      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>> -        result |= CPU_FMA;
>>>>>>>>            // for Intel, ecx.bits.misalignsse bit (bit 8) indicates support for prefetchw
>>>>>>>>            if (_cpuid_info.ext_cpuid1_ecx.bits.misalignsse != 0) {
>>>>>>>>              result |= CPU_3DNOW_PREFETCH;
>>>>>>>>
>>>>>>>> **************************************************************
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Rohit
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 1/09/2017 1:11 AM, Rohit Arul Raj wrote:
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Aug 31, 2017 at 5:59 PM, David Holmes <[hidden email]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> Hi Rohit,
>>>>>>>>>>>>
>>>>>>>>>>>> On 31/08/2017 7:03 PM, Rohit Arul Raj wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would like a volunteer to review this patch (openJDK9),
>>>>>>>>>>>>> which sets flag/ISA defaults for newer AMD 17h (EPYC)
>>>>>>>>>>>>> processors, and to help us with the commit process.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Webrev: https://www.dropbox.com/sh/08bsxaxupg8kbam/AADurTXLGIZ6C-tiIAi_Glyka?dl=0
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Unfortunately patches cannot be accepted from systems outside
>>>>>>>>>>>> the OpenJDK infrastructure and ...
>>>>>>>>>>>>
>>>>>>>>>>>>> I have also attached the patch (hg diff -g) for reference.
>>>>>>>>>>>>
>>>>>>>>>>>> ... unfortunately patches tend to get stripped by the mail
>>>>>>>>>>>> servers. If the patch is small please include it inline.
>>>>>>>>>>>> Otherwise you will need to find an OpenJDK Author who can host
>>>>>>>>>>>> it for you on cr.openjdk.java.net.
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>>> 3) I have done regression testing using jtreg ($make
>>>>>>>>>>>>> default) and didn't find any regressions.
>>>>>>>>>>>>
>>>>>>>>>>>> Sounds good, but until I see the patch it is hard to comment on
>>>>>>>>>>>> testing requirements.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> David
>>>>>>>>>>>
>>>>>>>>>>> Thanks David,
>>>>>>>>>>> Yes, it's a small patch.
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.cpp b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.cpp
>>>>>>>>>>> @@ -1051,6 +1051,22 @@
>>>>>>>>>>>             }
>>>>>>>>>>>             FLAG_SET_DEFAULT(UseSSE42Intrinsics, false);
>>>>>>>>>>>           }
>>>>>>>>>>> +    if (supports_sha()) {
>>>>>>>>>>> +      if (FLAG_IS_DEFAULT(UseSHA)) {
>>>>>>>>>>> +        FLAG_SET_DEFAULT(UseSHA, true);
>>>>>>>>>>> +      }
>>>>>>>>>>> +    } else if (UseSHA || UseSHA1Intrinsics || UseSHA256Intrinsics || UseSHA512Intrinsics) {
>>>>>>>>>>> +      if (!FLAG_IS_DEFAULT(UseSHA) ||
>>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA1Intrinsics) ||
>>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA256Intrinsics) ||
>>>>>>>>>>> +          !FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>>> +        warning("SHA instructions are not available on this CPU");
>>>>>>>>>>> +      }
>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA, false);
>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA1Intrinsics, false);
>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA256Intrinsics, false);
>>>>>>>>>>> +      FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>>> +    }
>>>>>>>>>>>
>>>>>>>>>>>           // some defaults for AMD family 15h
>>>>>>>>>>>           if ( cpu_family() == 0x15 ) {
>>>>>>>>>>> @@ -1072,11 +1088,43 @@
>>>>>>>>>>>           }
>>>>>>>>>>>
>>>>>>>>>>>       #ifdef COMPILER2
>>>>>>>>>>> -    if (MaxVectorSize > 16) {
>>>>>>>>>>> -      // Limit vectors size to 16 bytes on current AMD cpus.
>>>>>>>>>>> +    if (cpu_family() < 0x17 && MaxVectorSize > 16) {
>>>>>>>>>>> +      // Limit vectors size to 16 bytes on AMD cpus < 17h.
>>>>>>>>>>>             FLAG_SET_DEFAULT(MaxVectorSize, 16);
>>>>>>>>>>>           }
>>>>>>>>>>>       #endif // COMPILER2
>>>>>>>>>>> +
>>>>>>>>>>> +    // Some defaults for AMD family 17h
>>>>>>>>>>> +    if ( cpu_family() == 0x17 ) {
>>>>>>>>>>> +      // On family 17h processors use XMM and UnalignedLoadStores for Array Copy
>>>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseXMMForArrayCopy)) {
>>>>>>>>>>> +        UseXMMForArrayCopy = true;
>>>>>>>>>>> +      }
>>>>>>>>>>> +      if (supports_sse2() && FLAG_IS_DEFAULT(UseUnalignedLoadStores)) {
>>>>>>>>>>> +        UseUnalignedLoadStores = true;
>>>>>>>>>>> +      }
>>>>>>>>>>> +      if (supports_bmi2() && FLAG_IS_DEFAULT(UseBMI2Instructions)) {
>>>>>>>>>>> +        UseBMI2Instructions = true;
>>>>>>>>>>> +      }
>>>>>>>>>>> +      if (MaxVectorSize > 32) {
>>>>>>>>>>> +        FLAG_SET_DEFAULT(MaxVectorSize, 32);
>>>>>>>>>>> +      }
>>>>>>>>>>> +      if (UseSHA) {
>>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseSHA512Intrinsics)) {
>>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>>> +        } else if (UseSHA512Intrinsics) {
>>>>>>>>>>> +          warning("Intrinsics for SHA-384 and SHA-512 crypto hash functions not available on this CPU.");
>>>>>>>>>>> +          FLAG_SET_DEFAULT(UseSHA512Intrinsics, false);
>>>>>>>>>>> +        }
>>>>>>>>>>> +      }
>>>>>>>>>>> +#ifdef COMPILER2
>>>>>>>>>>> +      if (supports_sse4_2()) {
>>>>>>>>>>> +        if (FLAG_IS_DEFAULT(UseFPUForSpilling)) {
>>>>>>>>>>> +          FLAG_SET_DEFAULT(UseFPUForSpilling, true);
>>>>>>>>>>> +        }
>>>>>>>>>>> +      }
>>>>>>>>>>> +#endif
>>>>>>>>>>> +    }
>>>>>>>>>>>         }
>>>>>>>>>>>
>>>>>>>>>>>         if( is_intel() ) { // Intel cpus specific settings
>>>>>>>>>>> diff --git a/src/cpu/x86/vm/vm_version_x86.hpp b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>> --- a/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>> +++ b/src/cpu/x86/vm/vm_version_x86.hpp
>>>>>>>>>>> @@ -513,6 +513,16 @@
>>>>>>>>>>>               result |= CPU_LZCNT;
>>>>>>>>>>>             if (_cpuid_info.ext_cpuid1_ecx.bits.sse4a != 0)
>>>>>>>>>>>               result |= CPU_SSE4A;
>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.bmi2 != 0)
>>>>>>>>>>> +        result |= CPU_BMI2;
>>>>>>>>>>> +      if(_cpuid_info.std_cpuid1_edx.bits.ht != 0)
>>>>>>>>>>> +        result |= CPU_HT;
>>>>>>>>>>> +      if(_cpuid_info.sef_cpuid7_ebx.bits.adx != 0)
>>>>>>>>>>> +        result |= CPU_ADX;
>>>>>>>>>>> +      if (_cpuid_info.sef_cpuid7_ebx.bits.sha != 0)
>>>>>>>>>>> +        result |= CPU_SHA;
>>>>>>>>>>> +      if (_cpuid_info.std_cpuid1_ecx.bits.fma != 0)
>>>>>>>>>>> +        result |= CPU_FMA;
>>>>>>>>>>>           }
>>>>>>>>>>>           // Intel features.
>>>>>>>>>>>           if(is_intel()) {
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Rohit
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>
>>>>>
>>>