RFR(S): 8175096: Use Subword Analysis for set vector size

classic Classic list List threaded Threaded
16 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R

Hi

 

Currently subword types cannot use entire vector width using SLP. 

This fix analyzes the subword in the loop for possibility of narrowing and sets the vector size accordingly.

 

Webrev:

http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/

I have also updated the JBS entry.

https://bugs.openjdk.java.net/browse/JDK-8175096

Would you please review and sponsor it.

 

Regards,

Vivek

 

Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
Hi Vivek,

This should go into jdk 10 since it is enhancement. We will consider to
backport it into jdk 9 update release later.

First, please explain why "Currently subword types cannot use entire
vector width using SLP".

The only explanation I have is that if loop has mixing types operations
then we narrow unroll factor for biggest type:

         if (cur_max_vector < max_vector) {
           max_vector = cur_max_vector;

And we start with smallest type:

int max_vector = Matcher::max_vector_size(T_BYTE);

Should we do opposite and start from long T_LONG (small unroll factor)
and widen it to smallest type (big unroll factor)?:

         if (cur_max_vector > max_vector) {
           max_vector = cur_max_vector;

Note, max_vector_size() returns number of elements and not size of
vector in bytes.

Michael, you are author of this code. What do you think?

Thanks,
Vladimir

On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:

> Hi
>
>
>
> Currently subword types cannot use entire vector width using SLP.
>
> This fix analyzes the subword in the loop for possibility of narrowing
> and sets the vector size accordingly.
>
>
>
> Webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>
> I have also updated the JBS entry.
>
> https://bugs.openjdk.java.net/browse/JDK-8175096
>
> Would you please review and sponsor it.
>
>
>
> Regards,
>
> Vivek
>
>
>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Berg, Michael C
I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.

Regards,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Wednesday, February 22, 2017 12:09 PM
To: Deshpande, Vivek R <[hidden email]>; [hidden email]
Cc: Viswanathan, Sandhya <[hidden email]>; Berg, Michael C <[hidden email]>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Hi Vivek,

This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.

First, please explain why "Currently subword types cannot use entire vector width using SLP".

The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:

         if (cur_max_vector < max_vector) {
           max_vector = cur_max_vector;

And we start with smallest type:

int max_vector = Matcher::max_vector_size(T_BYTE);

Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:

         if (cur_max_vector > max_vector) {
           max_vector = cur_max_vector;

Note, max_vector_size() returns number of elements and not size of vector in bytes.

Michael, you are author of this code. What do you think?

Thanks,
Vladimir

On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:

> Hi
>
>
>
> Currently subword types cannot use entire vector width using SLP.
>
> This fix analyzes the subword in the loop for possibility of narrowing
> and sets the vector size accordingly.
>
>
>
> Webrev:
>
> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>
> I have also updated the JBS entry.
>
> https://bugs.openjdk.java.net/browse/JDK-8175096
>
> Would you please review and sponsor it.
>
>
>
> Regards,
>
> Vivek
>
>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
On 2/23/17 9:03 AM, Berg, Michael C wrote:
> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.

"optimal" has wide meaning :)

Currently it means small number of unrolls in presence of wide (long,
double) values.

And if it is optimal why we need this additional fix (8175096)?

I still did not get answer about why "Currently subword types cannot use
entire vector width using SLP". In what cases this happen?

Thanks,
Vladimir

>
> Regards,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Wednesday, February 22, 2017 12:09 PM
> To: Deshpande, Vivek R <[hidden email]>; [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>; Berg, Michael C <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vivek,
>
> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>
> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>
> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>
>          if (cur_max_vector < max_vector) {
>            max_vector = cur_max_vector;
>
> And we start with smallest type:
>
> int max_vector = Matcher::max_vector_size(T_BYTE);
>
> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>
>          if (cur_max_vector > max_vector) {
>            max_vector = cur_max_vector;
>
> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>
> Michael, you are author of this code. What do you think?
>
> Thanks,
> Vladimir
>
> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>> Hi
>>
>>
>>
>> Currently subword types cannot use entire vector width using SLP.
>>
>> This fix analyzes the subword in the loop for possibility of narrowing
>> and sets the vector size accordingly.
>>
>>
>>
>> Webrev:
>>
>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>
>> I have also updated the JBS entry.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8175096
>>
>> Would you please review and sponsor it.
>>
>>
>>
>> Regards,
>>
>> Vivek
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Berg, Michael C
Vladimir,

The cases for this are for short and byte types stmt expressions, where the memory operations are sized to the key types, but the arithmetic components are not, but instead int as in this case:

if (!in->is_Mem() && in_bb(in) && in->bottom_type()->basic_type() == T_INT)

for the inputs on short and byte typed operations, as the node n is already bound to is_subword_type(bt).

Here is a snippet of code that would show the issue:

    static final int NUM = 1024;
    static byte[] data =  new byte[NUM],
                  data2 = new byte[NUM],
                  data3 = new byte[NUM];
    static short[] data4 =  new short[NUM],
                  data5 = new short[NUM],
                  data6 = new short[NUM];
    static int[] data7 =  new int[NUM],
                  data8 = new int[NUM],
                  data9 = new int[NUM];

    public static double count(long X) {
        long time1, time0 = System.nanoTime();
        doit2(X);
        time1 = System.nanoTime();
        return 1f*X/(time1-time0)*1e9;
    }

    public static void doit2(long X) {
        while (X > 0) {
            for (int i = 0; i < NUM; i++) {
                data[i] = (byte)(data2[i] + data3[i]);
                data4[i] = (short)(data5[i] + data6[i]);
                data7[i] = data8[i] + data9[i];
            }
            X--;
        }
    }

Where main inits the arrays to some set values for use above.  You can even break this it into two cases for each of the byte and short stmt level expressions. We can assert that this is always safe as superword by design does not allow mixed type stmt level expressions where byte and short are intermingled at that level, just the artifacts of boxing the interior components, like arithmetic as int.

Regards,
Michael

-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Friday, February 24, 2017 10:07 AM
To: Berg, Michael C <[hidden email]>; Deshpande, Vivek R <[hidden email]>; [hidden email]
Cc: Viswanathan, Sandhya <[hidden email]>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

On 2/23/17 9:03 AM, Berg, Michael C wrote:
> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.

"optimal" has wide meaning :)

Currently it means small number of unrolls in presence of wide (long,
double) values.

And if it is optimal why we need this additional fix (8175096)?

I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?

Thanks,
Vladimir

>
> Regards,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Wednesday, February 22, 2017 12:09 PM
> To: Deshpande, Vivek R <[hidden email]>;
> [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
> Michael C <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vivek,
>
> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>
> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>
> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>
>          if (cur_max_vector < max_vector) {
>            max_vector = cur_max_vector;
>
> And we start with smallest type:
>
> int max_vector = Matcher::max_vector_size(T_BYTE);
>
> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>
>          if (cur_max_vector > max_vector) {
>            max_vector = cur_max_vector;
>
> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>
> Michael, you are author of this code. What do you think?
>
> Thanks,
> Vladimir
>
> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>> Hi
>>
>>
>>
>> Currently subword types cannot use entire vector width using SLP.
>>
>> This fix analyzes the subword in the loop for possibility of
>> narrowing and sets the vector size accordingly.
>>
>>
>>
>> Webrev:
>>
>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>
>> I have also updated the JBS entry.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8175096
>>
>> Would you please review and sponsor it.
>>
>>
>>
>> Regards,
>>
>> Vivek
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R
In reply to this post by Vladimir Kozlov
Hi Vladimir

Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.

Regards,
Vivek


-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Friday, February 24, 2017 10:07 AM
To: Berg, Michael C; Deshpande, Vivek R; [hidden email]
Cc: Viswanathan, Sandhya
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

On 2/23/17 9:03 AM, Berg, Michael C wrote:
> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.

"optimal" has wide meaning :)

Currently it means small number of unrolls in presence of wide (long,
double) values.

And if it is optimal why we need this additional fix (8175096)?

I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?

Thanks,
Vladimir

>
> Regards,
> Michael
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Wednesday, February 22, 2017 12:09 PM
> To: Deshpande, Vivek R <[hidden email]>;
> [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
> Michael C <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vivek,
>
> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>
> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>
> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>
>          if (cur_max_vector < max_vector) {
>            max_vector = cur_max_vector;
>
> And we start with smallest type:
>
> int max_vector = Matcher::max_vector_size(T_BYTE);
>
> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>
>          if (cur_max_vector > max_vector) {
>            max_vector = cur_max_vector;
>
> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>
> Michael, you are author of this code. What do you think?
>
> Thanks,
> Vladimir
>
> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>> Hi
>>
>>
>>
>> Currently subword types cannot use entire vector width using SLP.
>>
>> This fix analyzes the subword in the loop for possibility of
>> narrowing and sets the vector size accordingly.
>>
>>
>>
>> Webrev:
>>
>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>
>> I have also updated the JBS entry.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8175096
>>
>> Would you please review and sponsor it.
>>
>>
>>
>> Regards,
>>
>> Vivek
>>
>>
>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.

Regards,
Vladimir

On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Friday, February 24, 2017 10:07 AM
> To: Berg, Michael C; Deshpande, Vivek R; [hidden email]
> Cc: Viswanathan, Sandhya
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>
> "optimal" has wide meaning :)
>
> Currently it means small number of unrolls in presence of wide (long,
> double) values.
>
> And if it is optimal why we need this additional fix (8175096)?
>
> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>
> Thanks,
> Vladimir
>
>>
>> Regards,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Wednesday, February 22, 2017 12:09 PM
>> To: Deshpande, Vivek R <[hidden email]>;
>> [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>> Michael C <[hidden email]>
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>>
>> Hi Vivek,
>>
>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>
>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>
>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>
>>          if (cur_max_vector < max_vector) {
>>            max_vector = cur_max_vector;
>>
>> And we start with smallest type:
>>
>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>
>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>
>>          if (cur_max_vector > max_vector) {
>>            max_vector = cur_max_vector;
>>
>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>
>> Michael, you are author of this code. What do you think?
>>
>> Thanks,
>> Vladimir
>>
>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>> Hi
>>>
>>>
>>>
>>> Currently subword types cannot use entire vector width using SLP.
>>>
>>> This fix analyzes the subword in the loop for possibility of
>>> narrowing and sets the vector size accordingly.
>>>
>>>
>>>
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>
>>> I have also updated the JBS entry.
>>>
>>> https://bugs.openjdk.java.net/browse/JDK-8175096
>>>
>>> Would you please review and sponsor it.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Vivek
>>>
>>>
>>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R
Hi Vladimir

Thanks for taking a look at the patch.
I will take a look at your example and work on improving the patch.
Also work on the suggestion of different vector elements counts for different types.

Regards,
Vivek


-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Friday, May 19, 2017 12:17 PM
To: Deshpande, Vivek R; Berg, Michael C; [hidden email]
Cc: Viswanathan, Sandhya
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.

Regards,
Vladimir

On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Friday, February 24, 2017 10:07 AM
> To: Berg, Michael C; Deshpande, Vivek R;
> [hidden email]
> Cc: Viswanathan, Sandhya
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>
> "optimal" has wide meaning :)
>
> Currently it means small number of unrolls in presence of wide (long,
> double) values.
>
> And if it is optimal why we need this additional fix (8175096)?
>
> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>
> Thanks,
> Vladimir
>
>>
>> Regards,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Wednesday, February 22, 2017 12:09 PM
>> To: Deshpande, Vivek R <[hidden email]>;
>> [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>> Michael C <[hidden email]>
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> Hi Vivek,
>>
>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>
>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>
>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>
>>          if (cur_max_vector < max_vector) {
>>            max_vector = cur_max_vector;
>>
>> And we start with smallest type:
>>
>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>
>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>
>>          if (cur_max_vector > max_vector) {
>>            max_vector = cur_max_vector;
>>
>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>
>> Michael, you are author of this code. What do you think?
>>
>> Thanks,
>> Vladimir
>>
>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>> Hi
>>>
>>>
>>>
>>> Currently subword types cannot use entire vector width using SLP.
>>>
>>> This fix analyzes the subword in the loop for possibility of
>>> narrowing and sets the vector size accordingly.
>>>
>>>
>>>
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>
>>> I have also updated the JBS entry.
>>>
>>>c
>>>
>>> Would you please review and sponsor it.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Vivek
>>>
>>>
>>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R
Hi Vladimir

I have updated the patch.
With this update the generated code for the main loop is unrolled and uses full vector width.
Updated Webrev:
http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
Would you please take a look and give your feedback/ suggestions.

Regards,
Vivek

-----Original Message-----
From: hotspot-compiler-dev [mailto:[hidden email]] On Behalf Of Deshpande, Vivek R
Sent: Friday, May 19, 2017 1:28 PM
To: Vladimir Kozlov <[hidden email]>; Berg, Michael C <[hidden email]>; [hidden email]
Cc: Viswanathan, Sandhya <[hidden email]>
Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Hi Vladimir

Thanks for taking a look at the patch.
I will take a look at your example and work on improving the patch.
Also work on the suggestion of different vector elements counts for different types.

Regards,
Vivek


-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Friday, May 19, 2017 12:17 PM
To: Deshpande, Vivek R; Berg, Michael C; [hidden email]
Cc: Viswanathan, Sandhya
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.

Regards,
Vladimir

On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Friday, February 24, 2017 10:07 AM
> To: Berg, Michael C; Deshpande, Vivek R;
> [hidden email]
> Cc: Viswanathan, Sandhya
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>
> "optimal" has wide meaning :)
>
> Currently it means small number of unrolls in presence of wide (long,
> double) values.
>
> And if it is optimal why we need this additional fix (8175096)?
>
> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>
> Thanks,
> Vladimir
>
>>
>> Regards,
>> Michael
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Wednesday, February 22, 2017 12:09 PM
>> To: Deshpande, Vivek R <[hidden email]>;
>> [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>> Michael C <[hidden email]>
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> Hi Vivek,
>>
>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>
>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>
>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>
>>          if (cur_max_vector < max_vector) {
>>            max_vector = cur_max_vector;
>>
>> And we start with smallest type:
>>
>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>
>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>
>>          if (cur_max_vector > max_vector) {
>>            max_vector = cur_max_vector;
>>
>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>
>> Michael, you are author of this code. What do you think?
>>
>> Thanks,
>> Vladimir
>>
>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>> Hi
>>>
>>>
>>>
>>> Currently subword types cannot use entire vector width using SLP.
>>>
>>> This fix analyzes the subword in the loop for possibility of
>>> narrowing and sets the vector size accordingly.
>>>
>>>
>>>
>>> Webrev:
>>>
>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>
>>> I have also updated the JBS entry.
>>>
>>>c
>>>
>>> Would you please review and sponsor it.
>>>
>>>
>>>
>>> Regards,
>>>
>>> Vivek
>>>
>>>
>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
Now I am confused. Did you changed machine?
Originally vector size in code was 512 for integer arrays:

20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)

Latest code show that it is limited to 256:

143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes)
14a vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I

Why vectors are smaller?

And suggestion. Please, modify the test in the bug report to have
separate test methods which only operates on byte[] and short[] arrays
and their combination. It is to see the the change fixed limiting vector
size for sub-integer types.

Thanks,
Vladimir

On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> I have updated the patch.
> With this update the generated code for the main loop is unrolled and uses full vector width.
> Updated Webrev:
> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
> Would you please take a look and give your feedback/ suggestions.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: hotspot-compiler-dev [mailto:[hidden email]] On Behalf Of Deshpande, Vivek R
> Sent: Friday, May 19, 2017 1:28 PM
> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C <[hidden email]>; [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vladimir
>
> Thanks for taking a look at the patch.
> I will take a look at your example and work on improving the patch.
> Also work on the suggestion of different vector elements counts for different types.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Friday, May 19, 2017 12:17 PM
> To: Deshpande, Vivek R; Berg, Michael C; [hidden email]
> Cc: Viswanathan, Sandhya
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>
> Regards,
> Vladimir
>
> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>
>> Regards,
>> Vivek
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Friday, February 24, 2017 10:07 AM
>> To: Berg, Michael C; Deshpande, Vivek R;
>> [hidden email]
>> Cc: Viswanathan, Sandhya
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>>
>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>
>> "optimal" has wide meaning :)
>>
>> Currently it means small number of unrolls in presence of wide (long,
>> double) values.
>>
>> And if it is optimal why we need this additional fix (8175096)?
>>
>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Regards,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>> To: Deshpande, Vivek R <[hidden email]>;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>> Michael C <[hidden email]>
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> Hi Vivek,
>>>
>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>
>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>
>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>
>>>           if (cur_max_vector < max_vector) {
>>>             max_vector = cur_max_vector;
>>>
>>> And we start with smallest type:
>>>
>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>
>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>
>>>           if (cur_max_vector > max_vector) {
>>>             max_vector = cur_max_vector;
>>>
>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>
>>> Michael, you are author of this code. What do you think?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>> Hi
>>>>
>>>>
>>>>
>>>> Currently subword types cannot use entire vector width using SLP.
>>>>
>>>> This fix analyzes the subword in the loop for possibility of
>>>> narrowing and sets the vector size accordingly.
>>>>
>>>>
>>>>
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>
>>>> I have also updated the JBS entry.
>>>>
>>>> c
>>>>
>>>> Would you please review and sponsor it.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Vivek
>>>>
>>>>
>>>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R
Hi Vladimir

I generated that code on my desktop.
The generated code on the skylake server looks like this,
 and I have updated the bug with full main loop generated on skylake server and tests:

1d0   B31: #    B31 B32 <- B30 B31      Loop: B31-B31 inner main of N161 Freq: 1024.97
1d0     movslq  R11, RBP        # i2l
1d3     vmovdqul XMM1 k0,[RBX + #16 + R11 << #2]        ! load vector (64 bytes)
1de     vpaddd  XMM1,XMM1,[R9 + #16 + R11 << #2]        ! add packed16I
1e9     movdq   R10, XMM0       # spill
1ee     vmovdqul XMM2 k0,[R10 + #16 + R11]      ! load vector (64 bytes)
1f9     vpaddb  XMM2,XMM2,[R13 + #16 + R11]     ! add packed64B
204     vmovdqul [R14 + #16 + R11] k0,XMM2      ! store vector (64 bytes)
20f     vmovdqul XMM2 k0,[RDX + #80 + R11 << #1]        ! load vector (64 bytes)
21a     vmovdqul XMM3 k0,[R10 + #80 + R11]      ! load vector (64 bytes)
225     vpaddb  XMM3,XMM3,[R13 + #80 + R11]     ! add packed64B
230     vmovdqul [R14 + #80 + R11] k0,XMM3      ! store vector (64 bytes)
23b     vmovdqul XMM3 k0,[RAX + #80 + R11 << #1]        ! load vector (64 bytes)
246     vmovdqul XMM4 k0,[R10 + #144 + R11]     ! load vector (64 bytes)
251     vpaddb  XMM4,XMM4,[R13 + #144 + R11]    ! add packed64B
25c     vmovdqul [R14 + #144 + R11] k0,XMM4     ! store vector (64 bytes)
267     vmovdqul XMM4 k0,[RBX + #80 + R11 << #2]        ! load vector (64 bytes)
272     vmovdqul XMM5 k0,[R10 + #208 + R11]     ! load vector (64 bytes)
27d     vpaddb  XMM5,XMM5,[R13 + #208 + R11]    ! add packed64B
288     vmovdqul [R14 + #208 + R11] k0,XMM5     ! store vector (64 bytes)
293     vmovdqul XMM5 k0,[RDX + #16 + R11 << #1]        ! load vector (64 bytes)
29e     vpaddw  XMM5,XMM5,[RAX + #16 + R11 << #1]       ! add packed32S


Please let me know your thoughts.

Regards,
Vivek
-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Monday, July 17, 2017 12:44 PM
To: Deshpande, Vivek R <[hidden email]>; [hidden email]
Cc: Viswanathan, Sandhya <[hidden email]>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Now I am confused. Did you changed machine?
Originally vector size in code was 512 for integer arrays:

20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)

Latest code show that it is limited to 256:

143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I

Why vectors are smaller?

And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.

Thanks,
Vladimir

On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> I have updated the patch.
> With this update the generated code for the main loop is unrolled and uses full vector width.
> Updated Webrev:
> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
> Would you please take a look and give your feedback/ suggestions.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: hotspot-compiler-dev
> [mailto:[hidden email]] On Behalf Of
> Deshpande, Vivek R
> Sent: Friday, May 19, 2017 1:28 PM
> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C
> <[hidden email]>; [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Hi Vladimir
>
> Thanks for taking a look at the patch.
> I will take a look at your example and work on improving the patch.
> Also work on the suggestion of different vector elements counts for different types.
>
> Regards,
> Vivek
>
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Friday, May 19, 2017 12:17 PM
> To: Deshpande, Vivek R; Berg, Michael C;
> [hidden email]
> Cc: Viswanathan, Sandhya
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>
> Regards,
> Vladimir
>
> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>
>> Regards,
>> Vivek
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Friday, February 24, 2017 10:07 AM
>> To: Berg, Michael C; Deshpande, Vivek R;
>> [hidden email]
>> Cc: Viswanathan, Sandhya
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>
>> "optimal" has wide meaning :)
>>
>> Currently it means small number of unrolls in presence of wide (long,
>> double) values.
>>
>> And if it is optimal why we need this additional fix (8175096)?
>>
>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>
>> Thanks,
>> Vladimir
>>
>>>
>>> Regards,
>>> Michael
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>> To: Deshpande, Vivek R <[hidden email]>;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>> Michael C <[hidden email]>
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> Hi Vivek,
>>>
>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>
>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>
>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>
>>>           if (cur_max_vector < max_vector) {
>>>             max_vector = cur_max_vector;
>>>
>>> And we start with smallest type:
>>>
>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>
>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>
>>>           if (cur_max_vector > max_vector) {
>>>             max_vector = cur_max_vector;
>>>
>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>
>>> Michael, you are author of this code. What do you think?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>> Hi
>>>>
>>>>
>>>>
>>>> Currently subword types cannot use entire vector width using SLP.
>>>>
>>>> This fix analyzes the subword in the loop for possibility of
>>>> narrowing and sets the vector size accordingly.
>>>>
>>>>
>>>>
>>>> Webrev:
>>>>
>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>
>>>> I have also updated the JBS entry.
>>>>
>>>> c
>>>>
>>>> Would you please review and sponsor it.
>>>>
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Vivek
>>>>
>>>>
>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
Good. Can you also add to the bug report code generated for byte[] and
short[] arrays tests?

Thanks,
Vladimir

On 7/18/17 11:50 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> I generated that code on my desktop.
> The generated code on the skylake server looks like this,
>   and I have updated the bug with full main loop generated on skylake server and tests:
>
> 1d0   B31: #    B31 B32 <- B30 B31      Loop: B31-B31 inner main of N161 Freq: 1024.97
> 1d0     movslq  R11, RBP        # i2l
> 1d3     vmovdqul XMM1 k0,[RBX + #16 + R11 << #2]        ! load vector (64 bytes)
> 1de     vpaddd  XMM1,XMM1,[R9 + #16 + R11 << #2]        ! add packed16I
> 1e9     movdq   R10, XMM0       # spill
> 1ee     vmovdqul XMM2 k0,[R10 + #16 + R11]      ! load vector (64 bytes)
> 1f9     vpaddb  XMM2,XMM2,[R13 + #16 + R11]     ! add packed64B
> 204     vmovdqul [R14 + #16 + R11] k0,XMM2      ! store vector (64 bytes)
> 20f     vmovdqul XMM2 k0,[RDX + #80 + R11 << #1]        ! load vector (64 bytes)
> 21a     vmovdqul XMM3 k0,[R10 + #80 + R11]      ! load vector (64 bytes)
> 225     vpaddb  XMM3,XMM3,[R13 + #80 + R11]     ! add packed64B
> 230     vmovdqul [R14 + #80 + R11] k0,XMM3      ! store vector (64 bytes)
> 23b     vmovdqul XMM3 k0,[RAX + #80 + R11 << #1]        ! load vector (64 bytes)
> 246     vmovdqul XMM4 k0,[R10 + #144 + R11]     ! load vector (64 bytes)
> 251     vpaddb  XMM4,XMM4,[R13 + #144 + R11]    ! add packed64B
> 25c     vmovdqul [R14 + #144 + R11] k0,XMM4     ! store vector (64 bytes)
> 267     vmovdqul XMM4 k0,[RBX + #80 + R11 << #2]        ! load vector (64 bytes)
> 272     vmovdqul XMM5 k0,[R10 + #208 + R11]     ! load vector (64 bytes)
> 27d     vpaddb  XMM5,XMM5,[R13 + #208 + R11]    ! add packed64B
> 288     vmovdqul [R14 + #208 + R11] k0,XMM5     ! store vector (64 bytes)
> 293     vmovdqul XMM5 k0,[RDX + #16 + R11 << #1]        ! load vector (64 bytes)
> 29e     vpaddw  XMM5,XMM5,[RAX + #16 + R11 << #1]       ! add packed32S
>
>
> Please let me know your thoughts.
>
> Regards,
> Vivek
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Monday, July 17, 2017 12:44 PM
> To: Deshpande, Vivek R <[hidden email]>; [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Now I am confused. Did you changed machine?
> Originally vector size in code was 512 for integer arrays:
>
> 20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
> 217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
> 222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)
>
> Latest code show that it is limited to 256:
>
> 143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I
>
> Why vectors are smaller?
>
> And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.
>
> Thanks,
> Vladimir
>
> On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> I have updated the patch.
>> With this update the generated code for the main loop is unrolled and uses full vector width.
>> Updated Webrev:
>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
>> Would you please take a look and give your feedback/ suggestions.
>>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> [mailto:[hidden email]] On Behalf Of
>> Deshpande, Vivek R
>> Sent: Friday, May 19, 2017 1:28 PM
>> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C
>> <[hidden email]>; [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>
>> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector size
>>
>> Hi Vladimir
>>
>> Thanks for taking a look at the patch.
>> I will take a look at your example and work on improving the patch.
>> Also work on the suggestion of different vector elements counts for different types.
>>
>> Regards,
>> Vivek
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Friday, May 19, 2017 12:17 PM
>> To: Deshpande, Vivek R; Berg, Michael C;
>> [hidden email]
>> Cc: Viswanathan, Sandhya
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>>
>> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>>
>> Regards,
>> Vladimir
>>
>> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>>> Hi Vladimir
>>>
>>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>>
>>> Regards,
>>> Vivek
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Friday, February 24, 2017 10:07 AM
>>> To: Berg, Michael C; Deshpande, Vivek R;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>>
>>> "optimal" has wide meaning :)
>>>
>>> Currently it means small number of unrolls in presence of wide (long,
>>> double) values.
>>>
>>> And if it is optimal why we need this additional fix (8175096)?
>>>
>>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>>> To: Deshpande, Vivek R <[hidden email]>;
>>>> [hidden email]
>>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>>> Michael C <[hidden email]>
>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>> size
>>>>
>>>> Hi Vivek,
>>>>
>>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>>
>>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>>
>>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>>
>>>>            if (cur_max_vector < max_vector) {
>>>>              max_vector = cur_max_vector;
>>>>
>>>> And we start with smallest type:
>>>>
>>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>>
>>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>>
>>>>            if (cur_max_vector > max_vector) {
>>>>              max_vector = cur_max_vector;
>>>>
>>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>>
>>>> Michael, you are author of this code. What do you think?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>>> Hi
>>>>>
>>>>>
>>>>>
>>>>> Currently subword types cannot use entire vector width using SLP.
>>>>>
>>>>> This fix analyzes the subword in the loop for possibility of
>>>>> narrowing and sets the vector size accordingly.
>>>>>
>>>>>
>>>>>
>>>>> Webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>>
>>>>> I have also updated the JBS entry.
>>>>>
>>>>> c
>>>>>
>>>>> Would you please review and sponsor it.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Vivek
>>>>>
>>>>>
>>>>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R
HI Vladimir

I have updated the bug with code generated for byte[] and short[].
Thanks for taking a look at it.

Regards,
Vivek

-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Tuesday, July 18, 2017 12:35 PM
To: Deshpande, Vivek R <[hidden email]>; [hidden email]
Cc: Viswanathan, Sandhya <[hidden email]>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Good. Can you also add to the bug report code generated for byte[] and short[] arrays tests?

Thanks,
Vladimir

On 7/18/17 11:50 AM, Deshpande, Vivek R wrote:

> Hi Vladimir
>
> I generated that code on my desktop.
> The generated code on the skylake server looks like this,
>   and I have updated the bug with full main loop generated on skylake server and tests:
>
> 1d0   B31: #    B31 B32 <- B30 B31      Loop: B31-B31 inner main of N161 Freq: 1024.97
> 1d0     movslq  R11, RBP        # i2l
> 1d3     vmovdqul XMM1 k0,[RBX + #16 + R11 << #2]        ! load vector (64 bytes)
> 1de     vpaddd  XMM1,XMM1,[R9 + #16 + R11 << #2]        ! add packed16I
> 1e9     movdq   R10, XMM0       # spill
> 1ee     vmovdqul XMM2 k0,[R10 + #16 + R11]      ! load vector (64 bytes)
> 1f9     vpaddb  XMM2,XMM2,[R13 + #16 + R11]     ! add packed64B
> 204     vmovdqul [R14 + #16 + R11] k0,XMM2      ! store vector (64 bytes)
> 20f     vmovdqul XMM2 k0,[RDX + #80 + R11 << #1]        ! load vector (64 bytes)
> 21a     vmovdqul XMM3 k0,[R10 + #80 + R11]      ! load vector (64 bytes)
> 225     vpaddb  XMM3,XMM3,[R13 + #80 + R11]     ! add packed64B
> 230     vmovdqul [R14 + #80 + R11] k0,XMM3      ! store vector (64 bytes)
> 23b     vmovdqul XMM3 k0,[RAX + #80 + R11 << #1]        ! load vector (64 bytes)
> 246     vmovdqul XMM4 k0,[R10 + #144 + R11]     ! load vector (64 bytes)
> 251     vpaddb  XMM4,XMM4,[R13 + #144 + R11]    ! add packed64B
> 25c     vmovdqul [R14 + #144 + R11] k0,XMM4     ! store vector (64 bytes)
> 267     vmovdqul XMM4 k0,[RBX + #80 + R11 << #2]        ! load vector (64 bytes)
> 272     vmovdqul XMM5 k0,[R10 + #208 + R11]     ! load vector (64 bytes)
> 27d     vpaddb  XMM5,XMM5,[R13 + #208 + R11]    ! add packed64B
> 288     vmovdqul [R14 + #208 + R11] k0,XMM5     ! store vector (64 bytes)
> 293     vmovdqul XMM5 k0,[RDX + #16 + R11 << #1]        ! load vector (64 bytes)
> 29e     vpaddw  XMM5,XMM5,[RAX + #16 + R11 << #1]       ! add packed32S
>
>
> Please let me know your thoughts.
>
> Regards,
> Vivek
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Monday, July 17, 2017 12:44 PM
> To: Deshpande, Vivek R <[hidden email]>;
> [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Now I am confused. Did you changed machine?
> Originally vector size in code was 512 for integer arrays:
>
> 20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
> 217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
> 222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)
>
> Latest code show that it is limited to 256:
>
> 143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a
> vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I
>
> Why vectors are smaller?
>
> And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.
>
> Thanks,
> Vladimir
>
> On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> I have updated the patch.
>> With this update the generated code for the main loop is unrolled and uses full vector width.
>> Updated Webrev:
>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
>> Would you please take a look and give your feedback/ suggestions.
>>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: hotspot-compiler-dev
>> [mailto:[hidden email]] On Behalf Of
>> Deshpande, Vivek R
>> Sent: Friday, May 19, 2017 1:28 PM
>> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C
>> <[hidden email]>; [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>
>> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> Hi Vladimir
>>
>> Thanks for taking a look at the patch.
>> I will take a look at your example and work on improving the patch.
>> Also work on the suggestion of different vector elements counts for different types.
>>
>> Regards,
>> Vivek
>>
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Friday, May 19, 2017 12:17 PM
>> To: Deshpande, Vivek R; Berg, Michael C;
>> [hidden email]
>> Cc: Viswanathan, Sandhya
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>>
>> Regards,
>> Vladimir
>>
>> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>>> Hi Vladimir
>>>
>>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>>
>>> Regards,
>>> Vivek
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Friday, February 24, 2017 10:07 AM
>>> To: Berg, Michael C; Deshpande, Vivek R;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>>
>>> "optimal" has wide meaning :)
>>>
>>> Currently it means small number of unrolls in presence of wide
>>> (long,
>>> double) values.
>>>
>>> And if it is optimal why we need this additional fix (8175096)?
>>>
>>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>>
>>> Thanks,
>>> Vladimir
>>>
>>>>
>>>> Regards,
>>>> Michael
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>>> To: Deshpande, Vivek R <[hidden email]>;
>>>> [hidden email]
>>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>>> Michael C <[hidden email]>
>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>> size
>>>>
>>>> Hi Vivek,
>>>>
>>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>>
>>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>>
>>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>>
>>>>            if (cur_max_vector < max_vector) {
>>>>              max_vector = cur_max_vector;
>>>>
>>>> And we start with smallest type:
>>>>
>>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>>
>>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>>
>>>>            if (cur_max_vector > max_vector) {
>>>>              max_vector = cur_max_vector;
>>>>
>>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>>
>>>> Michael, you are author of this code. What do you think?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>>> Hi
>>>>>
>>>>>
>>>>>
>>>>> Currently subword types cannot use entire vector width using SLP.
>>>>>
>>>>> This fix analyzes the subword in the loop for possibility of
>>>>> narrowing and sets the vector size accordingly.
>>>>>
>>>>>
>>>>>
>>>>> Webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>>
>>>>> I have also updated the JBS entry.
>>>>>
>>>>> c
>>>>>
>>>>> Would you please review and sponsor it.
>>>>>
>>>>>
>>>>>
>>>>> Regards,
>>>>>
>>>>> Vivek
>>>>>
>>>>>
>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
Perfect.

I will push it now.

Thanks,
Vladimir

On 7/18/17 1:44 PM, Deshpande, Vivek R wrote:

> HI Vladimir
>
> I have updated the bug with code generated for byte[] and short[].
> Thanks for taking a look at it.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Tuesday, July 18, 2017 12:35 PM
> To: Deshpande, Vivek R <[hidden email]>; [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Good. Can you also add to the bug report code generated for byte[] and short[] arrays tests?
>
> Thanks,
> Vladimir
>
> On 7/18/17 11:50 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> I generated that code on my desktop.
>> The generated code on the skylake server looks like this,
>>    and I have updated the bug with full main loop generated on skylake server and tests:
>>
>> 1d0   B31: #    B31 B32 <- B30 B31      Loop: B31-B31 inner main of N161 Freq: 1024.97
>> 1d0     movslq  R11, RBP        # i2l
>> 1d3     vmovdqul XMM1 k0,[RBX + #16 + R11 << #2]        ! load vector (64 bytes)
>> 1de     vpaddd  XMM1,XMM1,[R9 + #16 + R11 << #2]        ! add packed16I
>> 1e9     movdq   R10, XMM0       # spill
>> 1ee     vmovdqul XMM2 k0,[R10 + #16 + R11]      ! load vector (64 bytes)
>> 1f9     vpaddb  XMM2,XMM2,[R13 + #16 + R11]     ! add packed64B
>> 204     vmovdqul [R14 + #16 + R11] k0,XMM2      ! store vector (64 bytes)
>> 20f     vmovdqul XMM2 k0,[RDX + #80 + R11 << #1]        ! load vector (64 bytes)
>> 21a     vmovdqul XMM3 k0,[R10 + #80 + R11]      ! load vector (64 bytes)
>> 225     vpaddb  XMM3,XMM3,[R13 + #80 + R11]     ! add packed64B
>> 230     vmovdqul [R14 + #80 + R11] k0,XMM3      ! store vector (64 bytes)
>> 23b     vmovdqul XMM3 k0,[RAX + #80 + R11 << #1]        ! load vector (64 bytes)
>> 246     vmovdqul XMM4 k0,[R10 + #144 + R11]     ! load vector (64 bytes)
>> 251     vpaddb  XMM4,XMM4,[R13 + #144 + R11]    ! add packed64B
>> 25c     vmovdqul [R14 + #144 + R11] k0,XMM4     ! store vector (64 bytes)
>> 267     vmovdqul XMM4 k0,[RBX + #80 + R11 << #2]        ! load vector (64 bytes)
>> 272     vmovdqul XMM5 k0,[R10 + #208 + R11]     ! load vector (64 bytes)
>> 27d     vpaddb  XMM5,XMM5,[R13 + #208 + R11]    ! add packed64B
>> 288     vmovdqul [R14 + #208 + R11] k0,XMM5     ! store vector (64 bytes)
>> 293     vmovdqul XMM5 k0,[RDX + #16 + R11 << #1]        ! load vector (64 bytes)
>> 29e     vpaddw  XMM5,XMM5,[RAX + #16 + R11 << #1]       ! add packed32S
>>
>>
>> Please let me know your thoughts.
>>
>> Regards,
>> Vivek
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Monday, July 17, 2017 12:44 PM
>> To: Deshpande, Vivek R <[hidden email]>;
>> [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>>
>> Now I am confused. Did you changed machine?
>> Originally vector size in code was 512 for integer arrays:
>>
>> 20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
>> 217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
>> 222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)
>>
>> Latest code show that it is limited to 256:
>>
>> 143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a
>> vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I
>>
>> Why vectors are smaller?
>>
>> And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.
>>
>> Thanks,
>> Vladimir
>>
>> On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:
>>> Hi Vladimir
>>>
>>> I have updated the patch.
>>> With this update the generated code for the main loop is unrolled and uses full vector width.
>>> Updated Webrev:
>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
>>> Would you please take a look and give your feedback/ suggestions.
>>>
>>> Regards,
>>> Vivek
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:[hidden email]] On Behalf Of
>>> Deshpande, Vivek R
>>> Sent: Friday, May 19, 2017 1:28 PM
>>> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C
>>> <[hidden email]>; [hidden email]
>>> Cc: Viswanathan, Sandhya <[hidden email]>
>>> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> Hi Vladimir
>>>
>>> Thanks for taking a look at the patch.
>>> I will take a look at your example and work on improving the patch.
>>> Also work on the suggestion of different vector elements counts for different types.
>>>
>>> Regards,
>>> Vivek
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Friday, May 19, 2017 12:17 PM
>>> To: Deshpande, Vivek R; Berg, Michael C;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>>>> Hi Vladimir
>>>>
>>>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>>>
>>>> Regards,
>>>> Vivek
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>> Sent: Friday, February 24, 2017 10:07 AM
>>>> To: Berg, Michael C; Deshpande, Vivek R;
>>>> [hidden email]
>>>> Cc: Viswanathan, Sandhya
>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>> size
>>>>
>>>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>>>
>>>> "optimal" has wide meaning :)
>>>>
>>>> Currently it means small number of unrolls in presence of wide
>>>> (long,
>>>> double) values.
>>>>
>>>> And if it is optimal why we need this additional fix (8175096)?
>>>>
>>>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>>>> To: Deshpande, Vivek R <[hidden email]>;
>>>>> [hidden email]
>>>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>>>> Michael C <[hidden email]>
>>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>>> size
>>>>>
>>>>> Hi Vivek,
>>>>>
>>>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>>>
>>>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>>>
>>>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>>>
>>>>>             if (cur_max_vector < max_vector) {
>>>>>               max_vector = cur_max_vector;
>>>>>
>>>>> And we start with smallest type:
>>>>>
>>>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>>>
>>>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>>>
>>>>>             if (cur_max_vector > max_vector) {
>>>>>               max_vector = cur_max_vector;
>>>>>
>>>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>>>
>>>>> Michael, you are author of this code. What do you think?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>>>> Hi
>>>>>>
>>>>>>
>>>>>>
>>>>>> Currently subword types cannot use entire vector width using SLP.
>>>>>>
>>>>>> This fix analyzes the subword in the loop for possibility of
>>>>>> narrowing and sets the vector size accordingly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>>>
>>>>>> I have also updated the JBS entry.
>>>>>>
>>>>>> c
>>>>>>
>>>>>> Would you please review and sponsor it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>>
>>>>>>
Reply | Threaded
Open this post in threaded view
|

RE: RFR(S): 8175096: Use Subword Analysis for set vector size

Deshpande, Vivek R
Thanks Vladimir.

Regards,
Vivek

-----Original Message-----
From: Vladimir Kozlov [mailto:[hidden email]]
Sent: Tuesday, July 18, 2017 1:47 PM
To: Deshpande, Vivek R <[hidden email]>; [hidden email]
Cc: Viswanathan, Sandhya <[hidden email]>
Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Perfect.

I will push it now.

Thanks,
Vladimir

On 7/18/17 1:44 PM, Deshpande, Vivek R wrote:

> HI Vladimir
>
> I have updated the bug with code generated for byte[] and short[].
> Thanks for taking a look at it.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Tuesday, July 18, 2017 12:35 PM
> To: Deshpande, Vivek R <[hidden email]>;
> [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Good. Can you also add to the bug report code generated for byte[] and short[] arrays tests?
>
> Thanks,
> Vladimir
>
> On 7/18/17 11:50 AM, Deshpande, Vivek R wrote:
>> Hi Vladimir
>>
>> I generated that code on my desktop.
>> The generated code on the skylake server looks like this,
>>    and I have updated the bug with full main loop generated on skylake server and tests:
>>
>> 1d0   B31: #    B31 B32 <- B30 B31      Loop: B31-B31 inner main of N161 Freq: 1024.97
>> 1d0     movslq  R11, RBP        # i2l
>> 1d3     vmovdqul XMM1 k0,[RBX + #16 + R11 << #2]        ! load vector (64 bytes)
>> 1de     vpaddd  XMM1,XMM1,[R9 + #16 + R11 << #2]        ! add packed16I
>> 1e9     movdq   R10, XMM0       # spill
>> 1ee     vmovdqul XMM2 k0,[R10 + #16 + R11]      ! load vector (64 bytes)
>> 1f9     vpaddb  XMM2,XMM2,[R13 + #16 + R11]     ! add packed64B
>> 204     vmovdqul [R14 + #16 + R11] k0,XMM2      ! store vector (64 bytes)
>> 20f     vmovdqul XMM2 k0,[RDX + #80 + R11 << #1]        ! load vector (64 bytes)
>> 21a     vmovdqul XMM3 k0,[R10 + #80 + R11]      ! load vector (64 bytes)
>> 225     vpaddb  XMM3,XMM3,[R13 + #80 + R11]     ! add packed64B
>> 230     vmovdqul [R14 + #80 + R11] k0,XMM3      ! store vector (64 bytes)
>> 23b     vmovdqul XMM3 k0,[RAX + #80 + R11 << #1]        ! load vector (64 bytes)
>> 246     vmovdqul XMM4 k0,[R10 + #144 + R11]     ! load vector (64 bytes)
>> 251     vpaddb  XMM4,XMM4,[R13 + #144 + R11]    ! add packed64B
>> 25c     vmovdqul [R14 + #144 + R11] k0,XMM4     ! store vector (64 bytes)
>> 267     vmovdqul XMM4 k0,[RBX + #80 + R11 << #2]        ! load vector (64 bytes)
>> 272     vmovdqul XMM5 k0,[R10 + #208 + R11]     ! load vector (64 bytes)
>> 27d     vpaddb  XMM5,XMM5,[R13 + #208 + R11]    ! add packed64B
>> 288     vmovdqul [R14 + #208 + R11] k0,XMM5     ! store vector (64 bytes)
>> 293     vmovdqul XMM5 k0,[RDX + #16 + R11 << #1]        ! load vector (64 bytes)
>> 29e     vpaddw  XMM5,XMM5,[RAX + #16 + R11 << #1]       ! add packed32S
>>
>>
>> Please let me know your thoughts.
>>
>> Regards,
>> Vivek
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Monday, July 17, 2017 12:44 PM
>> To: Deshpande, Vivek R <[hidden email]>;
>> [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>> size
>>
>> Now I am confused. Did you changed machine?
>> Originally vector size in code was 512 for integer arrays:
>>
>> 20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
>> 217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
>> 222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)
>>
>> Latest code show that it is limited to 256:
>>
>> 143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a
>> vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I
>>
>> Why vectors are smaller?
>>
>> And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.
>>
>> Thanks,
>> Vladimir
>>
>> On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:
>>> Hi Vladimir
>>>
>>> I have updated the patch.
>>> With this update the generated code for the main loop is unrolled and uses full vector width.
>>> Updated Webrev:
>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
>>> Would you please take a look and give your feedback/ suggestions.
>>>
>>> Regards,
>>> Vivek
>>>
>>> -----Original Message-----
>>> From: hotspot-compiler-dev
>>> [mailto:[hidden email]] On Behalf Of
>>> Deshpande, Vivek R
>>> Sent: Friday, May 19, 2017 1:28 PM
>>> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C
>>> <[hidden email]>; [hidden email]
>>> Cc: Viswanathan, Sandhya <[hidden email]>
>>> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> Hi Vladimir
>>>
>>> Thanks for taking a look at the patch.
>>> I will take a look at your example and work on improving the patch.
>>> Also work on the suggestion of different vector elements counts for different types.
>>>
>>> Regards,
>>> Vivek
>>>
>>>
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Friday, May 19, 2017 12:17 PM
>>> To: Deshpande, Vivek R; Berg, Michael C;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>>>
>>> Regards,
>>> Vladimir
>>>
>>> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>>>> Hi Vladimir
>>>>
>>>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>>>
>>>> Regards,
>>>> Vivek
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>> Sent: Friday, February 24, 2017 10:07 AM
>>>> To: Berg, Michael C; Deshpande, Vivek R;
>>>> [hidden email]
>>>> Cc: Viswanathan, Sandhya
>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>> size
>>>>
>>>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>>>
>>>> "optimal" has wide meaning :)
>>>>
>>>> Currently it means small number of unrolls in presence of wide
>>>> (long,
>>>> double) values.
>>>>
>>>> And if it is optimal why we need this additional fix (8175096)?
>>>>
>>>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>>>
>>>> Thanks,
>>>> Vladimir
>>>>
>>>>>
>>>>> Regards,
>>>>> Michael
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>>>> To: Deshpande, Vivek R <[hidden email]>;
>>>>> [hidden email]
>>>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>>>> Michael C <[hidden email]>
>>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>>> size
>>>>>
>>>>> Hi Vivek,
>>>>>
>>>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>>>
>>>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>>>
>>>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>>>
>>>>>             if (cur_max_vector < max_vector) {
>>>>>               max_vector = cur_max_vector;
>>>>>
>>>>> And we start with smallest type:
>>>>>
>>>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>>>
>>>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>>>
>>>>>             if (cur_max_vector > max_vector) {
>>>>>               max_vector = cur_max_vector;
>>>>>
>>>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>>>
>>>>> Michael, you are author of this code. What do you think?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>>>> Hi
>>>>>>
>>>>>>
>>>>>>
>>>>>> Currently subword types cannot use entire vector width using SLP.
>>>>>>
>>>>>> This fix analyzes the subword in the loop for possibility of
>>>>>> narrowing and sets the vector size accordingly.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>>>
>>>>>> I have also updated the JBS entry.
>>>>>>
>>>>>> c
>>>>>>
>>>>>> Would you please review and sponsor it.
>>>>>>
>>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Vivek
>>>>>>
>>>>>>
>>>>>>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(S): 8175096: Use Subword Analysis for set vector size

Vladimir Kozlov
We need to wait pre-integration testing results before push. I will let
you know the results.

Thanks,
Vladimir

On 7/18/17 2:08 PM, Deshpande, Vivek R wrote:

> Thanks Vladimir.
>
> Regards,
> Vivek
>
> -----Original Message-----
> From: Vladimir Kozlov [mailto:[hidden email]]
> Sent: Tuesday, July 18, 2017 1:47 PM
> To: Deshpande, Vivek R <[hidden email]>; [hidden email]
> Cc: Viswanathan, Sandhya <[hidden email]>
> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>
> Perfect.
>
> I will push it now.
>
> Thanks,
> Vladimir
>
> On 7/18/17 1:44 PM, Deshpande, Vivek R wrote:
>> HI Vladimir
>>
>> I have updated the bug with code generated for byte[] and short[].
>> Thanks for taking a look at it.
>>
>> Regards,
>> Vivek
>>
>> -----Original Message-----
>> From: Vladimir Kozlov [mailto:[hidden email]]
>> Sent: Tuesday, July 18, 2017 12:35 PM
>> To: Deshpande, Vivek R <[hidden email]>;
>> [hidden email]
>> Cc: Viswanathan, Sandhya <[hidden email]>
>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector size
>>
>> Good. Can you also add to the bug report code generated for byte[] and short[] arrays tests?
>>
>> Thanks,
>> Vladimir
>>
>> On 7/18/17 11:50 AM, Deshpande, Vivek R wrote:
>>> Hi Vladimir
>>>
>>> I generated that code on my desktop.
>>> The generated code on the skylake server looks like this,
>>>     and I have updated the bug with full main loop generated on skylake server and tests:
>>>
>>> 1d0   B31: #    B31 B32 <- B30 B31      Loop: B31-B31 inner main of N161 Freq: 1024.97
>>> 1d0     movslq  R11, RBP        # i2l
>>> 1d3     vmovdqul XMM1 k0,[RBX + #16 + R11 << #2]        ! load vector (64 bytes)
>>> 1de     vpaddd  XMM1,XMM1,[R9 + #16 + R11 << #2]        ! add packed16I
>>> 1e9     movdq   R10, XMM0       # spill
>>> 1ee     vmovdqul XMM2 k0,[R10 + #16 + R11]      ! load vector (64 bytes)
>>> 1f9     vpaddb  XMM2,XMM2,[R13 + #16 + R11]     ! add packed64B
>>> 204     vmovdqul [R14 + #16 + R11] k0,XMM2      ! store vector (64 bytes)
>>> 20f     vmovdqul XMM2 k0,[RDX + #80 + R11 << #1]        ! load vector (64 bytes)
>>> 21a     vmovdqul XMM3 k0,[R10 + #80 + R11]      ! load vector (64 bytes)
>>> 225     vpaddb  XMM3,XMM3,[R13 + #80 + R11]     ! add packed64B
>>> 230     vmovdqul [R14 + #80 + R11] k0,XMM3      ! store vector (64 bytes)
>>> 23b     vmovdqul XMM3 k0,[RAX + #80 + R11 << #1]        ! load vector (64 bytes)
>>> 246     vmovdqul XMM4 k0,[R10 + #144 + R11]     ! load vector (64 bytes)
>>> 251     vpaddb  XMM4,XMM4,[R13 + #144 + R11]    ! add packed64B
>>> 25c     vmovdqul [R14 + #144 + R11] k0,XMM4     ! store vector (64 bytes)
>>> 267     vmovdqul XMM4 k0,[RBX + #80 + R11 << #2]        ! load vector (64 bytes)
>>> 272     vmovdqul XMM5 k0,[R10 + #208 + R11]     ! load vector (64 bytes)
>>> 27d     vpaddb  XMM5,XMM5,[R13 + #208 + R11]    ! add packed64B
>>> 288     vmovdqul [R14 + #208 + R11] k0,XMM5     ! store vector (64 bytes)
>>> 293     vmovdqul XMM5 k0,[RDX + #16 + R11 << #1]        ! load vector (64 bytes)
>>> 29e     vpaddw  XMM5,XMM5,[RAX + #16 + R11 << #1]       ! add packed32S
>>>
>>>
>>> Please let me know your thoughts.
>>>
>>> Regards,
>>> Vivek
>>> -----Original Message-----
>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>> Sent: Monday, July 17, 2017 12:44 PM
>>> To: Deshpande, Vivek R <[hidden email]>;
>>> [hidden email]
>>> Cc: Viswanathan, Sandhya <[hidden email]>
>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>> size
>>>
>>> Now I am confused. Did you changed machine?
>>> Originally vector size in code was 512 for integer arrays:
>>>
>>> 20c vmovdqul XMM0 k0,[RCX + #16 + R10 << #2] ! load vector (64 bytes)
>>> 217 vpaddd XMM0,XMM0,[RBX + #16 + R10 << #2] ! add packed16I
>>> 222 vmovdqul [RDX + #16 + R10 << #2] k0,XMM0 ! store vector (64 bytes)
>>>
>>> Latest code show that it is limited to 256:
>>>
>>> 143 vmovdqu XMM0,[R9 + #16 + R10 << #2] ! load vector (32 bytes) 14a
>>> vpaddd XMM0,XMM0,[R11 + #16 + R10 << #2] ! add packed8I
>>>
>>> Why vectors are smaller?
>>>
>>> And suggestion. Please, modify the test in the bug report to have separate test methods which only operates on byte[] and short[] arrays and their combination. It is to see the the change fixed limiting vector size for sub-integer types.
>>>
>>> Thanks,
>>> Vladimir
>>>
>>> On 7/13/17 11:49 AM, Deshpande, Vivek R wrote:
>>>> Hi Vladimir
>>>>
>>>> I have updated the patch.
>>>> With this update the generated code for the main loop is unrolled and uses full vector width.
>>>> Updated Webrev:
>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.01/
>>>> Would you please take a look and give your feedback/ suggestions.
>>>>
>>>> Regards,
>>>> Vivek
>>>>
>>>> -----Original Message-----
>>>> From: hotspot-compiler-dev
>>>> [mailto:[hidden email]] On Behalf Of
>>>> Deshpande, Vivek R
>>>> Sent: Friday, May 19, 2017 1:28 PM
>>>> To: Vladimir Kozlov <[hidden email]>; Berg, Michael C
>>>> <[hidden email]>; [hidden email]
>>>> Cc: Viswanathan, Sandhya <[hidden email]>
>>>> Subject: RE: RFR(S): 8175096: Use Subword Analysis for set vector
>>>> size
>>>>
>>>> Hi Vladimir
>>>>
>>>> Thanks for taking a look at the patch.
>>>> I will take a look at your example and work on improving the patch.
>>>> Also work on the suggestion of different vector elements counts for different types.
>>>>
>>>> Regards,
>>>> Vivek
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>> Sent: Friday, May 19, 2017 12:17 PM
>>>> To: Deshpande, Vivek R; Berg, Michael C;
>>>> [hidden email]
>>>> Cc: Viswanathan, Sandhya
>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>> size
>>>>
>>>> I have time to look on this again. And I don't see that patch helps. See my comments in the bug report.
>>>>
>>>> Regards,
>>>> Vladimir
>>>>
>>>> On 2/24/17 10:56 AM, Deshpande, Vivek R wrote:
>>>>> Hi Vladimir
>>>>>
>>>>> Subwords get converted to int by broadening for the operations on them and max_vector gets set according to int, even if there are no mixed type operations, leading to short vector width instructions.
>>>>>
>>>>> Regards,
>>>>> Vivek
>>>>>
>>>>>
>>>>> -----Original Message-----
>>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>>> Sent: Friday, February 24, 2017 10:07 AM
>>>>> To: Berg, Michael C; Deshpande, Vivek R;
>>>>> [hidden email]
>>>>> Cc: Viswanathan, Sandhya
>>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>>> size
>>>>>
>>>>> On 2/23/17 9:03 AM, Berg, Michael C wrote:
>>>>>> I think using the current approach is the best way, it find the consistent shared vectorsize that is optimal for the loop for unrolling to act to.  This additional approach augments that by some testing of constraints as the subword types are often occluded by sign and zero extension artifacts that would otherwise have caused the next common size to surface.  These subword types are only in the integer subtype domain.
>>>>>
>>>>> "optimal" has wide meaning :)
>>>>>
>>>>> Currently it means small number of unrolls in presence of wide
>>>>> (long,
>>>>> double) values.
>>>>>
>>>>> And if it is optimal why we need this additional fix (8175096)?
>>>>>
>>>>> I still did not get answer about why "Currently subword types cannot use entire vector width using SLP". In what cases this happen?
>>>>>
>>>>> Thanks,
>>>>> Vladimir
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Michael
>>>>>>
>>>>>> -----Original Message-----
>>>>>> From: Vladimir Kozlov [mailto:[hidden email]]
>>>>>> Sent: Wednesday, February 22, 2017 12:09 PM
>>>>>> To: Deshpande, Vivek R <[hidden email]>;
>>>>>> [hidden email]
>>>>>> Cc: Viswanathan, Sandhya <[hidden email]>; Berg,
>>>>>> Michael C <[hidden email]>
>>>>>> Subject: Re: RFR(S): 8175096: Use Subword Analysis for set vector
>>>>>> size
>>>>>>
>>>>>> Hi Vivek,
>>>>>>
>>>>>> This should go into jdk 10 since it is enhancement. We will consider to backport it into jdk 9 update release later.
>>>>>>
>>>>>> First, please explain why "Currently subword types cannot use entire vector width using SLP".
>>>>>>
>>>>>> The only explanation I have is that if loop has mixing types operations then we narrow unroll factor for biggest type:
>>>>>>
>>>>>>              if (cur_max_vector < max_vector) {
>>>>>>                max_vector = cur_max_vector;
>>>>>>
>>>>>> And we start with smallest type:
>>>>>>
>>>>>> int max_vector = Matcher::max_vector_size(T_BYTE);
>>>>>>
>>>>>> Should we do opposite and start from long T_LONG (small unroll factor) and widen it to smallest type (big unroll factor)?:
>>>>>>
>>>>>>              if (cur_max_vector > max_vector) {
>>>>>>                max_vector = cur_max_vector;
>>>>>>
>>>>>> Note, max_vector_size() returns number of elements and not size of vector in bytes.
>>>>>>
>>>>>> Michael, you are author of this code. What do you think?
>>>>>>
>>>>>> Thanks,
>>>>>> Vladimir
>>>>>>
>>>>>> On 2/16/17 11:45 AM, Deshpande, Vivek R wrote:
>>>>>>> Hi
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Currently subword types cannot use entire vector width using SLP.
>>>>>>>
>>>>>>> This fix analyzes the subword in the loop for possibility of
>>>>>>> narrowing and sets the vector size accordingly.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~vdeshpande/8175096/webrev.00/
>>>>>>>
>>>>>>> I have also updated the JBS entry.
>>>>>>>
>>>>>>> c
>>>>>>>
>>>>>>> Would you please review and sponsor it.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Vivek
>>>>>>>
>>>>>>>
>>>>>>>