RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask


RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

Xiaohong Gong
The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.

For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
  VectorLoadMask (VectorStoreMask vmask)
In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
  VectorLoadMask (VectorStoreMask vmask) ==> vmask
The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.
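
For readers unfamiliar with how GVN keys nodes, a minimal, hypothetical sketch of what keying a mask node's `hash()/cmp()` on element size could look like is shown below. It is not the exact patch from this PR; the method bodies and the use of `type2aelembytes` are assumptions made for illustration.

  // Hypothetical sketch only: key GVN on element size and lane count rather than
  // the full vector type, so double/long (or float/int) masks of the same species
  // compare as equal.
  uint VectorLoadMaskNode::hash() const {
    const TypeVect* vt = type()->is_vect();
    return Node::hash() + type2aelembytes(vt->element_basic_type()) + vt->length();
  }

  bool VectorLoadMaskNode::cmp(const Node& n) const {
    if (Opcode() != n.Opcode()) {
      return false;
    }
    const TypeVect* vt1 = type()->is_vect();
    const TypeVect* vt2 = n.bottom_type()->is_vect();
    return type2aelembytes(vt1->element_basic_type()) == type2aelembytes(vt2->element_basic_type()) &&
           vt1->length() == vt2->length();
  }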

-------------

Commit messages:
 - 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

Changes: https://git.openjdk.java.net/jdk/pull/3238/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264104
  Stats: 28 lines in 3 files changed: 27 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3238.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3238/head:pull/3238

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Xiaohong Gong
> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>
> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>   VectorLoadMask (VectorStoreMask vmask)
> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.

Xiaohong Gong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:

  8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3238/files
  - new: https://git.openjdk.java.net/jdk/pull/3238/files/af74280a..bf5b2028

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=00-01

  Stats: 0 lines in 0 files changed: 0 ins; 0 del; 0 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3238.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3238/head:pull/3238

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

Xiaohong Gong
In reply to this post by Xiaohong Gong
On Mon, 29 Mar 2021 06:00:46 GMT, Xiaohong Gong <[hidden email]> wrote:

> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>
> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>   VectorLoadMask (VectorStoreMask vmask)
> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.

Hi, could anyone please take a look at this PR? Thanks so much!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Vladimir Kozlov-2
In reply to this post by Xiaohong Gong
On Mon, 29 Mar 2021 06:28:42 GMT, Xiaohong Gong <[hidden email]> wrote:

>> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>>
>> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>>   VectorLoadMask (VectorStoreMask vmask)
>> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
>> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.
>
> Xiaohong Gong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

@iwanowww should confirm the correctness of such an optimization.
Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Vladimir Ivanov-2
On Tue, 6 Apr 2021 15:54:33 GMT, Vladimir Kozlov <[hidden email]> wrote:

>> Xiaohong Gong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>>   8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask
>
> @iwanowww should confirm the correctness of such an optimization.
> Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.

I'm not fond of the proposed approach.

It hard-codes some implicit assumptions about the vector mask representation.
Also, a vector type mismatch can trigger asserts, since some vector-related code expects the types of vector inputs to match precisely.

I suggest introducing artificial cast nodes (`VectorLoadMask (VectorStoreMask vmask) ==> VectorMaskCast (vmask)`) which are then lowered into no-ops on the backend side.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Xiaohong Gong
On Wed, 7 Apr 2021 16:06:44 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> @iwanowww should confirm the correctness of such an optimization.
>> Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.
>
> I'm not fond of the proposed approach.
>
> It hard-codes some implicit assumptions about the vector mask representation.
> Also, a vector type mismatch can trigger asserts, since some vector-related code expects the types of vector inputs to match precisely.
>
> I suggest introducing artificial cast nodes (`VectorLoadMask (VectorStoreMask vmask) ==> VectorMaskCast (vmask)`) which are then lowered into no-ops on the backend side.

Hi @iwanowww, thanks for looking at this PR.

> It hard-codes some implicit assumptions about the vector mask representation.
> Also, a vector type mismatch can trigger asserts, since some vector-related code expects the types of vector inputs to match precisely.

Agreed. I'm also concerned about potential assertion failures, although I haven't hit the issue so far.

> I suggest introducing artificial cast nodes (`VectorLoadMask (VectorStoreMask vmask) ==> VectorMaskCast (vmask)`) which are then lowered into no-ops on the backend side.

It's a good idea to add a cast node for the mask. I tried it, and it works well for casts with the same element size and vector length. However, for casts to a different element size (e.g. short->int), I think the original `VectorLoadMask (VectorStoreMask vmask)` is better since some optimizations for it already exist. Also, using `VectorMaskCast` for all cases would need more match rules in the backend, which I think duplicates `VectorLoadMask+VectorStoreMask`. So does it look good to you if `VectorMaskCast` is limited to mask casts with the same element size? The changes might look like:
diff --git a/src/hotspot/share/opto/vectornode.cpp b/src/hotspot/share/opto/vectornode.cpp
index 326b9c4a5a0..d7b53c247fb 100644
--- a/src/hotspot/share/opto/vectornode.cpp
+++ b/src/hotspot/share/opto/vectornode.cpp
@@ -1232,6 +1232,10 @@ Node* VectorUnboxNode::Ideal(PhaseGVN* phase, bool can_reshape) {
         bool is_vector_mask    = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorMask_klass());
         bool is_vector_shuffle = vbox_klass->is_subclass_of(ciEnv::current()->vector_VectorShuffle_klass());
         if (is_vector_mask) {
+          if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
+              Matcher::match_rule_supported(Op_VectorMaskCast)) {
+            return new VectorMaskCastNode(value, out_vt);
+          }
           // VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)
           value = phase->transform(VectorStoreMaskNode::make(*phase, value, in_vt->element_basic_type(), in_vt->length()));
           return new VectorLoadMaskNode(value, out_vt);
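
For context, a declaration for the new node might look roughly like the following sketch. It is inferred from this diff and from the vectornode.hpp excerpt quoted later in the thread, so the exact shape in the PR may differ.

  // Hedged sketch of a VectorMaskCastNode declaration (vectornode.hpp). The
  // length/element-size asserts mirror the constraint discussed in this thread:
  // the cast is only a relabeling of the same mask bit pattern.
  class VectorMaskCastNode : public VectorNode {
   public:
    VectorMaskCastNode(Node* in, const TypeVect* vt) : VectorNode(in, vt) {
      const TypeVect* in_vt = in->bottom_type()->is_vect();
      assert(in_vt->length() == vt->length(), "vector length must match");
      assert(type2aelembytes(in_vt->element_basic_type()) == type2aelembytes(vt->element_basic_type()),
             "element size must match");
    }
    virtual int Opcode() const;
  };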

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Xiaohong Gong
In reply to this post by Vladimir Ivanov-2
On Wed, 7 Apr 2021 16:06:44 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> @iwanowww should confirm the correctness of such an optimization.
>> Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.
>
> I'm not fond of the proposed approach.
>
> It hard-codes some implicit assumptions about the vector mask representation.
> Also, a vector type mismatch can trigger asserts, since some vector-related code expects the types of vector inputs to match precisely.
>
> I suggest introducing artificial cast nodes (`VectorLoadMask (VectorStoreMask vmask) ==> VectorMaskCast (vmask)`) which are then lowered into no-ops on the backend side.

> @iwanowww should confirm the correctness of such an optimization.
> Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.

Thanks for looking at this PR @vnkozlov. To be honest, I have no idea about the TOP-input checking issue for the inputs of VectorNode. I hope @iwanowww could explain more. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Vladimir Ivanov-2
In reply to this post by Xiaohong Gong
On Thu, 8 Apr 2021 03:41:40 GMT, Xiaohong Gong <[hidden email]> wrote:

> So does it look good to you if `VectorMaskCast` is limited to mask casts with the same element size?

Yes, I'm fine with focusing on the no-op case for now.
Moreover, both the vector length and the element size should match to reuse the very same bit representation of the mask.
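
As a small illustration of that combined condition, a guard along these lines (variable names borrowed from the diff above; the exact form is an assumption, not code from the PR) would only allow the no-op cast when both constraints hold:

  // Sketch: the cast may reuse the mask's bit pattern only if both the lane
  // count and the per-lane element size are unchanged.
  bool is_noop_mask_cast =
      in_vt->length() == out_vt->length() &&
      type2aelembytes(in_vt->element_basic_type()) == type2aelembytes(out_vt->element_basic_type());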

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Vladimir Ivanov-2
In reply to this post by Xiaohong Gong
On Thu, 8 Apr 2021 03:44:41 GMT, Xiaohong Gong <[hidden email]> wrote:

> I notice that VectorNode and its subclasses do not check for TOP inputs.
> Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead.
> How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.

`VectorLoadMaskNode::Identity()` can't observe TOP types because it uses the type cached at construction (`type()`, since `VectorNode` extends `TypeNode`). Still, a TOP input is possible and should be filtered out by the opcode check (`in(1)->Opcode() == Op_VectorStoreMask`).
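
As a purely illustrative sketch of that pattern (not the actual JDK implementation), the opcode check is what keeps a dead input from ever being treated as a vector:

  // Illustrative only: a dead (TOP) input no longer has the VectorStoreMask
  // opcode, so the guard below fails and the node is simply left alone until
  // dead-code elimination cleans up the dead path.
  Node* VectorLoadMaskNode::Identity(PhaseGVN* phase) {
    if (in(1)->Opcode() == Op_VectorStoreMask) {
      Node* mask = in(1)->in(1);
      if (mask->bottom_type() == bottom_type()) {
        return mask;  // VectorLoadMask (VectorStoreMask vmask) ==> vmask
      }
    }
    return this;
  }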

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Vladimir Kozlov-2
In reply to this post by Xiaohong Gong
On Mon, 29 Mar 2021 06:28:42 GMT, Xiaohong Gong <[hidden email]> wrote:

>> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>>
>> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>>   VectorLoadMask (VectorStoreMask vmask)
>> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
>> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.
>
> Xiaohong Gong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>
>   8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask

Okay.

-------------

Marked as reviewed by kvn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Xiaohong Gong
In reply to this post by Xiaohong Gong
> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>
> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>   VectorLoadMask (VectorStoreMask vmask)
> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.

Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:

  Revert changes to VectorLoadMask and add a VectorMaskCast for the same element size mask casting

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3238/files
  - new: https://git.openjdk.java.net/jdk/pull/3238/files/bf5b2028..ce3577ae

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=01-02

  Stats: 121 lines in 10 files changed: 92 ins; 27 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3238.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3238/head:pull/3238

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Xiaohong Gong
In reply to this post by Vladimir Kozlov-2
On Thu, 8 Apr 2021 15:00:10 GMT, Vladimir Kozlov <[hidden email]> wrote:

>> Xiaohong Gong has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>>   8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask
>
> Okay.

Hi @iwanowww, I've updated the code to use `VectorMaskCast` for vector mask casts with the same element size and vector length. Would you mind having a look again? Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Vladimir Ivanov-2
In reply to this post by Xiaohong Gong
On Fri, 9 Apr 2021 07:43:50 GMT, Xiaohong Gong <[hidden email]> wrote:

>> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>>
>> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>>   VectorLoadMask (VectorStoreMask vmask)
>> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
>> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.
>
> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>
>   Revert changes to VectorLoadMask and add a VectorMaskCast for the same element size mask casting

Overall, looks good.

src/hotspot/share/opto/vectornode.cpp line 1236:

> 1234:         if (is_vector_mask) {
> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {

There's `Matcher::match_rule_supported_vector()` specifically for vector nodes.
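
Concretely, the check from the proposed diff would presumably become something like the following sketch (argument order follows the usual `Matcher::match_rule_supported_vector(opcode, vlen, bt)` form; treat it as an assumption rather than the exact patch):

  // Sketch: consult the vector-aware matcher query for the concrete length and
  // element type of the output mask before emitting the cast node.
  if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
      Matcher::match_rule_supported_vector(Op_VectorMaskCast, out_vt->length(), out_vt->element_basic_type())) {
    return new VectorMaskCastNode(value, out_vt);
  }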

src/hotspot/share/opto/vectornode.cpp line 1237:

> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {
> 1237:             // VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)

It's better to implement it as a 2-step transformation and place it in `VectorLoadMaskNode::Ideal()`:
VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask) => VectorMaskCast (vmask)

src/hotspot/share/opto/vectornode.hpp line 1247:

> 1245:     assert(type2aelembytes(in_vt->element_basic_type()) == type2aelembytes(vt->element_basic_type()), "element size must match");
> 1246:   }
> 1247:

FTR there's a way to avoid platform-specific changes (in AD files) if we don't envision any non-trivial conversions to be supported: just disable matching of the node by overriding `Node::ideal_reg()`:
virtual uint ideal_reg() const { return 0; } // not matched in the AD file

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Xiaohong Gong
On Fri, 9 Apr 2021 21:52:45 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Revert changes to VectorLoadMask and add a VectorMaskCast for the same element size mask casting
>
> src/hotspot/share/opto/vectornode.cpp line 1236:
>
>> 1234:         if (is_vector_mask) {
>> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
>> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {
>
> There's `Matcher::match_rule_supported_vector()` specifically for vector nodes.

OK, I will use this instead. Thanks!

> src/hotspot/share/opto/vectornode.cpp line 1237:
>
>> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
>> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {
>> 1237:             // VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)
>
> It's better to implement it as a 2-step transformation and place it in `VectorLoadMaskNode::Ideal()`:
> VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask) => VectorMaskCast (vmask)

Thanks for your comments. Yes, theoretically it's better to place it in `VectorLoadMaskNode::Ideal()`. Unfortunately, we hit an issue related to the optimization of `VectorStoreMask`. Consider the following case:
   LoadVector                                              LoadVector
           |                                                    |
   VectorLoadMask  (double)                          VectorLoadMask (double)
               |                                               |
           VectorUnbox   (long)          ==>           VectorStoreMask  (double)
                                                               |
                                                      VectorLoadMask  (long)
This case loads the mask values for a double type and then performs a bitwise `and` operation. Since the types mismatch, the compiler tries `VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)`. However, because the transformation `VectorStoreMask (VectorLoadMask value) ==> value` also exists, the `VectorStoreMaskNode` above is optimized out. The final graph looks like:

   LoadVector                                                  LoadVector
          |                                                      /      \
        VectorLoadMask  (double)                               /         \
              |                      ==>     VectorLoadMask (double)      \
         VectorStoreMask (double)                                   VectorLoadMask (long)
                   |
            VectorLoadMask (long)
Since the two `VectorLoadMaskNode`s have different element types, GVN cannot fold them into one. So in the end there are two similar `VectorLoadMaskNode`s. That's also why I overrode `cmp/hash` for `VectorLoadMaskNode` in the first version.

So I prefer to add `VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)` directly.

> src/hotspot/share/opto/vectornode.hpp line 1247:
>
>> 1245:     assert(type2aelembytes(in_vt->element_basic_type()) == type2aelembytes(vt->element_basic_type()), "element size must match");
>> 1246:   }
>> 1247:
>
> FTR there's a way to avoid platform-specific changes (in AD files) if we don't envision any non-trivial conversions to be supported: just disable matching of the node by overriding `Node::ideal_reg()`:
> virtual uint ideal_reg() const { return 0; } // not matched in the AD file

Good idea, I will try this way. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Xiaohong Gong
On Mon, 12 Apr 2021 07:02:38 GMT, Xiaohong Gong <[hidden email]> wrote:

>> src/hotspot/share/opto/vectornode.hpp line 1247:
>>
>>> 1245:     assert(type2aelembytes(in_vt->element_basic_type()) == type2aelembytes(vt->element_basic_type()), "element size must match");
>>> 1246:   }
>>> 1247:
>>
>> FTR there's a way to avoid platform-specific changes (in AD files) if we don't envision any non-trivial conversions to be supported: just disable matching of the node by overriding `Node::ideal_reg()`:
>> virtual uint ideal_reg() const { return 0; } // not matched in the AD file
>
> Good idea, I will try this way. Thanks!

I hit the "bad AD file" issue when I removed the changes in the AD files and overrode `Node::ideal_reg()`. I guess this works only if a node is not used as an input of another node? A `VectorMaskCastNode` must be an input to some other node. If matching of this node is disabled, how do its users find the right input?

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Vladimir Ivanov-2
On Mon, 12 Apr 2021 07:53:36 GMT, Xiaohong Gong <[hidden email]> wrote:

>> Good idea, I will try this way. Thanks!
>
> I hit the "bad AD file" issue when I removed the changes in the AD files and overrode `Node::ideal_reg()`. I guess this works only if a node is not used as an input of another node? A `VectorMaskCastNode` must be an input to some other node. If matching of this node is disabled, how do its users find the right input?

Yeah, `ideal_reg() { return 0; }` won't work here. Sorry for the misleading suggestion.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Vladimir Ivanov-2
In reply to this post by Xiaohong Gong
On Mon, 12 Apr 2021 07:00:58 GMT, Xiaohong Gong <[hidden email]> wrote:

>> src/hotspot/share/opto/vectornode.cpp line 1237:
>>
>>> 1235:           if (in_vt->length_in_bytes() == out_vt->length_in_bytes() &&
>>> 1236:               Matcher::match_rule_supported(Op_VectorMaskCast)) {
>>> 1237:             // VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)
>>
>> It's better to implement it as a 2-step transformation and place it in `VectorLoadMaskNode::Ideal()`:
>>
>> VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask) => VectorMaskCast (vmask)
>
> Thanks for your comments. Yes, theoretically it's better to place it in `VectorLoadMaskNode::Ideal()`. Unfortunately, we hit an issue related to the optimization of `VectorStoreMask`. Consider the following case:
>
>    LoadVector                                              LoadVector
>            |                                                    |
>    VectorLoadMask  (double)                          VectorLoadMask (double)
>                |                                               |
>            VectorUnbox   (long)          ==>           VectorStoreMask  (double)
>                                                                |
>                                                       VectorLoadMask  (long)
>
> This case loads the mask values for a double type and then performs a bitwise `and` operation. Since the types mismatch, the compiler tries `VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)`. However, because the transformation `VectorStoreMask (VectorLoadMask value) ==> value` also exists, the `VectorStoreMaskNode` above is optimized out. The final graph looks like:
>
>
>    LoadVector                                                  LoadVector
>           |                                                      /      \
>         VectorLoadMask  (double)                               /         \
>               |                      ==>     VectorLoadMask (double)      \
>          VectorStoreMask (double)                                   VectorLoadMask (long)
>                    |
>             VectorLoadMask (long)
>
> Since the two `VectorLoadMaskNode`s have different element types, GVN cannot fold them into one. So in the end there are two similar `VectorLoadMaskNode`s. That's also why I overrode `cmp/hash` for `VectorLoadMaskNode` in the first version.
>
> So I prefer to add `VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)` directly.

Ok, so you face a transformation ordering problem here.

By working on `VectorUnbox (VectorBox vmask)` you effectively delay the `VectorStoreMask (VectorLoadMask vmask) => vmask` transformation.

As an alternative you could:

(1) check for `VectorLoadMask` users before applying `VectorStoreMask (VectorLoadMask vmask) => vmask`;

(2) nest adjacent casts:

VectorLoadMask #double (1 LoadVector)
VectorLoadMask #long   (1 LoadVector)
==>
VectorMaskCast #long (VectorLoadMask #double (1 LoadVector))


The latter looks more powerful (and hence preferable), but I'm fine with what you have right now. (It can be enhanced later.)
Please, leave a comment describing the motivation for doing the transformation directly on `VectorUnbox (VectorBox ...)`.
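
To make alternative (1) concrete, here is a hedged sketch of one possible reading of it; the method placement, names, and the exact check are assumptions rather than code from this PR. The idea is to skip the identity on a `VectorStoreMask` whose result feeds another `VectorLoadMask`, so the pair survives long enough to be turned into a `VectorMaskCast`.

  // Illustrative sketch only: delay VectorStoreMask (VectorLoadMask v) => v when
  // this VectorStoreMask has a VectorLoadMask user, i.e. when the surrounding
  // pattern could instead become a VectorMaskCast.
  Node* VectorStoreMaskNode::Identity(PhaseGVN* phase) {
    if (in(1)->Opcode() == Op_VectorLoadMask) {
      for (DUIterator_Fast imax, i = fast_outs(imax); i < imax; i++) {
        if (fast_out(i)->Opcode() == Op_VectorLoadMask) {
          return this;  // keep the store/load pair for the VectorMaskCast transform
        }
      }
      return in(1)->in(1);  // VectorStoreMask (VectorLoadMask v) ==> v
    }
    return this;
  }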

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v4]

Xiaohong Gong
In reply to this post by Xiaohong Gong
> The Vector API defines different element types for floating point VectorMask. For example, the bitwise related APIs use "`long/int`", while the data related APIs use "`double/float`". This prevents some optimizations that are based on type checking from working well.
>
> For example, the VectorBox/Unbox elimination `"VectorUnbox (VectorBox v) ==> v"` requires that the output and input types are equal. Normally this requirement is necessary. However, because a floating point VectorMask of the same species can carry different element types, the VectorBox/Unbox pattern is instead optimized to:
>
>   VectorLoadMask (VectorStoreMask vmask)
>
> In such cases the types can actually be treated as the same. And since the vector mask representation is identical for vectors with the same element size and vector length, it's safe to do the optimization:
>
>   VectorLoadMask (VectorStoreMask vmask) ==> vmask
>
> The same issue exists for GVN, which is based on the type of a node. Making the mask node's `hash()/cmp()` methods depend on the element size instead of the element type fixes it.

Xiaohong Gong has updated the pull request incrementally with one additional commit since the last revision:

  Use "Matcher::match_rule_supported_vector" for vector nodes checking

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/3238/files
  - new: https://git.openjdk.java.net/jdk/pull/3238/files/ce3577ae..977787e4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=3238&range=02-03

  Stats: 4 lines in 1 file changed: 2 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3238.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3238/head:pull/3238

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v2]

Xiaohong Gong
In reply to this post by Vladimir Ivanov-2
On Thu, 8 Apr 2021 10:14:21 GMT, Vladimir Ivanov <[hidden email]> wrote:

>>> @iwanowww should confirm the correctness of such an optimization.
>>> Regarding the changes - they seem fine to me. I notice that VectorNode and its subclasses do not check for TOP inputs. Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead. How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.
>>
>> Thanks for looking at this PR @vnkozlov. To be honest, I have no idea about the TOP-input checking issue for the inputs of VectorNode. I hope @iwanowww could explain more. Thanks!
>
>> I notice that VectorNode and its subclasses do not check for TOP inputs.
>> Since the Vector API introduces vectors into the graph before the SuperWord transformation, their inputs could become dead.
>> How are such cases handled? And why have we not hit them yet? is_vect() should hit the assert.
>
> `VectorLoadMaskNode::Identity()` can't observe TOP types because it uses the type cached at construction (`type()`, since `VectorNode` extends `TypeNode`). Still, a TOP input is possible and should be filtered out by the opcode check (`in(1)->Opcode() == Op_VectorStoreMask`).

Hi @iwanowww, all your review comments have been addressed. Would you mind having a look again? Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238

Re: RFR: 8264104: Eliminate unnecessary vector mask conversion during VectorUnbox for floating point VectorMask [v3]

Xiaohong Gong
In reply to this post by Vladimir Ivanov-2
On Mon, 12 Apr 2021 16:50:04 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> Thanks for your comments. Yes, theoretically it's better to place it in `VectorLoadMaskNode::Ideal()`. Unfortunately, we hit an issue related to the optimization of `VectorStoreMask`. Consider the following case:
>>
>>    LoadVector                                              LoadVector
>>            |                                                    |
>>    VectorLoadMask  (double)                          VectorLoadMask (double)
>>                |                                               |
>>            VectorUnbox   (long)          ==>           VectorStoreMask  (double)
>>                                                                |
>>                                                       VectorLoadMask  (long)
>>
>> This case loads the mask values for a double type and then performs a bitwise `and` operation. Since the types mismatch, the compiler tries `VectorUnbox (VectorBox vmask) ==> VectorLoadMask (VectorStoreMask vmask)`. However, because the transformation `VectorStoreMask (VectorLoadMask value) ==> value` also exists, the `VectorStoreMaskNode` above is optimized out. The final graph looks like:
>>
>>
>>    LoadVector                                                  LoadVector
>>           |                                                      /      \
>>         VectorLoadMask  (double)                               /         \
>>               |                      ==>     VectorLoadMask (double)      \
>>          VectorStoreMask (double)                                   VectorLoadMask (long)
>>                    |
>>             VectorLoadMask (long)
>>
>> Since the two `VectorLoadMaskNode`s have different element types, GVN cannot fold them into one. So in the end there are two similar `VectorLoadMaskNode`s. That's also why I overrode `cmp/hash` for `VectorLoadMaskNode` in the first version.
>>
>> So I prefer to add `VectorUnbox (VectorBox vmask) ==> VectorMaskCast (vmask)` directly.
>
> Ok, so you face a transformation ordering problem here.
>
> By working on `VectorUnbox (VectorBox vmask)` you effectively delay the `VectorStoreMask (VectorLoadMask vmask) => vmask` transformation.
>
> As an alternative you could:
>
> (1) check for `VectorLoadMask` users before applying `VectorStoreMask (VectorLoadMask vmask) => vmask`;
>
> (2) nest adjacent casts:
>
> VectorLoadMask #double (1 LoadVector)
> VectorLoadMask #long   (1 LoadVector)
> ==>
> VectorMaskCast #long (VectorLoadMask #double (1 LoadVector))
>
>
> The latter looks more powerful (and hence preferable), but I'm fine with what you have right now. (It can be enhanced later.)
> Please, leave a comment describing the motivation for doing the transformation directly on `VectorUnbox (VectorBox ...)`.

Thanks for the alternative suggestions! I prefer to keep the code as it is right now. The comment has also been added. Thanks!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3238