RFR: 8260355: AArch64: deoptimization stub should save vector registers

classic Classic list List threaded Threaded
25 messages Options
12
Reply | Threaded
Open this post in threaded view
|

RFR: 8260355: AArch64: deoptimization stub should save vector registers

Nick Gasson
This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
doesn't save vector registers on x86". The problem is that a vector
produced by the Vector API may be stored in a register when the deopt
blob is called. Because the deopt blob only stores the lower half of
vector registers, the full vector object cannot be rematerialized during
deoptimization. So the following will crash on AArch64 with current JDK:

  make test TEST="jdk/incubator/vector" \
    JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"

The fix is to store the full vector registers by passing
save_vectors=true to save_live_registers() in the deopt blob. Because
save_live_registers() places the integer registers above the floating
registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
to calculate the SP offset based on whether full vectors were saved, and
whether those vectors were NEON or SVE, rather than using a static
offset as it does currently.

The change to VectorSupport::allocate_vector_payload_helper() is
required because we only store the lowest VMReg slot in the oop map.
However unlike x86 the vector registers are always saved in a contiguous
region of memory, so we can calculate the address of each vector element
as an offset from the address of the first slot. X86 handles this in
RegisterMap::pd_location() but that won't work on AArch64 because with
SVE there isn't a unique VMReg corresponding to each four-byte physical
slot in the vector (there are always exactly eight logical VMRegs
regardless of the actual vector length).

Tested hotspot_all_no_apps and jdk_core.

-------------

Commit messages:
 - 8260355: AArch64: deoptimization stub should save vector registers

Changes: https://git.openjdk.java.net/jdk/pull/2279/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8260355
  Stats: 88 lines in 2 files changed: 42 ins; 23 del; 23 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2279.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Andrew Haley
On 1/28/21 8:31 AM, Nick Gasson wrote:
> The fix is to store the full vector registers by passing
> save_vectors=true to save_live_registers() in the deopt blob. Because
> save_live_registers() places the integer registers above the floating
> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
> to calculate the SP offset based on whether full vectors were saved, and
> whether those vectors were NEON or SVE, rather than using a static
> offset as it does currently.

It seems to me that save_vectors is only set here:

   bool save_vectors = COMPILER2_OR_JVMCI != 0;

which means that save_vectors is a static property of a build, not something
that can vary. Therefore, there is nothing to be gained by passing save_vectors
around. Could we simply have a save_vectors_on_deopt() function?

Also, I'm wondering how much all of this complexity gains us for the sake
of configurations without C2. I'd be very tempted to just leave a hole in
the stack for these systems. How much stack would that cost?

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Vladimir Ivanov-2
In reply to this post by Nick Gasson
On Thu, 28 Jan 2021 08:27:22 GMT, Nick Gasson <[hidden email]> wrote:

> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
> doesn't save vector registers on x86". The problem is that a vector
> produced by the Vector API may be stored in a register when the deopt
> blob is called. Because the deopt blob only stores the lower half of
> vector registers, the full vector object cannot be rematerialized during
> deoptimization. So the following will crash on AArch64 with current JDK:
>
>   make test TEST="jdk/incubator/vector" \
>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>
> The fix is to store the full vector registers by passing
> save_vectors=true to save_live_registers() in the deopt blob. Because
> save_live_registers() places the integer registers above the floating
> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
> to calculate the SP offset based on whether full vectors were saved, and
> whether those vectors were NEON or SVE, rather than using a static
> offset as it does currently.
>
> The change to VectorSupport::allocate_vector_payload_helper() is
> required because we only store the lowest VMReg slot in the oop map.
> However unlike x86 the vector registers are always saved in a contiguous
> region of memory, so we can calculate the address of each vector element
> as an offset from the address of the first slot. X86 handles this in
> RegisterMap::pd_location() but that won't work on AArch64 because with
> SVE there isn't a unique VMReg corresponding to each four-byte physical
> slot in the vector (there are always exactly eight logical VMRegs
> regardless of the actual vector length).
>
> Tested hotspot_all_no_apps and jdk_core.

src/hotspot/share/prims/vectorSupport.cpp line 138:

> 136:
> 137:     for (int i = 0; i < num_elem; i++) {
> 138:       bool contiguous = X86_ONLY(false) NOT_X86(true);

I don't like this change. It's not x86-specific, but SVE-specific code. What is broken here is `VMReg::next()` doesn't work properly for `VecA` registers. And, as a result, it makes `RegisterMap::location(VMReg)` unusable as well.

So, a proper fix should address that instead. If there's no way to save `VMReg::next()` and `RegisterMap::location(VMReg)`, then new cross-platform API should be introduced and `VectorSupport::allocate_vector_payload_helper()` migrated to it.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Ningsheng Jian-2
On Thu, 28 Jan 2021 10:27:04 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
>> doesn't save vector registers on x86". The problem is that a vector
>> produced by the Vector API may be stored in a register when the deopt
>> blob is called. Because the deopt blob only stores the lower half of
>> vector registers, the full vector object cannot be rematerialized during
>> deoptimization. So the following will crash on AArch64 with current JDK:
>>
>>   make test TEST="jdk/incubator/vector" \
>>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>>
>> The fix is to store the full vector registers by passing
>> save_vectors=true to save_live_registers() in the deopt blob. Because
>> save_live_registers() places the integer registers above the floating
>> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
>> to calculate the SP offset based on whether full vectors were saved, and
>> whether those vectors were NEON or SVE, rather than using a static
>> offset as it does currently.
>>
>> The change to VectorSupport::allocate_vector_payload_helper() is
>> required because we only store the lowest VMReg slot in the oop map.
>> However unlike x86 the vector registers are always saved in a contiguous
>> region of memory, so we can calculate the address of each vector element
>> as an offset from the address of the first slot. X86 handles this in
>> RegisterMap::pd_location() but that won't work on AArch64 because with
>> SVE there isn't a unique VMReg corresponding to each four-byte physical
>> slot in the vector (there are always exactly eight logical VMRegs
>> regardless of the actual vector length).
>>
>> Tested hotspot_all_no_apps and jdk_core.
>
> src/hotspot/share/prims/vectorSupport.cpp line 138:
>
>> 136:
>> 137:     for (int i = 0; i < num_elem; i++) {
>> 138:       bool contiguous = X86_ONLY(false) NOT_X86(true);
>
> I don't like this change. It's not x86-specific, but SVE-specific code. What is broken here is `VMReg::next()` doesn't work properly for `VecA` registers. And, as a result, it makes `RegisterMap::location(VMReg)` unusable as well.
>
> So, a proper fix should address that instead. If there's no way to save `VMReg::next()` and `RegisterMap::location(VMReg)`, then new cross-platform API should be introduced and `VectorSupport::allocate_vector_payload_helper()` migrated to it.

For Arm NEON (and PPC) we don't set VMReg::next() in oopmap either, and their vector slots are contiguous, so that's x86-specific? But yes, NEON can also generate correct full oopmap as for fixed vector size. For SVE, I have no idea to have proper VMReg::next() support, so Nick's solution looks good to me. Regarding to introducing new cross-platform API, which API do you mean? If we could have some better api, that would be perfect. Currently, allocate_vector_payload_helper() is the only one I can see that is vector related for RegisterMap::location() call.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Vladimir Ivanov-2
On Thu, 28 Jan 2021 11:17:16 GMT, Ningsheng Jian <[hidden email]> wrote:

>> src/hotspot/share/prims/vectorSupport.cpp line 138:
>>
>>> 136:
>>> 137:     for (int i = 0; i < num_elem; i++) {
>>> 138:       bool contiguous = X86_ONLY(false) NOT_X86(true);
>>
>> I don't like this change. It's not x86-specific, but SVE-specific code. What is broken here is `VMReg::next()` doesn't work properly for `VecA` registers. And, as a result, it makes `RegisterMap::location(VMReg)` unusable as well.
>>
>> So, a proper fix should address that instead. If there's no way to save `VMReg::next()` and `RegisterMap::location(VMReg)`, then new cross-platform API should be introduced and `VectorSupport::allocate_vector_payload_helper()` migrated to it.
>
> For Arm NEON (and PPC) we don't set VMReg::next() in oopmap either, and their vector slots are contiguous, so that's x86-specific? But yes, NEON can also generate correct full oopmap as for fixed vector size. For SVE, I have no idea to have proper VMReg::next() support, so Nick's solution looks good to me. Regarding to introducing new cross-platform API, which API do you mean? If we could have some better api, that would be perfect. Currently, allocate_vector_payload_helper() is the only one I can see that is vector related for RegisterMap::location() call.

Probably, x86 unique in using non-contiguous representation for vector values, but it doesn't make the code in question x86-specific. AArch64 is the only user of`VecA` and `VecA` is the only register type which has a mismatch in size between in-memory and RegMask representation. So, I conclude it is AArch64/SVE-specific.

On x86 RegisterMap isn't fully populated for vector registers as well, but there's`RegisterMap::pd_location()` to cover that.

Regarding new API, I mean the alternative to `VMReg::next()`/`RegisterMap::location(VMReg)` which is able to handle `VecA` case well. As Nick pointed out earlier, the problem with `VecA` is that there's no `VMReg` representation for all the slots which comprise the register value.

Either enhancing `VMReg::next(int)` to produce special values for `VecA` case or introducing `RegisterMap::location(VMReg base_reg, int slot)` is a better way to handle the problem.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Nick Gasson (Arm Technology China)
In reply to this post by Andrew Haley
On 01/28/21 17:48 pm, Andrew Haley wrote:
>
> It seems to me that save_vectors is only set here:
>
>    bool save_vectors = COMPILER2_OR_JVMCI != 0;
>
> which means that save_vectors is a static property of a build, not something
> that can vary. Therefore, there is nothing to be gained by passing save_vectors
> around. Could we simply have a save_vectors_on_deopt() function?
>

RegisterSaver is also used by generate_resolve_blob (which never saves
vectors) and generate_handler_blob (which saves vectors when poll_type
== POLL_AT_VECTOR_LOOP). How about we make RegisterSaver a class you can
instantiate, and pass whether to save vectors or not to the constructor?
That would look like:

  RegisterSaver reg_save(COMPILER2_OR_JVMCI != 0 /* save_vectors */);
  map = reg_save.save_live_registers(masm, 0, &frame_size_in_words);
  ...
  reg_save.restore_live_registers(masm);

Which avoids passing save_vectors around everywhere.

> Also, I'm wondering how much all of this complexity gains us for the sake
> of configurations without C2. I'd be very tempted to just leave a hole in
> the stack for these systems. How much stack would that cost?

For NEON the difference is 768 bytes vs 512, but SVE could be a lot
more.

  83 // FIXME -- this is used by C1
  84 class RegisterSaver {

Do you remember what this is referring to? That it's duplicating
save_live_registers() in c1_Runtime_aarch64.cpp?

--
Thanks,
Nick
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Andrew Haley
On 1/29/21 7:53 AM, Nick Gasson wrote:

> On 01/28/21 17:48 pm, Andrew Haley wrote:
>>
>> It seems to me that save_vectors is only set here:
>>
>>    bool save_vectors = COMPILER2_OR_JVMCI != 0;
>>
>> which means that save_vectors is a static property of a build, not something
>> that can vary. Therefore, there is nothing to be gained by passing save_vectors
>> around. Could we simply have a save_vectors_on_deopt() function?
>>
>
> RegisterSaver is also used by generate_resolve_blob (which never saves
> vectors) and generate_handler_blob (which saves vectors when poll_type
> == POLL_AT_VECTOR_LOOP). How about we make RegisterSaver a class you can
> instantiate, and pass whether to save vectors or not to the constructor?
> That would look like:
>
>   RegisterSaver reg_save(COMPILER2_OR_JVMCI != 0 /* save_vectors */);
>   map = reg_save.save_live_registers(masm, 0, &frame_size_in_words);
>   ...
>   reg_save.restore_live_registers(masm);
>
> Which avoids passing save_vectors around everywhere.

That sounds like a great improvement.

>> Also, I'm wondering how much all of this complexity gains us for the sake
>> of configurations without C2. I'd be very tempted to just leave a hole in
>> the stack for these systems. How much stack would that cost?
>
> For NEON the difference is 768 bytes vs 512, but SVE could be a lot
> more.

OK, so it probably wouldn't be worth doing on NEON. But A64FX vectors are 64
bytes each, right? So 2kbytes to save, in which case we need this variable-size
register-save area.

>   83 // FIXME -- this is used by C1
>   84 class RegisterSaver {
>
> Do you remember what this is referring to? That it's duplicating
> save_live_registers() in c1_Runtime_aarch64.cpp?

Probably, yes.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v2]

Nick Gasson
In reply to this post by Nick Gasson
> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
> doesn't save vector registers on x86". The problem is that a vector
> produced by the Vector API may be stored in a register when the deopt
> blob is called. Because the deopt blob only stores the lower half of
> vector registers, the full vector object cannot be rematerialized during
> deoptimization. So the following will crash on AArch64 with current JDK:
>
>   make test TEST="jdk/incubator/vector" \
>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>
> The fix is to store the full vector registers by passing
> save_vectors=true to save_live_registers() in the deopt blob. Because
> save_live_registers() places the integer registers above the floating
> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
> to calculate the SP offset based on whether full vectors were saved, and
> whether those vectors were NEON or SVE, rather than using a static
> offset as it does currently.
>
> The change to VectorSupport::allocate_vector_payload_helper() is
> required because we only store the lowest VMReg slot in the oop map.
> However unlike x86 the vector registers are always saved in a contiguous
> region of memory, so we can calculate the address of each vector element
> as an offset from the address of the first slot. X86 handles this in
> RegisterMap::pd_location() but that won't work on AArch64 because with
> SVE there isn't a unique VMReg corresponding to each four-byte physical
> slot in the vector (there are always exactly eight logical VMRegs
> regardless of the actual vector length).
>
> Tested hotspot_all_no_apps and jdk_core.

Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:

  Move SVE slot handling to RegisterMap::pd_location

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2279/files
  - new: https://git.openjdk.java.net/jdk/pull/2279/files/83d07c58..2364f3dd

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=00-01

  Stats: 222 lines in 20 files changed: 136 ins; 13 del; 73 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2279.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v2]

Nick Gasson
In reply to this post by Vladimir Ivanov-2
On Thu, 28 Jan 2021 12:12:32 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> For Arm NEON (and PPC) we don't set VMReg::next() in oopmap either, and their vector slots are contiguous, so that's x86-specific? But yes, NEON can also generate correct full oopmap as for fixed vector size. For SVE, I have no idea to have proper VMReg::next() support, so Nick's solution looks good to me. Regarding to introducing new cross-platform API, which API do you mean? If we could have some better api, that would be perfect. Currently, allocate_vector_payload_helper() is the only one I can see that is vector related for RegisterMap::location() call.
>
> Probably, x86 is unique in using non-contiguous representation for vector values, but it doesn't make the code in question x86-specific. AArch64 is the only user of`VecA` and `VecA` is the only register type that has a mismatch in size between in-memory and RegMask representation. So, I conclude it is AArch64/SVE-specific.
>
> On x86 RegisterMap isn't fully populated for vector registers as well, but there's`RegisterMap::pd_location()` to cover that.
>
> Regarding new API, I mean the alternative to `VMReg::next()`/`RegisterMap::location(VMReg)` which is able to handle `VecA` case well. As Nick pointed out earlier, the problem with `VecA` is that there's no `VMReg` representation for all the slots which comprise the register value.
>
> Either enhancing `VMReg::next(int)` to produce special values for `VecA` case or introducing `RegisterMap::location(VMReg base_reg, int slot)` is a better way to handle the problem.

@iwanowww please take a look at the latest set of changes and let me know what you think. There's now a `RegisterMap::location(VMReg base_reg, int slot)` method as you suggest. That in turn uses a new method `VMReg::is_expressible(int slot_delta)` which is true if offsetting a VMReg by slot_delta slots gives another valid VMReg which is also a slot of the same physical register (i.e. `reg->next(slot_delta)` is valid). We can use this to fall back to `pd_location` if a slot of a vector is not expressible as a VMReg (i.e. for SVE). Unfortunately it touches a lot of files but that seems unavoidable.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v3]

Nick Gasson
In reply to this post by Nick Gasson
> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
> doesn't save vector registers on x86". The problem is that a vector
> produced by the Vector API may be stored in a register when the deopt
> blob is called. Because the deopt blob only stores the lower half of
> vector registers, the full vector object cannot be rematerialized during
> deoptimization. So the following will crash on AArch64 with current JDK:
>
>   make test TEST="jdk/incubator/vector" \
>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>
> The fix is to store the full vector registers by passing
> save_vectors=true to save_live_registers() in the deopt blob. Because
> save_live_registers() places the integer registers above the floating
> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
> to calculate the SP offset based on whether full vectors were saved, and
> whether those vectors were NEON or SVE, rather than using a static
> offset as it does currently.
>
> The change to VectorSupport::allocate_vector_payload_helper() is
> required because we only store the lowest VMReg slot in the oop map.
> However unlike x86 the vector registers are always saved in a contiguous
> region of memory, so we can calculate the address of each vector element
> as an offset from the address of the first slot. X86 handles this in
> RegisterMap::pd_location() but that won't work on AArch64 because with
> SVE there isn't a unique VMReg corresponding to each four-byte physical
> slot in the vector (there are always exactly eight logical VMRegs
> regardless of the actual vector length).
>
> Tested hotspot_all_no_apps and jdk_core.

Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:

 - Merge branch 'master' into 8260355
 - Move SVE slot handling to RegisterMap::pd_location
 - 8260355: AArch64: deoptimization stub should save vector registers
   
   This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
   doesn't save vector registers on x86". The problem is that a vector
   produced by the Vector API may be stored in a register when the deopt
   blob is called. Because the deopt blob only stores the lower half of
   vector registers, the full vector object cannot be rematerialized during
   deoptimization. So the following will crash on AArch64 with current JDK:
   
     make test TEST="jdk/incubator/vector" \
       JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
   
   The fix is to store the full vector registers by passing
   save_vectors=true to save_live_registers() in the deopt blob. Because
   save_live_registers() places the integer registers above the floating
   registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
   to calculate the SP offset based on whether full vectors were saved, and
   whether those vectors were NEON or SVE, rather than using a static
   offset as it does currently.
   
   The change to VectorSupport::allocate_vector_payload_helper() is
   required because we only store the lowest VMReg slot in the oop map.
   However unlike x86 the vector registers are always saved in a contiguous
   region of memory, so we can calculate the address of each vector element
   as an offset from the address of the first slot. X86 handles this in
   RegisterMap::pd_location() but that won't work on AArch64 because with
   SVE there isn't a unique VMReg corresponding to each four-byte physical
   slot in the vector (there are always exactly eight logical VMRegs
   regardless of the actual vector length).
   
   Tested hotspot_all_no_apps and jdk_core.

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2279/files
  - new: https://git.openjdk.java.net/jdk/pull/2279/files/2364f3dd..498310d4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=01-02

  Stats: 16985 lines in 325 files changed: 3685 ins; 4804 del; 8496 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2279.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Vladimir Ivanov-2
In reply to this post by Nick Gasson
On Thu, 28 Jan 2021 08:27:22 GMT, Nick Gasson <[hidden email]> wrote:

> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
> doesn't save vector registers on x86". The problem is that a vector
> produced by the Vector API may be stored in a register when the deopt
> blob is called. Because the deopt blob only stores the lower half of
> vector registers, the full vector object cannot be rematerialized during
> deoptimization. So the following will crash on AArch64 with current JDK:
>
>   make test TEST="jdk/incubator/vector" \
>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>
> The fix is to store the full vector registers by passing
> save_vectors=true to save_live_registers() in the deopt blob. Because
> save_live_registers() places the integer registers above the floating
> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
> to calculate the SP offset based on whether full vectors were saved, and
> whether those vectors were NEON or SVE, rather than using a static
> offset as it does currently.
>
> The change to VectorSupport::allocate_vector_payload_helper() is
> required because we only store the lowest VMReg slot in the oop map.
> However unlike x86 the vector registers are always saved in a contiguous
> region of memory, so we can calculate the address of each vector element
> as an offset from the address of the first slot. X86 handles this in
> RegisterMap::pd_location() but that won't work on AArch64 because with
> SVE there isn't a unique VMReg corresponding to each four-byte physical
> slot in the vector (there are always exactly eight logical VMRegs
> regardless of the actual vector length).
>
> Tested hotspot_all_no_apps and jdk_core.

Much better, thanks.

I suggest the following changes:

* please, leave original `location(VMReg reg)`/`pd_location(VMReg reg)` intact and introduce `(VMReg reg, uint slot_idx)` overloads;

* it's fair for `location(VMReg reg, uint slot_idx)` to require `reg` to  always be a base register:
address RegisterMap::location(VMReg base_reg, uint slot_idx) const {
  if (slot_idx > 0) {
    return pd_location(base_reg, slot_idx);
  } else {
    return location(base_reg);
}

* on all platforms except AArch64 define `pd_location(VMReg base_reg, int slot_idx)` as:
address RegisterMap::pd_location(VMReg base_reg, int slot_idx) const {
  return location(base_reg->next(slot_idx));
}

* on AArch64 check for vector register case and special-case it to:
address RegisterMap::pd_location(VMReg base_reg, int slot_idx) const {
  if (base_reg->is_FloatRegister()) {
    assert((base_reg->value() - ConcreteRegisterImpl::max_gpr) % FloatRegisterImpl::max_slots_per_register == 0, "not a base register");
    intptr_t offset_in_bytes = slot_idx * VMRegImpl::stack_slot_size;
    address base_location = location(base_reg);
    if (base_location != NULL) {
      return base_location + offset_in_bytes;
    }
  }
  return location(base_reg->next(slot_idx));
}
 

Or, as an alternative (since all the registers are stored contiguously on AArch64 anyway):
address RegisterMap::pd_location(VMReg base_reg, int slot_idx) const {
  intptr_t offset_in_bytes = slot_idx * VMRegImpl::stack_slot_size;
  address base_location = location(base_reg);
  if (base_location != NULL) {
    return base_location + offset_in_bytes;
  }
}

* keep the assert in `VMReg::next(int)` if you wish, but refactor it into an out-of-bounds check instead. Personally, I'd prefer to see it as a separate enhancement.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v3]

Vladimir Ivanov-2
In reply to this post by Nick Gasson
On Mon, 1 Feb 2021 09:36:05 GMT, Nick Gasson <[hidden email]> wrote:

>> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
>> doesn't save vector registers on x86". The problem is that a vector
>> produced by the Vector API may be stored in a register when the deopt
>> blob is called. Because the deopt blob only stores the lower half of
>> vector registers, the full vector object cannot be rematerialized during
>> deoptimization. So the following will crash on AArch64 with current JDK:
>>
>>   make test TEST="jdk/incubator/vector" \
>>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>>
>> The fix is to store the full vector registers by passing
>> save_vectors=true to save_live_registers() in the deopt blob. Because
>> save_live_registers() places the integer registers above the floating
>> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
>> to calculate the SP offset based on whether full vectors were saved, and
>> whether those vectors were NEON or SVE, rather than using a static
>> offset as it does currently.
>>
>> The change to VectorSupport::allocate_vector_payload_helper() is
>> required because we only store the lowest VMReg slot in the oop map.
>> However unlike x86 the vector registers are always saved in a contiguous
>> region of memory, so we can calculate the address of each vector element
>> as an offset from the address of the first slot. X86 handles this in
>> RegisterMap::pd_location() but that won't work on AArch64 because with
>> SVE there isn't a unique VMReg corresponding to each four-byte physical
>> slot in the vector (there are always exactly eight logical VMRegs
>> regardless of the actual vector length).
>>
>> Tested hotspot_all_no_apps and jdk_core.
>
> Nick Gasson has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains three additional commits since the last revision:
>
>  - Merge branch 'master' into 8260355
>  - Move SVE slot handling to RegisterMap::pd_location
>  - 8260355: AArch64: deoptimization stub should save vector registers
>    
>    This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
>    doesn't save vector registers on x86". The problem is that a vector
>    produced by the Vector API may be stored in a register when the deopt
>    blob is called. Because the deopt blob only stores the lower half of
>    vector registers, the full vector object cannot be rematerialized during
>    deoptimization. So the following will crash on AArch64 with current JDK:
>    
>      make test TEST="jdk/incubator/vector" \
>        JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>    
>    The fix is to store the full vector registers by passing
>    save_vectors=true to save_live_registers() in the deopt blob. Because
>    save_live_registers() places the integer registers above the floating
>    registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
>    to calculate the SP offset based on whether full vectors were saved, and
>    whether those vectors were NEON or SVE, rather than using a static
>    offset as it does currently.
>    
>    The change to VectorSupport::allocate_vector_payload_helper() is
>    required because we only store the lowest VMReg slot in the oop map.
>    However unlike x86 the vector registers are always saved in a contiguous
>    region of memory, so we can calculate the address of each vector element
>    as an offset from the address of the first slot. X86 handles this in
>    RegisterMap::pd_location() but that won't work on AArch64 because with
>    SVE there isn't a unique VMReg corresponding to each four-byte physical
>    slot in the vector (there are always exactly eight logical VMRegs
>    regardless of the actual vector length).
>    
>    Tested hotspot_all_no_apps and jdk_core.

src/hotspot/share/prims/vectorSupport.cpp line 141:

> 139:       int off   = (i * elem_size) % VMRegImpl::stack_slot_size;
> 140:
> 141:       address elem_addr = reg_map->location(vreg, vslot) + off;

Unrelated to your change: please, put the following comment:
address elem_addr = reg_map->location(vreg, vslot) + off; // assumes little endian element order

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Nick Gasson
In reply to this post by Nick Gasson
> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
> doesn't save vector registers on x86". The problem is that a vector
> produced by the Vector API may be stored in a register when the deopt
> blob is called. Because the deopt blob only stores the lower half of
> vector registers, the full vector object cannot be rematerialized during
> deoptimization. So the following will crash on AArch64 with current JDK:
>
>   make test TEST="jdk/incubator/vector" \
>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>
> The fix is to store the full vector registers by passing
> save_vectors=true to save_live_registers() in the deopt blob. Because
> save_live_registers() places the integer registers above the floating
> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
> to calculate the SP offset based on whether full vectors were saved, and
> whether those vectors were NEON or SVE, rather than using a static
> offset as it does currently.
>
> The change to VectorSupport::allocate_vector_payload_helper() is
> required because we only store the lowest VMReg slot in the oop map.
> However unlike x86 the vector registers are always saved in a contiguous
> region of memory, so we can calculate the address of each vector element
> as an offset from the address of the first slot. X86 handles this in
> RegisterMap::pd_location() but that won't work on AArch64 because with
> SVE there isn't a unique VMReg corresponding to each four-byte physical
> slot in the vector (there are always exactly eight logical VMRegs
> regardless of the actual vector length).
>
> Tested hotspot_all_no_apps and jdk_core.

Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:

  Review comments

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2279/files
  - new: https://git.openjdk.java.net/jdk/pull/2279/files/498310d4..0809ccf1

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=03
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2279&range=02-03

  Stats: 132 lines in 19 files changed: 30 ins; 65 del; 37 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2279.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2279/head:pull/2279

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers

Nick Gasson
In reply to this post by Vladimir Ivanov-2
On Mon, 1 Feb 2021 12:40:54 GMT, Vladimir Ivanov <[hidden email]> wrote:

> Much better, thanks.
>
> I suggest the following changes:
>

I've changed it as suggested. This way seems much simpler, thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Vladimir Ivanov-2
In reply to this post by Nick Gasson
On Tue, 2 Feb 2021 10:21:02 GMT, Nick Gasson <[hidden email]> wrote:

>> This is an AArch64 port of the fix for JDK-8256056 "Deoptimization stub
>> doesn't save vector registers on x86". The problem is that a vector
>> produced by the Vector API may be stored in a register when the deopt
>> blob is called. Because the deopt blob only stores the lower half of
>> vector registers, the full vector object cannot be rematerialized during
>> deoptimization. So the following will crash on AArch64 with current JDK:
>>
>>   make test TEST="jdk/incubator/vector" \
>>     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"
>>
>> The fix is to store the full vector registers by passing
>> save_vectors=true to save_live_registers() in the deopt blob. Because
>> save_live_registers() places the integer registers above the floating
>> registers in the stack frame, RegisterSaver::r0_offset_in_bytes() needs
>> to calculate the SP offset based on whether full vectors were saved, and
>> whether those vectors were NEON or SVE, rather than using a static
>> offset as it does currently.
>>
>> The change to VectorSupport::allocate_vector_payload_helper() is
>> required because we only store the lowest VMReg slot in the oop map.
>> However unlike x86 the vector registers are always saved in a contiguous
>> region of memory, so we can calculate the address of each vector element
>> as an offset from the address of the first slot. X86 handles this in
>> RegisterMap::pd_location() but that won't work on AArch64 because with
>> SVE there isn't a unique VMReg corresponding to each four-byte physical
>> slot in the vector (there are always exactly eight logical VMRegs
>> regardless of the actual vector length).
>>
>> Tested hotspot_all_no_apps and jdk_core.
>
> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>
>   Review comments

`RegisterMap`-related changes look good.

-------------

Marked as reviewed by vlivanov (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Nick Gasson
On Tue, 2 Feb 2021 11:08:46 GMT, Vladimir Ivanov <[hidden email]> wrote:

>> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Review comments
>
> `RegisterMap`-related changes look good.

@theRealAph are the sharedRuntime_aarch64.cpp changes ok?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2279
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Andrew Haley
On 2/3/21 7:02 AM, Nick Gasson wrote:
> On Tue, 2 Feb 2021 11:08:46 GMT, Vladimir Ivanov <[hidden email]> wrote:
>
>>> Nick Gasson has updated the pull request incrementally with one additional commit since the last revision:
>>>
>>>   Review comments
>>
>> `RegisterMap`-related changes look good.
>
> @theRealAph are the sharedRuntime_aarch64.cpp changes ok?

I guess so, but the code changes are so complex and delicate it's extremely
hard to tell.

What have you done about stress testing? I guess we need some code that's
repeatedly deoptimized and recompiled millions of times, with continuous
testing. I guess that in order to make sure nothing has regressed, a
bootstrap with DeoptimizeALot would help gain some confidence.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Nick Gasson (Arm Technology China)
On 02/03/21 17:36 pm, Andrew Haley wrote:

>>
>> @theRealAph are the sharedRuntime_aarch64.cpp changes ok?
>
> I guess so, but the code changes are so complex and delicate it's extremely
> hard to tell.
>
> What have you done about stress testing? I guess we need some code that's
> repeatedly deoptimized and recompiled millions of times, with continuous
> testing. I guess that in order to make sure nothing has regressed, a
> bootstrap with DeoptimizeALot would help gain some confidence.

I tried make bootcycle-images as you suggest with -XX:+DeoptimizeALot
added to JAVA_FLAGS and JAVA_FLAGS_BIG in bootcycle-spec.gmk.in (not
sure if there's a better way to do that...).

I've also previously run the tier1 and java/incubator/vector/* tests
with -XX:+DeoptimizeALot. My experience of modifying that code is that
DeoptimizeALot fails pretty quickly if you get something wrong.

--
Thanks,
Nick
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Andrew Haley
On 2/4/21 7:21 AM, Nick Gasson wrote:

> On 02/03/21 17:36 pm, Andrew Haley wrote:
>>>
>>> @theRealAph are the sharedRuntime_aarch64.cpp changes ok?
>>
>> I guess so, but the code changes are so complex and delicate it's extremely
>> hard to tell.
>>
>> What have you done about stress testing? I guess we need some code that's
>> repeatedly deoptimized and recompiled millions of times, with continuous
>> testing. I guess that in order to make sure nothing has regressed, a
>> bootstrap with DeoptimizeALot would help gain some confidence.
>
> I tried make bootcycle-images as you suggest with -XX:+DeoptimizeALot
> added to JAVA_FLAGS and JAVA_FLAGS_BIG in bootcycle-spec.gmk.in (not
> sure if there's a better way to do that...).
>
> I've also previously run the tier1 and java/incubator/vector/* tests
> with -XX:+DeoptimizeALot. My experience of modifying that code is that
> DeoptimizeALot fails pretty quickly if you get something wrong.

Yeah. The problem here is that safepoints with live vectors aren't so
common, so it's hard to test, I get it. Maybe this will have to do.

--
Andrew Haley  (he/him)
Java Platform Lead Engineer
Red Hat UK Ltd. <https://www.redhat.com>
https://keybase.io/andrewhaley
EAC8 43EB D3EF DB98 CC77 2FAD A5CD 6035 332F A671

Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8260355: AArch64: deoptimization stub should save vector registers [v4]

Nick Gasson (Arm Technology China)
On 02/04/21 16:18 pm, Andrew Haley wrote:
>
> Yeah. The problem here is that safepoints with live vectors aren't so
> common, so it's hard to test, I get it. Maybe this will have to do.

You can test that situation quite readily with:

  make test TEST="jdk/incubator/vector" \
     JTREG="VM_OPTIONS=-XX:+DeoptimizeALot -XX:DeoptimizeALotInterval=0"

Which will segfault with current JDK. I guess the difficulty is showing
it hasn't regressed something else.

--
Thanks,
Nick
12