RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Thomas Stuefe
If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.

This can be reproduced by starting the VM with
-Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
- heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
- class space follows at 0x8800_0000
- the narrow klass pointer base points to the start of the class space at 0x8800_0000.

In MacroAssembler::encode_klass_not_null(), there is the following section:

  if (base != NULL) {
    unsigned int base_h = ((unsigned long)base)>>32;
    unsigned int base_l = (unsigned int)((unsigned long)base);
    if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
      lgr_if_needed(dst, current);
      z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
    } else if ((base_h == 0) && (base_l != 0)) {   (A)
      lgr_if_needed(dst, current);                
      z_agfi(dst, -(int)base_l);                   (B)
    } else {
      load_const(Z_R0, base);
      lgr_if_needed(dst, current);
      z_sgr(dst, Z_R0);
    }
    current = dst;
  }

We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:

In the case of the crash, we have:
base:  8800_0000
klass pointer:  8804_1040
32bit two's complement of base:   7800_0000
added to the klass pointer: 1_0004_1040

So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.

This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).

================

Fix:

I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.

I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.

----

Tests:

I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.

I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).

I used this method to test various combinations:
- narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
- narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
- narrow klass pointer base = 0                      (we dont do anything)

(would this override-feature be useful? We could do better testing).

Thanks, Thomas

-------------

Commit messages:
 - Remove whitespace
 - start

Changes: https://git.openjdk.java.net/jdk/pull/2595/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261552
  Stats: 42 lines in 3 files changed: 37 ins; 0 del; 5 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2595.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Lutz Schmidt
On Tue, 16 Feb 2021 20:49:49 GMT, Thomas Stuefe <[hidden email]> wrote:

> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
>   if (base != NULL) {
>     unsigned int base_h = ((unsigned long)base)>>32;
>     unsigned int base_l = (unsigned int)((unsigned long)base);
>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>       lgr_if_needed(dst, current);
>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>       lgr_if_needed(dst, current);                
>       z_agfi(dst, -(int)base_l);                   (B)
>     } else {
>       load_const(Z_R0, base);
>       lgr_if_needed(dst, current);
>       z_sgr(dst, Z_R0);
>     }
>     current = dst;
>   }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>
> In the case of the crash, we have:
> base:  8800_0000
> klass pointer:  8804_1040
> 32bit two's complement of base:   7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>
> I used this method to test various combinations:
> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0                      (we dont do anything)
>
> (would this override-feature be useful? We could do better testing).
>
> Thanks, Thomas

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3645:

> 3643:   }
> 3644: #endif
> 3645:

I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:
  z_oihf(current, 0);
  z_brc(Assembler::bcondZero, ok);

z_oihf() does modify the contents of register current, but it writes back the same value.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:

> 3655:     } else {
> 3656:       load_const(Z_R0, base);
> 3657:       lgr_if_needed(dst, current);

What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.

// Klass oop manipulations if compressed.
void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
  Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
  address  base    = CompressedKlassPointers::base();
  int      shift   = CompressedKlassPointers::shift();
  bool     need_zero_extend = false;
  assert(UseCompressedClassPointers, "only for compressed klass ptrs");

  BLOCK_COMMENT("cKlass encoder {");

#ifdef ASSERT
  Label ok;
  z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
  z_brc(Assembler::bcondAllZero, ok);
  // The plain disassembler does not recognize illtrap. It instead displays
  // a 32-bit value. Issueing two illtraps assures the disassembler finds
  // the proper beginning of the next instruction.
  z_illtrap(0xee);
  z_illtrap(0xee);
  bind(ok);
#endif

  // Scale down the incoming klass pointer first.
  // We then can be sure we calculate an offset that fits into 32 bit.
  // More generally speaking: all subsequent calculations are purely 32-bit.
  if (shift != 0) {
    assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
    z_srlg(dst, current, shift);
    need_zero_extend = true;
    current = dst;
  }

  if (base != NULL) {
    // Use scaled-down base address parts to match scaled-down klass pointer.
    unsigned int base_h = ((unsigned long)base)>>(32+shift);
    unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);

    // General considerations:
    //  - when calculating (current_h - base_h), all digits must cancel (become 0).
    //    Otherwise, we would end up with a compressed klass pointer which doesn't
    //    fit into 32-bit.
    //  - Only bit#33 of the difference could potentially be non-zero. For that
    //    to happen, (current_l < base_l) must hold. In this case, the subtraction
    //    will create a borrow out of bit#32, nicely killing bit#33.
    //  - With the above, we only need to consider current_l and base_l to
    //    calculate the result.
    //  - Both values are treated as unsigned. The unsigned subtraction is
    //    replaced by adding (unsigned) the 2's complement of the subtrahend.

    if (base_l == 0) {
      //  - By theory, the calculation to be performed here (current_h - base_h) MUST
      //    cancel all high-word bits. Otherwise, we would end up with an offset
      //    (i.e. compressed klass pointer) that does not fit into 32 bit.
      //  - current_l remains unchanged.
      //  - Therefore, we can replace all calculation with just a
      //    zero-extending load 32 to 64 bit.
      //  - Even that can be replaced with a conditional load if dst != current.
      //    (this is a local view. The shift step may have requested zero-extension).
    } else {
      // To begin with, we may need to copy and/or zero-extend the register operand.
      // We have to calculate (current_l - base_l). Because there is no unsigend
      // subtract instruction with immediate operand, we add the 2's complement of base_l.
      if (need_zero_extend) {
        z_llgfr(dst, current);
        need_zero_extend = false;
      } else {
        llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
      }
      current = dst;
      z_alfi(dst, -(int)base_l);
    }

  if (need_zero_extend) {
    // We must zero-extend the calculated result. It may have some leftover bits in
    // the hi-word because we only did optimized calculations.
    z_llgfr(dst, current);
  } else {
    llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
  }

  BLOCK_COMMENT("} cKlass encoder");
}

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Thomas Stuefe
On Wed, 17 Feb 2021 15:58:49 GMT, Lutz Schmidt <[hidden email]> wrote:

> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers.

It is not just for the assertion, it is for limiting the 32bit add to situations where we know Klass pointers cannot exceed 32bit. That was the main reason. As I wrote, I was not sure about the assertion myself and am happy to drop it.

> And by the way, you could do the check with one test:
>
> ```
>   z_oihf(current, 0);
>   z_brc(Assembler::bcondZero, ok);
> ```
>
> z_oihf() does modify the contents of register current, but it writes back the same value.

Thank you. Unfortunately, information about z assembly was hard to come by. The only public information I found had hardly more than the instruction names, the rest was trial and error.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Thomas Stuefe
In reply to this post by Lutz Schmidt
On Wed, 17 Feb 2021 16:53:03 GMT, Lutz Schmidt <[hidden email]> wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>>       lgr_if_needed(dst, current);                
>>       z_agfi(dst, -(int)base_l);                   (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>>
>> In the case of the crash, we have:
>> base:  8800_0000
>> klass pointer:  8804_1040
>> 32bit two's complement of base:   7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0                      (we dont do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:
>
>> 3655:     } else {
>> 3656:       load_const(Z_R0, base);
>> 3657:       lgr_if_needed(dst, current);
>
> What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.
>
> // Klass oop manipulations if compressed.
> void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
>   Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
>   address  base    = CompressedKlassPointers::base();
>   int      shift   = CompressedKlassPointers::shift();
>   bool     need_zero_extend = false;
>   assert(UseCompressedClassPointers, "only for compressed klass ptrs");
>
>   BLOCK_COMMENT("cKlass encoder {");
>
> #ifdef ASSERT
>   Label ok;
>   z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
>   z_brc(Assembler::bcondAllZero, ok);
>   // The plain disassembler does not recognize illtrap. It instead displays
>   // a 32-bit value. Issueing two illtraps assures the disassembler finds
>   // the proper beginning of the next instruction.
>   z_illtrap(0xee);
>   z_illtrap(0xee);
>   bind(ok);
> #endif
>
>   // Scale down the incoming klass pointer first.
>   // We then can be sure we calculate an offset that fits into 32 bit.
>   // More generally speaking: all subsequent calculations are purely 32-bit.
>   if (shift != 0) {
>     assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
>     z_srlg(dst, current, shift);
>     need_zero_extend = true;
>     current = dst;
>   }
>
>   if (base != NULL) {
>     // Use scaled-down base address parts to match scaled-down klass pointer.
>     unsigned int base_h = ((unsigned long)base)>>(32+shift);
>     unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);
>
>     // General considerations:
>     //  - when calculating (current_h - base_h), all digits must cancel (become 0).
>     //    Otherwise, we would end up with a compressed klass pointer which doesn't
>     //    fit into 32-bit.
>     //  - Only bit#33 of the difference could potentially be non-zero. For that
>     //    to happen, (current_l < base_l) must hold. In this case, the subtraction
>     //    will create a borrow out of bit#32, nicely killing bit#33.
>     //  - With the above, we only need to consider current_l and base_l to
>     //    calculate the result.
>     //  - Both values are treated as unsigned. The unsigned subtraction is
>     //    replaced by adding (unsigned) the 2's complement of the subtrahend.
>
>     if (base_l == 0) {
>       //  - By theory, the calculation to be performed here (current_h - base_h) MUST
>       //    cancel all high-word bits. Otherwise, we would end up with an offset
>       //    (i.e. compressed klass pointer) that does not fit into 32 bit.
>       //  - current_l remains unchanged.
>       //  - Therefore, we can replace all calculation with just a
>       //    zero-extending load 32 to 64 bit.
>       //  - Even that can be replaced with a conditional load if dst != current.
>       //    (this is a local view. The shift step may have requested zero-extension).
>     } else {
>       // To begin with, we may need to copy and/or zero-extend the register operand.
>       // We have to calculate (current_l - base_l). Because there is no unsigend
>       // subtract instruction with immediate operand, we add the 2's complement of base_l.
>       if (need_zero_extend) {
>         z_llgfr(dst, current);
>         need_zero_extend = false;
>       } else {
>         llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>       }
>       current = dst;
>       z_alfi(dst, -(int)base_l);
>     }
>
>   if (need_zero_extend) {
>     // We must zero-extend the calculated result. It may have some leftover bits in
>     // the hi-word because we only did optimized calculations.
>     z_llgfr(dst, current);
>   } else {
>     llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>   }
>
>   BLOCK_COMMENT("} cKlass encoder");
> }

Looks nice and elegant.

But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range.

That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Lutz Schmidt
On Thu, 18 Feb 2021 05:37:18 GMT, Thomas Stuefe <[hidden email]> wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3657:
>>
>>> 3655:     } else {
>>> 3656:       load_const(Z_R0, base);
>>> 3657:       lgr_if_needed(dst, current);
>>
>> What would you think of a more general rework like this? The comments in the code should explain the intentions/assumptions/conclusions.
>>
>> // Klass oop manipulations if compressed.
>> void MacroAssembler::encode_klass_not_null(Register dst, Register src) {
>>   Register current = (src != noreg) ? src : dst; // Klass is in dst if no src provided. (dst == src) also possible.
>>   address  base    = CompressedKlassPointers::base();
>>   int      shift   = CompressedKlassPointers::shift();
>>   bool     need_zero_extend = false;
>>   assert(UseCompressedClassPointers, "only for compressed klass ptrs");
>>
>>   BLOCK_COMMENT("cKlass encoder {");
>>
>> #ifdef ASSERT
>>   Label ok;
>>   z_tmll(current, KlassAlignmentInBytes-1); // Check alignment.
>>   z_brc(Assembler::bcondAllZero, ok);
>>   // The plain disassembler does not recognize illtrap. It instead displays
>>   // a 32-bit value. Issueing two illtraps assures the disassembler finds
>>   // the proper beginning of the next instruction.
>>   z_illtrap(0xee);
>>   z_illtrap(0xee);
>>   bind(ok);
>> #endif
>>
>>   // Scale down the incoming klass pointer first.
>>   // We then can be sure we calculate an offset that fits into 32 bit.
>>   // More generally speaking: all subsequent calculations are purely 32-bit.
>>   if (shift != 0) {
>>     assert (LogKlassAlignmentInBytes == shift, "decode alg wrong");
>>     z_srlg(dst, current, shift);
>>     need_zero_extend = true;
>>     current = dst;
>>   }
>>
>>   if (base != NULL) {
>>     // Use scaled-down base address parts to match scaled-down klass pointer.
>>     unsigned int base_h = ((unsigned long)base)>>(32+shift);
>>     unsigned int base_l = (unsigned int)(((unsigned long)base)>>shift);
>>
>>     // General considerations:
>>     //  - when calculating (current_h - base_h), all digits must cancel (become 0).
>>     //    Otherwise, we would end up with a compressed klass pointer which doesn't
>>     //    fit into 32-bit.
>>     //  - Only bit#33 of the difference could potentially be non-zero. For that
>>     //    to happen, (current_l < base_l) must hold. In this case, the subtraction
>>     //    will create a borrow out of bit#32, nicely killing bit#33.
>>     //  - With the above, we only need to consider current_l and base_l to
>>     //    calculate the result.
>>     //  - Both values are treated as unsigned. The unsigned subtraction is
>>     //    replaced by adding (unsigned) the 2's complement of the subtrahend.
>>
>>     if (base_l == 0) {
>>       //  - By theory, the calculation to be performed here (current_h - base_h) MUST
>>       //    cancel all high-word bits. Otherwise, we would end up with an offset
>>       //    (i.e. compressed klass pointer) that does not fit into 32 bit.
>>       //  - current_l remains unchanged.
>>       //  - Therefore, we can replace all calculation with just a
>>       //    zero-extending load 32 to 64 bit.
>>       //  - Even that can be replaced with a conditional load if dst != current.
>>       //    (this is a local view. The shift step may have requested zero-extension).
>>     } else {
>>       // To begin with, we may need to copy and/or zero-extend the register operand.
>>       // We have to calculate (current_l - base_l). Because there is no unsigend
>>       // subtract instruction with immediate operand, we add the 2's complement of base_l.
>>       if (need_zero_extend) {
>>         z_llgfr(dst, current);
>>         need_zero_extend = false;
>>       } else {
>>         llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>>       }
>>       current = dst;
>>       z_alfi(dst, -(int)base_l);
>>     }
>>
>>   if (need_zero_extend) {
>>     // We must zero-extend the calculated result. It may have some leftover bits in
>>     // the hi-word because we only did optimized calculations.
>>     z_llgfr(dst, current);
>>   } else {
>>     llgfr_if_needed(dst, current); // zero-extension while copying comes at no extra cost.
>>   }
>>
>>   BLOCK_COMMENT("} cKlass encoder");
>> }
>
> Looks nice and elegant.
>
> But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range.
>
> That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes.

I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit.

You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you would make compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you then would have entities with a size not native to any processor. Just imagine you have to atomically store such a value.

I my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.

Back to code:
In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Lutz Schmidt
In reply to this post by Thomas Stuefe
On Thu, 18 Feb 2021 05:31:33 GMT, Thomas Stuefe <[hidden email]> wrote:

>> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3645:
>>
>>> 3643:   }
>>> 3644: #endif
>>> 3645:
>>
>> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers. And by the way, you could do the check with one test:
>>   z_oihf(current, 0);
>>   z_brc(Assembler::bcondZero, ok);
>>
>> z_oihf() does modify the contents of register current, but it writes back the same value.
>
>> I do not like the cross-dependency to metaspace.hpp just for the sake of checking an artificial restriction on Klass pointers.
>
> It is not just for the assertion, it is for limiting the 32bit add to situations where we know Klass pointers cannot exceed 32bit. That was the main reason. As I wrote, I was not sure about the assertion myself and am happy to drop it.
>
>> And by the way, you could do the check with one test:
>>
>> ```
>>   z_oihf(current, 0);
>>   z_brc(Assembler::bcondZero, ok);
>> ```
>>
>> z_oihf() does modify the contents of register current, but it writes back the same value.
>
> Thank you. Unfortunately, information about z assembly was hard to come by. The only public information I found had hardly more than the instruction names, the rest was trial and error.

I admit. To find System z information, you need to know the "magic keywords" to search for. In this case, it would be "Principles of Operation". The third or so Google hit would lead you to the System z architecture document. With 2000+ pages to read, you would be lost anyway. :-)

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Thomas Stuefe
In reply to this post by Lutz Schmidt
On Thu, 18 Feb 2021 09:12:17 GMT, Lutz Schmidt <[hidden email]> wrote:

>> Looks nice and elegant.
>>
>> But as said offlist, I dislike the fact that this hard codes the limitation to 32bit for the narrow klass pointer range.
>>
>> That restriction is artificial and we may just want to drop it. E.g. one recurring idea I have is to drop the duality in metaspace between non-class- and class-metaspace, and just store everything in class space. That would save quite a bit of memory (less overhead) and make the metaspace coding quite a bit simpler. However, in that case it could be that we exceed the current 3g limit and may even exceed 32bit. Since add+shift for decoding is universally done on all platforms at least if CDS is on, this should work out of the box. Unless of course the platforms hard-code the 32bit limitation into their encoding schemes.
>
> I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit.
>
> You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you would make compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you then would have entities with a size not native to any processor. Just imagine you have to atomically store such a value.
>
> I my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.
>
> Back to code:
> In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.

You misunderstand me. My point was not to make narrow pointers larger than 32bit, but use the full encodable range. The encodable range is 32g atm. But we artificially limit the range to 3G (CompressedClassSpaceSize is capped at that value).

I thought your proposal was based upon the assumption that the highest *uncompressed* offset into class space can be not larger than 4G. But looking at your proposal again, I see you moved the shift up before the add, so it should probably work.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Lutz Schmidt
On Mon, 22 Feb 2021 06:28:14 GMT, Thomas Stuefe <[hidden email]> wrote:

>> I don't see how you want to overcome the 32-bit limit for compressed pointers. This whole "compression" thing is based on the "trick" to store an offset instead of the full address. Depending on the object alignment requirement, this affords you 32 GB (8-byte alignment) or 64 GB (16-byte alignment) of addressable (or should I say offset-able) space. That's quite a bit.
>>
>> You use pointer compression to save space, and for nothing else. Space savings have to be so significant that they outweigh the added effort for encoding and decoding. With just some shift and add, the effort is limited, though noticeable. If you would make compressed pointers 40 bits wide (5 bytes), encoding and decoding would impose more effort. What's even worse, you then would have entities with a size not native to any processor. Just imagine you have to atomically store such a value.
>>
>> I my opinion, wider compressed pointers will have to wait until we have 128-bit pointers.
>>
>> Back to code:
>> In the code suggested above, you could make use of the Metaspace::class_space_end() function. If the class space end address, shifted right, fits into 32 bit, need_zero_extend may remain false. Your choice.
>
> You misunderstand me. My point was not to make narrow pointers larger than 32bit, but use the full encodable range. The encodable range is 32g atm. But we artificially limit the range to 3G (CompressedClassSpaceSize is capped at that value).
>
> I thought your proposal was based upon the assumption that the highest *uncompressed* offset into class space can be not larger than 4G. But looking at your proposal again, I see you moved the shift up before the add, so it should probably work.

So it was mutual misunderstanding. Good to have that resolved.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]

Thomas Stuefe
In reply to this post by Thomas Stuefe
> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
>   if (base != NULL) {
>     unsigned int base_h = ((unsigned long)base)>>32;
>     unsigned int base_l = (unsigned int)((unsigned long)base);
>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>       lgr_if_needed(dst, current);
>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>       lgr_if_needed(dst, current);                
>       z_agfi(dst, -(int)base_l);                   (B)
>     } else {
>       load_const(Z_R0, base);
>       lgr_if_needed(dst, current);
>       z_sgr(dst, Z_R0);
>     }
>     current = dst;
>   }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>
> In the case of the crash, we have:
> base:  8800_0000
> klass pointer:  8804_1040
> 32bit two's complement of base:   7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>
> I used this method to test various combinations:
> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0                      (we dont do anything)
>
> (would this override-feature be useful? We could do better testing).
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:

 - Lucys proposal
 - Revert first attempt

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2595/files
  - new: https://git.openjdk.java.net/jdk/pull/2595/files/07e83dfb..e096f09c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=00-01

  Stats: 84 lines in 3 files changed: 28 ins; 28 del; 28 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2595.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]

Martin Doerr
On Mon, 22 Feb 2021 09:32:01 GMT, Thomas Stuefe <[hidden email]> wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>>       lgr_if_needed(dst, current);                
>>       z_agfi(dst, -(int)base_l);                   (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>>
>> In the case of the crash, we have:
>> base:  8800_0000
>> klass pointer:  8804_1040
>> 32bit two's complement of base:   7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0                      (we dont do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Lucys proposal
>  - Revert first attempt

Thanks for fixing. Looks correct, but I have one minor finding.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3635:

> 3633:   if (base != NULL) {
> 3634:     // Use scaled-down base address parts to match scaled-down klass pointer.
> 3635:     unsigned int base_h = ((unsigned long)base)>>(32+shift);

base_h is unused, but referred to in the comments

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]

Lutz Schmidt
In reply to this post by Thomas Stuefe
On Mon, 22 Feb 2021 09:32:01 GMT, Thomas Stuefe <[hidden email]> wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>>       lgr_if_needed(dst, current);                
>>       z_agfi(dst, -(int)base_l);                   (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>>
>> In the case of the crash, we have:
>> base:  8800_0000
>> klass pointer:  8804_1040
>> 32bit two's complement of base:   7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0                      (we dont do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Lucys proposal
>  - Revert first attempt

The changes look good to me now.
Including the additional optimisation is optional.
Thanks for debugging, finding and fixing!

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3649:

> 3647:     //  - Both values are treated as unsigned. The unsigned subtraction is
> 3648:     //    replaced by adding (unsigned) the 2's complement of the subtrahend.
> 3649:

There is a further tiny optimisation you may want to include in the final version:

  // If we happen to see (base_h == 0), we are sure there
  // is no borrow from bit#33. No zero-extension is needed.
  if (base_h == 0) {
    need_zero_extend = false;
  }

-------------

Marked as reviewed by lucy (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]

Lutz Schmidt
In reply to this post by Martin Doerr
On Mon, 22 Feb 2021 10:31:45 GMT, Martin Doerr <[hidden email]> wrote:

>> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>>
>>  - Lucys proposal
>>  - Revert first attempt
>
> src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3635:
>
>> 3633:   if (base != NULL) {
>> 3634:     // Use scaled-down base address parts to match scaled-down klass pointer.
>> 3635:     unsigned int base_h = ((unsigned long)base)>>(32+shift);
>
> base_h is unused, but referred to in the comments

There was a comment crossing...
With my latest suggestion, base_h is now used.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v2]

Martin Doerr
In reply to this post by Thomas Stuefe
On Mon, 22 Feb 2021 09:32:01 GMT, Thomas Stuefe <[hidden email]> wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>>       lgr_if_needed(dst, current);                
>>       z_agfi(dst, -(int)base_l);                   (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>>
>> In the case of the crash, we have:
>> base:  8800_0000
>> klass pointer:  8804_1040
>> 32bit two's complement of base:   7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0                      (we dont do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with two additional commits since the last revision:
>
>  - Lucys proposal
>  - Revert first attempt

As discussed offline, I think we need to support encoding of Class Pointer 0x100000000 (and above) with e.g. base = 0x0C0000000 and shift = 0. need_zero_extend is false in this example which possibly leaves a 1 in the higher 32 bit. Lower 32 bit are correct in your current version, but some code may rely on zero extension to 64 bit.

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v3]

Thomas Stuefe
In reply to this post by Thomas Stuefe
> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
>   if (base != NULL) {
>     unsigned int base_h = ((unsigned long)base)>>32;
>     unsigned int base_l = (unsigned int)((unsigned long)base);
>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>       lgr_if_needed(dst, current);
>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>       lgr_if_needed(dst, current);                
>       z_agfi(dst, -(int)base_l);                   (B)
>     } else {
>       load_const(Z_R0, base);
>       lgr_if_needed(dst, current);
>       z_sgr(dst, Z_R0);
>     }
>     current = dst;
>   }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>
> In the case of the crash, we have:
> base:  8800_0000
> klass pointer:  8804_1040
> 32bit two's complement of base:   7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>
> I used this method to test various combinations:
> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0                      (we dont do anything)
>
> (would this override-feature be useful? We could do better testing).
>
> Thanks, Thomas

Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:

  update

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2595/files
  - new: https://git.openjdk.java.net/jdk/pull/2595/files/e096f09c..b3cbd715

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2595&range=01-02

  Stats: 26 lines in 1 file changed: 15 ins; 3 del; 8 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2595.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2595/head:pull/2595

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v3]

Lutz Schmidt
On Mon, 1 Mar 2021 11:09:03 GMT, Thomas Stuefe <[hidden email]> wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>>       lgr_if_needed(dst, current);                
>>       z_agfi(dst, -(int)base_l);                   (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>>
>> In the case of the crash, we have:
>> base:  8800_0000
>> klass pointer:  8804_1040
>> 32bit two's complement of base:   7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0                      (we dont do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
>   update

LGTM, still.

-------------

Marked as reviewed by lucy (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v3]

Martin Doerr
In reply to this post by Thomas Stuefe
On Mon, 1 Mar 2021 11:09:03 GMT, Thomas Stuefe <[hidden email]> wrote:

>> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>>
>> This can be reproduced by starting the VM with
>> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
>> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
>> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
>> - class space follows at 0x8800_0000
>> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>>
>> In MacroAssembler::encode_klass_not_null(), there is the following section:
>>
>>   if (base != NULL) {
>>     unsigned int base_h = ((unsigned long)base)>>32;
>>     unsigned int base_l = (unsigned int)((unsigned long)base);
>>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>>       lgr_if_needed(dst, current);
>>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>>       lgr_if_needed(dst, current);                
>>       z_agfi(dst, -(int)base_l);                   (B)
>>     } else {
>>       load_const(Z_R0, base);
>>       lgr_if_needed(dst, current);
>>       z_sgr(dst, Z_R0);
>>     }
>>     current = dst;
>>   }
>>
>> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>>
>> In the case of the crash, we have:
>> base:  8800_0000
>> klass pointer:  8804_1040
>> 32bit two's complement of base:   7800_0000
>> added to the klass pointer: 1_0004_1040
>>
>> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>>
>> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>>
>> ================
>>
>> Fix:
>>
>> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>>
>> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>>
>> ----
>>
>> Tests:
>>
>> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>>
>> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>>
>> I used this method to test various combinations:
>> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
>> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
>> - narrow klass pointer base = 0                      (we dont do anything)
>>
>> (would this override-feature be useful? We could do better testing).
>>
>> Thanks, Thomas
>
> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>
>   update

Marked as reviewed by mdoerr (Reviewer).

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Re: RFR: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base [v3]

Thomas Stuefe
On Mon, 1 Mar 2021 12:42:33 GMT, Martin Doerr <[hidden email]> wrote:

>> Thomas Stuefe has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   update
>
> Marked as reviewed by mdoerr (Reviewer).

Thanks @TheRealMDoerr and @RealLucy for advice and reviews!

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595
Reply | Threaded
Open this post in threaded view
|

Integrated: JDK-8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Thomas Stuefe
In reply to this post by Thomas Stuefe
On Tue, 16 Feb 2021 20:49:49 GMT, Thomas Stuefe <[hidden email]> wrote:

> If Compressed class pointer base has a non-zero value it may cause MacroAssembler::encode_klass_not_null() to encode a Klass pointer to a wrong narrow pointer.
>
> This can be reproduced by starting the VM with
> -Xshare:dump -XX:HeapBaseMinAddress=2g -Xmx128m
> but CDS is not involved. It is only relevant insofar as this is the only way to get the following combination:
> - heap is allocated at 0x800_0000. It is small and ends at 0x8800_0000.
> - class space follows at 0x8800_0000
> - the narrow klass pointer base points to the start of the class space at 0x8800_0000.
>
> In MacroAssembler::encode_klass_not_null(), there is the following section:
>
>   if (base != NULL) {
>     unsigned int base_h = ((unsigned long)base)>>32;
>     unsigned int base_l = (unsigned int)((unsigned long)base);
>     if ((base_h != 0) && (base_l == 0) && VM_Version::has_HighWordInstr()) {
>       lgr_if_needed(dst, current);
>       z_aih(dst, -((int)base_h));     // Base has no set bits in lower half.
>     } else if ((base_h == 0) && (base_l != 0)) {   (A)
>       lgr_if_needed(dst, current);                
>       z_agfi(dst, -(int)base_l);                   (B)
>     } else {
>       load_const(Z_R0, base);
>       lgr_if_needed(dst, current);
>       z_sgr(dst, Z_R0);
>     }
>     current = dst;
>   }
>
> We enter the condition at (A) if the narrow klass pointer base is non-zero but fits into 32bit. At (B), we want to substract the base from the Klass pointer; we do this by calculating the 32bit twos-complement of the base and add it with AGFI. AGFI adds a 32bit immediate to a 64bit register. In this case, it produces the wrong result if the base is >0x800_0000:
>
> In the case of the crash, we have:
> base:  8800_0000
> klass pointer:  8804_1040
> 32bit two's complement of base:   7800_0000
> added to the klass pointer: 1_0004_1040
>
> So the result of the "substraction" is 1_0004_1040, it should be 4_1040, which would be the correct offset of the Klass* pointer within the ccs.
>
> This bug has been dormant; was activated by JDK-8250989 which changed the way class space reservation happens at CDS dump time. It surfaced first as crash in a CDS-specific jtreg test (JDK-8261552).
>
> ================
>
> Fix:
>
> I changed the AGFI instruction to a pure 32bit add (AFI). That works as long as the Klass pointer also fits into 32bit. So I narrowed the condition at (A) to only fire if it can be ensured that both narrow base and Klass* pointers fit into 32bit.
>
> I also added a runtime verification in that case that any Klass pointer passed down is indeed a 32bit pointer. However, I am not really sure this is useful, or that this is the best way to do this (using TMHH and TMHL). I was looking for something like TMH or TML to check whole 32bit words but could not find any.
>
> ----
>
> Tests:
>
> I manually tested that the crash disappears, which it does. I stepped through the encoding code and the values now look right.
>
> I also did build a VM with the ability to override both class space start address and the narrow klass pointer base to exact values (see https://github.com/openjdk/jdk/compare/master...tstuefe:override-ccs-start-and-base).
>
> I used this method to test various combinations:
> - narrow klass pointer base > 0 < 4g + ccs end < 4g  (we hit our branch doing AFI)
> - narrow klass pointer base > 0 < 4g + ccs end > 4g  (we hit the fallback doing SGR with r0)
> - narrow klass pointer base = 0                      (we dont do anything)
>
> (would this override-feature be useful? We could do better testing).
>
> Thanks, Thomas

This pull request has now been integrated.

Changeset: fdd10932
Author:    Thomas Stuefe <[hidden email]>
URL:       https://git.openjdk.java.net/jdk/commit/fdd10932
Stats:     79 lines in 1 file changed: 62 ins; 13 del; 4 mod

8261552: s390: MacroAssembler::encode_klass_not_null() may produce wrong results for non-zero values of narrow klass base

Co-authored-by: Lutz Schmidt <[hidden email]>
Reviewed-by: mdoerr, lucy

-------------

PR: https://git.openjdk.java.net/jdk/pull/2595