C1 code installation and JNIHandle::deleted_handle() oop

classic Classic list List threaded Threaded
18 messages Options
Reply | Threaded
Open this post in threaded view
|

C1 code installation and JNIHandle::deleted_handle() oop

Aleksey Shipilev-4
Hi,

In some of our aggressive test configurations for Shenandoah, we sometimes see the following failure:
  http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log

It seems to happen when C1 code installation is happening during Full GC.

The actual failure is caused by touching the JNIHandles::deleted_handle() oop in
JNIHandles::guard_value() during JNIHandles::resolve() against the constant oop handle when we are
recording the debugging information for C1-generated Java call:
  http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220

The C1 thread is in _thread_in_native state, and so the runtime thinks the thread is at safepoint,
but the thread touches the deleted_handle oop(). When Shenandoah dives into Full GC and moves that
object at the same time, everything crashes and burns.

Is C1 (and any other compiler thread) supposed to transit to _vm_state when touching the naked oops,
and thus coordinate with safepoints? I see VM_ENTRY_MARK all over ci* that seems to transit there
before accessing the heap. Does that mean we need the same everywhere around JNIHandles::resolve too?

Or is there some other mechanism that is supposed to get compiler threads to coordinate with GC?

Thanks,
-Aleksey


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Roman Kennke-6
When the compiler thread is _in_native state, it doesn't stop for
safepoints, e.g. GC. This means it must not touch naked oops, otherwise
the GC may move away the oop under our feet. Seems like this is what
happens here.
I am not sure what granularity VM_ENTRY_MARK are usually done. My
feeling is that it should be somewhere around
LIR_Assembler::emit_code(BlockList*) ?

Roman

> Hi,
>
> In some of our aggressive test configurations for Shenandoah, we sometimes see the following failure:
>    http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>
> It seems to happen when C1 code installation is happening during Full GC.
>
> The actual failure is caused by touching the JNIHandles::deleted_handle() oop in
> JNIHandles::guard_value() during JNIHandles::resolve() against the constant oop handle when we are
> recording the debugging information for C1-generated Java call:
>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>
> The C1 thread is in _thread_in_native state, and so the runtime thinks the thread is at safepoint,
> but the thread touches the deleted_handle oop(). When Shenandoah dives into Full GC and moves that
> object at the same time, everything crashes and burns.
>
> Is C1 (and any other compiler thread) supposed to transit to _vm_state when touching the naked oops,
> and thus coordinate with safepoints? I see VM_ENTRY_MARK all over ci* that seems to transit there
> before accessing the heap. Does that mean we need the same everywhere around JNIHandles::resolve too?
>
> Or is there some other mechanism that is supposed to get compiler threads to coordinate with GC?
>
> Thanks,
> -Aleksey
>

Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
In reply to this post by Aleksey Shipilev-4
Aleksey,

I agree with your & Roman analysis: compilers shouldn't touch naked oops
unless the thread is in _thread_in_vm mode.

Looking at the crash log, the problematic code is under assert:

void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
   assert(JNIHandles::resolve(value()) == NULL ||
          Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
          "Should be in heap");
   stream->write_int(CONSTANT_OOP_CODE);
   stream->write_handle(value());
}

So, the proper fix would be to make the verification code more robust.

Best regards,
Vladimir Ivanov

On 11/14/17 5:16 PM, Aleksey Shipilev wrote:

> Hi,
>
> In some of our aggressive test configurations for Shenandoah, we sometimes see the following failure:
>    http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>
> It seems to happen when C1 code installation is happening during Full GC.
>
> The actual failure is caused by touching the JNIHandles::deleted_handle() oop in
> JNIHandles::guard_value() during JNIHandles::resolve() against the constant oop handle when we are
> recording the debugging information for C1-generated Java call:
>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>
> The C1 thread is in _thread_in_native state, and so the runtime thinks the thread is at safepoint,
> but the thread touches the deleted_handle oop(). When Shenandoah dives into Full GC and moves that
> object at the same time, everything crashes and burns.
>
> Is C1 (and any other compiler thread) supposed to transit to _vm_state when touching the naked oops,
> and thus coordinate with safepoints? I see VM_ENTRY_MARK all over ci* that seems to transit there
> before accessing the heap. Does that mean we need the same everywhere around JNIHandles::resolve too?
>
> Or is there some other mechanism that is supposed to get compiler threads to coordinate with GC?
>
> Thanks,
> -Aleksey
>
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Roman Kennke-6
The code below the assert also unwraps the oop and does lookups with it. I'm not on my computer but I can dig out the relevant parts when I'm back at work...

Roman


Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov <[hidden email]>:
Aleksey,

I agree with your & Roman analysis: compilers shouldn't touch naked oops
unless the thread is in _thread_in_vm mode.

Looking at the crash log, the problematic code is under assert:

void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
assert(JNIHandles::resolve(value()) == NULL ||
Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
"Should be in heap");
stream->write_int(CONSTANT_OOP_CODE);
stream->write_handle(value());
}

So, the proper fix would be to make the verification code more robust.

Best regards,
Vladimir Ivanov

On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
Hi,

In some of our aggressive test configurations for Shenandoah, we sometimes see the following failure:
http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log

It seems to happen when C1 code installation is happening during Full GC.

The actual failure is caused by touching the JNIHandles::deleted_handle() oop in
JNIHandles::guard_value() during JNIHandles::resolve() against the constant oop handle when we are
recording the debugging information for C1-generated Java call:
http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220

The C1 thread is in _thread_in_native state, and so the runtime thinks the thread is at safepoint,
but the thread touches the deleted_handle oop(). When Shenandoah dives into Full GC and moves that
object at the same time, everything crashes and burns.

Is C1 (and any other compiler thread) supposed to transit to _vm_state when touching the naked oops,
and thus coordinate with safepoints? I see VM_ENTRY_MARK all over ci* that seems to transit there
before accessing the heap. Does that mean we need the same everywhere around JNIHandles::resolve too?

Or is there some other mechanism that is supposed to get compiler threads to coordinate with GC?

Thanks,
-Aleksey


--
Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
Thanks, now I see that as well: OopRecorder::find_index() can delegate
to ObjectLookup::find_index() which does resolve the handle w/o
transitioning to VM.

But I don't believe you hit that path: ObjectLookup was added as part of
JVMCI and is guarded by a flag (deduplicate) which is turned on only for
JVMCI.

Anyway, I'll file a bug to investigate ObjectLookup::find_index().

Best regards,
Vladimir Ivanov

On 11/14/17 6:45 PM, Roman Kennke wrote:

> The code below the assert also unwraps the oop and does lookups with it.
> I'm not on my computer but I can dig out the relevant parts when I'm
> back at work...
>
> Roman
>
>
> Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov
> <[hidden email]>:
>
>     Aleksey,
>
>     I agree with your & Roman analysis: compilers shouldn't touch naked oops
>     unless the thread is in _thread_in_vm mode.
>
>     Looking at the crash log, the problematic code is under assert:
>
>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>         assert(JNIHandles::resolve(value()) == NULL ||
>                Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>                "Should be in heap");
>         stream->write_int(CONSTANT_OOP_CODE);
>         stream->write_handle(value());
>     }
>
>     So, the proper fix would be to make the verification code more robust.
>
>     Best regards,
>     Vladimir Ivanov
>
>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>
>         Hi,
>
>         In some of our aggressive test configurations for Shenandoah, we
>         sometimes see the following failure:
>         http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>
>         It seems to happen when C1 code installation is happening during
>         Full GC.
>
>         The actual failure is caused by touching the
>         JNIHandles::deleted_handle() oop in
>         JNIHandles::guard_value() during JNIHandles::resolve() against
>         the constant oop handle when we are
>         recording the debugging information for C1-generated Java call:
>         http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>
>         The C1 thread is in _thread_in_native state, and so the runtime
>         thinks the thread is at safepoint,
>         but the thread touches the deleted_handle oop(). When Shenandoah
>         dives into Full GC and moves that
>         object at the same time, everything crashes and burns.
>
>         Is C1 (and any other compiler thread) supposed to transit to
>         _vm_state when touching the naked oops,
>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>         over ci* that seems to transit there
>         before accessing the heap. Does that mean we need the same
>         everywhere around JNIHandles::resolve too?
>
>         Or is there some other mechanism that is supposed to get
>         compiler threads to coordinate with GC?
>
>         Thanks,
>         -Aleksey
>
>
> --
> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Aleksey Shipilev-4
In reply to this post by Vladimir Ivanov
On 11/14/2017 04:36 PM, Vladimir Ivanov wrote:

> Aleksey,
>
> I agree with your & Roman analysis: compilers shouldn't touch naked oops unless the thread is in
> _thread_in_vm mode.
>
> Looking at the crash log, the problematic code is under assert:
>
> void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>   assert(JNIHandles::resolve(value()) == NULL ||
>          Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>          "Should be in heap");
>   stream->write_int(CONSTANT_OOP_CODE);
>   stream->write_handle(value());
> }
>
> So, the proper fix would be to make the verification code more robust.
Actually, in Shenandoah case, the failure is here (it is obscured by inlining, but visible in
slowdebug):
  http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220

... "== deleted_handle()" or "!= deleted_handle()". So fortifying assert code does not help, the bug
is touching the oop in improper thread state. Nightmarish scenario would be to fail "!="/"==" check
because the deleted_handle oop had been already updated, but the actual handle value did not.

The similar thing happens after we write the oop into the debug stream: it is not handelize-d
anymore, and so opaque for GC, and thus has all the chance to be garbled by the time GC finishes.

Having these two thoughts in mind, I think we have to consider _thread_in_vm on that C1 path. But
where exactly?

Thanks,
-Aleksey


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
In reply to this post by Vladimir Ivanov
> Anyway, I'll file a bug to investigate ObjectLookup::find_index().

FTR it shouldn't be a problem for JVMCI since transition to VM happens
as part of the call into VM (see jvmciCompilerToVM.cpp & C2V_VMENTRY macro).

Best regards,
Vladimir Ivanov

> On 11/14/17 6:45 PM, Roman Kennke wrote:
>> The code below the assert also unwraps the oop and does lookups with
>> it. I'm not on my computer but I can dig out the relevant parts when
>> I'm back at work...
>>
>> Roman
>>
>>
>> Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov
>> <[hidden email]>:
>>
>>     Aleksey,
>>
>>     I agree with your & Roman analysis: compilers shouldn't touch
>> naked oops
>>     unless the thread is in _thread_in_vm mode.
>>
>>     Looking at the crash log, the problematic code is under assert:
>>
>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>>         assert(JNIHandles::resolve(value()) == NULL ||
>>                
>> Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>                "Should be in heap");
>>         stream->write_int(CONSTANT_OOP_CODE);
>>         stream->write_handle(value());
>>     }
>>
>>     So, the proper fix would be to make the verification code more
>> robust.
>>
>>     Best regards,
>>     Vladimir Ivanov
>>
>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>
>>         Hi,
>>
>>         In some of our aggressive test configurations for Shenandoah, we
>>         sometimes see the following failure:
>>        
>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>
>>         It seems to happen when C1 code installation is happening during
>>         Full GC.
>>
>>         The actual failure is caused by touching the
>>         JNIHandles::deleted_handle() oop in
>>         JNIHandles::guard_value() during JNIHandles::resolve() against
>>         the constant oop handle when we are
>>         recording the debugging information for C1-generated Java call:
>>        
>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220 
>>
>>
>>         The C1 thread is in _thread_in_native state, and so the runtime
>>         thinks the thread is at safepoint,
>>         but the thread touches the deleted_handle oop(). When Shenandoah
>>         dives into Full GC and moves that
>>         object at the same time, everything crashes and burns.
>>
>>         Is C1 (and any other compiler thread) supposed to transit to
>>         _vm_state when touching the naked oops,
>>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>>         over ci* that seems to transit there
>>         before accessing the heap. Does that mean we need the same
>>         everywhere around JNIHandles::resolve too?
>>
>>         Or is there some other mechanism that is supposed to get
>>         compiler threads to coordinate with GC?
>>
>>         Thanks,
>>         -Aleksey
>>
>>
>> --
>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
In reply to this post by Aleksey Shipilev-4

>> Looking at the crash log, the problematic code is under assert:
>>
>> void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>>    assert(JNIHandles::resolve(value()) == NULL ||
>>           Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>           "Should be in heap");
>>    stream->write_int(CONSTANT_OOP_CODE);
>>    stream->write_handle(value());
>> }
>>
>> So, the proper fix would be to make the verification code more robust.
>
> Actually, in Shenandoah case, the failure is here (it is obscured by inlining, but visible in
> slowdebug):
>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>
> ... "== deleted_handle()" or "!= deleted_handle()". So fortifying assert code does not help, the bug
> is touching the oop in improper thread state. Nightmarish scenario would be to fail "!="/"==" check
> because the deleted_handle oop had been already updated, but the actual handle value did not.

> The similar thing happens after we write the oop into the debug stream: it is not handelize-d
> anymore, and so opaque for GC, and thus has all the chance to be garbled by the time GC finishes.

I don't follow you. Can you point to the code you are talking about?

ConstantOopWriteValue::write_on() writes the _handle_ and not the naked
oop [1].

I still believe the assert is the only problem here. If it's valuable to
keep it, then it should perform all necessary precautions itself before
accessing the oop.

The assert is a relatively recent addition. It was added as part of
PermGen removal and most likely the fact it isn't executed in VM was
overlooked.

Best regards,
Vladimir Ivanov

[1]

void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
...
stream->write_handle(value());

...

class ConstantOopWriteValue: public ScopeValue {
  private:
   jobject _value;
  public:
   jobject value() const                { return _value;  }
...

void DebugInfoWriteStream::write_handle(jobject h) {
   write_int(recorder()->oop_recorder()->find_index(h));
}


Best regards,
Vladimir Ivanov

> Having these two thoughts in mind, I think we have to consider _thread_in_vm on that C1 path. But
> where exactly?
>
> Thanks,
> -Aleksey
>
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Aleksey Shipilev-4
On 11/14/2017 05:23 PM, Vladimir Ivanov wrote:
>> Actually, in Shenandoah case, the failure is here (it is obscured by inlining, but visible in
>> slowdebug):
>>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>>
>> ... "== deleted_handle()" or "!= deleted_handle()". So fortifying assert code does not help, the bug
>> is touching the oop in improper thread state. Nightmarish scenario would be to fail "!="/"==" check
>> because the deleted_handle oop had been already updated, but the actual handle value did not.
>

See this line:
  http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l224

  } else if ((value == badJNIHandle) || (value == deleted_handle())) {
    value = NULL;
  }
  return value;

If we ever get in that code during the GC, we might get the deleted_handle() updated by GC code
(because it is a root via JNIHandles), but the $value may be not updated, and still point to old
location. Then "value == deleted_handle()" fails, and we return broken oop from guard_value. Notice
this code is not under the assert.

It seems to me the symptom of larger problem: touching naked oops is seldom reliable, if the thread
is inconsistent state. It does not seem right to fix asserts, because the real trouble is touching
that oop on product paths. It appears to work for other GCs, but for Shenandoah this failure is
observable, because its barriers touch heap through the oops.

>> The similar thing happens after we write the oop into the debug stream: it is not handelize-d
>> anymore, and so opaque for GC, and thus has all the chance to be garbled by the time GC finishes.
>
> I don't follow you. Can you point to the code you are talking about?
> ConstantOopWriteValue::write_on() writes the _handle_ and not the naked oop [1].

Right, this part I overlooked.

Thanks,
-Aleksey


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Roman Kennke-6
In reply to this post by Vladimir Ivanov
Am 14.11.2017 um 17:04 schrieb Vladimir Ivanov:
> Thanks, now I see that as well: OopRecorder::find_index() can delegate
> to ObjectLookup::find_index() which does resolve the handle w/o
> transitioning to VM.
>
> But I don't believe you hit that path: ObjectLookup was added as part
> of JVMCI and is guarded by a flag (deduplicate) which is turned on
> only for JVMCI.
Ah ok. Didn't know that.

However, as Aleksey pointed out, we hit JNIHandles::resolve() in product
path, JVMCI or not, and this touches the naked oop by comparing it with
another oop. This doesn't sound like a reliable thing to do.

This simple change seems to fix it:
https://paste.fedoraproject.org/paste/poQ5caTCuN6jHSGbK1n0iQ

Doing more testing...

Roman

> Anyway, I'll file a bug to investigate ObjectLookup::find_index().
>
> Best regards,
> Vladimir Ivanov
>
> On 11/14/17 6:45 PM, Roman Kennke wrote:
>> The code below the assert also unwraps the oop and does lookups with
>> it. I'm not on my computer but I can dig out the relevant parts when
>> I'm back at work...
>>
>> Roman
>>
>>
>> Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov
>> <[hidden email]>:
>>
>>     Aleksey,
>>
>>     I agree with your & Roman analysis: compilers shouldn't touch
>> naked oops
>>     unless the thread is in _thread_in_vm mode.
>>
>>     Looking at the crash log, the problematic code is under assert:
>>
>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>>         assert(JNIHandles::resolve(value()) == NULL ||
>> Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>                "Should be in heap");
>>         stream->write_int(CONSTANT_OOP_CODE);
>>         stream->write_handle(value());
>>     }
>>
>>     So, the proper fix would be to make the verification code more
>> robust.
>>
>>     Best regards,
>>     Vladimir Ivanov
>>
>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>
>>         Hi,
>>
>>         In some of our aggressive test configurations for Shenandoah, we
>>         sometimes see the following failure:
>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>
>>         It seems to happen when C1 code installation is happening during
>>         Full GC.
>>
>>         The actual failure is caused by touching the
>>         JNIHandles::deleted_handle() oop in
>>         JNIHandles::guard_value() during JNIHandles::resolve() against
>>         the constant oop handle when we are
>>         recording the debugging information for C1-generated Java call:
>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>>
>>         The C1 thread is in _thread_in_native state, and so the runtime
>>         thinks the thread is at safepoint,
>>         but the thread touches the deleted_handle oop(). When Shenandoah
>>         dives into Full GC and moves that
>>         object at the same time, everything crashes and burns.
>>
>>         Is C1 (and any other compiler thread) supposed to transit to
>>         _vm_state when touching the naked oops,
>>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>>         over ci* that seems to transit there
>>         before accessing the heap. Does that mean we need the same
>>         everywhere around JNIHandles::resolve too?
>>
>>         Or is there some other mechanism that is supposed to get
>>         compiler threads to coordinate with GC?
>>
>>         Thanks,
>>         -Aleksey
>>
>>
>> --
>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.


Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Aleksey Shipilev-4
In reply to this post by Aleksey Shipilev-4
On 11/14/2017 05:37 PM, Aleksey Shipilev wrote:

> On 11/14/2017 05:23 PM, Vladimir Ivanov wrote:
>>> Actually, in Shenandoah case, the failure is here (it is obscured by inlining, but visible in
>>> slowdebug):
>>>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>>>
>>> ... "== deleted_handle()" or "!= deleted_handle()". So fortifying assert code does not help, the bug
>>> is touching the oop in improper thread state. Nightmarish scenario would be to fail "!="/"==" check
>>> because the deleted_handle oop had been already updated, but the actual handle value did not.
>>
>
> See this line:
>   http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l224
>
>   } else if ((value == badJNIHandle) || (value == deleted_handle())) {
>     value = NULL;
>   }
>   return value;
>
> If we ever get in that code during the GC, we might get the deleted_handle() updated by GC code
> (because it is a root via JNIHandles), but the $value may be not updated, and still point to old
> location. Then "value == deleted_handle()" fails, and we return broken oop from guard_value. Notice
> this code is not under the assert.
>
> It seems to me the symptom of larger problem: touching naked oops is seldom reliable, if the thread
> is inconsistent state. It does not seem right to fix asserts, because the real trouble is touching
> that oop on product paths. It appears to work for other GCs, but for Shenandoah this failure is
> observable, because its barriers touch heap through the oops.
FTR, this is the minimal change that makes Shenandoah pass the stress tests:

$ hg diff
diff -r ac4d369eac91 src/hotspot/share/code/debugInfo.cpp
--- a/src/hotspot/share/code/debugInfo.cpp Tue Nov 14 12:27:52 2017 +0100
+++ b/src/hotspot/share/code/debugInfo.cpp Tue Nov 14 17:54:44 2017 +0100
@@ -209,6 +209,7 @@
 // ConstantOopWriteValue

 void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
+  VM_ENTRY_MARK;
   assert(JNIHandles::resolve(value()) == NULL ||
          Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
          "Should be in heap");

And this is the change that avoids the Shenandoah crash (because it does not do read barriers on
value and deleted_handle()), but still is subject to GC-compiler race explained above:

$ hg diff
diff -r ac4d369eac91 src/hotspot/share/runtime/jniHandles.hpp
--- a/src/hotspot/share/runtime/jniHandles.hpp Tue Nov 14 12:27:52 2017 +0100
+++ b/src/hotspot/share/runtime/jniHandles.hpp Tue Nov 14 17:58:22 2017 +0100
@@ -219,8 +219,8 @@
 inline oop JNIHandles::guard_value(oop value) {
   if (!external_guard) {
     assert(!oopDesc::unsafe_equals(value, badJNIHandle), "Pointing to zapped jni handle area");
-    assert(!oopDesc::equals(value, deleted_handle()),    "Used a deleted global handle");
-  } else if (oopDesc::unsafe_equals(value, badJNIHandle) || oopDesc::equals(value, deleted_handle())) {
+    assert(!oopDesc::unsafe_equals(value, deleted_handle()), "Used a deleted global handle");
+  } else if (oopDesc::unsafe_equals(value, badJNIHandle) || oopDesc::unsafe_equals(value,
deleted_handle())) {
     value = NULL;
   }
   return value;

Neither feels like a proper fix.

Thanks,
-Aleksey


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
In reply to this post by Roman Kennke-6


On 11/14/17 8:00 PM, Roman Kennke wrote:

> Am 14.11.2017 um 17:04 schrieb Vladimir Ivanov:
>> Thanks, now I see that as well: OopRecorder::find_index() can delegate
>> to ObjectLookup::find_index() which does resolve the handle w/o
>> transitioning to VM.
>>
>> But I don't believe you hit that path: ObjectLookup was added as part
>> of JVMCI and is guarded by a flag (deduplicate) which is turned on
>> only for JVMCI.
> Ah ok. Didn't know that.
>
> However, as Aleksey pointed out, we hit JNIHandles::resolve() in product
> path, JVMCI or not, and this touches the naked oop by comparing it with
> another oop. This doesn't sound like a reliable thing to do.

Can you double-check you observe the crash with product binaries as
well? My current understanding is that it happens only with debug builds.

Best regards,
Vladimir Ivanov

> This simple change seems to fix it:
> https://paste.fedoraproject.org/paste/poQ5caTCuN6jHSGbK1n0iQ
>
> Doing more testing...
>
> Roman
>
>> Anyway, I'll file a bug to investigate ObjectLookup::find_index().
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/14/17 6:45 PM, Roman Kennke wrote:
>>> The code below the assert also unwraps the oop and does lookups with
>>> it. I'm not on my computer but I can dig out the relevant parts when
>>> I'm back at work...
>>>
>>> Roman
>>>
>>>
>>> Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov
>>> <[hidden email]>:
>>>
>>>     Aleksey,
>>>
>>>     I agree with your & Roman analysis: compilers shouldn't touch
>>> naked oops
>>>     unless the thread is in _thread_in_vm mode.
>>>
>>>     Looking at the crash log, the problematic code is under assert:
>>>
>>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>>>         assert(JNIHandles::resolve(value()) == NULL ||
>>> Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>>                "Should be in heap");
>>>         stream->write_int(CONSTANT_OOP_CODE);
>>>         stream->write_handle(value());
>>>     }
>>>
>>>     So, the proper fix would be to make the verification code more
>>> robust.
>>>
>>>     Best regards,
>>>     Vladimir Ivanov
>>>
>>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>>
>>>         Hi,
>>>
>>>         In some of our aggressive test configurations for Shenandoah, we
>>>         sometimes see the following failure:
>>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>>
>>>         It seems to happen when C1 code installation is happening during
>>>         Full GC.
>>>
>>>         The actual failure is caused by touching the
>>>         JNIHandles::deleted_handle() oop in
>>>         JNIHandles::guard_value() during JNIHandles::resolve() against
>>>         the constant oop handle when we are
>>>         recording the debugging information for C1-generated Java call:
>>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220 
>>>
>>>
>>>         The C1 thread is in _thread_in_native state, and so the runtime
>>>         thinks the thread is at safepoint,
>>>         but the thread touches the deleted_handle oop(). When Shenandoah
>>>         dives into Full GC and moves that
>>>         object at the same time, everything crashes and burns.
>>>
>>>         Is C1 (and any other compiler thread) supposed to transit to
>>>         _vm_state when touching the naked oops,
>>>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>>>         over ci* that seems to transit there
>>>         before accessing the heap. Does that mean we need the same
>>>         everywhere around JNIHandles::resolve too?
>>>
>>>         Or is there some other mechanism that is supposed to get
>>>         compiler threads to coordinate with GC?
>>>
>>>         Thanks,
>>>         -Aleksey
>>>
>>>
>>> --
>>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
>
>
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Roman Kennke-6
In reply to this post by Roman Kennke-6
Am 14.11.2017 um 18:00 schrieb Roman Kennke:

> Am 14.11.2017 um 17:04 schrieb Vladimir Ivanov:
>> Thanks, now I see that as well: OopRecorder::find_index() can
>> delegate to ObjectLookup::find_index() which does resolve the handle
>> w/o transitioning to VM.
>>
>> But I don't believe you hit that path: ObjectLookup was added as part
>> of JVMCI and is guarded by a flag (deduplicate) which is turned on
>> only for JVMCI.
> Ah ok. Didn't know that.
>
> However, as Aleksey pointed out, we hit JNIHandles::resolve() in
> product path, JVMCI or not, and this touches the naked oop by
> comparing it with another oop. This doesn't sound like a reliable
> thing to do.
Scratch that. Looking at the code paths again, this doesn't seem to be
true. I.e. we hit JNIHandles::resolve() only in assert and ObjectLookup
(I trust you that it's only JVMCI). Not sure if it can be robust to
compare oops in assert paths. It sure is a race and doesn't feel very well.

I wonder if we should introduce a CollectedHeap::is_in_or_null(jobject)
method, and let the GC figure it out. It might actually have a way to
check it (simple address range check) without sending the thread to VM
state.

Roman

>
> This simple change seems to fix it:
> https://paste.fedoraproject.org/paste/poQ5caTCuN6jHSGbK1n0iQ
>
> Doing more testing...
>
> Roman
>
>> Anyway, I'll file a bug to investigate ObjectLookup::find_index().
>>
>> Best regards,
>> Vladimir Ivanov
>>
>> On 11/14/17 6:45 PM, Roman Kennke wrote:
>>> The code below the assert also unwraps the oop and does lookups with
>>> it. I'm not on my computer but I can dig out the relevant parts when
>>> I'm back at work...
>>>
>>> Roman
>>>
>>>
>>> Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov
>>> <[hidden email]>:
>>>
>>>     Aleksey,
>>>
>>>     I agree with your & Roman analysis: compilers shouldn't touch
>>> naked oops
>>>     unless the thread is in _thread_in_vm mode.
>>>
>>>     Looking at the crash log, the problematic code is under assert:
>>>
>>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream*
>>> stream) {
>>>         assert(JNIHandles::resolve(value()) == NULL ||
>>> Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>>                "Should be in heap");
>>>         stream->write_int(CONSTANT_OOP_CODE);
>>>         stream->write_handle(value());
>>>     }
>>>
>>>     So, the proper fix would be to make the verification code more
>>> robust.
>>>
>>>     Best regards,
>>>     Vladimir Ivanov
>>>
>>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>>
>>>         Hi,
>>>
>>>         In some of our aggressive test configurations for
>>> Shenandoah, we
>>>         sometimes see the following failure:
>>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>>
>>>         It seems to happen when C1 code installation is happening
>>> during
>>>         Full GC.
>>>
>>>         The actual failure is caused by touching the
>>>         JNIHandles::deleted_handle() oop in
>>>         JNIHandles::guard_value() during JNIHandles::resolve() against
>>>         the constant oop handle when we are
>>>         recording the debugging information for C1-generated Java call:
>>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220 
>>>
>>>
>>>         The C1 thread is in _thread_in_native state, and so the runtime
>>>         thinks the thread is at safepoint,
>>>         but the thread touches the deleted_handle oop(). When
>>> Shenandoah
>>>         dives into Full GC and moves that
>>>         object at the same time, everything crashes and burns.
>>>
>>>         Is C1 (and any other compiler thread) supposed to transit to
>>>         _vm_state when touching the naked oops,
>>>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>>>         over ci* that seems to transit there
>>>         before accessing the heap. Does that mean we need the same
>>>         everywhere around JNIHandles::resolve too?
>>>
>>>         Or is there some other mechanism that is supposed to get
>>>         compiler threads to coordinate with GC?
>>>
>>>         Thanks,
>>>         -Aleksey
>>>
>>>
>>> --
>>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
>
>

Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Aleksey Shipilev-4
In reply to this post by Aleksey Shipilev-4
On 11/14/2017 06:04 PM, Aleksey Shipilev wrote:

> On 11/14/2017 05:37 PM, Aleksey Shipilev wrote:
>> On 11/14/2017 05:23 PM, Vladimir Ivanov wrote:
>>>> Actually, in Shenandoah case, the failure is here (it is obscured by inlining, but visible in
>>>> slowdebug):
>>>>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>>>>
>>>> ... "== deleted_handle()" or "!= deleted_handle()". So fortifying assert code does not help, the bug
>>>> is touching the oop in improper thread state. Nightmarish scenario would be to fail "!="/"==" check
>>>> because the deleted_handle oop had been already updated, but the actual handle value did not.
>>>
>>
>> See this line:
>>   http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l224
>>
>>   } else if ((value == badJNIHandle) || (value == deleted_handle())) {
>>     value = NULL;
>>   }
>>   return value;
>>
>> If we ever get in that code during the GC, we might get the deleted_handle() updated by GC code
>> (because it is a root via JNIHandles), but the $value may be not updated, and still point to old
>> location. Then "value == deleted_handle()" fails, and we return broken oop from guard_value. Notice
>> this code is not under the assert.
>>
>> It seems to me the symptom of larger problem: touching naked oops is seldom reliable, if the thread
>> is inconsistent state. It does not seem right to fix asserts, because the real trouble is touching
>> that oop on product paths. It appears to work for other GCs, but for Shenandoah this failure is
>> observable, because its barriers touch heap through the oops.
>
> FTR, this is the minimal change that makes Shenandoah pass the stress tests:
>
> $ hg diff
> diff -r ac4d369eac91 src/hotspot/share/code/debugInfo.cpp
> --- a/src/hotspot/share/code/debugInfo.cpp Tue Nov 14 12:27:52 2017 +0100
> +++ b/src/hotspot/share/code/debugInfo.cpp Tue Nov 14 17:54:44 2017 +0100
> @@ -209,6 +209,7 @@
>  // ConstantOopWriteValue
>
>  void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
> +  VM_ENTRY_MARK;
>    assert(JNIHandles::resolve(value()) == NULL ||
>           Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>           "Should be in heap");
>
> Neither feels like a proper fix.
Well, VM_ENTRY_MARK is a decent enough workaround, so we are flying with that:
  http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-November/004296.html

Happy to pick up the proper fix, if this is not the one.

-Aleksey


signature.asc (836 bytes) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
FYI just filed JDK-8191227 [1]. Thanks for the report!

Best regards,
Vladimir Ivanov

[1] https://bugs.openjdk.java.net/browse/JDK-8191227

On 11/14/17 9:18 PM, Aleksey Shipilev wrote:

> On 11/14/2017 06:04 PM, Aleksey Shipilev wrote:
>> On 11/14/2017 05:37 PM, Aleksey Shipilev wrote:
>>> On 11/14/2017 05:23 PM, Vladimir Ivanov wrote:
>>>>> Actually, in Shenandoah case, the failure is here (it is obscured by inlining, but visible in
>>>>> slowdebug):
>>>>>     http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220
>>>>>
>>>>> ... "== deleted_handle()" or "!= deleted_handle()". So fortifying assert code does not help, the bug
>>>>> is touching the oop in improper thread state. Nightmarish scenario would be to fail "!="/"==" check
>>>>> because the deleted_handle oop had been already updated, but the actual handle value did not.
>>>>
>>>
>>> See this line:
>>>    http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l224
>>>
>>>    } else if ((value == badJNIHandle) || (value == deleted_handle())) {
>>>      value = NULL;
>>>    }
>>>    return value;
>>>
>>> If we ever get in that code during the GC, we might get the deleted_handle() updated by GC code
>>> (because it is a root via JNIHandles), but the $value may be not updated, and still point to old
>>> location. Then "value == deleted_handle()" fails, and we return broken oop from guard_value. Notice
>>> this code is not under the assert.
>>>
>>> It seems to me the symptom of larger problem: touching naked oops is seldom reliable, if the thread
>>> is inconsistent state. It does not seem right to fix asserts, because the real trouble is touching
>>> that oop on product paths. It appears to work for other GCs, but for Shenandoah this failure is
>>> observable, because its barriers touch heap through the oops.
>>
>> FTR, this is the minimal change that makes Shenandoah pass the stress tests:
>>
>> $ hg diff
>> diff -r ac4d369eac91 src/hotspot/share/code/debugInfo.cpp
>> --- a/src/hotspot/share/code/debugInfo.cpp Tue Nov 14 12:27:52 2017 +0100
>> +++ b/src/hotspot/share/code/debugInfo.cpp Tue Nov 14 17:54:44 2017 +0100
>> @@ -209,6 +209,7 @@
>>   // ConstantOopWriteValue
>>
>>   void ConstantOopWriteValue::write_on(DebugInfoWriteStream* stream) {
>> +  VM_ENTRY_MARK;
>>     assert(JNIHandles::resolve(value()) == NULL ||
>>            Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>            "Should be in heap");
>>
>> Neither feels like a proper fix.
>
> Well, VM_ENTRY_MARK is a decent enough workaround, so we are flying with that:
>    http://mail.openjdk.java.net/pipermail/shenandoah-dev/2017-November/004296.html
>
> Happy to pick up the proper fix, if this is not the one.
>
> -Aleksey
>
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
In reply to this post by Roman Kennke-6
>> However, as Aleksey pointed out, we hit JNIHandles::resolve() in
>> product path, JVMCI or not, and this touches the naked oop by
>> comparing it with another oop. This doesn't sound like a reliable
>> thing to do.
> Scratch that. Looking at the code paths again, this doesn't seem to be
> true. I.e. we hit JNIHandles::resolve() only in assert and ObjectLookup
> (I trust you that it's only JVMCI). Not sure if it can be robust to
> compare oops in assert paths. It sure is a race and doesn't feel very well.
>
> I wonder if we should introduce a CollectedHeap::is_in_or_null(jobject)
> method, and let the GC figure it out. It might actually have a way to
> check it (simple address range check) without sending the thread to VM
> state.

Yes, the assert just sanity checks that the handlized oop points into
the heap.

Depending on the complexity of the check, I'd consider the following
options:
   * dedicated CollectedHeap::is_in_or_null(jobject)
   * ThreadInVMfromNative under #ifdef ASSERT (or factored out)
   * completely remove the assert

Best regards,
Vladimir Ivanov

>>
>> This simple change seems to fix it:
>> https://paste.fedoraproject.org/paste/poQ5caTCuN6jHSGbK1n0iQ
>>
>> Doing more testing...
>>
>> Roman
>>
>>> Anyway, I'll file a bug to investigate ObjectLookup::find_index().
>>>
>>> Best regards,
>>> Vladimir Ivanov
>>>
>>> On 11/14/17 6:45 PM, Roman Kennke wrote:
>>>> The code below the assert also unwraps the oop and does lookups with
>>>> it. I'm not on my computer but I can dig out the relevant parts when
>>>> I'm back at work...
>>>>
>>>> Roman
>>>>
>>>>
>>>> Am 14. November 2017 16:36:10 MEZ schrieb Vladimir Ivanov
>>>> <[hidden email]>:
>>>>
>>>>     Aleksey,
>>>>
>>>>     I agree with your & Roman analysis: compilers shouldn't touch
>>>> naked oops
>>>>     unless the thread is in _thread_in_vm mode.
>>>>
>>>>     Looking at the crash log, the problematic code is under assert:
>>>>
>>>>     void ConstantOopWriteValue::write_on(DebugInfoWriteStream*
>>>> stream) {
>>>>         assert(JNIHandles::resolve(value()) == NULL ||
>>>> Universe::heap()->is_in_reserved(JNIHandles::resolve(value())),
>>>>                "Should be in heap");
>>>>         stream->write_int(CONSTANT_OOP_CODE);
>>>>         stream->write_handle(value());
>>>>     }
>>>>
>>>>     So, the proper fix would be to make the verification code more
>>>> robust.
>>>>
>>>>     Best regards,
>>>>     Vladimir Ivanov
>>>>
>>>>     On 11/14/17 5:16 PM, Aleksey Shipilev wrote:
>>>>
>>>>         Hi,
>>>>
>>>>         In some of our aggressive test configurations for
>>>> Shenandoah, we
>>>>         sometimes see the following failure:
>>>> http://cr.openjdk.java.net/~shade/shenandoah/c1-race-fail-hs_err.log
>>>>
>>>>         It seems to happen when C1 code installation is happening
>>>> during
>>>>         Full GC.
>>>>
>>>>         The actual failure is caused by touching the
>>>>         JNIHandles::deleted_handle() oop in
>>>>         JNIHandles::guard_value() during JNIHandles::resolve() against
>>>>         the constant oop handle when we are
>>>>         recording the debugging information for C1-generated Java call:
>>>> http://hg.openjdk.java.net/jdk/hs/file/5caa1d5f74c1/src/hotspot/share/runtime/jniHandles.hpp#l220 
>>>>
>>>>
>>>>         The C1 thread is in _thread_in_native state, and so the runtime
>>>>         thinks the thread is at safepoint,
>>>>         but the thread touches the deleted_handle oop(). When
>>>> Shenandoah
>>>>         dives into Full GC and moves that
>>>>         object at the same time, everything crashes and burns.
>>>>
>>>>         Is C1 (and any other compiler thread) supposed to transit to
>>>>         _vm_state when touching the naked oops,
>>>>         and thus coordinate with safepoints? I see VM_ENTRY_MARK all
>>>>         over ci* that seems to transit there
>>>>         before accessing the heap. Does that mean we need the same
>>>>         everywhere around JNIHandles::resolve too?
>>>>
>>>>         Or is there some other mechanism that is supposed to get
>>>>         compiler threads to coordinate with GC?
>>>>
>>>>         Thanks,
>>>>         -Aleksey
>>>>
>>>>
>>>> --
>>>> Diese Nachricht wurde von meinem Android-Gerät mit K-9 Mail gesendet.
>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Roman Kennke-6
Am 14.11.2017 um 19:38 schrieb Vladimir Ivanov:

>>> However, as Aleksey pointed out, we hit JNIHandles::resolve() in
>>> product path, JVMCI or not, and this touches the naked oop by
>>> comparing it with another oop. This doesn't sound like a reliable
>>> thing to do.
>> Scratch that. Looking at the code paths again, this doesn't seem to
>> be true. I.e. we hit JNIHandles::resolve() only in assert and
>> ObjectLookup (I trust you that it's only JVMCI). Not sure if it can
>> be robust to compare oops in assert paths. It sure is a race and
>> doesn't feel very well.
>>
>> I wonder if we should introduce a
>> CollectedHeap::is_in_or_null(jobject) method, and let the GC figure
>> it out. It might actually have a way to check it (simple address
>> range check) without sending the thread to VM state.
>
> Yes, the assert just sanity checks that the handlized oop points into
> the heap.
>
> Depending on the complexity of the check, I'd consider the following
> options:
>   * dedicated CollectedHeap::is_in_or_null(jobject)
This would require some sort of JNIHandles::resolve_raw() or similar,
that bypasses the checks. I don't really like it.
>   * ThreadInVMfromNative under #ifdef ASSERT (or factored out)
Different safepointing behaviour in release vs. debug build? Don't know...
>   * completely remove the assert

Maybe.

The root of the evil is still that we're accessing the naked oop while
potentially at a safepoint (and oops moving around). You already
established that JVMCI doesn't have the problem because it's already
transitioned into VM. Why not make the other compilers consistent? I.e.
transition into VM (VM_ENTRY_MARK) before going into the same code path.
From the looks of it, code paths start to be shared between JVMCI and
others starting around
DebugInformationRecorder::create_scope_values(GrowableArray<ScopeValue*>*),
i.e. the VM_ENTRY_MARK could be placed around that in, e.g.,

void CodeEmitInfo::record_debug_info(DebugInformationRecorder* recorder,
int pc_offset) {
   // record the safepoint before recording the debug info for enclosing
scopes
   VM_ENTRY_MARK;
   recorder->add_safepoint(pc_offset, _oop_map->deep_copy());
   _scope_debug_info->record_debug_info(recorder, pc_offset,
true/*topmost*/, _is_method_handle_invoke);
   recorder->end_safepoint(pc_offset);
}

Although similarities between this code and related code in JVMCI would
suggest to place the VM_ENTRY_MARK even higher in the call stack.

WDYT?

Roman

Reply | Threaded
Open this post in threaded view
|

Re: C1 code installation and JNIHandle::deleted_handle() oop

Vladimir Ivanov
> The root of the evil is still that we're accessing the naked oop while
> potentially at a safepoint (and oops moving around). You already
> established that JVMCI doesn't have the problem because it's already
> transitioned into VM. Why not make the other compilers consistent? I.e.
> transition into VM (VM_ENTRY_MARK) before going into the same code path.
> From the looks of it, code paths start to be shared between JVMCI and
> others starting around
> DebugInformationRecorder::create_scope_values(GrowableArray<ScopeValue*>*),
> i.e. the VM_ENTRY_MARK could be placed around that in, e.g.,
>
> void CodeEmitInfo::record_debug_info(DebugInformationRecorder* recorder,
> int pc_offset) {
>    // record the safepoint before recording the debug info for enclosing
> scopes
>    VM_ENTRY_MARK;
>    recorder->add_safepoint(pc_offset, _oop_map->deep_copy());
>    _scope_debug_info->record_debug_info(recorder, pc_offset,
> true/*topmost*/, _is_method_handle_invoke);
>    recorder->end_safepoint(pc_offset);
> }
>
> Although similarities between this code and related code in JVMCI would
> suggest to place the VM_ENTRY_MARK even higher in the call stack.
>
> WDYT?

(JVMCI is a different story, so leaving it aside for now.)

IMO the problem in ConstantOopWriteValue::write_on() doesn't justify
such change. The problematic assert does something very unusual for
existing compilers in the JVM and adjusting product code to fix that
doesn't feel right.

Compiler threads were deliberately designed as JavaThreads running in
native state to reduce interference with other parts of the JVM.
Compiler interface (CI) encapsulates interactions with runtime and
compilers code base is mostly free of handles & naked oops. In other
words, compilers were written in a way to minimize the risk of accessing
naked oops w/o proper coordination with the rest of runtime.

So, instead of precausiously putting VM_ENTRY_MARKs here and there, I'd
prefer to see additional verification code to catch such problems in
newly introduced code. IMO the lack of such functionality was the main
reason why the bug slipped through testing.

Best regards,
Vladimir Ivanov