Quantcast

RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Chris Plummer
Hello,

Please review the following:

http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
https://bugs.openjdk.java.net/browse/JDK-8172020

[Note there are s390, ppc, and aarch64 changes in this bug fix that I
will need to have someone verify for me or I won't be pushing them.]

This bug turns up with a closed test so the CR is marked confidential.
I'll try to provided the necessary details here. Sorry the explanation
is lengthy, but it is a complex bug (with a relatively simple fix).

If you are on the s390, ppc, or aarch64 teams and just want to help
verify the fix, I would say it's not necessary to understand the details
of the bug. What's most important is you test the changes. Unfortunately
you won't be able to reproduce the issue since the test is closed, but
you should at least verify the fix builds and introduces no new issues.
I did test the open aarch64 port since I have the hardware, and someone
was kind enough to provide me with binaries both without the patch
(which reproduced the bug) and with the patch (which passed). No other
aarch64 testing was done.

Now on to an explanation of the bug and the fix:

The test is testing the JDI/JVMTI forceEarlyReturn support. After the
debugger side spawns the debugee process and the debugee thread is
created, a vm.suspend() is issued to suspend all threads on the debugee
side. The debugger side then issues a JDI forceEarlyReturn, which will
cause the debugee thread to exit the currently executing method.
Although the intent is to force exit of the run() method, the run()
method does make some method calls to some very small methods with
"inline" in their name, such as inlineMethodReturningObject(). If
forceEarlyReturn is done while in one of them, JDI throws
InvalidTargetException, which the test correctly handles and will pass
if that happens.

If the run() method is not yet compiled, what *normally* happens to
force the thread to suspend is to swap in the interpreter safepoint
dispatch table. This will result in a call_VM out to the interpreter to
handle the safepointing. When later the thread is resumed (after
initiating the forceEarlyReturn), there is code on the return side of
call_VM to make sure the earlyret is handled immediately before any
other bytecodes are exectuted. Thus the method that was executing at the
time of the forceEarlyReturn should be the same as the method when the
earlyret is handled. In fact it should be the same bytetcode(bci). Like
I said, this is what *normally* happens, but is not the way the bug is
exposed.

The bug turns up when the run() method is not yet compiled, and when the
suspend is attempted we are in one of the small "inline" methods, and it
is compiled. In this case thread suspension is achieved in a different
manner. When a compiled method executes its return code, it first pops
its frame, and then the " poll_return" safepoint polling instruction is
executed. This allows the thread to be suspended. We are technically in
the run() method at this point since the frame of the "inline"  method
has been popped. Therefore the forceEarlyReturn is allowed. When the
thread is resumed, the return from the "inline" method finally happens,
which puts us in code generated by
TemplateInterpreterGenerator::generate_return_entry_for(). However,
there are no checks for an earlyret here, so we start executing
bytecodes until we finally hit an earlyret check. This happens when the
next invokespecial is done. There is a call_VM after the frame is
pushed, and the call_VM always does an earlyret check. The problem is
now the topmost frame (in the case where it asserts) is
inlineMethodReturningObject(). Since it returns an object type, the
earlyret code expects there to be an oop in the earlyret slot on the
stack. But there isn't because the forceEarlyReturn was done on the void
run() method. So the earlyret code fetches garbage, does an validity
test on it, and gets the assert you see in the bug title.

So what is needed here is an earlyret check after the compiled "inline"
method returns, and before any new bytecodes are executed. Since
TemplateInterpreterGenerator::generate_return_entry_for() is always
executed when the compiled "inline" method returns, the check was added
here. For completeness, the popframe check was also added. The header
file changes are just to make these methods public so they can be called.

I tested the failed test at least 100 times on each supported platform.
The assert usually showed up at least 1 in 50 times. I also ran a large
set of tests on all supported platforms, including everything we do for
nightly testing except for some of the longest running tests.

thanks,

Chris

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

serguei.spitsyn@oracle.com
Hi Chris,

It looks good to me.
But let me mediate on the fix a little bit more.
Consider it reviewed if I'll not come back in 24 hours. :)

Thank you a lot for this great investigation and fix!
Serguei


On 2/21/17 11:31, Chris Plummer wrote:

> Hello,
>
> Please review the following:
>
> http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
> https://bugs.openjdk.java.net/browse/JDK-8172020
>
> [Note there are s390, ppc, and aarch64 changes in this bug fix that I
> will need to have someone verify for me or I won't be pushing them.]
>
> This bug turns up with a closed test so the CR is marked confidential.
> I'll try to provided the necessary details here. Sorry the explanation
> is lengthy, but it is a complex bug (with a relatively simple fix).
>
> If you are on the s390, ppc, or aarch64 teams and just want to help
> verify the fix, I would say it's not necessary to understand the
> details of the bug. What's most important is you test the changes.
> Unfortunately you won't be able to reproduce the issue since the test
> is closed, but you should at least verify the fix builds and
> introduces no new issues. I did test the open aarch64 port since I
> have the hardware, and someone was kind enough to provide me with
> binaries both without the patch (which reproduced the bug) and with
> the patch (which passed). No other aarch64 testing was done.
>
> Now on to an explanation of the bug and the fix:
>
> The test is testing the JDI/JVMTI forceEarlyReturn support. After the
> debugger side spawns the debugee process and the debugee thread is
> created, a vm.suspend() is issued to suspend all threads on the
> debugee side. The debugger side then issues a JDI forceEarlyReturn,
> which will cause the debugee thread to exit the currently executing
> method. Although the intent is to force exit of the run() method, the
> run() method does make some method calls to some very small methods
> with "inline" in their name, such as inlineMethodReturningObject(). If
> forceEarlyReturn is done while in one of them, JDI throws
> InvalidTargetException, which the test correctly handles and will pass
> if that happens.
>
> If the run() method is not yet compiled, what *normally* happens to
> force the thread to suspend is to swap in the interpreter safepoint
> dispatch table. This will result in a call_VM out to the interpreter
> to handle the safepointing. When later the thread is resumed (after
> initiating the forceEarlyReturn), there is code on the return side of
> call_VM to make sure the earlyret is handled immediately before any
> other bytecodes are exectuted. Thus the method that was executing at
> the time of the forceEarlyReturn should be the same as the method when
> the earlyret is handled. In fact it should be the same bytetcode(bci).
> Like I said, this is what *normally* happens, but is not the way the
> bug is exposed.
>
> The bug turns up when the run() method is not yet compiled, and when
> the suspend is attempted we are in one of the small "inline" methods,
> and it is compiled. In this case thread suspension is achieved in a
> different manner. When a compiled method executes its return code, it
> first pops its frame, and then the " poll_return" safepoint polling
> instruction is executed. This allows the thread to be suspended. We
> are technically in the run() method at this point since the frame of
> the "inline"  method has been popped. Therefore the forceEarlyReturn
> is allowed. When the thread is resumed, the return from the "inline"
> method finally happens, which puts us in code generated by
> TemplateInterpreterGenerator::generate_return_entry_for(). However,
> there are no checks for an earlyret here, so we start executing
> bytecodes until we finally hit an earlyret check. This happens when
> the next invokespecial is done. There is a call_VM after the frame is
> pushed, and the call_VM always does an earlyret check. The problem is
> now the topmost frame (in the case where it asserts) is
> inlineMethodReturningObject(). Since it returns an object type, the
> earlyret code expects there to be an oop in the earlyret slot on the
> stack. But there isn't because the forceEarlyReturn was done on the
> void run() method. So the earlyret code fetches garbage, does an
> validity test on it, and gets the assert you see in the bug title.
>
> So what is needed here is an earlyret check after the compiled
> "inline" method returns, and before any new bytecodes are executed.
> Since TemplateInterpreterGenerator::generate_return_entry_for() is
> always executed when the compiled "inline" method returns, the check
> was added here. For completeness, the popframe check was also added.
> The header file changes are just to make these methods public so they
> can be called.
>
> I tested the failed test at least 100 times on each supported
> platform. The assert usually showed up at least 1 in 50 times. I also
> ran a large set of tests on all supported platforms, including
> everything we do for nightly testing except for some of the longest
> running tests.
>
> thanks,
>
> Chris
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Chris Plummer
Thanks Serguei!

On 2/21/17 3:29 PM, [hidden email] wrote:

> Hi Chris,
>
> It looks good to me.
> But let me mediate on the fix a little bit more.
> Consider it reviewed if I'll not come back in 24 hours. :)
>
> Thank you a lot for this great investigation and fix!
> Serguei
>
>
> On 2/21/17 11:31, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
>> https://bugs.openjdk.java.net/browse/JDK-8172020
>>
>> [Note there are s390, ppc, and aarch64 changes in this bug fix that I
>> will need to have someone verify for me or I won't be pushing them.]
>>
>> This bug turns up with a closed test so the CR is marked
>> confidential. I'll try to provided the necessary details here. Sorry
>> the explanation is lengthy, but it is a complex bug (with a
>> relatively simple fix).
>>
>> If you are on the s390, ppc, or aarch64 teams and just want to help
>> verify the fix, I would say it's not necessary to understand the
>> details of the bug. What's most important is you test the changes.
>> Unfortunately you won't be able to reproduce the issue since the test
>> is closed, but you should at least verify the fix builds and
>> introduces no new issues. I did test the open aarch64 port since I
>> have the hardware, and someone was kind enough to provide me with
>> binaries both without the patch (which reproduced the bug) and with
>> the patch (which passed). No other aarch64 testing was done.
>>
>> Now on to an explanation of the bug and the fix:
>>
>> The test is testing the JDI/JVMTI forceEarlyReturn support. After the
>> debugger side spawns the debugee process and the debugee thread is
>> created, a vm.suspend() is issued to suspend all threads on the
>> debugee side. The debugger side then issues a JDI forceEarlyReturn,
>> which will cause the debugee thread to exit the currently executing
>> method. Although the intent is to force exit of the run() method, the
>> run() method does make some method calls to some very small methods
>> with "inline" in their name, such as inlineMethodReturningObject().
>> If forceEarlyReturn is done while in one of them, JDI throws
>> InvalidTargetException, which the test correctly handles and will
>> pass if that happens.
>>
>> If the run() method is not yet compiled, what *normally* happens to
>> force the thread to suspend is to swap in the interpreter safepoint
>> dispatch table. This will result in a call_VM out to the interpreter
>> to handle the safepointing. When later the thread is resumed (after
>> initiating the forceEarlyReturn), there is code on the return side of
>> call_VM to make sure the earlyret is handled immediately before any
>> other bytecodes are exectuted. Thus the method that was executing at
>> the time of the forceEarlyReturn should be the same as the method
>> when the earlyret is handled. In fact it should be the same
>> bytetcode(bci). Like I said, this is what *normally* happens, but is
>> not the way the bug is exposed.
>>
>> The bug turns up when the run() method is not yet compiled, and when
>> the suspend is attempted we are in one of the small "inline" methods,
>> and it is compiled. In this case thread suspension is achieved in a
>> different manner. When a compiled method executes its return code, it
>> first pops its frame, and then the " poll_return" safepoint polling
>> instruction is executed. This allows the thread to be suspended. We
>> are technically in the run() method at this point since the frame of
>> the "inline"  method has been popped. Therefore the forceEarlyReturn
>> is allowed. When the thread is resumed, the return from the "inline"
>> method finally happens, which puts us in code generated by
>> TemplateInterpreterGenerator::generate_return_entry_for(). However,
>> there are no checks for an earlyret here, so we start executing
>> bytecodes until we finally hit an earlyret check. This happens when
>> the next invokespecial is done. There is a call_VM after the frame is
>> pushed, and the call_VM always does an earlyret check. The problem is
>> now the topmost frame (in the case where it asserts) is
>> inlineMethodReturningObject(). Since it returns an object type, the
>> earlyret code expects there to be an oop in the earlyret slot on the
>> stack. But there isn't because the forceEarlyReturn was done on the
>> void run() method. So the earlyret code fetches garbage, does an
>> validity test on it, and gets the assert you see in the bug title.
>>
>> So what is needed here is an earlyret check after the compiled
>> "inline" method returns, and before any new bytecodes are executed.
>> Since TemplateInterpreterGenerator::generate_return_entry_for() is
>> always executed when the compiled "inline" method returns, the check
>> was added here. For completeness, the popframe check was also added.
>> The header file changes are just to make these methods public so they
>> can be called.
>>
>> I tested the failed test at least 100 times on each supported
>> platform. The assert usually showed up at least 1 in 50 times. I also
>> ran a large set of tests on all supported platforms, including
>> everything we do for nightly testing except for some of the longest
>> running tests.
>>
>> thanks,
>>
>> Chris
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Volker Simonis
In reply to this post by Chris Plummer
Hi Chris,

thanks for considering ppc64 and s390x!
Your change builds and works on both platforms.

Regards,
Volker


On Tue, Feb 21, 2017 at 8:31 PM, Chris Plummer <[hidden email]> wrote:

> Hello,
>
> Please review the following:
>
> http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
> https://bugs.openjdk.java.net/browse/JDK-8172020
>
> [Note there are s390, ppc, and aarch64 changes in this bug fix that I will
> need to have someone verify for me or I won't be pushing them.]
>
> This bug turns up with a closed test so the CR is marked confidential. I'll
> try to provided the necessary details here. Sorry the explanation is
> lengthy, but it is a complex bug (with a relatively simple fix).
>
> If you are on the s390, ppc, or aarch64 teams and just want to help verify
> the fix, I would say it's not necessary to understand the details of the
> bug. What's most important is you test the changes. Unfortunately you won't
> be able to reproduce the issue since the test is closed, but you should at
> least verify the fix builds and introduces no new issues. I did test the
> open aarch64 port since I have the hardware, and someone was kind enough to
> provide me with binaries both without the patch (which reproduced the bug)
> and with the patch (which passed). No other aarch64 testing was done.
>
> Now on to an explanation of the bug and the fix:
>
> The test is testing the JDI/JVMTI forceEarlyReturn support. After the
> debugger side spawns the debugee process and the debugee thread is created,
> a vm.suspend() is issued to suspend all threads on the debugee side. The
> debugger side then issues a JDI forceEarlyReturn, which will cause the
> debugee thread to exit the currently executing method. Although the intent
> is to force exit of the run() method, the run() method does make some method
> calls to some very small methods with "inline" in their name, such as
> inlineMethodReturningObject(). If forceEarlyReturn is done while in one of
> them, JDI throws InvalidTargetException, which the test correctly handles
> and will pass if that happens.
>
> If the run() method is not yet compiled, what *normally* happens to force
> the thread to suspend is to swap in the interpreter safepoint dispatch
> table. This will result in a call_VM out to the interpreter to handle the
> safepointing. When later the thread is resumed (after initiating the
> forceEarlyReturn), there is code on the return side of call_VM to make sure
> the earlyret is handled immediately before any other bytecodes are
> exectuted. Thus the method that was executing at the time of the
> forceEarlyReturn should be the same as the method when the earlyret is
> handled. In fact it should be the same bytetcode(bci). Like I said, this is
> what *normally* happens, but is not the way the bug is exposed.
>
> The bug turns up when the run() method is not yet compiled, and when the
> suspend is attempted we are in one of the small "inline" methods, and it is
> compiled. In this case thread suspension is achieved in a different manner.
> When a compiled method executes its return code, it first pops its frame,
> and then the " poll_return" safepoint polling instruction is executed. This
> allows the thread to be suspended. We are technically in the run() method at
> this point since the frame of the "inline"  method has been popped.
> Therefore the forceEarlyReturn is allowed. When the thread is resumed, the
> return from the "inline" method finally happens, which puts us in code
> generated by TemplateInterpreterGenerator::generate_return_entry_for().
> However, there are no checks for an earlyret here, so we start executing
> bytecodes until we finally hit an earlyret check. This happens when the next
> invokespecial is done. There is a call_VM after the frame is pushed, and the
> call_VM always does an earlyret check. The problem is now the topmost frame
> (in the case where it asserts) is inlineMethodReturningObject(). Since it
> returns an object type, the earlyret code expects there to be an oop in the
> earlyret slot on the stack. But there isn't because the forceEarlyReturn was
> done on the void run() method. So the earlyret code fetches garbage, does an
> validity test on it, and gets the assert you see in the bug title.
>
> So what is needed here is an earlyret check after the compiled "inline"
> method returns, and before any new bytecodes are executed. Since
> TemplateInterpreterGenerator::generate_return_entry_for() is always executed
> when the compiled "inline" method returns, the check was added here. For
> completeness, the popframe check was also added. The header file changes are
> just to make these methods public so they can be called.
>
> I tested the failed test at least 100 times on each supported platform. The
> assert usually showed up at least 1 in 50 times. I also ran a large set of
> tests on all supported platforms, including everything we do for nightly
> testing except for some of the longest running tests.
>
> thanks,
>
> Chris
>
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Chris Plummer
In reply to this post by serguei.spitsyn@oracle.com
Hello,

I still need one more reviewer for this change.

thanks,

Chris

On 2/21/17 3:29 PM, [hidden email] wrote:

> Hi Chris,
>
> It looks good to me.
> But let me mediate on the fix a little bit more.
> Consider it reviewed if I'll not come back in 24 hours. :)
>
> Thank you a lot for this great investigation and fix!
> Serguei
>
>
> On 2/21/17 11:31, Chris Plummer wrote:
>> Hello,
>>
>> Please review the following:
>>
>> http://cr.openjdk.java.net/~cjplummer/8172020/webrev.01/webrev.hotspot/
>> https://bugs.openjdk.java.net/browse/JDK-8172020
>>
>> [Note there are s390, ppc, and aarch64 changes in this bug fix that I
>> will need to have someone verify for me or I won't be pushing them.]
>>
>> This bug turns up with a closed test so the CR is marked
>> confidential. I'll try to provided the necessary details here. Sorry
>> the explanation is lengthy, but it is a complex bug (with a
>> relatively simple fix).
>>
>> If you are on the s390, ppc, or aarch64 teams and just want to help
>> verify the fix, I would say it's not necessary to understand the
>> details of the bug. What's most important is you test the changes.
>> Unfortunately you won't be able to reproduce the issue since the test
>> is closed, but you should at least verify the fix builds and
>> introduces no new issues. I did test the open aarch64 port since I
>> have the hardware, and someone was kind enough to provide me with
>> binaries both without the patch (which reproduced the bug) and with
>> the patch (which passed). No other aarch64 testing was done.
>>
>> Now on to an explanation of the bug and the fix:
>>
>> The test is testing the JDI/JVMTI forceEarlyReturn support. After the
>> debugger side spawns the debugee process and the debugee thread is
>> created, a vm.suspend() is issued to suspend all threads on the
>> debugee side. The debugger side then issues a JDI forceEarlyReturn,
>> which will cause the debugee thread to exit the currently executing
>> method. Although the intent is to force exit of the run() method, the
>> run() method does make some method calls to some very small methods
>> with "inline" in their name, such as inlineMethodReturningObject().
>> If forceEarlyReturn is done while in one of them, JDI throws
>> InvalidTargetException, which the test correctly handles and will
>> pass if that happens.
>>
>> If the run() method is not yet compiled, what *normally* happens to
>> force the thread to suspend is to swap in the interpreter safepoint
>> dispatch table. This will result in a call_VM out to the interpreter
>> to handle the safepointing. When later the thread is resumed (after
>> initiating the forceEarlyReturn), there is code on the return side of
>> call_VM to make sure the earlyret is handled immediately before any
>> other bytecodes are exectuted. Thus the method that was executing at
>> the time of the forceEarlyReturn should be the same as the method
>> when the earlyret is handled. In fact it should be the same
>> bytetcode(bci). Like I said, this is what *normally* happens, but is
>> not the way the bug is exposed.
>>
>> The bug turns up when the run() method is not yet compiled, and when
>> the suspend is attempted we are in one of the small "inline" methods,
>> and it is compiled. In this case thread suspension is achieved in a
>> different manner. When a compiled method executes its return code, it
>> first pops its frame, and then the " poll_return" safepoint polling
>> instruction is executed. This allows the thread to be suspended. We
>> are technically in the run() method at this point since the frame of
>> the "inline"  method has been popped. Therefore the forceEarlyReturn
>> is allowed. When the thread is resumed, the return from the "inline"
>> method finally happens, which puts us in code generated by
>> TemplateInterpreterGenerator::generate_return_entry_for(). However,
>> there are no checks for an earlyret here, so we start executing
>> bytecodes until we finally hit an earlyret check. This happens when
>> the next invokespecial is done. There is a call_VM after the frame is
>> pushed, and the call_VM always does an earlyret check. The problem is
>> now the topmost frame (in the case where it asserts) is
>> inlineMethodReturningObject(). Since it returns an object type, the
>> earlyret code expects there to be an oop in the earlyret slot on the
>> stack. But there isn't because the forceEarlyReturn was done on the
>> void run() method. So the earlyret code fetches garbage, does an
>> validity test on it, and gets the assert you see in the bug title.
>>
>> So what is needed here is an earlyret check after the compiled
>> "inline" method returns, and before any new bytecodes are executed.
>> Since TemplateInterpreterGenerator::generate_return_entry_for() is
>> always executed when the compiled "inline" method returns, the check
>> was added here. For completeness, the popframe check was also added.
>> The header file changes are just to make these methods public so they
>> can be called.
>>
>> I tested the failed test at least 100 times on each supported
>> platform. The assert usually showed up at least 1 in 50 times. I also
>> ran a large set of tests on all supported platforms, including
>> everything we do for nightly testing except for some of the longest
>> running tests.
>>
>> thanks,
>>
>> Chris
>>
>

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Andrew Haley
On 27/02/17 18:43, Chris Plummer wrote:
> I still need one more reviewer for this change.

I'm happy with it.

Andrew.

Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: RFR(10): 8172020: Internal Error (cpu/arm/vm/frame_arm.cpp:571): assert(obj == __null || Universe::heap()->is_in(obj)) failed: sanity check #

Chris Plummer
On 2/27/17 10:48 AM, Andrew Haley wrote:
> On 27/02/17 18:43, Chris Plummer wrote:
>> I still need one more reviewer for this change.
> I'm happy with it.
>
> Andrew.
>
Thanks Andrew!

Loading...