RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Thomas Stüfe-2
Hi all,

May I please have a review for this small fix?

Issue: https://bugs.openjdk.java.net/browse/JDK-8185706
Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8185706-Native-callstacks-unreliable-under-Windows-x64/webrev.00/webrev/

This can be seen as an addendum to
https://bugs.openjdk.java.net/browse/JDK-8022335. Ioi Lam did a good job
analyzing the original problem. On Windows x64, the native compiler
generates code which does not use the frame pointer (regardless of whether
we set -Oy-). A frame pointer is used only in rare cases, e.g. for functions
calling alloca(), and, as Ioi pointed out, even then there is no guarantee
that RBP is actually the frame pointer.

So, in os::platform_print_native_stack() we walk the stack using
StackWalk64(), extract the pc from each frame and print it, as is usual in
Windows code. However, we still test for the frame pointer being NULL, and
abort stack tracing if it is. This causes stack dumping to fail quite often,
and unnecessarily.
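
To illustrate the effect of that check, here is a minimal sketch. It is not the HotSpot code: FrameInfo, collect_pcs_with_fp_check() and collect_pcs() are hypothetical names, and the walker is simulated by a plain vector of frames rather than by real StackWalk64() calls. The only point is to show how a NULL test on the frame pointer cuts a trace short even though the walker itself could still deliver valid pcs.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the values a stack walker reports per frame.
struct FrameInfo {
    std::uintptr_t pc;  // program counter of the frame
    std::uintptr_t sp;  // stack pointer of the frame
    std::uintptr_t fp;  // reported frame pointer; may be 0 or unrelated on Windows x64
};

// Behaviour with the extra checks: stop at the first frame whose
// stack pointer or frame pointer is NULL, even if its pc is valid.
std::vector<std::uintptr_t> collect_pcs_with_fp_check(const std::vector<FrameInfo>& walk) {
    std::vector<std::uintptr_t> pcs;
    for (const FrameInfo& f : walk) {
        if (f.pc == 0 || f.sp == 0 || f.fp == 0) {
            break;  // trace ends here, possibly prematurely
        }
        pcs.push_back(f.pc);
    }
    return pcs;
}

// Behaviour without the extra checks: trust the walker and stop only
// when it no longer reports a pc.
std::vector<std::uintptr_t> collect_pcs(const std::vector<FrameInfo>& walk) {
    std::vector<std::uintptr_t> pcs;
    for (const FrameInfo& f : walk) {
        if (f.pc == 0) {
            break;
        }
        pcs.push_back(f.pc);
    }
    return pcs;
}
```

With a simulated walk of three frames where only the second one has a zero frame pointer, the first function returns a single pc while the second returns all three.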

For example, test: java.exe -XX:ErrorHandlerTest=12

Sometimes it works, but more by accident, as Ioi pointed out in this mail
thread:
http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2013-August/009063.html
If there are Java frames above the crashing native frame, RBP may still be
set to some value (it does not matter which), and
os::platform_print_native_stack() does not abort frame printing.

Kind Regards, Thomas

Re: RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Ioi Lam
Hi Thomas,

Thanks for the patch!

Skipping the test for SP != NULL and FP != NULL seems generally OK to
me. I think StackWalk64 should be robust enough that when given NULL or
bogus values for stk.AddrStack.Offset and stk.AddrFrame.Offset, it will
still somehow recover gracefully. I forget exactly why I put in these
checks, though. Either I was overly cautious, or I might have seen some
problems without such checks, which might have caused crashes inside the
debug printing routine. I really should have put in a comment there :-(

Being generous to myself :-), I guess I would have put in a comment
had I seen a crash, so the lack of comments probably means I was just
overly cautious ....

How much testing have you done with your patch? Have you seen any crash
inside the printing routine?

Also, by "Native callstacks unreliable", do you mean "Native callstacks
printing terminates prematurely", and not "sometimes they fail and print
erroneous information or behave unexpectedly"? I think it's better to
update the bug title.

If you need a sponsor, I'll be happy to do it.

Thanks
- Ioi




Re: RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Thomas Stüfe-2
Hi Ioi,

On Mon, Aug 7, 2017 at 7:39 PM, Ioi Lam <[hidden email]> wrote:

> Hi Thomas,
>
> Thanks for the patch!
>
> Skipping the test for SP != NULL and FP != NULL seems generally OK for me.
> I think StackWalk64 should be robust enough that when given NULL or bogus
> values for stk.AddrStack.Offset and stk.AddrFrame.Offset, it will still
> somehow recover gracefully. I forgot exactly why I put in these checks,
> though. I either was overly cautious, or I might have seen some problems
> without such checks, which might have caused crashes inside the debug
> printing routine. I really should have put in a comment there :-(
>
> By being generous to myself :-), I guess I would have put in an comment
> had I saw crash, so the lack of comments probably meant I was just over
> cautious ....
>
> How much testing have you done with your patch.


Pretty much only the error scenario (java -XX:ErrorHandlerTest=xx) and
the gtests, both on Win x64.


> Have you seen any crash inside the printing routine?
>

None I would attribute to my change. I know there is a very slight risk of
crashing more often now, simply because we now continue stack dumping where
we stopped before, and because StackWalk64 is a black box. But this is error
handling, we deal with secondary crashes anyway, and I would rather have
more complete callstacks in the hs-err file and risk a secondary crash than
have useless error reports.

Note that callstack dumping and symbol resolution are pretty unreliable and
unstable on Windows anyway. See
https://bugs.openjdk.java.net/browse/JDK-8185712. I am currently working on
bringing improvements from our fork upstream. Our error handling is more
reliable than that of stock OpenJDK.


>
> Also, by "Native callstacks unreliable", do you mean "Native callstacks
> printing terminates prematurely", and not "sometimes they fail and print
> erroneous information or behave unexpectedly"? I think it's better to
> update the bug title.
>
>
Sure, that's a better name :) I changed it.


> If you need a sponsor, I'll be happy to do it.
>
>
Thanks!

Now for a second reviewer? Anyone?


> Thanks
> - Ioi
>
>
..Thomas



Re: RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Thomas Stüfe-2
In reply to this post by Thomas Stüfe-2
Ping... May I please have a second review?

Thank you!

On Wed, Aug 2, 2017 at 11:17 AM, Thomas Stüfe <[hidden email]>
wrote:

> Hi all,
>
> may I please have a review for this small fix.
>
> Issue: https://bugs.openjdk.java.net/browse/JDK-8185706
> Webrev: http://cr.openjdk.java.net/~stuefe/webrevs/8185706-Native-callstacks-unreliable-under-Windows-x64/webrev.00/webrev/

Re: RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Zhengyu Gu-2
Looks good to me.

-Zhengyu

On 08/10/2017 08:35 AM, Thomas Stüfe wrote:

> Ping... May I please have a second review?
>
> Thank you!

Re: RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Thomas Stüfe-2
Thank you, Zhengyu!

On Thu, Aug 10, 2017 at 3:09 PM, Zhengyu Gu <[hidden email]> wrote:

> Looks good to me.
>
> -Zhengyu

Re: RFR(xxs): 8185706: Native callstacks unreliable under Windows x64

Ioi Lam
Hi Thomas, I will sponsor the changes. Thanks for the contribution!

- Ioi


On 8/10/17 6:16 AM, Thomas Stüfe wrote:

> Thank you, Zhengyu!