RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk

Roman Kennke
I am observing the following assert:

# Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
# assert(is_frame_safe(f)) failed: Frame must be safe

(see issue for full hs_err)

In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.

This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.

Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.

Testing:
 - [x] StackWalk tests with Shenandoah/aggressive
 - [x] StackWalk tests with ZGC/aggressive
 - [ ] tier1 (+Shenandoah/ZGC)
 - [ ] tier2 (+Shenandoah/ZGC)

-------------

Commit messages:
 - 8261448: Preserve GC stack watermark across safepoints in StackWalk

Changes: https://git.openjdk.java.net/jdk/pull/2500/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8261448
  Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2500.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk

Roman Kennke
On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke <[hidden email]> wrote:

> I am observing the following assert:
>
> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
> # assert(is_frame_safe(f)) failed: Frame must be safe
>
> (see issue for full hs_err)
>
> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>
> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>
> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>
> Testing:
>  - [x] StackWalk tests with Shenandoah/aggressive
>  - [x] StackWalk tests with ZGC/aggressive
>  - [ ] tier1 (+Shenandoah/ZGC)
>  - [ ] tier2 (+Shenandoah/ZGC)

I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2]

Roman Kennke
In reply to this post by Roman Kennke
> I am observing the following assert:
>
> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
> # assert(is_frame_safe(f)) failed: Frame must be safe
>
> (see issue for full hs_err)
>
> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>
> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>
> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>
> Testing:
>  - [x] StackWalk tests with Shenandoah/aggressive
>  - [x] StackWalk tests with ZGC/aggressive
>  - [ ] tier1 (+Shenandoah/ZGC)
>  - [ ] tier2 (+Shenandoah/ZGC)

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch()

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2500/files
  - new: https://git.openjdk.java.net/jdk/pull/2500/files/72f20e13..6946499c

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00-01

  Stats: 12 lines in 4 files changed: 10 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2500.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk

Roman Kennke
In reply to this post by Roman Kennke
On Wed, 10 Feb 2021 12:38:10 GMT, Roman Kennke <[hidden email]> wrote:

>> I am observing the following assert:
>>
>> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>
>> (see issue for full hs_err)
>>
>> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>>
>> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>>
>> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>>
>> Testing:
>>  - [x] StackWalk tests with Shenandoah/aggressive
>>  - [x] StackWalk tests with ZGC/aggressive
>>  - [ ] tier1 (+Shenandoah/ZGC)
>>  - [ ] tier2 (+Shenandoah/ZGC)
>
> I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant.

I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2]

Erik Österlund-3
In reply to this post by Roman Kennke
On Fri, 12 Feb 2021 21:45:57 GMT, Roman Kennke <[hidden email]> wrote:

>> I am observing the following assert:
>>
>> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>
>> (see issue for full hs_err)
>>
>> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>>
>> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>>
>> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>>
>> Testing:
>>  - [x] StackWalk tests with Shenandoah/aggressive
>>  - [x] StackWalk tests with ZGC/aggressive
>>  - [ ] tier1 (+Shenandoah/ZGC)
>>  - [ ] tier2 (+Shenandoah/ZGC)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch()

Nesting code looks wrong.

-------------

Changes requested by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk

Erik Österlund-3
In reply to this post by Roman Kennke
On Fri, 12 Feb 2021 21:43:20 GMT, Roman Kennke <[hidden email]> wrote:

>> I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant.
>
> I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though.

I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread.
Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2]

Stefan Karlsson-3
In reply to this post by Erik Österlund-3
On Fri, 12 Feb 2021 23:14:47 GMT, Erik Österlund <[hidden email]> wrote:

>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>>
>>   Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch()
>
> Nesting code looks wrong.

I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch:
https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72

Maybe you could build a few more test based on this?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3]

Roman Kennke
In reply to this post by Roman Kennke
> I am observing the following assert:
>
> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
> # assert(is_frame_safe(f)) failed: Frame must be safe
>
> (see issue for full hs_err)
>
> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>
> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>
> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>
> Testing:
>  - [x] StackWalk tests with Shenandoah/aggressive
>  - [x] StackWalk tests with ZGC/aggressive
>  - [ ] tier1 (+Shenandoah/ZGC)
>  - [ ] tier2 (+Shenandoah/ZGC)

Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:

  Make KeepStackGCProcessedMark non-reentrant again

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/2500/files
  - new: https://git.openjdk.java.net/jdk/pull/2500/files/6946499c..345f78b4

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01-02

  Stats: 11 lines in 3 files changed: 0 ins; 9 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/2500.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v2]

Roman Kennke
In reply to this post by Stefan Karlsson-3
On Mon, 15 Feb 2021 09:26:03 GMT, Stefan Karlsson <[hidden email]> wrote:

>> Nesting code looks wrong.
>
> I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch:
> https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72
>
> Maybe you could build a few more test based on this?

> I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread.
> Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting?

I just realized that the reentrancy comes from the Java call lower in fetchFirstBatch(). The problem can be easily avoided by putting the KeepStackGCProcessedMark in sensible scope that excludes the call.

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3]

Stefan Karlsson-3
In reply to this post by Roman Kennke
On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke <[hidden email]> wrote:

>> I am observing the following assert:
>>
>> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>
>> (see issue for full hs_err)
>>
>> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>>
>> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>>
>> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>>
>> Testing:
>>  - [x] StackWalk tests with Shenandoah/aggressive
>>  - [x] StackWalk tests with ZGC/aggressive
>>  - [ ] tier1 (+Shenandoah/ZGC)
>>  - [ ] tier2 (+Shenandoah/ZGC)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Make KeepStackGCProcessedMark non-reentrant again

Looks good.

-------------

Marked as reviewed by stefank (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3]

Roman Kennke
On Mon, 22 Feb 2021 08:26:19 GMT, Stefan Karlsson <[hidden email]> wrote:

> Looks good.

Thanks, Stefan!

@fisk also good?

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8261448: Preserve GC stack watermark across safepoints in StackWalk [v3]

Erik Österlund-3
In reply to this post by Roman Kennke
On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke <[hidden email]> wrote:

>> I am observing the following assert:
>>
>> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
>> # assert(is_frame_safe(f)) failed: Frame must be safe
>>
>> (see issue for full hs_err)
>>
>> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>>
>> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>>
>> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>>
>> Testing:
>>  - [x] StackWalk tests with Shenandoah/aggressive
>>  - [x] StackWalk tests with ZGC/aggressive
>>  - [ ] tier1 (+Shenandoah/ZGC)
>>  - [ ] tier2 (+Shenandoah/ZGC)
>
> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision:
>
>   Make KeepStackGCProcessedMark non-reentrant again

Also good!

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/2500
Reply | Threaded
Open this post in threaded view
|

Integrated: 8261448: Preserve GC stack watermark across safepoints in StackWalk

Roman Kennke
In reply to this post by Roman Kennke
On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke <[hidden email]> wrote:

> I am observing the following assert:
>
> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534
> # assert(is_frame_safe(f)) failed: Frame must be safe
>
> (see issue for full hs_err)
>
> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark.
>
> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment.
>
> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark.
>
> Testing:
>  - [x] StackWalk tests with Shenandoah/aggressive
>  - [x] StackWalk tests with ZGC/aggressive
>  - [x] tier1 (+Shenandoah/ZGC)
>  - [x] tier2 (+Shenandoah/ZGC)

This pull request has now been integrated.

Changeset: c20fb5db
Author:    Roman Kennke <[hidden email]>
URL:       https://git.openjdk.java.net/jdk/commit/c20fb5db
Stats:     3 lines in 1 file changed: 2 ins; 0 del; 1 mod

8261448: Preserve GC stack watermark across safepoints in StackWalk

Reviewed-by: eosterlund, stefank

-------------

PR: https://git.openjdk.java.net/jdk/pull/2500