I am observing the following assert:
# Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 # assert(is_frame_safe(f)) failed: Frame must be safe (see issue for full hs_err) In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. Testing: - [x] StackWalk tests with Shenandoah/aggressive - [x] StackWalk tests with ZGC/aggressive - [ ] tier1 (+Shenandoah/ZGC) - [ ] tier2 (+Shenandoah/ZGC) ------------- Commit messages: - 8261448: Preserve GC stack watermark across safepoints in StackWalk Changes: https://git.openjdk.java.net/jdk/pull/2500/files Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00 Issue: https://bugs.openjdk.java.net/browse/JDK-8261448 Stats: 2 lines in 1 file changed: 1 ins; 0 del; 1 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 |
On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke <[hidden email]> wrote:
> I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
> I am observing the following assert:
> > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2500/files - new: https://git.openjdk.java.net/jdk/pull/2500/files/72f20e13..6946499c Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=00-01 Stats: 12 lines in 4 files changed: 10 ins; 0 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
On Wed, 10 Feb 2021 12:38:10 GMT, Roman Kennke <[hidden email]> wrote:
>> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
On Fri, 12 Feb 2021 21:45:57 GMT, Roman Kennke <[hidden email]> wrote:
>> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() Nesting code looks wrong. ------------- Changes requested by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
On Fri, 12 Feb 2021 21:43:20 GMT, Roman Kennke <[hidden email]> wrote:
>> I'm converting back to draft. The Loom tests (test/jdk/java/lang/Continuation/*) are still failing and it looks like fetchFirstBatch() does indeed require treatment, and it's complicated because fetchFirstBatch() may end up calling fetchNextBatch() and the KeepStackGCProcessedMark is not reentrant. > > I tested the original patch in Loom with tests that use stack-walking and it failed because we'd need another KeepStackGCProcessedMark in fetchFirstBatch() too. Unfortunately, fetchFirstBatch() can wind up calling fetchNextBatch() recursively, but we *also* can call fetchNextBatch() without calling fetchFirstBatch() on outer frame, thus we need KeepStackGCProcessedMark to be reentrant. I achieved this by linking together nested linked watermark. I am not sure this is the right way to achieve it. It fixes all tests in Loom *and* mainline JDK though. I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread. Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Erik Österlund-3
On Fri, 12 Feb 2021 23:14:47 GMT, Erik Österlund <[hidden email]> wrote:
>> Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: >> >> Make KeepStackGCProcessedMark reentrant; Place a KeepStackGCProcessedMark in StackWalker::fetchFirstBatch() > > Nesting code looks wrong. I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch: https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72 Maybe you could build a few more test based on this? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
> I am observing the following assert:
> > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [ ] tier1 (+Shenandoah/ZGC) > - [ ] tier2 (+Shenandoah/ZGC) Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: Make KeepStackGCProcessedMark non-reentrant again ------------- Changes: - all: https://git.openjdk.java.net/jdk/pull/2500/files - new: https://git.openjdk.java.net/jdk/pull/2500/files/6946499c..345f78b4 Webrevs: - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=02 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=2500&range=01-02 Stats: 11 lines in 3 files changed: 0 ins; 9 del; 2 mod Patch: https://git.openjdk.java.net/jdk/pull/2500.diff Fetch: git fetch https://git.openjdk.java.net/jdk pull/2500/head:pull/2500 PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Stefan Karlsson-3
On Mon, 15 Feb 2021 09:26:03 GMT, Stefan Karlsson <[hidden email]> wrote:
>> Nesting code looks wrong. > > I incorrectly read Erik's comment as "Nesting code looks **good**", so I created a unit test to show the problem with the patch: > https://github.com/stefank/jdk/commit/8760f1b0409b3cccf76a8ea417b90e66da31af72 > > Maybe you could build a few more test based on this? > I think this solution is wrong, regarding nesting. There is only a single node but it looks like you think there are multiple. The result is seemingly that the unlink function won't unlink anything, which permanently disables incremental stack scanning on that thread. > Is there any way the mark can be placed closer to the problematic allocation so we don't need nesting? I just realized that the reentrancy comes from the Java call lower in fetchFirstBatch(). The problem can be easily avoided by putting the KeepStackGCProcessedMark in sensible scope that excludes the call. ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke <[hidden email]> wrote:
>> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark non-reentrant again Looks good. ------------- Marked as reviewed by stefank (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 |
On Mon, 22 Feb 2021 08:26:19 GMT, Stefan Karlsson <[hidden email]> wrote:
> Looks good. Thanks, Stefan! @fisk also good? ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
On Mon, 15 Feb 2021 15:20:58 GMT, Roman Kennke <[hidden email]> wrote:
>> I am observing the following assert: >> >> # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 >> # assert(is_frame_safe(f)) failed: Frame must be safe >> >> (see issue for full hs_err) >> >> In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. >> >> This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. >> >> Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. >> >> Testing: >> - [x] StackWalk tests with Shenandoah/aggressive >> - [x] StackWalk tests with ZGC/aggressive >> - [ ] tier1 (+Shenandoah/ZGC) >> - [ ] tier2 (+Shenandoah/ZGC) > > Roman Kennke has updated the pull request incrementally with one additional commit since the last revision: > > Make KeepStackGCProcessedMark non-reentrant again Also good! ------------- Marked as reviewed by eosterlund (Reviewer). PR: https://git.openjdk.java.net/jdk/pull/2500 |
In reply to this post by Roman Kennke
On Wed, 10 Feb 2021 10:07:20 GMT, Roman Kennke <[hidden email]> wrote:
> I am observing the following assert: > > # Internal Error (/home/rkennke/src/openjdk/loom/src/hotspot/share/runtime/stackWatermark.cpp:178), pid=54418, tid=54534 > # assert(is_frame_safe(f)) failed: Frame must be safe > > (see issue for full hs_err) > > In StackWalk::fetchNextBatch() we prepare the entire stack to be processed by calling StackWatermarkSet::finish_processing(jt, NULL, StackWatermarkKind::gc), but then subsequently, during frames scan, perform allocations to fill in the frame information (fill_in_frames => LiveFrameStream::fill_frame => fill_live_stackframe) at where we could safepoint for GC, which could reset the stack watermark. > > This is only relevant for GCs that use the StackWatermark, e.g. ZGC and Shenandoah at the moment. > > Solution is to preserve the stack-watermark across safepoints in StackWalk::fetchNextBatch(). StackWalk::fetchFirstBatch() doesn't look to be affected by this: it is not using the stack-watermark. > > Testing: > - [x] StackWalk tests with Shenandoah/aggressive > - [x] StackWalk tests with ZGC/aggressive > - [x] tier1 (+Shenandoah/ZGC) > - [x] tier2 (+Shenandoah/ZGC) This pull request has now been integrated. Changeset: c20fb5db Author: Roman Kennke <[hidden email]> URL: https://git.openjdk.java.net/jdk/commit/c20fb5db Stats: 3 lines in 1 file changed: 2 ins; 0 del; 1 mod 8261448: Preserve GC stack watermark across safepoints in StackWalk Reviewed-by: eosterlund, stefank ------------- PR: https://git.openjdk.java.net/jdk/pull/2500 |
Free forum by Nabble | Edit this page |