RFR: 8264123: add ThreadsList.is_valid() support

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
ThreadsLists need an is_valid() function and checks in various
places to help catch bugs where a ThreadsList is dangling.

Other minor changes:
- change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
- `get_threads_hazard_ptr()` should be `const`.
- fix a couple of old typos.

-------------

Commit messages:
 - 8264123.00.patch

Changes: https://git.openjdk.java.net/jdk/pull/3255/files
 Webrev: https://webrevs.openjdk.java.net/?repo=jdk&pr=3255&range=00
  Issue: https://bugs.openjdk.java.net/browse/JDK-8264123
  Stats: 60 lines in 4 files changed: 48 ins; 0 del; 12 mod
  Patch: https://git.openjdk.java.net/jdk/pull/3255.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/3255/head:pull/3255

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

David Holmes-2
On Mon, 29 Mar 2021 21:48:23 GMT, Daniel D. Daugherty <[hidden email]> wrote:

> ThreadsLists need an is_valid() function and checks in various
> places to help catch bugs where a ThreadsList is dangling.
>
> Other minor changes:
> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
> - `get_threads_hazard_ptr()` should be `const`.
> - fix a couple of old typos.

Hi Dan,

I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.

Any why is this all product code rather than debug?

Thanks,
David

src/hotspot/share/runtime/threadSMR.cpp line 285:

> 283:     }
> 284:
> 285:     guarantee(ThreadsList::is_valid(current_list), "current_list="

Why a guarantee and not an assert? What is the cost of doing this check?

src/hotspot/share/runtime/threadSMR.cpp line 319:

> 317:       // We only validate hazard_ptrs that are not tagged since a tagged
> 318:       // hazard ptr can be deleted at any time.
> 319:       guarantee(ThreadsList::is_valid(hazard_ptr), "hazard_ptr=" INTPTR_FORMAT

Same query about guarantee versus assert

src/hotspot/share/runtime/threadSMR.cpp line 383:

> 381:     if (hazard_ptr == NULL) return;
> 382:     // If the hazard ptr is unverified, then ignore it since it could
> 383:     // be deleted at any time now.

Why the difference in comment compared to line 317/318 when the subsequent checks are identical?

src/hotspot/share/runtime/threadSMR.hpp line 209:

> 207:   bool includes(const JavaThread * const p) const;
> 208:
> 209:   static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }

This seems like a necessary, but not sufficient, check for validity. Unless we zap memory when these things are freed I don't see how this ensures validity.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
On Tue, 30 Mar 2021 02:16:30 GMT, David Holmes <[hidden email]> wrote:

>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>>
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>
> Hi Dan,
>
> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.
>
> Any why is this all product code rather than debug?
>
> Thanks,
> David

@dholmes-ora - Thanks for the review!

> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.

Please see my reply about the ThreadsList destructor code that you missed.

> Any why is this all product code rather than debug?

Please see my reply about guarantee() versus assert().

> src/hotspot/share/runtime/threadSMR.cpp line 285:
>
>> 283:     }
>> 284:
>> 285:     guarantee(ThreadsList::is_valid(current_list), "current_list="
>
> Why a guarantee and not an assert? What is the cost of doing this check?

This is a guarantee() instead of an assert() because this failure mode
is rarely caught with 'release' bits and very, very rarely caught with
'fastdebug' or 'slowdebug' bits. It's a very tight race.

My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.

I don't have data on what this correctness check costs, but it's just
a value check on the new `_magic` field value. Very much like other
"magic" field value checks that we have all over the VM...

> src/hotspot/share/runtime/threadSMR.cpp line 319:
>
>> 317:       // We only validate hazard_ptrs that are not tagged since a tagged
>> 318:       // hazard ptr can be deleted at any time.
>> 319:       guarantee(ThreadsList::is_valid(hazard_ptr), "hazard_ptr=" INTPTR_FORMAT
>
> Same query about guarantee versus assert

Same answer and to reiterate the important part:

My plan is to keep this as a guarantee() for April and switch it to an
assert in mid to late May.

> src/hotspot/share/runtime/threadSMR.cpp line 383:
>
>> 381:     if (hazard_ptr == NULL) return;
>> 382:     // If the hazard ptr is unverified, then ignore it since it could
>> 383:     // be deleted at any time now.
>
> Why the difference in comment compared to line 317/318 when the subsequent checks are identical?

The `ValidateHazardPtrsClosure` is called from places that will use
the collected hazard ptr for traversal. Deletion while being traversed
would lead to the crashes that we want to avoid.

The `ScanHazardPtrGatherThreadsListClosure` is used to gather
hazard ptrs where the only the address value of the hazard ptr is
used and no traversal takes place. This is why I added the comment
on L300-302 above.

> src/hotspot/share/runtime/threadSMR.hpp line 209:
>
>> 207:   bool includes(const JavaThread * const p) const;
>> 208:
>> 209:   static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }
>
> This seems like a necessary, but not sufficient, check for validity. Unless we zap memory when these things are freed I don't see how this ensures validity.

You missed L652 in the `ThreadsList::~ThreadsList()` above:

 _magic = 0xDEADBEEF;

When the destructor runs, the magic value is stomped. I've captured
failures where the is_valid() check observes 0xDEADBEEF and also
failures where the ThreadsList has been recycled into something else
and the `_magic` value is some other value that doesn't match
`THREADS_LIST_MAGIC`.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
On Tue, 30 Mar 2021 16:29:48 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> Hi Dan,
>>
>> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.
>>
>> Any why is this all product code rather than debug?
>>
>> Thanks,
>> David
>
> @dholmes-ora - Thanks for the review!
>
>> I'm not quite seeing how this single-field checks constitutes a useful validity check - sorry.
>
> Please see my reply about the ThreadsList destructor code that you missed.
>
>> Any why is this all product code rather than debug?
>
> Please see my reply about guarantee() versus assert().

https://github.com/openjdk/jdk/pull/3272 is now a dependent PR on this fix (woot!) so now
you can see the race that I was chasing with this validation code! There's a long analysis
note attached to "JDK-8264393 JDK-8258284 introduced dangling TLH race".

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

David Holmes-2
In reply to this post by Daniel D.Daugherty
On Mon, 29 Mar 2021 21:48:23 GMT, Daniel D. Daugherty <[hidden email]> wrote:

> ThreadsLists need an is_valid() function and checks in various
> places to help catch bugs where a ThreadsList is dangling.
>
> Other minor changes:
> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
> - `get_threads_hazard_ptr()` should be `const`.
> - fix a couple of old typos.
>
> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].

Looks good.

Thanks,
David

-------------

Marked as reviewed by dholmes (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

David Holmes-2
In reply to this post by Daniel D.Daugherty
On Tue, 30 Mar 2021 16:17:27 GMT, Daniel D. Daugherty <[hidden email]> wrote:

>> src/hotspot/share/runtime/threadSMR.cpp line 285:
>>
>>> 283:     }
>>> 284:
>>> 285:     guarantee(ThreadsList::is_valid(current_list), "current_list="
>>
>> Why a guarantee and not an assert? What is the cost of doing this check?
>
> This is a guarantee() instead of an assert() because this failure mode
> is rarely caught with 'release' bits and very, very rarely caught with
> 'fastdebug' or 'slowdebug' bits. It's a very tight race.
>
> My plan is to keep this as a guarantee() for April and switch it to an
> assert in mid to late May.
>
> I don't have data on what this correctness check costs, but it's just
> a value check on the new `_magic` field value. Very much like other
> "magic" field value checks that we have all over the VM...

I should have edited this comment once I saw what the actual check was.

Keeping this as a guarantee initially then switching to assert seems a good strategy.

>> src/hotspot/share/runtime/threadSMR.hpp line 209:
>>
>>> 207:   bool includes(const JavaThread * const p) const;
>>> 208:
>>> 209:   static bool is_valid(ThreadsList* list) { return list->_magic == THREADS_LIST_MAGIC; }
>>
>> This seems like a necessary, but not sufficient, check for validity. Unless we zap memory when these things are freed I don't see how this ensures validity.
>
> You missed L652 in the `ThreadsList::~ThreadsList()` above:
>
>  _magic = 0xDEADBEEF;
>
> When the destructor runs, the magic value is stomped. I've captured
> failures where the is_valid() check observes 0xDEADBEEF and also
> failures where the ThreadsList has been recycled into something else
> and the `_magic` value is some other value that doesn't match
> `THREADS_LIST_MAGIC`.

Sorry yes I did miss it - I misread the diff due to hidden lines.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
In reply to this post by David Holmes-2
On Tue, 30 Mar 2021 22:54:55 GMT, David Holmes <[hidden email]> wrote:

>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>>
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>>
>> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].
>
> Looks good.
>
> Thanks,
> David

@dholmes-ora - Thanks for closing the loop on your initial review.
@robehn and/or @fisk - I could use a second pair of eyes on this one...

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Erik Österlund-3
In reply to this post by Daniel D.Daugherty
On Mon, 29 Mar 2021 21:48:23 GMT, Daniel D. Daugherty <[hidden email]> wrote:

> ThreadsLists need an is_valid() function and checks in various
> places to help catch bugs where a ThreadsList is dangling.
>
> Other minor changes:
> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
> - `get_threads_hazard_ptr()` should be `const`.
> - fix a couple of old typos.
>
> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].

Looks good to me!

-------------

Marked as reviewed by eosterlund (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
On Wed, 31 Mar 2021 14:22:46 GMT, Erik Österlund <[hidden email]> wrote:

>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>>
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>>
>> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].
>
> Looks good to me!

@fisk - Thanks for the review!

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Robbin Ehn-2
In reply to this post by Daniel D.Daugherty
On Mon, 29 Mar 2021 21:48:23 GMT, Daniel D. Daugherty <[hidden email]> wrote:

> ThreadsLists need an is_valid() function and checks in various
> places to help catch bugs where a ThreadsList is dangling.
>
> Other minor changes:
> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
> - `get_threads_hazard_ptr()` should be `const`.
> - fix a couple of old typos.
>
> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].

Hi Dan,

I have seen some other verification code which also might be performance impacting, can we got over and measure what the cost is for different verification when you gotten your bake time here? So we can take an informed decision based on what the cost vs debuggability in product is.

Looks good, thanks.

-------------

Marked as reviewed by rehn (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
On Thu, 1 Apr 2021 09:23:48 GMT, Robbin Ehn <[hidden email]> wrote:

>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>>
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>>
>> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].
>
> Hi Dan,
>
> I have seen some other verification code which also might be performance impacting, can we got over and measure what the cost is for different verification when you gotten your bake time here? So we can take an informed decision based on what the cost vs debuggability in product is.
>
> Looks good, thanks.

@robehn - Thanks for the review!

Are you saying that I should measure the cost of these `guarantee()` calls before
I switch them to `assert()` calls in the future? If that's what you mean, I'll include
a note for that when I file the new bug. If that's not what you mean, please let me
know what you do mean.

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Robbin Ehn-2
In reply to this post by Robbin Ehn-2
On Thu, 1 Apr 2021 09:23:48 GMT, Robbin Ehn <[hidden email]> wrote:

>> ThreadsLists need an is_valid() function and checks in various
>> places to help catch bugs where a ThreadsList is dangling.
>>
>> Other minor changes:
>> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
>> - `get_threads_hazard_ptr()` should be `const`.
>> - fix a couple of old typos.
>>
>> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].
>
> Hi Dan,
>
> I have seen some other verification code which also might be performance impacting, can we got over and measure what the cost is for different verification when you gotten your bake time here? So we can take an informed decision based on what the cost vs debuggability in product is.
>
> Looks good, thanks.

> @robehn - Thanks for the review!
>
> Are you saying that I should measure the cost of these `guarantee()` calls before
> I switch them to `assert()` calls in the future? If that's what you mean, I'll include
> a note for that when I file the new bug. If that's not what you mean, please let me
> know what you do mean.

Yes

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Re: RFR: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
On Thu, 1 Apr 2021 17:34:24 GMT, Robbin Ehn <[hidden email]> wrote:

>> Hi Dan,
>>
>> I have seen some other verification code which also might be performance impacting, can we got over and measure what the cost is for different verification when you gotten your bake time here? So we can take an informed decision based on what the cost vs debuggability in product is.
>>
>> Looks good, thanks.
>
>> @robehn - Thanks for the review!
>>
>> Are you saying that I should measure the cost of these `guarantee()` calls before
>> I switch them to `assert()` calls in the future? If that's what you mean, I'll include
>> a note for that when I file the new bug. If that's not what you mean, please let me
>> know what you do mean.
>
> Yes

I've filed the following new bug:

    JDK-8264624 change the guarantee() calls added by JDK-8264123 to assert() calls

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255
Reply | Threaded
Open this post in threaded view
|

Integrated: 8264123: add ThreadsList.is_valid() support

Daniel D.Daugherty
In reply to this post by Daniel D.Daugherty
On Mon, 29 Mar 2021 21:48:23 GMT, Daniel D. Daugherty <[hidden email]> wrote:

> ThreadsLists need an is_valid() function and checks in various
> places to help catch bugs where a ThreadsList is dangling.
>
> Other minor changes:
> - change raw `_threads_hazard_ptr` access to used `get_threads_hazard_ptr()`.
> - `get_threads_hazard_ptr()` should be `const`.
> - fix a couple of old typos.
>
> Update: forgot to mention that this fix was tested with Mach5 Tier[1-8].

This pull request has now been integrated.

Changeset: 9b2232bc
Author:    Daniel D. Daugherty <[hidden email]>
URL:       https://git.openjdk.java.net/jdk/commit/9b2232bc
Stats:     60 lines in 4 files changed: 48 ins; 0 del; 12 mod

8264123: add ThreadsList.is_valid() support

Reviewed-by: dholmes, eosterlund, rehn

-------------

PR: https://git.openjdk.java.net/jdk/pull/3255