Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

classic Classic list List threaded Threaded
31 messages Options
12
Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
Greetings,

This code review from Stefan Karlsson was originally posted on an Oracle
internal alias that we use for discussing HotSpot SMR development issues.
That subject was: "Thread-SMR (8167108)(JDK10): CR round 0 changes". I did
not have time to address Stefan's review before I went on vacation and
Stefan graciously allowed me to defer his review to the OpenJDK review.

With Stefan's permission, I'm replying to his code review on the OpenJDK
aliases that we're using for the JDK-8167108 code review.


On 10/6/17 1:19 PM, Stefan Karlsson wrote:
Hi Dan,

Great that this is getting done!
 
Thanks! It has been an adventure for all three primary contributors...


Erik already mentioned the file so I think that's on track, and that's what I was most concerned about.


I have not been involved in the review of this patch at all, but now that I've been looking at it I have a few comments. I hope you don't mind. I don't want them to block the open review request, but at the same time I'd like to share my opinion and have it weighted in for the future of this code.

1) ThreadsListHandle allows a safepoint to block and allow GCs to run and restructure objects and metadata, iff the ThreadsListHandle is nested. This forced me to review all usages of ThreadsListHandle in great detail to visually verify that it didn't break the surrounding code.

If ThreadsListHandle didn't ever block for safepoints, I wouldn't have had to care about that aspect when reading and reviewing the code. This also means that we in the future runs the risk of someone accidentally adding a nested ThreadsListHandle somewhere where they don't expect a safepoint to happen.
 
We included the following to automatically sanity check the placement
of the ThreadsListHandle:

src/hotspot/share/runtime/thread.cpp:

3596 ThreadsList *Threads::acquire_stable_list(Thread *self, bool is_ThreadsListSetter) {
3597   assert(self != NULL, "sanity check");
3598   // acquire_stable_list_nested_path() will grab the Threads_lock
3599   // so let's make sure the ThreadsListHandle is in a safe place.
3600   // ThreadsListSetter cannot make this check on this code path.
3601   debug_only(if (!is_ThreadsListSetter && StrictSafepointChecks) self->check_for_valid_safepoint_state(/* potential_vm_operation */ false);)

So each ThreadsListHandle location is checked for being a valid
safepoint state.

We tested this idea with our work on JvmtiEnv::GetThreadLocalStorage().
GetThreadLocalStorage() is called from the 'in native' state so we
cannot place a common ThreadsListHandle for all code paths because
it would fail the assertion when called with 'thread == NULL', i.e.,
the thread is operating on itself.

Update: Erik is going to explore a lock free solution for the
nesting algorithm. That will likely mean that a ThreadsListHandle
will not require placement in a safepoint safe location...


If the lock protecting the threads list could be taken with no_safepoint_check, then the described problem above would completely vanish. Unfortunately, we can't acquire the Threads_lock with no_safepoint_check. A solution to this would be to introduced a low-rank (rank == access) lock, say ThreadsList_lock, and always take it with no_safepoint_check.
 
The problem with switching locks is that we are protecting the scanning
phase of the SMR algorithm with the Threads_lock so we would have to
switch that lock also. I believe we use the Threads_lock with the
scanning because we're using closures... Of course, Erik or Robbin
should probably jump in here... :-)

Update: Erik is going to explore a lock-free solution for the
nesting algorithm.

That solution would also solve a potential lock-rank problem, we have that ThreadsListHandle can't be used if a higher-rank lock is held.
 
So far we haven't run into that problem and we think that the
check_for_valid_safepoint_state() will catch that if the code
should evolve to introduce one.


Update: Erik is going to explore a lock-free solution for the
nesting algorithm.


2) The following one-liner:
-  for (JavaThread* t = Threads::first(); t; t = t->next()) {

has been converted to a five-liner: 

+  {
+    ThreadsListHandle tlh;
+    JavaThreadIterator jti(tlh.list());
+    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
...
+  }

I find that unfortunate, and would prefer if this was rewritten.

For example:
for (JavaThreadIterator jti; JavaThread* t = jit.next();) {

jti will implicitly hold the ThreadListHandle on the stack.

I know that the implicit conversion of pointer to bool is non-conventional in the HotSpot code, but in this case I prefer that over five lines of extra noise in the GC code.
 
Coleen made a similar comment in her OpenJDK review:

Hi,  From my initial look at this, I really like new interface for
walking the threads list, except every instance but 2 uses of
JavaThreadIterator has a preceding ThreadsListHandle declaration.

+  ThreadsListHandle tlh;
+  JavaThreadIterator jti(tlh.list());


Could there be a wrapper that embeds the ThreadsListHandle, like:

class JavaThreadsListIterator {
    ThreadsListHandle _tlh;
...
}


Yup a definite code noise problem. We originally didn't have an
iterator and used direct thread_at(foo) calls so we had a pretty
close correspondence between the old for-loop and the new for-loop.
Early reviewers asked for an iterator abstraction so we modeled
JavaThreadIterator after other existing iterators in the JVM/TI
area... (Yes, Dan stayed pretty close to "home" here...)

There are places where we still need the existing JavaThreadIterator
because a ThreadsList is passed down a call stack... So we've added
a JavaThreadIteratorWithHandle class.

Here's an example of the noisy code in the GC area:

src/hotspot/share/gc/g1/dirtyCardQueue.cpp:

@@ -319,8 +320,12 @@
   clear();
   // Since abandon is done only at safepoints, we can safely manipulate
   // these queues.
-  for (JavaThread* t = Threads::first(); t; t = t->next()) {
-    t->dirty_card_queue().reset();
+  {
+    ThreadsListHandle tlh;
+    JavaThreadIterator jti(tlh.list());
+    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
+      t->dirty_card_queue().reset();
+    }
   }
   shared_dirty_card_queue()->reset();
 }

Here's an example of the revised code in the GC area:

src/hotspot/share/gc/g1/dirtyCardQueue.cpp:

@@ -319,7 +320,7 @@
   clear();
   // Since abandon is done only at safepoints, we can safely manipulate
   // these queues.
-  for (JavaThread* t = Threads::first(); t; t = t->next()) {
+  for (JavaThreadIteratorWithHandle jtiwh; JavaThread *t = jtiwh.next(); ) {
     t->dirty_card_queue().reset();
   }
   shared_dirty_card_queue()->reset();

This is definitely much less noisy and I'm hoping I don't catch too
much grief for the implied boolean usage...


3) cv_internal_thread_to_JavaThread

Has one return value and two output parameter. It forces the callers to setup stack lock variables for the two new variables.

--- Used to be ---
-  oop java_thread = JNIHandles::resolve_non_null(jthread);
   java_lang_Thread::set_priority(java_thread, (ThreadPriority)prio);
-  JavaThread* thr = java_lang_Thread::thread(java_thread);
-  if (thr != NULL) {                  // Thread not yet started; priority pushed down when it is
-    Thread::set_priority(thr, (ThreadPriority)prio);

--- Now is ---
+  oop java_thread = NULL;
+  JavaThread* receiver = NULL;
+  bool is_alive = tlh.cv_internal_thread_to_JavaThread(jthread, &receiver, &java_thread);
   java_lang_Thread::set_priority(java_thread, (ThreadPriority)prio);
+
+  if (is_alive) {
+    // jthread refers to a live JavaThread.
+    Thread::set_priority(receiver, (ThreadPriority)prio);

--- Maybe could be ---
   oop java_thread = JNIHandles::resolve_non_null(jthread);
   java_lang_Thread::set_priority(java_thread, (ThreadPriority)prio);
   JavaThread* thr = tlh.cv_internal_thread_to_JavaThread(java_thread);
   if (thr != NULL) {                  // Thread not yet started; priority pushed down when it is
     Thread::set_priority(thr, (ThreadPriority)prio);

I don't see the need to for the extra complexity of the two output parameters.

The return value is always true when thr is non-null, and always false when thr is null.
 
There are three cv_*_to_JavaThread() functions and we had to craft them
very, very carefully to avoid changing some of the subtle semantics at
the original call sites. If you carefully examine the output parameters
and return value usages at all of the call sites, you'll see that each
was added to fit some existing semantic that we didn't want to risk
changing. Of course, not all of the call sites need all of the special
behaviors individually, but they do as a union.

In short, compatibility and risk management led us here.

But the usage of cv_internal_thread_to_JavaThread is contained within the Runtime code, so my opinion should probably not weigh as much as other's opinions. :)
 
We definitely agree this is a mess, but not one we're willing to risk
changing at this time. Dan has made the observation that the JVM_*
entry points, like Y2K code, shows a variety of different coding
patterns, each with different (potential) issues...

Thanks for being flexible on accepting cv_internal_thread_to_JavaThread()
as is for now!


4) I'd prefer if abbreviations where expanded, so that the casual reader would immediately understood the code. For example:
t_list -> thread_list
cv_internal_thread_to_JavaThread -> internal_thread_to_JavaThread
 
We've had requests to shorten names, spell out names, use different
names, etc. It seems that no one is going to be happy with all of
the names we used in this code.

Our guidelines:

- use the same consistently, e.g., t_list for ThreadsLists
- use a name that doesn't show up in an existing grep (if possible)

We hope you don't mind, but we're not planning to make any more name
changes...


5) This macro and the jesting is pretty bad.

Yup!


I complained about it to Erik, and then he confessed that he wrote it :D

+// Possibly the ugliest for loop the world has seen. C++ does not allow
+// multiple types in the declaration section of the for loop. In this case
+// we are only dealing with pointers and hence can cast them. It looks ugly
+// but macros are ugly and therefore it's fine to make things absurdly ugly.
+#define DO_JAVA_THREADS(LIST, X)                                                                                          \
+    for (JavaThread *MACRO_scan_interval = (JavaThread*)(uintptr_t)PrefetchScanIntervalInBytes,                           \
+             *MACRO_list = (JavaThread*)(LIST),                                                                           \
+             **MACRO_end = ((JavaThread**)((ThreadsList*)MACRO_list)->threads()) + ((ThreadsList*)MACRO_list)->length(),  \
+             **MACRO_current_p = (JavaThread**)((ThreadsList*)MACRO_list)->threads(),                                     \
+             *X = (JavaThread*)prefetch_and_load_ptr((void**)MACRO_current_p, (intx)MACRO_scan_interval);                 \
+         MACRO_current_p != MACRO_end;                                                                                    \
+         MACRO_current_p++,                                                                                               \
+             X = (JavaThread*)prefetch_and_load_ptr((void**)MACRO_current_p, (intx)MACRO_scan_interval))
+


This can be rewritten without all these cast, and minimal usage of the macros expansions:

struct JavaThreadPrefetchedIterator {
    ThreadList* _list;
    JavaThread** _end;
    JavaThread** _current;

    JavaThreadIteror(ThreadList* list) :
      _list(list), _end(list->threads() + list->length()), _current(list->threads()) {}

    JavaThread* current() {
      return context._current != context._end
        ? prefetch_and_load_ptr(context._current, PrefetchScanIntervalInBytes)
        : NULL) // ^ prefetch would be rewritten to return JavaThread* and not void*
    }

    void next() {
      _current++;
  };

#define DO_JAVA_THREADS(LIST, X)  \
  for (JavaThreadPrefetchedIterator iter(LIST); JavaThread* X = iter.current(); iter.next())

(I did some changes to the code above and haven't compiled this exact version)
 
Erik hasn't chimed in on this comment so I (Dan) am not planning to
try to resolve this comment in this round.

Update: Erik is planning to look at cleaning up the ugly macro...


6) I think it would be nice if the SMR stuff in thread.hpp were encapsulated into an class instead of added directly to Thread and Threads. I sort-of expected the SMR variables to be moved to threadSMR.hpp.

For example:

 class Threads: AllStatic {
   friend class VMStructs;
  private:
+  // Safe Memory Reclamation (SMR) support:
+  static Monitor*              _smr_delete_lock;
+  // The '_cnt', '_max' and '_times" fields are enabled via
+  // -XX:+EnableThreadSMRStatistics:
+  static uint                  _smr_delete_lock_wait_cnt;
+  static uint                  _smr_delete_lock_wait_max;
+  static volatile int          _smr_delete_notify;
+  static volatile jint         _smr_deleted_thread_cnt;
+  static volatile jint         _smr_deleted_thread_time_max;
+  static volatile jint         _smr_deleted_thread_times;
+  static ThreadsList* volatile _smr_java_thread_list;
+  static ThreadsList*          get_smr_java_thread_list() {
+    return (ThreadsList*)OrderAccess::load_ptr_acquire((void* volatile*)&_smr_java_thread_list);
+  }
+  static ThreadsList*          xchg_smr_java_thread_list(ThreadsList* new_list) {
+    return (ThreadsList*)Atomic::xchg_ptr((void*)new_list, (volatile void*)&_smr_java_thread_list);
+  }
+  static long                  _smr_java_thread_list_alloc_cnt;
+  static long                  _smr_java_thread_list_free_cnt;
+  static uint                  _smr_java_thread_list_max;
+  static uint                  _smr_nested_thread_list_max;
+  static volatile jint         _smr_tlh_cnt;
+  static volatile jint         _smr_tlh_time_max;
+  static volatile jint         _smr_tlh_times;
+  static ThreadsList*          _smr_to_delete_list;
+  static uint                  _smr_to_delete_list_cnt;
+  static uint                  _smr_to_delete_list_max;

Could be:

class Threads: AllStatic {
  friend class VMStructs;
 private:
  // Safe Memory Reclamation (SMR) support:
  SMRSupport _smr_support;

And SMRSupport could be moved to threadSMR.hpp. 

I haven't read all the code in detail, so I'm not sure if this is feasible or not, but my gut-feeling says that it would be worth testing. The above is just an example, and the rest of the code could probably be encapsulated and moved as well.
 
We already moved all of the SMR stuff that was "easy" to move based
on your earlier comment. (Of course, doing that move introduced a
number of compilation complications, but that's just Dan complaining :-))

We don't really want to try this migration at this time since we're
still hoping to get this code into 18.3. (Also Dan doesn't see the
value in moving the SMR fields... Threads is already an eclectic static
class so why does SMR have to encapsulate when no other project did so?)


7) Currently, threadSMR.hpp "only" contains the ThreadList. Unless, you move the SMR stuff into threadSMR.hpp, maybe it should be renamed to threadsList.hpp? Maybe it does make sense to have both threadSMR.hpp and threadsList.hpp?
 
We're planning to stick with the threadSMR.hpp and threadSMR.cpp
file names. Currently they contain the more standalone pieces of
thread SMR (more than just ThreadsList) so we're planning to keep
those names.

Thanks for the detailed review!!

Dan, Erik and Robbin





Cheers and thanks!
StefanK

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
Greetings,

Resolving one of the code review comments (from both Stefan K and Coleen)
on jdk10-04-full required quite a few changes so it is being done as a
standalone patch and corresponding webrevs:

Here's the mq comment for the change:

   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
     JavaThreadIteratorWithHandle.

Here is the full webrev:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/

And here is the delta webrev:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/

We welcome comments, suggestions and feedback.

Dan, Erik and Robbin


On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:

> Greetings,
>
> We have a (eXtra Large) fix for the following bug:
>
> 8167108 inconsistent handling of SR_lock can lead to crashes
> https://bugs.openjdk.java.net/browse/JDK-8167108
>
> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
> Hazard Pointers to manage JavaThread lifecycle.
>
> Here's a PDF for the internal wiki that we've been using to describe
> and track the work on this project:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>
>
> Dan has noticed that the indenting is wrong in some of the code quotes
> in the PDF that are not present in the internal wiki. We don't have a
> solution for that problem yet.
>
> Here's the webrev for current JDK10 version of this fix:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>
> This fix has been run through many rounds of JPRT and Mach5 tier[2-5]
> testing, additional stress testing on Dan's Solaris X64 server, and
> additional testing on Erik and Robbin's machines.
>
> We welcome comments, suggestions and feedback.
>
> Daniel Daugherty
> Erik Osterlund
> Robbin Ehn
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
The jdk10-05-full bits have passed JPRT testing (hs-tier1 testing) and
hs-tier[2-5] testing via Mach 5. No unexpected test failures.

Dan



On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:

> Greetings,
>
> Resolving one of the code review comments (from both Stefan K and Coleen)
> on jdk10-04-full required quite a few changes so it is being done as a
> standalone patch and corresponding webrevs:
>
> Here's the mq comment for the change:
>
>   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>     JavaThreadIteratorWithHandle.
>
> Here is the full webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>
> And here is the delta webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>
> We welcome comments, suggestions and feedback.
>
> Dan, Erik and Robbin
>
>
> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> We have a (eXtra Large) fix for the following bug:
>>
>> 8167108 inconsistent handling of SR_lock can lead to crashes
>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>
>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>> Hazard Pointers to manage JavaThread lifecycle.
>>
>> Here's a PDF for the internal wiki that we've been using to describe
>> and track the work on this project:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>
>>
>> Dan has noticed that the indenting is wrong in some of the code quotes
>> in the PDF that are not present in the internal wiki. We don't have a
>> solution for that problem yet.
>>
>> Here's the webrev for current JDK10 version of this fix:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>
>> This fix has been run through many rounds of JPRT and Mach5 tier[2-5]
>> testing, additional stress testing on Dan's Solaris X64 server, and
>> additional testing on Erik and Robbin's machines.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Daniel Daugherty
>> Erik Osterlund
>> Robbin Ehn
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by Daniel D. Daugherty
Greetings,

Testing of the last round of changes revealed a hang in one of the new
TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
added another TLH test for good measure.

Here is the updated full webrev:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/

Here is the updated delta webrev:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/

Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
no unexpected failures.

We welcome comments, suggestions and feedback.

Dan, Erik and Robbin


On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:

> Greetings,
>
> Robbin rebased the project last night/this morning to merge with Thread
> Local Handshakes (TLH) and also picked up additional changesets up thru:
>
>> Changeset: fa736014cf28
>> Author:    cjplummer
>> Date:      2017-11-14 18:08 -0800
>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>
>> 8191049: Add alternate version of pns() that is callable from within
>> hotspot source
>> Summary: added pns2() to debug.cpp
>> Reviewed-by: stuefe, gthornbr
>
> This is the first time we've rebased the project to bits that are this
> fresh (< 12 hours old at rebase time). We've done this because we think
> we're done with this project and are in the final review-change-retest
> cycle(s)... In other words, we're not planning on making any more major
> changes for this project... :-)
>
> *** Time for code reviewers to chime in on this thread! ***
>
> Here is the updated full webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>
> Here's is the delta webrev from the 2017.11.10 rebase:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>
> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
> (and the new baseline also)...
>
> We're expecting this round to be a little noisier than usual because
> we did not rebase on a PIT snapshot so the new baseline has not been
> through Jesper's usual care-and-feeding of integration_blockers, etc.
>
> We welcome comments, suggestions and feedback.
>
> Dan, Erik and Robbin
>
>
> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>> (Yes, we're playing chase-the-repo...)
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>
>> Unlike the previous rebase, there were no changes required to the
>> open code to get the rebased bits to build so there is no delta
>> webrev.
>>
>> These bits have been run through JPRT and I've submitted the usual
>> Mach5 tier[1-5] test run...
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>
>>> Here are the updated webrevs:
>>>
>>> Here's the mq comment for the change:
>>>
>>>   Rebase to 2017.10.25 PIT snapshot.
>>>
>>> Here is the full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>
>>> And here is the delta webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>
>>> I ran the above bits throught Mach5 tier[1-5] testing over the holiday
>>> weekend. Didn't see any issues in a quick look. Going to take a closer
>>> look today.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Resolving one of the code review comments (from both Stefan K and
>>>> Coleen)
>>>> on jdk10-04-full required quite a few changes so it is being done as a
>>>> standalone patch and corresponding webrevs:
>>>>
>>>> Here's the mq comment for the change:
>>>>
>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>>>>     JavaThreadIteratorWithHandle.
>>>>
>>>> Here is the full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>
>>>> And here is the delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>
>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>
>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>
>>>>> Here's a PDF for the internal wiki that we've been using to describe
>>>>> and track the work on this project:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>
>>>>>
>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>> quotes
>>>>> in the PDF that are not present in the internal wiki. We don't have a
>>>>> solution for that problem yet.
>>>>>
>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>
>>>>> This fix has been run through many rounds of JPRT and Mach5 tier[2-5]
>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>> additional testing on Erik and Robbin's machines.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Daniel Daugherty
>>>>> Erik Osterlund
>>>>> Robbin Ehn
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

David Holmes
Hi Dan,

Figured I'd better cast an eye over this again before it gets pushed :)

Only one thing (repeated many times) jumped out at me and apologies if
this has already been debated back and forth. I missed the exchange
where the for loop iteration was rewritten into this unusual form:

for (JavaThreadIteratorWithHandle jtiwh; JavaThread *jt = jtiwh.next(); ) {

On first reading I expect most people would mistake the control
expression for the iteration-expression due to the next(). I didn't even
know the control expression could introduce a local variable! I find
this really hard to read stylistically and far too cute/clever for its
own good. It also violates the style rules regarding implicit boolean
checks.

I know Stefan proposed this as he did not like the alternate (in a few
places):

+  {
+    ThreadsListHandle tlh;
+    JavaThreadIterator jti(tlh.list());
+    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
...
+  }

but it seems to me the iterator code has evolved since then and only a
couple of places need a distinct TLH separate from the actual iterator
object. So I'm more in favour of the more classic for-loop, with the
iterator declared before the loop. Or even convert to while loops, as
this iterator seems more suited to that.

Other than that just a couple of comments on variable type choices, and
a few typos spotted below.

Thanks,
David
---

src/hotspot/share/runtime/globals.hpp

2476   diagnostic(bool, EnableThreadSMRExtraValidityChecks, true,
         \
2477           "Enable Thread SMR extra validity checks")
         \
2478
         \
2479   diagnostic(bool, EnableThreadSMRStatistics, true,
         \
2480           "Enable Thread SMR Statistics")
         \

Indent of strings is off by  3 spaces.

---

src/hotspot/share/runtime/handshake.cpp

  140       // There is assumption in code that Threads_lock should be lock
  200       // There is assumption in code that Threads_lock should be lock

lock -> locked

  146         // handshake_process_by_vmthread will check the semaphore
for us again

Needs period at end.

  148         // should be okey since the new thread will not have an
operation.

okey -> okay

  151         // We can't warn here is since the thread do
cancel_handshake after it have been removed

I think:

// We can't warn here since the thread does cancel_handshake after it
has been removed

  152         // from ThreadsList. So we should just keep looping here
until while below return negative.

from -> from the

Needs period at end.

  204             // A new thread on the ThreadsList will not have an
operation.

Change period to comma, and ...

  205             // Hence is skipped in handshake_process_by_vmthread.

Hence is -> hence it is

---

src/hotspot/share/runtime/thread.cpp

1482     // dcubed - Looks like the Threads_lock is for stable access
1483     // to _jvmci_old_thread_counters and _jvmci_counters.

Does this comment need revisiting?

3451 volatile jint ...

Why are you using jint for field types here? (Surprised Coleen hasn't
spotted them! ;-) ).

3456 long                  Threads::_smr_java_thread_list_alloc_cnt = 1;
3457 long                  Threads::_smr_java_thread_list_free_cnt = 0;

long? If you really want 64-bit counters here you won't get them on
Windows with that declaration. If you really want variable sized
counters I suggest uintptr_t; otherwise uint64_t.

----

On 19/11/2017 11:49 AM, Daniel D. Daugherty wrote:

> Greetings,
>
> Testing of the last round of changes revealed a hang in one of the new
> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
> added another TLH test for good measure.
>
> Here is the updated full webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>
> Here is the updated delta webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>
> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
> no unexpected failures.
>
> We welcome comments, suggestions and feedback.
>
> Dan, Erik and Robbin
>
>
> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Robbin rebased the project last night/this morning to merge with Thread
>> Local Handshakes (TLH) and also picked up additional changesets up thru:
>>
>>> Changeset: fa736014cf28
>>> Author:    cjplummer
>>> Date:      2017-11-14 18:08 -0800
>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>
>>> 8191049: Add alternate version of pns() that is callable from within
>>> hotspot source
>>> Summary: added pns2() to debug.cpp
>>> Reviewed-by: stuefe, gthornbr
>>
>> This is the first time we've rebased the project to bits that are this
>> fresh (< 12 hours old at rebase time). We've done this because we think
>> we're done with this project and are in the final review-change-retest
>> cycle(s)... In other words, we're not planning on making any more major
>> changes for this project... :-)
>>
>> *** Time for code reviewers to chime in on this thread! ***
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>
>> Here's is the delta webrev from the 2017.11.10 rebase:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>
>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>> (and the new baseline also)...
>>
>> We're expecting this round to be a little noisier than usual because
>> we did not rebase on a PIT snapshot so the new baseline has not been
>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>> (Yes, we're playing chase-the-repo...)
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>
>>> Unlike the previous rebase, there were no changes required to the
>>> open code to get the rebased bits to build so there is no delta
>>> webrev.
>>>
>>> These bits have been run through JPRT and I've submitted the usual
>>> Mach5 tier[1-5] test run...
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>
>>>> Here are the updated webrevs:
>>>>
>>>> Here's the mq comment for the change:
>>>>
>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>
>>>> Here is the full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>
>>>> And here is the delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>
>>>> I ran the above bits throught Mach5 tier[1-5] testing over the holiday
>>>> weekend. Didn't see any issues in a quick look. Going to take a closer
>>>> look today.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>> Coleen)
>>>>> on jdk10-04-full required quite a few changes so it is being done as a
>>>>> standalone patch and corresponding webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>>>>>     JavaThreadIteratorWithHandle.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>
>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>
>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>
>>>>>> Here's a PDF for the internal wiki that we've been using to describe
>>>>>> and track the work on this project:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>
>>>>>>
>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>> quotes
>>>>>> in the PDF that are not present in the internal wiki. We don't have a
>>>>>> solution for that problem yet.
>>>>>>
>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>
>>>>>> This fix has been run through many rounds of JPRT and Mach5 tier[2-5]
>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Daniel Daugherty
>>>>>> Erik Osterlund
>>>>>> Robbin Ehn
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Robin Westberg
In reply to this post by Daniel D. Daugherty
Hi Dan,

Overall I must say this looks very nice, can’t wait to use it! (Also a disclaimer: not a reviewer, and have not looked at the gc or jmvti specific changes nor the tests (yet)).

Didn’t spot any real issues, just a few small efficiency related things:

src/hotspot/share/runtime/biasedLocking.cpp

217     for (JavaThread* cur_thread = Threads::first(); cur_thread != NULL; cur_thread = cur_thread->next()) {
218       if (cur_thread == biased_thread) {
219         thread_is_alive = true;

This loop could be replaced with:

ThreadsListHandle tlh;
thread_is_alive = tlh.includes(biased_thread);


src/hotspot/share/runtime/memprofiler.cpp

55-56 grabs the Threads_lock, shouldn’t be needed anymore.


src/hotspot/share/runtime/thread.inline.hpp

198   TerminatedTypes l_terminated = (TerminatedTypes)
199       OrderAccess::load_acquire((volatile jint *) &_terminated);
200   return check_is_terminated(_terminated);

The variable used in the return statement was intended to be l_terminated I guess?


The following are more minor suggestions / comments:

src/hotspot/share/runtime/thread.cpp

3432 // operations over all threads.  It is protected by its own Mutex
3433 // lock, which is also used in other contexts to protect thread

Should this comment perhaps be revised to mention SMR?

4745   hash_table_size--;
4746   hash_table_size |= hash_table_size >> 1;
...

This calculation is repeated around line 4922 as well, perhaps put it in a function?

4828   // If someone set a handshake on us just as we entered exit path, we simple cancel it.

Perhaps something like “.. entered the exit path, we simply cancel it.”


src/hotspot/share/runtime/thread.hpp

1179   bool on_thread_list() { return _on_thread_list; }
1180   void set_on_thread_list() { _on_thread_list = true; }
1181
1182   // thread has called JavaThread::exit() or is terminated
1183   bool is_exiting() const;
1184   // thread is terminated (no longer on the threads list); we compare
1185   // against the two non-terminated values so that a freed JavaThread
1186   // will also be considered terminated.
1187   bool check_is_terminated(TerminatedTypes l_terminated) const {
1188     return l_terminated != _not_terminated && l_terminated != _thread_exiting;
1189   }
1190   bool is_terminated();

Use of const is inconsistent here, it’s added to 1183, should it perhaps be added to 1179 and 1190 as well then?


src/hotspot/share/runtime/vm_operations.hpp

406   DeadlockCycle* result()               { return _deadlocks; };
407   VMOp_Type type() const                { return VMOp_FindDeadlocks; }

Whitespace only change that seems spurious.

Best regards,
Robin

> On 19 Nov 2017, at 02:49, Daniel D. Daugherty <[hidden email]> wrote:
>
> Greetings,
>
> Testing of the last round of changes revealed a hang in one of the new
> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
> added another TLH test for good measure.
>
> Here is the updated full webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>
> Here is the updated delta webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>
> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
> no unexpected failures.
>
> We welcome comments, suggestions and feedback.
>
> Dan, Erik and Robbin
>
>
> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Robbin rebased the project last night/this morning to merge with Thread
>> Local Handshakes (TLH) and also picked up additional changesets up thru:
>>
>>> Changeset: fa736014cf28
>>> Author:    cjplummer
>>> Date:      2017-11-14 18:08 -0800
>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>
>>> 8191049: Add alternate version of pns() that is callable from within hotspot source
>>> Summary: added pns2() to debug.cpp
>>> Reviewed-by: stuefe, gthornbr
>>
>> This is the first time we've rebased the project to bits that are this
>> fresh (< 12 hours old at rebase time). We've done this because we think
>> we're done with this project and are in the final review-change-retest
>> cycle(s)... In other words, we're not planning on making any more major
>> changes for this project... :-)
>>
>> *** Time for code reviewers to chime in on this thread! ***
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>
>> Here's is the delta webrev from the 2017.11.10 rebase:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>
>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>> (and the new baseline also)...
>>
>> We're expecting this round to be a little noisier than usual because
>> we did not rebase on a PIT snapshot so the new baseline has not been
>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>> (Yes, we're playing chase-the-repo...)
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>
>>> Unlike the previous rebase, there were no changes required to the
>>> open code to get the rebased bits to build so there is no delta
>>> webrev.
>>>
>>> These bits have been run through JPRT and I've submitted the usual
>>> Mach5 tier[1-5] test run...
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>
>>>> Here are the updated webrevs:
>>>>
>>>> Here's the mq comment for the change:
>>>>
>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>
>>>> Here is the full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>
>>>> And here is the delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>
>>>> I ran the above bits throught Mach5 tier[1-5] testing over the holiday
>>>> weekend. Didn't see any issues in a quick look. Going to take a closer
>>>> look today.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Resolving one of the code review comments (from both Stefan K and Coleen)
>>>>> on jdk10-04-full required quite a few changes so it is being done as a
>>>>> standalone patch and corresponding webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>>>>>     JavaThreadIteratorWithHandle.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>
>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>
>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>
>>>>>> Here's a PDF for the internal wiki that we've been using to describe
>>>>>> and track the work on this project:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>
>>>>>> Dan has noticed that the indenting is wrong in some of the code quotes
>>>>>> in the PDF that are not present in the internal wiki. We don't have a
>>>>>> solution for that problem yet.
>>>>>>
>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>
>>>>>> This fix has been run through many rounds of JPRT and Mach5 tier[2-5]
>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Daniel Daugherty
>>>>>> Erik Osterlund
>>>>>> Robbin Ehn
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by David Holmes
On 11/20/17 12:51 AM, David Holmes wrote:
> Hi Dan,
>
> Figured I'd better cast an eye over this again before it gets pushed :)

Thanks for reviewing again!!


> Only one thing (repeated many times) jumped out at me and apologies if
> this has already been debated back and forth. I missed the exchange
> where the for loop iteration was rewritten into this unusual form:
>
> for (JavaThreadIteratorWithHandle jtiwh; JavaThread *jt =
> jtiwh.next(); ) {
>
> On first reading I expect most people would mistake the control
> expression for the iteration-expression due to the next(). I didn't
> even know the control expression could introduce a local variable! I
> find this really hard to read stylistically and far too cute/clever
> for its own good. It also violates the style rules regarding implicit
> boolean checks.
>
> I know Stefan proposed this as he did not like the alternate (in a few
> places):
>
> +  {
> +    ThreadsListHandle tlh;
> +    JavaThreadIterator jti(tlh.list());
> +    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
> ...
> +  }
>
> but it seems to me the iterator code has evolved since then and only a
> couple of places need a distinct TLH separate from the actual iterator
> object. So I'm more in favour of the more classic for-loop, with the
> iterator declared before the loop. Or even convert to while loops, as
> this iterator seems more suited to that.

Actually both Coleen and Stefan had a problem with how much additional
code was added to support an iterator model. In some cases we went from
1-line to 3-lines and in some cases we went from 1-line to 5-lines. For
Coleen, she wanted 2 of the new lines compressed into 1 new line by using
an iterator with an embedded ThreadsListHandle. Stefan wanted us to try
and get back to 1-line where possible.

The advantages of the new style are:

- 1-line to 1-line in all but a few cases
- automatic limited scope of the embedded ThreadsListHandle so we were
   able to remove extra braces that were added earlier in most cases

The disadvantages of the new style are:

- it is an unusual for-loop style (in HotSpot)
- it breaks HotSpot's style rule about implied booleans


Stefan K is pretty adamant that the original iterator version where we
go from 1-line to 3-lines or 1-line to 5-lines is not acceptable for GC
code. The compiler guys haven't chimed in on this debate so I'm guessing
they don't have a strong opinion either way. One thing that we don't want
to do is have different styles for the different teams.

So I had to make a judgement call on whether I thought Runtime could live
with Stefan's idea. Originally I wasn't fond of it either and breaking the
implied boolean style rule is a problem for me (I post that comment a lot
in my code reviews). However, I have to say that I'm a lot happier about
the compactness of the code and the fact that limited ThreadsListHandle
scope comes for 'free' in most cases.

We're planning to keep the new iterator style for the following reasons:

- smaller change footprint
- consistent style used for all of the teams
- limited ThreadsListHandle scope comes for free in most cases

We're hoping that you can live with this decision (for now) and maybe
even grow to like it in the future. :-)


> Other than that just a couple of comments on variable type choices,
> and a few typos spotted below.

Replies embedded below.


>
> Thanks,
> David
> ---
>
> src/hotspot/share/runtime/globals.hpp
>
> 2476   diagnostic(bool, EnableThreadSMRExtraValidityChecks, true,
>         \
> 2477           "Enable Thread SMR extra validity checks") \
> 2478         \
> 2479   diagnostic(bool, EnableThreadSMRStatistics, true,         \
> 2480           "Enable Thread SMR Statistics")         \
>
> Indent of strings is off by  3 spaces.

Fixed.


> ---
>
> src/hotspot/share/runtime/handshake.cpp
>
>  140       // There is assumption in code that Threads_lock should be
> lock
>  200       // There is assumption in code that Threads_lock should be
> lock
>
> lock -> locked

Fixed.


> 146         // handshake_process_by_vmthread will check the semaphore
> for us again
>
> Needs period at end.

Fixed.


> 148         // should be okey since the new thread will not have an
> operation.
>
> okey -> okay

Fixed.


> 151         // We can't warn here is since the thread do
> cancel_handshake after it have been removed
>
> I think:
>
> // We can't warn here since the thread does cancel_handshake after it
> has been removed

Fixed.


> 152         // from ThreadsList. So we should just keep looping here
> until while below return negative.
>
> from -> from the
>
> Needs period at end.

Fixed both.


>
>  204             // A new thread on the ThreadsList will not have an
> operation.
>
> Change period to comma, and ...

Fixed.


> 205             // Hence is skipped in handshake_process_by_vmthread.
>
> Hence is -> hence it is

Fixed.


> ---
>
> src/hotspot/share/runtime/thread.cpp
>
> 1482     // dcubed - Looks like the Threads_lock is for stable access
> 1483     // to _jvmci_old_thread_counters and _jvmci_counters.
>
> Does this comment need revisiting?

We've been trying to decide what to do about this comment. We'll be
filing a follow up bug for the Compiler team to take another look at
how _jvmci_old_thread_counters and _jvmci_counters are protected.
Threads_lock is a big hammer and there should be a less expensive
solution. Once that bug gets filed, this comment can go away.


> 3451 volatile jint ...
>
> Why are you using jint for field types here? (Surprised Coleen hasn't
> spotted them! ;-) ).

volatile jint         Threads::_smr_delete_notify = 0;
volatile jint         Threads::_smr_deleted_thread_cnt = 0;
volatile jint         Threads::_smr_deleted_thread_time_max = 0;
volatile jint         Threads::_smr_deleted_thread_times = 0;
:
volatile jint         Threads::_smr_tlh_cnt = 0;
volatile jint         Threads::_smr_tlh_time_max = 0;
volatile jint         Threads::_smr_tlh_times = 0;

_smr_delete_notify is a "flag" that indicates when an _smr_delete_lock
notify() operation is needed. It counts up and down and should be a fairly
small number.

_smr_deleted_thread_cnt counts the number of Threads that have been
deleted over the life of the VM. It's an always increasing number so
it's size depends on how long the VM has been running.

_smr_deleted_thread_time_max is the maximum time in millis it has
taken to delete a thread. This field was originally a jlong, but
IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
just been Atomic::add() that was not happy)

_smr_deleted_thread_times accumulates the time in millis that it
has taken to delete threads. It's an always increasing number so
it's size depends on how long the VM has been running. This field
was originally a jlong, but IIRC the Atomic::add() code was not
happy on ARM... (might have just been Atomic::cmpxchg() that was
not happy)

_smr_tlh_cnt counts the number of ThreadsListHandles that have been
deleted over the life of the VM. It's an always increasing number so
it's size depends on how long the VM has been running.

_smr_tlh_time_max is the maximum time in millis it has taken to
delete a ThreadsListHandle. This field was originally a jlong, but
IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
just been Atomic::add() that was not happy)

_smr_tlh_times  accumulates the time in millis that it has taken to
delete ThreadsListHandles. It's an always increasing number so
it's size depends on how long the VM has been running.  This field
was originally a jlong, but IIRC the Atomic::add() code was not
happy on ARM... (might have just been Atomic::cmpxchg() that was
not happy)


It just dawned on me that I'm not sure whether you think the 'jint'
fields should be larger or smaller or the same size but a different
type... :-)

The fact that I had to write so much to explain what these fields
are for and how they're used indicates I need more comments. See
below...


> 3456 long Threads::_smr_java_thread_list_alloc_cnt = 1;
> 3457 long                  Threads::_smr_java_thread_list_free_cnt = 0;
>
> long? If you really want 64-bit counters here you won't get them on
> Windows with that declaration. If you really want variable sized
> counters I suggest uintptr_t; otherwise uint64_t.

long                  Threads::_smr_java_thread_list_alloc_cnt = 1;
long                  Threads::_smr_java_thread_list_free_cnt = 0;

_smr_java_thread_list_alloc_cnt counts the number of ThreadsLists that
have been allocated over the life of the VM. It's an always increasing
number so it's size depends on how long the VM has been running and how
many Threads have been created and destroyed.

_smr_java_thread_list_free_cnt counts the number of ThreadsLists that
have been freed over the life of the VM. It's an always increasing
number so it's size depends on how long the VM has been running and how
many Threads have been created and destroyed.

I can't remember why we chose 'long' and I agree it's not a good choice
for Win*.


Okay so it dawns on me that we haven't written down a description for
the various new '_cnt', '_max' and '_times" fields so I'm adding these
comments to thread.hpp:

  private:
   // Safe Memory Reclamation (SMR) support:
   static Monitor*              _smr_delete_lock;
   // The '_cnt', '_max' and '_times" fields are enabled via
   // -XX:+EnableThreadSMRStatistics:
                                // # of parallel threads in
_smr_delete_lock->wait().
   static uint                  _smr_delete_lock_wait_cnt;
                                // Max # of parallel threads in
_smr_delete_lock->wait().
   static uint                  _smr_delete_lock_wait_max;
                                // Flag to indicate when an
_smr_delete_lock->notify() is needed.
   static volatile uint         _smr_delete_notify;
                                // # of threads deleted over VM lifetime.
   static volatile uint         _smr_deleted_thread_cnt;
                                // Max time in millis to delete a thread.
   static volatile uint         _smr_deleted_thread_time_max;
                                // Cumulative time in millis to delete
threads.
   static volatile uint         _smr_deleted_thread_times;
   static ThreadsList* volatile _smr_java_thread_list;
   static ThreadsList*          get_smr_java_thread_list();
   static ThreadsList* xchg_smr_java_thread_list(ThreadsList* new_list);
                                // # of ThreadsLists allocated over VM
lifetime.
   static uint64_t              _smr_java_thread_list_alloc_cnt;
                                // # of ThreadsLists freed over VM lifetime.
   static uint64_t              _smr_java_thread_list_free_cnt;
                                // Max size ThreadsList allocated.
   static uint                  _smr_java_thread_list_max;
                                // Max # of nested ThreadsLists for a
thread.
   static uint                  _smr_nested_thread_list_max;
                                // # of ThreadsListHandles deleted over
VM lifetime.
   static volatile uint         _smr_tlh_cnt;
                                // Max time in millis to delete a
ThreadsListHandle.
   static volatile jint         _smr_tlh_time_max;
                                // Cumulative time in millis to delete
ThreadsListHandles.
   static volatile jint         _smr_tlh_times;
   static ThreadsList*          _smr_to_delete_list;
                                // # of parallel ThreadsLists on the
to-delete list.
   static uint                  _smr_to_delete_list_cnt;
                                // Max # of parallel ThreadsLists on the
to-delete list.
   static uint                  _smr_to_delete_list_max;


I've also gone through all the new '_cnt', '_max' and '_times" fields
in thread.cpp and added an impl note to explain the choice of type:

// Safe Memory Reclamation (SMR) support:
Monitor*              Threads::_smr_delete_lock =
                           new Monitor(Monitor::special, "smr_delete_lock",
                                       false /* allow_vm_block */,
Monitor::_safepoint_check_never);
// The '_cnt', '_max' and '_times" fields are enabled via
// -XX:+EnableThreadSMRStatistics:
                       // # of parallel threads in _smr_delete_lock->wait().
                       // Impl note: Hard to imagine > 64K waiting
threads so
                       // this could be 16-bit, but there is no nice 16-bit
                       // _FORMAT support.
uint                  Threads::_smr_delete_lock_wait_cnt = 0;
                       // Max # of parallel threads in
_smr_delete_lock->wait().
                       // Impl note: See _smr_delete_lock_wait_cnt note.
uint                  Threads::_smr_delete_lock_wait_max = 0;
                       // Flag to indicate when an
_smr_delete_lock->notify() is needed.
                       // Impl note: See _smr_delete_lock_wait_cnt note.
volatile uint         Threads::_smr_delete_notify = 0;
                       // # of threads deleted over VM lifetime.
                       // Impl note: Atomically incremented over VM
lifetime so
                       // use unsigned for more range. Unsigned 64-bit would
                       // be more future proof, but 64-bit atomic inc isn't
                       // available everywhere (or is it?).
volatile uint         Threads::_smr_deleted_thread_cnt = 0;
                       // Max time in millis to delete a thread.
                       // Impl note: 16-bit might be too small on an
overloaded
                       // machine. Use unsigned since this is a time
value. Set
                       // via Atomic::cmpxchg() in a loop for correctness.
volatile uint         Threads::_smr_deleted_thread_time_max = 0;
                       // Cumulative time in millis to delete threads.
                       // Impl note: Atomically added to over VM
lifetime so use
                       // unsigned for more range. Unsigned 64-bit would
be more
                       // future proof, but 64-bit atomic inc isn't
available
                       // everywhere (or is it?).
volatile uint         Threads::_smr_deleted_thread_times = 0;
ThreadsList* volatile Threads::_smr_java_thread_list = new ThreadsList(0);
                       // # of ThreadsLists allocated over VM lifetime.
                       // Impl note: We allocate a new ThreadsList for every
                       // thread create and every thread delete so we need a
                       // bigger type than the _smr_deleted_thread_cnt
field.
uint64_t              Threads::_smr_java_thread_list_alloc_cnt = 1;
                       // # of ThreadsLists freed over VM lifetime.
                       // Impl note: See _smr_java_thread_list_alloc_cnt
note.
uint64_t              Threads::_smr_java_thread_list_free_cnt = 0;
                       // Max size ThreadsList allocated.
                       // Impl note: Max # of threads alive at one time
should
                       // fit in unsigned 32-bit.
uint                  Threads::_smr_java_thread_list_max = 0;
                       // Max # of nested ThreadsLists for a thread.
                       // Impl note: Hard to imagine > 64K nested
ThreadsLists
                       // so his could be 16-bit, but there is no nice
16-bit
                       // _FORMAT support.
uint                  Threads::_smr_nested_thread_list_max = 0;
                       // # of ThreadsListHandles deleted over VM lifetime.
                       // Impl note: Atomically incremented over VM
lifetime so
                       // use unsigned for more range. There will be fewer
                       // ThreadsListHandles than threads so unsigned 32-bit
                       // should be fine.
volatile uint         Threads::_smr_tlh_cnt = 0;
                       // Max time in millis to delete a ThreadsListHandle.
                       // Impl note: 16-bit might be too small on an
overloaded
                       // machine. Use unsigned since this is a time
value. Set
                       // via Atomic::cmpxchg() in a loop for correctness.
volatile uint         Threads::_smr_tlh_time_max = 0;
                       // Cumulative time in millis to delete
ThreadsListHandles.
                       // Impl note: Atomically added to over VM
lifetime so use
                       // unsigned for more range. Unsigned 64-bit would
be more
                       // future proof, but 64-bit atomic inc isn't
available
                       // everywhere (or is it?).
volatile uint         Threads::_smr_tlh_times = 0;
ThreadsList*          Threads::_smr_to_delete_list = NULL;
                       // # of parallel ThreadsLists on the to-delete list.
                       // Impl note: Hard to imagine > 64K ThreadsLists
needing
                       // to be deleted so this could be 16-bit, but
there is no
                       // nice 16-bit _FORMAT support.
uint                  Threads::_smr_to_delete_list_cnt = 0;
                       // Max # of parallel ThreadsLists on the
to-delete list.
                       // Impl note: See _smr_to_delete_list_cnt note.
uint                  Threads::_smr_to_delete_list_max = 0;


Yikes! With the additional comments, the additions to thread.hpp and
thread.cpp grew by a bunch... I've done a test build build on my Mac
with these changes and I'm about to kick off a Mach5 tier1 job...

Dan



>
> ----
>
> On 19/11/2017 11:49 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Testing of the last round of changes revealed a hang in one of the new
>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>> added another TLH test for good measure.
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>
>> Here is the updated delta webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>
>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>> no unexpected failures.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Robbin rebased the project last night/this morning to merge with Thread
>>> Local Handshakes (TLH) and also picked up additional changesets up
>>> thru:
>>>
>>>> Changeset: fa736014cf28
>>>> Author:    cjplummer
>>>> Date:      2017-11-14 18:08 -0800
>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>
>>>> 8191049: Add alternate version of pns() that is callable from
>>>> within hotspot source
>>>> Summary: added pns2() to debug.cpp
>>>> Reviewed-by: stuefe, gthornbr
>>>
>>> This is the first time we've rebased the project to bits that are this
>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>> we're done with this project and are in the final review-change-retest
>>> cycle(s)... In other words, we're not planning on making any more major
>>> changes for this project... :-)
>>>
>>> *** Time for code reviewers to chime in on this thread! ***
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>
>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>
>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>> (and the new baseline also)...
>>>
>>> We're expecting this round to be a little noisier than usual because
>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>> (Yes, we're playing chase-the-repo...)
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>
>>>> Unlike the previous rebase, there were no changes required to the
>>>> open code to get the rebased bits to build so there is no delta
>>>> webrev.
>>>>
>>>> These bits have been run through JPRT and I've submitted the usual
>>>> Mach5 tier[1-5] test run...
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>
>>>>> Here are the updated webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>
>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>> holiday
>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>> closer
>>>>> look today.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>>> Coleen)
>>>>>> on jdk10-04-full required quite a few changes so it is being done
>>>>>> as a
>>>>>> standalone patch and corresponding webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to
>>>>>> use
>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>
>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>
>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>
>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>> describe
>>>>>>> and track the work on this project:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>
>>>>>>>
>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>> quotes
>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>> have a
>>>>>>> solution for that problem yet.
>>>>>>>
>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>
>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>> tier[2-5]
>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Daniel Daugherty
>>>>>>> Erik Osterlund
>>>>>>> Robbin Ehn
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

David Holmes
Hi Dan,

Just to be clear my comment about use of jint's was not about expected
size but the fact you were using a j-type instead of a C++ type when the
field is not a Java field. (Coleen has been on a crusade to remove
j-types from where they don't belong - and they were removed from the
Atomic API as part of that recent templatization effort.)

No further comments. :)

Thanks,
David

On 21/11/2017 11:50 AM, Daniel D. Daugherty wrote:

> On 11/20/17 12:51 AM, David Holmes wrote:
>> Hi Dan,
>>
>> Figured I'd better cast an eye over this again before it gets pushed :)
>
> Thanks for reviewing again!!
>
>
>> Only one thing (repeated many times) jumped out at me and apologies if
>> this has already been debated back and forth. I missed the exchange
>> where the for loop iteration was rewritten into this unusual form:
>>
>> for (JavaThreadIteratorWithHandle jtiwh; JavaThread *jt =
>> jtiwh.next(); ) {
>>
>> On first reading I expect most people would mistake the control
>> expression for the iteration-expression due to the next(). I didn't
>> even know the control expression could introduce a local variable! I
>> find this really hard to read stylistically and far too cute/clever
>> for its own good. It also violates the style rules regarding implicit
>> boolean checks.
>>
>> I know Stefan proposed this as he did not like the alternate (in a few
>> places):
>>
>> +  {
>> +    ThreadsListHandle tlh;
>> +    JavaThreadIterator jti(tlh.list());
>> +    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
>> ...
>> +  }
>>
>> but it seems to me the iterator code has evolved since then and only a
>> couple of places need a distinct TLH separate from the actual iterator
>> object. So I'm more in favour of the more classic for-loop, with the
>> iterator declared before the loop. Or even convert to while loops, as
>> this iterator seems more suited to that.
>
> Actually both Coleen and Stefan had a problem with how much additional
> code was added to support an iterator model. In some cases we went from
> 1-line to 3-lines and in some cases we went from 1-line to 5-lines. For
> Coleen, she wanted 2 of the new lines compressed into 1 new line by using
> an iterator with an embedded ThreadsListHandle. Stefan wanted us to try
> and get back to 1-line where possible.
>
> The advantages of the new style are:
>
> - 1-line to 1-line in all but a few cases
> - automatic limited scope of the embedded ThreadsListHandle so we were
>    able to remove extra braces that were added earlier in most cases
>
> The disadvantages of the new style are:
>
> - it is an unusual for-loop style (in HotSpot)
> - it breaks HotSpot's style rule about implied booleans
>
>
> Stefan K is pretty adamant that the original iterator version where we
> go from 1-line to 3-lines or 1-line to 5-lines is not acceptable for GC
> code. The compiler guys haven't chimed in on this debate so I'm guessing
> they don't have a strong opinion either way. One thing that we don't want
> to do is have different styles for the different teams.
>
> So I had to make a judgement call on whether I thought Runtime could live
> with Stefan's idea. Originally I wasn't fond of it either and breaking the
> implied boolean style rule is a problem for me (I post that comment a lot
> in my code reviews). However, I have to say that I'm a lot happier about
> the compactness of the code and the fact that limited ThreadsListHandle
> scope comes for 'free' in most cases.
>
> We're planning to keep the new iterator style for the following reasons:
>
> - smaller change footprint
> - consistent style used for all of the teams
> - limited ThreadsListHandle scope comes for free in most cases
>
> We're hoping that you can live with this decision (for now) and maybe
> even grow to like it in the future. :-)
>
>
>> Other than that just a couple of comments on variable type choices,
>> and a few typos spotted below.
>
> Replies embedded below.
>
>
>>
>> Thanks,
>> David
>> ---
>>
>> src/hotspot/share/runtime/globals.hpp
>>
>> 2476   diagnostic(bool, EnableThreadSMRExtraValidityChecks, true,
>>         \
>> 2477           "Enable Thread SMR extra validity checks") \
>> 2478         \
>> 2479   diagnostic(bool, EnableThreadSMRStatistics, true,         \
>> 2480           "Enable Thread SMR Statistics")         \
>>
>> Indent of strings is off by  3 spaces.
>
> Fixed.
>
>
>> ---
>>
>> src/hotspot/share/runtime/handshake.cpp
>>
>>  140       // There is assumption in code that Threads_lock should be
>> lock
>>  200       // There is assumption in code that Threads_lock should be
>> lock
>>
>> lock -> locked
>
> Fixed.
>
>
>> 146         // handshake_process_by_vmthread will check the semaphore
>> for us again
>>
>> Needs period at end.
>
> Fixed.
>
>
>> 148         // should be okey since the new thread will not have an
>> operation.
>>
>> okey -> okay
>
> Fixed.
>
>
>> 151         // We can't warn here is since the thread do
>> cancel_handshake after it have been removed
>>
>> I think:
>>
>> // We can't warn here since the thread does cancel_handshake after it
>> has been removed
>
> Fixed.
>
>
>> 152         // from ThreadsList. So we should just keep looping here
>> until while below return negative.
>>
>> from -> from the
>>
>> Needs period at end.
>
> Fixed both.
>
>
>>
>>  204             // A new thread on the ThreadsList will not have an
>> operation.
>>
>> Change period to comma, and ...
>
> Fixed.
>
>
>> 205             // Hence is skipped in handshake_process_by_vmthread.
>>
>> Hence is -> hence it is
>
> Fixed.
>
>
>> ---
>>
>> src/hotspot/share/runtime/thread.cpp
>>
>> 1482     // dcubed - Looks like the Threads_lock is for stable access
>> 1483     // to _jvmci_old_thread_counters and _jvmci_counters.
>>
>> Does this comment need revisiting?
>
> We've been trying to decide what to do about this comment. We'll be
> filing a follow up bug for the Compiler team to take another look at
> how _jvmci_old_thread_counters and _jvmci_counters are protected.
> Threads_lock is a big hammer and there should be a less expensive
> solution. Once that bug gets filed, this comment can go away.
>
>
>> 3451 volatile jint ...
>>
>> Why are you using jint for field types here? (Surprised Coleen hasn't
>> spotted them! ;-) ).
>
> volatile jint         Threads::_smr_delete_notify = 0;
> volatile jint         Threads::_smr_deleted_thread_cnt = 0;
> volatile jint         Threads::_smr_deleted_thread_time_max = 0;
> volatile jint         Threads::_smr_deleted_thread_times = 0;
> :
> volatile jint         Threads::_smr_tlh_cnt = 0;
> volatile jint         Threads::_smr_tlh_time_max = 0;
> volatile jint         Threads::_smr_tlh_times = 0;
>
> _smr_delete_notify is a "flag" that indicates when an _smr_delete_lock
> notify() operation is needed. It counts up and down and should be a fairly
> small number.
>
> _smr_deleted_thread_cnt counts the number of Threads that have been
> deleted over the life of the VM. It's an always increasing number so
> it's size depends on how long the VM has been running.
>
> _smr_deleted_thread_time_max is the maximum time in millis it has
> taken to delete a thread. This field was originally a jlong, but
> IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
> just been Atomic::add() that was not happy)
>
> _smr_deleted_thread_times accumulates the time in millis that it
> has taken to delete threads. It's an always increasing number so
> it's size depends on how long the VM has been running. This field
> was originally a jlong, but IIRC the Atomic::add() code was not
> happy on ARM... (might have just been Atomic::cmpxchg() that was
> not happy)
>
> _smr_tlh_cnt counts the number of ThreadsListHandles that have been
> deleted over the life of the VM. It's an always increasing number so
> it's size depends on how long the VM has been running.
>
> _smr_tlh_time_max is the maximum time in millis it has taken to
> delete a ThreadsListHandle. This field was originally a jlong, but
> IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
> just been Atomic::add() that was not happy)
>
> _smr_tlh_times  accumulates the time in millis that it has taken to
> delete ThreadsListHandles. It's an always increasing number so
> it's size depends on how long the VM has been running.  This field
> was originally a jlong, but IIRC the Atomic::add() code was not
> happy on ARM... (might have just been Atomic::cmpxchg() that was
> not happy)
>
>
> It just dawned on me that I'm not sure whether you think the 'jint'
> fields should be larger or smaller or the same size but a different
> type... :-)
>
> The fact that I had to write so much to explain what these fields
> are for and how they're used indicates I need more comments. See
> below...
>
>
>> 3456 long Threads::_smr_java_thread_list_alloc_cnt = 1;
>> 3457 long                  Threads::_smr_java_thread_list_free_cnt = 0;
>>
>> long? If you really want 64-bit counters here you won't get them on
>> Windows with that declaration. If you really want variable sized
>> counters I suggest uintptr_t; otherwise uint64_t.
>
> long                  Threads::_smr_java_thread_list_alloc_cnt = 1;
> long                  Threads::_smr_java_thread_list_free_cnt = 0;
>
> _smr_java_thread_list_alloc_cnt counts the number of ThreadsLists that
> have been allocated over the life of the VM. It's an always increasing
> number so it's size depends on how long the VM has been running and how
> many Threads have been created and destroyed.
>
> _smr_java_thread_list_free_cnt counts the number of ThreadsLists that
> have been freed over the life of the VM. It's an always increasing
> number so it's size depends on how long the VM has been running and how
> many Threads have been created and destroyed.
>
> I can't remember why we chose 'long' and I agree it's not a good choice
> for Win*.
>
>
> Okay so it dawns on me that we haven't written down a description for
> the various new '_cnt', '_max' and '_times" fields so I'm adding these
> comments to thread.hpp:
>
>   private:
>    // Safe Memory Reclamation (SMR) support:
>    static Monitor*              _smr_delete_lock;
>    // The '_cnt', '_max' and '_times" fields are enabled via
>    // -XX:+EnableThreadSMRStatistics:
>                                 // # of parallel threads in
> _smr_delete_lock->wait().
>    static uint                  _smr_delete_lock_wait_cnt;
>                                 // Max # of parallel threads in
> _smr_delete_lock->wait().
>    static uint                  _smr_delete_lock_wait_max;
>                                 // Flag to indicate when an
> _smr_delete_lock->notify() is needed.
>    static volatile uint         _smr_delete_notify;
>                                 // # of threads deleted over VM lifetime.
>    static volatile uint         _smr_deleted_thread_cnt;
>                                 // Max time in millis to delete a thread.
>    static volatile uint         _smr_deleted_thread_time_max;
>                                 // Cumulative time in millis to delete
> threads.
>    static volatile uint         _smr_deleted_thread_times;
>    static ThreadsList* volatile _smr_java_thread_list;
>    static ThreadsList*          get_smr_java_thread_list();
>    static ThreadsList* xchg_smr_java_thread_list(ThreadsList* new_list);
>                                 // # of ThreadsLists allocated over VM
> lifetime.
>    static uint64_t              _smr_java_thread_list_alloc_cnt;
>                                 // # of ThreadsLists freed over VM
> lifetime.
>    static uint64_t              _smr_java_thread_list_free_cnt;
>                                 // Max size ThreadsList allocated.
>    static uint                  _smr_java_thread_list_max;
>                                 // Max # of nested ThreadsLists for a
> thread.
>    static uint                  _smr_nested_thread_list_max;
>                                 // # of ThreadsListHandles deleted over
> VM lifetime.
>    static volatile uint         _smr_tlh_cnt;
>                                 // Max time in millis to delete a
> ThreadsListHandle.
>    static volatile jint         _smr_tlh_time_max;
>                                 // Cumulative time in millis to delete
> ThreadsListHandles.
>    static volatile jint         _smr_tlh_times;
>    static ThreadsList*          _smr_to_delete_list;
>                                 // # of parallel ThreadsLists on the
> to-delete list.
>    static uint                  _smr_to_delete_list_cnt;
>                                 // Max # of parallel ThreadsLists on the
> to-delete list.
>    static uint                  _smr_to_delete_list_max;
>
>
> I've also gone through all the new '_cnt', '_max' and '_times" fields
> in thread.cpp and added an impl note to explain the choice of type:
>
> // Safe Memory Reclamation (SMR) support:
> Monitor*              Threads::_smr_delete_lock =
>                            new Monitor(Monitor::special, "smr_delete_lock",
>                                        false /* allow_vm_block */,
> Monitor::_safepoint_check_never);
> // The '_cnt', '_max' and '_times" fields are enabled via
> // -XX:+EnableThreadSMRStatistics:
>                        // # of parallel threads in
> _smr_delete_lock->wait().
>                        // Impl note: Hard to imagine > 64K waiting
> threads so
>                        // this could be 16-bit, but there is no nice 16-bit
>                        // _FORMAT support.
> uint                  Threads::_smr_delete_lock_wait_cnt = 0;
>                        // Max # of parallel threads in
> _smr_delete_lock->wait().
>                        // Impl note: See _smr_delete_lock_wait_cnt note.
> uint                  Threads::_smr_delete_lock_wait_max = 0;
>                        // Flag to indicate when an
> _smr_delete_lock->notify() is needed.
>                        // Impl note: See _smr_delete_lock_wait_cnt note.
> volatile uint         Threads::_smr_delete_notify = 0;
>                        // # of threads deleted over VM lifetime.
>                        // Impl note: Atomically incremented over VM
> lifetime so
>                        // use unsigned for more range. Unsigned 64-bit
> would
>                        // be more future proof, but 64-bit atomic inc isn't
>                        // available everywhere (or is it?).
> volatile uint         Threads::_smr_deleted_thread_cnt = 0;
>                        // Max time in millis to delete a thread.
>                        // Impl note: 16-bit might be too small on an
> overloaded
>                        // machine. Use unsigned since this is a time
> value. Set
>                        // via Atomic::cmpxchg() in a loop for correctness.
> volatile uint         Threads::_smr_deleted_thread_time_max = 0;
>                        // Cumulative time in millis to delete threads.
>                        // Impl note: Atomically added to over VM
> lifetime so use
>                        // unsigned for more range. Unsigned 64-bit would
> be more
>                        // future proof, but 64-bit atomic inc isn't
> available
>                        // everywhere (or is it?).
> volatile uint         Threads::_smr_deleted_thread_times = 0;
> ThreadsList* volatile Threads::_smr_java_thread_list = new ThreadsList(0);
>                        // # of ThreadsLists allocated over VM lifetime.
>                        // Impl note: We allocate a new ThreadsList for
> every
>                        // thread create and every thread delete so we
> need a
>                        // bigger type than the _smr_deleted_thread_cnt
> field.
> uint64_t              Threads::_smr_java_thread_list_alloc_cnt = 1;
>                        // # of ThreadsLists freed over VM lifetime.
>                        // Impl note: See _smr_java_thread_list_alloc_cnt
> note.
> uint64_t              Threads::_smr_java_thread_list_free_cnt = 0;
>                        // Max size ThreadsList allocated.
>                        // Impl note: Max # of threads alive at one time
> should
>                        // fit in unsigned 32-bit.
> uint                  Threads::_smr_java_thread_list_max = 0;
>                        // Max # of nested ThreadsLists for a thread.
>                        // Impl note: Hard to imagine > 64K nested
> ThreadsLists
>                        // so his could be 16-bit, but there is no nice
> 16-bit
>                        // _FORMAT support.
> uint                  Threads::_smr_nested_thread_list_max = 0;
>                        // # of ThreadsListHandles deleted over VM lifetime.
>                        // Impl note: Atomically incremented over VM
> lifetime so
>                        // use unsigned for more range. There will be fewer
>                        // ThreadsListHandles than threads so unsigned
> 32-bit
>                        // should be fine.
> volatile uint         Threads::_smr_tlh_cnt = 0;
>                        // Max time in millis to delete a ThreadsListHandle.
>                        // Impl note: 16-bit might be too small on an
> overloaded
>                        // machine. Use unsigned since this is a time
> value. Set
>                        // via Atomic::cmpxchg() in a loop for correctness.
> volatile uint         Threads::_smr_tlh_time_max = 0;
>                        // Cumulative time in millis to delete
> ThreadsListHandles.
>                        // Impl note: Atomically added to over VM
> lifetime so use
>                        // unsigned for more range. Unsigned 64-bit would
> be more
>                        // future proof, but 64-bit atomic inc isn't
> available
>                        // everywhere (or is it?).
> volatile uint         Threads::_smr_tlh_times = 0;
> ThreadsList*          Threads::_smr_to_delete_list = NULL;
>                        // # of parallel ThreadsLists on the to-delete list.
>                        // Impl note: Hard to imagine > 64K ThreadsLists
> needing
>                        // to be deleted so this could be 16-bit, but
> there is no
>                        // nice 16-bit _FORMAT support.
> uint                  Threads::_smr_to_delete_list_cnt = 0;
>                        // Max # of parallel ThreadsLists on the
> to-delete list.
>                        // Impl note: See _smr_to_delete_list_cnt note.
> uint                  Threads::_smr_to_delete_list_max = 0;
>
>
> Yikes! With the additional comments, the additions to thread.hpp and
> thread.cpp grew by a bunch... I've done a test build build on my Mac
> with these changes and I'm about to kick off a Mach5 tier1 job...
>
> Dan
>
>
>
>>
>> ----
>>
>> On 19/11/2017 11:49 AM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Testing of the last round of changes revealed a hang in one of the new
>>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>>> added another TLH test for good measure.
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>
>>> Here is the updated delta webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>
>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>> no unexpected failures.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Robbin rebased the project last night/this morning to merge with Thread
>>>> Local Handshakes (TLH) and also picked up additional changesets up
>>>> thru:
>>>>
>>>>> Changeset: fa736014cf28
>>>>> Author:    cjplummer
>>>>> Date:      2017-11-14 18:08 -0800
>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>
>>>>> 8191049: Add alternate version of pns() that is callable from
>>>>> within hotspot source
>>>>> Summary: added pns2() to debug.cpp
>>>>> Reviewed-by: stuefe, gthornbr
>>>>
>>>> This is the first time we've rebased the project to bits that are this
>>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>>> we're done with this project and are in the final review-change-retest
>>>> cycle(s)... In other words, we're not planning on making any more major
>>>> changes for this project... :-)
>>>>
>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>
>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>
>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>> (and the new baseline also)...
>>>>
>>>> We're expecting this round to be a little noisier than usual because
>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>> (Yes, we're playing chase-the-repo...)
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>
>>>>> Unlike the previous rebase, there were no changes required to the
>>>>> open code to get the rebased bits to build so there is no delta
>>>>> webrev.
>>>>>
>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>> Mach5 tier[1-5] test run...
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>
>>>>>> Here are the updated webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>
>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>>> holiday
>>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>>> closer
>>>>>> look today.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>>>> Coleen)
>>>>>>> on jdk10-04-full required quite a few changes so it is being done
>>>>>>> as a
>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to
>>>>>>> use
>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>
>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>
>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>
>>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>>> describe
>>>>>>>> and track the work on this project:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>
>>>>>>>>
>>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>>> quotes
>>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>>> have a
>>>>>>>> solution for that problem yet.
>>>>>>>>
>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>
>>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>>> tier[2-5]
>>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Daniel Daugherty
>>>>>>>> Erik Osterlund
>>>>>>>> Robbin Ehn
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
On 11/20/17 9:13 PM, David Holmes wrote:
> Hi Dan,
>
> Just to be clear my comment about use of jint's was not about expected
> size but the fact you were using a j-type instead of a C++ type when
> the field is not a Java field. (Coleen has been on a crusade to remove
> j-types from where they don't belong - and they were removed from the
> Atomic API as part of that recent templatization effort.)

Thanks for making that clear. I think I've gotten rid of all
the jint's at this point...


> No further comments. :)

Thanks. I'll be sending out more webrevs when I get through all
the code review comments...

Dan


>
> Thanks,
> David
>
> On 21/11/2017 11:50 AM, Daniel D. Daugherty wrote:
>> On 11/20/17 12:51 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Figured I'd better cast an eye over this again before it gets pushed :)
>>
>> Thanks for reviewing again!!
>>
>>
>>> Only one thing (repeated many times) jumped out at me and apologies
>>> if this has already been debated back and forth. I missed the
>>> exchange where the for loop iteration was rewritten into this
>>> unusual form:
>>>
>>> for (JavaThreadIteratorWithHandle jtiwh; JavaThread *jt =
>>> jtiwh.next(); ) {
>>>
>>> On first reading I expect most people would mistake the control
>>> expression for the iteration-expression due to the next(). I didn't
>>> even know the control expression could introduce a local variable! I
>>> find this really hard to read stylistically and far too cute/clever
>>> for its own good. It also violates the style rules regarding
>>> implicit boolean checks.
>>>
>>> I know Stefan proposed this as he did not like the alternate (in a
>>> few places):
>>>
>>> +  {
>>> +    ThreadsListHandle tlh;
>>> +    JavaThreadIterator jti(tlh.list());
>>> +    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
>>> ...
>>> +  }
>>>
>>> but it seems to me the iterator code has evolved since then and only
>>> a couple of places need a distinct TLH separate from the actual
>>> iterator object. So I'm more in favour of the more classic for-loop,
>>> with the iterator declared before the loop. Or even convert to while
>>> loops, as this iterator seems more suited to that.
>>
>> Actually both Coleen and Stefan had a problem with how much additional
>> code was added to support an iterator model. In some cases we went from
>> 1-line to 3-lines and in some cases we went from 1-line to 5-lines. For
>> Coleen, she wanted 2 of the new lines compressed into 1 new line by
>> using
>> an iterator with an embedded ThreadsListHandle. Stefan wanted us to try
>> and get back to 1-line where possible.
>>
>> The advantages of the new style are:
>>
>> - 1-line to 1-line in all but a few cases
>> - automatic limited scope of the embedded ThreadsListHandle so we were
>>    able to remove extra braces that were added earlier in most cases
>>
>> The disadvantages of the new style are:
>>
>> - it is an unusual for-loop style (in HotSpot)
>> - it breaks HotSpot's style rule about implied booleans
>>
>>
>> Stefan K is pretty adamant that the original iterator version where we
>> go from 1-line to 3-lines or 1-line to 5-lines is not acceptable for GC
>> code. The compiler guys haven't chimed in on this debate so I'm guessing
>> they don't have a strong opinion either way. One thing that we don't
>> want
>> to do is have different styles for the different teams.
>>
>> So I had to make a judgement call on whether I thought Runtime could
>> live
>> with Stefan's idea. Originally I wasn't fond of it either and
>> breaking the
>> implied boolean style rule is a problem for me (I post that comment a
>> lot
>> in my code reviews). However, I have to say that I'm a lot happier about
>> the compactness of the code and the fact that limited ThreadsListHandle
>> scope comes for 'free' in most cases.
>>
>> We're planning to keep the new iterator style for the following reasons:
>>
>> - smaller change footprint
>> - consistent style used for all of the teams
>> - limited ThreadsListHandle scope comes for free in most cases
>>
>> We're hoping that you can live with this decision (for now) and maybe
>> even grow to like it in the future. :-)
>>
>>
>>> Other than that just a couple of comments on variable type choices,
>>> and a few typos spotted below.
>>
>> Replies embedded below.
>>
>>
>>>
>>> Thanks,
>>> David
>>> ---
>>>
>>> src/hotspot/share/runtime/globals.hpp
>>>
>>> 2476   diagnostic(bool, EnableThreadSMRExtraValidityChecks, true,
>>>         \
>>> 2477           "Enable Thread SMR extra validity checks") \
>>> 2478         \
>>> 2479   diagnostic(bool, EnableThreadSMRStatistics, true,         \
>>> 2480           "Enable Thread SMR Statistics")         \
>>>
>>> Indent of strings is off by  3 spaces.
>>
>> Fixed.
>>
>>
>>> ---
>>>
>>> src/hotspot/share/runtime/handshake.cpp
>>>
>>>  140       // There is assumption in code that Threads_lock should
>>> be lock
>>>  200       // There is assumption in code that Threads_lock should
>>> be lock
>>>
>>> lock -> locked
>>
>> Fixed.
>>
>>
>>> 146         // handshake_process_by_vmthread will check the
>>> semaphore for us again
>>>
>>> Needs period at end.
>>
>> Fixed.
>>
>>
>>> 148         // should be okey since the new thread will not have an
>>> operation.
>>>
>>> okey -> okay
>>
>> Fixed.
>>
>>
>>> 151         // We can't warn here is since the thread do
>>> cancel_handshake after it have been removed
>>>
>>> I think:
>>>
>>> // We can't warn here since the thread does cancel_handshake after
>>> it has been removed
>>
>> Fixed.
>>
>>
>>> 152         // from ThreadsList. So we should just keep looping here
>>> until while below return negative.
>>>
>>> from -> from the
>>>
>>> Needs period at end.
>>
>> Fixed both.
>>
>>
>>>
>>>  204             // A new thread on the ThreadsList will not have an
>>> operation.
>>>
>>> Change period to comma, and ...
>>
>> Fixed.
>>
>>
>>> 205             // Hence is skipped in handshake_process_by_vmthread.
>>>
>>> Hence is -> hence it is
>>
>> Fixed.
>>
>>
>>> ---
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>
>>> 1482     // dcubed - Looks like the Threads_lock is for stable access
>>> 1483     // to _jvmci_old_thread_counters and _jvmci_counters.
>>>
>>> Does this comment need revisiting?
>>
>> We've been trying to decide what to do about this comment. We'll be
>> filing a follow up bug for the Compiler team to take another look at
>> how _jvmci_old_thread_counters and _jvmci_counters are protected.
>> Threads_lock is a big hammer and there should be a less expensive
>> solution. Once that bug gets filed, this comment can go away.
>>
>>
>>> 3451 volatile jint ...
>>>
>>> Why are you using jint for field types here? (Surprised Coleen
>>> hasn't spotted them! ;-) ).
>>
>> volatile jint         Threads::_smr_delete_notify = 0;
>> volatile jint         Threads::_smr_deleted_thread_cnt = 0;
>> volatile jint         Threads::_smr_deleted_thread_time_max = 0;
>> volatile jint         Threads::_smr_deleted_thread_times = 0;
>> :
>> volatile jint         Threads::_smr_tlh_cnt = 0;
>> volatile jint         Threads::_smr_tlh_time_max = 0;
>> volatile jint         Threads::_smr_tlh_times = 0;
>>
>> _smr_delete_notify is a "flag" that indicates when an _smr_delete_lock
>> notify() operation is needed. It counts up and down and should be a
>> fairly
>> small number.
>>
>> _smr_deleted_thread_cnt counts the number of Threads that have been
>> deleted over the life of the VM. It's an always increasing number so
>> it's size depends on how long the VM has been running.
>>
>> _smr_deleted_thread_time_max is the maximum time in millis it has
>> taken to delete a thread. This field was originally a jlong, but
>> IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
>> just been Atomic::add() that was not happy)
>>
>> _smr_deleted_thread_times accumulates the time in millis that it
>> has taken to delete threads. It's an always increasing number so
>> it's size depends on how long the VM has been running. This field
>> was originally a jlong, but IIRC the Atomic::add() code was not
>> happy on ARM... (might have just been Atomic::cmpxchg() that was
>> not happy)
>>
>> _smr_tlh_cnt counts the number of ThreadsListHandles that have been
>> deleted over the life of the VM. It's an always increasing number so
>> it's size depends on how long the VM has been running.
>>
>> _smr_tlh_time_max is the maximum time in millis it has taken to
>> delete a ThreadsListHandle. This field was originally a jlong, but
>> IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
>> just been Atomic::add() that was not happy)
>>
>> _smr_tlh_times  accumulates the time in millis that it has taken to
>> delete ThreadsListHandles. It's an always increasing number so
>> it's size depends on how long the VM has been running.  This field
>> was originally a jlong, but IIRC the Atomic::add() code was not
>> happy on ARM... (might have just been Atomic::cmpxchg() that was
>> not happy)
>>
>>
>> It just dawned on me that I'm not sure whether you think the 'jint'
>> fields should be larger or smaller or the same size but a different
>> type... :-)
>>
>> The fact that I had to write so much to explain what these fields
>> are for and how they're used indicates I need more comments. See
>> below...
>>
>>
>>> 3456 long Threads::_smr_java_thread_list_alloc_cnt = 1;
>>> 3457 long Threads::_smr_java_thread_list_free_cnt = 0;
>>>
>>> long? If you really want 64-bit counters here you won't get them on
>>> Windows with that declaration. If you really want variable sized
>>> counters I suggest uintptr_t; otherwise uint64_t.
>>
>> long                  Threads::_smr_java_thread_list_alloc_cnt = 1;
>> long                  Threads::_smr_java_thread_list_free_cnt = 0;
>>
>> _smr_java_thread_list_alloc_cnt counts the number of ThreadsLists that
>> have been allocated over the life of the VM. It's an always increasing
>> number so it's size depends on how long the VM has been running and how
>> many Threads have been created and destroyed.
>>
>> _smr_java_thread_list_free_cnt counts the number of ThreadsLists that
>> have been freed over the life of the VM. It's an always increasing
>> number so it's size depends on how long the VM has been running and how
>> many Threads have been created and destroyed.
>>
>> I can't remember why we chose 'long' and I agree it's not a good choice
>> for Win*.
>>
>>
>> Okay so it dawns on me that we haven't written down a description for
>> the various new '_cnt', '_max' and '_times" fields so I'm adding these
>> comments to thread.hpp:
>>
>>   private:
>>    // Safe Memory Reclamation (SMR) support:
>>    static Monitor*              _smr_delete_lock;
>>    // The '_cnt', '_max' and '_times" fields are enabled via
>>    // -XX:+EnableThreadSMRStatistics:
>>                                 // # of parallel threads in
>> _smr_delete_lock->wait().
>>    static uint                  _smr_delete_lock_wait_cnt;
>>                                 // Max # of parallel threads in
>> _smr_delete_lock->wait().
>>    static uint                  _smr_delete_lock_wait_max;
>>                                 // Flag to indicate when an
>> _smr_delete_lock->notify() is needed.
>>    static volatile uint         _smr_delete_notify;
>>                                 // # of threads deleted over VM
>> lifetime.
>>    static volatile uint         _smr_deleted_thread_cnt;
>>                                 // Max time in millis to delete a
>> thread.
>>    static volatile uint         _smr_deleted_thread_time_max;
>>                                 // Cumulative time in millis to
>> delete threads.
>>    static volatile uint         _smr_deleted_thread_times;
>>    static ThreadsList* volatile _smr_java_thread_list;
>>    static ThreadsList*          get_smr_java_thread_list();
>>    static ThreadsList* xchg_smr_java_thread_list(ThreadsList* new_list);
>>                                 // # of ThreadsLists allocated over
>> VM lifetime.
>>    static uint64_t              _smr_java_thread_list_alloc_cnt;
>>                                 // # of ThreadsLists freed over VM
>> lifetime.
>>    static uint64_t              _smr_java_thread_list_free_cnt;
>>                                 // Max size ThreadsList allocated.
>>    static uint                  _smr_java_thread_list_max;
>>                                 // Max # of nested ThreadsLists for a
>> thread.
>>    static uint                  _smr_nested_thread_list_max;
>>                                 // # of ThreadsListHandles deleted
>> over VM lifetime.
>>    static volatile uint         _smr_tlh_cnt;
>>                                 // Max time in millis to delete a
>> ThreadsListHandle.
>>    static volatile jint         _smr_tlh_time_max;
>>                                 // Cumulative time in millis to
>> delete ThreadsListHandles.
>>    static volatile jint         _smr_tlh_times;
>>    static ThreadsList*          _smr_to_delete_list;
>>                                 // # of parallel ThreadsLists on the
>> to-delete list.
>>    static uint                  _smr_to_delete_list_cnt;
>>                                 // Max # of parallel ThreadsLists on
>> the to-delete list.
>>    static uint                  _smr_to_delete_list_max;
>>
>>
>> I've also gone through all the new '_cnt', '_max' and '_times" fields
>> in thread.cpp and added an impl note to explain the choice of type:
>>
>> // Safe Memory Reclamation (SMR) support:
>> Monitor*              Threads::_smr_delete_lock =
>>                            new Monitor(Monitor::special,
>> "smr_delete_lock",
>>                                        false /* allow_vm_block */,
>> Monitor::_safepoint_check_never);
>> // The '_cnt', '_max' and '_times" fields are enabled via
>> // -XX:+EnableThreadSMRStatistics:
>>                        // # of parallel threads in
>> _smr_delete_lock->wait().
>>                        // Impl note: Hard to imagine > 64K waiting
>> threads so
>>                        // this could be 16-bit, but there is no nice
>> 16-bit
>>                        // _FORMAT support.
>> uint                  Threads::_smr_delete_lock_wait_cnt = 0;
>>                        // Max # of parallel threads in
>> _smr_delete_lock->wait().
>>                        // Impl note: See _smr_delete_lock_wait_cnt note.
>> uint                  Threads::_smr_delete_lock_wait_max = 0;
>>                        // Flag to indicate when an
>> _smr_delete_lock->notify() is needed.
>>                        // Impl note: See _smr_delete_lock_wait_cnt note.
>> volatile uint         Threads::_smr_delete_notify = 0;
>>                        // # of threads deleted over VM lifetime.
>>                        // Impl note: Atomically incremented over VM
>> lifetime so
>>                        // use unsigned for more range. Unsigned
>> 64-bit would
>>                        // be more future proof, but 64-bit atomic inc
>> isn't
>>                        // available everywhere (or is it?).
>> volatile uint         Threads::_smr_deleted_thread_cnt = 0;
>>                        // Max time in millis to delete a thread.
>>                        // Impl note: 16-bit might be too small on an
>> overloaded
>>                        // machine. Use unsigned since this is a time
>> value. Set
>>                        // via Atomic::cmpxchg() in a loop for
>> correctness.
>> volatile uint         Threads::_smr_deleted_thread_time_max = 0;
>>                        // Cumulative time in millis to delete threads.
>>                        // Impl note: Atomically added to over VM
>> lifetime so use
>>                        // unsigned for more range. Unsigned 64-bit
>> would be more
>>                        // future proof, but 64-bit atomic inc isn't
>> available
>>                        // everywhere (or is it?).
>> volatile uint         Threads::_smr_deleted_thread_times = 0;
>> ThreadsList* volatile Threads::_smr_java_thread_list = new
>> ThreadsList(0);
>>                        // # of ThreadsLists allocated over VM lifetime.
>>                        // Impl note: We allocate a new ThreadsList
>> for every
>>                        // thread create and every thread delete so we
>> need a
>>                        // bigger type than the
>> _smr_deleted_thread_cnt field.
>> uint64_t              Threads::_smr_java_thread_list_alloc_cnt = 1;
>>                        // # of ThreadsLists freed over VM lifetime.
>>                        // Impl note: See
>> _smr_java_thread_list_alloc_cnt note.
>> uint64_t              Threads::_smr_java_thread_list_free_cnt = 0;
>>                        // Max size ThreadsList allocated.
>>                        // Impl note: Max # of threads alive at one
>> time should
>>                        // fit in unsigned 32-bit.
>> uint                  Threads::_smr_java_thread_list_max = 0;
>>                        // Max # of nested ThreadsLists for a thread.
>>                        // Impl note: Hard to imagine > 64K nested
>> ThreadsLists
>>                        // so his could be 16-bit, but there is no
>> nice 16-bit
>>                        // _FORMAT support.
>> uint                  Threads::_smr_nested_thread_list_max = 0;
>>                        // # of ThreadsListHandles deleted over VM
>> lifetime.
>>                        // Impl note: Atomically incremented over VM
>> lifetime so
>>                        // use unsigned for more range. There will be
>> fewer
>>                        // ThreadsListHandles than threads so unsigned
>> 32-bit
>>                        // should be fine.
>> volatile uint         Threads::_smr_tlh_cnt = 0;
>>                        // Max time in millis to delete a
>> ThreadsListHandle.
>>                        // Impl note: 16-bit might be too small on an
>> overloaded
>>                        // machine. Use unsigned since this is a time
>> value. Set
>>                        // via Atomic::cmpxchg() in a loop for
>> correctness.
>> volatile uint         Threads::_smr_tlh_time_max = 0;
>>                        // Cumulative time in millis to delete
>> ThreadsListHandles.
>>                        // Impl note: Atomically added to over VM
>> lifetime so use
>>                        // unsigned for more range. Unsigned 64-bit
>> would be more
>>                        // future proof, but 64-bit atomic inc isn't
>> available
>>                        // everywhere (or is it?).
>> volatile uint         Threads::_smr_tlh_times = 0;
>> ThreadsList*          Threads::_smr_to_delete_list = NULL;
>>                        // # of parallel ThreadsLists on the to-delete
>> list.
>>                        // Impl note: Hard to imagine > 64K
>> ThreadsLists needing
>>                        // to be deleted so this could be 16-bit, but
>> there is no
>>                        // nice 16-bit _FORMAT support.
>> uint                  Threads::_smr_to_delete_list_cnt = 0;
>>                        // Max # of parallel ThreadsLists on the
>> to-delete list.
>>                        // Impl note: See _smr_to_delete_list_cnt note.
>> uint                  Threads::_smr_to_delete_list_max = 0;
>>
>>
>> Yikes! With the additional comments, the additions to thread.hpp and
>> thread.cpp grew by a bunch... I've done a test build build on my Mac
>> with these changes and I'm about to kick off a Mach5 tier1 job...
>>
>> Dan
>>
>>
>>
>>>
>>> ----
>>>
>>> On 19/11/2017 11:49 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Testing of the last round of changes revealed a hang in one of the new
>>>> TLH tests. Robbin has fixed the hang, updated the existing TLH
>>>> test, and
>>>> added another TLH test for good measure.
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>>
>>>> Here is the updated delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>>
>>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>>> no unexpected failures.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Robbin rebased the project last night/this morning to merge with
>>>>> Thread
>>>>> Local Handshakes (TLH) and also picked up additional changesets up
>>>>> thru:
>>>>>
>>>>>> Changeset: fa736014cf28
>>>>>> Author:    cjplummer
>>>>>> Date:      2017-11-14 18:08 -0800
>>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>>
>>>>>> 8191049: Add alternate version of pns() that is callable from
>>>>>> within hotspot source
>>>>>> Summary: added pns2() to debug.cpp
>>>>>> Reviewed-by: stuefe, gthornbr
>>>>>
>>>>> This is the first time we've rebased the project to bits that are
>>>>> this
>>>>> fresh (< 12 hours old at rebase time). We've done this because we
>>>>> think
>>>>> we're done with this project and are in the final
>>>>> review-change-retest
>>>>> cycle(s)... In other words, we're not planning on making any more
>>>>> major
>>>>> changes for this project... :-)
>>>>>
>>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>>
>>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>>
>>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>>> (and the new baseline also)...
>>>>>
>>>>> We're expecting this round to be a little noisier than usual because
>>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>>> (Yes, we're playing chase-the-repo...)
>>>>>>
>>>>>> Here is the updated full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>>
>>>>>> Unlike the previous rebase, there were no changes required to the
>>>>>> open code to get the rebased bits to build so there is no delta
>>>>>> webrev.
>>>>>>
>>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>>> Mach5 tier[1-5] test run...
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>>
>>>>>>> Here are the updated webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>>
>>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>>>> holiday
>>>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>>>> closer
>>>>>>> look today.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> Resolving one of the code review comments (from both Stefan K
>>>>>>>> and Coleen)
>>>>>>>> on jdk10-04-full required quite a few changes so it is being
>>>>>>>> done as a
>>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>>
>>>>>>>> Here's the mq comment for the change:
>>>>>>>>
>>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage
>>>>>>>> to use
>>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>>
>>>>>>>> Here is the full webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>>
>>>>>>>> And here is the delta webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Dan, Erik and Robbin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>>
>>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>>
>>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>>
>>>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>>>> describe
>>>>>>>>> and track the work on this project:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dan has noticed that the indenting is wrong in some of the
>>>>>>>>> code quotes
>>>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>>>> have a
>>>>>>>>> solution for that problem yet.
>>>>>>>>>
>>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>>
>>>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>>>> tier[2-5]
>>>>>>>>> testing, additional stress testing on Dan's Solaris X64
>>>>>>>>> server, and
>>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>>
>>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>>
>>>>>>>>> Daniel Daugherty
>>>>>>>>> Erik Osterlund
>>>>>>>>> Robbin Ehn
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by Robin Westberg
Hi Robin W!

Welcome to the Thread-SMR party!!

On 11/20/17 4:48 AM, Robin Westberg wrote:
> Hi Dan,
>
> Overall I must say this looks very nice, can’t wait to use it! (Also a disclaimer: not a reviewer, and have not looked at the gc or jmvti specific changes nor the tests (yet)).

Code reviews are always welcome (OpenJDK role does not matter).


> Didn’t spot any real issues, just a few small efficiency related things:
>
> src/hotspot/share/runtime/biasedLocking.cpp
>
> 217     for (JavaThread* cur_thread = Threads::first(); cur_thread != NULL; cur_thread = cur_thread->next()) {
> 218       if (cur_thread == biased_thread) {
> 219         thread_is_alive = true;
>
> This loop could be replaced with:
>
> ThreadsListHandle tlh;
> thread_is_alive = tlh.includes(biased_thread);

Nice catch! Fixed with your proposed change.

The careful reader will notice that in revoke_bias() we are not
creating a ThreadsListHandle that is in scope for the entire
function and we are doing cached monitor info walks without an
obvious ThreadsListHandle in place.

revoke_bias() is called at a safepoint from most locations. The
one call that is not made at a safepoint is from
BiasedLocking::revoke_and_rebias() and it is made by a JavaThread
that is revoking a bias on itself which is also safe.

We should add an assertion to revoke_bias() that looks something
like this:

   assert(requesting_thread == Thread::current() ||
          SafepointSynchronize::is_at_safepoint(),
          "must be operating on self or at a safepoint.");

but we'll do that in a separate follow up bug since that will
require biased locking focused testing.


> src/hotspot/share/runtime/memprofiler.cpp
>
> 55-56 grabs the Threads_lock, shouldn’t be needed anymore.

Nice catch! Deleting lines 55-56.


> src/hotspot/share/runtime/thread.inline.hpp
>
> 198   TerminatedTypes l_terminated = (TerminatedTypes)
> 199       OrderAccess::load_acquire((volatile jint *) &_terminated);
> 200   return check_is_terminated(_terminated);
>
> The variable used in the return statement was intended to be l_terminated I guess?

Ouch! Looks like JavaThread::is_exiting() is right, but
JavaThread::is_terminated() has been inefficient all this
time. Fixed.


> The following are more minor suggestions / comments:
>
> src/hotspot/share/runtime/thread.cpp
>
> 3432 // operations over all threads.  It is protected by its own Mutex
> 3433 // lock, which is also used in other contexts to protect thread
>
> Should this comment perhaps be revised to mention SMR?

It definitely needs some updating... Here's a stab at it:

// The Threads class links together all active threads, and provides
// operations over all threads. It is protected by the Threads_lock,
// which is also used in other global contexts like safepointing.
// ThreadsListHandles are used to safely perform operations on one
// or more threads without the risk of the thread exiting during the
// operation.
//
// Note: The Threads_lock is currently more widely used than we
// would like. We are actively migrating Threads_lock uses to other
// mechanisms in order to reduce Threads_lock contention.

> 4745   hash_table_size--;
> 4746   hash_table_size |= hash_table_size >> 1;
> ...
>
> This calculation is repeated around line 4922 as well, perhaps put it in a function?

The hash_table_size parameter is currently unused. We were using
a different hash table before that allowed the size to be set.
Unfortunately, that hash table didn't support being freed so we
switched to ResourceHashtable.

We have a project note to come back and update the underlying
hash table to work with dynamic table sizes, but Erik hasn't
had the cycles to do it yet.


> 4828   // If someone set a handshake on us just as we entered exit path, we simple cancel it.
>
> Perhaps something like “.. entered the exit path, we simply cancel it.”

I went with this:

   if (ThreadLocalHandshakes) {
     // The thread is about to be deleted so cancel any handshake.
     thread->cancel_handshake();
   }


> src/hotspot/share/runtime/thread.hpp
>
> 1179   bool on_thread_list() { return _on_thread_list; }
> 1180   void set_on_thread_list() { _on_thread_list = true; }
> 1181
> 1182   // thread has called JavaThread::exit() or is terminated
> 1183   bool is_exiting() const;
> 1184   // thread is terminated (no longer on the threads list); we compare
> 1185   // against the two non-terminated values so that a freed JavaThread
> 1186   // will also be considered terminated.
> 1187   bool check_is_terminated(TerminatedTypes l_terminated) const {
> 1188     return l_terminated != _not_terminated && l_terminated != _thread_exiting;
> 1189   }
> 1190   bool is_terminated();
>
> Use of const is inconsistent here, it’s added to 1183, should it perhaps be added to 1179 and 1190 as well then?

Fixed.


> src/hotspot/share/runtime/vm_operations.hpp
>
> 406   DeadlockCycle* result()               { return _deadlocks; };
> 407   VMOp_Type type() const                { return VMOp_FindDeadlocks; }
>
> Whitespace only change that seems spurious.

Agreed. Restored to the original.

Thanks for the review!

Dan


>
> Best regards,
> Robin
>
>> On 19 Nov 2017, at 02:49, Daniel D. Daugherty <[hidden email]> wrote:
>>
>> Greetings,
>>
>> Testing of the last round of changes revealed a hang in one of the new
>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>> added another TLH test for good measure.
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>
>> Here is the updated delta webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>
>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>> no unexpected failures.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Robbin rebased the project last night/this morning to merge with Thread
>>> Local Handshakes (TLH) and also picked up additional changesets up thru:
>>>
>>>> Changeset: fa736014cf28
>>>> Author:    cjplummer
>>>> Date:      2017-11-14 18:08 -0800
>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>
>>>> 8191049: Add alternate version of pns() that is callable from within hotspot source
>>>> Summary: added pns2() to debug.cpp
>>>> Reviewed-by: stuefe, gthornbr
>>> This is the first time we've rebased the project to bits that are this
>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>> we're done with this project and are in the final review-change-retest
>>> cycle(s)... In other words, we're not planning on making any more major
>>> changes for this project... :-)
>>>
>>> *** Time for code reviewers to chime in on this thread! ***
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>
>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>
>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>> (and the new baseline also)...
>>>
>>> We're expecting this round to be a little noisier than usual because
>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>> (Yes, we're playing chase-the-repo...)
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>
>>>> Unlike the previous rebase, there were no changes required to the
>>>> open code to get the rebased bits to build so there is no delta
>>>> webrev.
>>>>
>>>> These bits have been run through JPRT and I've submitted the usual
>>>> Mach5 tier[1-5] test run...
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>
>>>>> Here are the updated webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>    Rebase to 2017.10.25 PIT snapshot.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>
>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the holiday
>>>>> weekend. Didn't see any issues in a quick look. Going to take a closer
>>>>> look today.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Resolving one of the code review comments (from both Stefan K and Coleen)
>>>>>> on jdk10-04-full required quite a few changes so it is being done as a
>>>>>> standalone patch and corresponding webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>    stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>>>>>>      JavaThreadIteratorWithHandle.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>
>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>
>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>
>>>>>>> Here's a PDF for the internal wiki that we've been using to describe
>>>>>>> and track the work on this project:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf
>>>>>>>
>>>>>>> Dan has noticed that the indenting is wrong in some of the code quotes
>>>>>>> in the PDF that are not present in the internal wiki. We don't have a
>>>>>>> solution for that problem yet.
>>>>>>>
>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>
>>>>>>> This fix has been run through many rounds of JPRT and Mach5 tier[2-5]
>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Daniel Daugherty
>>>>>>> Erik Osterlund
>>>>>>> Robbin Ehn
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by Daniel D. Daugherty
Hi Coleen!

Thanks for making time to review the Thread-SMR stuff again!!

I have added back the other three OpenJDK aliases... This review is
being done on _four_ different OpenJDK aliases.

As always, replies are embedded below...


On 11/20/17 3:12 PM, [hidden email] wrote:

>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/os/linux/os_linux.cpp.frames.html 
>
>
> I read David's comments about the threads list iterator, and I was
> going to say that it can be cleaned up later, as the bulk of the
> change is the SMR part but this looks truly bizarre.   It looks like
> it shouldn't compile because 'jt' isn't in scope.
>
> Why isn't this the syntax:
>
> JavaThreadIteratorWithHandle jtiwh;
> for (JavaThread* jt = jtiwh.first(); jt != NULL; jt = jtiwh.next()) {
> }
>
> Which would do the obvious thing without anyone having to squint at
> the code.

See my reply to David's review for the more detailed answer.

For the above syntax, we would need braces to limit the scope of the
'jtiwh' variable. With Stefan's propsal, you get limited scope on
'jtiwh' for "free".


> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.hpp.html 
>
>
> As a hater of acronmys, can this file use "Safe Memory Reclaimation"

I'm not sure what you mean by this. Do you mean rename the files?

   threadSMR.hpp -> threadSafeMemoryReclaimation.hpp
   threadSMR.cpp -> threadSafeMemoryReclaimation.cpp


> and briefly describe the concept in the beginning of the header file,
> so one knows why it's called threadSMR.hpp?

And then this part of the sentence kind of indicates that you would be
okay with the threadSMR.{c,h}pp names if a comment was added to the
header file.

Please clarify.


> It doesn't need to be long and include why Threads list needs this
> Sometimes we tell new people that the hotspot documentation is in the
> header files.

Yup. When I migrated stuff from thread.hpp and thread.cpp to threadSMR.hpp
and threadSMR.cpp, I should have written a header comment...

I did update a comment in thread.cpp based on Robin W's code review:

> > src/hotspot/share/runtime/thread.cpp
> >
> > 3432 // operations over all threads.  It is protected by its own Mutex
> > 3433 // lock, which is also used in other contexts to protect thread
> >
> > Should this comment perhaps be revised to mention SMR?
>
> It definitely needs some updating... Here's a stab at it:
>
> // The Threads class links together all active threads, and provides
> // operations over all threads. It is protected by the Threads_lock,
> // which is also used in other global contexts like safepointing.
> // ThreadsListHandles are used to safely perform operations on one
> // or more threads without the risk of the thread exiting during the
> // operation.
> //
> // Note: The Threads_lock is currently more widely used than we
> // would like. We are actively migrating Threads_lock uses to other
> // mechanisms in order to reduce Threads_lock contention.

I'll take a look at adding a header comment to threadSMR.hpp.


> 186   JavaThreadIteratorWithHandle() : _tlh(), _index(0) {}
>
> This _tlh() call should not be necessary.  The compiler should
> generate this for you in the constructor.

Deleted.


> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.cpp.html 
>
>
>   32 ThreadsList::ThreadsList(int entries) : _length(entries),
> _threads(NEW_C_HEAP_ARRAY(JavaThread*, entries + 1, mtGC)),
> _next_list(NULL) {
>
> Seems like it should be mtThread rather than mtGC.

Fixed. Definitely an artifact of Erik's original prototype when he
extracted Thread-SMR from his GC work... Thanks for catching it.


> Should
>
>   62   if (EnableThreadSMRStatistics) {
>
> really be UL, ie: if (log_is_enabled(Info, thread, smr, statistics)) ?

EnableThreadSMRStatistics is used in more places than UL code.
We use it in Thread::print_*() stuff to control output of
Thread-SMR statistics info in thread dumps and hs_err_pid file
generation.

Currently thread dump and hs_err_pid file output is not generated
using UL (and probably can't be?) so we need an option to control
the Thread-SMR statistics stuff in all places.



> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java.html 
>
>
> Can you use for these tests instead (there were a couple):
>
> *@requires (vm.debug == true)*

The test I cloned had this in it:

     if (!Platform.isDebugBuild()) {
         // -XX:ErrorHandlerTest=N option requires debug bits.
         return;
     }

and you would like me to switch to the newer mechanism?

I have updated the following tests:

   test/hotspot/jtreg/runtime/ErrorHandling/ErrorHandler.java
test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java
test/hotspot/jtreg/runtime/ErrorHandling/ThreadsListHandleInErrorHandlingTest.java


> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/thread.cpp.udiff.html 
>
>
> +// thread, has been added the Threads list, the system is not at a
>
> Has "not" been added to the Threads list?  missing "not"?

Nope. If the JavaThread has been added to the Threads list
and it is not protected, then it is dangling. In other words,
a published JavaThread (on the Threads list) has to be protected
by either the system being at a safepoint or the JavaThread has
to be on some threads's ThreadsList.


>
> + return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
>
> Can you add a comment about where this number came from?

I'll have to get that from Erik...


> I can't find the caller of threads_do_smr.

I'm guessing that's used by the GC code that depends on Thread-SMR...


> If these functions xchg_smr_thread_list, get_smr_java_thread_list,
> inc_smr_deleted_thread_count are only used by thread.cpp, I think they
> should go in that file and not in the .inline.hpp file to be included
> and possibly called by other files.  I think they're private to class
> Threads.

I have a vague memory that some of the compilers don't do inlining when
an "inline" function is in a .cpp. I believe we want these functions
to be inlined for performance reasons. Erik should probably chime in
here.


> I don't have any in-depth comments.  This looks really good from my
> day of reading it.

Thanks for taking the time to review it again!

Dan


>
> Thanks,
> Coleen
>
> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>
>> Greetings,
>>
>> Testing of the last round of changes revealed a hang in one of the new
>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>> added another TLH test for good measure.
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>
>> Here is the updated delta webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>
>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>> no unexpected failures.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Robbin rebased the project last night/this morning to merge with Thread
>>> Local Handshakes (TLH) and also picked up additional changesets up
>>> thru:
>>>
>>>> Changeset: fa736014cf28
>>>> Author:    cjplummer
>>>> Date:      2017-11-14 18:08 -0800
>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>
>>>> 8191049: Add alternate version of pns() that is callable from
>>>> within hotspot source
>>>> Summary: added pns2() to debug.cpp
>>>> Reviewed-by: stuefe, gthornbr
>>>
>>> This is the first time we've rebased the project to bits that are this
>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>> we're done with this project and are in the final review-change-retest
>>> cycle(s)... In other words, we're not planning on making any more major
>>> changes for this project... :-)
>>>
>>> *** Time for code reviewers to chime in on this thread! ***
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>
>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>
>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>> (and the new baseline also)...
>>>
>>> We're expecting this round to be a little noisier than usual because
>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>> (Yes, we're playing chase-the-repo...)
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>
>>>> Unlike the previous rebase, there were no changes required to the
>>>> open code to get the rebased bits to build so there is no delta
>>>> webrev.
>>>>
>>>> These bits have been run through JPRT and I've submitted the usual
>>>> Mach5 tier[1-5] test run...
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>
>>>>> Here are the updated webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>
>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>> holiday
>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>> closer
>>>>> look today.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>>> Coleen)
>>>>>> on jdk10-04-full required quite a few changes so it is being done
>>>>>> as a
>>>>>> standalone patch and corresponding webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to
>>>>>> use
>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>
>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>
>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>
>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>> describe
>>>>>>> and track the work on this project:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>
>>>>>>>
>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>> quotes
>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>> have a
>>>>>>> solution for that problem yet.
>>>>>>>
>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>
>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>> tier[2-5]
>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Daniel Daugherty
>>>>>>> Erik Osterlund
>>>>>>> Robbin Ehn
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

coleen.phillimore


On 11/21/17 11:28 AM, Daniel D. Daugherty wrote:

> Hi Coleen!
>
> Thanks for making time to review the Thread-SMR stuff again!!
>
> I have added back the other three OpenJDK aliases... This review is
> being done on _four_ different OpenJDK aliases.
>
> As always, replies are embedded below...
>
>
> On 11/20/17 3:12 PM, [hidden email] wrote:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/os/linux/os_linux.cpp.frames.html 
>>
>>
>> I read David's comments about the threads list iterator, and I was
>> going to say that it can be cleaned up later, as the bulk of the
>> change is the SMR part but this looks truly bizarre.   It looks like
>> it shouldn't compile because 'jt' isn't in scope.
>>
>> Why isn't this the syntax:
>>
>> JavaThreadIteratorWithHandle jtiwh;
>> for (JavaThread* jt = jtiwh.first(); jt != NULL; jt = jtiwh.next()) {
>> }
>>
>> Which would do the obvious thing without anyone having to squint at
>> the code.
>
> See my reply to David's review for the more detailed answer.
>
> For the above syntax, we would need braces to limit the scope of the
> 'jtiwh' variable. With Stefan's propsal, you get limited scope on
> 'jtiwh' for "free".

Yes, thank you, I see that motivation now.

>
>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.hpp.html 
>>
>>
>> As a hater of acronmys, can this file use "Safe Memory Reclaimation"
>
> I'm not sure what you mean by this. Do you mean rename the files?
>
>   threadSMR.hpp -> threadSafeMemoryReclaimation.hpp
>   threadSMR.cpp -> threadSafeMemoryReclaimation.cpp
>

No.

>
>> and briefly describe the concept in the beginning of the header file,
>> so one knows why it's called threadSMR.hpp?
>
> And then this part of the sentence kind of indicates that you would be
> okay with the threadSMR.{c,h}pp names if a comment was added to the
> header file.
>
> Please clarify.

Yes.  If you added a comment to the beginning of the threadSMR.hpp file
that said what SMR stood for and what it was, that would be extremely
helpful for future viewers of this file.

>
>
>> It doesn't need to be long and include why Threads list needs this
>> Sometimes we tell new people that the hotspot documentation is in the
>> header files.
>
> Yup. When I migrated stuff from thread.hpp and thread.cpp to
> threadSMR.hpp
> and threadSMR.cpp, I should have written a header comment...
>
> I did update a comment in thread.cpp based on Robin W's code review:

Yes, I think a comment belongs in the SMR file also, or in preference.

>
>> > src/hotspot/share/runtime/thread.cpp
>> >
>> > 3432 // operations over all threads.  It is protected by its own Mutex
>> > 3433 // lock, which is also used in other contexts to protect thread
>> >
>> > Should this comment perhaps be revised to mention SMR?
>>
>> It definitely needs some updating... Here's a stab at it:
>>
>> // The Threads class links together all active threads, and provides
>> // operations over all threads. It is protected by the Threads_lock,
>> // which is also used in other global contexts like safepointing.
>> // ThreadsListHandles are used to safely perform operations on one
>> // or more threads without the risk of the thread exiting during the
>> // operation.
>> //
>> // Note: The Threads_lock is currently more widely used than we
>> // would like. We are actively migrating Threads_lock uses to other
>> // mechanisms in order to reduce Threads_lock contention.
>
> I'll take a look at adding a header comment to threadSMR.hpp.
>
>
>> 186   JavaThreadIteratorWithHandle() : _tlh(), _index(0) {}
>>
>> This _tlh() call should not be necessary.  The compiler should
>> generate this for you in the constructor.
>
> Deleted.
>
>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.cpp.html 
>>
>>
>>   32 ThreadsList::ThreadsList(int entries) : _length(entries),
>> _threads(NEW_C_HEAP_ARRAY(JavaThread*, entries + 1, mtGC)),
>> _next_list(NULL) {
>>
>> Seems like it should be mtThread rather than mtGC.
>
> Fixed. Definitely an artifact of Erik's original prototype when he
> extracted Thread-SMR from his GC work... Thanks for catching it.
>
>
>> Should
>>
>>   62   if (EnableThreadSMRStatistics) {
>>
>> really be UL, ie: if (log_is_enabled(Info, thread, smr, statistics)) ?
>
> EnableThreadSMRStatistics is used in more places than UL code.
> We use it in Thread::print_*() stuff to control output of
> Thread-SMR statistics info in thread dumps and hs_err_pid file
> generation.

That sort of thing is also controlled by logging in general.
>
> Currently thread dump and hs_err_pid file output is not generated
> using UL (and probably can't be?) so we need an option to control
> the Thread-SMR statistics stuff in all places.
>

You can use log options in preference to this new diagnostic option in
these cases too.   This option looks a lot like logging to me and would
be nice to not have to later convert it.

>
>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java.html 
>>
>>
>> Can you use for these tests instead (there were a couple):
>>
>> *@requires (vm.debug == true)*
>
> The test I cloned had this in it:
>
>     if (!Platform.isDebugBuild()) {
>         // -XX:ErrorHandlerTest=N option requires debug bits.
>         return;
>     }
>
> and you would like me to switch to the newer mechanism?
>
> I have updated the following tests:
>
>   test/hotspot/jtreg/runtime/ErrorHandling/ErrorHandler.java
> test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java
>
> test/hotspot/jtreg/runtime/ErrorHandling/ThreadsListHandleInErrorHandlingTest.java
>
>
>
Yes, the newer mechanism makes jtreg not bother to run the test, which
makes it faster!

>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/thread.cpp.udiff.html 
>>
>>
>> +// thread, has been added the Threads list, the system is not at a
>>
>> Has "not" been added to the Threads list?  missing "not"?
>
> Nope. If the JavaThread has been added to the Threads list
> and it is not protected, then it is dangling. In other words,
> a published JavaThread (on the Threads list) has to be protected
> by either the system being at a safepoint or the JavaThread has
> to be on some threads's ThreadsList.
>
>
>>
>> + return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
>>
>> Can you add a comment about where this number came from?
>
> I'll have to get that from Erik...
>
>
>> I can't find the caller of threads_do_smr.
>
> I'm guessing that's used by the GC code that depends on Thread-SMR...
>

They  should add this API when the add the use of it then.  I don't see
it in any sources.

>
>> If these functions xchg_smr_thread_list, get_smr_java_thread_list,
>> inc_smr_deleted_thread_count are only used by thread.cpp, I think
>> they should go in that file and not in the .inline.hpp file to be
>> included and possibly called by other files.  I think they're private
>> to class Threads.
>
> I have a vague memory that some of the compilers don't do inlining when
> an "inline" function is in a .cpp. I believe we want these functions
> to be inlined for performance reasons. Erik should probably chime in
> here.

I don't see why this should be.  Maybe some (older) compilers require it
to be found before the call though, but that can still be accomplished
in the .cpp file.
>
>
>> I don't have any in-depth comments.  This looks really good from my
>> day of reading it.
>
> Thanks for taking the time to review it again!
>

Thanks,
Coleen

> Dan
>
>
>>
>> Thanks,
>> Coleen
>>
>> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>>
>>> Greetings,
>>>
>>> Testing of the last round of changes revealed a hang in one of the new
>>> TLH tests. Robbin has fixed the hang, updated the existing TLH test,
>>> and
>>> added another TLH test for good measure.
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>
>>> Here is the updated delta webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>
>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>> no unexpected failures.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Robbin rebased the project last night/this morning to merge with
>>>> Thread
>>>> Local Handshakes (TLH) and also picked up additional changesets up
>>>> thru:
>>>>
>>>>> Changeset: fa736014cf28
>>>>> Author:    cjplummer
>>>>> Date:      2017-11-14 18:08 -0800
>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>
>>>>> 8191049: Add alternate version of pns() that is callable from
>>>>> within hotspot source
>>>>> Summary: added pns2() to debug.cpp
>>>>> Reviewed-by: stuefe, gthornbr
>>>>
>>>> This is the first time we've rebased the project to bits that are this
>>>> fresh (< 12 hours old at rebase time). We've done this because we
>>>> think
>>>> we're done with this project and are in the final review-change-retest
>>>> cycle(s)... In other words, we're not planning on making any more
>>>> major
>>>> changes for this project... :-)
>>>>
>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>
>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>
>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>> (and the new baseline also)...
>>>>
>>>> We're expecting this round to be a little noisier than usual because
>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>> (Yes, we're playing chase-the-repo...)
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>
>>>>> Unlike the previous rebase, there were no changes required to the
>>>>> open code to get the rebased bits to build so there is no delta
>>>>> webrev.
>>>>>
>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>> Mach5 tier[1-5] test run...
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>
>>>>>> Here are the updated webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>
>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>>> holiday
>>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>>> closer
>>>>>> look today.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Resolving one of the code review comments (from both Stefan K
>>>>>>> and Coleen)
>>>>>>> on jdk10-04-full required quite a few changes so it is being
>>>>>>> done as a
>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage
>>>>>>> to use
>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>
>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>
>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>
>>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>>> describe
>>>>>>>> and track the work on this project:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>
>>>>>>>>
>>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>>> quotes
>>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>>> have a
>>>>>>>> solution for that problem yet.
>>>>>>>>
>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>
>>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>>> tier[2-5]
>>>>>>>> testing, additional stress testing on Dan's Solaris X64 server,
>>>>>>>> and
>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Daniel Daugherty
>>>>>>>> Erik Osterlund
>>>>>>>> Robbin Ehn
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by Daniel D. Daugherty
Adding back the missing OpenJDK aliases...


On 11/21/17 11:14 AM, [hidden email] wrote:

>
>
> On 11/20/17 8:50 PM, Daniel D. Daugherty wrote:
>> On 11/20/17 12:51 AM, David Holmes wrote:
>>> Hi Dan,
>>>
>>> Figured I'd better cast an eye over this again before it gets pushed :)
>>
>> Thanks for reviewing again!!
>>
>>
>>> Only one thing (repeated many times) jumped out at me and apologies
>>> if this has already been debated back and forth. I missed the
>>> exchange where the for loop iteration was rewritten into this
>>> unusual form:
>>>
>>> for (JavaThreadIteratorWithHandle jtiwh; JavaThread *jt =
>>> jtiwh.next(); ) {
>>>
>>> On first reading I expect most people would mistake the control
>>> expression for the iteration-expression due to the next(). I didn't
>>> even know the control expression could introduce a local variable! I
>>> find this really hard to read stylistically and far too cute/clever
>>> for its own good. It also violates the style rules regarding
>>> implicit boolean checks.
>>>
>>> I know Stefan proposed this as he did not like the alternate (in a
>>> few places):
>>>
>>> +  {
>>> +    ThreadsListHandle tlh;
>>> +    JavaThreadIterator jti(tlh.list());
>>> +    for (JavaThread* t = jti.first(); t != NULL; t = jti.next()) {
>>> ...
>>> +  }
>>>
>>> but it seems to me the iterator code has evolved since then and only
>>> a couple of places need a distinct TLH separate from the actual
>>> iterator object. So I'm more in favour of the more classic for-loop,
>>> with the iterator declared before the loop. Or even convert to while
>>> loops, as this iterator seems more suited to that.
>>
>> Actually both Coleen and Stefan had a problem with how much additional
>> code was added to support an iterator model. In some cases we went from
>> 1-line to 3-lines and in some cases we went from 1-line to 5-lines. For
>> Coleen, she wanted 2 of the new lines compressed into 1 new line by
>> using
>> an iterator with an embedded ThreadsListHandle. Stefan wanted us to try
>> and get back to 1-line where possible.
>>
>> The advantages of the new style are:
>>
>> - 1-line to 1-line in all but a few cases
>> - automatic limited scope of the embedded ThreadsListHandle so we were
>>   able to remove extra braces that were added earlier in most cases
>>
>> The disadvantages of the new style are:
>>
>> - it is an unusual for-loop style (in HotSpot)
>> - it breaks HotSpot's style rule about implied booleans
>>
>>
>> Stefan K is pretty adamant that the original iterator version where we
>> go from 1-line to 3-lines or 1-line to 5-lines is not acceptable for GC
>> code. The compiler guys haven't chimed in on this debate so I'm guessing
>> they don't have a strong opinion either way. One thing that we don't
>> want
>> to do is have different styles for the different teams.
>>
>> So I had to make a judgement call on whether I thought Runtime could
>> live
>> with Stefan's idea. Originally I wasn't fond of it either and
>> breaking the
>> implied boolean style rule is a problem for me (I post that comment a
>> lot
>> in my code reviews). However, I have to say that I'm a lot happier about
>> the compactness of the code and the fact that limited ThreadsListHandle
>> scope comes for 'free' in most cases.
>>
>> We're planning to keep the new iterator style for the following reasons:
>>
>> - smaller change footprint
>> - consistent style used for all of the teams
>> - limited ThreadsListHandle scope comes for free in most cases
>>
>> We're hoping that you can live with this decision (for now) and maybe
>> even grow to like it in the future. :-)
>
> The limited ThreadsListHandle scope, since ThreadsListHandle registers
> and unregisters lists to SMR, seems to be worth the cost of looking at
> this strange code pattern.   Thank you for the explanation.
>
>>
>>
>>> Other than that just a couple of comments on variable type choices,
>>> and a few typos spotted below.
>>
>> Replies embedded below.
>>
>>
>>>
>>> Thanks,
>>> David
>>> ---
>>>
>>> src/hotspot/share/runtime/globals.hpp
>>>
>>> 2476   diagnostic(bool, EnableThreadSMRExtraValidityChecks, true,
>>>         \
>>> 2477           "Enable Thread SMR extra validity checks") \
>>> 2478         \
>>> 2479   diagnostic(bool, EnableThreadSMRStatistics, true, \
>>> 2480           "Enable Thread SMR Statistics")         \
>>>
>>> Indent of strings is off by  3 spaces.
>>
>> Fixed.
>>
>>
>>> ---
>>>
>>> src/hotspot/share/runtime/handshake.cpp
>>>
>>>  140       // There is assumption in code that Threads_lock should
>>> be lock
>>>  200       // There is assumption in code that Threads_lock should
>>> be lock
>>>
>>> lock -> locked
>>
>> Fixed.
>>
>>
>>> 146         // handshake_process_by_vmthread will check the
>>> semaphore for us again
>>>
>>> Needs period at end.
>>
>> Fixed.
>>
>>
>>> 148         // should be okey since the new thread will not have an
>>> operation.
>>>
>>> okey -> okay
>>
>> Fixed.
>>
>>
>>> 151         // We can't warn here is since the thread do
>>> cancel_handshake after it have been removed
>>>
>>> I think:
>>>
>>> // We can't warn here since the thread does cancel_handshake after
>>> it has been removed
>>
>> Fixed.
>>
>>
>>> 152         // from ThreadsList. So we should just keep looping here
>>> until while below return negative.
>>>
>>> from -> from the
>>>
>>> Needs period at end.
>>
>> Fixed both.
>>
>>
>>>
>>>  204             // A new thread on the ThreadsList will not have an
>>> operation.
>>>
>>> Change period to comma, and ...
>>
>> Fixed.
>>
>>
>>> 205             // Hence is skipped in handshake_process_by_vmthread.
>>>
>>> Hence is -> hence it is
>>
>> Fixed.
>>
>>
>>> ---
>>>
>>> src/hotspot/share/runtime/thread.cpp
>>>
>>> 1482     // dcubed - Looks like the Threads_lock is for stable access
>>> 1483     // to _jvmci_old_thread_counters and _jvmci_counters.
>>>
>>> Does this comment need revisiting?
>>
>> We've been trying to decide what to do about this comment. We'll be
>> filing a follow up bug for the Compiler team to take another look at
>> how _jvmci_old_thread_counters and _jvmci_counters are protected.
>> Threads_lock is a big hammer and there should be a less expensive
>> solution. Once that bug gets filed, this comment can go away.
>>
>>
>>> 3451 volatile jint ...
>>>
>>> Why are you using jint for field types here? (Surprised Coleen
>>> hasn't spotted them! ;-) ).
>
> Yes, it was the jtypes I would have objected to.  And long isn't a
> good choice on windows because it's only 32 bits there.
>>
>> volatile jint         Threads::_smr_delete_notify = 0;
>> volatile jint         Threads::_smr_deleted_thread_cnt = 0;
>> volatile jint         Threads::_smr_deleted_thread_time_max = 0;
>> volatile jint         Threads::_smr_deleted_thread_times = 0;
>> :
>> volatile jint         Threads::_smr_tlh_cnt = 0;
>> volatile jint         Threads::_smr_tlh_time_max = 0;
>> volatile jint         Threads::_smr_tlh_times = 0;
>>
>> _smr_delete_notify is a "flag" that indicates when an _smr_delete_lock
>> notify() operation is needed. It counts up and down and should be a
>> fairly
>> small number.
>>
>> _smr_deleted_thread_cnt counts the number of Threads that have been
>> deleted over the life of the VM. It's an always increasing number so
>> it's size depends on how long the VM has been running.
>>
>> _smr_deleted_thread_time_max is the maximum time in millis it has
>> taken to delete a thread. This field was originally a jlong, but
>> IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
>> just been Atomic::add() that was not happy)
>>
>> _smr_deleted_thread_times accumulates the time in millis that it
>> has taken to delete threads. It's an always increasing number so
>> it's size depends on how long the VM has been running. This field
>> was originally a jlong, but IIRC the Atomic::add() code was not
>> happy on ARM... (might have just been Atomic::cmpxchg() that was
>> not happy)
>>
>> _smr_tlh_cnt counts the number of ThreadsListHandles that have been
>> deleted over the life of the VM. It's an always increasing number so
>> it's size depends on how long the VM has been running.
>>
>> _smr_tlh_time_max is the maximum time in millis it has taken to
>> delete a ThreadsListHandle. This field was originally a jlong, but
>> IIRC the Atomic::cmpxchg() code was not happy on ARM... (might have
>> just been Atomic::add() that was not happy)
>>
>> _smr_tlh_times  accumulates the time in millis that it has taken to
>> delete ThreadsListHandles. It's an always increasing number so
>> it's size depends on how long the VM has been running.  This field
>> was originally a jlong, but IIRC the Atomic::add() code was not
>> happy on ARM... (might have just been Atomic::cmpxchg() that was
>> not happy)
>>
>>
>> It just dawned on me that I'm not sure whether you think the 'jint'
>> fields should be larger or smaller or the same size but a different
>> type... :-)
>>
>> The fact that I had to write so much to explain what these fields
>> are for and how they're used indicates I need more comments. See
>> below...
>>
>>
>>> 3456 long Threads::_smr_java_thread_list_alloc_cnt = 1;
>>> 3457 long Threads::_smr_java_thread_list_free_cnt = 0;
>>>
>>> long? If you really want 64-bit counters here you won't get them on
>>> Windows with that declaration. If you really want variable sized
>>> counters I suggest uintptr_t; otherwise uint64_t.
>>
>> long                  Threads::_smr_java_thread_list_alloc_cnt = 1;
>> long                  Threads::_smr_java_thread_list_free_cnt = 0;
>>
>> _smr_java_thread_list_alloc_cnt counts the number of ThreadsLists that
>> have been allocated over the life of the VM. It's an always increasing
>> number so it's size depends on how long the VM has been running and how
>> many Threads have been created and destroyed.
>>
>> _smr_java_thread_list_free_cnt counts the number of ThreadsLists that
>> have been freed over the life of the VM. It's an always increasing
>> number so it's size depends on how long the VM has been running and how
>> many Threads have been created and destroyed.
>>
>> I can't remember why we chose 'long' and I agree it's not a good choice
>> for Win*.
>>
>>
>> Okay so it dawns on me that we haven't written down a description for
>> the various new '_cnt', '_max' and '_times" fields so I'm adding these
>> comments to thread.hpp:
>>
>
> With this comment format, it's really hard to pick out the name of the
> field.  Nobody uses 80 columns anymore.  Can you move the comments
> over some spaces so the field names are visible?

Yes, I'll update the patch that I have for dholmes CR resolutions.


>
>>  private:
>>   // Safe Memory Reclamation (SMR) support:
>>   static Monitor*              _smr_delete_lock;
>>   // The '_cnt', '_max' and '_times" fields are enabled via
>>   // -XX:+EnableThreadSMRStatistics:
>>                                // # of parallel threads in
>> _smr_delete_lock->wait().
>>   static uint                  _smr_delete_lock_wait_cnt;
>>                                // Max # of parallel threads in
>> _smr_delete_lock->wait().
>>   static uint                  _smr_delete_lock_wait_max;
>>                                // Flag to indicate when an
>> _smr_delete_lock->notify() is needed.
>>   static volatile uint         _smr_delete_notify;
>>                                // # of threads deleted over VM lifetime.
>>   static volatile uint         _smr_deleted_thread_cnt;
>>                                // Max time in millis to delete a thread.
>>   static volatile uint         _smr_deleted_thread_time_max;
>>                                // Cumulative time in millis to delete
>> threads.
>>   static volatile uint         _smr_deleted_thread_times;
>>   static ThreadsList* volatile _smr_java_thread_list;
>>   static ThreadsList*          get_smr_java_thread_list();
>>   static ThreadsList* xchg_smr_java_thread_list(ThreadsList* new_list);
>>                                // # of ThreadsLists allocated over VM
>> lifetime.
>>   static uint64_t              _smr_java_thread_list_alloc_cnt;
>>                                // # of ThreadsLists freed over VM
>> lifetime.
>>   static uint64_t              _smr_java_thread_list_free_cnt;
>>                                // Max size ThreadsList allocated.
>>   static uint                  _smr_java_thread_list_max;
>>                                // Max # of nested ThreadsLists for a
>> thread.
>>   static uint                  _smr_nested_thread_list_max;
>>                                // # of ThreadsListHandles deleted
>> over VM lifetime.
>>   static volatile uint         _smr_tlh_cnt;
>>                                // Max time in millis to delete a
>> ThreadsListHandle.
>>   static volatile jint         _smr_tlh_time_max;
>>                                // Cumulative time in millis to delete
>> ThreadsListHandles.
>>   static volatile jint         _smr_tlh_times;
>>   static ThreadsList*          _smr_to_delete_list;
>>                                // # of parallel ThreadsLists on the
>> to-delete list.
>>   static uint                  _smr_to_delete_list_cnt;
>>                                // Max # of parallel ThreadsLists on
>> the to-delete list.
>>   static uint                  _smr_to_delete_list_max;
>>
>>
>> I've also gone through all the new '_cnt', '_max' and '_times" fields
>> in thread.cpp and added an impl note to explain the choice of type:
>>
>
> Since this is in the cpp file and there is so much comment text, can
> you just make it in column 1 and leave a blank line after each field. 
> The indentation style is really hard to read and only useful if the
> comment is short.

Yes, I'll update the patch that I have for dholmes CR resolutions.


>
>> // Safe Memory Reclamation (SMR) support:
>> Monitor*              Threads::_smr_delete_lock =
>>                           new Monitor(Monitor::special,
>> "smr_delete_lock",
>>                                       false /* allow_vm_block */,
>> Monitor::_safepoint_check_never);
>> // The '_cnt', '_max' and '_times" fields are enabled via
>> // -XX:+EnableThreadSMRStatistics:
>>                       // # of parallel threads in
>> _smr_delete_lock->wait().
>>                       // Impl note: Hard to imagine > 64K waiting
>> threads so
>>                       // this could be 16-bit, but there is no nice
>> 16-bit
>>                       // _FORMAT support.
>> uint                  Threads::_smr_delete_lock_wait_cnt = 0;
>>                       // Max # of parallel threads in
>> _smr_delete_lock->wait().
>>                       // Impl note: See _smr_delete_lock_wait_cnt note.
>> uint                  Threads::_smr_delete_lock_wait_max = 0;
>>                       // Flag to indicate when an
>> _smr_delete_lock->notify() is needed.
>>                       // Impl note: See _smr_delete_lock_wait_cnt note.
>> volatile uint         Threads::_smr_delete_notify = 0;
>>                       // # of threads deleted over VM lifetime.
>>                       // Impl note: Atomically incremented over VM
>> lifetime so
>>                       // use unsigned for more range. Unsigned 64-bit
>> would
>>                       // be more future proof, but 64-bit atomic inc
>> isn't
>>                       // available everywhere (or is it?).
>> volatile uint         Threads::_smr_deleted_thread_cnt = 0;
>>                       // Max time in millis to delete a thread.
>>                       // Impl note: 16-bit might be too small on an
>> overloaded
>>                       // machine. Use unsigned since this is a time
>> value. Set
>>                       // via Atomic::cmpxchg() in a loop for
>> correctness.
>> volatile uint         Threads::_smr_deleted_thread_time_max = 0;
>>                       // Cumulative time in millis to delete threads.
>>                       // Impl note: Atomically added to over VM
>> lifetime so use
>>                       // unsigned for more range. Unsigned 64-bit
>> would be more
>>                       // future proof, but 64-bit atomic inc isn't
>> available
>>                       // everywhere (or is it?).
>> volatile uint         Threads::_smr_deleted_thread_times = 0;
>> ThreadsList* volatile Threads::_smr_java_thread_list = new
>> ThreadsList(0);
>>                       // # of ThreadsLists allocated over VM lifetime.
>>                       // Impl note: We allocate a new ThreadsList for
>> every
>>                       // thread create and every thread delete so we
>> need a
>>                       // bigger type than the _smr_deleted_thread_cnt
>> field.
>> uint64_t              Threads::_smr_java_thread_list_alloc_cnt = 1;
>>                       // # of ThreadsLists freed over VM lifetime.
>>                       // Impl note: See
>> _smr_java_thread_list_alloc_cnt note.
>> uint64_t              Threads::_smr_java_thread_list_free_cnt = 0;
>>                       // Max size ThreadsList allocated.
>>                       // Impl note: Max # of threads alive at one
>> time should
>>                       // fit in unsigned 32-bit.
>> uint                  Threads::_smr_java_thread_list_max = 0;
>>                       // Max # of nested ThreadsLists for a thread.
>>                       // Impl note: Hard to imagine > 64K nested
>> ThreadsLists
>>                       // so his could be 16-bit, but there is no nice
>> 16-bit
>>                       // _FORMAT support.
>> uint                  Threads::_smr_nested_thread_list_max = 0;
>>                       // # of ThreadsListHandles deleted over VM
>> lifetime.
>>                       // Impl note: Atomically incremented over VM
>> lifetime so
>>                       // use unsigned for more range. There will be
>> fewer
>>                       // ThreadsListHandles than threads so unsigned
>> 32-bit
>>                       // should be fine.
>> volatile uint         Threads::_smr_tlh_cnt = 0;
>>                       // Max time in millis to delete a
>> ThreadsListHandle.
>>                       // Impl note: 16-bit might be too small on an
>> overloaded
>>                       // machine. Use unsigned since this is a time
>> value. Set
>>                       // via Atomic::cmpxchg() in a loop for
>> correctness.
>> volatile uint         Threads::_smr_tlh_time_max = 0;
>>                       // Cumulative time in millis to delete
>> ThreadsListHandles.
>>                       // Impl note: Atomically added to over VM
>> lifetime so use
>>                       // unsigned for more range. Unsigned 64-bit
>> would be more
>>                       // future proof, but 64-bit atomic inc isn't
>> available
>>                       // everywhere (or is it?).
>> volatile uint         Threads::_smr_tlh_times = 0;
>> ThreadsList*          Threads::_smr_to_delete_list = NULL;
>>                       // # of parallel ThreadsLists on the to-delete
>> list.
>>                       // Impl note: Hard to imagine > 64K
>> ThreadsLists needing
>>                       // to be deleted so this could be 16-bit, but
>> there is no
>>                       // nice 16-bit _FORMAT support.
>> uint                  Threads::_smr_to_delete_list_cnt = 0;
>>                       // Max # of parallel ThreadsLists on the
>> to-delete list.
>>                       // Impl note: See _smr_to_delete_list_cnt note.
>> uint                  Threads::_smr_to_delete_list_max = 0;
>>
>>
>> Yikes! With the additional comments, the additions to thread.hpp and
>> thread.cpp grew by a bunch... I've done a test build build on my Mac
>> with these changes and I'm about to kick off a Mach5 tier1 job...
>
> Another reason why I agreed with some earlier comment that this should
> be in a new file because it's a new thing. I was somewhat surprised
> that it's not in threadSMR.{hpp,cpp}.   This refactoring could be
> deferred but thread.cpp is too big!

More refactoring is planned for after the initial push of Thread-SMR...

Thanks for chiming in on this part of the review thread.

Dan


>
> thanks,
> Coleen
>
>>
>> Dan
>>
>>
>>
>>>
>>> ----
>>>
>>> On 19/11/2017 11:49 AM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Testing of the last round of changes revealed a hang in one of the new
>>>> TLH tests. Robbin has fixed the hang, updated the existing TLH
>>>> test, and
>>>> added another TLH test for good measure.
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>>
>>>> Here is the updated delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>>
>>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>>> no unexpected failures.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Robbin rebased the project last night/this morning to merge with
>>>>> Thread
>>>>> Local Handshakes (TLH) and also picked up additional changesets up
>>>>> thru:
>>>>>
>>>>>> Changeset: fa736014cf28
>>>>>> Author:    cjplummer
>>>>>> Date:      2017-11-14 18:08 -0800
>>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>>
>>>>>> 8191049: Add alternate version of pns() that is callable from
>>>>>> within hotspot source
>>>>>> Summary: added pns2() to debug.cpp
>>>>>> Reviewed-by: stuefe, gthornbr
>>>>>
>>>>> This is the first time we've rebased the project to bits that are
>>>>> this
>>>>> fresh (< 12 hours old at rebase time). We've done this because we
>>>>> think
>>>>> we're done with this project and are in the final
>>>>> review-change-retest
>>>>> cycle(s)... In other words, we're not planning on making any more
>>>>> major
>>>>> changes for this project... :-)
>>>>>
>>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>>
>>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>>
>>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>>> (and the new baseline also)...
>>>>>
>>>>> We're expecting this round to be a little noisier than usual because
>>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>>> (Yes, we're playing chase-the-repo...)
>>>>>>
>>>>>> Here is the updated full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>>
>>>>>> Unlike the previous rebase, there were no changes required to the
>>>>>> open code to get the rebased bits to build so there is no delta
>>>>>> webrev.
>>>>>>
>>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>>> Mach5 tier[1-5] test run...
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>>
>>>>>>> Here are the updated webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>>
>>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>>>> holiday
>>>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>>>> closer
>>>>>>> look today.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> Resolving one of the code review comments (from both Stefan K
>>>>>>>> and Coleen)
>>>>>>>> on jdk10-04-full required quite a few changes so it is being
>>>>>>>> done as a
>>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>>
>>>>>>>> Here's the mq comment for the change:
>>>>>>>>
>>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage
>>>>>>>> to use
>>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>>
>>>>>>>> Here is the full webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>>
>>>>>>>> And here is the delta webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Dan, Erik and Robbin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>>
>>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>>
>>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>>
>>>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>>>> describe
>>>>>>>>> and track the work on this project:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dan has noticed that the indenting is wrong in some of the
>>>>>>>>> code quotes
>>>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>>>> have a
>>>>>>>>> solution for that problem yet.
>>>>>>>>>
>>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>>
>>>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>>>> tier[2-5]
>>>>>>>>> testing, additional stress testing on Dan's Solaris X64
>>>>>>>>> server, and
>>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>>
>>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>>
>>>>>>>>> Daniel Daugherty
>>>>>>>>> Erik Osterlund
>>>>>>>>> Robbin Ehn
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by coleen.phillimore
Thanks for keeping all the OpenJDK aliases!


On 11/21/17 12:24 PM, [hidden email] wrote:

>
>
> On 11/21/17 11:28 AM, Daniel D. Daugherty wrote:
>> Hi Coleen!
>>
>> Thanks for making time to review the Thread-SMR stuff again!!
>>
>> I have added back the other three OpenJDK aliases... This review is
>> being done on _four_ different OpenJDK aliases.
>>
>> As always, replies are embedded below...
>>
>>
>> On 11/20/17 3:12 PM, [hidden email] wrote:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/os/linux/os_linux.cpp.frames.html 
>>>
>>>
>>> I read David's comments about the threads list iterator, and I was
>>> going to say that it can be cleaned up later, as the bulk of the
>>> change is the SMR part but this looks truly bizarre. It looks like
>>> it shouldn't compile because 'jt' isn't in scope.
>>>
>>> Why isn't this the syntax:
>>>
>>> JavaThreadIteratorWithHandle jtiwh;
>>> for (JavaThread* jt = jtiwh.first(); jt != NULL; jt = jtiwh.next()) {
>>> }
>>>
>>> Which would do the obvious thing without anyone having to squint at
>>> the code.
>>
>> See my reply to David's review for the more detailed answer.
>>
>> For the above syntax, we would need braces to limit the scope of the
>> 'jtiwh' variable. With Stefan's propsal, you get limited scope on
>> 'jtiwh' for "free".
>
> Yes, thank you, I see that motivation now.
>>
>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.hpp.html 
>>>
>>>
>>> As a hater of acronmys, can this file use "Safe Memory Reclaimation"
>>
>> I'm not sure what you mean by this. Do you mean rename the files?
>>
>>   threadSMR.hpp -> threadSafeMemoryReclaimation.hpp
>>   threadSMR.cpp -> threadSafeMemoryReclaimation.cpp
>>
>
> No.
>
>>
>>> and briefly describe the concept in the beginning of the header file,
>>> so one knows why it's called threadSMR.hpp?
>>
>> And then this part of the sentence kind of indicates that you would be
>> okay with the threadSMR.{c,h}pp names if a comment was added to the
>> header file.
>>
>> Please clarify.
>
> Yes.  If you added a comment to the beginning of the threadSMR.hpp
> file that said what SMR stood for and what it was, that would be
> extremely helpful for future viewers of this file.

Yup. I just finished with the comment...


>
>>
>>
>>> It doesn't need to be long and include why Threads list needs this
>>> Sometimes we tell new people that the hotspot documentation is in
>>> the header files.
>>
>> Yup. When I migrated stuff from thread.hpp and thread.cpp to
>> threadSMR.hpp
>> and threadSMR.cpp, I should have written a header comment...
>>
>> I did update a comment in thread.cpp based on Robin W's code review:
>
> Yes, I think a comment belongs in the SMR file also, or in preference.

Wasn't trying to say that I thought the update I did for Robin W was
sufficient...


>>
>>> > src/hotspot/share/runtime/thread.cpp
>>> >
>>> > 3432 // operations over all threads.  It is protected by its own
>>> Mutex
>>> > 3433 // lock, which is also used in other contexts to protect thread
>>> >
>>> > Should this comment perhaps be revised to mention SMR?
>>>
>>> It definitely needs some updating... Here's a stab at it:
>>>
>>> // The Threads class links together all active threads, and provides
>>> // operations over all threads. It is protected by the Threads_lock,
>>> // which is also used in other global contexts like safepointing.
>>> // ThreadsListHandles are used to safely perform operations on one
>>> // or more threads without the risk of the thread exiting during the
>>> // operation.
>>> //
>>> // Note: The Threads_lock is currently more widely used than we
>>> // would like. We are actively migrating Threads_lock uses to other
>>> // mechanisms in order to reduce Threads_lock contention.
>>
>> I'll take a look at adding a header comment to threadSMR.hpp.
>>
>>
>>> 186   JavaThreadIteratorWithHandle() : _tlh(), _index(0) {}
>>>
>>> This _tlh() call should not be necessary.  The compiler should
>>> generate this for you in the constructor.
>>
>> Deleted.
>>
>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.cpp.html 
>>>
>>>
>>>   32 ThreadsList::ThreadsList(int entries) : _length(entries),
>>> _threads(NEW_C_HEAP_ARRAY(JavaThread*, entries + 1, mtGC)),
>>> _next_list(NULL) {
>>>
>>> Seems like it should be mtThread rather than mtGC.
>>
>> Fixed. Definitely an artifact of Erik's original prototype when he
>> extracted Thread-SMR from his GC work... Thanks for catching it.
>>
>>
>>> Should
>>>
>>>   62   if (EnableThreadSMRStatistics) {
>>>
>>> really be UL, ie: if (log_is_enabled(Info, thread, smr, statistics)) ?
>>
>> EnableThreadSMRStatistics is used in more places than UL code.
>> We use it in Thread::print_*() stuff to control output of
>> Thread-SMR statistics info in thread dumps and hs_err_pid file
>> generation.
>
> That sort of thing is also controlled by logging in general.

Don't think I want to do that since logging may be be "up" at the
time that I need Thread::print_*() or hs_err_pid generation...

Something about chickens and eggs... :-)


>>
>> Currently thread dump and hs_err_pid file output is not generated
>> using UL (and probably can't be?) so we need an option to control
>> the Thread-SMR statistics stuff in all places.
>>
>
> You can use log options in preference to this new diagnostic option in
> these cases too.   This option looks a lot like logging to me and
> would be nice to not have to later convert it.

See above reply...


>
>>
>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java.html 
>>>
>>>
>>> Can you use for these tests instead (there were a couple):
>>>
>>> *@requires (vm.debug == true)*
>>
>> The test I cloned had this in it:
>>
>>     if (!Platform.isDebugBuild()) {
>>         // -XX:ErrorHandlerTest=N option requires debug bits.
>>         return;
>>     }
>>
>> and you would like me to switch to the newer mechanism?
>>
>> I have updated the following tests:
>>
>>   test/hotspot/jtreg/runtime/ErrorHandling/ErrorHandler.java
>> test/hotspot/jtreg/runtime/ErrorHandling/NestedThreadsListHandleInErrorHandlingTest.java
>>
>> test/hotspot/jtreg/runtime/ErrorHandling/ThreadsListHandleInErrorHandlingTest.java
>>
>>
>>
> Yes, the newer mechanism makes jtreg not bother to run the test, which
> makes it faster!

More quickly not run the test... Yup... got it...


>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/thread.cpp.udiff.html 
>>>
>>>
>>> +// thread, has been added the Threads list, the system is not at a
>>>
>>> Has "not" been added to the Threads list?  missing "not"?
>>
>> Nope. If the JavaThread has been added to the Threads list
>> and it is not protected, then it is dangling. In other words,
>> a published JavaThread (on the Threads list) has to be protected
>> by either the system being at a safepoint or the JavaThread has
>> to be on some threads's ThreadsList.
>>
>>
>>>
>>> + return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
>>>
>>> Can you add a comment about where this number came from?
>>
>> I'll have to get that from Erik...
>>
>>
>>> I can't find the caller of threads_do_smr.
>>
>> I'm guessing that's used by the GC code that depends on Thread-SMR...
>>
>
> They  should add this API when the add the use of it then.  I don't
> see it in any sources.

We'll see what Erik wants to do...


>
>>
>>> If these functions xchg_smr_thread_list, get_smr_java_thread_list,
>>> inc_smr_deleted_thread_count are only used by thread.cpp, I think
>>> they should go in that file and not in the .inline.hpp file to be
>>> included and possibly called by other files.  I think they're
>>> private to class Threads.
>>
>> I have a vague memory that some of the compilers don't do inlining when
>> an "inline" function is in a .cpp. I believe we want these functions
>> to be inlined for performance reasons. Erik should probably chime in
>> here.
>
> I don't see why this should be.  Maybe some (older) compilers require
> it to be found before the call though, but that can still be
> accomplished in the .cpp file.

Again, we'll see what Erik wants to do...


>>
>>
>>> I don't have any in-depth comments. This looks really good from my
>>> day of reading it.
>>
>> Thanks for taking the time to review it again!
>>
>
> Thanks,
> Coleen

And thanks again...

Dan


>
>> Dan
>>
>>
>>>
>>> Thanks,
>>> Coleen
>>>
>>> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>>>
>>>> Greetings,
>>>>
>>>> Testing of the last round of changes revealed a hang in one of the new
>>>> TLH tests. Robbin has fixed the hang, updated the existing TLH
>>>> test, and
>>>> added another TLH test for good measure.
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>>
>>>> Here is the updated delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>>
>>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>>> no unexpected failures.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Robbin rebased the project last night/this morning to merge with
>>>>> Thread
>>>>> Local Handshakes (TLH) and also picked up additional changesets up
>>>>> thru:
>>>>>
>>>>>> Changeset: fa736014cf28
>>>>>> Author:    cjplummer
>>>>>> Date:      2017-11-14 18:08 -0800
>>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>>
>>>>>> 8191049: Add alternate version of pns() that is callable from
>>>>>> within hotspot source
>>>>>> Summary: added pns2() to debug.cpp
>>>>>> Reviewed-by: stuefe, gthornbr
>>>>>
>>>>> This is the first time we've rebased the project to bits that are
>>>>> this
>>>>> fresh (< 12 hours old at rebase time). We've done this because we
>>>>> think
>>>>> we're done with this project and are in the final
>>>>> review-change-retest
>>>>> cycle(s)... In other words, we're not planning on making any more
>>>>> major
>>>>> changes for this project... :-)
>>>>>
>>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>>
>>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>>
>>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>>> (and the new baseline also)...
>>>>>
>>>>> We're expecting this round to be a little noisier than usual because
>>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>>> (Yes, we're playing chase-the-repo...)
>>>>>>
>>>>>> Here is the updated full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>>
>>>>>> Unlike the previous rebase, there were no changes required to the
>>>>>> open code to get the rebased bits to build so there is no delta
>>>>>> webrev.
>>>>>>
>>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>>> Mach5 tier[1-5] test run...
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>>
>>>>>>> Here are the updated webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>>
>>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>>>> holiday
>>>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>>>> closer
>>>>>>> look today.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> Resolving one of the code review comments (from both Stefan K
>>>>>>>> and Coleen)
>>>>>>>> on jdk10-04-full required quite a few changes so it is being
>>>>>>>> done as a
>>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>>
>>>>>>>> Here's the mq comment for the change:
>>>>>>>>
>>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage
>>>>>>>> to use
>>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>>
>>>>>>>> Here is the full webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>>
>>>>>>>> And here is the delta webrev:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Dan, Erik and Robbin
>>>>>>>>
>>>>>>>>
>>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>>> Greetings,
>>>>>>>>>
>>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>>
>>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>>
>>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>>
>>>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>>>> describe
>>>>>>>>> and track the work on this project:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Dan has noticed that the indenting is wrong in some of the
>>>>>>>>> code quotes
>>>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>>>> have a
>>>>>>>>> solution for that problem yet.
>>>>>>>>>
>>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>>
>>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>>
>>>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>>>> tier[2-5]
>>>>>>>>> testing, additional stress testing on Dan's Solaris X64
>>>>>>>>> server, and
>>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>>
>>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>>
>>>>>>>>> Daniel Daugherty
>>>>>>>>> Erik Osterlund
>>>>>>>>> Robbin Ehn
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by Daniel D. Daugherty
Greetings,

*** We are wrapping up code review on this project so it is time ***
*** for the various code reviewers to chime in one last time. ***

In this latest round, we had three different reviewers chime in so we're
doing delta webrevs for each of those resolutions:

David H's resolutions:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.0-delta/

   mq comment for dholmes_CR:
     - Fix indents, trailing spaces and various typos.
     - Add descriptions for the '_cnt', '_max' and '_times" fields,
       add impl notes to document the type choices.

   src/hotspot/share/runtime/globals.hpp change is white-space only so it
   is only visible via the file's patch link.


Robin W's resolutions:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.0,10.10.1-delta/

   mq comment for robinw_CR:
     - Fix some inefficient code, update some comments, fix some indents,
       and add some 'const' specifiers.

   src/hotspot/share/runtime/vm_operations.hpp change is white-space only so
   it is only visible via the file's patch link.


Coleen's resolutions:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.1,10.10.2-delta/

   mq comment for coleenp_CR:
     - Change ThreadsList::_threads from 'mtGC' -> 'mtThread',
     - add header comment to threadSMR.hpp file,
     - cleanup JavaThreadIteratorWithHandle ctr,
     - make ErrorHandling more efficient.


Some folks prefer to see all of the delta webrevs together so here is
a delta webrev relative to jdk10-09-full:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.2-delta/

   src/hotspot/share/runtime/globals.hpp and
   src/hotspot/share/runtime/vm_operations.hpp changes are white-space only
   so they are only visible via the file's patch link.


And, of course, some folks prefer to see everything all at once so here
is the latest full webrev:

http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10-full/


Dan has submitted the bits for the usual Mach5 tier[1-5] testing. The
prelim Mach5 tier1 (JPRT equivalent) on these bits passed just fine...

The project is currently baselined on Jesper's 2017-11-17 PIT snapshot
so there will be some catching up to do before integration...

We welcome comments, suggestions and feedback.

Dan, Erik and Robbin


On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:

> Greetings,
>
> Testing of the last round of changes revealed a hang in one of the new
> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
> added another TLH test for good measure.
>
> Here is the updated full webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>
> Here is the updated delta webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>
> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
> no unexpected failures.
>
> We welcome comments, suggestions and feedback.
>
> Dan, Erik and Robbin
>
>
> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Robbin rebased the project last night/this morning to merge with Thread
>> Local Handshakes (TLH) and also picked up additional changesets up thru:
>>
>>> Changeset: fa736014cf28
>>> Author:    cjplummer
>>> Date:      2017-11-14 18:08 -0800
>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>
>>> 8191049: Add alternate version of pns() that is callable from within
>>> hotspot source
>>> Summary: added pns2() to debug.cpp
>>> Reviewed-by: stuefe, gthornbr
>>
>> This is the first time we've rebased the project to bits that are this
>> fresh (< 12 hours old at rebase time). We've done this because we think
>> we're done with this project and are in the final review-change-retest
>> cycle(s)... In other words, we're not planning on making any more major
>> changes for this project... :-)
>>
>> *** Time for code reviewers to chime in on this thread! ***
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>
>> Here's is the delta webrev from the 2017.11.10 rebase:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>
>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>> (and the new baseline also)...
>>
>> We're expecting this round to be a little noisier than usual because
>> we did not rebase on a PIT snapshot so the new baseline has not been
>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>> (Yes, we're playing chase-the-repo...)
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>
>>> Unlike the previous rebase, there were no changes required to the
>>> open code to get the rebased bits to build so there is no delta
>>> webrev.
>>>
>>> These bits have been run through JPRT and I've submitted the usual
>>> Mach5 tier[1-5] test run...
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>
>>>> Here are the updated webrevs:
>>>>
>>>> Here's the mq comment for the change:
>>>>
>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>
>>>> Here is the full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>
>>>> And here is the delta webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>
>>>> I ran the above bits throught Mach5 tier[1-5] testing over the holiday
>>>> weekend. Didn't see any issues in a quick look. Going to take a closer
>>>> look today.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>> Coleen)
>>>>> on jdk10-04-full required quite a few changes so it is being done
>>>>> as a
>>>>> standalone patch and corresponding webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>>>>>     JavaThreadIteratorWithHandle.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>
>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>
>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>
>>>>>> Here's a PDF for the internal wiki that we've been using to describe
>>>>>> and track the work on this project:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>
>>>>>>
>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>> quotes
>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>> have a
>>>>>> solution for that problem yet.
>>>>>>
>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>
>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>> tier[2-5]
>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Daniel Daugherty
>>>>>> Erik Osterlund
>>>>>> Robbin Ehn
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

David Holmes
Hi Dan,

On 22/11/2017 10:12 AM, Daniel D. Daugherty wrote:

> Greetings,
>
> *** We are wrapping up code review on this project so it is time ***
> *** for the various code reviewers to chime in one last time. ***
>
> In this latest round, we had three different reviewers chime in so we're
> doing delta webrevs for each of those resolutions:
>
> David H's resolutions:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.0-delta/ 
>
>
>    mq comment for dholmes_CR:
>      - Fix indents, trailing spaces and various typos.
>      - Add descriptions for the '_cnt', '_max' and '_times" fields,
>        add impl notes to document the type choices.

src/hotspot/share/runtime/thread.cpp

3466 // range. Unsigned 64-bit would be more future proof, but 64-bit
atomic inc
3467 // isn't available everywhere (or is it?).

64-bit atomics should be available on all currently supported platforms
(the last time this was an issue was for PPC32 - and the atomic
templates should have filled in any previous holes in the allowed
types). But if it's always 64-bit then you'd have to use Atomic::load to
allow for 32-bit platforms.

These counts etc are all just for statistics so we aren't really
concerned about eventual overflows in long running VMs - right?

---

                                        // # of parallel threads in
_smr_delete_lock->wait().
   static uint                  _smr_delete_lock_wait_cnt;
                                        // Max # of parallel threads in
_smr_delete_lock->wait().


why are the comments indented like that? I thought this is what Coleen
previously commented on. Please just start the comments in the first
column - no need for any indent. Thanks.

>    src/hotspot/share/runtime/globals.hpp change is white-space only so it
>    is only visible via the file's patch link.
>
>
> Robin W's resolutions:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.0,10.10.1-delta/ 
>
>
>    mq comment for robinw_CR:
>      - Fix some inefficient code, update some comments, fix some indents,
>        and add some 'const' specifiers.
>
>    src/hotspot/share/runtime/vm_operations.hpp change is white-space
> only so
>    it is only visible via the file's patch link.

All looks fine to me. Some good catches there from Robin!

> Coleen's resolutions:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.1,10.10.2-delta/ 
>
>
>    mq comment for coleenp_CR:
>      - Change ThreadsList::_threads from 'mtGC' -> 'mtThread',
>      - add header comment to threadSMR.hpp file,
>      - cleanup JavaThreadIteratorWithHandle ctr,
>      - make ErrorHandling more efficient.

src/hotspot/share/runtime/threadSMR.hpp

+ // Thread Safe Memory Reclaimation (Thread-SMR) support.

Only one 'i' in Reclamation :)

Thanks,
David
------

>
> Some folks prefer to see all of the delta webrevs together so here is
> a delta webrev relative to jdk10-09-full:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.2-delta/ 
>
>
>    src/hotspot/share/runtime/globals.hpp and
>    src/hotspot/share/runtime/vm_operations.hpp changes are white-space only
>    so they are only visible via the file's patch link.
>
>
> And, of course, some folks prefer to see everything all at once so here
> is the latest full webrev:
>
> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10-full/
>
>
> Dan has submitted the bits for the usual Mach5 tier[1-5] testing. The
> prelim Mach5 tier1 (JPRT equivalent) on these bits passed just fine...
>
> The project is currently baselined on Jesper's 2017-11-17 PIT snapshot
> so there will be some catching up to do before integration...
>
> We welcome comments, suggestions and feedback.
>
> Dan, Erik and Robbin
>
>
> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> Testing of the last round of changes revealed a hang in one of the new
>> TLH tests. Robbin has fixed the hang, updated the existing TLH test, and
>> added another TLH test for good measure.
>>
>> Here is the updated full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>
>> Here is the updated delta webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>
>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>> no unexpected failures.
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Robbin rebased the project last night/this morning to merge with Thread
>>> Local Handshakes (TLH) and also picked up additional changesets up thru:
>>>
>>>> Changeset: fa736014cf28
>>>> Author:    cjplummer
>>>> Date:      2017-11-14 18:08 -0800
>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>
>>>> 8191049: Add alternate version of pns() that is callable from within
>>>> hotspot source
>>>> Summary: added pns2() to debug.cpp
>>>> Reviewed-by: stuefe, gthornbr
>>>
>>> This is the first time we've rebased the project to bits that are this
>>> fresh (< 12 hours old at rebase time). We've done this because we think
>>> we're done with this project and are in the final review-change-retest
>>> cycle(s)... In other words, we're not planning on making any more major
>>> changes for this project... :-)
>>>
>>> *** Time for code reviewers to chime in on this thread! ***
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>
>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>
>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>> (and the new baseline also)...
>>>
>>> We're expecting this round to be a little noisier than usual because
>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>> (Yes, we're playing chase-the-repo...)
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>
>>>> Unlike the previous rebase, there were no changes required to the
>>>> open code to get the rebased bits to build so there is no delta
>>>> webrev.
>>>>
>>>> These bits have been run through JPRT and I've submitted the usual
>>>> Mach5 tier[1-5] test run...
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>
>>>>> Here are the updated webrevs:
>>>>>
>>>>> Here's the mq comment for the change:
>>>>>
>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>
>>>>> Here is the full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>
>>>>> And here is the delta webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>
>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the holiday
>>>>> weekend. Didn't see any issues in a quick look. Going to take a closer
>>>>> look today.
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Resolving one of the code review comments (from both Stefan K and
>>>>>> Coleen)
>>>>>> on jdk10-04-full required quite a few changes so it is being done
>>>>>> as a
>>>>>> standalone patch and corresponding webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage to use
>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>
>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>
>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>
>>>>>>> Here's a PDF for the internal wiki that we've been using to describe
>>>>>>> and track the work on this project:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>
>>>>>>>
>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>> quotes
>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>> have a
>>>>>>> solution for that problem yet.
>>>>>>>
>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>
>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>> tier[2-5]
>>>>>>> testing, additional stress testing on Dan's Solaris X64 server, and
>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Daniel Daugherty
>>>>>>> Erik Osterlund
>>>>>>> Robbin Ehn
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Erik Osterlund
In reply to this post by Daniel D. Daugherty
Hi,

Some replies...

On 2017-11-21 17:28, Daniel D. Daugherty wrote:

> Hi Coleen!
>
> Thanks for making time to review the Thread-SMR stuff again!!
>
> I have added back the other three OpenJDK aliases... This review is
> being done on _four_ different OpenJDK aliases.
>
> As always, replies are embedded below...
>
>
> On 11/20/17 3:12 PM, [hidden email] wrote:
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.cpp.html 
>>
>>
>>   32 ThreadsList::ThreadsList(int entries) : _length(entries),
>> _threads(NEW_C_HEAP_ARRAY(JavaThread*, entries + 1, mtGC)),
>> _next_list(NULL) {
>>
>> Seems like it should be mtThread rather than mtGC.
>
> Fixed. Definitely an artifact of Erik's original prototype when he
> extracted Thread-SMR from his GC work... Thanks for catching it.
>

Confirmed. At the time I considered the Threads list overheads GC
overheads, but I agree mtThread is a better fit today.

>
>>
>> + return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
>>
>> Can you add a comment about where this number came from?
>
> I'll have to get that from Erik...

Wow, that looks like code I wrote a *very* long time ago. :) That is a
variation of Knuth's multiplicative hash which is outlined in a comment
in synchronizer.cpp and referred to in that comment as a phi-based
scheme. Basically the magic number is 2^32 * Phi (the golden ratio),
which happens to be a useful value for building a reasonably simple yet
pretty good hash function. It is not the optimal hash function, but
seemed to definitely be good enough for our purposes.

Thanks,
/Erik
Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Erik Osterlund
In reply to this post by Daniel D. Daugherty
Hi,

Some replies...

On 2017-11-21 18:36, Daniel D. Daugherty wrote:

> Thanks for keeping all the OpenJDK aliases!
>
>
> On 11/21/17 12:24 PM, [hidden email] wrote:
>>
>>
>> On 11/21/17 11:28 AM, Daniel D. Daugherty wrote:
>>>
>>> On 11/20/17 3:12 PM, [hidden email] wrote:
>>>
>>>> I can't find the caller of threads_do_smr.
>>>
>>> I'm guessing that's used by the GC code that depends on Thread-SMR...
>>>
>>
>> They  should add this API when the add the use of it then.  I don't
>> see it in any sources.
>
> We'll see what Erik wants to do...

The new iterators should be more than enough, so that closure-based API
is not going to be missed when removed.

>
>>
>>>
>>>> If these functions xchg_smr_thread_list, get_smr_java_thread_list,
>>>> inc_smr_deleted_thread_count are only used by thread.cpp, I think
>>>> they should go in that file and not in the .inline.hpp file to be
>>>> included and possibly called by other files.  I think they're
>>>> private to class Threads.
>>>
>>> I have a vague memory that some of the compilers don't do inlining when
>>> an "inline" function is in a .cpp. I believe we want these functions
>>> to be inlined for performance reasons. Erik should probably chime in
>>> here.
>>
>> I don't see why this should be.  Maybe some (older) compilers require
>> it to be found before the call though, but that can still be
>> accomplished in the .cpp file.
>
> Again, we'll see what Erik wants to do...

I don't mind. Either file works for me. For me it's more intuitive if
inline member function definitions are in the .inline.hpp files. But if
there are strong forces to move this to the cpp file, then sure.

Thanks,
/Erik

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by David Holmes
On 11/21/17 11:26 PM, David Holmes wrote:

> Hi Dan,
>
> On 22/11/2017 10:12 AM, Daniel D. Daugherty wrote:
>> Greetings,
>>
>> *** We are wrapping up code review on this project so it is time ***
>> *** for the various code reviewers to chime in one last time. ***
>>
>> In this latest round, we had three different reviewers chime in so we're
>> doing delta webrevs for each of those resolutions:
>>
>> David H's resolutions:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.0-delta/ 
>>
>>
>>    mq comment for dholmes_CR:
>>      - Fix indents, trailing spaces and various typos.
>>      - Add descriptions for the '_cnt', '_max' and '_times" fields,
>>        add impl notes to document the type choices.
>
> src/hotspot/share/runtime/thread.cpp
>
> 3466 // range. Unsigned 64-bit would be more future proof, but 64-bit
> atomic inc
> 3467 // isn't available everywhere (or is it?).
>
> 64-bit atomics should be available on all currently supported
> platforms (the last time this was an issue was for PPC32 - and the
> atomic templates should have filled in any previous holes in the
> allowed types).

Hmmm... I ran into this when I merged the project with the atomic
templates. I got a JPRT build failure on one of the ARM platforms...
Unfortunately, I didn't save the log files so I can't quote the
error message from the template stuff...


> But if it's always 64-bit then you'd have to use Atomic::load to allow
> for 32-bit platforms.

What's the official story on 32-bit platforms? Is that what
bit me in JPRT? i.e., did we still have a 32-bit ARM build
config'ed in JPRT a month or two ago???


> These counts etc are all just for statistics so we aren't really
> concerned about eventual overflows in long running VMs - right?

Yup these are just statistics... overflow is not a killer.


>
> ---
>
>                                        // # of parallel threads in
> _smr_delete_lock->wait().
>   static uint                  _smr_delete_lock_wait_cnt;
>                                        // Max # of parallel threads in
> _smr_delete_lock->wait().
>
>
> why are the comments indented like that? I thought this is what Coleen
> previously commented on. Please just start the comments in the first
> column - no need for any indent. Thanks.

Coleen asked for the new comments in thread.cpp to be moved
over to column 1. She asked for the new comments in thread.hpp
to be indented with more spaces. I started with 2 spaces and
the variables still didn't stand out. I tried 4 spaces and
they still didn't stand out probably because of the leading
_smr_... So I went with 8 spaces...


>
>>    src/hotspot/share/runtime/globals.hpp change is white-space only
>> so it
>>    is only visible via the file's patch link.
>>
>>
>> Robin W's resolutions:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.0,10.10.1-delta/ 
>>
>>
>>    mq comment for robinw_CR:
>>      - Fix some inefficient code, update some comments, fix some
>> indents,
>>        and add some 'const' specifiers.
>>
>>    src/hotspot/share/runtime/vm_operations.hpp change is white-space
>> only so
>>    it is only visible via the file's patch link.
>
> All looks fine to me. Some good catches there from Robin!

Yes, a new pair of eyes on this code is always useful.


>
>> Coleen's resolutions:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10.1,10.10.2-delta/ 
>>
>>
>>    mq comment for coleenp_CR:
>>      - Change ThreadsList::_threads from 'mtGC' -> 'mtThread',
>>      - add header comment to threadSMR.hpp file,
>>      - cleanup JavaThreadIteratorWithHandle ctr,
>>      - make ErrorHandling more efficient.
>
> src/hotspot/share/runtime/threadSMR.hpp
>
> + // Thread Safe Memory Reclaimation (Thread-SMR) support.
>
> Only one 'i' in Reclamation :)

Will fix. I tried typing both and neither looked right to me.
(I might be getting blurry eyed with this stuff)...
So I went with the spelling from Coleen's comment... :-)

Thanks for taking another look!!

Dan


>
> Thanks,
> David
> ------
>
>>
>> Some folks prefer to see all of the delta webrevs together so here is
>> a delta webrev relative to jdk10-09-full:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.09,10.10.2-delta/ 
>>
>>
>>    src/hotspot/share/runtime/globals.hpp and
>>    src/hotspot/share/runtime/vm_operations.hpp changes are
>> white-space only
>>    so they are only visible via the file's patch link.
>>
>>
>> And, of course, some folks prefer to see everything all at once so here
>> is the latest full webrev:
>>
>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-10.10-full/
>>
>>
>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing. The
>> prelim Mach5 tier1 (JPRT equivalent) on these bits passed just fine...
>>
>> The project is currently baselined on Jesper's 2017-11-17 PIT snapshot
>> so there will be some catching up to do before integration...
>>
>> We welcome comments, suggestions and feedback.
>>
>> Dan, Erik and Robbin
>>
>>
>> On 11/18/17 8:49 PM, Daniel D. Daugherty wrote:
>>> Greetings,
>>>
>>> Testing of the last round of changes revealed a hang in one of the new
>>> TLH tests. Robbin has fixed the hang, updated the existing TLH test,
>>> and
>>> added another TLH test for good measure.
>>>
>>> Here is the updated full webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/
>>>
>>> Here is the updated delta webrev:
>>>
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-delta/
>>>
>>> Dan ran the bits thru the usual Mach5 tier[1-5] testing and there are
>>> no unexpected failures.
>>>
>>> We welcome comments, suggestions and feedback.
>>>
>>> Dan, Erik and Robbin
>>>
>>>
>>> On 11/15/17 3:32 PM, Daniel D. Daugherty wrote:
>>>> Greetings,
>>>>
>>>> Robbin rebased the project last night/this morning to merge with
>>>> Thread
>>>> Local Handshakes (TLH) and also picked up additional changesets up
>>>> thru:
>>>>
>>>>> Changeset: fa736014cf28
>>>>> Author:    cjplummer
>>>>> Date:      2017-11-14 18:08 -0800
>>>>> URL:http://hg.openjdk.java.net/jdk/hs/rev/fa736014cf28
>>>>>
>>>>> 8191049: Add alternate version of pns() that is callable from
>>>>> within hotspot source
>>>>> Summary: added pns2() to debug.cpp
>>>>> Reviewed-by: stuefe, gthornbr
>>>>
>>>> This is the first time we've rebased the project to bits that are this
>>>> fresh (< 12 hours old at rebase time). We've done this because we
>>>> think
>>>> we're done with this project and are in the final review-change-retest
>>>> cycle(s)... In other words, we're not planning on making any more
>>>> major
>>>> changes for this project... :-)
>>>>
>>>> *** Time for code reviewers to chime in on this thread! ***
>>>>
>>>> Here is the updated full webrev:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-full/
>>>>
>>>> Here's is the delta webrev from the 2017.11.10 rebase:
>>>>
>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-08-delta/
>>>>
>>>> Dan has submitted the bits for the usual Mach5 tier[1-5] testing
>>>> (and the new baseline also)...
>>>>
>>>> We're expecting this round to be a little noisier than usual because
>>>> we did not rebase on a PIT snapshot so the new baseline has not been
>>>> through Jesper's usual care-and-feeding of integration_blockers, etc.
>>>>
>>>> We welcome comments, suggestions and feedback.
>>>>
>>>> Dan, Erik and Robbin
>>>>
>>>>
>>>> On 11/14/17 4:48 PM, Daniel D. Daugherty wrote:
>>>>> Greetings,
>>>>>
>>>>> I rebased the project to the 2017.11.10 jdk/hs PIT snapshot.
>>>>> (Yes, we're playing chase-the-repo...)
>>>>>
>>>>> Here is the updated full webrev:
>>>>>
>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-07-full/
>>>>>
>>>>> Unlike the previous rebase, there were no changes required to the
>>>>> open code to get the rebased bits to build so there is no delta
>>>>> webrev.
>>>>>
>>>>> These bits have been run through JPRT and I've submitted the usual
>>>>> Mach5 tier[1-5] test run...
>>>>>
>>>>> We welcome comments, suggestions and feedback.
>>>>>
>>>>> Dan, Erik and Robbin
>>>>>
>>>>>
>>>>> On 11/13/17 12:30 PM, Daniel D. Daugherty wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> I rebased the project to the 2017.10.26 jdk10/hs PIT snapshot.
>>>>>>
>>>>>> Here are the updated webrevs:
>>>>>>
>>>>>> Here's the mq comment for the change:
>>>>>>
>>>>>>   Rebase to 2017.10.25 PIT snapshot.
>>>>>>
>>>>>> Here is the full webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-full/
>>>>>>
>>>>>> And here is the delta webrev:
>>>>>>
>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-06-delta/
>>>>>>
>>>>>> I ran the above bits throught Mach5 tier[1-5] testing over the
>>>>>> holiday
>>>>>> weekend. Didn't see any issues in a quick look. Going to take a
>>>>>> closer
>>>>>> look today.
>>>>>>
>>>>>> We welcome comments, suggestions and feedback.
>>>>>>
>>>>>> Dan, Erik and Robbin
>>>>>>
>>>>>>
>>>>>> On 11/8/17 1:05 PM, Daniel D. Daugherty wrote:
>>>>>>> Greetings,
>>>>>>>
>>>>>>> Resolving one of the code review comments (from both Stefan K
>>>>>>> and Coleen)
>>>>>>> on jdk10-04-full required quite a few changes so it is being
>>>>>>> done as a
>>>>>>> standalone patch and corresponding webrevs:
>>>>>>>
>>>>>>> Here's the mq comment for the change:
>>>>>>>
>>>>>>>   stefank, coleenp CR - refactor most JavaThreadIterator usage
>>>>>>> to use
>>>>>>>     JavaThreadIteratorWithHandle.
>>>>>>>
>>>>>>> Here is the full webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-full/
>>>>>>>
>>>>>>> And here is the delta webrev:
>>>>>>>
>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-05-delta/
>>>>>>>
>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>
>>>>>>> Dan, Erik and Robbin
>>>>>>>
>>>>>>>
>>>>>>> On 10/9/17 3:41 PM, Daniel D. Daugherty wrote:
>>>>>>>> Greetings,
>>>>>>>>
>>>>>>>> We have a (eXtra Large) fix for the following bug:
>>>>>>>>
>>>>>>>> 8167108 inconsistent handling of SR_lock can lead to crashes
>>>>>>>> https://bugs.openjdk.java.net/browse/JDK-8167108
>>>>>>>>
>>>>>>>> This fix adds a Safe Memory Reclamation (SMR) mechanism based on
>>>>>>>> Hazard Pointers to manage JavaThread lifecycle.
>>>>>>>>
>>>>>>>> Here's a PDF for the internal wiki that we've been using to
>>>>>>>> describe
>>>>>>>> and track the work on this project:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/SMR_and_JavaThread_Lifecycle-JDK10-04.pdf 
>>>>>>>>
>>>>>>>>
>>>>>>>> Dan has noticed that the indenting is wrong in some of the code
>>>>>>>> quotes
>>>>>>>> in the PDF that are not present in the internal wiki. We don't
>>>>>>>> have a
>>>>>>>> solution for that problem yet.
>>>>>>>>
>>>>>>>> Here's the webrev for current JDK10 version of this fix:
>>>>>>>>
>>>>>>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-04-full
>>>>>>>>
>>>>>>>> This fix has been run through many rounds of JPRT and Mach5
>>>>>>>> tier[2-5]
>>>>>>>> testing, additional stress testing on Dan's Solaris X64 server,
>>>>>>>> and
>>>>>>>> additional testing on Erik and Robbin's machines.
>>>>>>>>
>>>>>>>> We welcome comments, suggestions and feedback.
>>>>>>>>
>>>>>>>> Daniel Daugherty
>>>>>>>> Erik Osterlund
>>>>>>>> Robbin Ehn
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>

Reply | Threaded
Open this post in threaded view
|

Re: RFR(XL): 8167108 - SMR and JavaThread Lifecycle

Daniel D. Daugherty
In reply to this post by Erik Osterlund
On 11/22/17 4:07 AM, Erik Österlund wrote:

> Hi,
>
> Some replies...
>
> On 2017-11-21 17:28, Daniel D. Daugherty wrote:
>> Hi Coleen!
>>
>> Thanks for making time to review the Thread-SMR stuff again!!
>>
>> I have added back the other three OpenJDK aliases... This review is
>> being done on _four_ different OpenJDK aliases.
>>
>> As always, replies are embedded below...
>>
>>
>> On 11/20/17 3:12 PM, [hidden email] wrote:
>>> http://cr.openjdk.java.net/~dcubed/8167108-webrev/jdk10-09-full/src/hotspot/share/runtime/threadSMR.cpp.html 
>>>
>>>
>>>   32 ThreadsList::ThreadsList(int entries) : _length(entries),
>>> _threads(NEW_C_HEAP_ARRAY(JavaThread*, entries + 1, mtGC)),
>>> _next_list(NULL) {
>>>
>>> Seems like it should be mtThread rather than mtGC.
>>
>> Fixed. Definitely an artifact of Erik's original prototype when he
>> extracted Thread-SMR from his GC work... Thanks for catching it.
>>
>
> Confirmed. At the time I considered the Threads list overheads GC
> overheads, but I agree mtThread is a better fit today.
>
>>
>>>
>>> + return (unsigned int)(((uint32_t)(uintptr_t)s1) * 2654435761u);
>>>
>>> Can you add a comment about where this number came from?
>>
>> I'll have to get that from Erik...
>
> Wow, that looks like code I wrote a *very* long time ago. :) That is a
> variation of Knuth's multiplicative hash which is outlined in a
> comment in synchronizer.cpp and referred to in that comment as a
> phi-based scheme. Basically the magic number is 2^32 * Phi (the golden
> ratio), which happens to be a useful value for building a reasonably
> simple yet pretty good hash function. It is not the optimal hash
> function, but seemed to definitely be good enough for our purposes.

So a reasonable comment would be:

// The literal value is 2^32 * Phi (the golden ratio).

If so, then I'll add it in my wrap up round...

Dan

>
> Thanks,
> /Erik

12