RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Hi all,

Any chance for a review? :)
Jc

On Fri, Oct 26, 2018 at 10:48 AM JC Beyler <[hidden email]> wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc


--

Thanks,
Jc
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

serguei.spitsyn@oracle.com
In reply to this post by JC Beyler
Hi Jc,

I'm looking at it now.
Need to double-check something.

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc

Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

serguei.spitsyn@oracle.com
In reply to this post by JC Beyler
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc

Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

serguei.spitsyn@oracle.com
Hi Jc,

Okay, I see your point: the change in memAllocator.cpp is for performance.
Do you have any measurements showing a performance difference?
Also, do you need me to submit a mach5 test run?

Thanks,
Serguei


On 11/5/18 15:14, JC Beyler wrote:
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc

Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Hi Serguei,

Yes exactly it was an optimization. When using a 512k sampling rate, I don't see a no real difference (the overhead is anyway low for that sampling rate), I imagine there would be a difference if trying to sample every allocation or with a low sampling interval. But because you are right and it is an optimization of the system and not a functional need, I've reverted it and now the webrev is updated here:



Let me know what you think,
Jc

On Mon, Nov 5, 2018 at 6:51 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

Okay, I see your point: the change in memAllocator.cpp is for performance.
Do you have any measurements showing a performance difference?
Also, do you need me to submit a mach5 test run?

Thanks,
Serguei


On 11/5/18 15:14, JC Beyler wrote:
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

serguei.spitsyn@oracle.com
Hi Jc,

Not sure, I understand a motivation for this change:
-  if (JvmtiExport::should_post_sampled_object_alloc()) {
+  {
Also, I'm not sure this is still needed:
+#include "prims/jvmtiEventController.inline.hpp"
+#include "prims/jvmtiThreadState.inline.hpp"
I expected you'd just revert all the changes in the memAllocator.cpp.

Also, it is up to you to make a decision if these performance-related fix is needed or not.

But it needs to be well tested so that both global+thread event management works correctly.

Thanks,
Serguei


On 11/6/18 9:42 AM, JC Beyler wrote:
Hi Serguei,

Yes exactly it was an optimization. When using a 512k sampling rate, I don't see a no real difference (the overhead is anyway low for that sampling rate), I imagine there would be a difference if trying to sample every allocation or with a low sampling interval. But because you are right and it is an optimization of the system and not a functional need, I've reverted it and now the webrev is updated here:



Let me know what you think,
Jc

On Mon, Nov 5, 2018 at 6:51 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

Okay, I see your point: the change in memAllocator.cpp is for performance.
Do you have any measurements showing a performance difference?
Also, do you need me to submit a mach5 test run?

Thanks,
Serguei


On 11/5/18 15:14, JC Beyler wrote:
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc

Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Hi Serguei,

You are right, I should have reverted the memAllocator.cpp file, sorry about that.

Here is the new webrev:

I think we are good by testing standards, like I said HeapMonitorThreadTest.java tests multiple threads. I did test an example with a thousand threads and I get the samples from 1000 threads so it seems to work there too.

Per thread is tested via the new HeapMonitorThreadDisabledTest.java so I think we are good there too.

I would recommend a mach-5 testing just in case for this one if you can, it will be better to have that reinsurance.

Thanks for your help,
Jc

On Tue, Nov 6, 2018 at 4:29 PM <[hidden email]> wrote:
Hi Jc,

Not sure, I understand a motivation for this change:
-  if (JvmtiExport::should_post_sampled_object_alloc()) {
+  {
Also, I'm not sure this is still needed:
+#include "prims/jvmtiEventController.inline.hpp"
+#include "prims/jvmtiThreadState.inline.hpp"
I expected you'd just revert all the changes in the memAllocator.cpp.

Also, it is up to you to make a decision if these performance-related fix is needed or not.

But it needs to be well tested so that both global+thread event management works correctly.

Thanks,
Serguei


On 11/6/18 9:42 AM, JC Beyler wrote:
Hi Serguei,

Yes exactly it was an optimization. When using a 512k sampling rate, I don't see a no real difference (the overhead is anyway low for that sampling rate), I imagine there would be a difference if trying to sample every allocation or with a low sampling interval. But because you are right and it is an optimization of the system and not a functional need, I've reverted it and now the webrev is updated here:



Let me know what you think,
Jc

On Mon, Nov 5, 2018 at 6:51 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

Okay, I see your point: the change in memAllocator.cpp is for performance.
Do you have any measurements showing a performance difference?
Also, do you need me to submit a mach5 test run?

Thanks,
Serguei


On 11/5/18 15:14, JC Beyler wrote:
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

serguei.spitsyn@oracle.com
Hi Jc,

Thank you for the update!
It looks good.
It is great that testing on your side is Okay.

I'll submit a mach5 job soon (today or tomorrow).

Thanks,
Serguei


On 11/6/18 20:03, JC Beyler wrote:
Hi Serguei,

You are right, I should have reverted the memAllocator.cpp file, sorry about that.

Here is the new webrev:

I think we are good by testing standards, like I said HeapMonitorThreadTest.java tests multiple threads. I did test an example with a thousand threads and I get the samples from 1000 threads so it seems to work there too.

Per thread is tested via the new HeapMonitorThreadDisabledTest.java so I think we are good there too.

I would recommend a mach-5 testing just in case for this one if you can, it will be better to have that reinsurance.

Thanks for your help,
Jc

On Tue, Nov 6, 2018 at 4:29 PM <[hidden email]> wrote:
Hi Jc,

Not sure, I understand a motivation for this change:
-  if (JvmtiExport::should_post_sampled_object_alloc()) {
+  {
Also, I'm not sure this is still needed:
+#include "prims/jvmtiEventController.inline.hpp"
+#include "prims/jvmtiThreadState.inline.hpp"
I expected you'd just revert all the changes in the memAllocator.cpp.

Also, it is up to you to make a decision if these performance-related fix is needed or not.

But it needs to be well tested so that both global+thread event management works correctly.

Thanks,
Serguei


On 11/6/18 9:42 AM, JC Beyler wrote:
Hi Serguei,

Yes exactly it was an optimization. When using a 512k sampling rate, I don't see a no real difference (the overhead is anyway low for that sampling rate), I imagine there would be a difference if trying to sample every allocation or with a low sampling interval. But because you are right and it is an optimization of the system and not a functional need, I've reverted it and now the webrev is updated here:



Let me know what you think,
Jc

On Mon, Nov 5, 2018 at 6:51 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

Okay, I see your point: the change in memAllocator.cpp is for performance.
Do you have any measurements showing a performance difference?
Also, do you need me to submit a mach5 test run?

Thanks,
Serguei


On 11/5/18 15:14, JC Beyler wrote:
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc

Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Hi Serguei,

Thanks for the update and thanks for testing mach5. Serguei sent me that the testing passed mach5 testing, could I get another review to be able to push it?

Thanks!
Jc

On Tue, Nov 6, 2018 at 10:41 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

Thank you for the update!
It looks good.
It is great that testing on your side is Okay.

I'll submit a mach5 job soon (today or tomorrow).

Thanks,
Serguei


On 11/6/18 20:03, JC Beyler wrote:
Hi Serguei,

You are right, I should have reverted the memAllocator.cpp file, sorry about that.

Here is the new webrev:

I think we are good by testing standards, like I said HeapMonitorThreadTest.java tests multiple threads. I did test an example with a thousand threads and I get the samples from 1000 threads so it seems to work there too.

Per thread is tested via the new HeapMonitorThreadDisabledTest.java so I think we are good there too.

I would recommend a mach-5 testing just in case for this one if you can, it will be better to have that reinsurance.

Thanks for your help,
Jc

On Tue, Nov 6, 2018 at 4:29 PM <[hidden email]> wrote:
Hi Jc,

Not sure, I understand a motivation for this change:
-  if (JvmtiExport::should_post_sampled_object_alloc()) {
+  {
Also, I'm not sure this is still needed:
+#include "prims/jvmtiEventController.inline.hpp"
+#include "prims/jvmtiThreadState.inline.hpp"
I expected you'd just revert all the changes in the memAllocator.cpp.

Also, it is up to you to make a decision if these performance-related fix is needed or not.

But it needs to be well tested so that both global+thread event management works correctly.

Thanks,
Serguei


On 11/6/18 9:42 AM, JC Beyler wrote:
Hi Serguei,

Yes exactly it was an optimization. When using a 512k sampling rate, I don't see a no real difference (the overhead is anyway low for that sampling rate), I imagine there would be a difference if trying to sample every allocation or with a low sampling interval. But because you are right and it is an optimization of the system and not a functional need, I've reverted it and now the webrev is updated here:



Let me know what you think,
Jc

On Mon, Nov 5, 2018 at 6:51 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

Okay, I see your point: the change in memAllocator.cpp is for performance.
Do you have any measurements showing a performance difference?
Also, do you need me to submit a mach5 test run?

Thanks,
Serguei


On 11/5/18 15:14, JC Beyler wrote:
Hi Serguei,

First off, thanks as always for looking at this :-) Let me inline my answers:

I actually "struggled" with this part of the change. My change is correct semantically and if you care about performance for when sampling a given thread. 
Your change will work semantically but the performance is the same as the global sampling. 

What I mean by working semantically is that that the tests and the code will work. However, this means that all threads will be doing the sampling work but when the code will post the event here:

(which is why your suggestion works, the change in jvmtiExport basically ensures only the threads requested are posting events)

The code will check that we actually only post for threads we care about. The code above ensures that only threads that were requested to be sampling are the ones that are sampling internally.

Note: I REALLY prefer your suggestion for two reasons:
  - We do not change the runtime/GC code at all, it remains "simple"
  - The overhead in the general case goes away and this is a NOP for my actual use-case from a performance point of view (sampling every thread)

But:
  - Then sampling per thread really is just telling the system don't pollute the callbacks, though internally you are doing all the work anyway.

Let me know which you prefer :)

 
 Also, do you see that enabling the sampling events globally still works?

Yes, otherwise HeapMonitorThreadTest.java would fail since it checks that.
 
http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;


Done and done, I added a fprintf on stderr saying the GetThreadInfo failed and the test is ignoring the add count.

Thanks again for looking and let me know what you think,
Jc

On Mon, Nov 5, 2018 at 2:25 PM [hidden email] <[hidden email]> wrote:
Hi Jc,

It looks good in general but I have some comments below.


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

+static bool thread_enabled_for_one_jvmti_env() {
+  JavaThread *thread  = JavaThread::current();
+  JvmtiThreadState *state = thread->jvmti_thread_state();
+  if (state == NULL) {
+    return false;
+  }
+
+  JvmtiEnvThreadStateIterator it(state);
+  for (JvmtiEnvThreadState* ets = it.first(); ets != NULL; ets = it.next(ets)) {
+    if (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
+      return true;
+    }
+  }
+
+  return false;
+}
+
 void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
   // support for JVMTI VMObjectAlloc event (no-op if not enabled)
   JvmtiExport::vm_object_alloc_event_collector(obj());
 
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }
 
+  // Sampling is enabled for at least one thread, is it this one?
+  if (!thread_enabled_for_one_jvmti_env()) {
+    return;
+  }
+

 I don't think you need this change as this condition already does it:
   if (!JvmtiExport::should_post_sampled_object_alloc()) {
     // Sampling disabled
     return;
   }

 Please, look at the following line in the jvmtiEventController.cpp:
    JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

 I hope, testing will prove my suggestion is correct.
 Also, do you see that enabling the sampling events globally still works?


http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

 A couple of places where err is declared as int instead of jvmtiError:
  
 714   int err;
 742   int err;

 Should not be silent in a case of JVMTI error:

 744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
 745   if (err != JVMTI_ERROR_NONE) {
 746     return;

Thanks,
Serguei


On 10/26/18 10:48, JC Beyler wrote:
Hi all,

When working on the heap sampling, I had promised to do the per thread event so here it is! 

Could I get a review for this:

I was thinking of adding GC-dev for the memAllocator change once I get favorable reviews for the rest of the change.

I've done a bit of performance testing and on the Dacapo benchmark I see no change in performance when turned off (logical, any code change is behind a flag check already in place) and when turned on it is comparable to the current performance.

(More information is: I see a very slight degradation if we are doing 512k sampling but no degradation at 2MB). 

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc



--

Thanks,
Jc
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

Alex Menkov-2
Hi Jc,

The fix looks good.
The only note is checkThreadSamplesOnlyFrom native function
implementation - the cycle looks confusing.
As far as I got the function should check that thread_stats contains
only 1 thread and name of the thread is the same as name of the
specified thread.
And for error analysis it would be great to provide good error description.
So I'd make it like

if (thread_stats.number_threads != 1) {
   fprintf(stderr, "Wrong thread number: %d (expected 1)\n",
thread_stats.number_threads);
   return FALSE;
}
if (strcmp(expected_name, thread_stats.threads[i]) != 0) {
   fprintf(stderr, "Unexpected thread name: '%s' (expected '%s')\n",
thread_stats.threads[i], expected_name);
   return FALSE;
}
return TRUE;

--alex

On 12/11/2018 15:11, [hidden email] wrote:

> Hi Jc,
>
> Alex will take a look at the test update.
>
> Thanks,
> Serguei
>
> On 11/12/18 9:53 AM, JC Beyler wrote:
>> Hi Serguei,
>>
>> Thanks for the update and thanks for testing mach5. Serguei sent me
>> that the testing passed mach5 testing, could I get another review to
>> be able to push it?
>>
>> Thanks!
>> Jc
>>
>> On Tue, Nov 6, 2018 at 10:41 PM [hidden email]
>> <mailto:[hidden email]> <[hidden email]
>> <mailto:[hidden email]>> wrote:
>>
>>     Hi Jc,
>>
>>     Thank you for the update!
>>     It looks good.
>>     It is great that testing on your side is Okay.
>>
>>     I'll submit a mach5 job soon (today or tomorrow).
>>
>>     Thanks,
>>     Serguei
>>
>>
>>     On 11/6/18 20:03, JC Beyler wrote:
>>>     Hi Serguei,
>>>
>>>     You are right, I should have reverted the memAllocator.cpp file,
>>>     sorry about that.
>>>
>>>     Here is the new webrev:
>>>     http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.04/
>>>     <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.04/>
>>>
>>>     I think we are good by testing standards, like I
>>>     said HeapMonitorThreadTest.java tests multiple threads. I did
>>>     test an example with a thousand threads and I get the samples
>>>     from 1000 threads so it seems to work there too.
>>>
>>>     Per thread is tested via the new
>>>     HeapMonitorThreadDisabledTest.java so I think we are good there too.
>>>
>>>     I would recommend a mach-5 testing just in case for this one if
>>>     you can, it will be better to have that reinsurance.
>>>
>>>     Thanks for your help,
>>>     Jc
>>>
>>>     On Tue, Nov 6, 2018 at 4:29 PM <[hidden email]
>>>     <mailto:[hidden email]>> wrote:
>>>
>>>         Hi Jc,
>>>
>>>         Not sure, I understand a motivation for this change:
>>>
>>>         - if (JvmtiExport::should_post_sampled_object_alloc()) {
>>>         + {
>>>
>>>         Also, I'm not sure this is still needed:
>>>
>>>         +#include "prims/jvmtiEventController.inline.hpp"
>>>         +#include "prims/jvmtiThreadState.inline.hpp"
>>>
>>>         I expected you'd just revert all the changes in the
>>>         memAllocator.cpp.
>>>
>>>         Also, it is up to you to make a decision if these
>>>         performance-related fix is needed or not.
>>>
>>>         But it needs to be well tested so that both global+thread
>>>         event management works correctly.
>>>
>>>         Thanks,
>>>         Serguei
>>>
>>>
>>>         On 11/6/18 9:42 AM, JC Beyler wrote:
>>>>         Hi Serguei,
>>>>
>>>>         Yes exactly it was an optimization. When using a 512k
>>>>         sampling rate, I don't see a no real difference (the
>>>>         overhead is anyway low for that sampling rate), I imagine
>>>>         there would be a difference if trying to sample every
>>>>         allocation or with a low sampling interval. But because you
>>>>         are right and it is an optimization of the system and not a
>>>>         functional need, I've reverted it and now the webrev is
>>>>         updated here:
>>>>
>>>>         Webrev:
>>>>         http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.03/
>>>>         <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.03/>
>>>>         Bug: https://bugs.openjdk.java.net/browse/JDK-8201655
>>>>
>>>>         The incremental webrev is here:
>>>>         http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.02_03/
>>>>         <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02_03/>
>>>>
>>>>         Let me know what you think,
>>>>         Jc
>>>>
>>>>         On Mon, Nov 5, 2018 at 6:51 PM [hidden email]
>>>>         <mailto:[hidden email]>
>>>>         <[hidden email]
>>>>         <mailto:[hidden email]>> wrote:
>>>>
>>>>             Hi Jc,
>>>>
>>>>             Okay, I see your point: the change in memAllocator.cpp
>>>>             is for performance.
>>>>             Do you have any measurements showing a performance
>>>>             difference?
>>>>             Also, do you need me to submit a mach5 test run?
>>>>
>>>>             Thanks,
>>>>             Serguei
>>>>
>>>>
>>>>             On 11/5/18 15:14, JC Beyler wrote:
>>>>>             Hi Serguei,
>>>>>
>>>>>             First off, thanks as always for looking at this :-) Let
>>>>>             me inline my answers:
>>>>>
>>>>>             I actually "struggled" with this part of the change. My
>>>>>             change is correct semantically and if you care about
>>>>>             performance for when sampling a given thread.
>>>>>             Your change will work semantically but the performance
>>>>>             is the same as the global sampling.
>>>>>
>>>>>             What I mean by working semantically is that that the
>>>>>             tests and the code will work. However, this means that
>>>>>             all threads will be doing the sampling work but when
>>>>>             the code will post the event here:
>>>>>                ->
>>>>>             http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.02/src/hotspot/share/prims/jvmtiExport.cpp.udiff.html
>>>>>             <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/prims/jvmtiExport.cpp.udiff.html>
>>>>>
>>>>>             (which is why your suggestion works, the change in
>>>>>             jvmtiExport basically ensures only the threads
>>>>>             requested are posting events)
>>>>>
>>>>>             The code will check that we actually only post for
>>>>>             threads we care about. The code above ensures that only
>>>>>             threads that were requested to be sampling are the ones
>>>>>             that are sampling internally.
>>>>>
>>>>>             Note: I REALLY prefer your suggestion for two reasons:
>>>>>               - We do not change the runtime/GC code at all, it
>>>>>             remains "simple"
>>>>>               - The overhead in the general case goes away and this
>>>>>             is a NOP for my actual use-case from a performance
>>>>>             point of view (sampling every thread)
>>>>>
>>>>>             But:
>>>>>               - Then sampling per thread really is just telling the
>>>>>             system don't pollute the callbacks, though internally
>>>>>             you are doing all the work anyway.
>>>>>
>>>>>             Let me know which you prefer :)
>>>>>
>>>>>
>>>>>                   Also, do you see that enabling the sampling events globally still works?
>>>>>
>>>>>
>>>>>             Yes, otherwise HeapMonitorThreadTest.java would fail
>>>>>             since it checks that.
>>>>>
>>>>>                 http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html
>>>>>
>>>>>                   A couple of places where err is declared as int instead of jvmtiError:
>>>>>                      714   int err;
>>>>>                 742 int err; Should not be silent in a case of
>>>>>                 JVMTI error:   744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
>>>>>                   745   if (err != JVMTI_ERROR_NONE) {
>>>>>                 746 return;
>>>>>
>>>>>
>>>>>
>>>>>             Done and done, I added a fprintf on stderr saying the
>>>>>             GetThreadInfo failed and the test is ignoring the add
>>>>>             count.
>>>>>
>>>>>             Thanks again for looking and let me know what you think,
>>>>>             Jc
>>>>>
>>>>>             On Mon, Nov 5, 2018 at 2:25 PM
>>>>>             [hidden email]
>>>>>             <mailto:[hidden email]>
>>>>>             <[hidden email]
>>>>>             <mailto:[hidden email]>> wrote:
>>>>>
>>>>>                 Hi Jc,
>>>>>
>>>>>                 It looks good in general but I have some comments
>>>>>                 below.
>>>>>
>>>>>
>>>>>                 http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html
>>>>>
>>>>>                 +static bool thread_enabled_for_one_jvmti_env() {
>>>>>                 + JavaThread *thread = JavaThread::current();
>>>>>                 + JvmtiThreadState *state =
>>>>>                 thread->jvmti_thread_state();
>>>>>                 + if (state == NULL) {
>>>>>                 + return false;
>>>>>                 + }
>>>>>                 +
>>>>>                 + JvmtiEnvThreadStateIterator it(state);
>>>>>                 + for (JvmtiEnvThreadState* ets = it.first(); ets
>>>>>                 != NULL; ets = it.next(ets)) {
>>>>>                 + if
>>>>>                 (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
>>>>>                 + return true;
>>>>>                 + }
>>>>>                 + }
>>>>>                 +
>>>>>                 + return false;
>>>>>                 +}
>>>>>                 +
>>>>>                   void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
>>>>>                     // support for JVMTI VMObjectAlloc event (no-op if not enabled)
>>>>>                     JvmtiExport::vm_object_alloc_event_collector(obj());
>>>>>                  
>>>>>                     if (!JvmtiExport::should_post_sampled_object_alloc()) {
>>>>>                       // Sampling disabled
>>>>>                       return;
>>>>>                     }
>>>>>                  
>>>>>                 + // Sampling is enabled for at least one thread,
>>>>>                 is it this one?
>>>>>                 + if (!thread_enabled_for_one_jvmti_env()) {
>>>>>                 + return;
>>>>>                 + }
>>>>>                 + I don't think you need this change as this
>>>>>                 condition already does it:     if (!JvmtiExport::should_post_sampled_object_alloc()) {
>>>>>                       // Sampling disabled
>>>>>                       return;
>>>>>                     }
>>>>>
>>>>>                   Please, look at the following line in the jvmtiEventController.cpp:
>>>>>                      JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);
>>>>>
>>>>>                   I hope, testing will prove my suggestion is correct.
>>>>>                   Also, do you see that enabling the sampling events globally still works?
>>>>>
>>>>>
>>>>>                 http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html
>>>>>
>>>>>                   A couple of places where err is declared as int instead of jvmtiError:
>>>>>                      714   int err;
>>>>>                 742 int err; Should not be silent in a case of
>>>>>                 JVMTI error:   744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
>>>>>                   745   if (err != JVMTI_ERROR_NONE) {
>>>>>                 746 return;
>>>>>
>>>>>
>>>>>                 Thanks,
>>>>>                 Serguei
>>>>>
>>>>>
>>>>>                 On 10/26/18 10:48, JC Beyler wrote:
>>>>>>                 Hi all,
>>>>>>
>>>>>>                 When working on the heap sampling, I had promised
>>>>>>                 to do the per thread event so here it is!
>>>>>>
>>>>>>                 Could I get a review for this:
>>>>>>                 Webrev:
>>>>>>                 http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.02/
>>>>>>                 <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/>
>>>>>>                 Bug: https://bugs.openjdk.java.net/browse/JDK-8201655
>>>>>>
>>>>>>                 I was thinking of adding GC-dev for the
>>>>>>                 memAllocator change once I get favorable reviews
>>>>>>                 for the rest of the change.
>>>>>>
>>>>>>                 I've done a bit of performance testing and on the
>>>>>>                 Dacapo benchmark I see no change in performance
>>>>>>                 when turned off (logical, any code change is
>>>>>>                 behind a flag check already in place) and when
>>>>>>                 turned on it is comparable to the current performance.
>>>>>>
>>>>>>                 (More information is: I see a very slight
>>>>>>                 degradation if we are doing 512k sampling but no
>>>>>>                 degradation at 2MB).
>>>>>>
>>>>>>                 Thanks,
>>>>>>                 Jc
>>>>>
>>>>>
>>>>>
>>>>>             --
>>>>>
>>>>>             Thanks,
>>>>>             Jc
>>>>
>>>>
>>>>
>>>>         --
>>>>
>>>>         Thanks,
>>>>         Jc
>>>
>>>
>>>
>>>     --
>>>
>>>     Thanks,
>>>     Jc
>>
>>
>>
>> --
>>
>> Thanks,
>> Jc
>
Reply | Threaded
Open this post in threaded view
|

Re: RFR (M) 8201655: Add thread-enabled support for the Heap Sampling

JC Beyler
Thanks both for the review, I fixed both issues and here is the new webrev, let me know what you think:


Thanks!
Jc

On Tue, Dec 11, 2018 at 5:01 PM <[hidden email]> wrote:
Hi Alex,

Nice catch!

It is about the following fragment:
 726   for (i = 0; i < thread_stats.number_threads; i++) {
 727     if (strcmp(expected_name, thread_stats.threads[i])) {
 728       return FALSE;
 729     } else {
 730       found_thread = TRUE;
 731     }
 732   }
 733   return found_thread;
 734 }
Also, I'd also use 'count' in place of 'number'.
It is to avoid association with thread identification number.

Thanks,
Serguei


On 12/11/18 4:42 PM, Alex Menkov wrote:
Hi Jc,

The fix looks good.
The only note is checkThreadSamplesOnlyFrom native function implementation - the cycle looks confusing.
As far as I got the function should check that thread_stats contains only 1 thread and name of the thread is the same as name of the specified thread.
And for error analysis it would be great to provide good error description.
So I'd make it like

if (thread_stats.number_threads != 1) {
  fprintf(stderr, "Wrong thread number: %d (expected 1)\n", thread_stats.number_threads);
  return FALSE;
}
if (strcmp(expected_name, thread_stats.threads[i]) != 0) {
  fprintf(stderr, "Unexpected thread name: '%s' (expected '%s')\n", thread_stats.threads[i], expected_name);
  return FALSE;
}
return TRUE;

--alex

On 12/11/2018 15:11, [hidden email] wrote:
Hi Jc,

Alex will take a look at the test update.

Thanks,
Serguei

On 11/12/18 9:53 AM, JC Beyler wrote:
Hi Serguei,

Thanks for the update and thanks for testing mach5. Serguei sent me that the testing passed mach5 testing, could I get another review to be able to push it?

Thanks!
Jc

On Tue, Nov 6, 2018 at 10:41 PM [hidden email] [hidden email] <[hidden email] [hidden email]> wrote:

    Hi Jc,

    Thank you for the update!
    It looks good.
    It is great that testing on your side is Okay.

    I'll submit a mach5 job soon (today or tomorrow).

    Thanks,
    Serguei


    On 11/6/18 20:03, JC Beyler wrote:
    Hi Serguei,

    You are right, I should have reverted the memAllocator.cpp file,
    sorry about that.

    Here is the new webrev:
    http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.04/
    <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.04/>

    I think we are good by testing standards, like I
    said HeapMonitorThreadTest.java tests multiple threads. I did
    test an example with a thousand threads and I get the samples
    from 1000 threads so it seems to work there too.

    Per thread is tested via the new
    HeapMonitorThreadDisabledTest.java so I think we are good there too.

    I would recommend a mach-5 testing just in case for this one if
    you can, it will be better to have that reinsurance.

    Thanks for your help,
    Jc

    On Tue, Nov 6, 2018 at 4:29 PM <[hidden email]
    [hidden email]> wrote:

        Hi Jc,

        Not sure, I understand a motivation for this change:

        - if (JvmtiExport::should_post_sampled_object_alloc()) {
        + {

        Also, I'm not sure this is still needed:

        +#include "prims/jvmtiEventController.inline.hpp"
        +#include "prims/jvmtiThreadState.inline.hpp"

        I expected you'd just revert all the changes in the
        memAllocator.cpp.

        Also, it is up to you to make a decision if these
        performance-related fix is needed or not.

        But it needs to be well tested so that both global+thread
        event management works correctly.

        Thanks,
        Serguei


        On 11/6/18 9:42 AM, JC Beyler wrote:
        Hi Serguei,

        Yes exactly it was an optimization. When using a 512k
        sampling rate, I don't see a no real difference (the
        overhead is anyway low for that sampling rate), I imagine
        there would be a difference if trying to sample every
        allocation or with a low sampling interval. But because you
        are right and it is an optimization of the system and not a
        functional need, I've reverted it and now the webrev is
        updated here:

        Webrev:
        http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.03/
        <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.03/>
        Bug: https://bugs.openjdk.java.net/browse/JDK-8201655

        The incremental webrev is here:
        http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.02_03/
        <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02_03/>

        Let me know what you think,
        Jc

        On Mon, Nov 5, 2018 at 6:51 PM [hidden email]
        [hidden email]
        <[hidden email]
        [hidden email]> wrote:

            Hi Jc,

            Okay, I see your point: the change in memAllocator.cpp
            is for performance.
            Do you have any measurements showing a performance
            difference?
            Also, do you need me to submit a mach5 test run?

            Thanks,
            Serguei


            On 11/5/18 15:14, JC Beyler wrote:
            Hi Serguei,

            First off, thanks as always for looking at this :-) Let
            me inline my answers:

            I actually "struggled" with this part of the change. My
            change is correct semantically and if you care about
            performance for when sampling a given thread.
            Your change will work semantically but the performance
            is the same as the global sampling.

            What I mean by working semantically is that that the
            tests and the code will work. However, this means that
            all threads will be doing the sampling work but when
            the code will post the event here:
               ->
            http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.02/src/hotspot/share/prims/jvmtiExport.cpp.udiff.html
            <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/prims/jvmtiExport.cpp.udiff.html>

            (which is why your suggestion works, the change in
            jvmtiExport basically ensures only the threads
            requested are posting events)

            The code will check that we actually only post for
            threads we care about. The code above ensures that only
            threads that were requested to be sampling are the ones
            that are sampling internally.

            Note: I REALLY prefer your suggestion for two reasons:
              - We do not change the runtime/GC code at all, it
            remains "simple"
              - The overhead in the general case goes away and this
            is a NOP for my actual use-case from a performance
            point of view (sampling every thread)

            But:
              - Then sampling per thread really is just telling the
            system don't pollute the callbacks, though internally
            you are doing all the work anyway.

            Let me know which you prefer :)


                  Also, do you see that enabling the sampling events globally still works?


            Yes, otherwise HeapMonitorThreadTest.java would fail
            since it checks that.

                http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

                  A couple of places where err is declared as int instead of jvmtiError:
                     714   int err;
                742 int err; Should not be silent in a case of
                JVMTI error:   744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
                  745   if (err != JVMTI_ERROR_NONE) {
                746 return;



            Done and done, I added a fprintf on stderr saying the
            GetThreadInfo failed and the test is ignoring the add
            count.

            Thanks again for looking and let me know what you think,
            Jc

            On Mon, Nov 5, 2018 at 2:25 PM
            [hidden email]
            [hidden email]
            <[hidden email]
            [hidden email]> wrote:

                Hi Jc,

                It looks good in general but I have some comments
                below.


                http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/src/hotspot/share/gc/shared/memAllocator.cpp.udiff.html

                +static bool thread_enabled_for_one_jvmti_env() {
                + JavaThread *thread = JavaThread::current();
                + JvmtiThreadState *state =
                thread->jvmti_thread_state();
                + if (state == NULL) {
                + return false;
                + }
                +
                + JvmtiEnvThreadStateIterator it(state);
                + for (JvmtiEnvThreadState* ets = it.first(); ets
                != NULL; ets = it.next(ets)) {
                + if
                (ets->is_enabled(JVMTI_EVENT_SAMPLED_OBJECT_ALLOC)) {
                + return true;
                + }
                + }
                +
                + return false;
                +}
                +
                  void MemAllocator::Allocation::notify_allocation_jvmti_sampler() {
                    // support for JVMTI VMObjectAlloc event (no-op if not enabled)
                    JvmtiExport::vm_object_alloc_event_collector(obj());
                                      if (!JvmtiExport::should_post_sampled_object_alloc()) {
                      // Sampling disabled
                      return;
                    }
                                  + // Sampling is enabled for at least one thread,
                is it this one?
                + if (!thread_enabled_for_one_jvmti_env()) {
                + return;
                + }
                + I don't think you need this change as this
                condition already does it:     if (!JvmtiExport::should_post_sampled_object_alloc()) {
                      // Sampling disabled
                      return;
                    }

                  Please, look at the following line in the jvmtiEventController.cpp:
                     JvmtiExport::set_should_post_sampled_object_alloc((any_env_thread_enabled & SAMPLED_OBJECT_ALLOC_BIT) != 0);

                  I hope, testing will prove my suggestion is correct.
                  Also, do you see that enabling the sampling events globally still works?


                http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/test/hotspot/jtreg/serviceability/jvmti/HeapMonitor/libHeapMonitorTest.c.frames.html

                  A couple of places where err is declared as int instead of jvmtiError:
                     714   int err;
                742 int err; Should not be silent in a case of
                JVMTI error:   744   err = (*jvmti)->GetThreadInfo(jvmti, thread, &info);
                  745   if (err != JVMTI_ERROR_NONE) {
                746 return;


                Thanks,
                Serguei


                On 10/26/18 10:48, JC Beyler wrote:
                Hi all,

                When working on the heap sampling, I had promised
                to do the per thread event so here it is!

                Could I get a review for this:
                Webrev:
                http://cr.openjdk.java.net/~jcbeyler/8201655/webrev.02/
                <http://cr.openjdk.java.net/%7Ejcbeyler/8201655/webrev.02/>
                Bug: https://bugs.openjdk.java.net/browse/JDK-8201655

                I was thinking of adding GC-dev for the
                memAllocator change once I get favorable reviews
                for the rest of the change.

                I've done a bit of performance testing and on the
                Dacapo benchmark I see no change in performance
                when turned off (logical, any code change is
                behind a flag check already in place) and when
                turned on it is comparable to the current performance.

                (More information is: I see a very slight
                degradation if we are doing 512k sampling but no
                degradation at 2MB).

                Thanks,
                Jc



            --
            Thanks,
            Jc



        --
        Thanks,
        Jc



    --
    Thanks,
    Jc



-- 

Thanks,
Jc




--

Thanks,
Jc